A powerful and lightweight web scraping library with LLM extraction capabilities. This library combines web scraping with AI-powered content extraction using either OpenAI or OpenRouter APIs.
Updated Feb 24, 2025 - Python
This project is a command-line tool that extracts text from web pages and PDF files, including scanned documents, using several extraction methods. It is suited to data scraping, NLP preprocessing, and content analysis.