web-crawler

Here are 492 public repositories matching this topic...

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

markdown crawler ai html-to-markdown web-crawler scraping web-scraping rag automated-scraper scraping-python web-crawlers llm ai-scraping

Updated Jul 3, 2025
Python

adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

ocr parser-library web-crawler parse-server whisper-api ingestion-api vision-transformer omniparser

Updated Jun 11, 2025
Python

apify / crawlee-python

Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Supports both headful and headless modes, with proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify playwright

Updated Aug 1, 2025
Python

jasonxtn / Argus

The Ultimate Information Gathering Toolkit

osint web-crawler whois-lookup virustotal information-gathering server-info dns-lookup reconnaissance cms-detection recon-tools email-harvester ssl-analitcs directory-finder txt-records pastebin-monitoring

Updated Oct 8, 2024
Python

xianhu / PSpider

A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

python crawler multi-threading spider multiprocessing web-crawler proxies python-spider web-spider

Updated Jun 10, 2022
Python

Algebra-FUN / WeReadScan

A crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

web-crawler selenium weread book-downloader

Updated Sep 19, 2023
Python

cxcscmu / Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler web-crawler crawling web-crawling pre-training pretraining large-language-models llm

Updated Feb 24, 2025
Python

scrapfly / scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

Updated Jun 13, 2025
Python

hyunwoongko / kochat

Open-source Korean chatbot framework

deep-learning web-crawler chatbot korean deeplearning sentence-classification korean-chatbot sequance-tagging

Updated May 22, 2023
Python

lefterisloukas / edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files. Presented at WWW 2025 @ Sydney, Australia (https://dl.acm.org/doi/10.1145/3701716.3715289)

python nlp finance natural-language-processing business data-mining web-crawler sec edgar edgar-crawler

Updated Jul 18, 2025
Python

rivermont / spidy

The simple, easy-to-use command-line web crawler.

python crawler web-crawler crawling python3 web-spider

Updated Aug 8, 2024
Python

jpjacobpadilla / Stealth-Requests

Undetected web-scraping & seamless HTML parsing in Python!

python data web-crawler http-client http-requests requests web-scraping xpath data-extraction html-parsing webscraping python-web-scraper python-scraping

Updated Jul 14, 2025
Python

lucasxlu / LagouJob

Data Analysis & Mining for lagou.com

nlp machine-learning data-mining web-crawler python3 data-analysis lagou

Updated Apr 19, 2019
Python

xiayouran / Musicer

Aims to bring music platforms such as NetEase Cloud Music, Kugou, QQ Music, and Kuwo together in one place

python music-player web-crawler web-spider music-downloader music-download-script qq-music wangyiyunmusic kugou-music kuwo-music music-robot

Updated Nov 26, 2022
Python

elliotxx / zhihu-crawler-people

A simple distributed crawler for Zhihu, plus data analysis

python crawler spider web-crawler python-crawler web-spider

Updated Dec 7, 2022
Python

Hecate2 / Ignareo-ISML-auto-voter

Ignareo the Carillon, a web crawler/spider template of ultimate high concurrency built for leprechauns. Carillons as the best web spiders; Long live the golden years of leprechauns! (ISML=international saimoe; 2022 ISML is last ISML)

python http microservice high-performance web-crawler concurrency distributed asyncio gevent web-spider isml sukasuka chtholly sukamoka ignareo tiat

Updated Jun 10, 2025
Python

Madi-S / Lead-Generation

A Python script that empowers people with no programming background to generate robust leads at scale. This repo will collect various versatile techniques used in lead generation.

python parser scraper web-crawler leads chromedriver lead-generation leadscanner playwright

Updated Sep 24, 2024
Python

abaykan / CrawlBox

An easy way to brute-force web directories.

python crawler web-crawler wordlist admin-finder

Updated Jun 2, 2019
Python

skytruine / OSpider

An open-source tool for acquiring and preprocessing vector geographic data (POI/AOI/administrative divisions/road networks/land use)

download ad web-crawler free poi building street aoi land-use

Updated May 23, 2023
Python

havanagrawal / GoodreadsScraper

Scrape data from Goodreads using Scrapy and Selenium 📚

python crawler data-mining scraper books web-crawler scraping selenium scrapy-spider goodreads python3 web-scraping scrapy goodreads-data

Updated May 25, 2024
Python
