ScrapeCraft is a web-based scraping editor similar to Cursor but specialized for web scraping. It uses AI assistance to help users build scraping pipelines with the ScrapeGraphAI API.
- AI-powered assistant using OpenRouter (Kimi-k2 model)
- Multi-URL bulk scraping support
- Dynamic schema definition with Pydantic
- Python code generation with async support
- Real-time WebSocket streaming
- Results visualization (table and JSON views)
- Auto-updating deployment with Watchtower
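Extraction schemas are defined with Pydantic. As a minimal sketch of what a user-defined schema might look like (the `Product` and `PageResult` models and their fields are illustrative assumptions, not part of ScrapeCraft):

```python
from typing import List

from pydantic import BaseModel  # Pydantic v2 assumed


# Hypothetical extraction schema -- field names are examples only.
class Product(BaseModel):
    name: str
    price: float
    in_stock: bool


class PageResult(BaseModel):
    url: str
    products: List[Product]


result = PageResult(
    url="https://example.com/shop",
    products=[{"name": "Widget", "price": 9.99, "in_stock": True}],
)
print(result.model_dump_json())
```

Validation happens at construction time, so malformed scrape output fails fast instead of propagating into results.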
- Backend: FastAPI, LangGraph, ScrapeGraphAI
- Frontend: React, TypeScript, Tailwind CSS
- Database: PostgreSQL
- Cache: Redis
- Deployment: Docker, Docker Compose, Watchtower
- Docker and Docker Compose
- OpenRouter API key (Get it from OpenRouter)
- ScrapeGraphAI API key (Get it from ScrapeGraphAI)
1. **Clone the repository**

   ```bash
   git clone https://github.com/ScrapeGraphAI/scrapecraft.git
   cd scrapecraft
   ```

2. **Set up environment variables**

   ```bash
   cp .env.example .env
   ```

   Edit the `.env` file and add your API keys:
   - `OPENROUTER_API_KEY`: get it from OpenRouter
   - `SCRAPEGRAPH_API_KEY`: get it from ScrapeGraphAI

3. **Start the application with Docker**

   ```bash
   docker compose up -d
   ```

4. **Access the application**
   - Frontend: http://localhost:3000
   - API: http://localhost:8000
   - API Docs: http://localhost:8000/docs

5. **Stop the application**

   ```bash
   docker compose down
   ```
If you want to run the application in development mode without Docker:

Backend:

```bash
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend:

```bash
cd frontend
npm install
npm start
```
- Create a Pipeline: Click "New Pipeline" to start
- Add URLs: Use the URL Manager to add websites to scrape
- Define Schema: Create fields for data extraction
- Generate Code: Ask the AI to generate scraping code
- Execute: Run the pipeline to scrape data
- Export Results: Download as JSON or CSV
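Exported JSON results can also be converted to CSV locally. A minimal sketch, assuming the export is a flat list of records (the record shape shown is an assumption based on the table view, not the documented export format):

```python
import csv
import io
import json

# Hypothetical exported results: one flat record per scraped page.
results_json = '[{"url": "https://example.com", "title": "Example Domain"}]'
rows = json.loads(results_json)

# Write the records to an in-memory CSV; column names come from the first record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```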
The application includes Watchtower for automatic updates:
- Push new Docker images to your registry
- Watchtower will automatically detect and update containers
- No manual intervention required
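A Watchtower service in `docker-compose.yml` typically looks like the sketch below; the poll interval and flags here are illustrative defaults, not ScrapeCraft's actual configuration:

```yaml
watchtower:
  image: containrrr/watchtower
  volumes:
    # Watchtower needs the Docker socket to inspect and restart containers.
    - /var/run/docker.sock:/var/run/docker.sock
  # Check for new images every 300 seconds and remove old ones after updating.
  command: --interval 300 --cleanup
```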
- `POST /api/chat/message` - Send message to AI assistant
- `GET /api/pipelines` - List all pipelines
- `POST /api/pipelines` - Create new pipeline
- `PUT /api/pipelines/{id}` - Update pipeline
- `POST /api/pipelines/{id}/run` - Execute pipeline
- `WS /ws/{pipeline_id}` - WebSocket connection
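The WebSocket endpoint streams pipeline events to the client. A sketch of client-side message handling, assuming a simple `{"type": ..., "data": ...}` envelope (the message keys are assumptions, not ScrapeCraft's documented protocol):

```python
import json


def handle_message(raw: str) -> str:
    # Dispatch on a hypothetical message envelope: {"type": ..., "data": ...}.
    msg = json.loads(raw)
    if msg.get("type") == "progress":
        return f"progress: {msg['data']['done']}/{msg['data']['total']} URLs"
    if msg.get("type") == "result":
        return f"result ready for {msg['data']['url']}"
    return f"unhandled message type: {msg.get('type')}"


print(handle_message('{"type": "progress", "data": {"done": 1, "total": 3}}'))
```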
| Variable | Description | How to Get |
|---|---|---|
| `OPENROUTER_API_KEY` | Your OpenRouter API key | Get API Key |
| `SCRAPEGRAPH_API_KEY` | Your ScrapeGraphAI API key | Get API Key |
| `JWT_SECRET` | Secret key for JWT tokens | Generate a random string |
| `DATABASE_URL` | PostgreSQL connection string | Auto-configured with Docker |
| `REDIS_URL` | Redis connection string | Auto-configured with Docker |
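One way to generate a `JWT_SECRET` value; any high-entropy random string should work, and the 32-byte length here is a common choice rather than a documented requirement:

```python
import secrets

# 32 random bytes rendered as 64 hex characters, suitable for JWT_SECRET.
print(secrets.token_hex(32))
```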
MIT