# WebSCP

**Repository Path**: openemor/web-scp

## Basic Information

- **Project Name**: WebSCP
- **Description**: General-purpose crawl component
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-07-05
- **Last Updated**: 2025-07-08

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Web Scraper Service

A Flask-based web scraping service with support for XPath, CSS selectors, regex, chained scraping, resource downloading (including M3U8 videos), and MinIO uploads.

## Features

- **Multiple Scraping Methods**: XPath, CSS selectors, and regex
- **List Scraping**: Extract lists of similar items (tables, UL/LI elements)
- **Chained Scraping**: Use data from one page to scrape additional pages
- **Resource Downloading**: Download files, images, and videos (including M3U8 streams, with AES encryption support)
- **Ad Filtering**: Filter ads out of M3U8 streams based on name patterns and MD5 hashes
- **MinIO Upload**: Upload downloaded or scraped resources to MinIO storage
- **Configuration Driven**: Most behavior is configurable via YAML or environment variables

## Quick Start

1. Clone the repository
2. Install dependencies: `pip install -r requirements.txt`
3. Configure the service in `config.yaml` or via environment variables
4. Run: `python -m app`

## Docker

```bash
docker build -t web-scraper .
docker run -p 5000:5000 web-scraper