# amazon-playwright-scraper

**Repository Path**: my6521/amazon-playwright-scraper

## Basic Information

- **Project Name**: amazon-playwright-scraper
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-02
- **Last Updated**: 2026-01-02

## Categories & Tags

- **Categories**: Uncategorized
- **Tags**: None

## README

# Amazon Product Scraper v1.0

## Overview

***AmazonPlaywrightSpider*** is a Scrapy + Playwright web scraper built to extract product details (title, price, rating, image) from Amazon.com. It automates a Chromium browser to scrape dynamic, JavaScript-heavy product pages safely and efficiently. It can extract data without any proxy and has handled 300+ requests against Amazon.

## Features

- **Playwright-powered scraping**: handles JavaScript-rendered Amazon pages.
- **Colorful CLI**: fully color-coded output with banners and warnings.
- **Ethical notice system**: shows a warning box before starting.
- **Auto data export**: saves results to `product.csv` and `product.json`.
- **Pagination support**: automatically crawls through multiple result pages.
- **Lightweight & customizable**: runs directly with `scrapy crawl amazon_playwright`.

---

## Requirements

- Python 3.9+
- Scrapy
- Scrapy-Playwright
- Playwright (Chromium browser)
- Node.js (for the Playwright backend)

---

## Technology Stack

- **Python**: programming language
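The project's own `settings.py` is not shown in this README, so the following is a minimal sketch of the configuration such a spider typically needs. The `DOWNLOAD_HANDLERS` and `TWISTED_REACTOR` entries are the standard settings documented by scrapy-playwright; the delay and concurrency values are illustrative assumptions, not taken from this project.

```python
# settings.py (sketch): wire Scrapy up to Playwright.

# Route HTTP(S) downloads through Playwright's browser-based handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Launch headless Chromium (the browsers installed by `playwright install`).
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Polite-crawling knobs (illustrative values): low concurrency and a
# download delay reduce the chance of being blocked.
DOWNLOAD_DELAY = 2.0
CONCURRENT_REQUESTS = 4
AUTOTHROTTLE_ENABLED = True
```

With this in place, `scrapy crawl amazon_playwright` renders each request in Chromium instead of using plain HTTP.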
- **Scrapy**: web scraping framework
- **Playwright**: headless browser automation
- **Twisted Reactor**: async I/O event system for Scrapy

## Project Structure

```bash
amazon_scraper/
│
├── amazon/
│   ├── spiders/
│   │   └── amazon_playwright_spider.py   # main spider (this file)
│   └── settings.py                       # Scrapy configuration
├── product.json                          # output file (auto-generated)
├── product.csv                           # output file (auto-generated)
└── README.md                             # documentation
```

## Installation Guide

1. **Clone the repository**

```bash
git clone https://github.com/your-username/amazon-playwright-scraper.git
cd amazon-playwright-scraper
```

2. **Create a virtual environment**

```bash
python -m venv venv
venv\Scripts\activate      # Windows
# or
source venv/bin/activate   # Linux/Mac
```

3. **Install dependencies**

```bash
pip install scrapy scrapy-playwright
```

4. **Install the Playwright browsers**

```bash
playwright install
```

## How to Run

**Option 1: from the Scrapy CLI**

```bash
scrapy crawl amazon_playwright
```

**Option 2: run the script directly**

```bash
python amazon_playwright_spider.py
```

When run directly, it will:

- Show a fancy banner
- Display a warning box
- Show the version and author
- Ask for confirmation before crawling

### Output Example

**Sample JSON output**

```json
[
  {
    "title": "Logitech Wireless Mouse M510",
    "price": "$24.99",
    "rating": "4.7 out of 5 stars",
    "image": "https://images.amazon.com/...jpg"
  },
  {
    "title": "HP USB Keyboard 320K",
    "price": "$17.45",
    "rating": "4.5 out of 5 stars",
    "image": "https://images.amazon.com/...jpg"
  }
]
```

## Sample CSV Output

When the spider finishes running, it automatically saves results to **product.csv** and **product.json**. Here is an example of the CSV output:

| title | price | rating | image |
|-------|-------|--------|-------|
| Logitech MX Master 3S Wireless Mouse | $99.99 | 4.8 out of 5 stars | https://m.media-amazon.com/images/I/71X9ppvP+aL._AC_SL1500_.jpg |
| Corsair K70 RGB TKL Mechanical Gaming Keyboard | $129.99 | 4.7 out of 5 stars | https://m.media-amazon.com/images/I/81uO-KnH1HL._AC_SL1500_.jpg |
| Razer Kraken V3 X Gaming Headset | $49.99 | 4.5 out of 5 stars | https://m.media-amazon.com/images/I/61QyH9PoWQL._AC_SL1500_.jpg |

The files are saved automatically in the project root directory after each crawl:

```bash
product.csv
product.json
```

## Important Notes

- This script is for educational and research use only.
- Do NOT use it for aggressive or commercial scraping.
- Always respect Amazon's Terms of Service.
- Use download delays and low concurrency to prevent blocking.

## Author & Credits

**Developer:** MS Coder

- **Version:** v1.0
- **Language:** Python
- **Framework:** Scrapy + Playwright

## Future Plans

- Add support for multiple Amazon categories
- Implement rotating user-agents and a proxy pool
- Add a progress bar for live scraping status
- Build a web dashboard for live scraped data

---

## License & Credits
---
Made with ❤️ by MS Coder

Version 1.0 • Built for learning, with style & responsibility