# amazon-playwright-scraper

**Repository Path**: my6521/amazon-playwright-scraper

## Basic Information

- **Project Name**: amazon-playwright-scraper
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-02
- **Last Updated**: 2026-01-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 🕷️ Amazon Product Scraper v1.0

[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Made with Love](https://img.shields.io/badge/made%20with-%E2%9D%A4-red.svg)](https://github.com/yourusername/pytray-reminder)

## ⚙️ Overview

***AmazonPlaywrightSpider*** is a Scrapy + Playwright-based web scraper that extracts product details (title, price, rating, image) from Amazon.com. It automates a Chromium browser, so it can scrape dynamic product data even from JavaScript-heavy pages. According to the author, it works without any kind of ***proxy*** and can make 300+ requests against ***Amazon***.

## ✨ Features

- 🕹 **Playwright-powered Scraping** – Handles JavaScript-rendered Amazon pages.
- 🌈 **Colorful CLI** – Fully color-coded output with banners and warnings.
- ⚠️ **Ethical Notice System** – Shows a warning box before starting.
- 📦 **Auto Data Export** – Saves results to `product.csv` and `product.json`.
- 🧭 **Pagination Support** – Automatically crawls through multiple result pages.
- 💻 **Lightweight & Customizable** – Works directly with `scrapy crawl amazon_playwright`.

---

## 🧰 Requirements

- Python 3.9+
- Scrapy
- Scrapy-Playwright
- Playwright (Chromium browser)
- Node.js (for the Playwright backend; the `playwright` Python package typically bundles its own driver)

---

## 🧠 Technology Stack

### Python
- Programming language.
### Scrapy
- Web scraping framework

### Playwright
- Headless browser automation

### Twisted Reactor
- Async I/O event system for Scrapy

## 🧩 Project Structure

```bash
amazon_scraper/
│
├── amazon/
│   ├── spiders/
│   │   └── amazon_playwright_spider.py   # main spider (this file)
│   ├── settings.py                       # Scrapy configuration
│
├── product.json                          # output file (auto-generated)
├── product.csv                           # output file (auto-generated)
└── README.md                             # documentation
```

## ⚙️ Installation Guide

1️⃣ **Clone Repository**

```bash
git clone https://github.com/your-username/amazon-playwright-scraper.git
cd amazon-playwright-scraper
```

2️⃣ **Create Virtual Environment**

```bash
python -m venv venv
venv\Scripts\activate       # Windows
# or
source venv/bin/activate    # Linux/Mac
```

3️⃣ **Install Dependencies**

```bash
pip install scrapy scrapy-playwright
```

4️⃣ **Install Playwright Browsers**

```bash
playwright install
```

## ▶️ How to Run

**Option 1: From Scrapy CLI**

```bash
scrapy crawl amazon_playwright
```

**Option 2: Run Script Directly**

```bash
python amazon_playwright_spider.py
```

**When you run it directly, it will:**

- Show a fancy banner
- Display a warning box
- Show version and author
- Ask for confirmation before crawling

### 🧾 Output Example

**Sample JSON Output**

```json
[
  {
    "title": "Logitech Wireless Mouse M510",
    "price": "$24.99",
    "rating": "4.7 out of 5 stars",
    "image": "https://images.amazon.com/...jpg"
  },
  {
    "title": "HP USB Keyboard 320K",
    "price": "$17.45",
    "rating": "4.5 out of 5 stars",
    "image": "https://images.amazon.com/...jpg"
  }
]
```

## 📊 Sample CSV Output

When the spider finishes running, it automatically saves the results to **product.csv** and **product.json**.

**Here's an example of how the CSV output looks:**

| title | price | rating | image |
|-------|-------|--------|-------|
| Logitech MX Master 3S Wireless Mouse | $99.99 | 4.8 out of 5 stars | https://m.media-amazon.com/images/I/71X9ppvP+aL._AC_SL1500_.jpg |
| Corsair K70 RGB TKL Mechanical Gaming Keyboard | $129.99 | 4.7 out of 5 stars | https://m.media-amazon.com/images/I/81uO-KnH1HL._AC_SL1500_.jpg |
| Razer Kraken V3 X Gaming Headset | $49.99 | 4.5 out of 5 stars | https://m.media-amazon.com/images/I/61QyH9PoWQL._AC_SL1500_.jpg |

**📁 The files are saved automatically in your project root directory after each crawl:**

```bash
product.csv
product.json
```

# ⚠️ Important Notes

- This script is for educational and research use only.
- Do NOT use it for aggressive or commercial scraping.
- Always respect Amazon's Terms of Service.
- Use download delays and low concurrency to avoid being blocked.

# 🧑‍💻 Author & Credits

### Developer: MS Coder

- ***Version: v1.0***
- ***Language: Python***
- ***Framework: Scrapy + Playwright***

# 💡 Future Plans

- **Add support for multiple Amazon categories**
- **Implement rotating user-agents & proxy pool**
- **Add progress bar for live scraping status**
- **Build web dashboard for live scraped data**

---

## 🧾 License & Credits

MIT License • Made with Python • Powered by Scrapy • Playwright Integration

---

Made with ❤️ by MS Coder
Version 1.0 • Built for learning, with style & responsibility 🧠

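# 🛠️ Example: Polite Crawl Settings

The Important Notes recommend download delays and low concurrency. As a minimal sketch, a `settings.py` for a Scrapy + scrapy-playwright project might look like the following. The download handler and reactor paths are the ones documented by scrapy-playwright; the delay, concurrency, and feed values are assumptions chosen for illustration and are not taken from this project's actual configuration.

```python
# settings.py (illustrative sketch; the values are assumptions, not the project's real config)

# Route HTTP(S) downloads through Playwright, as documented by scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Use headless Chromium, matching the browser named in the Requirements.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}

# Politeness: one request at a time, with a randomized pause between requests.
DOWNLOAD_DELAY = 3  # seconds (assumed value)
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1
AUTOTHROTTLE_ENABLED = True

# Export every scraped item to both output files named in this README.
FEEDS = {
    "product.json": {"format": "json", "overwrite": True},
    "product.csv": {"format": "csv", "overwrite": True},
}
```

Raising `DOWNLOAD_DELAY` or leaving `AUTOTHROTTLE_ENABLED` on trades crawl speed for a lower chance of being blocked.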
# 🏁 Final Note

### “Scrape responsibly. Automate smartly. Respect platforms.” - MS Coder
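# 🧮 Example: Post-processing the Output

The sample output stores `price` and `rating` as display strings such as `$24.99` and `4.7 out of 5 stars`. If you want numbers for sorting or analysis, a small standard-library sketch could convert them. The helper names `parse_price`, `parse_rating`, and `normalize` are hypothetical and are not part of this project.

```python
import json
import re
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Extract a numeric price from a display string such as "$24.99"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Extract the leading score from a string such as "4.7 out of 5 stars"."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of", text or "")
    return float(match.group(1)) if match else None


def normalize(products: list) -> list:
    """Return copies of the scraped records with numeric price and rating."""
    return [
        {
            **item,
            "price": parse_price(item.get("price")),
            "rating": parse_rating(item.get("rating")),
        }
        for item in products
    ]


if __name__ == "__main__":
    # One record shaped like the README's sample JSON output.
    sample = json.loads(
        '[{"title": "Logitech Wireless Mouse M510", '
        '"price": "$24.99", "rating": "4.7 out of 5 stars"}]'
    )
    print(normalize(sample))
```

To run it on real data, load `product.json` with `json.load` and pass the resulting list to `normalize`. Fields that cannot be parsed come back as `None` rather than raising an error.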