# amazon-playwright-scraper

**Repository Path**: my6521/amazon-playwright-scraper

## Basic Information

- **Project Name**: amazon-playwright-scraper
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-02
- **Last Updated**: 2026-01-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# 🕷️ Amazon Product Scraper v1.0
[![Python](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Made with Love](https://img.shields.io/badge/made%20with-%E2%9D%A4-red.svg)](https://github.com/yourusername/pytray-reminder)

## ⚙️ Overview

  ***AmazonPlaywrightSpider*** is a Scrapy + Playwright-based web scraper that extracts product details (title, price, rating, image) from Amazon.com.
  It automates a Chromium browser so it can scrape dynamic product data, even from JavaScript-heavy pages.
  It works without any kind of ***proxy*** and can make 300+ requests to ***Amazon***.


## ✨ Features
- 🕹 **Playwright-powered Scraping** – Handles JavaScript-rendered Amazon pages.  
- 🌈 **Colorful CLI** – Fully color-coded output with banners and warnings.  
- ⚠️ **Ethical Notice System** – Shows a warning box before starting.  
- 📦 **Auto Data Export** – Saves results to `product.csv` and `product.json`.  
- 🧭 **Pagination Support** – Automatically crawls through multiple result pages.  
- 💻 **Lightweight & Customizable** – Works directly with `scrapy crawl amazon_playwright`.
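
This README does not show how the spider locates the next results page. As a minimal sketch, assuming result pages are addressed by a `page` query parameter (an assumption, not confirmed by the source), the next URL could be derived as follows. The helper name `next_page_url` and the `max_pages` cap are hypothetical and exist only for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url: str, max_pages: int = 5) -> Optional[str]:
    """Return the URL of the next results page, or None once max_pages is reached.

    Assumes pagination is driven by a `page` query parameter (hypothetical).
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # A missing `page` parameter is treated as page 1.
    current = int(query.get("page", ["1"])[0])
    if current >= max_pages:
        return None
    query["page"] = [str(current + 1)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

Inside a Scrapy callback, the returned URL would typically be passed to a new `scrapy.Request` with `meta={"playwright": True}` so that scrapy-playwright renders the page. Capping the page count keeps the crawl small, in line with the ethical notes further down.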

---

## 🧰 Requirements
- Python 3.9+
- Scrapy
- Scrapy-Playwright
- Playwright (Chromium browser)
- Node.js (used by the Playwright backend; the Playwright Python package ships its own bundled driver)

---
## 🧠 Technology Stack
### Python
- Programming language
### Scrapy
- Web scraping framework
### Playwright
- Headless browser automation
### Twisted Reactor
- Async I/O event system for Scrapy
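
scrapy-playwright needs Twisted's asyncio-based reactor and its own download handler, both of which are selected in `settings.py`. The project's actual settings file is not reproduced in this README, so the following is only a sketch based on the scrapy-playwright documentation. The browser type and launch options shown are assumptions.

```python
# settings.py (sketch; values are assumptions, not the project's confirmed config)

# Route HTTP and HTTPS requests through the Playwright download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Use Chromium, launched headless (assumed defaults for this project).
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

With these settings in place, only requests that carry `meta={"playwright": True}` are rendered in the browser; all others go through the handler without browser rendering.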
## 🧩 Project Structure
```bash
amazon_scraper/
│
├── amazon/
│   ├── spiders/
│   │   └── amazon_playwright_spider.py   # main spider (this file)
│   └── settings.py                       # Scrapy configuration
│
├── product.json                          # output file (auto-generated)
├── product.csv                           # output file (auto-generated)
└── README.md                             # documentation
```
## ⚙️ Installation Guide
1️⃣ **Clone Repository**
```bash
git clone https://github.com/your-username/amazon-playwright-scraper.git
cd amazon-playwright-scraper
```
2️⃣ **Create Virtual Environment**
```bash
python -m venv venv
venv\Scripts\activate  # (Windows)
# or
source venv/bin/activate  # (Linux/Mac)
```
3️⃣ **Install Dependencies**
```bash
pip install scrapy scrapy-playwright
```
4️⃣ **Install Playwright Browsers**
```bash
playwright install
```
## ▶️ How to Run
**Option 1 — From Scrapy CLI**
```bash
scrapy crawl amazon_playwright
```
**Option 2 — Run Script Directly**
```bash
python amazon_playwright_spider.py
```

**When you run it directly, it will:**

- Show a fancy banner

- Display a warning box

- Show version and author

- Ask for confirmation before crawling
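
The confirmation step could look roughly like the sketch below. The spider's real prompt text and function names are not shown in this README, so `confirm` and its default prompt are hypothetical. The `ask` parameter is there only so the function can be exercised without a terminal.

```python
def confirm(prompt: str = "Start crawling? [y/N] ", ask=input) -> bool:
    """Return True only when the answer is 'y' or 'yes' (case-insensitive).

    Anything else, including an empty answer, counts as "no".
    """
    answer = ask(prompt).strip().lower()
    return answer in {"y", "yes"}


# Typical use at the top of the script's entry point:
#     if not confirm():
#         raise SystemExit("Aborted by user.")
```

Defaulting to "no" means an accidental Enter key press does not start a crawl.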
## 🧾 Output Example

**Sample JSON Output**
```json
[
    {
        "title": "Logitech Wireless Mouse M510",
        "price": "$24.99",
        "rating": "4.7 out of 5 stars",
        "image": "https://images.amazon.com/...jpg"
    },
    {
        "title": "HP USB Keyboard 320K",
        "price": "$17.45",
        "rating": "4.5 out of 5 stars",
        "image": "https://images.amazon.com/...jpg"
    }
]
```
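
The exported `price` and `rating` fields are strings, so they usually need converting before any sorting or filtering. The sketch below shows one way to do that. The helper names `first_number` and `normalize` are hypothetical, and the inline sample reuses one record from the example above.

```python
import json
import re
from typing import Optional


def first_number(text: str) -> Optional[float]:
    """Return the first decimal number found in text, or None if there is none."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def normalize(item: dict) -> dict:
    """Copy an item, converting its price and rating strings to floats."""
    return {
        **item,
        "price": first_number(item.get("price") or ""),
        "rating": first_number(item.get("rating") or ""),
    }


# Inline sample so the sketch runs without a product.json on disk.
sample = json.loads("""
[
    {"title": "Logitech Wireless Mouse M510",
     "price": "$24.99",
     "rating": "4.7 out of 5 stars"}
]
""")
products = [normalize(item) for item in sample]
```

To process a real export, replace the inline sample with `json.load(...)` on an opened `product.json`. Items with a missing price or rating end up with `None` in that field rather than raising an error.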
## 📊 Sample CSV Output

When the spider finishes running, it automatically saves the results to **product.csv** and **product.json**.

**Here’s an example of how the CSV output looks:**

| title                                           | price  | rating  | image                                                                 |
|-------------------------------------------------|--------|---------|------------------------------------------------------------------------|
| Logitech MX Master 3S Wireless Mouse            | $99.99 | 4.8 out of 5 stars | https://m.media-amazon.com/images/I/71X9ppvP+aL._AC_SL1500_.jpg |
| Corsair K70 RGB TKL Mechanical Gaming Keyboard  | $129.99| 4.7 out of 5 stars | https://m.media-amazon.com/images/I/81uO-KnH1HL._AC_SL1500_.jpg |
| Razer Kraken V3 X Gaming Headset                | $49.99 | 4.5 out of 5 stars | https://m.media-amazon.com/images/I/61QyH9PoWQL._AC_SL1500_.jpg |

**📁 The files are saved automatically in your project root directory after each crawl:**
```bash
product.csv
product.json
```
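
In Scrapy, this kind of automatic export is normally configured through the `FEEDS` setting. Whether the project sets it in `settings.py` or in the spider's `custom_settings` is not stated here, so treat the following as a sketch. The `overwrite` choice in particular is an assumption.

```python
# Feed export sketch: write every scraped item to both files.
# "overwrite": True replaces the files on each crawl (assumed behavior).
FEEDS = {
    "product.json": {"format": "json", "encoding": "utf8", "overwrite": True},
    "product.csv": {"format": "csv", "encoding": "utf8", "overwrite": True},
}
```

Relative paths in `FEEDS` are resolved against the directory the crawl is started from, which is why the files appear in the project root when the command is run there.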
## ⚠️ Important Notes

- This script is for educational and research use only.

- Do NOT use it for aggressive or commercial scraping.

- Always respect Amazon’s Terms of Service.

- Use download delays and low concurrency to prevent blocking.
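
Download delays and low concurrency are controlled by standard Scrapy settings. The values below are illustrative assumptions, not the project's tested configuration.

```python
# Throttling sketch for settings.py (illustrative values).

DOWNLOAD_DELAY = 3                   # seconds to wait between requests
RANDOMIZE_DOWNLOAD_DELAY = True      # vary the delay (0.5x to 1.5x)

CONCURRENT_REQUESTS = 1              # one request in flight overall
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # and at most one per domain

# Let Scrapy adapt the delay to observed response latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 3
AUTOTHROTTLE_MAX_DELAY = 30
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

With AutoThrottle enabled, `DOWNLOAD_DELAY` acts as the lower bound for the delay, so the crawl never runs faster than the configured value.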
## 🧑‍💻 Author & Credits

### Developer: MS Coder

- ***Version: v1.0***
- ***Language: Python***
- ***Framework: Scrapy + Playwright***

## 💡 Future Plans

 - **Add support for multiple Amazon categories**

 - **Implement rotating user-agents & proxy pool**

 - **Add progress bar for live scraping status**

 - **Build web dashboard for live scraped data**

---

## 🧾 License & Credits

<p align="center">
  <a href="https://choosealicense.com/licenses/mit/">
    <img src="https://img.shields.io/badge/License-MIT-green.svg?style=for-the-badge" alt="MIT License" />
  </a>
  <a href="https://www.python.org/">
    <img src="https://img.shields.io/badge/Made%20with-Python-3776AB.svg?style=for-the-badge&logo=python&logoColor=white" alt="Made with Python" />
  </a>
  <a href="https://scrapy.org/">
    <img src="https://img.shields.io/badge/Powered%20by-Scrapy-60A839.svg?style=for-the-badge&logo=scrapy&logoColor=white" alt="Powered by Scrapy" />
  </a>
  <a href="https://playwright.dev/python/">
    <img src="https://img.shields.io/badge/Playwright%20Integration-45BA4B.svg?style=for-the-badge&logo=playwright&logoColor=white" alt="Playwright Integration" />
  </a>
</p>

---

<p align="center">
  <b>Made with ❤️ by <a href="https://github.com/mscoder-py">MS Coder</a></b><br>
  <sub> Version 1.0 • Built for learning, with style & responsibility 🧠 </sub>
</p>



## 🏁 Final Note

### “Scrape responsibly. Automate smartly. Respect platforms.”
— MS Coder