# alibaba_spider

**Repository Path**: cucy/alibaba_spider

## Basic Information

- **Project Name**: alibaba_spider
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-10-28
- **Last Updated**: 2023-12-29

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Environment dependencies

Requires Python 3.12. Install the dependencies from the Aliyun PyPI mirror:

`pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/`

# Update the code

`git pull`

# Push the code

`git push 1ww`

# Plugins and tools that may be useful

- https://gitee.com/g1879/DrissionPage
- https://www.cnblogs.com/teark/p/17148431.html
- https://www.cnblogs.com/teark/p/16914702.html
- clicknium
- DrissionPage highlights
- mac playwright

# JSON formatting

Online formatter: https://spidertools.cn/#/formatJSON

# Browser plugin

SelectorsHub

# Scroll the page down to the bottom

```python
import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.example.com")

# Get the current page height
last_height = driver.execute_script("return document.body.scrollHeight")

# Keep scrolling until the bottom of the page is reached
while True:
    # Scroll to the bottom of the content loaded so far
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for newly loaded content to render
    time.sleep(2)
    # Get the page height again
    new_height = driver.execute_script("return document.body.scrollHeight")
    # If the height did not change, nothing new was loaded: we are at the bottom
    if new_height == last_height:
        break
    # Otherwise keep scrolling
    last_height = new_height

driver.quit()
```

# Save the page in MHT (MHTML) format

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.qq.com/')

# 1. Run the Chrome DevTools Protocol command to get the MHTML content
res = driver.execute_cdp_cmd('Page.captureSnapshot', {})

# 2. Write it to a file.
# newline='' stops Python from translating line endings, so the CRLF line
# breaks in the MHTML data are written unchanged (fix suggested in the
# 5th reader comment on the source post)
with open('qq.mhtml', 'w', encoding='utf-8', newline='') as f:
    f.write(res['data'])

driver.quit()
```
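# Scroll helper with a round limit (sketch)

The page scroll-down loop in this README compares `document.body.scrollHeight` before and after each scroll, but it has no upper bound, so a page with an endless feed never exits. Below is a minimal sketch of the same height-comparison technique factored into a function with a cap on the number of rounds. The names `scroll_to_bottom`, `get_height`, `scroll`, `pause` and `max_rounds` are hypothetical helpers for illustration, not part of Selenium; the height and scroll actions are passed in as callables so the logic can be tried without a browser.

```python
import time
from typing import Callable


def scroll_to_bottom(
    get_height: Callable[[], int],
    scroll: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 50,
) -> int:
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scroll rounds performed. Hypothetical helper,
    not a Selenium API.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll()
        time.sleep(pause)  # give lazy-loaded content time to appear
        new_height = get_height()
        if new_height == last_height:
            return rounds  # height unchanged: bottom reached
        last_height = new_height
    return max_rounds  # gave up: the page kept growing (e.g. an endless feed)


if __name__ == "__main__":
    # Fake page that grows three times and then stops (no browser needed)
    heights = iter([1000, 2000, 3000, 4000, 4000])
    current = {"h": next(heights)}

    def fake_scroll() -> None:
        current["h"] = next(heights)

    print(scroll_to_bottom(lambda: current["h"], fake_scroll, pause=0))
```

With a Selenium driver, the two callables would wrap the same JavaScript the README already uses, for example `lambda: driver.execute_script("return document.body.scrollHeight")` and `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`.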
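# Why `newline=''` matters for MHTML (sketch)

MHTML is a MIME format whose lines end in CRLF (`\r\n`). When a file is opened in text mode without `newline=''`, Python translates every `\n` it writes into `os.linesep`, so on Windows each `\r\n` in the snapshot would become `\r\r\n`. The sketch below shows the difference on disk; the sample string is made up and is not real Chrome output, and `write_text` is a hypothetical helper.

```python
import os
import tempfile

# Made-up MHTML-like text with CRLF line endings (not real Chrome output)
sample = "MIME-Version: 1.0\r\nContent-Type: multipart/related\r\n\r\nbody\r\n"


def write_text(path: str, text: str, keep_newlines: bool) -> bytes:
    """Write text in text mode and return the raw bytes that reached the disk."""
    kwargs = {"newline": ""} if keep_newlines else {}
    with open(path, "w", encoding="utf-8", **kwargs) as f:
        f.write(text)
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.mhtml")
    # With newline='' the bytes match the string exactly on every OS
    print(write_text(path, sample, keep_newlines=True) == sample.encode("utf-8"))
    # Without it, '\n' becomes os.linesep: on Windows '\r\n' turns into '\r\r\n'
    print(b"\r\r\n" in write_text(path, sample, keep_newlines=False))
```

The second check is only true on Windows, where `os.linesep` is `\r\n`; on Linux and macOS both writes produce the same bytes, which is why the problem is easy to miss.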
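# Formatting JSON locally (sketch)

As an offline alternative to the online JSON formatter listed in this README, Python's standard `json` module can pretty-print a scraped response. A minimal sketch: `format_json` is a hypothetical helper and the sample payload is invented. `ensure_ascii=False` keeps Chinese text readable instead of emitting `\uXXXX` escapes.

```python
import json


def format_json(raw: str, indent: int = 2) -> str:
    """Pretty-print a JSON string, keeping non-ASCII (e.g. Chinese) text as-is."""
    return json.dumps(json.loads(raw), indent=indent, ensure_ascii=False)


# Invented sample payload, e.g. the body of an XHR response
raw = '{"title":"\\u624b\\u673a","price":99.5,"tags":["a","b"]}'
print(format_json(raw))
```

For a file on disk, `python -m json.tool input.json` from the standard library does the same job from the command line.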