# steemit

**Repository Path**: null_639_7345/steemit

## Basic Information

- **Project Name**: steemit
- **Description**: Crawler project for steemit: automatically finds articles and uploads them to steemit
- **Primary Language**: NodeJS
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2018-01-14
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Don't ask me why I wrote code like this, hehe~

## 1. Automatically uploading articles to [steemit](https://steemit.com/login.html)

I originally planned to upload by analyzing the site's API, but steemit imposes a lot of restrictions and that looked like a hassle, so I upload articles by driving a browser directly instead.

In node.js, the [selenium-webdriver](https://www.npmjs.com/package/selenium-webdriver) module makes it easy to control a browser.

```js
const { By, until } = require("selenium-webdriver");

// Excerpt: the functions below are methods of the class that drives the browser.
// this.web is the WebDriver instance.

/**
 * Log in.
 */
async login() {
  const url = "https://steemit.com/login.html";
  const returnJson = { result: false, value: "" };
  try {
    await this.web.get(url);
    const inputs = await this.web.findElements(By.css("input"));
    await inputs[1].sendKeys(this.__username); // login user name
    await inputs[2].sendKeys(this.__password); // login password
    await this.web.findElement(By.css(".login-modal-buttons>button")).click();
    await this.web.wait(until.urlIs("https://steemit.com/welcome"), 4000);
    await this.web.findElement(By.css(".show-for-medium.submit-story")).click();
    returnJson.result = true;
    returnJson.value = "login success";
    console.log("login success");
  } catch (err) {
    // Report the failure instead of leaving the promise pending.
    returnJson.value = `login failed: ${err}`;
  }
  return returnJson;
}

/**
 * Submit an article.
 * @param {*} title article title
 * @param {*} post article body
 * @param {*} tags article tags
 */
async submit(title, post, tags) {
  const url = "https://steemit.com/submit.html";
  const returnJson = { result: false, value: "" };
  try {
    await this.web.get(url);
    // Wait to make sure the submit page has actually loaded.
    await this.web.wait(until.urlIs(url), 4000);
    // Element positions: open the page in a browser and inspect it.
    const inputs = await this.web.findElements(By.css("input"));
    await inputs[1].sendKeys(title);
    await inputs[3].sendKeys(tags);
    await this.web.wait(until.elementLocated(By.css("textarea")), 3000);
    await this.web.findElement(By.css("textarea")).sendKeys(post);
    await this.web.findElement(By.css(".button")).click();
    returnJson.result = true;
    returnJson.value = "submit success";
  } catch (err) {
    returnJson.value = `submit failed: ${err}`;
  }
  return returnJson;
}
```

To find the right elements, just open the page in a browser and take a look.

Call the login method first, and call the submit method once login has succeeded.

## 2. Article source: bulk-downloading articles from [medium](https://medium.com) into a redis queue

Using an API provided by a fellow developer, you can fetch an author's article list from the author's user name.

```js
const superagent = require("superagent");

/**
 * Get a Medium author's article list.
 * @param {*} username Medium user name
 * @param {*} limit number of articles to fetch (1 to 100)
 */
async getPostByUsername(username, limit) {
  const returnJson = { result: false, value: "" };
  if (
    typeof limit !== "number" ||
    limit <= 0 ||
    limit > 100 ||
    typeof username !== "string" ||
    username === ""
  ) {
    returnJson.value = "invalid parameters";
    return returnJson;
  }
  const payload = {
    query: `query PostQuery($username: String!, $limit: Int!) {
      posts(username: $username, limit: $limit) {
        title
        firstPublishedAt
        url
        content { subtitle }
      }
      user(username: $username) {
        username
        name
        bio
      }
    }`,
    // JSON.stringify escapes the user name correctly.
    variables: JSON.stringify({ username, limit }),
    operationName: "PostQuery"
  };
  try {
    const res = await superagent
      .post("https://micro-medium-api.now.sh/graphql")
      .send(payload) // sends a JSON post body
      .set("accept", "json");
    returnJson.result = true;
    returnJson.value = res.text;
  } catch (err) {
    console.log("failed to get the article list: ", err);
    returnJson.value = "failed to get the article list: " + err;
  }
  return returnJson;
}
```

Another handy tool is h2m, which converts the HTML at a URL into Markdown.

```sh
npm install h2m -g
h2m https://baidu.com
```

So I wrapped it in another method:

```js
const { exec } = require("child_process");

/**
 * Download a Medium article by URL and push it onto the redis queue.
 * @param {*} url article URL
 * @param {*} saveName article title
 * @param {*} author article author
 * @param {*} category queue (steemit category) to publish to
 */
async downloadPostByUrl(url, saveName, author, category) {
  const returnJson = { result: false, value: "" };
  url = encodeURI(url);
  return new Promise(resolve => {
    if (url === "" || saveName === "") {
      returnJson.value = "invalid parameters";
      resolve(returnJson);
      return;
    }
    // Quote the URL so characters such as & are not interpreted by the shell.
    const command = `h2m "${url}"`;
    console.log(command);
    exec(command, { timeout: 1000 * 60 * 3, maxBuffer: 20 * 1024 * 1024 }, (error, stdout, stderr) => {
      console.log(`stderr: ${stderr}`);
      if (error) {
        console.log(`exec error: ${error}`);
        returnJson.value = `exec error: ${error}`;
        resolve(returnJson);
        return;
      }
      // Keep only the article body; fall back to the full output if a marker is missing.
      const start = stdout.indexOf("\n\n---\n");
      const end = stdout.indexOf("One clap, two clap, three clap, forty?");
      const saveStr = stdout.slice(start === -1 ? 0 : start, end === -1 ? undefined : end);
      const postJson = {
        title: saveName,
        author: author,
        category: category,
        content: saveStr
      };
      // Store the article in the redis queue.
      this.queueJson[category].publish(postJson);
      returnJson.result = true;
      returnJson.value = "Medium article downloaded by URL successfully";
      console.log(returnJson.value);
      resolve(returnJson);
    });
  });
}
```

> Gotcha: when using exec, the default size limit for stdout and stderr is 200 KB (in the Node.js versions current when this was written), so maxBuffer has to be set larger.

## 3. How do we make sure uploaded articles are not duplicated?

Sincere thanks to the all-powerful npm.

With [redis-message-queue](https://www.npmjs.com/package/redis-message-queue) it is easy to create a redis queue whose values are unique.

```js
this.client = new rmq.UniqueQueue(this.name, port, host);
```

## 4. Running

Code location: https://gitee.com/null_639_7345/steemit

1. Clone the code to your machine.
2. Download the [Firefox driver](https://www.npmjs.com/package/selenium-webdriver) and install the Firefox browser.
3. `npm i h2m -g`
4. Edit the configuration file config/default.js:

```js
module.exports = {
  cwd: 'F:\\test1\\steemit\\', // project root directory
  redisHost: '192.168.10.6', // redis server IP
  redisPost: 6379, // redis server port
  promulgatorName: '***', // steemit user name
  promulgatorPassword: "***", // steemit password
  categories: [
    {
      name: 'popular', // category (tag) used when uploading to steemit
      // list of Medium authors
      origin: ['joshrose', 'JessicaLexicus', 'ThunderPuff', 'usemuzli', 'black_metallic']
    }
  ]
};
```

5. `node index.js`

Finally, I have some exciting news to share with everyone.

![image.png](//dn-cnode.qbox.me/FoFBUiNarQfNHAVdTq3bAkeuuw5E)
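
As a footnote to section 3, here is a rough sketch of what "a queue whose values are unique" means in practice. It is an in-memory stand-in written for illustration only: the class `InMemoryUniqueQueue` and its `consume` method are hypothetical names, and this is not how redis-message-queue is implemented. It only assumes that publishing an article identical to one already queued should be a no-op.

```javascript
// Hypothetical in-memory stand-in for a unique queue (illustration only,
// not the redis-message-queue implementation).
class InMemoryUniqueQueue {
  constructor() {
    this.items = [];       // pending values in FIFO order
    this.seen = new Set(); // serialized form of every value currently queued
  }

  // Adds the value unless an identical one is already queued.
  // Returns true if the value was added, false if it was a duplicate.
  publish(value) {
    const key = JSON.stringify(value);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.items.push(value);
    return true;
  }

  // Removes and returns the oldest value, or undefined when the queue is empty.
  consume() {
    const value = this.items.shift();
    if (value !== undefined) this.seen.delete(JSON.stringify(value));
    return value;
  }

  get length() {
    return this.items.length;
  }
}

const queue = new InMemoryUniqueQueue();
const post = { title: "Hello", author: "joshrose", category: "popular", content: "body text" };
queue.publish(post);
queue.publish({ ...post }); // same content, so it is ignored
console.log(queue.length);  // prints 1
```

In this sketch a value can be queued again after it has been consumed. Whether the real library behaves the same way is not something the project states, so check its documentation before relying on that.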