# cjhtmlparser

**Repository Path**: fuckcpps/cjhtmlparser

## Basic Information

- **Project Name**: cjhtmlparser
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-22
- **Last Updated**: 2021-01-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# cjhtmlparser

HTML parser for Windows and Linux.

An HTML parsing library adapted from gumbo-parser and gumbo-query so that it works on both Windows and Linux.

Build: add all files under gumbo-parser directly to your project and compile them together with your own sources.

Basic usage:

```
#include "stdafx.h"      // MSVC precompiled header; omit when not using one
#include <cstdio>
#include <string>
#include "enumtest.cpp"  // project-specific file from the original sample
#include "gumbo-parser/Selector.h"
#include "gumbo-parser/Document.h"
#include "gumbo-parser/Selection.h"
#include "gumbo-parser/Node.h"

// Select a node by tag and class, then print its text and its source slice.
void test_parser()
{
    std::string page("<h1><a>wrong link</a><a class=\"special\">some link</a></h1>");
    CDocument doc;
    doc.parse(page.c_str());

    CSelection c = doc.find("h1 a.special");
    CNode node = c.nodeAt(0);
    printf("Node: %s\n", node.text().c_str());

    // startPos()/endPos() are offsets into the original page string.
    std::string content = page.substr(node.startPos(), node.endPos() - node.startPos());
    printf("Node: %s\n", content.c_str());
}

// Recover the raw markup of a node, including nested tags and newlines.
void test_html()
{
    std::string page = "<html><div><span>1\n</span>2\n</div></html>";
    CDocument doc;
    doc.parse(page.c_str());

    CNode pNode = doc.find("div").nodeAt(0);
    std::string content = page.substr(pNode.startPos(), pNode.endPos() - pNode.startPos());
    printf("Node: #%s#\n", content.c_str());
}

// Attribute selectors may contain quotes: match id="that's".
void test_escape()
{
    std::string page = "<html><div><span id=\"that's\">1\n</span>2\n</div></html>";
    CDocument doc;
    doc.parse(page.c_str());

    CNode pNode = doc.find("span[id=\"that's\"]").nodeAt(0);
    std::string content = page.substr(pNode.startPos(), pNode.endPos() - pNode.startPos());
    printf("Node: #%s#\n", content.c_str());
}

int main()
{
    test_parser();
    test_html();
    test_escape();
    return 0;
}
```