# pdf_searcher

**Repository Path**: along_coding/pdf_searcher

## Basic Information

- **Project Name**: pdf_searcher
- **Description**: Searching content across a large number of PDF documents
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2016-09-10
- **Last Updated**: 2020-12-19

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

A small tool for full-text search over PDF documents.

The folder tree is shown in tree.txt.

Libraries used (existing wheels):
    pdfminer, pypdf2, pdfbox (Java), whoosh, jieba

Install requirements:
    pip install -r requirement.txt

HOWTOUSE:
    1. Link ./pdf_searcher/source to your document folder.
    2. Make sure ./pdf_searcher/src/webapp/static is linked to the same directory as ./pdf_searcher/source.
    3. Run
        python ./pdf_searcher/src/Main_Daemonlize.py start|stop|restart
      to start the web service and the auto-update service as daemon processes, and to manage them.
      The web service listens on port 8080.
    4. Run
        python ./pdf_searcher/src/Main_CMD.py
      for more administrator operations; type help for details on each operation.
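
Steps 1 and 2 can be scripted. The following is a minimal sketch, assuming that "linked" means a symbolic link and that neither path already exists as a real directory; the helper name `link_document_folder` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def link_document_folder(repo_root, documents_dir):
    """Point source and src/webapp/static at the same document folder.

    Hypothetical helper: assumes both paths are meant to be symlinks.
    """
    repo_root = Path(repo_root)
    target = Path(documents_dir).resolve()
    links = [repo_root / "source", repo_root / "src" / "webapp" / "static"]
    for link in links:
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale link from an earlier run
        elif link.exists():
            # Refuse to touch a real file or directory.
            raise FileExistsError(f"{link} exists and is not a symlink")
        os.symlink(target, link, target_is_directory=True)
    return links
```

For example, `link_document_folder("./pdf_searcher", "/path/to/your/pdfs")` would create both links. If `static` already exists as a regular directory in your checkout, the helper raises instead of overwriting it.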

TODO:
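    2017.5.29:
        1. Define the result ordering clearly, add pagination to the front-end result display, and polish the front end
        2. Optimize index performance

        2017.6.3 update:
        1. Results can now be sorted by the library's internal score, and pagination is implemented
        2. The front end still needs polishing and the index still needs optimizing

    2017.6.3: Completeness first, optimization later; get it usable first.
        1. Make parameter configuration more flexible by centralizing configuration in a single config module
        2. Start thinking about deployment and user interaction so the software is easy to use
        3. Start reading the source code of the libraries I use, beginning with the features already in use, to learn disciplined development and unit testing techniques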
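
One possible shape for the planned single config module is sketched below. It is not the project's actual code: the `Config` class, its field names, and `load_config` are assumptions made for illustration. Only the two paths and port 8080 come from this README; `page_size` is a made-up value.

```python
from dataclasses import dataclass, replace
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """All tunable settings in one place (hypothetical field names)."""
    source_dir: Path = Path("./pdf_searcher/source")
    static_dir: Path = Path("./pdf_searcher/src/webapp/static")
    web_port: int = 8080   # port stated in HOWTOUSE
    page_size: int = 10    # assumed value, not taken from the project


def load_config(**overrides):
    """Return the defaults with any keyword overrides applied.

    Unknown keys raise TypeError, so typos in setting names fail early.
    """
    return replace(Config(), **overrides)
```

Other modules would then call `load_config()` once and pass the resulting object around, instead of hard-coding paths and ports in several places.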