PDF to Markdown


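zhangzhichao d30f2e3bf1 Optimize text orientation classification & update layout analysis model		4 weeks ago
helper	Optimize text orientation classification & update layout analysis model	4 weeks ago
models/PaddleDetection/inference_model	Optimize text orientation classification & update layout analysis model	4 weeks ago
third_party	Optimize text orientation classification & update layout analysis model	4 weeks ago
visual_images	first commit	1 month ago
.env	Optimize text orientation classification & update layout analysis model	4 weeks ago
.env.dev	Optimize text orientation classification & update layout analysis model	4 weeks ago
.gitignore	Switch the OCR used inside table recognition and scanned-document recognition to paddleocr	1 month ago
README.md	first commit	1 month ago
download_MinerU_models.py	first commit	1 month ago
magic-pdf.json	first commit	1 month ago
pipeline.py	Optimize text orientation classification & update layout analysis model	4 weeks ago
requirements.txt	Optimize text orientation classification & update layout analysis model	4 weeks ago
server.py	Optimize text orientation classification & update layout analysis model	4 weeks ago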

README.md


Environment setup

# 1. Converting PDFs to images requires the following dependency
apt install -y poppler-utils
# 2. Work around a Paddle compatibility issue
export FLAGS_enable_pir_api=0
# 3. Environment configuration required after installing paddlepaddle-gpu
apt install libcudnn8
apt install libcudnn8-dev
# Single quotes keep $LD_LIBRARY_PATH unexpanded until /etc/profile is sourced
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/python3.11/site-packages/nvidia/cublas/lib/' >> /etc/profile
# 4. Manually install the models that MinerU needs; download paths:
# model_dir is: /root/.cache/modelscope/hub/models/opendatalab/PDF-Extract-Kit-1___0/models
# layoutreader_model_dir is: /root/.cache/modelscope/hub/models/ppaanngggg/layoutreader
# The configuration file has been configured successfully, the path is: /root/magic-pdf.json
# The project's magic-pdf.json must be symlinked to /root/magic-pdf.json
ln -s "$(pwd)/magic-pdf.json" /root/magic-pdf.json
python download_MinerU_models.py
# 5. Dependencies required for Python to connect to PostgreSQL
apt install postgresql postgresql-contrib libpq-dev
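
The steps above can be sanity-checked with a small script. The following is only a sketch, not part of the repository: the helper names `have_cmd`, `link_ok`, and `check_env` and the `MAGIC_PDF_JSON` override variable are hypothetical. It assumes that `pdftoppm` is the binary installed by `poppler-utils` and that `pg_config` is installed by `libpq-dev`, which is the usual layout on Debian/Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: succeed when command $1 is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: succeed when $1 is a symlink whose target exists.
link_ok() {
    [ -L "$1" ] && [ -e "$1" ]
}

# Hypothetical check covering steps 1, 2, 4 and 5 above.
# MAGIC_PDF_JSON is an assumed override; it defaults to the README's path.
check_env() {
    status=0
    have_cmd pdftoppm || { echo "missing: pdftoppm (poppler-utils)"; status=1; }
    have_cmd pg_config || { echo "missing: pg_config (libpq-dev)"; status=1; }
    [ "${FLAGS_enable_pir_api:-}" = "0" ] || { echo "FLAGS_enable_pir_api is not set to 0"; status=1; }
    link_ok "${MAGIC_PDF_JSON:-/root/magic-pdf.json}" || { echo "magic-pdf.json symlink is missing or broken"; status=1; }
    return "$status"
}
```

After sourcing the script, `check_env` prints one line per problem it finds and returns a non-zero status if anything is missing. It does not verify the cuDNN installation or the downloaded model directories.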
