
PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents with the layout fully preserved. Supports Google/DeepL/Ollama/OpenAI and other services, and provides CLI/GUI/Docker.

PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

Feel free to provide feedback in GitHub Issues, Telegram Group or QQ Group.

Updates

  • [Nov. 26 2024] CLI now supports online file(s) (by @reycn)
  • [Nov. 24 2024] ONNX support to reduce dependency sizes (by @Wybxc)
  • [Nov. 23 2024] 🌟 Public Service online! (by @Byaidu)
  • [Nov. 23 2024] Firewall for preventing web bots (by @Byaidu)
  • [Nov. 22 2024] GUI now supports Italian, and has been improved (by @Byaidu, @reycn)
  • [Nov. 22 2024] You can now share your deployed service with others (by @Zxis233)
  • [Nov. 22 2024] Now supports Tencent Translation (by @hellofinch)
  • [Nov. 21 2024] GUI now supports downloading dual-document (by @reycn)
  • [Nov. 20 2024] 🌟 Demo online! (by @reycn)

Preview

Public Service 🌟

Free Service (https://pdf2zh.com/)

You can try our public service online without installation.

Hugging Face Demo

You can try our demo on HuggingFace without installation. Note that the computing resources of the demo are limited, so please avoid abusing them.

Installation and Usage

We provide three methods for using this project: command line, GUI, and Docker.

Method I. Command line

  1. Install Python (3.8 <= version <= 3.12)

  2. Install our package:

    pip install pdf2zh
  3. Run the translation; the output files are generated in the current working directory:

    pdf2zh document.pdf

Method II. GUI

  1. Install Python (3.8 <= version <= 3.12)

  2. Install our package:

    pip install pdf2zh
  3. Start using in browser:

    pdf2zh -i
  4. If your browser does not open automatically, go to

    http://localhost:7860/

See the GUI documentation for more details.

Method III. Docker

  1. Pull and run:

    docker pull byaidu/pdf2zh
    docker run -d -p 7860:7860 byaidu/pdf2zh
  2. Open in browser:

    http://localhost:7860/
    

For Docker deployment on a cloud service, one-click deploy buttons are provided, including Koyeb and Zeabur.

Advanced Options

Run the translation command in the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current working directory. Google is used as the default translation service.

In the following table, we list all advanced options for reference:

Option   Function                      Example
files    Local files                   pdf2zh ~/local.pdf
links    Online files                  pdf2zh http://arxiv.org/paper.pdf
-i       Enter GUI                     pdf2zh -i
-p       Partial document translation  pdf2zh example.pdf -p 1
-li      Source language               pdf2zh example.pdf -li en
-lo      Target language               pdf2zh example.pdf -lo zh
-s       Translation service           pdf2zh example.pdf -s deepl
-t       Multi-threads                 pdf2zh example.pdf -t 1
-o       Output dir                    pdf2zh example.pdf -o output
-f, -c   Exceptions                    pdf2zh example.pdf -f "(MS.*)"
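
When pdf2zh is driven from a script, the options in the table can be assembled into an argument list. The following is a minimal sketch, not part of the project: build_command is a hypothetical helper, and it uses only the flags listed above.

```python
from typing import List, Optional


def build_command(
    pdf: str,
    pages: Optional[str] = None,
    lang_in: Optional[str] = None,
    lang_out: Optional[str] = None,
    service: Optional[str] = None,
    threads: Optional[int] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble a pdf2zh argument list (hypothetical helper, not a pdf2zh API)."""
    cmd = ["pdf2zh", pdf]
    # Each option is appended only when the caller supplies a value.
    if pages is not None:
        cmd += ["-p", pages]
    if lang_in is not None:
        cmd += ["-li", lang_in]
    if lang_out is not None:
        cmd += ["-lo", lang_out]
    if service is not None:
        cmd += ["-s", service]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


print(build_command("example.pdf", pages="1-3,5", service="deepl"))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where pdf2zh is installed.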

Some services require setting environment variables.

Full / partial document translation

  • Entire document

    pdf2zh example.pdf
  • Part of the document

    pdf2zh example.pdf -p 1-3,5
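
To illustrate how a page specification such as 1-3,5 can be read, here is a minimal sketch. It assumes that ranges are inclusive and that page numbers are used exactly as written; parse_pages is a hypothetical helper and is not taken from the pdf2zh source.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a spec like "1-3,5" into a list of page numbers.

    Assumption: ranges are inclusive on both ends.
    """
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = part.split("-", 1)
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages


print(parse_pages("1-3,5"))  # → [1, 2, 3, 5]
```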

Specify source and target languages

See Google Language Codes and DeepL Language Codes.

pdf2zh example.pdf -li en -lo ja

Translate with Different Services

  • DeepL

    See DeepL

    Set ENVs to construct an endpoint like: {DEEPL_SERVER_URL}/translate

    • DEEPL_SERVER_URL (Optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
    • DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx
    pdf2zh example.pdf -s deepl
  • DeepLX

    See DeepLX

    Set ENVs to construct an endpoint like: {DEEPLX_SERVER_URL}/translate

    • DEEPLX_SERVER_URL (Optional), e.g., export DEEPLX_SERVER_URL=https://api.deeplx.org
    • DEEPLX_AUTH_KEY, e.g., export DEEPLX_AUTH_KEY=xxx
    pdf2zh example.pdf -s deeplx
  • Ollama

    See Ollama

    Set ENVs to construct an endpoint like: {OLLAMA_HOST}/api/chat

    • OLLAMA_HOST (Optional), e.g., export OLLAMA_HOST=http://localhost:11434
    pdf2zh example.pdf -s ollama:gemma2
  • LLM with OpenAI compatible schemas (OpenAI / SiliconCloud / Zhipu)

    See SiliconCloud, Zhipu

    Set ENVs to construct an endpoint like: {OPENAI_BASE_URL}/chat/completions

    • OPENAI_BASE_URL (Optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
    • OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx
    pdf2zh example.pdf -s openai:gpt-4o
  • Azure

    See Azure Text Translation

    The following ENVs are required:

    • AZURE_APIKEY, e.g., export AZURE_APIKEY=xxx
    • AZURE_ENDPOINT, e.g., export AZURE_ENDPOINT=https://api.translator.azure.cn/
    • AZURE_REGION, e.g., export AZURE_REGION=chinaeast2
    pdf2zh example.pdf -s azure
  • Tencent Machine Translation

    See Tencent Machine Translation

    The following ENVs are required:

    • TENCENT_SECRET_ID, e.g., export TENCENT_SECRET_ID=AKIDxxx
    • TENCENT_SECRET_KEY, e.g., export TENCENT_SECRET_KEY=xxx
    pdf2zh example.pdf -s tencent
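
Before starting a long translation, it can help to check that the variables for the chosen service are set. The sketch below is an illustration only: REQUIRED_ENVS simply lists the variables above that are not marked Optional, and missing_envs and endpoint are hypothetical helpers, not pdf2zh functions.

```python
import os
from typing import Dict, List, Mapping

# Variables listed above without the "(Optional)" marker, per service.
REQUIRED_ENVS: Dict[str, List[str]] = {
    "deepl": ["DEEPL_AUTH_KEY"],
    "deeplx": ["DEEPLX_AUTH_KEY"],
    "ollama": [],
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_APIKEY", "AZURE_ENDPOINT", "AZURE_REGION"],
    "tencent": ["TENCENT_SECRET_ID", "TENCENT_SECRET_KEY"],
}


def missing_envs(service: str, environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty for a -s value."""
    name = service.split(":", 1)[0]  # "openai:gpt-4o" -> "openai"
    return [var for var in REQUIRED_ENVS.get(name, []) if not environ.get(var)]


def endpoint(base_url: str, path: str) -> str:
    """Join a base URL and a path, as in {OPENAI_BASE_URL}/chat/completions."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


print(missing_envs("azure", {"AZURE_APIKEY": "xxx"}))
print(endpoint("https://api.openai.com/v1", "chat/completions"))
```

Passing the environment as an argument keeps the check easy to test; by default it reads os.environ.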

Translate with exceptions

Use regex to specify formula fonts and characters that need to be preserved:

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
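
To see what these two patterns select, they can be tried with Python's re module. This is a minimal sketch: it assumes the patterns are matched from the start of the font name or character (re.match), which may differ from how pdf2zh applies them internally, and the font names are only illustrative.

```python
import re

# The same patterns as passed to -f and -c above.
FONT_PATTERN = re.compile(r"(CM[^RT].*|MS.*|.*Ital)")
CHAR_PATTERN = re.compile(r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])")


def is_formula_font(font_name: str) -> bool:
    """True if the font name matches the -f pattern from its start."""
    return FONT_PATTERN.match(font_name) is not None


def is_formula_char(char: str) -> bool:
    """True if the character matches the -c pattern."""
    return CHAR_PATTERN.match(char) is not None


for font in ["CMMI10", "CMR10", "MSBM10", "Times-Italic", "Times-Roman"]:
    print(font, is_formula_font(font))

for char in ["(", "=", "5", "α", "a"]:
    print(char, is_formula_char(char))
```

Under this assumption, CMMI10, MSBM10, and Times-Italic count as formula fonts, while CMR10 and Times-Roman do not, because [^RT] excludes fonts whose name continues with R or T after CM.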

Specify threads

Use -t to specify how many threads to use in translation:

pdf2zh example.pdf -t 1
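
As a rough illustration of what a thread count controls, the sketch below translates independent text segments with a thread pool. It does not describe how pdf2zh schedules its work, which this README does not specify: translate_all and fake_translate are hypothetical, and fake_translate stands in for a real call to a translation service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def translate_all(
    segments: List[str],
    translate: Callable[[str], str],
    threads: int = 1,
) -> List[str]:
    """Translate segments concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(translate, segments))


def fake_translate(text: str) -> str:
    """Stand-in for a network call to a translation service."""
    return text.upper()


print(translate_all(["alpha", "beta", "gamma"], fake_translate, threads=2))
```

Because requests to a translation service are mostly spent waiting on the network, more threads can shorten the total time, although services may rate-limit concurrent requests.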

TODO

  • Parse layout with PaddleX, PaperMage, SAM2

  • Fix page rotation, table of contents, and list formatting

  • Fix pixel formulas in old papers

  • Support multiple languages with Noto Font, Ubuntu Font

  • Retry on errors other than KeyboardInterrupt
