Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Updated Dec 30, 2025 - Python
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
OpenOCR: An open-source toolkit for general OCR research and applications that integrates a unified training and evaluation benchmark, commercial-grade OCR and document parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.
A program for extracting hard-coded (burned-in) subtitles from a video and generating an external subtitle file.
Boosting Document Intelligence
This repository offers a simple OCR library that leverages system APIs like VisionKit and Media OCR for accurate text recognition. Check out the examples and start integrating with ease! 🐙✨
Open Models For Document Intelligence
🖼️ Enhance text recognition efficiency with this AI-driven multilingual OCR tool, designed for high accuracy and automated image preprocessing.