Stars
An Open Source Machine Learning Framework for Everyone
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Automated listing of repos in GitHub with XML files containing teiHeader. Find a project using TEI today!
🎡 Build Python wheels for all the platforms with minimal configuration.
Project repository for the backend module of OCR-D Implementation Project OLA-HD
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools, supports training and…
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Read and extract text and other content from PDFs in C# (port of PDFBox)
A high-performance, zero-overhead, extensible Python compiler using LLVM
Layout analysis to find layout elements in documents (similar to P2PaLA)
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
ICIP 2022: Adaptive Radial Projection on Fourier Magnitude Spectrum for Document Image Skew Estimation
A feature-rich command-line audio/video downloader
Custom tooling for pylint and other repo management tools
A cross-platform command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, C projects.
Collection of OCR-related python tools and wrappers from @OCR-D
Website for OCR-D specs, formats, requirements
Tesseract Open Source OCR Engine (main repository)
A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.