Stars
Implementation of the paper "YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information"
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
A principled instruction benchmark on formulating effective queries and prompts for large language models (LLMs). Our paper: https://arxiv.org/abs/2312.16171
A context-based spellchecker for correcting OCR output.
A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
📄 Awesome OCR toolkits in multiple programming languages, based on ONNXRuntime, OpenVINO and PaddlePaddle.
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Official implementation of Character Region Awareness for Text Detection (CRAFT)
AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
Text detection mainly based on the CTPN (Connectionist Text Proposal Network) model in TensorFlow, including ID card detection.
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Fast and simple OCR library written in Swift
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Tesseract Open Source OCR Engine (main repository)
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, supports training and…
PDFium: a project to compile the PDFium library for multiple platforms.
A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects.
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, to extract the content of a table in a PDF file. This is a subclass of the PDFTextStripper class…
A Python library for reading and writing PDF, powered by QPDF
Simple wrapper of tabula-java: extract tables from PDFs into pandas DataFrames.
Evaluating the performance and accuracy of ABBYY FineReader's OCR on Senate Financial Disclosure scanned forms
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Machine learning software for extracting information from scholarly documents.