extract

Document (PDF) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.

api pdf json ocr extract anonymization pii ocr-python llm

Updated Nov 12, 2024
Python

jacoscaz / taskparser

A CLI tool to extract tasks and worklogs out of Markdown documents.

markdown parser todo extract tasks

Updated Nov 12, 2024
TypeScript
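
As a rough illustration of this kind of extraction (not taskparser's actual code, CLI, or parsing rules), the Python sketch below pulls GitHub-style checkbox items out of Markdown text. It assumes tasks are written as `- [ ] text` / `- [x] text` list items and ignores worklogs and fenced code blocks; `Task` and `extract_tasks` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumption: a task is a list item ("-", "*" or "+") followed by a checkbox "[ ]" or "[x]".
TASK_RE = re.compile(r"^\s*[-*+]\s+\[( |x|X)\]\s+(.*\S)\s*$")


@dataclass
class Task:
    line: int   # 1-based line number in the source document
    done: bool  # True when the checkbox is ticked
    text: str   # task description after the checkbox


def extract_tasks(markdown: str) -> list[Task]:
    """Return every checkbox task found in the Markdown text, in document order."""
    tasks = []
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = TASK_RE.match(line)
        if match:
            tasks.append(Task(number, match.group(1).lower() == "x", match.group(2)))
    return tasks


if __name__ == "__main__":
    doc = "# Notes\n- [ ] write docs\n- [x] fix parser\nplain line\n"
    for task in extract_tasks(doc):
        print(task)
```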

LuizBrunoST / ExtractFrameVideo

Easily capture stills from videos: a video frame extractor.

extract extract-frame-video

Updated Nov 12, 2024
CSS

torakiki / pdfsam

PDFsam, a desktop application to split, merge, mix, and rotate PDF files and extract pages.

java pdf javafx extract split merge rotate splitter combine pdf-manipulation pdf-merge pdf-extractor pdf-split pdf-rotate pdf-mix split-pdf merge-pdf merger pdf-combiner

Updated Nov 11, 2024
Java

jrson83 / rehype-extract-excerpt

A rehype plugin that attaches a document's first paragraph to the VFile.

plugin markdown extract ast unified rehype hast excerpt rehype-plugin

Updated Nov 11, 2024
TypeScript
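
The plugin itself works on a hast tree inside unified; as a hedged, swapped-in illustration of the same idea, the Python sketch below uses the standard-library `html.parser` to collect the text of the first `<p>` element. It does not attach anything to a VFile, and `FirstParagraphParser` / `extract_excerpt` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstParagraphParser(HTMLParser):
    """Collects the text inside the first <p> element of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.in_first_p = False  # True while inside the first <p>
        self.finished = False    # True once that <p> has been closed
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.finished:
            self.in_first_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_first_p:
            self.in_first_p = False
            self.finished = True

    def handle_data(self, data):
        if self.in_first_p:
            self.chunks.append(data)


def extract_excerpt(html: str) -> Optional[str]:
    """Return the whitespace-normalized text of the first paragraph, or None if there is none."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    return text or None
```

Inline markup such as `<em>` inside the paragraph is flattened to plain text, which is usually what an excerpt wants.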

keilerkonzept / dockerfile-json

🐳 Parse and print a Dockerfile as JSON, and query it (e.g. to extract base images) using JSONPath.

docker cli golang dockerfile json extract jq jsonpath base-images build-args

Updated Nov 10, 2024
Go
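
dockerfile-json answers questions like "which base images does this build use?" with JSONPath over its JSON output. As a much simpler stand-in (a plain line scan, not the tool's approach or API), the Python sketch below lists images named in `FROM` instructions and skips references to earlier build stages. It assumes no line continuations and does not expand `ARG` variables; `base_images` is a hypothetical name.

```python
import re

# Matches a FROM instruction at the start of a line (Dockerfile instructions are case-insensitive).
FROM_RE = re.compile(r"^\s*FROM\s+(.+)$", re.IGNORECASE)


def base_images(dockerfile: str) -> list[str]:
    """List images referenced by FROM, skipping references to earlier build stages."""
    stage_names: set[str] = set()
    images: list[str] = []
    for line in dockerfile.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        # Drop flags such as --platform=linux/amd64, keeping "image [AS name]".
        tokens = [tok for tok in match.group(1).split() if not tok.startswith("--")]
        if not tokens:
            continue
        image = tokens[0]
        if image.lower() not in stage_names:  # "FROM build" reuses a stage, not an image
            images.append(image)
        if len(tokens) >= 3 and tokens[1].lower() == "as":
            stage_names.add(tokens[2].lower())
    return images
```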

MicheleCotrufo / pdf2doi

A Python library/command-line tool to extract the DOI or other identifiers of a scientific paper from a PDF file.

python metadata pdf bibtex extract doi arxiv identifiers pypdf2 bibtex-entry pdf-text arxiv-identifiers extract-doi

Updated Nov 10, 2024
Python
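
pdf2doi works on PDF files directly; the sketch below covers only one narrow step of the general problem, searching text that has already been extracted from a PDF for DOI-like strings. It is not pdf2doi's API. The regular expression is a widely used pattern for modern DOIs that will miss some legacy ones; `find_dois` is a hypothetical name.

```python
import re

# "10." + registrant code (4-9 digits) + "/" + suffix; matches most modern DOIs.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_dois(text: str) -> list[str]:
    """Return the distinct DOI-like strings in text, in order of first appearance."""
    found: list[str] = []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip(".,;:")
        # Drop closing parentheses that belong to the surrounding prose rather than the DOI.
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        if doi not in found:
            found.append(doi)
    return found
```

The trimming step matters because the suffix character class allows `.` and `)`, so a DOI at the end of a parenthetical sentence would otherwise pick up the closing punctuation.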

Ne-Lexa / php-zip

PhpZip is a PHP library for extended work with ZIP archives.

php php-library zip unzip extract archive zipalign ziparchive winzip

Updated Nov 10, 2024
PHP

sanori / node-unzip-mbcs

Unzip for non-UTF-8 encodings such as cp949, sjis, gbk, euc-kr, euc-jp, and gb2312.

unzip extract sjis

Updated Nov 9, 2024
JavaScript
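
The underlying problem is that ZIP entry names without the UTF-8 flag (general-purpose bit 11) are stored in whatever code page the archiver used. As a hedged Python illustration (not node-unzip-mbcs's API), the sketch below relies on Python's `zipfile` decoding such names as cp437, which maps every byte, so re-encoding to cp437 recovers the raw bytes for decoding with the real encoding. `decode_entry_name` and `list_names` are hypothetical names, and the cp949 default is an assumption.

```python
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11: the entry name is stored as UTF-8


def decode_entry_name(info: zipfile.ZipInfo, encoding: str) -> str:
    """Re-decode an entry name that zipfile read as cp437 but was written in `encoding`."""
    if info.flag_bits & UTF8_NAME_FLAG:
        return info.filename  # zipfile already decoded it as UTF-8
    # cp437 maps all 256 byte values, so encoding back to it recovers the raw name bytes.
    return info.filename.encode("cp437").decode(encoding)


def list_names(path: str, encoding: str = "cp949") -> list[str]:
    """List the entry names of the archive at `path`, decoding legacy names with `encoding`."""
    with zipfile.ZipFile(path) as archive:
        return [decode_entry_name(info, encoding) for info in archive.infolist()]
```

On Python 3.11 and later, `zipfile.ZipFile(path, metadata_encoding="cp949")` performs this decoding directly. On Windows the round trip can fail for Shift_JIS names containing byte 0x5C, because `ZipInfo` rewrites backslashes in names to `/`.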

extractus / article-extractor

Extract the main article from a given URL with Node.js.

nodejs crawler scraper article extract readability article-parser article-extractor

Updated Nov 9, 2024
JavaScript

manferlo81 / rollup-plugin-strip-shebang

A Rollup.js plugin to remove and optionally extract shebang.

plugin extract rollup rollup-plugin strip shebang hashbang

Updated Nov 12, 2024
TypeScript
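
The plugin does this inside Rollup's build hooks for JavaScript; the Python sketch below shows only the string handling involved, removing a leading `#!` line and returning it separately so it could be re-added to the output. It is not the plugin's code, and `strip_shebang` is a hypothetical name.

```python
from typing import Optional, Tuple


def strip_shebang(source: str) -> Tuple[str, Optional[str]]:
    """Remove a leading '#!' line; return (code_without_shebang, shebang_or_None)."""
    if not source.startswith("#!"):
        return source, None
    newline = source.find("\n")
    if newline == -1:  # the whole file is just a shebang
        return "", source
    shebang = source[:newline].rstrip("\r")  # tolerate CRLF line endings
    return source[newline + 1:], shebang
```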

elliotwutingfeng / go-fasttld

go-fasttld is a high-performance effective top-level domain (eTLD) extraction module.

url golang parser osint public ipv6 extract ipv4 mozilla tldextract suffix tld punycode radix-tree public-suffix-list idna idn compressed-trie etld

Updated Nov 9, 2024
Go
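
eTLD extraction means finding the longest public suffix at the end of a host name and splitting off the registrable label in front of it. go-fasttld's topic tags mention the Public Suffix List and a compressed trie; the Python sketch below instead uses plain set lookups over a tiny, hand-picked suffix set (an assumption for illustration) and ignores the list's wildcard and exception rules, punycode, and IP addresses. `SUFFIXES` and `split_domain` are hypothetical names.

```python
# Tiny hand-picked subset of public suffixes, for illustration only.
SUFFIXES = {"com", "org", "uk", "co.uk", "jp", "co.jp", "github.io"}


def split_domain(host: str) -> tuple[str, str, str]:
    """Split host into (subdomain, registrable label, public suffix) by longest suffix match."""
    labels = host.lower().rstrip(".").split(".")
    # Default rule: if no listed suffix matches, the last label alone is the suffix.
    start = len(labels) - 1
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            start = i  # the smallest i gives the longest matching suffix
            break
    suffix = ".".join(labels[start:])
    domain = labels[start - 1] if start > 0 else ""
    subdomain = ".".join(labels[:start - 1]) if start > 1 else ""
    return subdomain, domain, suffix
```

A trie over reversed labels avoids building each candidate string, which is where a dedicated implementation gets its speed.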

droe / acefile

Read, test, and extract ACE 1.0 and 2.0 archives in pure Python.

python python-library extract ace python3 pure-python archiver-ace

Updated Nov 9, 2024
Python

pratikkarbhal / m3u8_StreamSniper

💻🐞 An alternative for extracting HLS streams. Uses Puppeteer to automatically capture .m3u8 URLs from free live-streaming sites using GitHub Actions. No additional setup required.

scraper stream hls livestream extract m3u8 free snipper m3u8-parser

Updated Nov 8, 2024
JavaScript
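
The project captures `.m3u8` URLs with Puppeteer in a headless browser; the Python sketch below covers only the final filtering step under a simplifying assumption: that stream URLs appear verbatim (http/https, ending in `.m3u8` with an optional query string) in some captured text such as page HTML or a request log. It is not the project's code, and `find_m3u8_urls` is a hypothetical name.

```python
import re

# http(s) URL up to the first ".m3u8", plus an optional query string; stops at quotes,
# angle brackets, and whitespace so it does not run across HTML attributes.
M3U8_RE = re.compile(r"""https?://[^\s"'<>]+?\.m3u8(?:\?[^\s"'<>]*)?""", re.IGNORECASE)


def find_m3u8_urls(text: str) -> list[str]:
    """Return distinct .m3u8 URLs found in text, in order of first appearance."""
    urls: list[str] = []
    for match in M3U8_RE.finditer(text):
        if match.group(0) not in urls:
            urls.append(match.group(0))
    return urls
```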

sam-k0 / SpriteRipper

Extract sprites from .exe / .win files (specifically, GameMaker games).

extract gamemaker gamemaker-studio gamemaker-studio-2 sussybaka assetrecovery

Updated Nov 8, 2024
C++

Fahad-alkamli / extract-ayah-using-opencv

Extracts the x and y coordinates of Quran ayahs and surahs.

opencv extract quran

Updated Nov 8, 2024
Java

extract

Here are 897 public repositories matching this topic...

dlt-hub / dlt

the-real-tokai / grablinks

ICIJ / datashare

mholt / archiver

CatchTheTornado / pdf-extract-api

jacoscaz / taskparser

LuizBrunoST / ExtractFrameVideo

torakiki / pdfsam

jrson83 / rehype-extract-excerpt

keilerkonzept / dockerfile-json

MicheleCotrufo / pdf2doi

Ne-Lexa / php-zip

sanori / node-unzip-mbcs

extractus / article-extractor

manferlo81 / rollup-plugin-strip-shebang

elliotwutingfeng / go-fasttld

droe / acefile

pratikkarbhal / m3u8_StreamSniper

sam-k0 / SpriteRipper

Fahad-alkamli / extract-ayah-using-opencv