data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
A self-hosted search engine for documents.
Document (PDF) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.
PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages
🐳 Parse and print a Dockerfile as JSON, and query it (e.g. extract base images) using JSONPath.
A Python library and command-line tool to extract the DOI or other identifiers of a scientific paper from a PDF file.
PhpZip is a PHP library for extended work with ZIP archives.
Extract the main article from a given URL with Node.js.
go-fasttld is a high-performance effective top-level domain (eTLD) extraction module.
Read, test, and extract ACE 1.0 and 2.0 archives in pure Python.
💻🐞 Alternative for extracting HLS streams. Uses Puppeteer to capture .m3u8 URLs from free live-streaming sites automatically using GitHub Actions. No additional setup required.
Extract sprites from .exe / .win files (namely GameMaker games)