@ucbepic

EPIC Data Lab

Effective Programming Interaction and Computation with Data

Popular repositories

  1. docetl Public

    A system for agentic LLM-powered data processing and ETL

    Python · 3.4k stars · 358 forks

  2. TWIX Public

    TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents

    Python · 210 stars · 17 forks

  3. BARGAIN Public

    Low-Cost LLM-Powered Data Processing with Theoretical Guarantees

    Python · 31 stars · 4 forks

  4. data-agent-benchmark-study Public

    Welcoming contributions from practitioners building AI/data systems - share your real-world problems, document where current tools fail, and help improve the benchmark taxonomy across the enterpris…

    Python · 11 stars · 2 forks

  5. pdf_parser Public

    Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. The web app is implemented in Flask/Jinja2, with inference and training pipelines managed by FlorDB.

    JavaScript · 9 stars · 2 forks

  6. docetl-examples Public

    Examples of docetl pipelines

    Python · 2 stars · 1 fork

Repositories

Showing 8 of 8 repositories
  • task-cascades Public
    Python · 1 star · 0 forks · 0 issues · 0 pull requests · Updated Jan 1, 2026
  • docetl Public

    A system for agentic LLM-powered data processing and ETL

    Python · 3,361 stars · MIT license · 358 forks · 28 issues (1 issue needs help) · 8 pull requests · Updated Dec 30, 2025
  • TWIX Public

    TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents

    Python · 210 stars · 17 forks · 4 issues · 2 pull requests · Updated Nov 26, 2025
  • BARGAIN Public

    Low-Cost LLM-Powered Data Processing with Theoretical Guarantees

    Python · 31 stars · MIT license · 4 forks · 0 issues · 0 pull requests · Updated Sep 24, 2025
  • data-agent-benchmark-study Public

    Welcoming contributions from practitioners building AI/data systems - share your real-world problems, document where current tools fail, and help improve the benchmark taxonomy across the enterprise data categories.

    Python · 11 stars · 2 forks · 0 issues · 0 pull requests · Updated Sep 4, 2025
  • docetl-examples Public

    Examples of docetl pipelines

    Python · 2 stars · 1 fork · 0 issues · 0 pull requests · Updated Apr 22, 2025
  • pdf_parser Public

    Parse PDFs using computer vision, layout analysis, and other state-of-the-art document intelligence techniques. The web app is implemented in Flask/Jinja2, with inference and training pipelines managed by FlorDB.

    JavaScript · 9 stars · Apache-2.0 license · 2 forks · 0 issues · 0 pull requests · Updated Jul 26, 2024
  • ml_tutorial Public

    Introduction to FlorDB with PyTorch and TensorFlow

    Jupyter Notebook · 0 stars · Apache-2.0 license · 0 forks · 0 issues · 0 pull requests · Updated Apr 9, 2024