Models trained on 110 million large-scale datasets #372

Open
@chaodreaming

Dear Seniors, I trained a model on a large-scale dataset of 110 million samples. Although the model has not fully converged, its results already meet the requirements of most scenarios. Unlike LaTeX-OCR, I pass the images through YOLOv8 for processing, scale them uniformly, and use detection to separate multi-line formulas, which is better suited to real-world scenarios. I also provide the ONNX model directly, since training on formulas at this scale is not feasible for most people.
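To make the preprocessing idea concrete, here is a minimal sketch of two of the steps mentioned above: ordering detected formula lines from top to bottom, and computing a uniform scale for each crop. It is an illustration only and is not taken from the repository. The box format `(x1, y1, x2, y2)`, the function names, and the target height of 64 pixels are all assumptions, and the detector itself is not called here.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, as a detector might return.
Box = Tuple[float, float, float, float]


def sort_lines_top_to_bottom(boxes: List[Box]) -> List[Box]:
    """Order detected formula lines by vertical center, then by left edge."""
    return sorted(boxes, key=lambda b: ((b[1] + b[3]) / 2.0, b[0]))


def uniform_scale_size(width: int, height: int, target_height: int = 64) -> Tuple[int, int]:
    """Return (new_width, new_height) for a crop scaled to target_height.

    The aspect ratio is preserved. target_height=64 is a hypothetical value.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


if __name__ == "__main__":
    # Three hypothetical detections for a three-line formula, in arbitrary order.
    detections = [(10, 120, 300, 160), (12, 20, 280, 60), (8, 70, 310, 110)]
    for x1, y1, x2, y2 in sort_lines_top_to_bottom(detections):
        print((x1, y1, x2, y2), uniform_scale_size(x2 - x1, y2 - y1))
```

Each line would then be cropped, resized to the computed size, and recognized separately, with the results joined in the sorted order.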
For more information, see the code:

https://github.com/chaodreaming/Simple-LaTeX-OCR
Other features such as a GUI are too much work for me to take on alone, so you can consider merging them if you want to.
If it's useful for your work, please ⭐ my repo.
