Stars
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Autonomous Agents (LLMs) research papers. Updated Daily.
A comprehensive tool for processing and analyzing video footage, producing detailed insights into gameplay and player performance, enhancing game understanding and performance evaluation.
🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
🤖 MLE-Agent: Your intelligent companion for seamless AI engineering and research. 🔍 Integrates with arXiv and Papers with Code to provide better code/research plans 🧰 OpenAI, Anthropic, Ollama, etc s…
OpenAI Whisper ASR Webservice API
Speech To Speech: an effort toward an open-source and modular GPT-4o
A permissively licensed implementation of YOLOv9.
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
UI for your AI. Open Source Tailwind components tailored for your GPT, generative AI, and LLM projects.
Code review powered by LLMs (OpenAI GPT4, Sonnet 3.5) & Embeddings ⚡️ Improve code quality and catch bugs before you break production 🚀 Lives in your GitHub/GitLab/Azure DevOps CI
An open source implementation of OpenAI's ChatGPT Code interpreter
📋 A list of open LLMs available for commercial use.
A demo of a GPT-based agent existing in an RPG-like environment
Langflow is a low-code app builder for RAG and multi-agent AI applications. It’s Python-based and agnostic to any model, API, or database.
Interactively explore unstructured datasets from your dataframe.
Rembg is a tool to remove image backgrounds