Stars
Code for the paper "Accessing higher dimensions for unsupervised word translation"
Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth?
A suite of image and video neural tokenizers
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
Text-to-Music Generation with Rectified Flow Transformers
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
A collection of projects designed to help developers quickly get started with building deployable applications using the Anthropic API
Efficient Triton Kernels for LLM Training
High-resolution models for human tasks.
SGLang is a fast serving framework for large language models and vision language models.
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Benchmarking Legal Knowledge of Large Language Models
Real-time face swap and one-click video deepfake with only a single image
Official PyTorch implementation of "Authentic Hand Avatar from a Phone Scan via Universal Hand Model", CVPR 2024.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Run OpenAI's CLIP and Apple's MobileCLIP models on iOS to search photos.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Claude Engineer is an interactive command-line interface (CLI) that leverages the power of Anthropic's Claude-3.5-Sonnet model to assist with software development tasks. This tool combines the capa…