vlm
Here are 163 public repositories matching this topic...
Nexa SDK is a comprehensive toolkit for running GGML and ONNX models. It supports text generation, image generation, vision-language models (VLMs), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS). (A generic local-inference sketch follows this entry.)
Updated Dec 11, 2024 - Python
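The Nexa SDK's own API is not shown in this listing; as a hedged illustration of the kind of local GGML/GGUF inference such a toolkit wraps, here is a minimal llama-cpp-python sketch (the model path is hypothetical):

```python
# A minimal sketch of local GGUF text generation via llama-cpp-python;
# the model path below is hypothetical, not part of the Nexa SDK.
from llama_cpp import Llama

llm = Llama(model_path="models/llama-3.2-3b.Q4_K_M.gguf", n_ctx=2048)
out = llm("Describe what a vision-language model does.", max_tokens=64)
print(out["choices"][0]["text"])
```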
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to tackle any computer task through strong reasoning, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Updated Nov 7, 2024 - Python
An AI-powered file management tool that preserves privacy by organizing local text and image files. Using the Llama3.2 3B and LLaVA v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
Updated Oct 21, 2024 - Python
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Updated Dec 9, 2024 - Python
An LLM agent framework in ComfyUI that includes Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes; provides access to Feishu and Discord; and adapts to any LLM with an OpenAI- or aisuite-style interface, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Moonshot, and Doubao. Also adapted to local LLMs, VLMs, and GGUF models such as Llama-3.2, with linked GraphRAG / RAG. (A sketch of the OpenAI-compatible calling convention follows this entry.)
Updated Dec 8, 2024 - Python
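The framework's ComfyUI nodes are not reproduced here; as a hedged sketch of what an OpenAI-compatible interface means in practice, here is the standard openai client pointed at a local Ollama endpoint (the URL and model name are assumptions about a typical local setup):

```python
# A sketch of calling a local model through an OpenAI-compatible API;
# the Ollama endpoint URL and model name are assumptions, not from this repo.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Summarize what a VLM is."}],
)
print(resp.choices[0].message.content)
```

The same client works against any server that speaks this wire format, which is why frameworks like this can swap backends freely.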
A reading list on the safety, security, and privacy of large models (including Awesome LLM Security, Safety, etc.).
Updated Dec 11, 2024
Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
Updated Dec 10, 2024 - Jupyter Notebook
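AeroSandbox's own optimization stack is not shown in this listing; as a generic, hedged illustration of differentiating through a computational graph, here is a toy jax.grad sketch (the drag function and constants are illustrative, not an AeroSandbox API):

```python
# A toy sketch of graph-based automatic differentiation with JAX;
# the quadratic-drag model and its constants are illustrative only.
import jax

def drag(v):
    rho, cd, area = 1.225, 0.02, 10.0    # toy air density, drag coeff., area
    return 0.5 * rho * cd * area * v**2  # D = 1/2 * rho * Cd * A * v^2

d_drag_dv = jax.grad(drag)               # derivative via graph transformation
print(drag(50.0), d_drag_dv(50.0))       # drag and its sensitivity at 50 m/s
```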
A curated list of 3D vision papers related to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites.
Updated Nov 4, 2024
Famous Vision Language Models and Their Architectures
Updated Sep 8, 2024 - Markdown
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Updated Nov 28, 2024 - Python
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
Updated Nov 29, 2024
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Updated Dec 11, 2024 - Python
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
Updated Oct 2, 2024 - Python
Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon
Updated Sep 7, 2024 - Jupyter Notebook
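The repository's notebooks are not reproduced here; as a hedged sketch of what locally running a model on Apple Silicon typically looks like with the mlx-lm package, here is a minimal example (the quantized model id is an assumption, not taken from this repo):

```python
# A sketch of local Apple Silicon inference via mlx-lm;
# the Hugging Face model id below is an assumption, not from this repo.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Phi-3.5-mini-instruct-4bit")
text = generate(model, tokenizer, prompt="What is a vision-language model?",
                max_tokens=64)
print(text)
```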
JoyCaption is an image-captioning visual language model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training diffusion models. (A generic captioning sketch follows this entry.)
Updated Nov 29, 2024 - Python
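JoyCaption's own inference code is not shown in this listing; as a generic illustration of VLM-based image captioning, here is a sketch using a public BLIP checkpoint through Hugging Face transformers (a stand-in for illustration, not JoyCaption itself; the image path is hypothetical):

```python
# A generic image-captioning sketch with a public BLIP checkpoint;
# this stands in for the captioning workflow, not JoyCaption's own API.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
)

image = Image.open("photo.jpg").convert("RGB")   # image path is hypothetical
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```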