Build Agentic AI With NVIDIA NIM and NeMo

Explore the latest optimized AI models, connect AI agents to data with NVIDIA NeMo, and deploy anywhere with NVIDIA NIM microservices.

Try Now Watch Video

Experience Leading Open Models Now

Integrations

Accelerated AI Is Just an API Call Away

Get up and running quickly with familiar APIs.

Start Building With World-Class Models

Cosmos World Foundation Models generate physics-aware videos and world states for physical AI development. Cosmos Nemotron VLMs enable querying images and videos from the physical or virtual world. Llama Nemotron LLMs advance the best open-source models with the latest techniques from NVIDIA for unparalleled performance.

Learn More About Cosmos

Build With Cosmos Nemotron

Learn More About Llama Nemotron

Seamless Compatibility With Popular Libraries

Use NVIDIA APIs from your existing tools and applications with as little as three lines of code.
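As a rough illustration of how small such an integration can be, the sketch below builds an OpenAI-style chat completion request using only the Python standard library. The base URL is borrowed from the local curl example further down this page, and the API key, model name, and `build_chat_request` helper are placeholders invented for this example; substitute the values for the endpoint you actually use.

```python
import json
import urllib.request

# Placeholder values for illustration only; replace with your own endpoint details.
BASE_URL = "http://0.0.0.0:8000/v1"  # assumption: same local address as the curl example on this page
API_KEY = "YOUR_API_KEY"             # placeholder
MODEL = "model_name"                 # placeholder model identifier


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("Once upon a time")
print(request.get_method(), request.full_url)

# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

The request is only constructed here, so the sketch runs without network access. With an OpenAI-compatible client library, the same call typically reduces to creating a client and invoking one method.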

Use the Tools You Love

Work with your favorite LLM programming frameworks, including LangChain and LlamaIndex, and easily integrate the latest AI models into your applications.

Learn More About Building With These Tools and NVIDIA NIM

Unlock Insights From Enterprise Data

Data powers modern enterprise applications. Connect AI agents to enterprise data at scale with an AI query engine that uses retrieval-augmented generation (RAG) to equip employees with instant, accurate institutional knowledge.
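To make the retrieval-augmented generation idea concrete, here is a minimal sketch of the retrieve-then-prompt pattern. It is not how any NVIDIA query engine works internally: the documents are made up, the word-overlap scoring is a simple stand-in for a real vector search, and the `retrieve` and `build_prompt` helpers are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# Hypothetical in-memory "enterprise" documents, used only for illustration.
DOCUMENTS = [
    "Expense reports must be submitted within 30 days of purchase.",
    "The VPN client is required when working from outside the office.",
    "New laptops are ordered through the IT service portal.",
]


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return up to k documents ranked by word overlap with the query."""
    query_tokens = tokenize(query)
    scored = [(sum((tokenize(doc) & query_tokens).values()), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Place the retrieved passages ahead of the question for the model to use."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_prompt("How are new laptops ordered?", DOCUMENTS)
print(prompt)
```

The resulting prompt would then be sent to a language model. Grounding the answer in retrieved passages is what lets the model respond with institutional knowledge it was never trained on.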

Learn More

NVIDIA Blueprints

Everything you need to build impactful agentic AI applications. Each blueprint includes NVIDIA NIM™, NeMo, and partner microservices, one or more AI agents, sample code, customization instructions, and a Helm chart for deployment.

Learn More

Try Now

Run Anywhere

Accelerate Your AI Deployment With NVIDIA NIM

Part of NVIDIA AI Enterprise, NVIDIA NIM is a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center and helping to keep your data secure.

Deploy Now

Learn More About NVIDIA NIM

See How to Deploy NIM in Five Minutes

Deploy NIM

Deploy NIM for your model with a single command. You can also easily run NIM with fine-tuned models.

Run Inference

Get NIM up and running with the optimal runtime engine based on your NVIDIA-accelerated infrastructure.

Build

Developers can integrate self-hosted NIM endpoints with just a few lines of code.

Deploy

docker run nvcr.io/nim/publisher_name/model_name

Run

curl -X 'POST' \
  'http://0.0.0.0:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "model_name",
    "prompt": "Once upon a time",
    "max_tokens": 64
  }'

Build

import openai

# Point the OpenAI client at the self-hosted NIM endpoint.
client = openai.OpenAI(
    base_url="YOUR_LOCAL_ENDPOINT_URL",
    api_key="YOUR_LOCAL_API_KEY",
)

# Request a chat completion from the deployed model.
chat_completion = client.chat.completions.create(
    model="model_name",
    messages=[{"role": "user", "content": "Write me a love song"}],
    temperature=0.7,
)

# Print the generated reply.
print(chat_completion.choices[0].message.content)

Launch Locally or Scale With Kubernetes

Seamlessly deploy containerized AI microservices on any NVIDIA-accelerated infrastructure, from a single device to data center scale.

Deploy Securely With Confidence

Rely on production-grade runtimes, including ongoing security updates, and run your business applications with stable APIs backed by enterprise-grade support.

Lower Costs and Your Carbon Footprint

Lower the operational cost of running models in production with AI runtimes that are continuously optimized for low latency and high throughput on NVIDIA-accelerated infrastructure.

Throughput

NVIDIA NIM provides optimized throughput and latency out of the box to maximize token generation, support concurrent users at peak times, and improve responsiveness.

Configuration: Llama3.1-8B-instruct, 1x H100 SXM; input 1,000 tokens, output 1,000 tokens; 200 concurrent requests.
NIM On (FP8): throughput 6,354 tokens/s, time to first token (TTFT) 0.4 s, inter-token latency (ITL) 31 ms.
NIM Off (FP8): throughput 2,265 tokens/s, TTFT 1.1 s, ITL 85 ms.
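For readers who want the relative gains rather than the raw numbers, the short calculation below derives them from the figures quoted above. Only the published values are used; the dictionary keys are names chosen for this sketch.

```python
# Figures copied from the configuration note above.
nim_on = {"throughput_tokens_per_s": 6354, "ttft_s": 0.4, "itl_ms": 31}
nim_off = {"throughput_tokens_per_s": 2265, "ttft_s": 1.1, "itl_ms": 85}

# Higher is better for throughput, so divide "on" by "off".
throughput_gain = nim_on["throughput_tokens_per_s"] / nim_off["throughput_tokens_per_s"]

# Lower is better for latency, so divide "off" by "on".
ttft_reduction = nim_off["ttft_s"] / nim_on["ttft_s"]
itl_reduction = nim_off["itl_ms"] / nim_on["itl_ms"]

print(f"Throughput: {throughput_gain:.2f}x higher")
print(f"Time to first token: {ttft_reduction:.2f}x lower")
print(f"Inter-token latency: {itl_reduction:.2f}x lower")
```

In this single configuration, that works out to roughly 2.8x the throughput and about 2.7x lower latency on both measures. Results on other models, hardware, or request shapes may differ.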

Customization

Tailor NIM Microservices for Your Domain-Specific Needs

Fine-Tune With NVIDIA NeMo

NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI anywhere. It includes tools for training, customization, retrieval-augmented generation, guardrailing, data curation, and model pretraining, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.

Read Technical Blog

Get Started With Tutorials

See NVIDIA Agentic AI in Action

Talk to Your Supply Chain Data

Supercharge Software Delivery With Event-Driven RAG

Always Available, Real-Time Generative AI Healthcare Agents

Digital Humans Transform Industries

Generative AI Microservices for Virtual Screening

Metropolis Video Search and Summarization

Get Started

Start Prototyping for Free

Get started with easy-to-use, NVIDIA-managed serverless APIs.

Access fully accelerated AI infrastructure.
Ensure your data isn't used for model training.
Get started for free with 1,000 inference credits.

Download and Deploy

Run NVIDIA NIM to scale optimized AI models in the cloud or data center of your choice.

Ensure that data never leaves your secure enclave.
Seamlessly transition from cloud endpoints to self-hosted APIs without code changes.
Use an NVIDIA AI Enterprise license for production, or get started for free with the NVIDIA Developer Program.
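One way to picture the "without code changes" point in the list above is to keep endpoint details in configuration rather than in application code. The sketch below assumes hypothetical environment variable names (`NIM_BASE_URL`, `NIM_API_KEY`, `NIM_MODEL`) and a made-up hosted URL; none of these are defined by NVIDIA, and the defaults simply reuse the placeholders from the snippets on this page.

```python
import os

# Hypothetical defaults for this sketch, reusing placeholders from this page.
DEFAULT_BASE_URL = "http://0.0.0.0:8000/v1"  # local address from the curl example
DEFAULT_API_KEY = "YOUR_LOCAL_API_KEY"
DEFAULT_MODEL = "model_name"


def load_endpoint_config(env=None) -> dict:
    """Read endpoint settings from the environment, falling back to local defaults."""
    env = os.environ if env is None else env
    return {
        "base_url": env.get("NIM_BASE_URL", DEFAULT_BASE_URL),
        "api_key": env.get("NIM_API_KEY", DEFAULT_API_KEY),
        "model": env.get("NIM_MODEL", DEFAULT_MODEL),
    }


# The same application code can target either deployment; only the settings differ.
hosted = load_endpoint_config(
    {"NIM_BASE_URL": "https://example.com/v1", "NIM_API_KEY": "hosted-key"}
)
local = load_endpoint_config({})

print(hosted["base_url"])
print(local["base_url"])
```

Because both deployments expose the same API shape, the values returned here can be passed straight to an OpenAI-compatible client as its `base_url` and `api_key`.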

Build Now

Use Cases

Ignite Your Innovation

See how NVIDIA APIs support industry use cases and jump-start your AI development with curated examples.

Digital Humans

Bring game characters to life or create interactive virtual avatars to enhance customer service, empowering your application to connect more deeply with users.

Learn More

Content Generation

Generate highly relevant, bespoke, and accurate content, grounded in the domain expertise and proprietary IP of your enterprise.

Learn More

Biomolecular Generation

Biomolecular generative models and the computational power of GPUs efficiently explore the chemical space, rapidly generating diverse sets of small molecules tailored to specific drug targets or properties.

Learn More

Ecosystem

Take Your Enterprise AI Farther, Faster

Join leading partners to develop your AI applications with models, toolkits, vector databases, frameworks, and infrastructure from our ecosystem.

Resources

Unlock, Upskill, and Upscale

NVIDIA LaunchPad

Unlock AI With a Hands-On Lab

Experience end-to-end AI solutions through guided hands-on labs for development frameworks, retrieval-augmented generation (RAG)-based chatbots, route optimization, and more.

Go to LaunchPad

NVIDIA Developer Program

Accelerate Your AI Applications

Get free access to NIM for application development, research, and testing plus technical learning resources through the NVIDIA Developer Program.

Start Building

AI Workbench

Start Small. Scale Big

NVIDIA AI Workbench gives developers the flexibility to run API-enabled models on local or remote GPU-powered containers, allowing for interactive project workflows from experimentation to prototyping to proof of concept.

Learn More About AI Workbench

News

Explore NVIDIA NIM in the News

Check out the latest NVIDIA press releases to see how NIM and generative AI are impacting industries, partners, customers, and more.

Documentation

Explore technical documentation to start prototyping and building your enterprise AI applications with NVIDIA APIs, or scale on your own infrastructure with NVIDIA NIM.

NVIDIA API Docs

NVIDIA NIM Docs