
NeMo Lab

Important

NeMo Lab is under active development

NeMo Lab is an example template for Generative AI with NVIDIA NeMo 2.0.

NVIDIA NeMo is an accelerated, end-to-end platform that is flexible and production ready. NeMo comprises several component frameworks that enable teams to build, customize, and deploy Generative AI solutions for:

  • large language models
  • vision language models
  • video models
  • speech models

Tip

Get started with the quick start tutorials and scripts

Tutorial Concepts

NeMo Lab is inspired by the NeMo tutorials and openhackathons-org/End-to-End-LLM; the latter follows the pipeline shown below to guide hackathon participants through instruction tuning and deploying a Llama variant:

flowchart LR
id1(data processing) --> id2(model development) --> id3(model deployment)

NeMo Lab currently focuses on language models, and will expand into NeMo audio, vision, and multimodal capabilities when appropriate.

Data Processing

Data processing is task dependent, differing between pretraining and finetuning. For pretraining, we will use Hugging Face's cosmopedia dataset. For finetuning, we will use NeMo's default dataset, SquadDataModule, a variant of the Stanford Question Answering Dataset (SQuAD).

Note

Refer to the data processing tutorial for a detailed walk-through
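As a rough sketch of the two data paths in code (the cosmopedia config and split and the SquadDataModule arguments below are illustrative assumptions, not the tutorial's exact settings):

```python
from datasets import load_dataset

from nemo.collections import llm

# pretraining data: stream a cosmopedia subset from the Hugging Face Hub
# (the "stories" config and train split are illustrative choices)
cosmopedia = load_dataset("HuggingFaceTB/cosmopedia", "stories", split="train", streaming=True)
print(next(iter(cosmopedia))["text"][:200])

# finetuning data: NeMo's default SquadDataModule, a SQuAD variant
# (sequence length and batch sizes are placeholder values)
squad = llm.SquadDataModule(
    seq_length=2048,
    micro_batch_size=1,
    global_batch_size=8,
)
```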

Model Development

We will use NeMo to train Nemotron 3 4B on the cosmopedia dataset, and tune a Llama variant on the SQuAD dataset.

Note

Refer to the model development tutorial for a detailed walk-through
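As a minimal sketch of what the pretraining quick start looks like with the NeMo 2.0 recipe API (the checkpoint directory, run name, and single-GPU settings are placeholders):

```python
import nemo_run as run

from nemo.collections import llm

# build the predefined Nemotron 3 4B pretraining recipe
# (directory and name are placeholders; scale nodes/GPUs to your hardware)
recipe = llm.nemotron3_4b.pretrain_recipe(
    dir="./checkpoints/nemotron3_4b",
    name="nemotron3_4b_pretrain",
    num_nodes=1,
    num_gpus_per_node=1,
)

# launch the recipe locally with nemo_run
run.run(recipe, executor=run.LocalExecutor())
```

The finetuning script follows the same pattern, swapping in a Llama finetuning recipe and the SQuAD data module.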

Model Deployment

We will use NeMo interfaces to export models for inference with TensorRT-LLM and Triton Inference Server, or vLLM.

Note

Refer to the model deployment tutorial for a detailed walk-through
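In code, the TensorRT-LLM and Triton path looks roughly like the following (a sketch based on NeMo's export and deploy interfaces; the checkpoint path, engine directory, and Triton model name are assumptions):

```python
from nemo.deploy import DeployPyTriton
from nemo.export.tensorrt_llm import TensorRTLLM

# export a trained NeMo checkpoint to a TensorRT-LLM engine
# (paths and model_type are placeholders)
exporter = TensorRTLLM(model_dir="./trt_llm_engine")
exporter.export(
    nemo_checkpoint_path="./checkpoints/llama3_8b_finetuned.nemo",
    model_type="llama",
)

# serve the exported engine with Triton Inference Server via PyTriton
deployment = DeployPyTriton(model=exporter, triton_model_name="llama3-8b")
deployment.deploy()
deployment.serve()
```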

Additional Concepts

Source Code Concepts

The source code found in src/nemo_lab provides examples of implementing concepts "from scratch" with NeMo, for instance, how we might add a custom model or our own training recipe given the base interfaces and mixins found within the framework.

Note

The current focus for the source code is implementing support for Llama 3.2 variants
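As a loose illustration of the idea (not the actual nemo_lab source), a custom variant in NeMo 2.0 can start from an existing config dataclass and override a few fields; the class name and hyperparameter values here are hypothetical:

```python
from dataclasses import dataclass

from nemo.collections import llm

# hypothetical example: derive a custom config from an existing Llama config
# and adjust a few architectural hyperparameters
@dataclass
class CustomLlamaConfig(llm.Llama3Config8B):
    num_layers: int = 16
    hidden_size: int = 2048
    num_attention_heads: int = 32

# the custom config plugs into the standard model wrapper
model = llm.LlamaModel(config=CustomLlamaConfig())
```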

Models

We will use NVIDIA and Meta models including, but not limited to:

  • NVIDIA Llama variants, Mistral variants, Megatron distillations, and Minitron
  • NVIDIA embedding, reranking, and retrieval models
  • NVIDIA Cosmos tokenizers
  • NeMo compatible Meta Llama variants

Tip

See models/ for more on model families and types

System Requirements

  • a CUDA-compatible OS and device (GPU) with at least 48 GB of VRAM (e.g. an L40S)
  • CUDA 12.1
  • Python 3.10.10
  • PyTorch 2.2.1

Tip

See hardware/ for more regarding VRAM requirements of particular models

User Account Requirements

Setup

Tip

Get started with the quick start tutorials and scripts

On Host (local, no container)

To prepare a development environment, run the following in a terminal:

bash install_requirements.sh

Doing so installs nemo_lab along with nemo_run, megatron_core 0.10.0rc0, and the nvidia/apex PyTorch extension.

Note

megatron_core 0.10.0rc0 is required for compatibility with NeMo 2.0

Note

NVIDIA Apex is required for RoPE scaling in NeMo 2.0. Apex is built with CUDA and C++ extensions for performance and full functionality; be aware that the build process may take several minutes.

Docker

Two Docker images have been created for the quick start tutorials: one for pretraining and one for finetuning.

To run pretraining, do the following in a terminal:

docker pull jxtngx/nemo-pretrain-nemotron3-4b
docker run --rm --gpus 1 -it jxtngx/nemo-pretrain-nemotron3-4b
python pretrain_nemotron3_4b.py

To run finetuning, do the following in a terminal:

docker pull jxtngx/nemo-finetune-llama3-8b
docker run --rm --gpus 1 -it jxtngx/nemo-finetune-llama3-8b
huggingface-cli login
{ENTER HF KEY WHEN PROMPTED}
python finetune_llama3_8b.py

Important

Finetuning requires a Hugging Face key and access to Llama 3 8B
For keys, see: https://huggingface.co/docs/hub/en/security-tokens
For Llama 3 8B access, see: https://huggingface.co/meta-llama/Meta-Llama-3-8B
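
If you prefer not to use the interactive prompt, the token can also be supplied programmatically inside the container (the HF_TOKEN environment variable below is a common convention, not something the image requires):

```python
import os

from huggingface_hub import login

# authenticate with a Hugging Face access token that has Llama 3 8B access
login(token=os.environ["HF_TOKEN"])
```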

Hosted Compute Environments

See Quickstart Studios and Images

Resources

Quickstart Studios and Images

| Quickstart | Studio | Docker |
| --- | --- | --- |
| Pretrain Nemotron 3 4B | Open In Studio | jxtngx/nemo-pretrain-nemotron3-4b |
| Finetune Llama 3 8B | Open In Studio | jxtngx/nemo-finetune-llama3-8b |

NeMo References

Dependency References

Interoperability Guides

NVIDIA Deep Learning Institute

NVIDIA On-Demand

NVIDIA Technical Blog

Academic Papers

Additional Materials