This website uses cookies to anonymously analyze website traffic using Google Analytics.

36K GPUs NVIDIA GB200 NVL72, coming in Q1 2025. Request your cluster

The AI Acceleration AccelerationCloud

Train, fine-tune-and run inference on AI models blazing fast, at low cost, and at production scale.

Trusted by

200+ generative AI models

Build with open-source and specialized multimodal models for chat, images, code, and more. Migrate from closed models with OpenAI-compatible APIs.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Try now
together.ai

End-to-end platform for the full generative AI lifecycle

Leverage pre-trained models, fine-tune them for your needs, or build custom models from scratch. Whatever your generative AI needs, Together AI offers a seamless continuum of AI compute solutions to support your entire journey.

  • Inference

    The fastest way to launch AI models:

    • ✔ Serverless or dedicated endpoints

    • ✔ Deploy in enterprise VPC or on-prem

    • ✔ SOC 2 and HIPAA compliant

  • Fine-Tuning

    Tailored customization for your tasks

    • ✔ Complete model ownership

    • ✔ Fully tune or adapt models

    • ✔ Easy-to-use APIs

    • Full Fine-Tuning
    • LoRA Fine-Tuning
  • GPU Clusters

    Full control for massive AI workloads

    • ✔ Accelerate large model training

    • ✔ GB200, H200, and H100 GPUs

    • ✔ Pricing from $1.75 / hour

Run
models

Train

Models

Speed, cost, and accuracy. Pick all three.

SPEED RELATIVE TO VLLM

4x FASTER

LLAMA-3 8B AT FULL PRECISION

400 TOKENS/SEC

COST RELATIVE TO GPT-4o

11x lower cost

Why Together Inference

Powered by the Together Inference Engine, combining research-driven innovation with deployment flexibility.

Control your IP.
‍Own your AI.

Fine-tune open-source models like Llama on your data and run them on Together Cloud, in a hyperscaler VPC, or on-prem. With no vendor lock-in, your AI remains fully under your control.

together files upload acme_corp_customer_support.jsonl
  
{
  "filename" : "acme_corp_customer_support.json",
  "id": "file-aab9997e-bca8-4b7e-a720-e820e682a10a",
  "object": "file"
}
  
  
together finetune create --training-file file-aab9997-bca8-4b7e-a720-e820e682a10a
--model together compute/RedPajama-INCITE-7B-Chat

together finetune create --training-file $FILE_ID 
--model $MODEL_NAME 
--wandb-api-key $WANDB_API_KEY 
--n-epochs 10 
--n-checkpoints 5 
--batch-size 8 
--learning-rate 0.0003
{
    "training_file": "file-aab9997-bca8-4b7e-a720-e820e682a10a",
    "model_output_name": "username/togethercomputer/llama-2-13b-chat",
    "model_output_path": "s3://together/finetune/63e2b89da6382c4d75d5ef22/username/togethercomputer/llama-2-13b-chat",
    "Suffix": "Llama-2-13b 1",
    "model": "togethercomputer/llama-2-13b-chat",
    "n_epochs": 4,
    "batch_size": 128,
    "learning_rate": 1e-06,
    "checkpoint_steps": 2,
    "created_at": 1687982945,
    "updated_at": 1687982945,
    "status": "pending",
    "id": "ft-5bf8990b-841d-4d63-a8a3-5248d73e045f",
    "epochs_completed": 3,
    "events": [
        {
            "object": "fine-tune-event",
            "created_at": 1687982945,
            "message": "Fine tune request created",
            "type": "JOB_PENDING",
        }
    ],
    "queue_depth": 0,
    "wandb_project_name": "Llama-2-13b Fine-tuned 1"
}

Forge the AI frontier. Train on expert-built clusters.

Built by AI researchers for AI innovators, Together GPU Clusters are powered by NVIDIA GB200, H200, and H100 GPUs, along with the Together Kernel Collection — delivering up to 24% faster training operations.

  • Top-Tier NVIDIA GPUs

    NVIDIA's latest GPUs, like GB200, H200, and H100,
for peak AI performance, supporting both training and inference.

  • Accelerated Software Stack

    The Together Kernel Collection includes
custom CUDA kernels, reducing training times and costs with superior throughput.

  • High-Speed Interconnects

    InfiniBand and NVLink ensure fast
communication between GPUs,
eliminating bottlenecks and enabling
rapid processing of large datasets.

  • Highly Scalable & Reliable

    Deploy 16 to 1000+ GPUs across global locations, with 99.9% uptime SLA.

  • Expert AI Advisory Services

    Together AI’s expert team offers
consulting for custom model development
and scalable training best practices.

  • Robust Management Tools

    Slurm and Kubernetes orchestrate
dynamic AI workloads, optimizing training
and inference seamlessly.

Training-ready clusters – H100, H200, or A100

Reserve your cluster today

THE AI
ACCELERATION
CLOUD

BUILT ON LEADING AI RESEARCH.

Sphere

Innovations

Our research team is behind breakthrough AI models, datasets, and optimizations.

Customer Stories

See how we support leading teams around the world. Our customers are creating innovative generative AI applications, faster.

Pika creates the next gen text-to-video models on Together GPU Clusters

Nexusflow uses Together GPU Clusters to build cybersecurity models

Arcee builds domain adaptive language models with Together Custom Models

Start
building
yours
here →