DeepSeek v3 and R1 Model Architecture: Why it's powerful and economical
This article introduces the architecture of DeepSeek v3 and DeepSeek R1, which share exactly the same architecture. It explains DeepSeek MoE (Mixture of Experts) and FP8 pre-training in depth. Fireworks boosted DeepSeek v3 and R1 inference efficiency with multi-token prediction.