Today, we are excited to announce the launch of Black Forest Labs. Deeply rooted in the generative AI research community, our mission is to develop and advance state-of-the-art generative deep learning models for media such as images and videos, and to push the boundaries of creativity, efficiency and diversity. We believe that generative AI will be a fundamental building block of all future technologies. By making our models available to a wide audience, we want to bring its benefits to everyone, educate the public and enhance trust in the safety of these models. We are determined to build the industry standard for generative media. Today, as the first step towards this goal, we release the FLUX.1 suite of models that push the frontiers of text-to-image synthesis.
The Black Forest Team
We are a team of distinguished AI researchers and engineers with an outstanding track record in developing foundational generative AI models in academic, industrial, and open-source environments. Our innovations include creating VQGAN and Latent Diffusion, the Stable Diffusion models for image and video generation (Stable Diffusion XL, Stable Video Diffusion, Rectified Flow Transformers), and Adversarial Diffusion Distillation for ultra-fast, real-time image synthesis.
Our core belief is that widely accessible models not only foster innovation and collaboration within the research community and academia, but also increase transparency, which is essential for trust and broad adoption. Our team strives to develop the highest quality technology and to make it accessible to the broadest audience possible.
Funding
We are excited to announce the successful closing of our Series Seed funding round of $31 million. The round was led by our main investor, Andreessen Horowitz, with notable participation from angel investors Brendan Iribe, Michael Ovitz, Garry Tan, Timo Aila, Vladlen Koltun, and other renowned experts in AI research and company building. We have received follow-up investments from General Catalyst and MätchVC to support us on our mission to bring state-of-the-art AI from Europe to everyone around the world.
Furthermore, we are pleased to announce our advisory board, including Michael Ovitz, bringing extensive experience in the content creation industry, and Prof. Matthias Bethge, pioneer of neural style transfer and leading expert in open European AI research.
FLUX.1 Model Family
We release the FLUX.1 suite of text-to-image models, which define a new state-of-the-art in image detail, prompt adherence, style diversity, and scene complexity.
To strike a balance between accessibility and model capabilities, FLUX.1 comes in three variants: FLUX.1 [pro], FLUX.1 [dev] and FLUX.1 [schnell]:
- FLUX.1 [pro]: The best of FLUX.1, offering state-of-the-art image generation with top-of-the-line prompt following, visual quality, image detail and output diversity. Sign up for FLUX.1 [pro] access via our API here. FLUX.1 [pro] is also available via Replicate and fal.ai. Moreover, we offer dedicated and customized enterprise solutions; reach out via [email protected] to get in touch.
- FLUX.1 [dev]: FLUX.1 [dev] is an open-weight, guidance-distilled model for non-commercial applications. Distilled directly from FLUX.1 [pro], FLUX.1 [dev] obtains similar quality and prompt adherence capabilities while being more efficient than a standard model of the same size. FLUX.1 [dev] weights are available on Hugging Face, and the model can be tried out directly on Replicate or fal.ai. For applications in commercial contexts, get in touch via [email protected].
- FLUX.1 [schnell]: our fastest model, tailored for local development and personal use. FLUX.1 [schnell] is openly available under an Apache 2.0 license. As with FLUX.1 [dev], the weights are available on Hugging Face, and inference code can be found on GitHub and in Hugging Face's Diffusers (see the minimal sketch after this list). Moreover, we're happy to have day-1 integration for ComfyUI.
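To give a flavor of the local workflow, here is a minimal sketch of running FLUX.1 [schnell] through Diffusers. It assumes the day-1 integration exposes a FluxPipeline and the black-forest-labs/FLUX.1-schnell repository id; consult the Hugging Face model card for the authoritative usage.

```python
# Minimal sketch of local inference with FLUX.1 [schnell] via Diffusers.
# Pipeline class, repo id, and parameter values are assumptions based on
# the Diffusers integration; see the Hugging Face model card for details.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.to("cuda")  # or pipe.enable_model_cpu_offload() on smaller GPUs

image = pipe(
    prompt="a photo of a black forest gateau on a rustic wooden table",
    num_inference_steps=4,  # [schnell] is a few-step distilled model
    guidance_scale=0.0,     # distilled models typically run without CFG
).images[0]
image.save("flux-schnell.png")
```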
Transformer-powered Flow Models at Scale
All public FLUX.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks and scaled to 12B parameters. We improve over previous state-of-the-art diffusion models by building on flow matching, a general and conceptually simple method for training generative models, which includes diffusion as a special case. In addition, we increase model performance and improve hardware efficiency by incorporating rotary positional embeddings and parallel attention layers. We will publish a more detailed tech report in the near future.
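To illustrate why flow matching is conceptually simple, the sketch below shows a rectified-flow training objective in PyTorch: samples are placed on a straight path between data and noise, and the network regresses the constant velocity along that path. This is an illustrative sketch, not the actual FLUX.1 training code; all names and the uniform time sampling are assumptions.

```python
# Illustrative flow matching (rectified flow) training step.
# Not the actual FLUX.1 training code; names and schedules are assumptions.
import torch

def flow_matching_loss(model, x0, cond):
    """x0: batch of clean data; cond: conditioning, e.g. text embeddings."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], device=x0.device)     # time uniform in [0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))          # broadcast over data dims
    xt = (1.0 - t_) * x0 + t_ * noise                 # straight interpolation path
    target_velocity = noise - x0                      # constant velocity along the path
    pred_velocity = model(xt, t, cond)                # network predicts the velocity field
    return torch.nn.functional.mse_loss(pred_velocity, target_velocity)
```

Diffusion corresponds to a particular curved path and time schedule within the same framework; the straight path above is the rectified-flow special case.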
A new Benchmark for Image Synthesis
FLUX.1 defines the new state-of-the-art in image synthesis. Our models set new standards in their respective model classes. FLUX.1 [pro] and [dev] surpass popular models like Midjourney v6.0, DALL·E 3 (HD) and SD3-Ultra in each of the following aspects: Visual Quality, Prompt Following, Size/Aspect Variability, Typography and Output Diversity. FLUX.1 [schnell] is the most advanced few-step model to date, outperforming not only its in-class competitors but also strong non-distilled models like Midjourney v6.0 and DALL·E 3 (HD). Our models are specifically finetuned to preserve the entire output diversity from pretraining. Compared to the current state-of-the-art, they offer drastically improved possibilities, as shown below.
All FLUX.1 model variants support a diverse range of aspect ratios and resolutions between 0.1 and 2.0 megapixels, as shown in the following example.
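For instance, a hypothetical helper for picking a width/height pair for a target aspect ratio within that megapixel range might look like this; rounding dimensions to a multiple of 16 is an assumption common to latent-space models, not a documented FLUX.1 requirement.

```python
# Hypothetical helper: choose (width, height) for a target aspect ratio
# and megapixel budget within the supported 0.1-2.0 MP range.
def resolution_for(aspect_ratio: float, megapixels: float = 1.0, multiple: int = 16):
    assert 0.1 <= megapixels <= 2.0, "FLUX.1 supports roughly 0.1 to 2.0 megapixels"
    pixels = megapixels * 1_000_000
    height = (pixels / aspect_ratio) ** 0.5
    width = height * aspect_ratio
    round_to = lambda v: max(multiple, int(round(v / multiple)) * multiple)
    return round_to(width), round_to(height)

print(resolution_for(16 / 9, 1.0))  # -> (1328, 752), roughly one megapixel
```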
Up Next: SOTA Text-to-Video for All
Today, we release the FLUX.1 text-to-image model suite. With their strong creative capabilities, these models serve as a powerful foundation for our upcoming suite of competitive generative text-to-video systems. Our video models will unlock precise creation and editing at high definition and unprecedented speed. We are committed to continuing to pioneer the future of generative media.
Join Us!
We are hiring exceptionally strong machine learning and backend engineers. If you are interested in joining our team, reach out to [email protected].