I'm following this book to teach myself the transformer architecture in depth.
Some excellent resources I've come across along the way:
- Illustrated Guide to Transformers Neural Network: A step by step explanation - by Michael Phi (@LearnedVector)
- Let's build GPT: from scratch, in code, spelled out. - by the legendary Andrej Karpathy (@karpathy)
- Transformers from Scratch - by Peter Bloem (@pbloem)
- Lil'Log > The Transformer Family Version 2.0 - by Lilian Weng (@lilianweng)
- The Illustrated Transformer - by Jay Alammar (@jalammar)
- Transformer Architecture: The Positional Encoding - by Amirhossein Kazemnejad (@kazemnejad)
- Dive into Deep Learning > Attention Mechanisms and Transformers
- Harvard NLP > The Annotated Transformer
- Towards Data Science > Transformers Explained Visually: Part 1, Part 2, Part 3 and Part 4 - by Ketan Doshi
- Lecture 12 of the "Deep Learning at the Vrije Universiteit Amsterdam" (DLVU) Series - by Peter Bloem (@pbloem)
- Natural Language Processing in Action Using Transformers in TensorFlow 2.0 - by Aurélien Géron (@ageron)
- TensorFlow Tutorials > Neural machine translation with a Transformer and Keras