Large Language Model
● Introduction to LM
● Large Language Models and applications
Language Modeling?
● What is the probability of “Tôi trình bày ChatGPT tại Trường ĐH Công Nghệ” (“I present ChatGPT at the University of Engineering and Technology”)?
● What is the probability of “Công Nghệ học Đại trình bày ChatGPT tại Tôi” (the same words in scrambled order)?
● What is P(… | “Tôi trình bày ChatGPT tại Trường ĐH Công nghệ, địa điểm”), i.e., the probability of the next word given the preceding context?
● A model that computes either of these, for W = w1, w2, …, wn:
○ P(W) or P(wn | w1, w2, …, wn-1)
is called a language model (a toy sketch follows below)
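A toy illustration of this definition, assuming nothing beyond the chain rule: a bigram model approximates P(wn | w1, …, wn-1) by P(wn | wn-1), estimated from counts. The corpus and sentence below are made up for illustration.

```python
from collections import Counter

# Minimal bigram language model sketch (illustration only):
# estimate P(W) = product over i of P(w_i | w_{i-1}) from raw counts.
corpus = "tôi trình bày ChatGPT tại trường đại học công nghệ".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def p_next(prev: str, word: str) -> float:
    # Maximum-likelihood estimate of P(word | prev); no smoothing.
    return bigram_counts[(prev, word)] / unigram_counts[prev]

sentence = ["tôi", "trình", "bày", "ChatGPT"]
p = 1.0
for prev, word in zip(sentence, sentence[1:]):
    p *= p_next(prev, word)
print(p)  # P(sentence) under the toy bigram model
```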
Large Language Model
Large Language Model (Hundreds of Billions of Tokens)
Large Language Models - yottaFLOPs of Compute
Source: https://web.stanford.edu/class/cs224n/slides/cs224n-2023-lecture11-prompting-rlhf.pdf
Why LLMs?
● Double Descent
○ Beyond the classical overfitting regime, test error can decrease again as model size and data keep growing
Why LLMs?
● Generalization
○ A single model can now be used to solve many NLP tasks
Why LLMs? Emergence in few-shot prompting
Emergent Abilities
● An ability of an LM is emergent if it is not present in smaller models but is present in larger models
Emergent Capability - In-Context Learning
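A minimal illustration of in-context learning: input-output demonstrations are placed in the prompt, and the model completes the pattern with no weight updates. The prompt below is a sketch in the style of the GPT-3 paper's translation examples; the exact format is an assumption, not taken from these slides.

```python
# Few-shot in-context learning: demonstrations live in the prompt only;
# no gradient updates happen. Pairs and format are illustrative.
few_shot_prompt = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Sent to a large LM, the expected completion is "fromage".
```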
What is pre-training / fine-tuning?
● Pre-training: learn general language knowledge from large unlabeled text with a self-supervised objective
● Fine-tuning: continue training on a smaller labeled dataset for a downstream task (a sketch follows below)
Pretraining + Prompting Paradigm
Prompt Engineering (2020–now)
● Prompts involve instructions and context passed to a language model to achieve a desired task
● Prompt engineering is the practice of developing and optimizing prompts to efficiently use language models (LMs) for a variety of applications
Prompt Engineering Techniques
● Many advanced prompting techniques have been designed to improve performance on complex tasks
○ Few-shot prompts
○ Chain-of-thought (CoT) prompting (see the sketch after this list)
○ Self-Consistency
○ Knowledge Generation Prompting
○ ReAct
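A minimal sketch of a chain-of-thought prompt, using the well-known worked example from Wei et al. (2022); the Python wrapper around it is just for illustration.

```python
# Chain-of-thought prompting: one demonstration includes intermediate
# reasoning steps, and the model imitates the step-by-step format.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis
balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6
more, how many apples do they have?
A:"""
# Expected completion: reasoning steps ending in "The answer is 9."
```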
Temperature and Top-p Sampling in LLMs
● Temperature and top-p sampling are two essential parameters that can be tweaked to control the output of LLMs (a sampling sketch follows below)
● Temperature (0-2): determines the creativity and diversity of the generated text. A higher value (e.g., 1.5) leads to more diverse and creative text, while a lower value (e.g., 0.5) results in more focused and deterministic text.
● Top-p sampling (0-1): balances diversity and high-probability words by sampling only from the smallest set of most probable tokens whose cumulative probability mass reaches the threshold p.
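A minimal sketch of how these two parameters act on the model's output distribution, written from the standard definitions; the function and toy logits are my illustration, not any particular library's API.

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 1.0, top_p: float = 0.9) -> int:
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Top-p: keep the smallest set of tokens whose cumulative mass >= top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    kept = order[:cutoff]

    kept_probs = probs[kept] / probs[kept].sum()
    return int(np.random.choice(kept, p=kept_probs))

# Toy vocabulary of 5 tokens.
print(sample_token(np.array([2.0, 1.0, 0.5, 0.1, -1.0]), temperature=0.7, top_p=0.9))
```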
Three major forms of pre-training (LLMs)
● Encoder-only (e.g., BERT), decoder-only (e.g., GPT), and encoder-decoder / text-to-text (e.g., T5)
BERT: Bidirectional Encoder Representations from Transformers
Source: (Devlin et al., 2019): BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Masked Language Modeling (MLM)
● Solution: Mask out k% of the input words, and then predict the masked words
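A quick way to see MLM in action, assuming the Hugging Face transformers package is installed; the model choice and sentence are illustrative.

```python
from transformers import pipeline

# Fill-mask: BERT predicts the token hidden behind [MASK].
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```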
Next Sentence Prediction (NSP)
● Given a pair [CLS] sentence A [SEP] sentence B [SEP], predict whether B actually follows A in the corpus (IsNext) or is a random sentence (NotNext)
BERT pre-training
RoBERTa
● BERT is still under-trained
● Removed the next sentence prediction pre-training objective: it adds more noise than benefit!
● Trained longer with 10x the data and bigger batch sizes
● Pre-trained on 1,024 V100 GPUs for one day in 2019
(Liu et al., 2019): RoBERTa: A Robustly Optimized BERT Pretraining Approach
Text-to-text models: the best of both worlds (Bard)?
● Encoder-only models (e.g., BERT) enjoy the benefits of bidirectionality, but they can't be used to generate text
● Decoder-only models (e.g., GPT-3, Llama 2) can do generation, but they are left-to-right LMs
● Text-to-text models combine the best of both worlds! (see the sketch below)
Source: (Raffel et al., 2020): Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
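A minimal sketch of the text-to-text interface, assuming the Hugging Face transformers and sentencepiece packages; the checkpoint and prompt are illustrative choices.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Text-to-text: every task, including translation, is cast as
# "text in, text out" through the same encoder-decoder model.
tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tok("translate English to German: The house is wonderful.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(output_ids[0], skip_special_tokens=True))
```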
How to use these pre-trained models?
From GPT to GPT-2 to GPT-3
Quiz
● Context size?
● The larger the context size, the more difficult it is to model?
GPT-3: Language Models are Few-Shot Learners
GPT-3’s in-context learning
[2020] GPT-3 to [2022] ChatGPT
What's new?
● Training on code
● Supervised instruction tuning
● RLHF: reinforcement learning from human feedback (a sketch of the reward-model loss follows below)
Source: Fu, 2022, “How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources"
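A minimal sketch of the pairwise ranking loss used to train InstructGPT-style reward models, loss = -log sigmoid(r_chosen - r_rejected); the code is my illustration, not OpenAI's implementation.

```python
import numpy as np

# Reward-model ranking loss: push the score of the human-preferred
# answer above the score of the rejected one.
def reward_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    return -np.log(1.0 / (1.0 + np.exp(-(r_chosen - r_rejected))))

# Toy scores from a hypothetical reward model:
print(reward_ranking_loss(2.0, 0.5))  # small loss: preference satisfied
print(reward_ranking_loss(0.5, 2.0))  # large loss: preference violated
```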
How was ChatGPT developed?
Evaluation of LLMs
The newest LLMs
ChatGPT application for reading comprehension (ChatPDF)
Risks of Large Language Models
Summary
● Introduction to LLMs
● Large Language Models (types)
Thank you
Email me
[email protected]