NLP END SEM QUESTION BANK
UNIT NO: 4
Q. No Question Marks
1 Write a Short note on WordNet. 6
ANS
2 What is Word Sense Disambiguation? Explain the use of Word Sense Disambiguation in NLP applications. 6
ANS Word Sense Disambiguation:
Word sense disambiguation (WSD), in natural language processing (NLP), may be defined as the
ability to determine which meaning of a word is activated by the use of that word in a particular
context.
Lexical ambiguity, whether syntactic or semantic, is one of the very first problems that any NLP
system faces. Part-of-speech (POS) taggers with a high level of accuracy can resolve a word's
syntactic ambiguity.
The problem of resolving semantic ambiguity is called WSD (word sense disambiguation).
Resolving semantic ambiguity is harder than resolving syntactic ambiguity.
Example: Consider the two distinct senses that exist for the word "bass":
1. I can hear bass sound.
2. He likes to eat grilled bass.
The occurrence of the word "bass" clearly denotes distinct meanings: in the first sentence it
means frequency, and in the second it means fish.
Hence, if disambiguated by WSD, the correct meanings can be assigned to the above
sentences as follows:
1. I can hear bass/frequency sound.
2. He likes to eat grilled bass/fish.
Difficulties in Word Sense Disambiguation:
1) Different Text-Corpus or Dictionary:
One issue with word sense disambiguation is determining what the senses are because
different dictionaries and thesauruses divide words into distinct senses.
Some academics have proposed employing a specific lexicon and its set of senses to
address this problem.
In general, however, research results based on broad (coarse-grained) sense distinctions have
outperformed those based on narrow (fine-grained) ones. Nevertheless, the majority of
researchers continue to work on fine-grained WSD.
2) PoS Tagging:
Part-of-speech tagging and sense tagging have been shown to be very tightly coupled in
any real test, with each potentially constraining the other.
WSD and part-of-speech tagging both involve disambiguating words, i.e., assigning each word a
tag or a sense.
However, algorithms designed for one do not always work well for the other, owing to
the fact that a word’s part of speech is mostly decided by the one to three words
immediately adjacent to it, whereas a word’s sense can be determined by words further
away.
Evaluation of WSD:
1) Dictionary: The very first input for evaluation of WSD is a dictionary, which is used to
specify the senses to be disambiguated.
2) Test Corpus:
Another input required by WSD is a hand-annotated test corpus that has the target or
correct senses.
The test corpora can be of two types:
1. Lexical sample: This kind of corpora is used in the system, where it is required to
disambiguate a small sample of words.
2. All-words: This kind of corpora is used in the system, where it is expected to
disambiguate all the words in a piece of running text.
Approaches and Methods to Word Sense Disambiguation:
1) Dictionary-based or Knowledge-based:
These methods primarily rely on dictionaries, thesauri, and lexical knowledge bases.
They do not use corpus evidence for disambiguation.
The Lesk method is the seminal dictionary-based method, introduced by Michael Lesk
in 1986.
The Lesk definition, on which the Lesk algorithm is based, is "measure overlap between
sense definitions for all words in context".
In 2000, Kilgarriff and Rosenzweig gave the simplified Lesk definition as "measure
overlap between the sense definitions of a word and the current context", which means
identifying the correct sense of one word at a time.
Here, the current context is the set of words in the surrounding sentence or paragraph.
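As a rough illustration, here is a minimal sketch of the simplified Lesk idea using NLTK's WordNet interface. It assumes nltk is installed and the wordnet corpus has been downloaded (nltk.download('wordnet')); the scoring is deliberately bare-bones, with no stop-word removal or gloss expansion.

```python
# Simplified Lesk sketch: pick the sense whose dictionary gloss has the
# largest word overlap with the surrounding context.
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_sentence):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, 0
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)  # overlap between gloss and context
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bass", "he likes to eat grilled bass caught in the sea"))
```

NLTK also ships a ready-made implementation of this algorithm as nltk.wsd.lesk.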
2) Supervised:
The assumption behind supervised approaches is that the context can supply enough
evidence to disambiguate words on its own.
Supervised methods for Word Sense Disambiguation (WSD) involve training a model
using a labeled dataset of word senses.
The model is then used to disambiguate the sense of a target word in new text.
Commonly used models are listed below; a toy sketch follows the list.
1. Decision List: A decision list is an ordered set of rules used to assign a sense to a target
word based on the context in which it appears.
2. Neural Network: Neural networks such as feedforward networks, recurrent neural
networks, and transformer networks are used to model the context-sense relationship.
3. Support Vector Machines: SVM is a supervised machine learning algorithm used
for classification and regression analysis.
4. Naive Bayes: Naive Bayes is a probabilistic algorithm that uses Bayes’ theorem to
classify text into predefined categories.
5. Decision Trees: A decision tree is a flowchart-like structure in which an internal
node represents a feature (or attribute), a branch represents a decision rule, and each
leaf node represents the outcome.
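As a toy illustration of the supervised setup, the sketch below trains a Naive Bayes classifier with scikit-learn to label the sense of "bass" from its context. The six labeled context sentences are invented purely for illustration.

```python
# Toy supervised WSD: treat sense tagging as text classification over
# bag-of-words features of the context.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

contexts = [
    "he caught a huge bass in the lake",
    "grilled bass with lemon for dinner",
    "the bass swam near the river bank",
    "turn up the bass on the speakers",
    "he plays bass guitar in a band",
    "the bass line drives the whole song",
]
senses = ["fish", "fish", "fish", "music", "music", "music"]

vec = CountVectorizer()
X = vec.fit_transform(contexts)          # context -> bag-of-words features
clf = MultinomialNB().fit(X, senses)     # learn sense from context words

print(clf.predict(vec.transform(["she cooked the bass she caught"])))  # ['fish']
```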
3) Unsupervised:
The underlying assumption is that similar senses occur in similar contexts, and thus
senses can be induced from the text by clustering word occurrences using some measure
of similarity of context.
Using fixed-size dense vectors to represent words in context has become one of the most
fundamental building blocks in several NLP systems.
Traditional word embedding approaches can still be utilized to improve WSD, despite
the fact that they conflate words with many meanings into a single vector representation.
Lexical databases (e.g., WordNet, ConceptNet, BabelNet) can also help unsupervised
systems map words to their senses, serving as sense inventories alongside word-embedding
techniques. A minimal clustering sketch follows.
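To make the clustering idea concrete, here is a minimal sketch: each occurrence of "bass" is represented by a TF-IDF vector of its context (a simple stand-in for the dense embeddings discussed above), and occurrences are clustered so that each cluster can be read as an induced sense. The four occurrence sentences are invented for illustration.

```python
# Unsupervised sense induction sketch: cluster word occurrences by
# similarity of their contexts.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

occurrences = [
    "caught a bass while fishing in the lake",
    "the grilled bass tasted delicious",
    "the bass and drums carried the rhythm",
    "a deep bass voice boomed over the music",
]
X = TfidfVectorizer().fit_transform(occurrences)            # context vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # occurrences sharing a label form one induced sense
```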
Applications of Word Sense Disambiguation:
1) Machine Translation:
Machine translation or MT is the most obvious application of WSD.
In MT, lexical choice for words that have distinct translations for different senses is
done by WSD.
The senses in MT are represented as words in the target language.
Most of the machine translation systems do not use explicit WSD module.
2) Information Retrieval (IR):
Information retrieval (IR) may be defined as a software program that deals with the
organization, storage, retrieval, and evaluation of information from document
repositories, particularly textual information.
The system basically assists users in finding the information they require, but it does
not explicitly return answers to their questions.
WSD is used to resolve the ambiguities of the queries provided to an IR system.
As with MT, current IR systems do not explicitly use a WSD module; they rely on the
assumption that the user will type enough context in the query to retrieve only relevant
documents.
3) Text Mining and Information Extraction (IE):
In most applications, WSD is necessary for accurate analysis of text.
For example, WSD helps an intelligence-gathering system flag the correct words: a
medical intelligence system might need to flag occurrences of "illegal drugs" rather than
"medical drugs".
4) Lexicography:
WSD and lexicography can work together in a loop, because modern lexicography is
corpus-based.
For lexicography, WSD provides rough empirical sense groupings as well as
statistically significant contextual indicators of sense.
3 What is a Metaphor in NLP? List its types and uses. 6
ANS Metaphor:
A metaphor is a figure of speech that describes an object or action in a way that isn’t literally
true, but helps explain an idea or make a comparison.
A metaphor states that one thing is another thing.
It equates those two things not because they actually are the same, but for the sake of
comparison or symbolism.
Metaphors are commonly used in NLP to enable the NLP trainer to connect with the
client's unconscious mind.
The great advantage of working with figurative expressions is that they connect with the
core pattern in the client's mind-map behaviour, thus enabling the trainer to rapidly
change and reframe it.
Metaphor is the soul of language.
Every human being uses about six metaphors a minute, consciously or unconsciously.
These can be positive or negative.
The positive ones need not be touched, but the negative ones need to be reframed
properly by the coach.
Milton Erickson used this technique to pace a person’s experience, distract their conscious
mind, and allow them to find resources or solutions on their own.
Strictly speaking, a metaphor, a simile, and an analogy are different; in NLP, however, they are
used in similar ways.
The purpose is to transfer meanings and understandings from one situation or thing to
another.
Types of Metaphor:
1) Shallow Metaphor:
A shallow metaphor makes a simple comparison and creates a better understanding, e.g.,
"like a rat up a drainpipe".
Shallow metaphors are very simple, much like similes.
Example:
Good metaphor: life is like a game of chess.
Bad metaphor: life is like a struggle.
2) Deep Metaphor:
These come from the unconscious level of the mind.
A deep metaphor carries stories with many different levels of meaning and is typically most
useful when a client is in trance, in order to communicate with the client's unconscious mind.
Example:
Good metaphor: a journey of a thousand miles begins with a single step.
Bad metaphor: when I think of xyz, I see a very big storm coming towards me.
3) Embedded Metaphors:
Embedded metaphors are those where more than one metaphor is linked together (as
Ronnie Corbett frequently did in "The Two Ronnies").
The idea is that the client's conscious mind becomes confused, as the stories appear to make
no sense, thus allowing the trainer to access unconscious resources and make
suggestions to improve learning or healing.
Tony Robbins has championed the use of life metaphors as a way to make far-reaching,
generalized changes in a person's life.
Seeing relationships as a dance rather than a battle for instance.
Creating a parable for change involves first mapping out the present situation or
problem in terms of the relationships and strategies currently used, with stand-in
characters and situations, to build a story.
Then the new strategies and resources are woven into the story to lead to the desired
outcome.
Sharing anecdotes or tales often works well because the listener instinctively relates to
the protagonist and can't help but try on the situation and the solution as they listen.
Uses of Metaphor:
Identify the sequence of behavior and/or events in question
Assess the strategy of the client – the sequence of the representations creating the behavior.
Identify and determine the desired new outcomes and choices – present state to desired state.
Establish anchors for strategic elements involved in this current behavior and the desired
outcome.
Create or think of a logical, smooth story.
Choose an appropriate context for the story.
Displace referential indices.
Establish a relationship between the client’s situation and behavior, and the situation and
behaviors of the characters in the story.
Access and establish new choices and resources for the client in terms of the characters and
events in the story.
Use ambiguities, direct quotes and other language patterns.
Provide a resolution.
Collapse the pre-established anchors and provide a future pace.
Can help clients better understand something about the object or idea to which the
metaphor is applied.
Useful in therapy.
Induces rapport.
Can make speaking and writing livelier and more interesting.
Can communicate a great deal of meaning with just a word or phrase.
Can create a mind shift: since metaphors imply rather than directly state relationships, they can
get clients to think about what they are hearing and take on new learnings.
4 With example explain creation of Synset in WordNet. 6
ANS
5 What is WordNet? Explain Word Sense Disambiguation in WordNet. 6
ANS
6 Explain Semantic Role Labeling as used in semantic analysis with grammatical cases. What are the thematic roles associated with the sentence: "John broke the window with the hammer."? 6
ANS Definition:
Semantic roles, also known as thematic roles, are labels that describe the relationship between
a verb and its arguments within a sentence.
They provide a deeper understanding of the sentence by indicating how each entity
(noun) is involved in the action described by the verb.
Semantic roles are crucial in NLP for understanding the meaning of sentences by identifying
the relationships between verbs and their arguments.
Semantic Roles in NLP:
1) Agent:
The entity that performs the action.
Example: John (agent) kicked the ball.
2) Patient:
The entity that is affected by the action.
Example: John kicked the ball (patient).
3) Instrument:
The entity used to perform the action.
Example: She cut the bread with a knife (instrument).
4) Experiencer:
The entity that experiences or perceives something.
Example: Mary (experiencer) heard a strange noise.
5) Theme:
The entity that is moved or the topic of the action.
Example: She gave the book (theme) to him.
6) Location:
The place where the action occurs.
Example: He stayed in the house (location).
7) Source:
The starting point of the action.
Example: She came from the village (source).
8) Goal:
The endpoint of the action.
Example: He walked to the park (goal).
Example:
Input: John broke the window with the hammer.
Key roles:
1) Agent: John (the doer of the action)
2) Patient: the window (the object being acted upon; it can also be analyzed as the Theme,
the entity affected by the breaking)
3) Instrument: the hammer (the tool used to perform the action)
A heuristic sketch of assigning these roles automatically follows.
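As a rough, hypothetical illustration (not a full semantic role labeling system), the sketch below maps spaCy dependency labels onto thematic roles for simple active-voice sentences like the one above. It assumes spaCy and its en_core_web_sm model are installed; the label-to-role mapping is a simplification for this example only.

```python
# Heuristic role assignment from a dependency parse: subject -> Agent,
# direct object -> Patient, object of "with" -> Instrument.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John broke the window with the hammer.")

roles = {}
for tok in doc:
    if tok.dep_ == "nsubj":                 # syntactic subject
        roles["Agent"] = tok.text
    elif tok.dep_ == "dobj":                # direct object
        roles["Patient"] = tok.text
    elif tok.dep_ == "pobj" and tok.head.text == "with":
        roles["Instrument"] = tok.text      # object of the preposition "with"

print(roles)  # {'Agent': 'John', 'Patient': 'window', 'Instrument': 'hammer'}
```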
7 What is Word Sense Disambiguation? Illustrate with an example how the dictionary-based approach identifies the correct sense of an ambiguous word. 6
ANS
8 Explain Selectional Restrictions. Analyze with examples how Selectional Restrictions can be used to solve the following NLP problems: 6
i. Semantic Role Assignments
ii. Syntactic Ambiguity
iii. Word Sense Disambiguation
ANS
9 Explain the terms with example: 6
i. Homonymy
ii. Polysemy
iii. Synonymy
iv. Hyponymy
ANS Elements of Semantic Analysis:
1) Hyponymy:
Hyponymy refers to a term that is an instance of a generic term. The relationship can be
understood by taking class-object as an analogy.
Example: 'Color' is a hypernym, while 'grey', 'blue', 'red', etc., are its hyponyms.
2) Homonymy:
Homonymy refers to two or more lexical terms with the same spelling but completely
distinct meanings.
Example: 'Rose' might mean 'the past form of rise' or 'a flower': same spelling but
different meanings; hence, 'rose' is a homonym.
3) Synonymy:
When two or more lexical terms that may be spelt distinctly have the same or similar
meaning, they are called synonyms.
Example: (Job, Occupation), (Large, Big), (Stop, Halt).
4) Antonymy:
Antonymy refers to a pair of lexical terms that have contrasting meanings; they are
symmetric about a semantic axis.
Example: (Day, Night), (Hot, Cold), (Large, Small).
5) Polysemy:
Polysemy refers to lexical terms that have the same spelling but multiple closely related
meanings.
It differs from homonymy in that, in the case of homonymy, the meanings of the terms
need not be closely related.
Example: 'man' may mean 'the human species', 'a male human', or 'an adult male
human'; since all these different meanings bear a close association, the lexical term
'man' is polysemous.
6) Meronymy:
Meronymy refers to a relationship wherein one lexical term is a constituent of some
larger entity.
Example: 'Wheel' is a meronym of 'Automobile'.
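A small sketch using NLTK's WordNet interface can make several of these relations concrete (it assumes nltk is installed and the wordnet corpus has been downloaded via nltk.download('wordnet')):

```python
# Exploring lexical relations in WordNet.
from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
print(car.lemma_names())         # synonymy: lemmas sharing this synset
print(car.hypernyms())           # more general concepts (car is their hyponym)
print(car.hyponyms()[:3])        # more specific kinds of car
print(car.part_meronyms()[:3])   # meronymy: parts of a car
print(wn.synsets('rose')[:3])    # multiple senses of 'rose' (homonymy/polysemy)
```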
10 What is Text Classification? Explain How does Text classification work. 6
ANS
11 Explain Text Summarization and Multiple document text summarization with neat diagram. 6
ANS Text Summarization:
Text summarization is the process of generating a short, fluent, and, most importantly, accurate
summary of a longer text document.
The main idea behind automatic text summarization is to be able to find a short subset of the
most essential information from the entire set and present it in a human-readable format.
As online textual data grows, automatic text summarization methods have the potential to
be very helpful because more useful information can be read in a short time.
Automatic text summarization refers to a group of methods that employ algorithms to
compress a certain amount of text while preserving the text's key points.
Although it may not receive as much attention as other machine learning successes, this field
of computer automation has witnessed consistent advancement and improvement.
Therefore, systems capable of extracting the key concepts from the text while maintaining
the overall meaning have the potential to revolutionize a variety of industries, including
banking, law, and even healthcare.
Types of Text Summarization:
1) Extractive Summarization:
Extractive summarization algorithms generate a summary by selecting
and combining key passages from the source material.
Rather than writing new sentences the way humans do, these models select the most
essential sentences from the original text.
Extractive summarization commonly utilizes the TextRank algorithm, which is highly suitable
for text summarization tasks; a compact sketch follows.
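As a compact TextRank-style sketch (assuming scikit-learn and networkx are installed; the sentence splitting here is deliberately naive to keep the example self-contained):

```python
# Extractive summarization: build a sentence-similarity graph and rank
# sentences with PageRank (the core idea of TextRank).
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(text, n_sentences=2):
    sentences = [s.strip() for s in text.split('. ') if s.strip()]
    tfidf = TfidfVectorizer().fit_transform(sentences)  # sentence vectors
    sim = cosine_similarity(tfidf)                      # edge weights
    scores = nx.pagerank(nx.from_numpy_array(sim))      # rank sentences
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
    return '. '.join(sentences[i] for i in top) + '.'

print(summarize("Text summarization shortens documents. It keeps the key points. "
                "Extractive methods select existing sentences. Abstractive methods "
                "write new ones. Both aim to preserve meaning"))
```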
2) Abstractive Summarization:
Abstractive summarization techniques emulate human writing by generating entirely
new sentences to convey key concepts from the source text, rather than merely
rephrasing portions of it.
These fresh sentences distill the vital information while eliminating irrelevant details,
often incorporating novel vocabulary absent in the original text.
Advantages of Text Summarization:
1) Instantly effective:
It takes time and effort to read an entire article, deconstruct it, and separate the
significant concepts from the raw text.
Reading a 500-word article takes a few minutes, whereas automatic summarization
software summarizes texts of 500-5,000 words in a fraction of a second.
This enables the user to read less data while still getting the most critical information
and drawing sound conclusions.
2) It Functions in Any Language:
Much summarization software can work in many languages, a capability that most
humans lack.
Because summarizers are based on linguistic models, they can automatically summarize
texts in a wide range of languages, from English to Russian.
As a result, they are great for people who read and work with multilingual information.
3) Productivity is increased:
Some software not only summarizes documents but also summarizes web pages.
This boosts productivity by accelerating the browsing process.
Instead of reading entire news stories full of irrelevant information, users can read
summaries of such pages that are detailed and accurate while being only about 20% of the
original article's size.
Multiple document text summarization:
Multi-document summarization condenses a collection of related documents (e.g., several news
articles covering the same event) into a single summary.
In addition to the challenges of single-document summarization, the system must detect and
remove information that is redundant across documents, order the selected content coherently,
and reconcile inconsistencies between sources.
A typical pipeline clusters related sentences across the input documents, scores them for
salience, and then selects or generates a non-redundant subset as the final summary.
12 Apply “Logistic Regression Model” to perform “Text Classification” 6
ANS Text Classification using Logistic Regression:
Text classification is the process of automatically assigning labels or categories to pieces of
text.
This has tons of applications, like sorting emails into spam or not-spam, figuring out if a
product review is positive or negative, or even identifying the topic of a news article.
How Logistic Regression Works for Text Classification:
Logistic Regression is a statistical method used for binary classification problems, and it can
also be extended to handle multi-class classification.
When applied to text classification, the goal is to predict the category or class of a given text
document based on its features.
1. Text Representation:
Before applying logistic regression, text data must be converted into numerical
features, a process known as text vectorization.
Common techniques for text vectorization include Bag of Words (BoW), Term
Frequency-Inverse Document Frequency (TF-IDF), or more advanced methods like
word embeddings (Word2Vec, GloVe) or deep learning-based embeddings (BERT,
GPT).
2. Feature Extraction:
Once the data is represented numerically, these representations can be used as features
for the model.
Features could be the counts of words in BoW, the weighted values in TF-IDF, or
the numerical vectors in embeddings.
3. Logistic Regression Model:
Logistic Regression models the relationship between the features and the probability
of belonging to a particular class using the logistic function.
The logistic function (also called the sigmoid function) maps any real-valued number
into the range [0, 1], which is suitable for representing probabilities.
The logistic regression model calculates a weighted sum of the input features and
applies the logistic function to obtain the probability of belonging to the positive
class.
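Concretely, for a feature vector $\mathbf{x}$, learned weights $\mathbf{w}$, and bias $b$, the model computes

$$P(y = 1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^{\top}\mathbf{x} + b,$$

and predicts the positive class when this probability exceeds a threshold (typically 0.5).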
Logistic Regression Text Classification with Scikit-Learn:
We'll use the popular SMS Spam Collection dataset, which consists of a collection of SMS (Short
Message Service) messages labeled as either "ham" (non-spam) or "spam" based
on their content.
The implementation is designed to classify text messages into two categories:
spam (unwanted messages)
ham (legitimate messages)
The process is broken down into several key steps:
1. Import Libraries:
The first step involves importing necessary libraries.
Pandas is used for data manipulation.
CountVectorizer for converting text data into a numeric format.
Various functions from sklearn.model_selection and sklearn.linear_model for
creating and training the model.
Functions from sklearn.metrics to evaluate the model's performance.
2. Load and Prepare the Data:
Load the dataset from a CSV file, and rename columns for clarity.
The latin-1 encoding is specified to handle any non-ASCII characters that may be present
in the file.
Map labels from text to numeric values (0 for ham, 1 for spam), making it suitable
for model training.
3. Text Vectorization: Convert text data into a numeric format using CountVectorizer,
which transforms the text into a sparse matrix of token counts.
4. Split Data into Training and Testing Sets: Divide the dataset into training and testing
sets to evaluate the model's performance on unseen data.
5. Train the Logistic Regression Model: Create and train the logistic regression model
using the training set.
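Putting the steps together, here is a minimal end-to-end sketch. The file name spam.csv and its column names (v1 for the label, v2 for the text) follow the common Kaggle release of the SMS Spam Collection and are assumptions; adjust them to your copy of the data.

```python
# End-to-end spam/ham classification with CountVectorizer + LogisticRegression.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# 2. Load and prepare the data (column names assumed from the Kaggle CSV)
df = pd.read_csv("spam.csv", encoding="latin-1")[["v1", "v2"]]
df.columns = ["label", "text"]                        # rename for clarity
df["label"] = df["label"].map({"ham": 0, "spam": 1})  # numeric labels

# 3. Text vectorization: sparse matrix of token counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["text"])
y = df["label"]

# 4. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 5. Train the model and evaluate it on unseen data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(accuracy_score(y_test, pred))
print(classification_report(y_test, pred, target_names=["ham", "spam"]))
```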