Natural Language Processing
Chapter 4. Semantic Analysis
Q1) What is Word-Sense Disambiguation (WSD)? Explain the dictionary-based approach to
Word Sense Disambiguation.
OR
Explain dictionary-based approach (Lesk algorithm) for word sense disambiguation (WSD)
with suitable example
o Word Sense Disambiguation (WSD) is the process of determining which meaning (sense)
of a word is used in a given context. It helps machines understand the correct meaning
of ambiguous words in natural language processing.
o WSD is essential in tasks like machine translation, information retrieval, and question
answering. Without disambiguation, machines might misinterpret sentences and
produce incorrect results.
o For example, the word “bank” can mean a financial institution or the side of a river; WSD
helps identify the correct sense based on context. In “He sat by the bank of the river,”
WSD chooses the geographical meaning.
o Dictionary-based approaches use lexical resources like WordNet to identify word senses.
These dictionaries list all possible meanings of a word along with definitions and usage.
o The Lesk algorithm is a popular dictionary-based method for WSD. It disambiguates a
word by comparing the dictionary definitions (glosses) of each sense with the context in
which the word appears.
o In the Lesk algorithm, the sense with the most overlapping words between its gloss and
the surrounding context is selected.
For example, if "bat" appears near "cricket," the algorithm favors the sports-related
sense.
o Dictionary-based approaches require extensive lexical databases and perform best when
context words also appear in glosses. Their effectiveness depends heavily on the quality
and coverage of the dictionary.
o These methods are language-independent if appropriate dictionaries are available for
the language. They can be adapted for English, Hindi, Marathi, etc., using corresponding
lexical databases.
o One limitation of dictionary-based WSD is that glosses might not always overlap
significantly with the context. This may reduce accuracy, especially in complex or
technical text.
o Despite limitations, dictionary-based WSD is useful in low-resource settings where
machine learning models may not be feasible. It provides a rule-based alternative that
doesn’t rely on large training datasets.
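The gloss-overlap idea above can be sketched in a few lines of Python. This is a minimal version of the simplified Lesk algorithm with a tiny hand-written sense inventory; the glosses below are illustrative, not taken from WordNet, and a real implementation would also remove stopwords before counting overlap.

```python
def simplified_lesk(word, context, sense_inventory):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_inventory[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory (invented glosses for illustration).
SENSES = {
    "bank": {
        "financial": "an institution that accepts deposits of money and lends money",
        "river": "the sloping land beside a river or stream",
    }
}

print(simplified_lesk("bank", "He sat by the bank of the river", SENSES))
# -> river
```

The river gloss shares "the" and "river" with the context, while the financial gloss shares only "of", so the geographical sense wins.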
Q2) Explain with suitable example the following relationship between word meaning:
Hyponymy, Homonymy, Polysemy, Synonymy, Antonymy, Hypernymy, Meronymy
1. Hyponymy: A hyponym is a word whose meaning is a more specific instance of a more
general term (called a hypernym).
Example: "Rose" is a hyponym of "flower".
2. Homonymy: Homonyms are words that are spelled and/or pronounced the same but
have different meanings.
Example: "Bat" (an animal) and "bat" (used in cricket) are homonyms.
They sound the same and are spelled the same but mean different things.
3. Polysemy: A polysemous word has multiple related meanings (unlike homonyms, which
have unrelated meanings).
Example: "Paper" can mean a material to write on OR a written article in a journal.
Both meanings are conceptually related.
4. Synonymy: Synonyms are words that have similar or identical meanings in some or all
contexts.
Example: "Begin" and "start" are synonyms.
5. Antonymy: Antonyms are words that have opposite meanings.
Example: "Hot" and "cold", "fast" and "slow".
They represent contrasting properties or directions.
6. Hypernymy: A hypernym is a general category word that includes more specific hyponyms
under it.
Example: "Vehicle" is a hypernym of "car", "bike", and "truck".
7. Meronymy: Meronyms refer to a part-whole relationship between words.
Example: "Wheel" is a meronym of "car".
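The relations above can be represented as a small WordNet-style lookup table. The toy lexicon below is hand-written from the examples in this answer, not a real lexical database; WordNet stores the same kinds of links at scale.

```python
# Toy lexicon: each word maps to its lexical relations (illustrative only).
LEXICON = {
    "flower":  {"hyponyms": ["rose", "tulip"]},
    "vehicle": {"hyponyms": ["car", "bike", "truck"]},
    "car":     {"hypernyms": ["vehicle"], "meronyms": ["wheel", "engine"]},
    "begin":   {"synonyms": ["start"]},
    "hot":     {"antonyms": ["cold"]},
}

def related(word, relation):
    """Return the words linked to `word` by `relation` (empty if none)."""
    return LEXICON.get(word, {}).get(relation, [])

print(related("vehicle", "hyponyms"))  # -> ['car', 'bike', 'truck']
print(related("car", "meronyms"))      # -> ['wheel', 'engine']
```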
Q3) Explain Yarowsky bootstrapping approach of semi supervised learning
o The Yarowsky algorithm is a semi-supervised learning method that uses a small set of
labelled examples and a large amount of unlabelled data to iteratively build a classifier.
o It is especially effective for word sense disambiguation (WSD) and other tasks where
contextual patterns are important.
o The algorithm starts with a few manually labelled examples (called seed words) for each
class or sense.
o It extracts contextual features (like nearby words, parts of speech, etc.) around the seed
examples to learn decision rules.
o The system builds if-then rules from the seed examples using a decision list model (e.g.,
"If word X appears near, then it’s likely sense A").
o These learned rules are then used to label more examples in the unlabelled data pool.
o Newly labelled examples are added to the training set, and the process repeats,
refining the rules each time (hence the name bootstrapping).
o It assumes that a word tends to keep the same sense throughout a single document
(the "one sense per discourse" assumption), helping improve consistency in labeling.
o It also assumes a word keeps the same sense when it appears with the same
neighboring words (the "one sense per collocation" assumption).
o This approach is powerful because it can achieve high accuracy using very little labeled
data, making it ideal for scenarios where annotations are costly.
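The iterative loop above can be sketched as follows. This is a deliberately simplified bootstrapping round for disambiguating "bass" (fish vs. music): the sentences, seed rules, and stoplist are all invented for illustration, and a real Yarowsky decision list would rank rules by a log-likelihood score rather than treating them all equally.

```python
STOP = {"he", "she", "the", "a", "in", "for", "of", "went", "needs", "new"}
TARGET = "bass"

# Seed decision list: collocated word -> sense.
rules = {"fishing": "fish", "guitar": "music"}

unlabeled = [
    "he went fishing for bass in the lake",
    "bass swim near the lake shore",
    "she plays bass guitar in a band",
    "the band needs a new bass player",
]

labeled = {}                        # sentence -> assigned sense
for _ in range(3):                  # a few bootstrapping rounds
    for sent in unlabeled:
        if sent in labeled:
            continue
        words = sent.split()
        for cue, sense in list(rules.items()):
            if cue in words:
                labeled[sent] = sense
                # "One sense per collocation": content words seen with this
                # sense become new (unweighted) rules for later rounds.
                for w in words:
                    if w not in STOP and w != TARGET:
                        rules.setdefault(w, sense)
                break

print(labeled)
```

After the first pass, rules learned from the seed-matched sentences (e.g. "lake" -> fish, "band" -> music) label sentences that contain no seed word at all, which is exactly the bootstrapping effect the bullets describe.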
Q4) Describe the semantic analysis in NLP
o Semantic analysis in NLP refers to the process of understanding the meaning and
interpretation of words, phrases, and sentences in a given context.
o It focuses on identifying the contextual meaning of words, resolving ambiguity, and
ensuring that language is understood the way humans do.
o Techniques such as word sense disambiguation (WSD), named entity recognition
(NER), and semantic role labelling are commonly applied in semantic analysis.
o It includes lexical semantics, which deals with the meanings and relationships between
words, such as synonyms, antonyms, and hyponyms.
o Semantic analysis is essential for various NLP applications such as machine
translation, question answering, chatbots, and information retrieval, where accurate
meaning is crucial.
Q6) How can supervised learning be applied for WSD?
o Word Sense Disambiguation (WSD) is the process of identifying the correct sense or
meaning of a word in a given context from multiple possible senses.
o In supervised learning, a model is trained using a labelled dataset where the correct
sense of each ambiguous word is already annotated.
o Supervised WSD requires a large, manually annotated corpus where each occurrence of
an ambiguous word is tagged with its correct sense from a predefined sense inventory
like WordNet.
o Features such as surrounding words (context), part-of-speech tags, syntactic
dependencies, and word position are extracted to help identify the correct word sense.
o A sense inventory, like WordNet, provides the list of possible senses for ambiguous
words and acts as the label set for training the model.
o Machine learning classifiers such as Naive Bayes, Decision Trees, Support Vector
Machines, or Neural Networks are trained using the labelled data to learn patterns
associated with each sense.
o A context window (e.g., ±2 words) around the ambiguous word is used during training to
understand the word's usage pattern in different sentences.
o The trained model is tested on unseen labelled data to evaluate its accuracy in predicting
the correct sense based on context.
o The main limitation of supervised WSD is the scarcity of large-scale sense-annotated
corpora for many languages and domains.
o Supervised WSD improves the performance of tasks such as machine translation,
semantic search, question answering, and information retrieval by correctly interpreting
word meanings.
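The pipeline above (annotated examples, context features, a trained classifier) can be sketched with a tiny Naive Bayes model over context words. The four training sentences below are invented for illustration; real systems train on sense-annotated corpora such as SemCor, and would use richer features than a bag of words.

```python
from collections import Counter, defaultdict
import math

# Tiny sense-annotated "corpus" for the ambiguous word "bank".
train = [
    ("deposit money in the bank account", "financial"),
    ("the bank approved my loan", "financial"),
    ("we walked along the river bank", "river"),
    ("fish swim near the muddy bank", "river"),
]

sense_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for sent, sense in train:
    sense_counts[sense] += 1
    for w in sent.split():
        word_counts[sense][w] += 1
        vocab.add(w)

def predict(sentence):
    """Return the sense with the highest smoothed log-probability."""
    best, best_lp = None, float("-inf")
    for sense in sense_counts:
        lp = math.log(sense_counts[sense] / len(train))   # prior P(sense)
        total = sum(word_counts[sense].values())
        for w in sentence.split():
            # Add-one (Laplace) smoothing over the training vocabulary.
            lp += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

print(predict("she opened a bank account"))   # -> financial
```

Context words like "account" push the classifier toward the financial sense, exactly the pattern-learning behaviour described in the bullets above.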
Q7) Demonstrate lexical semantic analysis using an example
o Lexical semantic analysis is the process of analyzing and understanding the meaning
of individual words and the relationships among them in a given context.
o The primary goal is to determine how word meanings contribute to the meaning of a
sentence or phrase.
o Words often have multiple meanings (polysemy), and lexical semantic analysis
identifies the most appropriate meaning based on the surrounding context.
o Example Sentence: Consider the sentence: "She deposited money in the bank." Here, the
word "bank" is ambiguous.
o Lexical semantic analysis identifies that "bank" in this context refers to a financial
institution, not a river bank, based on the presence of words like "deposited" and
"money".
o Lexical semantics also examines word relationships such as synonymy (money ↔
currency), hyponymy ("bank" is a kind of financial institution), and antonymy
(deposit ↔ withdraw).
o Tools like WordNet help map words to their meanings, synonyms, and semantic
relations to assist in analysis.
o Lexical semantic analysis also relies on knowing a word’s part of speech—e.g., "run"
as a verb vs. "run" as a noun—since meanings change accordingly.
o This analysis is used in NLP applications like machine translation, sentiment analysis,
information retrieval, and question answering.
o Without lexical semantic analysis, NLP systems would misunderstand word meanings,
leading to incorrect outputs and interpretations.
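The "bank" example above can be demonstrated with a few lines of Python that choose a sense by counting cue words in the context. The cue lists are hand-picked for illustration; a real system would learn such indicators from annotated data or gloss overlap.

```python
# Illustrative cue words for each sense of "bank" (invented, not from WordNet).
CUES = {
    "financial_institution": {"deposited", "money", "loan", "account", "cash"},
    "river_bank": {"river", "water", "shore", "fishing", "mud"},
}

def lexical_sense(sentence):
    """Pick the sense whose cue set shares the most words with the context."""
    words = {w.strip(".,").lower() for w in sentence.split()}
    return max(CUES, key=lambda sense: len(CUES[sense] & words))

print(lexical_sense("She deposited money in the bank"))
# -> financial_institution
```

Here "deposited" and "money" match the financial cue set, so the financial-institution sense is selected, mirroring the analysis in the bullets above.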