
Unit-3: Semantic Parsing

Semantic Parsing: Introduction, Semantic Interpretation, System Paradigms, Word Sense.


Introduction
• Semantic parsing is a technique that converts natural language into
a formal representation of its meaning, such as logical forms or
structured queries.
• It enables machines to understand and process human language in
a way that supports applications like question answering, dialogue
systems, and information retrieval.
• Semantic parsing uses two types of meaning representations: a domain-dependent, deeper representation and a set of relatively shallow but general-purpose, low-level, intermediate representations.
• Semantic parsing typically involves:
• Tokenization – Breaking text into words.
• Syntactic Parsing – Analyzing sentence structure.
• Semantic Mapping – Translating words into formal representations.
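As a rough sketch of this pipeline, the snippet below uses NLTK for the tokenization and (shallow) syntactic steps and a toy lookup table for semantic mapping; the function name and the tiny lexicon are invented purely for illustration.

```python
import nltk

# Assumes the NLTK models have been fetched, e.g.:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def semantic_parse(sentence):
    tokens = nltk.word_tokenize(sentence)   # Tokenization
    tagged = nltk.pos_tag(tokens)           # shallow syntactic analysis
    # Semantic mapping: a toy lexicon standing in for a real semantic grammar.
    lexicon = {"river": "river(x1)", "longest": "longest(x1)"}
    return [lexicon[w.lower()] for w, _ in tagged if w.lower() in lexicon]

print(semantic_parse("Which river is the longest?"))
# ['river(x1)', 'longest(x1)']
```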
Semantic Interpretation

• Semantic analysis examines the grammatical structure of sentences, including the arrangement of words, phrases, and clauses, to determine the relationships between independent terms in a specific context.
A semantic theory should be able to:
1. Explain sentences having ambiguous meanings.
• For example, it should account for the fact that the word bill in the
sentence The bill is large is ambiguous in the sense that it could
represent money or the beak of a bird.
2. Resolve the ambiguities of words in context. For example, if
the same sentence is extended to form The bill is large but need
not be paid, then the theory should be able to disambiguate the
monetary meaning of bill.
3. Identify meaningless but syntactically well-formed sentences, such as: Colorless green ideas sleep furiously.

4. Identify syntactically unrelated paraphrases of a concept having the same semantic content.
• Example: "It's raining heavily." and "There is intense rainfall."
Structural Ambiguity
• Structural ambiguity is a sentence-level phenomenon: a sentence admits more than one underlying syntactic representation, and resolving the ambiguity essentially means transforming the sentence into its intended syntactic representation.
• For example, I saw the man with the telescope can mean either that the telescope was the instrument of seeing or that the man had the telescope.
Word Sense
• In any given language, it is almost certain that the same word type is used in different contexts, and with different morphological variants, to represent different concepts in the world.
• For example, we use the word nail to represent a part of the human
anatomy and also to represent the metallic object used to secure
other objects.
• Consider the following four examples. The presence of words such as hammer and hardware store in sentences 1 and 2, and of clipped and manicure in sentences 3 and 4, enables humans to easily disambiguate the sense in which nail is used:
1. He nailed the loose arm of the chair with a hammer.
2. He bought a box of nails from the hardware store.
3. He went to the beauty salon to get his nails clipped.
4. He went to get a manicure. His nails had grown very long.
• The next component of semantic interpretation is the identification
of entities that are spread across different phrases.
• Identifying the type of entity or event is critical for semantic representation.
• Two predominant tasks have become popular over the
years: named entity recognition and coreference resolution.
• Named entity recognition (NER) is an NLP technique that can scan entire articles, identify fundamental entities in a text, and classify them into predefined categories. Entities may be:
• Organizations
• Monetary values
• People's names
• Company names
• Geographic locations
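As a brief sketch, NER can be run with an off-the-shelf library such as spaCy (assuming the en_core_web_sm model is installed; the example sentence is illustrative):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Ana joined UT Dallas in Texas with a $50,000 scholarship.")
for ent in doc.ents:
    # ent.label_ is the predefined category, e.g. PERSON, ORG, GPE, MONEY
    print(ent.text, ent.label_)
```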
• Reference, in NLP, is the phenomenon in which one expression in a text refers to another expression or entity.
• The task of resolving such references is known as Reference Resolution.
• For example, in a passage about Ana studying at UT Dallas, "She" and "Her" referring to the entity "Ana", and "the institute" referring to the entity "UT Dallas", are two examples of Reference Resolution.
Predicate-Argument Structure
• Once we have the word senses, entities, and events identified, another level of semantic structure is identifying how the entities participate in these events.
• Generally, this process can be defined as the identification
of who did what to whom, when, where, why, and how.
A representation of who did what to whom, when, where, why, and how
Meaning Representation
• The final step of semantic interpretation is to build a meaning representation that algorithms can use for various applications.
• This representation is sometimes called the deep representation.

Example: Which river is the longest?
answer(x1, longest(x1, river(x1)))


System Paradigms

The approaches generally fall into the following three categories.


1. System Architectures
a. Knowledge based: As the name suggests, these systems use a
predefined set of rules to obtain a solution to a new problem.
b. Unsupervised: These systems tend to require minimal human intervention, relying on existing resources that can be adapted for a particular application.
c. Supervised:
• These systems involve the manual annotation of some phenomena
so that machine learning algorithms can be applied.
• Typically, researchers create feature functions that allow each
problem instance to be projected into a space of features.
• A model is trained to use these features to predict labels, and then
it is applied to test data.
d. Semi-Supervised:
• Manual annotation is usually very expensive and does not yield
enough data to completely capture a phenomenon.
• Semi-supervised learning is a machine learning technique that uses
both labeled and unlabeled data to train models.
• It's a combination of supervised and unsupervised learning.
2. Scope
a. Domain Dependent: These systems are specific to certain
domains, such as air travel reservations or simulated football
coaching.
b. Domain Independent: These systems are general enough that the techniques can be applied to multiple domains with little or no change.
3. Coverage
a. Shallow: These systems tend to produce an intermediate
representation that can then be converted to one that a machine can
base its actions on.
b. Deep: These systems usually create a terminal representation that is
directly consumed by a machine or application.
Word Sense
• In language, a word is often used in more than one way; understanding the various usage patterns is important for many NLP applications.
• In different usage situations, the same word can mean different things.
• Word Sense Disambiguation (WSD) is the process of determining
the correct meaning of a word in context when the word has
multiple meanings.
• Attempts to solve this problem range from rule-based methods to unsupervised, supervised, and semi-supervised learning methods.
• Rule-based methods rely on lexical resources such as dictionaries.
• Supervised methods use sense-annotated corpora to train machine learning models.
• A problem, however, is that such corpora are very difficult and time-consuming to create.
• Owing to the lack of such corpora, many word sense disambiguation algorithms use semi-supervised methods.
• The process starts with a small amount of data, which is often
manually created.
• Word sense ambiguities can be of three types: (i) homonymy, (ii) polysemy, and (iii) categorial ambiguity.
• Homonymy indicates that words share the same spelling but have quite different meanings, for example, the word "bat".
• Polysemy refers to a single word having multiple related senses, for example, the word "bank".
• Categorial ambiguity (or part-of-speech (POS) ambiguity) occurs when a word can belong to multiple grammatical categories.
Example
• "Book"
– Noun: I read a book yesterday.
– Verb: Can you book a hotel room for me?
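A quick sketch of how a POS tagger resolves this categorial ambiguity, using NLTK (assuming its tokenizer and tagger data are downloaded):

```python
import nltk

# Assumes: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
print(nltk.pos_tag(nltk.word_tokenize("I read a book yesterday.")))
# "book" is tagged as a noun (NN) here
print(nltk.pos_tag(nltk.word_tokenize("Can you book a hotel room for me?")))
# "book" is tagged as a verb (VB) here
```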
Resources
• Resources are a key factor in the disambiguation of word senses in corpora.
• A corpus is a large and structured set of machine-readable texts; its plural is corpora.
• Corpora can be derived in different ways, for example from text that was originally electronic or from transcripts of spoken language.
• Early work on word sense disambiguation used machine-readable
dictionaries as knowledge sources.
• Two prominent sources were the Longman Dictionary of
Contemporary English (LDOCE) and Roget’s Thesaurus
• The late 1980s gave birth to a significant lexicographical resource,
WordNet.

• More recently, WordNet has been extended by adding syntactic information to the glosses (the short definitions that provide context) and disambiguating them for better incorporation in applications.


For example, in WordNet:
• "bank" (financial institution) → "A financial institution that accepts
deposits and channels the money into lending activities."
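These glosses can be inspected directly through NLTK's WordNet interface; a minimal sketch (assuming the wordnet corpus has been downloaded):

```python
from nltk.corpus import wordnet as wn

# Assumes: nltk.download('wordnet')
for synset in wn.synsets('bank'):
    print(synset.name(), '->', synset.definition())
# One of the printed senses is the financial institution that accepts
# deposits and channels the money into lending activities.
```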
Systems
• Researchers have explored various system architectures to address the word sense disambiguation problem.
• We can classify these systems into four main categories: (i) rule based or knowledge based, (ii) supervised, (iii) unsupervised, and (iv) semi-supervised.
Rule Based
• The first generation of word sense disambiguation systems was
primarily based on dictionary sense definitions and glosses.
• Probably the simplest and oldest dictionary-based sense
disambiguation algorithm was introduced by Lesk.
• The Lesk Algorithm (LA) disambiguates by calculating the overlap
of a set of dictionary definitions (senses) and the context words.
• For example, for the word "bank", WordNet provides multiple
glosses:
• bank (financial institution) → "A financial institution that accepts
deposits and channels the money into lending activities."
• bank (riverbank) → "The slope beside a body of water."
Algorithm. Pseudocode of the simplified Lesk algorithm
• The function COMPUTEOVERLAP returns the number of words
common to the two sets.
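The pseudocode itself did not survive the slide export, so below is a minimal Python rendering of the simplified Lesk algorithm, with COMPUTEOVERLAP realized as a set intersection over WordNet glosses (a sketch, assuming NLTK's WordNet data is available):

```python
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def compute_overlap(signature, context):
    # Number of words common to the two sets.
    return len(signature & context)

def simplified_lesk(word, sentence):
    context = set(word_tokenize(sentence.lower()))
    best_sense, max_overlap = None, 0
    for sense in wn.synsets(word):
        # Signature: words in the sense's gloss and its example sentences.
        signature = set(word_tokenize(sense.definition().lower()))
        for example in sense.examples():
            signature |= set(word_tokenize(example.lower()))
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            best_sense, max_overlap = sense, overlap
    return best_sense

print(simplified_lesk('bank', 'I went to the bank to deposit money.'))
# e.g. Synset('depository_financial_institution.n.01')
```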
• Another dictionary-based algorithm used Roget’s
Thesaurus categories and classified unseen words into one of
these 1,042 categories.
The method consists of three steps.
• The first step is a collection of contexts.
• The second step computes weights for each of the words, based on P(w|RCat), the probability of a word w occurring in the context of a Roget's Thesaurus category RCat.
• Finally, in the third step, the unseen words in the test set are
classified into the category that has the maximum weight.
• SSI is a knowledge-based algorithm that uses a graphical representation of the senses of words in context.
• Structural Semantic Interconnections (SSI) are the relationships between words and meanings within a semantic network.
• These interconnections help define how meanings are related, based on links.
• The algorithm uses various sources of information, including WordNet and available corpora, to form semantic graphs.
• The algorithm consists of two steps: an initialization step and an
iterative step, in which the algorithm attempts to disambiguate all
the words in context iteratively until it cannot disambiguate any
further or until all the terms are successfully disambiguated.
• Example: the figure referenced below shows the semantic graphs for two senses of the term bus. The first is the vehicle sense, and the second is the connector sense.
The graphs for senses 1 and 2 of the noun bus as generated by the SSI algorithm
Supervised
• These systems use a machine learning classifier trained on features extracted for words that have been manually disambiguated in a given corpus.
• A good property of these systems is that the user can encode rules and knowledge in the form of features.
• Classifier: Probably the most common and best-performing classifiers are support vector machines (SVMs) and maximum entropy (MaxEnt) classifiers.
• Many good-quality, freely available distributions of each are
available and can be used to train word sense disambiguation
models.
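A minimal sketch of a supervised WSD setup with scikit-learn, pairing bag-of-words context features with a linear SVM; the tiny sense-annotated training set is invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy sense-annotated contexts for the target word "bank".
contexts = [
    "deposit money into my bank account",
    "the bank approved the loan application",
    "fishing from the grassy river bank",
    "the bank of the stream was muddy",
]
senses = ["finance", "finance", "river", "river"]

# Bag-of-words context features feeding an SVM classifier.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(contexts, senses)

print(model.predict(["she withdrew cash from the bank"]))  # ['finance']
```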
Features
• Lexical context—The lexical context feature in NLP refers to the
surrounding words and phrases that help determine the meaning,
interpretation, or usage of a particular word in a sentence.
• Parts of speech—This feature comprises the POS information for words in the window.
• Bag of words context—Bag of Words (BoW) context feature refers
to how the BoW model represents textual data by focusing on word
occurrences.
• Local collocations—A local collocation in NLP refers to a sequence
of words that frequently appear together within a short window in
text, forming meaningful units.
• For example, if the target word is w, then Ci,j would be a collocation where i and j refer to the start and end offsets with respect to the word w.
• A positive sign indicates words to the right of the target, and a negative sign indicates words to the left.
He bought a box of nails from the hardware store.
• In this example, with nails as the target word, the collocation C1,1 would be the word from, and C1,3 would be the string from the hardware, and so on (a small code sketch of this appears after the feature list).
• Syntactic—These features capture the structural patterns in text.
• Topic features—The broad topic, or domain, of the article.
• Voice of the sentence—This feature indicates whether the sentence in which the word occurs is passive or active.
• Presence of subject/object—This binary feature indicates whether
the target word has a subject or object.
• Prepositional phrase adjunct—A prepositional phrase adjunct is
a prepositional phrase (PP) that provides additional (optional)
information about the verb, noun, or sentence.
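As promised above, a small sketch of extracting the Ci,j collocation features by offset; the function and variable names are illustrative:

```python
def collocation(tokens, target_index, i, j):
    # C(i,j): the words from offset i through offset j relative to the
    # target word; positive offsets are to the right, negative to the left.
    return " ".join(tokens[target_index + i : target_index + j + 1])

tokens = "He bought a box of nails from the hardware store .".split()
t = tokens.index("nails")
print(collocation(tokens, t, 1, 1))    # 'from'
print(collocation(tokens, t, 1, 3))    # 'from the hardware'
print(collocation(tokens, t, -2, -1))  # 'box of'
```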
Dependency Parsing
Algorithm: Rules for selecting syntactic relations as features
Unsupervised

• Unsupervised learning in Natural Language Processing (NLP) deals with extracting meaningful patterns, structures, and representations from text without labeled data.
• It is widely used for clustering, dimensionality reduction, and anomaly detection.
• There are a few solutions to this problem:
1. Devise a way to cluster instances of a word so that each cluster effectively constrains the examples of the word to a certain sense. This could be considered sense induction through clustering.
2. Use some metric to identify the proximity of a given instance to some set of known senses of a word, and select the closest as the sense of that instance.
3. Start with seeds of examples of certain senses, then iteratively grow them to form clusters.
Algorithms that use a distance measure to identify senses
• One such measure of semantic similarity is information content.
• In NLP, information content (IC) is a measure of how specific or informative a word or concept is in a given context.
• It is widely used in semantic similarity tasks, particularly in
WordNet-based similarity measures.
• Ex::"Animal" is a general term, so its IC is low.
• "Dog" is more specific, so it has a higher IC.
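A sketch of reading information content from NLTK's WordNet interface and using it for Resnik similarity (assuming the wordnet and wordnet_ic corpora are downloaded):

```python
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.corpus.reader.wordnet import information_content

# Assumes: nltk.download('wordnet'); nltk.download('wordnet_ic')
brown_ic = wordnet_ic.ic('ic-brown.dat')

animal = wn.synset('animal.n.01')
dog = wn.synset('dog.n.01')

# The more specific concept carries the higher information content.
print(information_content(animal, brown_ic))  # lower IC
print(information_content(dog, brown_ic))     # higher IC

# Resnik similarity: IC of the most informative common subsumer.
print(dog.res_similarity(wn.synset('cat.n.01'), brown_ic))
```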
• Conceptual density: select a sense based on the relatedness of that word sense to the context.
• Relatedness is measured in terms of conceptual distance.
• This approach uses a structured hierarchical semantic net (WordNet) to find the conceptual distance.
• It helps in word sense disambiguation, text summarization, and knowledge representation.
• Example: In "I went to the bank to withdraw cash," the financial-institution sense of "bank" has the higher conceptual density in the financial context.
Conceptual density
• The dots in the figure represent the senses of the words in context.
• Sense 2 is the one with the highest conceptual density and is therefore the chosen sense.
• For example, pigeon, crow, and eagle are all hyponyms of bird.
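Such hyponym relations can be read directly from WordNet; a small sketch with NLTK:

```python
from nltk.corpus import wordnet as wn

bird = wn.synset('bird.n.01')
# Direct hyponyms of "bird": the more specific concepts beneath it.
print(sorted(h.lemma_names()[0] for h in bird.hyponyms())[:10])
```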
Semi-Supervised
• The next category of algorithms starts from a small seed of labeled examples and uses an iterative algorithm that identifies more training examples with a classifier.
• The automatically labeled data can be used to augment the training
data of the classifier to provide better predictions for the next
selection cycle, and so on.
Key Principles of the Yarowsky Algorithm
1. One Sense Per Collocation
• A word tends to have the same meaning in the same local context

2. One Sense Per Discourse
• A word typically retains the same meaning within a single document or conversation.
• The algorithm starts with a small set of labeled examples and then
iteratively expands its knowledge using unlabeled data.
How the Yarowsky Algorithm Works
Step 1: Initialization
Step 2: Identify Collocational & Contextual Features
Step 3: Train a Classifier
Step 4: Label Unlabeled Data
Step 5: Iterative Refinement (Bootstrapping)
• Based on the assumption that these properties hold, the Yarowsky algorithm iteratively disambiguates most of the words.

The two stages of the Yarowsky algorithm
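A compact sketch of the bootstrapping loop at the heart of the algorithm. The classifier (logistic regression over bag-of-words features) and the confidence threshold are illustrative choices, not Yarowsky's original decision-list learner:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def yarowsky_bootstrap(seed_texts, seed_labels, unlabeled, threshold=0.9):
    texts, labels, pool = list(seed_texts), list(seed_labels), list(unlabeled)
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)                      # Step 3: train on the seed
    while pool:
        probs = model.predict_proba(pool)         # Step 4: score unlabeled data
        keep = np.max(probs, axis=1) >= threshold
        if not keep.any():
            break                                 # no confident labels left
        preds = model.predict(pool)
        # Move confidently labeled examples into the training set.
        texts += [t for t, k in zip(pool, keep) if k]
        labels += [p for p, k in zip(preds, keep) if k]
        pool = [t for t, k in zip(pool, keep) if not k]
        model.fit(texts, labels)                  # Step 5: retrain and repeat
    return model
```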


• Another variation of semi-supervised systems is the use of unsupervised methods for the creation of data, combined with supervised methods to learn models for that data.
• A synset (synonym set) is a group of words that share a meaning; NLTK provides a simple interface for looking up synsets in WordNet.
• Synset for "happy"
Words in the synset: happy, joyful, elated, glad.
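A one-line sketch of the lookup with NLTK (assuming the wordnet corpus is downloaded; the exact synsets returned depend on the WordNet version):

```python
from nltk.corpus import wordnet as wn

for synset in wn.synsets('happy'):
    print(synset.name(), synset.lemma_names())
# e.g. one of the printed synsets lists both 'glad' and 'happy' as members
```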
• Stop words are common words (such as "is," "the," "and," "in") that appear frequently in language but do not carry significant meaning in text analysis.
Thank you
