
Unit-3: Semantic Parsing

Semantic Parsing: Introduction, Semantic Interpretation, System Paradigms, Word Sense.


Introduction
• Semantic parsing is a technique that converts natural language into
a formal representation of its meaning, such as logical forms or
structured queries.
• It enables machines to understand and process human language in
a way that supports applications like question answering, dialogue
systems, and information retrieval.
• Semantic parsing uses two types of meaning representations: a domain-dependent, deeper representation and a set of relatively shallow but general-purpose, low-level, intermediate representations.
• Semantic parsing typically involves:
• Tokenization – Breaking text into words.
• Syntactic Parsing – Analyzing sentence structure.
• Semantic Mapping – Translating words into formal representations.
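As a rough sketch of this pipeline, the snippet below uses NLTK for the tokenization and (shallow) syntactic steps and a toy lookup table for semantic mapping; the function name and the tiny lexicon are invented purely for illustration.

```python
import nltk

# Assumes the NLTK models have been fetched, e.g.:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def semantic_parse(sentence):
    tokens = nltk.word_tokenize(sentence)   # Tokenization
    tagged = nltk.pos_tag(tokens)           # shallow syntactic analysis
    # Semantic mapping: a toy lexicon standing in for a real semantic grammar.
    lexicon = {"river": "river(x1)", "longest": "longest(x1)"}
    return [lexicon[w.lower()] for w, _ in tagged if w.lower() in lexicon]

print(semantic_parse("Which river is the longest?"))
# ['river(x1)', 'longest(x1)']
```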
Semantic Interpretation

• Semantic analysis examines the grammatical structure of sentences, including the arrangement of words, phrases, and clauses, to determine the relationships between independent terms in a specific context.
A semantic theory should be able to:
1. Explain sentences having ambiguous meanings.
• For example, it should account for the fact that the word bill in the
sentence The bill is large is ambiguous in the sense that it could
represent money or the beak of a bird.
2. Resolve the ambiguities of words in context. For example, if
the same sentence is extended to form The bill is large but need
not be paid, then the theory should be able to disambiguate the
monetary meaning of bill.
3. Identify meaningless but syntactically well-formed sentences, such as: Colorless green ideas sleep furiously.

4. Identify syntactically unrelated paraphrases of a concept having the same semantic content.
• Example: "It's raining heavily." and "There is intense rainfall."
Structural Ambiguity
• Structural ambiguity is a sentence-level phenomenon: a sentence admits more than one underlying syntactic representation, and resolving the ambiguity essentially means transforming the sentence into its intended syntactic representation.
• For example, I saw the man with the telescope can mean either that the telescope was the instrument of seeing or that the man had the telescope.
Word Sense
• In any given language, it is almost certain that the same word type is used in different contexts, and with different morphological variants, to represent different concepts in the world.
• For example, we use the word nail to represent a part of the human
anatomy and also to represent the metallic object used to secure
other objects.
• Consider the following four examples. The presence of words such as hammer and hardware store in sentences 1 and 2, and of clipped and manicure in sentences 3 and 4, enables humans to easily disambiguate the sense in which nail is used:
1. He nailed the loose arm of the chair with a hammer.
2. He bought a box of nails from the hardware store.
3. He went to the beauty salon to get his nails clipped.
4. He went to get a manicure. His nails had grown very long.
• The next component of semantic interpretation is the identification
of entities that are spread across different phrases.
• Identifying the type of entity or event is critical for semantic representation.
• Two predominant tasks have become popular over the
years: named entity recognition and coreference resolution.
• Named entity recognition (NER) is an NLP technique that can scan entire articles, identify fundamental entities in a text, and classify them into predefined categories. Entities may be:
• Organizations
• Monetary values
• People's names
• Company names
• Geographic locations
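As a brief sketch, NER can be run with an off-the-shelf library such as spaCy (assuming the en_core_web_sm model is installed; the example sentence is illustrative):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Ana joined UT Dallas in Texas with a $50,000 scholarship.")
for ent in doc.ents:
    # ent.label_ is the predefined category, e.g. PERSON, ORG, GPE, MONEY
    print(ent.text, ent.label_)
```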
• Reference, in NLP, is the phenomenon in which one expression in a text refers to another expression or entity.
• The task of resolving such references is known as Reference Resolution.
• For example, in a passage about Ana studying at UT Dallas, "She" and "Her" referring to the entity "Ana", and "the institute" referring to the entity "UT Dallas", are two examples of Reference Resolution.
Predicate-Argument Structure
• Once we have the word senses, entities, and events identified, another level of semantic structure is identifying how the entities participate in these events.
• Generally, this process can be defined as the identification
of who did what to whom, when, where, why, and how.
A representation of who did what to whom, when, where, why, and how
Meaning Representation
• The final step of semantic interpretation is to build a meaning representation that algorithms can use for various applications.
• This representation is sometimes called the deep representation.

Example: Which river is the longest?
answer(x1, longest(x1, river(x1)))


System Paradigms

The approaches generally fall into the following three categories.


1. System Architectures
a. Knowledge based: As the name suggests, these systems use a
predefined set of rules to obtain a solution to a new problem.
b. Unsupervised: These systems tend to require minimal human intervention, relying on existing resources that can be adapted for a particular application.
c. Supervised:
• These systems involve the manual annotation of some phenomena
so that machine learning algorithms can be applied.
• Typically, researchers create feature functions that allow each
problem instance to be projected into a space of features.
• A model is trained to use these features to predict labels, and then
it is applied to test data.
d. Semi-Supervised:
• Manual annotation is usually very expensive and does not yield
enough data to completely capture a phenomenon.
• Semi-supervised learning is a machine learning technique that uses
both labeled and unlabeled data to train models.
• It's a combination of supervised and unsupervised learning.
2. Scope
a. Domain Dependent: These systems are specific to certain
domains, such as air travel reservations or simulated football
coaching.
b. Domain Independent: These systems are general enough that the techniques can be applied to multiple domains with little or no change.
3. Coverage
a. Shallow: These systems tend to produce an intermediate
representation that can then be converted to one that a machine can
base its actions on.
b. Deep: These systems usually create a terminal representation that is
directly consumed by a machine or application.
Word Sense
• In language, a word is often used in more than one way; understanding the various usage patterns is important for many NLP applications.
• In different usage situations, the same word can mean different things.
• Word Sense Disambiguation (WSD) is the process of determining
the correct meaning of a word in context when the word has
multiple meanings.
• Attempts to solve this problem range from rule-based methods to unsupervised, supervised, and semi-supervised learning methods.
• Rule-based methods rely on lexical resources such as dictionaries.
• Supervised methods use sense-annotated corpora to train machine learning models.
• A problem, however, is that such corpora are very difficult and time-consuming to create.
• Owing to the lack of such corpora, many word sense disambiguation algorithms use semi-supervised methods.
• The process starts with a small amount of data, which is often
manually created.
• Word sense ambiguities can be of three types: (i) homonymy, (ii) polysemy, and (iii) categorial ambiguity.
• Homonymy indicates that words share the same spelling but have quite different meanings, for example, the word "bat".
• Polysemy refers to a single word having multiple related senses, for example, the word "bank".
• Categorial ambiguity (or part-of-speech (POS) ambiguity) occurs when a word can belong to multiple grammatical categories.
Example
• "Book"
– Noun: I read a book yesterday.
– Verb: Can you book a hotel room for me?
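A quick sketch of how a POS tagger resolves this categorial ambiguity, using NLTK (assuming its tokenizer and tagger data are downloaded):

```python
import nltk

# Assumes: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
print(nltk.pos_tag(nltk.word_tokenize("I read a book yesterday.")))
# "book" is tagged as a noun (NN) here
print(nltk.pos_tag(nltk.word_tokenize("Can you book a hotel room for me?")))
# "book" is tagged as a verb (VB) here
```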
Resources
• Resources are a key factor in the disambiguation of word senses in corpora.
• A corpus is a large and structured set of machine-readable texts; its plural is corpora.
• Corpora can be derived in different ways, for example from text that was originally electronic or from transcripts of spoken language.
• Early work on word sense disambiguation used machine-readable
dictionaries as knowledge sources.
• Two prominent sources were the Longman Dictionary of
Contemporary English (LDOCE) and Roget’s Thesaurus
• The late 1980s gave birth to a significant lexicographical resource,
WordNet.

• More recently, WordNet has been extended by adding syntactic information to the glosses (the short definitions that provide context) and disambiguating them for better incorporation in applications.


For example, in WordNet:
• "bank" (financial institution) → "A financial institution that accepts
deposits and channels the money into lending activities."
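These glosses can be inspected directly through NLTK's WordNet interface; a minimal sketch (assuming the wordnet corpus has been downloaded):

```python
from nltk.corpus import wordnet as wn

# Assumes: nltk.download('wordnet')
for synset in wn.synsets('bank'):
    print(synset.name(), '->', synset.definition())
# One of the printed senses is the financial institution that accepts
# deposits and channels the money into lending activities.
```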
Systems
• Researchers have explored various system architectures to address the word sense disambiguation problem.
• We can classify these systems into four main categories: (i) rule based or knowledge based, (ii) supervised, (iii) unsupervised, and (iv) semi-supervised.
Rule Based
• The first generation of word sense disambiguation systems was
primarily based on dictionary sense definitions and glosses.
• Probably the simplest and oldest dictionary-based sense
disambiguation algorithm was introduced by Lesk.
• The Lesk Algorithm (LA) disambiguates by calculating the overlap
of a set of dictionary definitions (senses) and the context words.
• For example, for the word "bank", WordNet provides multiple
glosses:
• bank (financial institution) → "A financial institution that accepts
deposits and channels the money into lending activities."
• bank (riverbank) → "The slope beside a body of water."
Algorithm. Pseudocode of the simplified Lesk algorithm
• The function COMPUTEOVERLAP returns the number of words
common to the two sets.
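The pseudocode itself did not survive the slide export, so below is a minimal Python rendering of the simplified Lesk algorithm, with COMPUTEOVERLAP realized as a set intersection over WordNet glosses (a sketch, assuming NLTK's WordNet data is available):

```python
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def compute_overlap(signature, context):
    # Number of words common to the two sets.
    return len(signature & context)

def simplified_lesk(word, sentence):
    context = set(word_tokenize(sentence.lower()))
    best_sense, max_overlap = None, 0
    for sense in wn.synsets(word):
        # Signature: words in the sense's gloss and its example sentences.
        signature = set(word_tokenize(sense.definition().lower()))
        for example in sense.examples():
            signature |= set(word_tokenize(example.lower()))
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            best_sense, max_overlap = sense, overlap
    return best_sense

print(simplified_lesk('bank', 'I went to the bank to deposit money.'))
# e.g. Synset('depository_financial_institution.n.01')
```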
• Another dictionary-based algorithm used Roget’s
Thesaurus categories and classified unseen words into one of
these 1,042 categories.
The method consists of three steps.
• The first step is a collection of contexts.
• The second step computes weights for each of the words, based on P(w|RCat), the probability of a word w occurring in the context of a Roget's Thesaurus category RCat.
• Finally, in the third step, the unseen words in the test set are
classified into the category that has the maximum weight.
• SSI is a knowledge-based algorithm that uses a graphical representation of the senses of words in context.
• Structural Semantic Interconnections (SSI) are the relationships between words and meanings within a semantic network.
• These interconnections help define how meanings are related, based on links.
• The algorithm uses various sources of information, including WordNet and available corpora, to form semantic graphs.
• The algorithm consists of two steps: an initialization step and an
iterative step, in which the algorithm attempts to disambiguate all
the words in context iteratively until it cannot disambiguate any
further or until all the terms are successfully disambiguated.
• Example: the figure referenced below shows the semantic graphs for two senses of the term bus. The first is the vehicle sense, and the second is the connector sense.
The graphs for senses 1 and 2 of the noun bus as generated by the SSI algorithm
Supervised
• These systems use a machine learning classifier trained on features extracted for words that have been manually disambiguated in a given corpus.
• A good property of these systems is that the user can encode rules and knowledge in the form of features.
• Classifier: Probably the most common and best-performing classifiers are support vector machines (SVMs) and maximum entropy (MaxEnt) classifiers.
• Many good-quality, freely available distributions of each are
available and can be used to train word sense disambiguation
models.
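A minimal sketch of a supervised WSD setup with scikit-learn, pairing bag-of-words context features with a linear SVM; the tiny sense-annotated training set is invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy sense-annotated contexts for the target word "bank".
contexts = [
    "deposit money into my bank account",
    "the bank approved the loan application",
    "fishing from the grassy river bank",
    "the bank of the stream was muddy",
]
senses = ["finance", "finance", "river", "river"]

# Bag-of-words context features feeding an SVM classifier.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(contexts, senses)

print(model.predict(["she withdrew cash from the bank"]))  # ['finance']
```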
Features
• Lexical context—The lexical context feature in NLP refers to the
surrounding words and phrases that help determine the meaning,
interpretation, or usage of a particular word in a sentence.
• Parts of speech—This feature comprises the POS information for words in the window.
• Bag of words context—Bag of Words (BoW) context feature refers
to how the BoW model represents textual data by focusing on word
occurrences.
• Local collocations—A local collocation in NLP refers to a sequence
of words that frequently appear together within a short window in
text, forming meaningful units.
• For example, if the target word is w, then Ci,j would be a collocation where i and j refer to the start and end offsets with respect to the word w.
• A positive sign indicates words to the right of the target, and a negative sign indicates words to the left.
He bought a box of nails from the hardware store.
• In this example, with nails as the target word, the collocation C1,1 would be the word from, and C1,3 would be the string from the hardware, and so on (a small code sketch of this appears after the feature list).
• Syntactic—These features capture the structural patterns in text.
• Topic features—The broad topic, or domain, of the article.
• Voice of the sentence—This feature indicates whether the sentence in which the word occurs is passive or active.
• Presence of subject/object—This binary feature indicates whether
the target word has a subject or object.
• Prepositional phrase adjunct—A prepositional phrase adjunct is
a prepositional phrase (PP) that provides additional (optional)
information about the verb, noun, or sentence.
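As promised above, a small sketch of extracting the Ci,j collocation features by offset; the function and variable names are illustrative:

```python
def collocation(tokens, target_index, i, j):
    # C(i,j): the words from offset i through offset j relative to the
    # target word; positive offsets are to the right, negative to the left.
    return " ".join(tokens[target_index + i : target_index + j + 1])

tokens = "He bought a box of nails from the hardware store .".split()
t = tokens.index("nails")
print(collocation(tokens, t, 1, 1))    # 'from'
print(collocation(tokens, t, 1, 3))    # 'from the hardware'
print(collocation(tokens, t, -2, -1))  # 'box of'
```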
Dependency Parsing
Algorithm: Rules for selecting syntactic relations as features
Unsupervised

• Unsupervised learning in Natural Language Processing (NLP) deals with extracting meaningful patterns, structures, and representations from text without labeled data.
• It is widely used for clustering, dimensionality reduction, and anomaly detection.
• There are a few solutions to this problem:
1. Devise a way to cluster instances of a word so that each cluster effectively constrains the examples of the word to a certain sense. This could be considered sense induction through clustering.
2. Use some metric to identify the proximity of a given instance to some set of known senses of a word, and select the closest as the sense of that instance.
3. Start with seeds of examples of certain senses, then iteratively grow them to form clusters.
Algorithms that use a distance measure to identify senses
• One such measure of semantic similarity is information content.
• In NLP, information content (IC) is a measure of how specific or informative a word or concept is in a given context.
• It is widely used in semantic similarity tasks, particularly in
WordNet-based similarity measures.
• Ex::"Animal" is a general term, so its IC is low.
• "Dog" is more specific, so it has a higher IC.
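A sketch of reading information content from NLTK's WordNet interface and using it for Resnik similarity (assuming the wordnet and wordnet_ic corpora are downloaded):

```python
from nltk.corpus import wordnet as wn, wordnet_ic
from nltk.corpus.reader.wordnet import information_content

# Assumes: nltk.download('wordnet'); nltk.download('wordnet_ic')
brown_ic = wordnet_ic.ic('ic-brown.dat')

animal = wn.synset('animal.n.01')
dog = wn.synset('dog.n.01')

# The more specific concept carries the higher information content.
print(information_content(animal, brown_ic))  # lower IC
print(information_content(dog, brown_ic))     # higher IC

# Resnik similarity: IC of the most informative common subsumer.
print(dog.res_similarity(wn.synset('cat.n.01'), brown_ic))
```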
• Conceptual density: select a sense based on the relatedness of that word sense to the context.
• Relatedness is measured in terms of conceptual distance.
• This approach uses a structured hierarchical semantic net (WordNet) to find the conceptual distance.
• It helps in word sense disambiguation, text summarization, and knowledge representation.
• Example: In "I went to the bank to withdraw cash," the financial-institution sense of "bank" has the higher conceptual density in the financial context.
Conceptual density
• The dots in the figure represent the senses of the words in context.
• Sense 2 is the one with the highest conceptual density and is therefore the chosen sense.
• For example, pigeon, crow, and eagle are all hyponyms of bird.
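Such hyponym relations can be read directly from WordNet; a small sketch with NLTK:

```python
from nltk.corpus import wordnet as wn

bird = wn.synset('bird.n.01')
# Direct hyponyms of "bird": the more specific concepts beneath it.
print(sorted(h.lemma_names()[0] for h in bird.hyponyms())[:10])
```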
Semi-Supervised
• The next category of algorithms starts from a small seed of labeled examples and uses an iterative algorithm that identifies more training examples with a classifier.
• The automatically labeled data can be used to augment the training
data of the classifier to provide better predictions for the next
selection cycle, and so on.
Key Principles of the Yarowsky Algorithm
1. One Sense Per Collocation
• A word tends to have the same meaning in the same local context

2. One Sense Per Discourse
• A word typically retains the same meaning within a single document or conversation.
• The algorithm starts with a small set of labeled examples and then
iteratively expands its knowledge using unlabeled data.
How the Yarowsky Algorithm Works
Step 1: Initialization
Step 2: Identify Collocational & Contextual Features
Step 3: Train a Classifier
Step 4: Label Unlabeled Data
Step 5: Iterative Refinement (Bootstrapping)
• Based on the assumption that these properties hold, the Yarowsky algorithm iteratively disambiguates most of the words.

The two stages of the Yarowsky algorithm
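A compact sketch of the bootstrapping loop at the heart of the algorithm. The classifier (logistic regression over bag-of-words features) and the confidence threshold are illustrative choices, not Yarowsky's original decision-list learner:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def yarowsky_bootstrap(seed_texts, seed_labels, unlabeled, threshold=0.9):
    texts, labels, pool = list(seed_texts), list(seed_labels), list(unlabeled)
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)                      # Step 3: train on the seed
    while pool:
        probs = model.predict_proba(pool)         # Step 4: score unlabeled data
        keep = np.max(probs, axis=1) >= threshold
        if not keep.any():
            break                                 # no confident labels left
        preds = model.predict(pool)
        # Move confidently labeled examples into the training set.
        texts += [t for t, k in zip(pool, keep) if k]
        labels += [p for p, k in zip(preds, keep) if k]
        pool = [t for t, k in zip(pool, keep) if not k]
        model.fit(texts, labels)                  # Step 5: retrain and repeat
    return model
```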


• Another variation of semi-supervised systems is the use of unsupervised methods for the creation of data, combined with supervised methods to learn models for that data.
• A synset (synonym set) is a group of words that share a meaning; NLTK provides a simple interface for looking up synsets in WordNet.
• Synset for "happy"
Words in the synset: happy, joyful, elated, glad.
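A one-line sketch of the lookup with NLTK (assuming the wordnet corpus is downloaded; the exact synsets returned depend on the WordNet version):

```python
from nltk.corpus import wordnet as wn

for synset in wn.synsets('happy'):
    print(synset.name(), synset.lemma_names())
# e.g. one of the printed synsets lists both 'glad' and 'happy' as members
```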
• Stop words are common words (such as "is," "the," "and," "in") that appear frequently in language but do not carry significant meaning in text analysis.
Thank you
