Project Report
Of
Word Sense Disambiguation
Bachelor of Technology
Computer Science and Engineering
Submitted by
Akriti Jaiswal 0609110014
Ankita Singh 0609110026
Apoorva Gupta 0609110031
Saumya Srivastava 0609110104
Department of Computer Science and Engineering
JSS Academy of Technical Education, Noida
C-20/1, Sector-62, Noida-201301
May 2010
WORD SENSE DISAMBIGUATION
Submitted by
Akriti Jaiswal (0609110014)
Ankita Singh (0609110026)
Apoorva Gupta (0609110031)
Saumya Srivastava (0609110104)
Submitted to the Department of Computer Science & Engineering
in partial fulfillment of the requirements for the degree of
Bachelor of Technology in
Computer Science
JSS Academy of Technical Education, Noida
U.P. Technical University
May 2010
DECLARATION
We hereby declare that this submission is our own work and that, to the best of our knowledge
and belief, it contains no material previously published or written by another person nor material
which to a substantial extent has been accepted for the award of any other degree or diploma of
the university or other institute of higher learning, except where due acknowledgment has been
made in the text.
Signature:
Name: Akriti Jaiswal
Roll No.: 0609110014
Date:

Signature:
Name: Ankita Singh
Roll No.: 0609110026
Date:

Signature:
Name: Apoorva Gupta
Roll No.: 0609110031
Date:

Signature:
Name: Saumya Srivastava
Roll No.: 0609110104
Date:
CERTIFICATE
This is to certify that the Project Report entitled “Word Sense Disambiguation”, which is submitted
by Akriti Jaiswal, Ankita Singh, Apoorva Gupta and Saumya Srivastava in partial fulfillment of
the requirement for the award of the degree of B. Tech. in the Department of Computer Science &
Engineering of U. P. Technical University, is a record of the candidates’ own work carried out
by them under my supervision. The matter embodied in this thesis is original and has not been
submitted for the award of any other degree.
Supervisor
Mrs. Seema Shukla
Asst. Professor
Department of Computer Science & Engineering.
Date
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the B. Tech. Project undertaken during the B.
Tech. Final Year. We owe a special debt of gratitude to our Professor (Mrs.) Seema Shukla, Department of
Computer Science & Engineering, JSS Academy of Technical Education, Noida, for her constant support
and guidance throughout the course of our work. Her sincerity, thoroughness and perseverance have
been a constant source of inspiration for us. It is only because of her cognizant efforts that our endeavors have seen
the light of day.
We would also not like to miss the opportunity to acknowledge the contribution of all faculty members of the
department for their kind assistance and cooperation during the development of our project. Last but not
least, we acknowledge our friends for their contribution to the completion of the project.
Signature:
Name: Akriti Jaiswal
Roll No.: 0609110014
Date:

Signature:
Name: Ankita Singh
Roll No.: 0609110026
Date:

Signature:
Name: Apoorva Gupta
Roll No.: 0609110031
Date:

Signature:
Name: Saumya Srivastava
Roll No.: 0609110104
Date:
ABSTRACT
Word sense disambiguation (WSD) is the process of automatically determining the intended meaning of
an ambiguous word when it is used in a sentence. WSD falls under the field of Natural Language
Understanding (NLU) and thus forms an integral part of any NLU application.
Here a lexicon-based statistical technique has been used for Word Sense Disambiguation. First,
the input is processed by the Stanford POS Tagger to obtain a part-of-speech (POS) tag for each
word. Then the stop words are removed from the input text. The ambiguous words present in the
input text are identified by looking them up in the database table that corresponds to their POS
tag. For instance, if a word is tagged as a noun, it is checked only against the table
that contains ambiguous nouns. Disambiguation is carried out by
considering a context window around the target word and comparing it against the sets of
associated words from a previously created lexicon of ambiguous words. The sense with the
highest number of matches is then selected as the result. For the ambiguous words that are
successfully detected, the system is found to have an accuracy of around 75%.
The biggest drawback of this algorithm is that the lexicons used have to be very exhaustive. Also,
they have been created manually, which makes building them a very cumbersome task.
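The pipeline described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the stop word list, the lexicon contents, the window size and the function name disambiguate are all hypothetical, the input is assumed to be POS-tagged already (the Stanford POS Tagger step is not reproduced), and small in-memory dictionaries stand in for the database tables.

```python
# Hypothetical toy data; the report keeps these in database tables.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "to", "he", "she", "and"}

# One lexicon per POS tag: ambiguous word -> {sense: set of associated words}.
LEXICON = {
    "noun": {
        "bank": {
            "financial institution": {"money", "deposit", "loan", "account"},
            "side of a river": {"river", "water", "shore", "boat"},
        },
    },
}


def disambiguate(tagged_tokens, window=3):
    """Return (word, sense, match_count) for each detected ambiguous word.

    tagged_tokens is a list of (word, pos) pairs whose pos values are
    assumed to be coarse tags such as "noun" or "verb".
    """
    # Step 1: remove stop words.
    content = [(w.lower(), pos) for w, pos in tagged_tokens
               if w.lower() not in STOP_WORDS]

    results = []
    for i, (word, pos) in enumerate(content):
        # Step 2: look the word up only in the lexicon for its POS tag.
        senses = LEXICON.get(pos, {}).get(word)
        if senses is None:
            continue  # not a known ambiguous word for this POS

        # Step 3: collect the context window around the target word.
        start = max(0, i - window)
        context = {w for j, (w, _) in
                   enumerate(content[start:i + window + 1], start=start)
                   if j != i}

        # Step 4: count matches per sense and keep the highest score.
        scores = {sense: len(context & assoc)
                  for sense, assoc in senses.items()}
        best = max(scores, key=scores.get)  # ties go to the first sense listed
        results.append((word, best if scores[best] > 0 else None, scores[best]))
    return results
```

In this sketch a word whose context matches no associated word is reported with a sense of None instead of a guess, which is one possible design choice and not necessarily what the system does.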
TABLE OF CONTENTS
Page
DECLARATION ii
CERTIFICATE iii
ACKNOWLEDGEMENT iv
ABSTRACT v
LIST OF FIGURES ix
LIST OF TABLES x
CHAPTER 1 INTRODUCTION 1
1.1 Problem Introduction 1
1.1.1 Motivation 1
1.1.2 Applications of WSD 2
1.1.3 Objective 2
1.1.4 Scope of the Project 2
1.2 Related Previous Work 4
1.3 Organization of the report 5
CHAPTER 2 LITERATURE SURVEY 6
2.1 Natural Language Processing 6
2.1.1 Divisions 7
2.1.2 Problems in NLP Systems 8
2.2 Word Sense Disambiguation 9
2.3 Need for Word Sense Disambiguation 10
2.4 Approaches used in Word Sense Disambiguation 11
2.4.1 Knowledge Based Approach 11
2.4.2 Supervised Approach 11
2.4.3 Unsupervised Approach 12
2.5 Existing Techniques for WSD 12
2.5.1 Yarowsky’s Algorithm 13
2.5.2 Lesk’s Algorithm 13
2.5.3 Naïve Bayes Classifiers 13
2.5.4 Selectional Preferences 13
2.6 Lexicons 13
2.7 Stop Words Removal 14
2.8 Summary 14
CHAPTER 3 SYSTEM DESIGN AND METHODOLOGY 16
3.1 System Design 16
3.1.1 System Architecture Model 16
3.1.2 Database Architecture 18
3.2 Methodology and Flowcharts 22
3.2.1 Text Preprocessing 22
3.2.2 Parsing 22
3.2.3 Stop Word Removal 22
3.2.4 Lexicon Sets 23
3.2.5 Ambiguity Resolution 24
CHAPTER 4 IMPLEMENTATION AND RESULTS 27
4.1 Hardware and Software Specifications 27
4.2 Assumptions and Dependencies 27
4.3 Constraints 28
4.4 Implementation Details 29
4.4.1 User Interfaces 29
4.4.2 Results 36
CHAPTER 5 CONCLUSION 41
5.1 Future Directions 41
APPENDIX A: LIST OF AMBIGUOUS WORDS 43
APPENDIX B: LEXICON OF ASSOCIATED WORDS 45
REFERENCES 63
LIST OF FIGURES
Fig. Page
3.1. System Architecture Model 17
3.2. Flow Diagram for Stopword Removal 23
3.3. Flow Diagram for Ambiguity Resolution 25
4.1. Snapshot of Home Page 29
4.2. Snapshot of Browse Button 30
4.3. Snapshot of Input file after using remove stop word button 31
4.4. Snapshot of ambiguous words in the input text with their positions 32
4.5. Snapshot of process of tagging the part of speech 33
4.6. Snapshot of Input with its part of speech tagging 34
4.7. Snapshot of Output comprising all ambiguous words along with their position and meaning 35
LIST OF TABLES
Table Page
3.1. Table for Stop Words 19
3.2. Table for Ambiguous Words 19
3.3. Table for the Noun sense 20
3.4. Table for the Verb sense 20
3.5. Table for the Adjective sense 21
3.6. Table for the Adverb sense 21