
Preliminary Section

The document is a project report submitted by four students (Akriti Jaiswal, Ankita Singh, Apoorva Gupta and Saumya Srivastava) to their Department of Computer Science and Engineering. The report discusses word sense disambiguation and presents their work on a system that automatically resolves the intended meanings of ambiguous words in sentences. The system uses a lexicon-based statistical technique and achieves an accuracy of around 75% on the ambiguous words it successfully detects.

Uploaded by

Akriti Jaiswal
Copyright
© Attribution Non-Commercial (BY-NC)

Project Report

Of
Word Sense Disambiguation
Bachelor of Technology

Computer Science and Engineering

Submitted by

Akriti Jaiswal 0609110014


Ankita Singh 0609110026
Apoorva Gupta 0609110031
Saumya Srivastava 0609110104

Department of Computer Science and Engineering


JSS Academy of Technical Education, Noida
C-20/1, Sector-62, Noida-201301

May 2010
WORD SENSE DISAMBIGUATION

Submitted by

Akriti Jaiswal (0609110014)


Ankita Singh (0609110026)
Apoorva Gupta (0609110031)
Saumya Srivastava (0609110104)

Submitted to the Department of Computer Science & Engineering

in partial fulfillment of the requirements for the degree of

Bachelor of Technology in

Computer Science

JSS Academy of Technical Education, Noida


U.P. Technical University

May 2010
DECLARATION

We hereby declare that this submission is our own work and that, to the best of our knowledge
and belief, it contains no material previously published or written by another person nor material
which to a substantial extent has been accepted for the award of any other degree or diploma of
the university or other institute of higher learning, except where due acknowledgment has been
made in the text.

Signature:
Name: Akriti Jaiswal
Roll No.: 0609110014
Date:

Signature:
Name: Ankita Singh
Roll No.: 0609110026
Date:

Signature:
Name: Apoorva Gupta
Roll No.: 0609110031
Date:

Signature:
Name: Saumya Srivastava
Roll No.: 0609110104
Date:

CERTIFICATE
 

This is to certify that the Project Report entitled “Word Sense Disambiguation”, which is submitted
by Akriti Jaiswal, Ankita Singh, Apoorva Gupta and Saumya Srivastava in partial fulfillment of
the requirement for the award of the degree of B. Tech. in the Department of Computer Science &
Engineering of U. P. Technical University, is a record of the candidates’ own work carried out
by them under my supervision. The matter embodied in this thesis is original and has not been
submitted for the award of any other degree.
 
 

Supervisor

Mrs. Seema Shukla

Asst. Professor

Department of Computer Science & Engineering.

Date

ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the report of the B. Tech. Project undertaken during the
B. Tech. Final Year. We owe a special debt of gratitude to Professor (Mrs.) Seema Shukla, Department of
Computer Science & Engineering, JSS Academy of Technical Education, Noida, for her constant support
and guidance throughout the course of our work. Her sincerity, thoroughness and perseverance have
been a constant source of inspiration for us. It is only through her cognizant efforts that our endeavors
have seen the light of day.

We would also like to take this opportunity to acknowledge the contribution of all faculty members of the
department for their kind assistance and cooperation during the development of our project. Last but not
least, we thank our friends for their contribution to the completion of the project.

Signature:
Name: Akriti Jaiswal
Roll No.: 0609110014
Date:

Signature:
Name: Ankita Singh
Roll No.: 0609110026
Date:

Signature:
Name: Apoorva Gupta
Roll No.: 0609110031
Date:

Signature:
Name: Saumya Srivastava
Roll No.: 0609110104
Date:


ABSTRACT

Word sense disambiguation (WSD) is the process of automatically determining the intended meaning
of an ambiguous word when it is used in a sentence. WSD falls under the field of Natural Language
Understanding (NLU) and thus forms an integral part of any NLU application.

Here a lexicon-based statistical technique has been used for word sense disambiguation. First,
the input is processed by the Stanford POS Tagger to obtain a part-of-speech (POS) tag for each
word. Then the stop words are removed from the input text. The ambiguous words present in the
input text are identified by looking them up in the database table that corresponds to their POS
tag; for instance, a word tagged as a noun is checked only against the table of ambiguous nouns.
Disambiguation is carried out by considering a context window around the target word and
comparing it against the set of associated words for each sense in a previously created lexicon
of ambiguous words. The sense with the highest number of matches is then selected as the result.
For the ambiguous words that are successfully detected, the system is found to have an accuracy
of around 75%.
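
The steps described above can be illustrated with a minimal Python sketch. It is not the
implementation used in the project: the stop-word list, the lexicon entries, the coarse POS
labels, the window size and the function names are all assumptions made for illustration. The
input is assumed to be already POS-tagged, and an in-memory dictionary stands in for the
POS-specific database tables.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stop-word list; the report keeps its own list in a database table.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "to", "in", "on", "at", "is", "was", "he", "she", "and"}

# Hypothetical lexicon: POS -> ambiguous word -> sense -> associated words.
# Each POS key stands in for one POS-specific table of ambiguous words.
LEXICON: Dict[str, Dict[str, Dict[str, Set[str]]]] = {
    "noun": {
        "bank": {
            "financial institution": {"money", "deposit", "loan", "account", "cash"},
            "river edge": {"river", "water", "shore", "fishing", "boat"},
        }
    },
}

Tagged = List[Tuple[str, str]]  # (word, coarse POS tag) pairs


def remove_stop_words(tagged: Tagged) -> Tagged:
    """Lower-case every word and drop those found in the stop-word list."""
    return [(w.lower(), pos) for w, pos in tagged if w.lower() not in STOP_WORDS]


def disambiguate(tagged: Tagged, window: int = 3) -> List[Tuple[str, int, Optional[str]]]:
    """Return (word, position, sense) for each ambiguous word that is detected.

    The position is the index in the text after stop-word removal. The sense is
    None when no associated word occurs in the context window.
    """
    tokens = remove_stop_words(tagged)
    results: List[Tuple[str, int, Optional[str]]] = []
    for i, (word, pos) in enumerate(tokens):
        # Look the word up only in the table that matches its POS tag.
        senses = LEXICON.get(pos, {}).get(word)
        if senses is None:
            continue
        # Context window: up to `window` words on each side of the target word.
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        context = {w for w, _ in neighbours}
        # Score each sense by its overlap with the context window.
        scores = {sense: len(context & assoc) for sense, assoc in senses.items()}
        best = max(scores, key=scores.get)  # ties go to the first listed sense
        results.append((word, i, best if scores[best] > 0 else None))
    return results


if __name__ == "__main__":
    sentence = [("He", "pronoun"), ("deposited", "verb"), ("money", "noun"),
                ("at", "preposition"), ("the", "determiner"), ("bank", "noun")]
    print(disambiguate(sentence))
```

In this sketch a sense is reported only when at least one associated word appears in the
window, which shows why the lexicon must be exhaustive for the approach to work well.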

The biggest drawback of this algorithm is that the lexicons it uses have to be very exhaustive.
Moreover, they have been created manually, which is a very cumbersome task.

TABLE OF CONTENTS

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 Problem Introduction
1.1.1 Motivation
1.1.2 Applications of WSD
1.1.3 Objective
1.1.4 Scope of the Project
1.2 Related Previous Work
1.3 Organization of the report

CHAPTER 2 LITERATURE SURVEY
2.1 Natural Language Processing
2.1.1 Divisions
2.1.2 Problems in NLP Systems
2.2 Word Sense Disambiguation
2.3 Need for Word Sense Disambiguation
2.4 Approaches used in Word Sense Disambiguation
2.4.1 Knowledge Based Approach
2.4.2 Supervised Approach
2.4.3 Unsupervised Approach
2.5 Existing Techniques for WSD
2.5.1 Yarowsky’s Algorithm
2.5.2 Lesk’s Algorithm
2.5.3 Naïve Bayes Classifiers
2.5.4 Selectional Preferences
2.6 Lexicons
2.7 Stop Words Removal
2.8 Summary

CHAPTER 3 SYSTEM DESIGN AND METHODOLOGY
3.1 System Design
3.1.1 System Architecture Model
3.1.2 Database Architecture
3.2 Methodology and Flowcharts
3.2.1 Text Preprocessing
3.2.2 Parsing
3.2.3 Stop Word Removal
3.2.4 Lexicon Sets
3.2.5 Ambiguity Resolution

CHAPTER 4 IMPLEMENTATION AND RESULTS
4.1 Hardware and Software Specifications
4.2 Assumptions and Dependencies
4.3 Constraints
4.4 Implementation Details
4.4.1 User Interfaces
4.4.2 Results

CHAPTER 5 CONCLUSION
5.1 Future Directions

APPENDIX A: LIST OF AMBIGUOUS WORDS
APPENDIX B: LEXICON OF ASSOCIATED WORDS
REFERENCES

LIST OF FIGURES

3.1. System Architecture Model
3.2. Flow Diagram for Stopword Removal
3.3. Flow Diagram for Ambiguity Resolution
4.1. Snapshot of Home Page
4.2. Snapshot of Browse Button
4.3. Snapshot of Input file after using remove stop word button
4.4. Snapshot of ambiguous words in the input text with their positions
4.5. Snapshot of process of tagging the part of speech
4.6. Snapshot of Input with its part of speech tagging
4.7. Snapshot of Output comprising of all ambiguous words along with their position and meaning

LIST OF TABLES

3.1. Table for Stop Words
3.2. Table for Ambiguous Words
3.3. Table for the Noun sense
3.4. Table for the Verb sense
3.5. Table for the Adjective sense
3.6. Table for the Adverb sense
