American Sign Language (ASL) Recognition Based on Hough Transform and Neural Networks
Expert Systems with Applications 32 (2007) 24–37
Abstract
The work presented in this paper aims to develop a system for automatic translation of static gestures of alphabets and signs in American Sign Language. In doing so, we have used the Hough transform and neural networks, which are trained to recognize signs. Our system does not rely on using any gloves or visual markings to achieve the recognition task. Instead, it deals with images of bare hands, which allows the user to interact with the system in a natural way. An image is processed and converted to a feature vector that will be compared with the feature vectors of a training set of signs. The extracted features are not affected by the rotation, scaling or translation of the gesture within the image, which makes the system more flexible.

The system was implemented and tested using a data set of 300 samples of hand sign images, 15 images for each sign. Experiments revealed that our system was able to recognize selected ASL signs with an accuracy of 92.3%.
Keywords: American sign language; Neural network; Hough transform; Canny edge detection; Sobel edge detection; Feature extraction
A static sign is determined by a certain configuration of the hand, while a dynamic gesture is a moving gesture determined by a sequence of hand movements and configurations. Dynamic gestures are sometimes accompanied by body and facial expressions.

The aim of sign language recognition is to provide an easy, efficient and accurate mechanism to transform sign language into text or speech. With the help of computerized digital image processing (Gonzalez, Woods, & Eddins, 2004) and neural network techniques (Haykin, 1999), a system that can recognize the alphabet flow can recognize and interpret ASL words and phrases. A gesture recognition system has four main components: gesture modeling, gesture analysis, gesture recognition and gesture-based application systems.

1.1. American sign language

American Sign Language (ASL) (International Bibliography of Sign Language, 2005; National Institute on Deafness & Other Communication Disorders, 2005) is a complete language that employs signs made with the hands and other gestures, including facial expressions and postures of the body. ASL also has its own grammar that is different from other sign languages such as English and Swedish. ASL consists of approximately 6000 gestures of common words, with finger spelling used to communicate unclear words or proper nouns. Finger spelling uses one hand and 26 gestures to communicate the 26 letters of the alphabet. The 26 letters of the ASL alphabet are shown in Fig. 1.

1.2. Related work

Attempts to automatically recognize sign language began to appear in the literature in the 90s. Research on hand gestures can be classified into two categories. The first category relies on electromechanical devices that are used to measure the different gesture parameters, such as the hand's position, angle, and the location of the fingertips. Systems that use such devices are usually called glove-based systems (e.g. the work of Grimes (1983) at AT&T Bell Labs, who developed the ''Digital Data Entry Glove''). A major problem with such systems is that they force the signer to wear cumbersome and inconvenient devices. As a result, the way in which the user interacts with the system becomes complicated and less natural.
The second category exploits machine vision and image processing techniques to create visual-based hand gesture recognition systems. Visual-based gesture recognition systems are further divided into two categories. The first one relies on using specially designed gloves with visual markers, called ''visual-based gesture with glove–markers (VBGwGM)'', that help in determining hand postures (Dorner & Hagen, 1994; Fels & Hinton, 1993; Starner, 1995). A summary of selected research efforts is listed in Table 1.

But using gloves and markers does not provide the naturalness required in human–computer interaction systems. Besides, if colored gloves are used, the processing complexity is increased.

As an alternative, the second kind of visual-based hand gesture recognition system can be called ''pure visual-based gesture (PVBG)'' (i.e. visual-based gesture without glove–markers). This type tries to achieve the ultimate convenience and naturalness by using images of bare hands to recognize gestures.

Among many factors, five important factors must be considered for the successful development of a vision-based solution for collecting hand posture and gesture data (Ong & Ranganath, 2005; Starner, 1995; Sturman & Zeltzer, 1994; Watson, 1993).

• The placement and number of cameras used.
• The visibility of the object (hand) to the camera for simpler extraction of hand data/features.
• The extraction of features from the stream or streams of raw image data.
• The ability of the recognition algorithms to use the extracted features.
• The efficiency and effectiveness of the selected algorithms to provide maximum accuracy and robustness.

A number of recognition techniques are available, and in some cases they can be applied to the two types of vision-based solutions (i.e. VBGwGM and PVBG). In general these recognition techniques can be categorized into three broad categories:

A. Feature extraction, statistics, and models. This technique can be classified into six sub-categories:
   1. Template matching (e.g. research work of Darrell & Pentland, 1993; Newby, 1993; Sturman, 1992; Watson, 1993; Zimmerman, Lanier, Blanchard, Bryson, & Harvill, 1987).
   2. Feature extraction and analysis (e.g. research work of Rubine, 1991; Sturman, 1992; Wexelblat, 1994, 1995).
   3. Active shape models ''smart snakes'' (e.g. research work of Heap & Samaria, 1995).
   4. Principal component analysis (e.g. research work of Birk, Moeslund, & Madsen, 1997; Martin & James, 1997; Takahashi & Kishino, 1991).
   5. Linear fingertip models (e.g. research work of Davis & Shah, 1993; Rangarajan & Shah, 1991).
   6. Causal analysis (e.g. research work of Brand & Irfan, 1995).

B. Learning algorithms. This technique can be classified into three sub-categories:
   1. Neural networks (e.g. research work of Banarse, 1993; Fels, 1994; Fukushima, 1989; Murakami & Taguchi, 1991).
   2. Hidden Markov Models (e.g. research work of Charniak, 1993; Liang & Ouhyoung, 1998; Nam & Wohn, 1996; Starner, 1995).
   3. Instance-based learning (research work of Kadous, 1995; also see Aha, Dennis, & Marc, 1991).

C. Miscellaneous techniques. This technique can be classified into three sub-categories:
   1. The linguistic approach (e.g. research work of Hand, Sexton, & Mullan, 1994).
   2. Appearance-based motion analysis (e.g. research work of Davis & Shah, 1993).
   3. Spatio-temporal vector analysis (e.g. research work of Quek, 1994).

Regardless of the approach used (i.e. VBGwGM or PVBG, etc.), many researchers have been trying to introduce hand gestures to the Human–Computer Interaction field.
Table 1
A summary of gloves used

Dorner and Hagen (1994): a cotton glove, with various areas of it painted different colors to enable tracking (i.e. gloves with rings of colors around each joint).
Starner (1995): two colored gloves, an orange glove on the left hand and a yellow glove on the right hand.
Fels and Hinton (1993) and Fels (1994): VPL DataGlove Mark II with a Polhemus tracker as input devices; wearing a glove that the user moves in certain ways, users would learn to generate vocal sounds (Glove-Talk).
Takahashi and Kishino (1991): VPL DataGlove.
Murakami and Taguchi (1991): VPL DataGlove.
Kramer and Leifer (1990): CyberGlove.
Vamplew (1996): a single CyberGlove with position tracking.
Rung-Huei and Ouhyoung (1996): DataGlove as input device.
Kadous (1996): Power gloves.
Grobel and Assan (1996): colored gloves.
Wexelblat (1994, 1995): CyberGlove on each hand.
Charayaphan and Marble (1992) investigated a way of using image processing to understand American Sign Language (ASL). Their system can correctly recognize 27 out of the 31 ASL symbols. Fels and Hinton (1993) developed a system using a VPL DataGlove Mark II with a Polhemus tracker as input devices. In their system, the neural network method was employed for classifying hand gestures. Another system using neural networks, developed by Banarse (1993), was vision-based and recognized hand postures using a neocognitron network, a neural network based on the spatial recognition system of the visual cortex of the brain. Heap and Samaria (1995) extended the active shape models, or ''smart snakes'', technique to recognize hand postures and gestures using computer vision. In their system, they apply an active shape model and a point distribution model for tracking a human hand. Starner and Pentland (1995) used a view-based approach with a single camera to extract two-dimensional features as input to HMMs. The correct rate was 91% in recognizing sentences comprising 40 signs. Kadous (1996) demonstrated a system based on power gloves to recognize a set of 95 isolated Auslan signs with 80% accuracy, with an emphasis on computationally inexpensive methods. Grobel and Assan (1996) used HMMs to recognize isolated signs with 91.3% accuracy out of a 262-sign vocabulary. They extracted the features from video recordings of signers wearing colored gloves. Vogler and Metaxas (1997) used computer vision methods to extract the three-dimensional parameters of a signer's arm motions, and coupled the computer vision methods with HMMs to recognize continuous American sign language sentences with a vocabulary of 53 signs. They modeled context-dependent HMMs to alleviate the effects of movement epenthesis. An accuracy of 89.9% was observed. Yoshinori, Kang-Hyun, Nobutaka, and Yoshiaki (1998) used colored gloves and have shown that using solid colored gloves allows faster hand feature extraction than simply wearing no gloves at all. Liang and Ouhyoung (1998) used HMMs for continuous recognition of Taiwan sign language with a vocabulary of between 71 and 250 signs, with a DataGlove as the input device. However, their system required that gestures performed by the signer be slow in order to detect the word boundary. Yang and Ahuja (1999) investigated dynamic gesture recognition; they utilized skin colour detection and affine transforms of the skin regions in motion to detect the motion trajectory of ASL signs. Using a time delayed neural network, they recognised 40 ASL gestures with a success rate around 96%. But their technique potentially has a high computational cost when false skin regions are detected. A local feature extraction technique was employed to detect hand shapes in sign language recognition by Imagawa, Matsuo, Taniguchi, Arita, and Igi (2000). They used an appearance-based eigen method to detect hand shapes. Using a clustering technique, they generate clusters of hand shapes on an eigenspace, with an accuracy of around 93% recognition of 160 words. Bowden and Sarhadi (2002) developed a non-linear model of shape and motion for tracking finger-spelt American sign language. Their approach is based on one-state transitions of the English language which are projected into shape space for tracking and model prediction using an HMM-like approach. Symeonidis (2000) used orientation histograms to recognize static hand gestures, specifically a subset of American Sign Language (ASL). A pattern recognition system used a transform that converts an image into a feature vector, which will then be compared with the feature vectors of a training set of gestures. The system was implemented with a perceptron network. The main problem with this technique is how good a differentiation one can achieve. This of course is dependent upon the images, but it comes down to the algorithm as well. It may be enhanced using other image processing techniques like edge detection. For further information and hot topics on this issue, a modern and excellent survey can be found in Ong and Ranganath (2005).

2. System design and implementation

2.1. System overview

Our system is designed to visually recognize all static signs of the American Sign Language (ASL): all signs of the ASL alphabet, single digit numbers used in ASL (e.g. 3, 5, 7) and a sample of words (e.g. love, meet, more), using bare hands. The users/signers are not required to wear any gloves or to use any devices to interact with the system. However, different signers vary in their hand shape, hand size, body size, operation habits and so on, which brings about more difficulties in recognition. Therefore, we realized the necessity of investigating signer-independent sign language recognition to improve the system robustness and practicability in the future, by using the Hidden Markov Model (HMM) (Seymore, McCallum, & Rosenfeld, 1999). The combination of the powerful Hough transformation with excellent image processing and neural network capabilities has led to the successful development of an ASL recognition system using MATLAB (Gonzalez et al., 2004). Our method relies on presenting the gesture as a feature vector that is translation, scale and rotation invariant.

The system has two phases: the feature extraction phase and the classification phase, as shown in Fig. 2. Images were prepared using Portable Document Format (PDF) form so the system will deal with images that have a uniform background (PDF Reference, 2004). The feature extraction phase applies an image processing technique which involves using algorithms to detect and isolate various desired portions of the digitized sign. During this phase, each colored image is resized and then converted from RGB to a grayscale one. This is followed by an edge detection technique, the so-called Canny edge detection (Canny, 1986).

The goal of edge detection is to mark the points in an image (sign image) at which the intensity changes sharply. Sharp changes in image properties usually reflect important events and changes in world properties. The Canny (Canny, 1986) operator was originally designed to be an optimal edge detector.
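As a rough illustration of this preprocessing chain, a minimal MATLAB sketch is given below; it is not the authors' code, and the file name, working image size and threshold value are assumptions made for the example:

   % Hypothetical preprocessing of one sign image, following the steps described above.
   I = imread('sign_sample.jpg');      % assumed file name of a captured sign image
   I = imresize(I, [256 256]);         % resize to a fixed working size (size assumed)
   G = rgb2gray(I);                    % convert from RGB to grayscale
   E = edge(G, 'canny', 0.25);         % Canny edge map; 0.25 is one threshold reported in Section 3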
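For context, the Radon transform of an image f(x, y) along the line at perpendicular distance ρ and angle θ is commonly written in the following standard form (restated here for readability rather than quoted from this paper):

   R(ρ, θ) = ∫∫ f(x, y) δ(ρ − x cos θ − y sin θ) dx dy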
where δ is the Dirac delta function. The δ term forces integration of f along the line specified by ρ and θ, and is equivalent to the Hough transform for binary images.

For shapes other than straight lines, the Radon transform is expressed by replacing the argument of δ by a function which forces integration of the image along the shape (that is, its boundary).

For the proposed system to consider sign language recognition, the following parameters for a generalized sign can be defined: a = {y, s, θ}, where y = (xr, yr) is a reference origin for the shape in the image space, θ is the shape orientation, and s = (sx, sy) describes two orthogonal scale factors along the x- and y-axes respectively. For the sake of simplicity we shall discuss the specific (but significant) case where s is a scalar, and the parameter space has four dimensions.

For simplicity, consider the sign ''O'' taken from a side angle (shown in Fig. 4), which represents a circle with fixed radius r0; since the radius is fixed, we do not need the scale parameter, and due to the symmetry the orientation parameter is redundant as well, so the accumulator is congruent to the image.

But this is not the case if we consider arbitrary shape recognition, as shown in Fig. 5. Let the tangent line at the point x be directed by angle φ (note that this angle is measured with respect to some fixed direction in the image space); we agree on taking the angle in the range (0, π] and to some precision, due to our computational constraints. Let y be a point chosen arbitrarily as the reference point for the shape—it is convenient, though not necessary, to choose a point close to the centroid of the shape—and call the vector r = y − x the displacement vector.

Note that the shape boundary may contain other points at which the gradient (tangent) has the same direction, e.g., x′ as shown in Fig. 5, and the corresponding displacement vector is different, r′ ≠ r. We shall store the displacement vectors as a function mapping each possible φ to a set of displacement vectors. The resulting data structure is called the ''Theta Radius Distribution Matrix''. Its format is illustrated by the table below, where y stands for the reference point chosen for the shape, B for the shape boundary, and φ(x) denotes the gradient (tangent) direction at x.

   i        φi        Rφi
   1        Δφ        {r | y − r = x, x ∈ B, φ(x) = Δφ}
   2        2Δφ       {r | y − r = x, x ∈ B, φ(x) = 2Δφ}
   ...      ...       ...
   π/Δφ     π         {r | y − r = x, x ∈ B, φ(x) = π}

One may choose to measure directions rather from −π/2 to π/2, or in any other range of the same magnitude; the table rows will just be cyclically shifted.
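A compact MATLAB sketch of how such a table could be filled is shown below; this illustrates the idea rather than the authors' implementation, and the grayscale image G, the edge map E and the quantization step dphi are assumptions carried over from the preprocessing example:

   % Illustrative construction of the Theta Radius Distribution Matrix (an R-table).
   [Gx, Gy] = gradient(double(G));           % image gradients of the grayscale image
   phi = mod(atan2(Gy, Gx), pi);             % gradient (tangent) direction folded into [0, pi)
   [rows, cols] = find(E);                   % boundary points B taken from the edge map
   yref = [mean(cols), mean(rows)];          % reference point: mean of the pattern points
   dphi = pi/180;                            % assumed quantization step (Delta phi)
   nbins = round(pi/dphi);
   Rtable = cell(nbins, 1);                  % row i holds the displacement set for direction phi_i
   for k = 1:numel(rows)
       b = min(nbins, max(1, ceil(phi(rows(k), cols(k)) / dphi)));
       r = yref - [cols(k), rows(k)];        % displacement vector r = y - x
       Rtable{b} = [Rtable{b}; r];           % append r to the set for this direction bin
   end

In a standard generalized Hough formulation, each edge point of an unknown image then votes through such a table for candidate reference points, and peaks in the accumulator indicate likely placements of the shape.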
The reference point inside the image sign can be chosen according to the following heuristic:

   y = (y1, y2) = (1/|B|) (Σx∈B x1, Σx∈B x2)

that is, the reference is taken to be the mean of the pattern points. This will keep r relatively small (not much larger than necessary), consequently reducing the error.

2.3. Classification phase

The classification neural network is shown in Fig. 6; the neural network has (200) instances as its input vector, and 214 output neurons in the output layer.

2.3.1. Network architecture

When working with neural networks it is hard to predict what the result will be. Sometimes practice represents the best solution. Decisions in this field are very difficult; one has to examine different architectures and decide according to their results.

Therefore, after several experiments, it has been decided that the proposed system should be based on supervised learning, in which the learning rule is provided with a set of examples (the training set).

2.3.2. Target architecture

The size of our target matrix is 214 by 200. Fig. 7 depicts a part of the target matrix, where each row specifies the class to which an instance belongs.

2.3.3. Creating the network

Creating a network object is accomplished by training a feed-forward back propagation network, using the MATLAB built-in function (newff). It requires four inputs and returns the network object. The first input is an (R, 2) matrix of minimum and maximum values for each of the R elements of the input vector. The second input is an array containing the sizes of each layer; the third input is a cell array containing the names of the transfer functions used in each layer. The final input contains the name of the training function used. The following command was used to create a three-layer network:

   net1 = newff(input_range, [214*3 214*2 214*1], {'logsig' 'logsig' 'logsig'}, 'traingdx')

There are (214 * 3) neurons in the first hidden layer, (214 * 2) neurons in the second hidden layer, and (214 * 1) neurons in the output layer. For the three-layer network the Log-Sigmoid ('logsig') transfer function was used.

After executing the above command, a network object is created, the weights and biases of the network are initialized, and the network is ready for training.

2.3.4. Training the network

The training process starts with a set of examples of proper network behavior: network inputs and target outputs. During training, the weights and biases of the network are iteratively adjusted to minimize the network performance function (net.performFcn). The default performance function for feed-forward networks is the mean square error (MSE), which is defined as the average squared error between the network outputs and the target outputs.

The training function that was used is traingdx. It works accurately with noisy patterns in training, which increases the network accuracy on unseen samples. That is why we chose it in the final implementation. The number of epochs was 10,000 and the goal was 0.0001.
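Put together, the creation and training steps described above might look like the following sketch; input_range, P and T stand for the actual input-range matrix, training feature vectors and target matrix, whose construction is not shown, and the bounds used for input_range are an assumption:

   % Sketch of creating and training the classifier with the reported settings.
   input_range = [zeros(200, 1) ones(200, 1)];   % assumed min/max bounds for the 200-element input vector
   net1 = newff(input_range, [214*3 214*2 214*1], {'logsig' 'logsig' 'logsig'}, 'traingdx');
   net1.trainParam.epochs = 10000;               % number of epochs reported above
   net1.trainParam.goal = 0.0001;                % training goal reported above
   net1 = train(net1, P, T);                     % P: training inputs (one column per sample), T: 214-row target matrix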
2.3.5. Testing and simulating the network

The system was tested with (100) images (five images for each sign): untrained images, previously unseen, used for the testing phase. The MATLAB built-in function (sim) simulates a network: (sim) takes the network inputs and the network object, then returns the network outputs. More than one network was trained and simulated. For user convenience and simplicity, we have created a GUI that helps in finding out the results. Examples are shown in Figs. 8 and 9 respectively.
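A matching sketch of the testing step is shown below; Ptest and Ttest are placeholders for the unseen feature vectors and their target matrix:

   % Simulate the trained network on unseen samples and count correct classifications.
   Y = sim(net1, Ptest);                         % network outputs, one column per test sample
   [maxY, predicted] = max(Y, [], 1);            % index of the winning output neuron for each sample
   [maxT, expected] = max(Ttest, [], 1);         % index of the target class for each sample
   recognition_rate = 100 * sum(predicted == expected) / numel(expected);

The last line corresponds to the recognition rate defined in Section 3.2.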
3. Experiments results and analysis

In this section, we evaluate the performance of our recognition system by testing its ability to classify signs for both the training and testing sets of data. The effect of the number of inputs to the neural network is considered. In addition, we discuss some problems in the performance of some signs due to the similarities between them.

3.1. Data set

The data set used for training and testing the recognition system consists of grayscale images for all of the 20 signs used in the experiment; these 20 signs are shown in Fig. 10. Fifteen samples for each sign were taken from 15 different volunteers. For each sign, 10 out of the 15 samples were used for training purposes, while the remaining five samples were used for testing. The samples were taken from different distances by digital camera, and with different orientations.
This way, we were able to obtain a data set with cases that have different sizes and orientations, so we can examine the capabilities of our feature extraction scheme.

3.2. Recognition rate

We evaluate the performance of the system based on its ability to correctly classify samples to their corresponding classes. The recognition rate is defined as the ratio of the number of correctly classified samples to the total number of samples, i.e.

   Recognition rate = (number of correctly classified signs / total number of signs) × 100%

Through the experiments on the proposed system, we first trained our system on six samples for each sign and with no threshold for the Canny edge detector. The training chart for this network is shown in Fig. 11. The testing results, for both training and testing data, are shown in Table 4. Although the overall system has a good performance, it cannot be guaranteed, because the performance (recognition rate) on the tested data is very low (51%) and the training data set is too small and does not have sufficiently different orientations.

Table 4
Results of training six samples for each sign and without Canny threshold

Data        No. of samples    Recognized samples    Recognition rate (%)
Training    120               120                   100.0
Testing     100               51                    51.00
Total       220               171                   77.72

In order to obtain a more satisfactory result, we trained the network on eight samples for each sign and with a (0.15) threshold for the Canny edge detector. The training chart for this network is shown in Fig. 12. The testing results for both training and testing data are shown in Table 5.

Fig. 12. Training chart for a network trained on eight samples for each sign and with (0.15) Canny threshold.

Table 5
Results of training eight samples for each sign and with (0.15) Canny threshold

Data        No. of samples    Recognized samples    Recognition rate (%)
Training    160               158                   98.75
Testing     100               80                    80.00
Total       260               238                   91.53

As shown, these results were much better than the previous ones. Another experiment, which was the most satisfactory, trained the network on 10 samples for each sign and with a (0.25) threshold for the Canny edge detector. The training chart for this network is shown in Fig. 13. The testing results for both training and testing data are shown in Table 6, and the detailed results for the last network are shown in Table 7.

Fig. 13. Training chart for a network trained on 10 samples for each sign and with (0.25) Canny threshold.

3.3. Hardware and software

The system was implemented in MATLAB version 6.5. The recognition training and tests were run on a modern
• Looking for possible changes in the environment by designing a new system that works in a real-time environment.

References

Aha, D. W., Dennis, K., & Marc, A. (1991). Instance-based learning algorithms. Machine Learning, 6, 37–66.
Banarse, D. S. (1993). Hand posture recognition with the neocognitron network. Master's thesis, School of Electronic Engineering and Computer Systems, University College of North Wales, Bangor.
Birk, H., Moeslund, T. B., & Madsen, C. B. (1997). Real-time recognition of hand alphabet gestures using principal component analysis. In Proceedings of the 10th Scandinavian conference on image analysis.
Bowden, R., & Sarhadi, M. (2002). A non-linear model of shape and motion for tracking finger spelt American sign language. Image and Vision Computing, 20(9–10), 597–607.
Brand, M., & Irfan, E. (1995). Causal analysis for visual gesture understanding. MIT Media Laboratory Perceptual Computing Section Technical Report No. 327.
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6).
Charayaphan, C., & Marble, A. (1992). Image processing system for interpreting motion in American sign language. Journal of Biomedical Engineering, 14, 419–425.
Charniak, E. (1993). Statistical language learning. Cambridge: MIT Press.
Cutler, R., & Turk, M. (1998). View-based interpretation of real-time optical flow for gesture recognition. In IEEE international conference on automatic face and gesture recognition.
Darrell, T., & Pentland, A. (1993). Recognition of space–time gestures using a distributed representation. MIT Media Laboratory Vision and Modeling Group Technical Report No. 197.
Davis, J., & Shah, M. (1993). Gesture recognition. Technical Report CS-TR-93-11, Department of Computer Science, University of Central Florida.
Dorner, B., & Hagen, E. (1994). Towards an American sign language interface. Artificial Intelligence Review, 8(2–3), 235–253.
Duda, R. O., & Hart, P. E. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15, 11–15.
Fels, S. (1994). Glove-TalkII: mapping hand gestures to speech using neural networks—an approach to building adaptive interfaces. PhD thesis, Computer Science Department, University of Toronto.
Fels, S., & Hinton, G. (1993). GloveTalk: a neural network interface between a DataGlove and a speech synthesizer. IEEE Transactions on Neural Networks, 4, 2–8.
Fukushima, K. (1989). Analysis of the process of visual pattern recognition by the neocognitron. Neural Networks, 2, 413–420.
Gonzalez, R. C., Woods, R. E., & Eddins, S. L. (2004). Digital image processing using MATLAB. Prentice Hall.
Grimes, G. (1983). Digital data entry glove interface device. US Patent 4,414,537, AT&T Bell Labs.
Grobel, K., & Assan, M. (1996). Isolated sign language recognition using hidden Markov models. In Proceedings of the international conference on systems, man and cybernetics, pp. 162–167.
Hand, C., Sexton, I., & Mullan, M. (1994). A linguistic approach to the recognition of hand gestures. In Proceedings of the designing future interaction conference, University of Warwick, UK.
Haykin, S. (1999). Neural networks: a comprehensive foundation. Prentice Hall.
Heap, A. J., & Samaria, F. (1995). Real-time hand tracking and gesture recognition using smart snakes. In Proceedings of interface to real and virtual worlds, Montpellier.
Hong, P. et al. (2000). Gesture modeling and recognition using finite state machines. In IEEE international conference on automatic face and gesture recognition, pp. 410–415.
Hough, P. (1962). Method and means for recognizing complex patterns. US Patent 3,069,654.
Imagawa, K., Matsuo, H., Taniguchi, R., Arita, D., & Igi, S. (2000). Recognition of local features for camera-based sign language recognition system. In International conference on pattern recognition (ICPR), pp. 4849–4853.
International Bibliography of Sign Language (2005). http://www.sign-lang.uni-hamburg.de/bibweb/F-Journals.html.
International Journal of Language & Communication Disorders (2005). Available from http://www.newcastle.edu.au/renwick/ROL/Jnlcontents/000mgmgf.htm.
Kadous, W. (1995). GRASP: recognition of Australian sign language using instrumented gloves. Bachelor's thesis, University of New South Wales.
Kadous, W. (1996). Machine recognition of Auslan signs using PowerGlove: towards large-lexicon recognition of sign language. In Proceedings of the workshop on the integration of gesture in language and speech, Wilmington, DE, pp. 165–174.
Kramer, J., & Leifer, L. (1990). A ''Talking Glove'' for nonverbal deaf individuals. Technical Report CDR TR 1990 0312, Centre for Design Research, Stanford University.
Liang, R. H., & Ouhyoung, M. (1998). A real-time continuous gesture recognition system for sign language. In Proceedings of the third international conference on automatic face and gesture recognition, Nara, Japan, pp. 558–565.
Martin, J., & James, L. C. (1997). An appearance-based approach to gesture recognition. In Proceedings of the ninth international conference on image analysis and processing, pp. 340–347.
Murakami, K., & Taguchi, H. (1991). Gesture recognition using recurrent neural networks. In Proceedings of CHI'91 human factors in computing systems, pp. 237–242.
Nam, Y., & Wohn, K. Y. (1996). Recognition of space–time hand-gestures using hidden Markov model. In Proceedings of the ACM symposium on virtual reality software and technology '96 (pp. 51–58). ACM Press.
National Institute on Deafness and Other Communication Disorders (2005). Available from http://www.nidcd.nih.gov/health/hearing/asl.asp.
Newby, G. (1993). Gesture recognition using statistical similarity. In Proceedings of virtual reality and persons with disabilities.
Ong, S., & Ranganath, S. (2005). Automatic sign language analysis: a survey and the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6).
PDF Reference (2004). 5th ed. Addison-Wesley.
Quek, F. (1994). Toward a vision-based gesture interface. In Proceedings of the ACM symposium on virtual reality software and technology '94 (pp. 17–31). ACM Press.
Rangarajan, K., & Shah, M. (1991). Establishing motion correspondence. CVGIP: Image Understanding, 54, 56–73.
Rubine, D. (1991). Specifying gestures by example. In Proceedings of SIGGRAPH'91 (pp. 329–337). ACM Press.
Rung-Huei, L., & Ouhyoung, M. (1996). A sign language recognition system using hidden Markov model and context sensitive search. In Proceedings of the ACM symposium on virtual reality software and technology '96 (pp. 59–66). ACM Press.
Seymore, K., McCallum, A., & Rosenfeld, R. (1999). Learning hidden Markov model structure for information extraction. In AAAI 99 workshop on machine learning for information extraction.
Starner, T. (1995). Visual recognition of American sign language using hidden Markov models. Master's thesis, Massachusetts Institute of Technology.
Starner, T., & Pentland, A. (1995). Visual recognition of American sign language using hidden Markov models. In International workshop on automatic face and gesture recognition, Zurich, Switzerland, pp. 189–194.
Sturman, D. (1992). Whole-hand input. PhD dissertation, Massachusetts Institute of Technology.
Sturman, D., & Zeltzer, D. (1994). A survey of glove-based input. IEEE Computer Graphics and Applications, 14(1), 30–39.
Symeonidis, K. (2000). Hand gesture recognition using neural networks. Master's thesis, University of Surrey.
Takahashi, T., & Kishino, F. (1991). Hand gesture coding based on experiments using a hand gesture interface device. SIGCHI Bulletin, 23(2), 67–73.
Vamplew, P. (1996). Recognition of sign language using neural networks. PhD thesis, Department of Computer Science, University of Tasmania.
Vogler, C., & Metaxas, D. (1997). Adapting hidden Markov models for ASL recognition by using three-dimensional computer vision methods. In Proceedings of the IEEE international conference on systems, man and cybernetics, Orlando, pp. 156–161.
Watson, R. (1993). A survey of gesture recognition techniques. Technical Report TCD-CS-93-11, Department of Computer Science, Trinity College Dublin.
Wexelblat, A. (1994). A feature-based approach to continuous-gesture analysis. Master's thesis, Massachusetts Institute of Technology.
Wexelblat, A. (1995). An approach to natural gesture in virtual environments. ACM Transactions on Computer–Human Interaction, 2(3), 179–200.
Yang, M., & Ahuja, N. (1999). Recognizing hand gestures using motion trajectories. In IEEE international conference on computer vision and pattern recognition (CVPR), pp. 466–472.
Yoshinori, K., Tomoyuki, I., Kang-Hyun, J., Nobutaka, S., & Yoshiaki, S. (1998). Vision-based human interface system: selectively recognizing intentional hand gestures. In Proceedings of the IASTED international conference on computer graphics and imaging, pp. 219–223.
Zimmerman, T., Lanier, J., Blanchard, C., Bryson, S., & Harvill, Y. (1987). A hand gesture interface device. In Proceedings of CHI + GI'87 human factors in computing systems and graphics interface (pp. 189–192). ACM Press.