It is essential for computer systems to possess the ability to recognize meaningful gestures if computers are to interact naturally with people. Humans use gestures in daily life as a means of communication, e.g., pointing to an object to draw someone's attention to it, waving ``hello'' to a friend, requesting n of something by raising n fingers, etc. The best example of communication through gestures is sign language: American Sign Language (ASL) incorporates the entire English alphabet along with many gestures representing words and phrases. Using Computer Vision, a computer can recognize and carry out the user's gesture command, thus obviating the need for a keyboard. Applications for such a vision system include the remote control of a robotic arm, guiding a computer presentation system, and executing computer operational commands such as opening a window or program.

We have proposed a Computer Vision gesture recognition method [7, 8, 9] which allows users wearing a specially marked glove to command a computer system to carry out predefined gesture action commands. A subset of the gestures comprises selected ASL letters. Each gesture begins with the hand in the ``hello'' position and ends in the recognizable gesture position. The current library contains seven gestures: Left, Right, Up, Down, Rotate, Grab, and Stop. In our method, each image in the sequence is analyzed to find the locations of the fingertips. If the hand is found to be in motion toward a gesture position, motion correspondence [19, 24] is used to track the points to the resulting gesture position. Finally, the trajectories computed by the motion correspondence algorithm are converted to vector form to be matched with the stored gestures.
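The final matching step described above can be illustrated with a minimal sketch. This is not the published implementation; the template directions, function names, and the cosine-similarity scoring rule are illustrative assumptions about how a trajectory in vector form might be compared against stored gestures:

```python
import math

def to_direction_vectors(points):
    """Convert a fingertip trajectory (list of (x, y) image points)
    into unit direction vectors between successive points."""
    vecs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        n = math.hypot(dx, dy)
        if n > 0:
            vecs.append((dx / n, dy / n))
    return vecs

def match_gesture(trajectory, templates):
    """Return the name of the stored template whose direction sequence
    best matches the observed trajectory (mean cosine similarity)."""
    obs = to_direction_vectors(trajectory)
    best_name, best_score = None, -2.0
    for name, tmpl in templates.items():
        m = min(len(obs), len(tmpl))
        if m == 0:
            continue
        score = sum(ox * tx + oy * ty
                    for (ox, oy), (tx, ty) in zip(obs[:m], tmpl[:m])) / m
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical templates for two of the seven gestures, stored as
# sequences of unit directions (image y grows downward).
TEMPLATES = {
    "Right": [(1.0, 0.0)] * 4,
    "Up":    [(0.0, -1.0)] * 4,
}
```

A rightward fingertip path such as `[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]` would then be matched to the ``Right'' template.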
This project deals with the extension of our method to 3-D gesture recognition. Gesture recognition becomes more realistic with a 3-D, rather than a 2-D, approach. We will employ a 3-D hand model based on Generalized Cylinders, and estimate motion parameters in order to compute 3-D hand trajectories. The use of a 3-D model of the hand helps to eliminate the glove, which is currently needed for fingertip detection. Since the tracking of fingers will be performed using a 3-D model, 2-D motion correspondence will also not be needed. Using 3-D information, we know the real-world location of the fingers at any time, and can exploit this knowledge to suit the desired application without having to concern ourselves with the weaker and possibly ambiguous 2-D information. The ambiguity arises because many distinct 3-D trajectories may, after undergoing perspective projection, correspond to the same 2-D trajectory.
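The projection ambiguity mentioned above is easy to demonstrate numerically. In the sketch below (an illustrative example, not part of the proposed system), two different 3-D trajectories, one twice as far from the camera and scaled by the same factor, project to identical image trajectories under the standard pinhole model:

```python
def project(point, f=1.0):
    """Pinhole perspective projection of a 3-D point (X, Y, Z)
    onto the image plane at focal length f."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

# Two distinct 3-D trajectories: the second lies at twice the depth
# and moves twice as fast.
traj_a = [(float(t), 0.0, 1.0) for t in range(5)]
traj_b = [(2.0 * t, 0.0, 2.0) for t in range(5)]

img_a = [project(p) for p in traj_a]
img_b = [project(p) for p in traj_b]
# img_a and img_b are identical, so 2-D observations alone
# cannot distinguish the two 3-D motions.
```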
Our gesture recognition method was developed by a 1993/94 REU participant. This fact should provide good inspiration for a new REU student working on this project. The student working on this project will first read the extended version of our paper, and discuss the paper with the faculty advisor. Next, the student will be introduced to the implementation of this method on the workstation in the X-window environment. The participant will also run some experiments to study the algorithm. In parallel, he or she will read other key papers on gesture recognition [3, 6, 10, 21]. The next step will be to work on the extension of our method to 3-D gesture recognition. The REU participant will be directed through several steps in achieving this goal. Initially it will be assumed that the 3-D model of the hand is given, and that it is conformed to the image of a hand. This can be achieved manually. The model will provide depth, Z, for each node. The participant will be introduced to the instantaneous motion model for recovering 3-D motion of a hand. Since the depth is known, the motion estimation problem becomes linear, and a linear regression program can be used to estimate 3-D motion using this model. Later, the student will be directed to come up with a method for generating the hand model using generalized cylinders.
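The linearity of motion estimation with known depth can be sketched as follows. Under the instantaneous rigid-motion model, a point P moves with velocity omega x P + T, which is linear in the rotation omega and translation T once P (including its depth Z) is known. The code below is a simplified illustration assuming matched 3-D points are available from the fitted hand model; it is not the specific regression program the participant would use:

```python
import numpy as np

def estimate_motion(P, P_next):
    """Estimate instantaneous rigid motion (omega, T) from matched
    3-D points using the linear model  P' - P ~= omega x P + T.
    P, P_next: (N, 3) arrays of model points before/after the motion.
    Returns (omega, T), each a length-3 array."""
    N = P.shape[0]
    A = np.zeros((3 * N, 6))
    b = (P_next - P).reshape(-1)
    for i, (X, Y, Z) in enumerate(P):
        # omega x P written as a matrix acting on omega = (wx, wy, wz)
        A[3 * i:3 * i + 3, :3] = [[0.0, Z, -Y],
                                  [-Z, 0.0, X],
                                  [Y, -X, 0.0]]
        # translation enters each equation directly
        A[3 * i:3 * i + 3, 3:] = np.eye(3)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[:3], params[3:]
```

With four or more non-degenerate points, the six motion parameters are recovered by ordinary least squares, which is exactly why known depth reduces the problem to linear regression.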
Additional possible project areas for students at UCF include shape from shading [5, 28, 34, 35, 36] and motion analysis [14, 13, 17, 24, 25, 26, 27, 30].