Recognition of Bangla Sign Language Alphabets by Image Processing

This paper presents an image processing technique for mapping Bangla Sign Language alphabets to text. It processes static images of the signed gesture and matches them against a statistical database of pre-processed images to recognize the signed letter. Hand gesture recognition is a challenging problem in its general form. We consider a fixed set of manual signs and a reasonably structured environment, and develop a simple yet effective procedure for gesture recognition. Our approach converts the RGB image to a binary image, removes noise from it, segments the hand region, computes its area, perimeter, and edges, and then extracts features from the preprocessed image. We create a database from these features and classify gestures against it. We also use Exclusive-OR template matching and PSNR (peak signal-to-noise ratio) comparison to detect the signs of Bangla Sign Language. Finally, we combine the results of these three methods to identify the gesture and convert it to text. We demonstrate the effectiveness of the technique on real imagery.

Index Terms—Bangla Sign Language, Communication, Image processing, Finger-spelling, Linguistics, Sign language.

A sign language [1] is a language that uses patterns of signs to express the thoughts of a signer. Sign language is commonly used by people who cannot speak or hear.
As a nation we have the historical background of the Language Movement, which reminds us that everyone has the right to communicate in their own language. Since the beginning, sign language has been promoted side by side with oral language as a medium of interaction and exchange of ideas. Most widely spoken languages around the world have an established sign language; however, Bangladesh did not have any standardized Bangla Sign Language. Bangla Sign Language is a modified form of British, American, and Australian sign languages, and some local indigenous signs are also used in it. According to the Centre for Disability in Development (CDD) [2], there are as many as 80 million people with disabilities in Bangladesh. It is therefore very important to disseminate sign language throughout the community.
Here we follow the Bangla Sign Language letters and numeric signs developed by CDD. Recognizing the hand gestures of Bangla Sign Language using computer vision [3, 4] is a new idea. Such work has previously been done for American Sign Language (ASL) and British Sign Language (BSL), but for Bangla Sign Language it is in its beginning phase. Two approaches are commonly used to recognize gestures: the glove-based approach [5] and the vision-based approach. In the glove-based approach, gloves, sensors, etc. are used as measuring devices to analyze hand movements. However, glove-based systems suffer from the limitation of using a device that is intrusive for both the signer and the audience, and they are also very expensive.
Vision-based gesture recognition systems can be divided into three main components: image processing, or extracting important cues (hand shape and position, face or head position, etc.) [6, 7, 8]; tracking the gesture features (relative position or motion of hand poses); and gesture interpretation (based on the collected information that supports a predefined meaningful gesture). The first phase of the gesture recognition task is to select a model of the gesture. The modeling of a gesture depends on the application intended for it. There are two different approaches to vision-based modeling of gestures: the model-based approach and the appearance-based approach. Model-based techniques try to create a 3D model of the user's hand (parameters: joint angles and palm position), or a contour model of the hand, and use these for gesture recognition.
Appearance-based approaches use template images or features extracted from the training images (images, image geometry parameters, image motion parameters, fingertip positions, etc.) [10] for gesture recognition. Gestures are modeled by relating the appearance of any gesture to the appearance of a set of predefined template gestures. Gesture recognition methods are divided into two categories: static gestures, or hand postures, and dynamic gestures, or motion gestures. Dynamic gestures are temporally consecutive sequences of hand, head, or body postures over a sequence of time frames. Dynamic gesture recognition is accomplished using Hidden Markov Models (HMMs), Dynamic Time Warping, Bayesian networks, or other pattern recognition methods that can recognize sequences over time.
Static gesture (or pose gesture) recognition can be accomplished using template matching, eigenspaces or PCA (Principal Component Analysis), Elastic Graph Matching, neural networks [11, 12], or other standard pattern recognition techniques. Template matching techniques are essentially pattern matching approaches: the most likely hand posture is found by computing a correlation coefficient or a minimum-distance metric between the input image and the template images.
In this paper we recognize static gestures of Bangla Sign Language using pattern matching approaches. When tested on real imagery, satisfactory results were obtained for signs of numerals, vowels, and consonants.
The goal of this research work is to interpret the signs of Bangla Sign Language. We use image processing as a tool [13] to interpret these signs. Hand gesture recognition by image processing is a challenging task, because work of this kind is still maturing and, as far as we know, is in its starting phase for Bangla Sign Language. Bangla Sign Language has two-hand-dominant signs for letters and one-hand-dominant signs for numbers. We focus on both, and develop a system that can recognize the numeric, vowel, and consonant signs of Bangla Sign Language using image processing. The block diagram of the proposed system is shown in Fig. 1, and the steps involved in this work are outlined below:

1. Convert the RGB image to a binary image based on threshold values (binary conversion).
2. Remove noise from the image (morphological filtering).
3. Segment the hand gesture from the image.
4. Extract features of the segmented region and store the data in a database (blob analysis).
5. Train the system for each sign and create a statistical database.
6. Recognize the correct sign in the input image by calculating cumulative errors against the data stored in the database.

Fig. 1: System block diagram


The following image processing procedures have been implemented in this work.


We capture the images in a constrained environment. Pixel-based skin color segmentation [14, 15, 16] is very sensitive to environmental effects such as noise and illumination. We therefore use a black cloth as the background, and our signer wears white gloves on both hands. We capture the images from a constant distance and try to maintain constant lighting.

After analyzing many images we decided on a threshold limit. Using this threshold we convert the RGB image to a binary image, as seen in Fig. 2. We read the image pixel by pixel; whenever a pixel's RGB values fall within our threshold limits, we store 1 at that pixel position in an array, and 0 otherwise. In this way we obtain a binary image with 1 for the desired region and 0 for the background.

Fig. 2: RGB to Binary Conversion (left: RGB image; right: binary image)
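The RGB-to-binary conversion described above can be sketched as follows. NumPy is assumed here in place of the original implementation, and the threshold limits shown are illustrative placeholders for a white-glove range; the paper's actual limits were chosen empirically.

```python
import numpy as np

def rgb_to_binary(img, lo=(200, 200, 200), hi=(255, 255, 255)):
    """Map an H x W x 3 RGB array to a binary image: 1 where every
    channel falls within the per-channel threshold limits (the glove
    region), 0 elsewhere (the background)."""
    lo = np.asarray(lo)
    hi = np.asarray(hi)
    mask = np.all((img >= lo) & (img <= hi), axis=-1)
    return mask.astype(np.uint8)
```

Scanning the whole array at once with `np.all` over the channel axis is equivalent to the pixel-by-pixel loop described in the text.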

We use a flood fill algorithm to fill the internal holes of the desired hand posture region. It is also important to remove noise from the image; here noise means unwanted white pixels outside the desired region. To remove noise we apply a depth-first search over the white pixels and discard everything but the desired connected region. Then we segment [17] the hand region from the image and resize it to 180 by 50 pixels.
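A minimal sketch of these three steps (hole filling, noise removal, and crop-and-resize) is given below. Iterative traversals are used in place of recursive DFS to avoid recursion limits, 4-connectivity is assumed, and the nearest-neighbour resize is an assumption since the paper does not name its resizing method.

```python
import numpy as np
from collections import deque

def fill_holes(mask):
    """Flood-fill the background from the image border; any background
    pixel not reached is an internal hole and becomes foreground."""
    h, w = mask.shape
    outside = np.zeros((h, w), bool)
    q = deque((y, x) for y in range(h) for x in range(w)
              if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x])
    for y, x in q:
        outside[y, x] = True
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                q.append((ny, nx))
    return np.where(outside, 0, 1).astype(np.uint8)

def largest_component(mask):
    """Noise removal: keep only the largest 4-connected white region,
    discarding stray white pixels outside the hand."""
    h, w = mask.shape
    labels = np.zeros((h, w), int)
    sizes, cur = {}, 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not labels[sy, sx]:
                cur += 1
                labels[sy, sx] = cur
                stack, n = [(sy, sx)], 0
                while stack:
                    y, x = stack.pop()
                    n += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = cur
                            stack.append((ny, nx))
                sizes[cur] = n
    if not sizes:
        return mask
    best = max(sizes, key=sizes.get)
    return (labels == best).astype(np.uint8)

def crop_resize(mask, out_h=180, out_w=50):
    """Crop to the bounding box of the hand region and nearest-neighbour
    resize to the paper's fixed 180 x 50 size."""
    ys, xs = np.nonzero(mask)
    sub = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    ry = (np.arange(out_h) * sub.shape[0]) // out_h
    rx = (np.arange(out_w) * sub.shape[1]) // out_w
    return sub[ry][:, rx]
```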


We calculate the area and the perimeter of the upper-right, upper-left, lower-left, and lower-right quadrants of the segmented region, as shown in Fig. 3. Summing these gives the total area and total perimeter of the image. For each sign we store in our database the average upper-right, upper-left, lower-left, and lower-right areas, and the total area, of the segmented regions of four training images. Perimeter information is calculated in the same way, as given in Fig. 4.

A_Total = A_NW + A_NE + A_SW + A_SE
A_Up = A_NW + A_NE
A_Down = A_SW + A_SE
A_Left = A_NW + A_SW
A_Right = A_NE + A_SE
P_Total = P_NW + P_NE + P_SW + P_SE
P_Up = P_NW + P_NE
P_Down = P_SW + P_SE
P_Left = P_NW + P_SW
P_Right = P_NE + P_SE
Fig. 3: Area and Perimeter of the segmented region
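The quadrant features of Fig. 3 can be computed as below. The left/right sums follow the compass naming (NW and SW form the left half), and the boundary-pixel perimeter definition is an assumption, since the paper does not specify how the perimeter is measured.

```python
import numpy as np

def quadrant_features(mask):
    """Split the 180 x 50 binary image into NW/NE/SW/SE quadrants and
    return the white-pixel area of each, plus the aggregate sums
    (A_Total, A_Up, A_Down, A_Left, A_Right) stored in the database."""
    h, w = mask.shape
    cy, cx = h // 2, w // 2
    a_nw = int(mask[:cy, :cx].sum())
    a_ne = int(mask[:cy, cx:].sum())
    a_sw = int(mask[cy:, :cx].sum())
    a_se = int(mask[cy:, cx:].sum())
    return {
        "A_NW": a_nw, "A_NE": a_ne, "A_SW": a_sw, "A_SE": a_se,
        "A_Total": a_nw + a_ne + a_sw + a_se,
        "A_Up": a_nw + a_ne, "A_Down": a_sw + a_se,
        "A_Left": a_nw + a_sw, "A_Right": a_ne + a_se,
    }

def perimeter(mask):
    """Count foreground pixels that touch the background or the image
    border -- a simple boundary-pixel perimeter measure."""
    m = mask.astype(bool)
    padded = np.pad(m, 1)
    inner = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
             padded[1:-1, :-2] & padded[1:-1, 2:])
    return int((m & ~inner).sum())
```

The perimeter sums per quadrant (P_NW, P_NE, P_SW, P_SE) follow the same pattern as the areas, applied to the quadrant sub-arrays.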


We follow three methods to detect the signs: area comparison, Exclusive-OR template matching, and PSNR (Peak Signal-to-Noise Ratio) comparison.


First we create an area and perimeter database [18] for each sign, as mentioned above. For an input image we likewise calculate its area and perimeter and compute the cumulative error against the values stored in the database. The set of images in the database that corresponds to a given letter and yields the lowest cumulative error indicates the most likely letter. In Table 1 we show the cumulative error results for five images of sign ?. Each time, we calculate the area features of the input image and then compute the cumulative error against the average area values of each sign stored in our database. Our system returns the sign id that gives the minimum cumulative error. Here, for five input images of sign ?, the system each time returns the minimum cumulative error for sign ?.

Fig. 4: Area and Perimeter Database
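The area-comparison classifier can be sketched as follows. The paper does not spell out the cumulative-error formula, so the sum of absolute differences between the input's feature vector and each sign's stored average vector is assumed here.

```python
import numpy as np

def classify_by_area(features, database):
    """Return the sign id whose stored average feature vector gives the
    smallest cumulative error (assumed: sum of absolute differences)
    against the input image's features, plus the per-sign errors.
    `database` maps sign id -> stored average feature vector."""
    errors = {sign: float(np.abs(np.asarray(stored) - np.asarray(features)).sum())
              for sign, stored in database.items()}
    return min(errors, key=errors.get), errors
```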

Table 1: Cumulative error for five input images of sign ?

No: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
1 137.2 340.3 393.1 534.9 518.5 512.8 488.3 341.4 416.0 399.3 360.1 429.9 389.0 330.2 383.3
2 136.0 339.6 386.3 529.8 516.0 515.4 484.7 335.9 412.9 395.8 357.4 429.9 387.4 327.6 384.2
3 138.1 340.4 382.7 529.3 515.7 516.7 483.6 335.4 410.5 393.0 353.3 431.8 385.4 323.9 386.4
4 152.3 334.8 400.1 530.6 517.8 493.5 484.9 357.4 431.7 408.0 361.6 421.4 380.8 338.3 376.6
5 149.8 328.8 388.7 527.3 521.9 499.69 481.8 349.3 424.2 400.6 353.6 426.4 383.3 329.6 380.5

It is seen that for five images of sign ?, sign ? has the minimum cumulative error each time.


We create an Exclusive-OR template image, as shown in Fig. 5, for each sign based on the binary images stored in our database, and compare it with the input images. For each input image we calculate the error against the Exclusive-OR template images stored in our database and detect the desired sign based on the minimum error. Table 2 shows that for five input images of sign ?, our system returns the minimum error for sign ? each time.

Table 2: Exclusive-OR matching error for five input images of sign ?

No: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
1 477 5778 8097 9128 7006 8495 8575 8562 9058 9166 8522 8555 7773 7313 5798
2 230 5773 8104 9119 6989 8630 8612 8671 9033 9229 8479 8460 7592 7450 5807
3 407 5802 8173 9112 6942 8729 8543 8676 8960 9242 8446 8335 7459 7505 5802
4 1271 5808 7949 9078 7060 8661 8449 8570 9108 9142 8314 8327 7415 7415 5902
5 1555 5712 8043 9096 7034 8669 8377 8452 9126 9120 8268 8411 7581 7291 5940

Fig. 5: Exclusive OR template image for sign 4
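A sketch of the Exclusive-OR matching step is shown below. The paper does not describe exactly how each sign's template is built from its stored binary images, so a pixelwise majority over the training images is assumed in `make_template`.

```python
import numpy as np

def make_template(train_images):
    """One plausible template construction (an assumption): a pixel is
    set if it is white in at least half of the training images."""
    return (np.mean(train_images, axis=0) >= 0.5).astype(np.uint8)

def xor_error(image, template):
    """Number of pixels where the binary input and the template
    disagree -- the Exclusive-OR error used for matching."""
    return int(np.logical_xor(image, template).sum())

def classify_by_xor(image, templates):
    """Pick the sign whose template has the fewest mismatching pixels
    with the preprocessed 180 x 50 binary input image."""
    return min(templates, key=lambda s: xor_error(image, templates[s]))
```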

The preprocessed input image is compared with the Exclusive-OR template image of each sign stored in our system using the Mean Square Error (MSE) and the Peak Signal-to-Noise Ratio (PSNR) [19, 20]. For two M x N images I and K, the Mean Square Error is given by

MSE = (1 / (M*N)) * SUM_{i=0..M-1} SUM_{j=0..N-1} [I(i, j) - K(i, j)]^2

The Peak Signal-to-Noise Ratio is given by

PSNR = 10 * log10(MAX_I^2 / MSE)

where MAX_I is the maximum possible pixel value of the image (1 for binary images).
Our system detects the sign based on the maximum PSNR between each sign's Exclusive-OR template image and the preprocessed input image.
Table 3 shows the PSNR comparison results for five images of sign ?; each time the PSNR for sign ? is maximum.
Table 3: PSNR comparison for five input images of sign ?

No: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
1 15.76 4.93 3.46 2.94 4.09 3.26 3.22 3.26 2.98 2.93 3.24 3.23 3.64 3.91 4.91
2 16.45 4.91 3.42 2.95 4.13 3.14 3.23 3.21 3.02 2.89 3.28 3.34 3.82 3.79 4.91
3 18.93 4.93 3.46 2.95 4.10 3.19 3.20 3.21 2.99 2.90 3.26 3.27 3.74 3.83 4.91
4 11.51 4.91 3.59 2.97 4.06 3.17 3.28 3.21 2.95 2.94 3.35 3.34 3.85 3.86 4.84
5 10.63 4.98 3.49 2.96 4.08 3.17 3.32 3.25 2.94 2.95 3.37 3.30 3.75 3.92 4.81

We detect the sign based on the combined result of these three detection methods, which makes our system more accurate. The graphical interface of the proposed system is depicted in Fig. 6.

Fig. 6: Graphical Interface of our Software
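The paper does not give the exact rule for combining the three detectors, so one plausible fusion, a simple majority vote with a fixed tie-break, is sketched here as an assumption.

```python
from collections import Counter

def combine(area_pick, xor_pick, psnr_pick):
    """Majority vote over the three detectors' picks; when all three
    disagree, fall back to the XOR template result (an arbitrary but
    deterministic tie-break)."""
    votes = Counter([area_pick, xor_pick, psnr_pick])
    sign, n = votes.most_common(1)[0]
    return sign if n >= 2 else xor_pick
```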

Here we work with 10 numeric signs and 5 vowel and 5 consonant signs of Bangla Sign Language.
1. We use 80 images (4 images for every sign) to train our system, covering the one-hand-dominant numeric signs and the two-hand-dominant vowel and consonant signs.
2. We test our system with 255 images, and almost every sign is detected correctly, except for 1 image (shown in Fig. 7).

Fig. 7: Sign for 0
3. There are some cases where the detection rate is not satisfactory. Signs of this type are shown in Fig. 8.

Fig. 8: Images that have almost same shape


The primary focus of this study was to examine image processing as a tool for converting signs of Bangla Sign Language to text. The approach shows promise for real-time applications that recognize all the letters of Bangla Sign Language. It could be further developed into a system integrated into upcoming telecommunication devices with cameras to bridge the communication gap between the hearing and the deaf/hard-of-hearing communities. The system can also be enhanced in terms of data processing speed and data storage by using compression and feature extraction techniques.
[1] Klima, E. & Bellugi, U. (1979) The Signs of Language, Harvard University Press: Cambridge, MA.
[2] Centre for Disability in Development.
[3] L. Bretzner, I. Laptev, T. Lindeberg, S. Lenman, Y. Sundblad, A Prototype System for Computer vision based Human Computer Interaction, Technical report, Stockholm, Sweden, 2001.
[4] C. M. Glenn, M. Eastman, and G. Paliwal, “A new digital image compression algorithm base on nonlinear dynamical system,” IADAT International Conference on Multimedia, Image Processing and Computer Vision, Conference Proceedings, March 2005.
[5] Mohammed Waleed Kadous, GRASP: Recognition of Australian Sign Language Using Instrumented Gloves, Honours Thesis, 1995.
[6] Gonzalez, R. C., Woods, R. E. Digital Image Processing, 2nd Edition.
[7] T. Agrawal, S. Chaudhuri, “Gesture Recognition Using Position and Appearance Features,” International Conference on Image Processing, pp. 109-112, 2003.
[8] J. Davis, M. Shah, "Visual Gesture Recognition," Vision, Image and Signal Processing, IEE Proceedings, Volume 141, Issue 2, pp. 101-106, Apr 1994.
[9] R. Cutler, M. Turk. “View based Interpretation of Real­time Optical Flow for Gesture Recognition,” 3rd IEEE Conf. on Face and Gesture Recognition, Nara, Japan, April 1998.
[10] Ming-Hsuan Yang, N. Ahuja, M. Tabb, "Extraction of 2D motion trajectories and its application to hand gesture recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 24, Issue 8, pp. 1061-1074, Aug 2002.
[11] D. Yarowsky, “Gesture recognition using recurrent neural networks,” Journal of the ACM, pp. 237–242, January 1991.
[12] Becky Sue Parton, “Sign Language Recognition and Translation: A Multidiscipline Approach From the Field of Artificial Intelligence”, Journal of Deaf Studies and Deaf Education Advance Access published September 28, 2005.
[13] matlabcentral
[14] D. Chai and A. Bouzerdoum, "A Bayesian Approach to Skin Color Classification in YCbCr Color Space," in Proc. IEEE Region 10 Conference (TENCON 2000), pages 421-424, 2000.
[15] Mayank Bomb, IT-BHU, “Color Based Image Segmentation using Mahalnobis Distance in the YCbCr Color Space for Gesture Recognition”, IEEE India Council ,MV Chauhan Student Paper Contest 2002.
[16] D. Chai and A. Bouzerdoum, "A Bayesian Approach to Skin Color Classification in YCbCr Color Space," in Proc. IEEE Region 10 Conference (TENCON 2000), pages 421-424, 2000.
[17] Mark Tabb and Narendra Ahuja, “Multiscale Image Segmentation by Integrated Edge and Region Detection”, IEEE Transactions on Image Processing, vol. 6, no. 5, May 1997.
[18] Furst, J., Database Design for American Sign Language, Proceedings of the ISCA 15th International Conference on Computers and Their Applications, pp. 427-430, 2000.
[19] Divya Mandloi, “Implementation of Image Processing Approach to Translation of ASL Finger-Spelling to Digital Text,” Rochester Institute of Technology: The Laboratory for Advanced Communications Technology, 2006.
[20] Kanthi Sarella and Chance M. Glenn, “Formulation of an Image Processing Technique for Improving Sign2 Performance,” International Telecommunications Education and Research Association (ITERA) Fourth Annual Conference on Telecommunications & Information Technology, Las Vegas, March 19-20, 2006.