Semi-supervised haartraining of a fast & frugal open source zygomatic smile detector

A gift to the OpenCV community

Daniel Devatman Hromada, prof. Charles Tijus
Lutin Userlab, Ecole Pratique des Hautes Etudes
Cognition Humaine et Artificielle (ChART), Université Paris 8

Abstract—Five different versions of OpenCV-compatible XML haarcascades of zygomatic smile detectors, as well as the five SMILE samples (SMILEs) from which these detectors were derived, have been trained and are hereby presented as a new open source package. The samples were extended in an incremental-learning fashion, exploiting a previously trained detector in order to add and label new elements of the positive example set. After coupling with an already known face detector, overall AUC performance ranges between 77% and 90.5% when tested on the JAFFE dataset, and a speed of <1 ms per frame is achieved when tested on webcam videos.

Keywords—zygomatic smile detector; cascade of haar feature classifiers; computer vision; semi-supervised machine learning

I. INTRODUCTION

A great amount of work is being done in the domain of facial expression (FE) recognition. Of particular interest is the FE at the very base of mother-baby interaction [1], an FE interpreted unequivocally in all human cultures [2]: the smile. Maybe for these reasons, maybe for others, smile detection is already of certain interest to the computer vision (CV) community, be it for a camera's smile shutter [3] or for the study of robot-children interaction [4]. Nonetheless, a publicly available, i.e. open source, smile detector is still missing. This is somewhat stunning, especially given the fact that a smile can be conceived as a "blocky" object [5] to which a machine learning technique based on the training of cascades of boosted haar-feature classifiers [6] can be applied, and that the tools for performing such a training are already publicly available as part of the OpenCV [5] project. Indeed, with the exception of the detectors described in [7][8], which have not been publicly released, we did not find any reference to a haarcascade-based smile detector in the literature.

We aim to address this issue by making publicly available the initial results of our attempts to construct a sufficiently descriptive SMILing Multisource Incremental-Learning Extensible Sample (SMILEs) and the five smile detectors (smileD) generated from this sample. From a more general perspective, our aim was to study whether one can use already generated classifiers to facilitate such a semi-supervised extension of an initial sample that a more accurate classifier can subsequently be trained.

A. SMILE sample (SMILEs)

The aim of the SMILEs project is to facilitate and accelerate the construction of smile detectors for anyone willing to build one. Since it is the OpenCV library which dominates the computer vision community, the SMILEs package is adapted to the needs of OpenCV in the sense that it contains:

1) a directory of negative examples;
2) a directory of positive examples;
3) negatives.idx, a list of the files in the negative examples directory;
4) positives.idx, a list of the files in the positive examples directory together with the coordinates of the associated region of interest (ROI), i.e. the region within which a smile is located (an illustration of both index files follows below).
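For illustration, both index files follow the plain-text description format consumed by OpenCV's sample creation tools; the file names below are hypothetical. Each line of positives.idx names an image, the number of ROIs it contains, and the x, y, width and height of each ROI, while negatives.idx simply lists one background image per line:

    positives/lfw_0001.png 1 10 58 43 19
    positives/flickr_0042.png 1 22 61 43 19

    negatives/lfw_neutral_0001.png
    negatives/lfw_neutral_0002.png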
SMILEs is considered "Multisource" because it originates as an amalgam of already existing datasets like LFW and Genki, both of which are themselves collections of images downloaded from the Internet. Images from the POFA [9] or Cohn-Kanade [10] datasets were not included in SMILEs, since the restricted access to these datasets is in contradiction with the open source approach of the SMILEs project. (Both SMILEs and smileD cascades are publicly available from http://github.com/hromi/SMILEsmileD as a GPL-licensed package. The C++ source code of a select & crop application for easy manual sample creation and of a face-coupled video stream smile detector is included as well.)

B. Smile Detector (smileD)

SMILEs is "Incremental-Learning Extensible" in the sense that it allows us to train new versions of smile detectors which are subsequently applied to new image datasets in order to facilitate (or even fully automate) the labeling of new images, hence extending the original SMILEs with new images. Simply stated, SMILEs allows us to train smileD, which helps us to extend SMILEs, and so on. Since the training of haar cascades is an exhaustive threshold-finding process demanding a non-negligible amount of time and computational resources, five pregenerated OpenCV-compatible XML smileD haarcascades were trained with the opencv-haartraining application and are included with SMILEs in our open source SMILEsmileD package, so that anybody interested can deploy our smile detector in a copy & use fashion.

II. METHOD

C. Initial Training Datasets

The SMILEs project in its current state unites three image sets:

- Labeled Faces in the Wild (LFW): the LFW dataset [11] contains more than 13000 images of faces collected from the web; its cropped version contains only the 25x25 pixel regions detected by OpenCV's frontal face detector. No information about the presence or absence of a smile within the image is given.
- Genki4K: Genki4K is a publicly available part of UCSD's Genki project [12], containing 4000 images downloaded from the Internet. A text file indicating the presence or absence of a smile in each image is included.
- Ad hoc Flickr dataset: we used the search keyword "smile" to download more than 4200 additional pictures from the image-sharing website flickr.com. More than 2600 of them contained at least one smiling face.

D. Construction of SMILEs datasets

We have created five different versions of SMILEs. All these versions exploit the same negative example set of LFW's non-smiling images. All manual labeling focused solely on the zygomatic smile (ZS) region, which was defined only vaguely: a rectangular ROI in whose center are smiling lips, preferably with uncovered teeth, the whole ROI being bordered by the smile and nasolabial wrinkles.

- Version 0.1 is based solely upon the LFW dataset. All pictures were manually labeled with our ad hoc region selection & cropping application and divided into samples of positive (3606 images) and negative (9474 images) examples.
- Version 0.2 added 2666 manually labeled images downloaded from flickr.com to the positive examples already contained in 0.1. Labeling and region selection were realised with the same application as in the case of 0.1.
- Version 0.3 also extended the positive example sample of version 0.1 with images from flickr. This time, however, the flickr-originated images weren't labeled manually; instead, the smile-containing regions of interest were determined automatically, by applying smileD version 0.1 to the set of downloaded images (a sketch of this step is given after this list). 1372 ROIs (one ROI per image) were identified and labeled in this way.
- Version 0.4 is analogous to version 0.3 in the sense that it is essentially a version 0.1 sample to which automatically labeled positive examples were added. Differently from version 0.3, Genki4K and not flickr was exploited as the source of additional data. Simply stated, the positive examples from Genki4K which were labeled as smile-containing by its authors, 624 of them in total, were added to the initial LFW-based sample.
- Version 0.5 unites versions 0.3 and 0.4, i.e. both the Genki4K- and the flickr-originated images which were automatically labeled by smileD v0.1 were added to the LFW samples.
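The automatic ROI labeling used for versions 0.3 to 0.5 can be sketched with OpenCV's classic C API roughly as follows; this is a minimal sketch, the file names are hypothetical, and the actual labeling code ships with the SMILEsmileD package:

    #include <cstdio>
    #include <opencv/cv.h>
    #include <opencv/highgui.h>

    int main(int argc, char** argv)
    {
        // Load the already trained smileD v0.1 cascade which will label new images.
        CvHaarClassifierCascade* smileD =
            (CvHaarClassifierCascade*)cvLoad("smileD_0.1.xml", 0, 0, 0);
        CvMemStorage* storage = cvCreateMemStorage(0);
        FILE* idx = fopen("positives.idx", "a");  // append newly labeled positives

        for (int i = 1; i < argc; i++) {          // images to label, given as arguments
            IplImage* img = cvLoadImage(argv[i], CV_LOAD_IMAGE_GRAYSCALE);
            if (!img) continue;
            cvClearMemStorage(storage);
            // Run the old detector; each hit is a candidate smile ROI.
            CvSeq* hits = cvHaarDetectObjects(img, smileD, storage,
                                              1.1, 3, 0, cvSize(43, 19));
            if (hits && hits->total > 0) {
                // Keep one ROI per image, as in SMILEs v0.3.
                CvRect r = ((CvAvgComp*)cvGetSeqElem(hits, 0))->rect;
                fprintf(idx, "%s 1 %d %d %d %d\n",
                        argv[i], r.x, r.y, r.width, r.height);
            }
            cvReleaseImage(&img);
        }
        fclose(idx);
        return 0;
    }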
E. SMILEs -> smileD Training

Identical haarcascade training parameters [width=43, height=19, number of stages=16, stage hit rate=0.995, stage false alarm rate=0.5, weak classifier decision tree depth=1 (i.e. stump), weight trimming rate=0.95] were applied for the training of all five smileD versions, one smileD corresponding to one SMILEs, both referenced by the same version number.

F. smileD evaluation

The training phase of every new version of smileD was followed by measuring its performance on the Japanese Female Facial Expression (JAFFE) dataset, in order to evaluate the different versions of the smileD classifier when applied to a sample having different luminosity conditions than any image set included in the training sample. The detectors were face-detector-coupled during testing, i.e. smile detection was performed iff a face was detected in the tested image, and only within the ROI defined by well-known geometric ratios [13]. Receiver operating characteristic (ROC) curves were plotted and AUC ("area under ROC curve") values were calculated as performance measures by means of the ROCR library [14]. "Smile intensity" [7], i.e. the number of overlapping neighboring hit regions (obtainable from the undocumented neighbors attribute of the CvAvgComp sequence returned by cvHaarDetectObjects), was used as the cutoff parameter.
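For reproducibility, the training step of II.E maps onto OpenCV's standard command-line tools roughly as follows; this is a sketch only, the sample counts shown are those of version 0.1, and the file names are assumptions:

    opencv-createsamples -info positives.idx -vec positives.vec \
        -num 3606 -w 43 -h 19
    opencv-haartraining -data smileD -vec positives.vec -bg negatives.idx \
        -npos 3606 -nneg 9474 -nstages 16 -minhitrate 0.995 \
        -maxfalsealarm 0.5 -nsplits 1 -weighttrimming 0.95 -w 43 -h 19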
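The face-coupled evaluation logic of II.F can likewise be sketched with the classic C API; the detection parameters and the ratio defining the mouth region are illustrative assumptions rather than the exact values used, the released package containing the actual face-coupled video stream detector:

    #include <opencv/cv.h>
    #include <opencv/highgui.h>

    // Returns the "smile intensity" (neighbor count of the strongest smile hit)
    // found in the bottom third of the first detected face, or 0 if nothing found.
    int smileIntensity(IplImage* gray,
                       CvHaarClassifierCascade* faceD,
                       CvHaarClassifierCascade* smileD,
                       CvMemStorage* storage)
    {
        cvClearMemStorage(storage);
        CvSeq* faces = cvHaarDetectObjects(gray, faceD, storage,
                                           1.2, 3, CV_HAAR_FIND_BIGGEST_OBJECT,
                                           cvSize(40, 40));
        if (!faces || faces->total == 0) return 0;  // no face => no smile, by definition

        // Search only within the bottom third of the face rectangle.
        CvRect f = ((CvAvgComp*)cvGetSeqElem(faces, 0))->rect;
        cvSetImageROI(gray, cvRect(f.x, f.y + 2 * f.height / 3,
                                   f.width, f.height / 3));
        CvSeq* smiles = cvHaarDetectObjects(gray, smileD, storage,
                                            1.1, 1, 0, cvSize(43, 19));
        cvResetImageROI(gray);

        // The undocumented neighbors attribute serves as the ROC cutoff parameter.
        int intensity = 0;
        for (int i = 0; smiles && i < smiles->total; i++) {
            CvAvgComp* hit = (CvAvgComp*)cvGetSeqElem(smiles, i);
            if (hit->neighbors > intensity) intensity = hit->neighbors;
        }
        return intensity;
    }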
III. RESULTS

FIGURE I. SMILED ROC CURVES (figure not reproduced here)

TABLE I. BASIC COMPONENTS OF INITIAL VERSIONS OF THE SMILES & SMILED PROJECT

    Version   LFW manual   Flickr manual   Flickr auto   Genki auto   Total pos.   Neg. ex.
    0.1          3606            0               0             0          3606        9474
    0.2          3606          2666              0             0          6262        9474
    0.3          3606            0             1372            0          4978        9474
    0.4          3606            0               0            624         4230        9474
    0.5          3606            0             1372           624         6572        9474

TABLE II. ROC "AREA UNDER CURVE" PERFORMANCE OF DIFFERENT VERSIONS OF THE SMILED DETECTOR

    Version   AUC
    0.1       77.94%
    0.2       85.49%
    0.3       83.93%
    0.4       90.21%
    0.5       90.51%

IV. DISCUSSION

The detectors we present hereby exploit a top-down approach, i.e. they are face-coupled. Knowing that there can be no smile without a face within which it is nested, we first detect the face with an OpenCV face detection solution, and smileD is then applied only within the very limited ROI of the face's bottom third. The consequences of our decision to create a face-coupled smile detector are twofold: 1) since by definition we search for a smile only within the face, we used only non-smiling faces as negative examples (i.e. background images); 2) smile detection itself is very fast once the position of the face is specified. When applied to webcam-originated video streams (320x240 resolution), the time needed for in-face smile detection never exceeded 1 ms per frame on a Mobile Intel(R) Pentium(R) 4 CPU (1.8 GHz), suggesting that smileD could potentially be embedded even into mobile devices disposing of less computational resources. smileD's speed can somewhat neutralize the accuracy handicap it has in comparison with the results reported in [8].

In its current state, our approach suffers from somewhat high false alarm rates, but our research indicates that in real life conditions these can be reduced in great measure by taking into account the dynamic sequence of subsequent frames, since the probability of the same false alarm occurring within all the frames of a sequence is proportional to the product of the probabilities of occurrence of that false alarm in every frame taken individually: a false alarm appearing with per-frame probability p persists across n independent frames with probability of only p^n. High speed is therefore of utmost importance, and the analysis of sequences of frames can substantially reduce the number of false positives.

Tuning of the training parameters and extension of the negative example set remain further possibilities for augmenting the accuracy of our project. Table II indicates that the accuracy of semi-supervised classifiers like smileD saturates at a certain limit, which can possibly be surmounted only by extending the negative sample set. In the case of smile detection, we suggest that extending the negative example sample with more images containing the "upper lip raiser" action unit (AU 10), which uncovers the teeth but is associated with disgust rather than smile (from the anatomical point of view, the disgust-expressing AU10 is associated with the Levator Labii Superioris muscle, while the smile is associated with the Zygomaticus Major muscle, AU12), could yield significant increases in accuracy, as reported by [9]. Since such an extension is relatively easy and not too time-consuming, provided that AU10-containing images are available and marked as negative examples, it may be the subject of future research.

In this study, however, we left the negative example set unchanged in order to study the effectiveness of the "Incremental Learning" approach, in which an old detector is used to facilitate the extension of the positive example sample thanks to which a new detector is obtained. Since the semi-supervised smileD versions v0.4 and v0.5 outperformed v0.2, for which manual labeling was implemented, while the latter performed only slightly better than v0.3, which exploited an identical flickr-originated image base to that of v0.2, it is not unreasonable to think that such a semi-supervised incremental training approach can be a feasible solution for training haarcascade detectors. If that is the case, it could possibly be stated that the machine has started, in a certain sense, to ground [15] its own notion of smile.

ACKNOWLEDGMENT

We would like to thank the third section of EPHE, University Paris 8 and CROUS de Paris for their kind support.

REFERENCES

[1] L. Strathearn, J. Li, P. Fonagy, and P.R. Montague, "What's in a smile? Maternal brain responses to infant facial cues," Pediatrics, vol. 122, 2008, p. 40.
[2] C. Darwin, P. Ekman, and P. Prodger, The Expression of the Emotions in Man and Animals, Oxford University Press, USA, 2002.
[3] M. Akita, K. Marukawa, and S. Tanaka, "Imaging apparatus and display control method," 2010.
[4] J.R. Movellan, F. Tanaka, I.R. Fasel, C. Taylor, P. Ruvolo, and M. Eckhardt, "The RUBI project: a progress report," Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2007, p. 339.
[5] G. Bradski and A. Kaehler, Learning OpenCV, O'Reilly Media, Inc., 2008.
[6] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. IEEE CVPR, 2001.
[7] O. Deniz, M. Castrillon, J. Lorenzo, L. Anton, and G. Bueno, "Smile detection for user interfaces," Advances in Visual Computing, p. 602–611.
[8] J. Whitehill, M. Bartlett, G. Littlewort, I. Fasel, and J. Movellan, "Developing a practical smile detector," submitted to PAMI, vol. 3, 2007, p. 5.
[9] P. Ekman and W.V. Friesen, Pictures of Facial Affect, Palo Alto, CA: Consulting Psychologists Press, 1976.
[10] T. Kanade, Y. Tian, and J.F. Cohn, "Comprehensive database for facial expression analysis," Proc. IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2000, p. 46.
[11] G.B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," University of Massachusetts, Amherst, Technical Report 07-49, 2007.
[12] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan, "Toward practical smile detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, p. 2106–2111.
[13] L. Da Vinci and J.P. Richter, The Notebooks of Leonardo da Vinci, Dover Publications, 1970.
[14] T. Sing, O. Sander, N. Beerenwinkel, and T. Lengauer, "ROCR: visualizing classifier performance in R," Bioinformatics, 2005.
[15] S. Harnad, "The symbol grounding problem," Physica D, vol. 42, 1990, p. 335–346.