ABSTRACT
Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Robust affect recognition is another challenge that stands to benefit from modeling such cues. The Audio/Visual Emotion Challenge (AVEC) aims at understanding the two phenomena and modeling their correlation with observable cues across several modalities. In this paper, we use multimodal signal processing methodologies to address both problems using data from human-computer interactions. We develop separate systems for predicting depression levels and affective dimensions, experimenting with several methods for combining the multimodal information. The proposed depression prediction system uses a feature selection approach based on audio, visual, and linguistic cues to predict depression scores for each session. Likewise, we use multiple systems trained on audio and visual cues to predict the affective dimensions in continuous time. Our affect recognition system accounts for context during frame-wise inference and performs a linear fusion of the outcomes of the audio and visual systems. For both problems, our proposed systems outperform the video-feature-based baseline systems. As part of this work, we analyze the role played by each modality in predicting the target variable and provide analytical insights.
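The abstract names two concrete mechanisms, feature selection for session-level depression-score prediction and linear fusion of per-modality affect predictions, without giving implementation details. The two sketches below are illustrative only, not the authors' implementation. The first shows a generic greedy forward feature-selection loop over a pooled audio-visual-linguistic feature matrix; the linear-regression model, the cross-validated RMSE criterion, and the `max_feats` cap are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_feats=10):
    """Greedy forward selection over a pooled feature matrix X
    (n_sessions x n_features) against depression scores y."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_overall = -np.inf
    while remaining and len(selected) < max_feats:
        # Score every candidate feature when added to the current set.
        trials = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(
                LinearRegression(), X[:, cols], y,
                scoring="neg_root_mean_squared_error", cv=5).mean()
            trials.append((score, j))
        score, j = max(trials)
        if score <= best_overall:  # stop when no candidate helps
            break
        best_overall = score
        selected.append(j)
        remaining.remove(j)
    return selected
```

The second is a minimal sketch of late linear fusion of frame-wise affect predictions from separately trained audio and video systems; the convex-weight grid search and the Pearson-correlation development criterion are likewise assumptions.

```python
import numpy as np

def fuse(pred_audio, pred_video, w):
    """Convex linear combination of two frame-wise prediction tracks."""
    return w * np.asarray(pred_audio) + (1.0 - w) * np.asarray(pred_video)

def tune_weight(pred_audio, pred_video, target, step=0.05):
    """Pick the fusion weight that maximizes Pearson correlation
    with the ground-truth ratings on a development set."""
    best_w, best_r = 0.0, -np.inf
    for w in np.arange(0.0, 1.0 + step, step):
        r = np.corrcoef(fuse(pred_audio, pred_video, w), target)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

In a setup like this, the fusion weight would typically be tuned per affective dimension on the development partition and then held fixed when scoring the test partition.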