Research article
DOI: 10.1145/2661806.2661810

Multimodal Prediction of Affective Dimensions and Depression in Human-Computer Interactions

Published: 07 November 2014

ABSTRACT

Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Similarly, robust affect recognition is another challenge that stands to benefit from modeling such cues. The Audio/Visual Emotion Challenge (AVEC) aims toward understanding these two phenomena and modeling their correlation with observable cues across several modalities. In this paper, we use multimodal signal processing methodologies to address the two problems using data from human-computer interactions. We develop separate systems for predicting depression levels and affective dimensions, experimenting with several methods for combining the multimodal information. The proposed depression prediction system uses a feature selection approach based on audio, visual, and linguistic cues to predict depression scores for each session. Similarly, we use multiple systems trained on audio and visual cues to predict the affective dimensions in continuous time. Our affect recognition system accounts for context during the frame-wise inference and performs a linear fusion of the outcomes from the audio and visual systems. For both problems, our proposed systems outperform the video-feature-based baseline systems. As part of this work, we analyze the role played by each modality in predicting the target variable and provide analytical insights.
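To make the late-fusion step concrete, the sketch below linearly combines frame-wise audio and video predictions for one affective dimension and applies a simple moving-average smoother as a crude stand-in for context-aware inference. This is a minimal illustration, not the authors' exact method; the fusion weight, window length, and all variable names here are assumptions.

```python
import numpy as np

def linear_fusion(audio_pred, video_pred, w_audio=0.5):
    """Linearly combine aligned frame-wise predictions from two unimodal
    systems. w_audio would be tuned on a development set; the default
    here is a placeholder, not a value from the paper."""
    audio_pred = np.asarray(audio_pred, dtype=float)
    video_pred = np.asarray(video_pred, dtype=float)
    assert audio_pred.shape == video_pred.shape, "streams must be time-aligned"
    return w_audio * audio_pred + (1.0 - w_audio) * video_pred

def smooth(pred, win=25):
    """Moving-average smoothing over neighboring frames, a simple proxy
    for context-sensitive frame-wise inference (an assumption, not the
    authors' model)."""
    kernel = np.ones(win) / win
    return np.convolve(pred, kernel, mode="same")

# Hypothetical per-frame scores from separately trained systems.
audio_scores = np.random.randn(1000)
video_scores = np.random.randn(1000)
fused = smooth(linear_fusion(audio_scores, video_scores, w_audio=0.6))
```

In practice, the fusion weight and any per-dimension smoothing would be selected to maximize the challenge metric on the development partition.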


Published in

AVEC '14: Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge
November 2014, 110 pages
ISBN: 9781450331197
DOI: 10.1145/2661806
Copyright © 2014 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

AVEC '14 paper acceptance rate: 8 of 22 submissions (36%). Overall acceptance rate: 52 of 98 submissions (53%).
