ABSTRACT
Depression is one of the most common mood disorders. Technology has the potential to assist in screening and treating people with depression by robustly modeling and tracking the complex behavioral cues associated with the disorder (e.g., speech, language, facial expressions, head movement, body language). Robust affect recognition is another challenge that stands to benefit from modeling such cues. The Audio/Visual Emotion Challenge (AVEC) aims at understanding the two phenomena and modeling their correlation with observable cues across several modalities. In this paper, we use multimodal signal processing methodologies to address both problems using data from human-computer interactions. We develop separate systems for predicting depression levels and affective dimensions, experimenting with several methods for combining the multimodal information. The proposed depression prediction system uses a feature selection approach based on audio, visual, and linguistic cues to predict depression scores for each session. Likewise, we use multiple systems trained on audio and visual cues to predict the affective dimensions in continuous time. Our affect recognition system accounts for context during frame-wise inference and performs a linear fusion of the outcomes of the audio and visual systems. For both problems, our proposed systems outperform the video-feature-based baseline systems. As part of this work, we analyze the role played by each modality in predicting the target variable and provide analytical insights.
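The abstract names two concrete mechanisms, feature selection for session-level depression-score prediction and linear fusion of per-modality affect predictions, without giving implementation details. The two sketches below are illustrative only, not the authors' implementation. The first shows a generic greedy forward feature-selection loop over a pooled audio-visual-linguistic feature matrix; the linear-regression model, the cross-validated RMSE criterion, and the `max_feats` cap are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, max_feats=10):
    """Greedy forward selection over a pooled feature matrix X
    (n_sessions x n_features) against depression scores y."""
    selected = []
    remaining = list(range(X.shape[1]))
    best_overall = -np.inf
    while remaining and len(selected) < max_feats:
        # Score every candidate feature when added to the current set.
        trials = []
        for j in remaining:
            cols = selected + [j]
            score = cross_val_score(
                LinearRegression(), X[:, cols], y,
                scoring="neg_root_mean_squared_error", cv=5).mean()
            trials.append((score, j))
        score, j = max(trials)
        if score <= best_overall:  # stop when no candidate helps
            break
        best_overall = score
        selected.append(j)
        remaining.remove(j)
    return selected
```

The second is a minimal sketch of late linear fusion of frame-wise affect predictions from separately trained audio and video systems; the convex-weight grid search and the Pearson-correlation development criterion are likewise assumptions.

```python
import numpy as np

def fuse(pred_audio, pred_video, w):
    """Convex linear combination of two frame-wise prediction tracks."""
    return w * np.asarray(pred_audio) + (1.0 - w) * np.asarray(pred_video)

def tune_weight(pred_audio, pred_video, target, step=0.05):
    """Pick the fusion weight that maximizes Pearson correlation
    with the ground-truth ratings on a development set."""
    best_w, best_r = 0.0, -np.inf
    for w in np.arange(0.0, 1.0 + step, step):
        r = np.corrcoef(fuse(pred_audio, pred_video, w), target)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

In a setup like this, the fusion weight would typically be tuned per affective dimension on the development partition and then held fixed when scoring the test partition.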