Published in: BMC Medical Informatics and Decision Making 1/2020

01-12-2020 | Vaccination | Software

Monitoring stance towards vaccination in twitter messages

Authors: Florian Kunneman, Mattijs Lambooij, Albert Wong, Antal van den Bosch, Liesbeth Mollema

Abstract

Background

We developed a system to automatically classify stance towards vaccination in Twitter messages, with a focus on messages with a negative stance. Such a system makes it possible to monitor the ongoing stream of messages on social media, offering actionable insights into public hesitance with respect to vaccination. At the moment, such monitoring is done by means of regular sentiment analysis, which performs poorly at detecting negative stance towards vaccination.

Methods

For Dutch Twitter messages that mention vaccination-related key terms, we annotated stance and feeling in relation to vaccination (provided that the messages referred to this topic). Subsequently, we used these coded data to train and test different machine learning set-ups. With the aim of best identifying messages with a negative stance towards vaccination, we compared set-ups at an increasing dataset size and decreasing reliability, at an increasing number of categories to distinguish, and with different classification algorithms.

Results

We found that Support Vector Machines trained on a combination of strictly and laxly labeled data with a more fine-grained labeling yielded the best result, at an F1-score of 0.36 and an area under the ROC curve (AUC) of 0.66. This considerably outperformed the currently used sentiment analysis, which yielded an F1-score of 0.25 and an AUC of 0.57. We also show that the recall of our system could be optimized to 0.60 at little loss of precision.
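Precision, recall and F1 for the negative-stance class depend on where the decision threshold is placed on the classifier's scores. The following is a minimal sketch in plain Python, not the authors' implementation. The labels, scores and the helper names `precision_recall_f1` and `threshold_for_recall` are hypothetical and chosen only to illustrate how lowering the threshold trades precision for recall.

```python
from typing import List, Tuple

# Hypothetical toy data: 1 = negative stance (the target class), 0 = other.
labels: List[int] = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
# Hypothetical classifier scores; higher means "more likely negative stance".
scores: List[float] = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]


def precision_recall_f1(
    labels: List[int], scores: List[float], threshold: float
) -> Tuple[float, float, float]:
    """Score the target class when every score >= threshold is flagged."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def threshold_for_recall(
    labels: List[int], scores: List[float], target_recall: float
) -> float:
    """Return the highest threshold whose recall reaches the target."""
    for threshold in sorted(set(scores), reverse=True):
        _, recall, _ = precision_recall_f1(labels, scores, threshold)
        if recall >= target_recall:
            return threshold
    return min(scores)  # flag everything if the target is never reached


if __name__ == "__main__":
    print(precision_recall_f1(labels, scores, 0.5))
    relaxed = threshold_for_recall(labels, scores, 0.6)
    print(relaxed, precision_recall_f1(labels, scores, relaxed))
```

In a monitoring setting, a lower threshold means more messages are passed on for manual review, so the acceptable precision depends on how much reviewing effort is available.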

Conclusion

The outcomes of our study indicate that stance prediction by a computerized system alone is a challenging task. Nonetheless, the model showed sufficient recall in identifying negative tweets to reduce the manual effort of reviewing messages. Our analysis of the data and the behavior of our system suggests that an approach is needed that combines a larger training dataset with a setting in which a human-in-the-loop provides the system with feedback on its predictions.
Footnotes
4.
Although the sender could add original content to a retweet, this occurred in only a small part of the retweets in our dataset. It was therefore most effective to remove retweets.
5.
We give a full overview of the annotated categories, to be exact about the decisions made by the annotators. However, we did not include all annotation categories in our classification experiment. A motivation will be given in the “Data categorization” section.
7.
The raw annotations by tweet identifier can be downloaded from http://cls.ru.nl/~fkunneman/data_stance_vaccination.zip
8.
The tweet IDs and their labels can be downloaded from http://cls.ru.nl/~fkunneman/data_stance_vaccination.zip
10.
We chose to value the AUC over the F1-score, as the former is more robust in the case of imbalanced test sets.
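The robustness of the AUC to class imbalance can be illustrated on toy data. The sketch below is an illustration under stated assumptions, not the evaluation code of the study. The scores are hypothetical, and `roc_auc` and `f1_at` are hypothetical helper names. It computes the AUC as the share of positive-negative pairs that are ranked correctly, then repeats every negative example five times to mimic a more skewed test set.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def f1_at(labels: List[int], scores: List[float], threshold: float) -> float:
    """F1 of the positive class when every score >= threshold is flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical scores for four positive and four negative examples.
pos_scores = [0.9, 0.8, 0.6, 0.3]
neg_scores = [0.7, 0.4, 0.2, 0.1]

balanced_labels = [1] * 4 + [0] * 4
balanced_scores = pos_scores + neg_scores

# Same score distributions, but five times as many negatives.
skewed_labels = [1] * 4 + [0] * 20
skewed_scores = pos_scores + neg_scores * 5

if __name__ == "__main__":
    print(roc_auc(balanced_labels, balanced_scores), f1_at(balanced_labels, balanced_scores, 0.5))
    print(roc_auc(skewed_labels, skewed_scores), f1_at(skewed_labels, skewed_scores, 0.5))
```

On this toy data the AUC is identical for both sets, while the F1-score falls as false positives multiply. This only shows invariance to the class ratio when the per-class score distributions stay the same.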
 
Metadata
Title
Monitoring stance towards vaccination in twitter messages
Authors
Florian Kunneman
Mattijs Lambooij
Albert Wong
Antal van den Bosch
Liesbeth Mollema
Publication date
01-12-2020
Publisher
BioMed Central
Keyword
Vaccination
Published in
BMC Medical Informatics and Decision Making / Issue 1/2020
Electronic ISSN: 1472-6947
DOI
https://doi.org/10.1186/s12911-020-1046-y
