A Machine Learning Approach for Semi-Automated Search and Selection in Literature Studies

Authors:
Rasmus Ros, Lund University, Department of Computer Science, Sweden
Elizabeth Bjarnason, Lund University, Department of Computer Science, Sweden
Per Runeson, Lund University, Department of Computer Science, Sweden

EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, June 2017, Pages 118–127. https://doi.org/10.1145/3084226.3084243

Published: 15 June 2017

ABSTRACT

Background. Search and selection of primary studies in Systematic Literature Reviews (SLRs) is labour-intensive and hard to replicate and update. Aims. We explore a machine learning approach to support semi-automated search and selection in SLRs to address these weaknesses. Method. We 1) train a classifier on an initial set of papers, 2) extend this set of papers by automated search and snowballing, 3) have the researcher validate the top paper selected by the classifier, and 4) update the set of papers and iterate the process until a stopping criterion is met. Results. We demonstrate with a proof-of-concept tool that the proposed automated search and selection approach generates valid search strings and that, for subsets of primary studies, its performance can reduce the manual work by half. Conclusions. The approach is promising, and the demonstrated advantages include cost savings and replicability. The next steps include further tool development and evaluating the approach on a complete SLR.
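The four steps in the Method can be pictured as a small loop. The following is a minimal sketch under stated assumptions, not the authors' tool: the abstract does not name the classifier, the features, or the stopping criterion, so the word log-odds scorer, the title-only features, and the `patience` rule (stop after a run of rejected papers) are all stand-ins. The automated database search is omitted; only snowballing over a hypothetical list of citation pairs is shown. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens; a deliberately crude feature extractor.
    return [w for w in text.lower().split() if w.isalpha()]


def train(labeled, papers):
    # Step 1 (stand-in classifier): smoothed per-word log-odds of relevance,
    # estimated from the titles of the papers labeled so far.
    pos, neg = Counter(), Counter()
    for pid, relevant in labeled.items():
        (pos if relevant else neg).update(tokenize(papers[pid]))
    vocab = set(pos) | set(neg)
    n_pos, n_neg, v = sum(pos.values()), sum(neg.values()), len(vocab)
    return {
        w: math.log((pos[w] + 1) / (n_pos + v))
        - math.log((neg[w] + 1) / (n_neg + v))
        for w in vocab
    }


def score(weights, text):
    # Higher score means the stand-in classifier considers the paper more relevant.
    return sum(weights.get(w, 0.0) for w in tokenize(text))


def snowball(included, citations):
    # Step 2 (snowballing only): papers that cite, or are cited by, an included paper.
    found = set()
    for citing, cited in citations:
        if citing in included:
            found.add(cited)
        if cited in included:
            found.add(citing)
    return found


def semi_automated_selection(papers, citations, seed_labels, ask_researcher, patience=3):
    # papers: {id: title}; citations: [(citing_id, cited_id)];
    # seed_labels: {id: bool}; ask_researcher: callable id -> bool.
    labeled = dict(seed_labels)
    misses = 0
    while misses < patience:  # assumed stopping criterion
        weights = train(labeled, papers)
        included = {p for p, relevant in labeled.items() if relevant}
        candidates = snowball(included, citations) - set(labeled)
        if not candidates:
            break
        # Step 3: the researcher validates only the top-ranked candidate.
        top = max(sorted(candidates), key=lambda p: score(weights, papers[p]))
        relevant = ask_researcher(top)
        # Step 4: update the labeled set and iterate.
        labeled[top] = relevant
        misses = 0 if relevant else misses + 1
    return labeled
```

Calling `semi_automated_selection` with a dictionary of titles, a list of citation pairs, a few seed labels, and a function that asks the researcher for a yes/no decision returns every label collected so far. Retraining after each decision keeps the ranking aligned with the researcher's latest judgement, at the cost of one model fit per validated paper.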

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Semi-supervised learning settings
2. General and reference
  1. Cross-computing tools and techniques
    1. Empirical studies
  2. Document types
    1. Surveys and overviews

Published in
EASE '17: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering
June 2017
405 pages
ISBN: 9781450348041
DOI: 10.1145/3084226
Conference Chair: Emilia Mendes
Program Chairs: Steve Counsell, Kai Petersen
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Automation
Machine learning
Reinforcement learning
Research identification
Study selection
Systematic literature review
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 71 of 232 submissions, 31%