Abstract
Given the large number of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most suitable method to analyze a new dataset becomes an ever more challenging task. This is because in many cases testing all potentially useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion: in each round it selects and tests the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This ‘most promising’ competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test contributes information to a better estimate of dataset similarity, and thus helps to better predict which algorithms are most promising on the new dataset. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI classification datasets. The results show that active testing quickly yields an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leite, R., Brazdil, P., Vanschoren, J. (2012). Selecting Classification Algorithms with Active Testing. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol. 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_10
DOI: https://doi.org/10.1007/978-3-642-31537-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4