Abstract
Given the large number of data mining algorithms, their combinations (e.g. ensembles) and possible parameter settings, finding the most suitable method to analyze a new dataset becomes an ever more challenging task. This is because in many cases testing all potentially useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion: in each round it selects and tests the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This ‘most promising’ competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test contributes information to a better estimate of dataset similarity, and thus helps to better predict which algorithms are most promising on the new dataset. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI classification datasets. The results show that active testing quickly yields an algorithm whose performance is very close to the optimum, after relatively few tests. It also provides a better solution than previously proposed methods.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Leite, R., Brazdil, P., Vanschoren, J. (2012). Selecting Classification Algorithms with Active Testing. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol. 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_10
DOI: https://doi.org/10.1007/978-3-642-31537-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4