
Selecting Classification Algorithms with Active Testing

Conference paper: Machine Learning and Data Mining in Pattern Recognition (MLDM 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7376)

Abstract

Given the large number of data mining algorithms, their combinations (e.g., ensembles), and possible parameter settings, finding the most adequate method to analyze a new dataset becomes an ever more challenging task. This is because, in many cases, testing all potentially useful alternatives quickly becomes prohibitively expensive. In this paper we propose a novel technique, called active testing, that intelligently selects the most useful cross-validation tests. It proceeds in a tournament-style fashion: in each round it selects and tests the algorithm that is most likely to outperform the best algorithm of the previous round on the new dataset. This ‘most promising’ competitor is chosen based on a history of prior duels between both algorithms on similar datasets. Each new cross-validation test contributes information toward a better estimate of dataset similarity, and thus helps to better predict which algorithms are most promising on the new dataset. We have evaluated this approach using a set of 292 algorithm-parameter combinations on 76 UCI datasets for classification. The results show that active testing quickly yields an algorithm whose performance is very close to the optimum after relatively few tests. It also provides a better solution than previously proposed methods.
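The tournament loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy score table, and the fixed `prior_wins` estimates are all hypothetical, and the sketch deliberately omits the paper's key refinement of re-weighting dataset similarity after each new cross-validation result.

```python
def active_testing(algorithms, prior_wins, evaluate, n_tests):
    """Tournament-style active testing (illustrative sketch).

    algorithms : list of candidate algorithm identifiers
    prior_wins : dict (challenger, incumbent) -> estimated probability
                 that the challenger beats the incumbent, derived from
                 historical duels on similar datasets (assumed given)
    evaluate   : callback running one cross-validation test of an
                 algorithm on the new dataset; returns a score
    n_tests    : total budget of cross-validation tests
    """
    # Start the tournament with an arbitrary incumbent.
    best = algorithms[0]
    best_score = evaluate(best)
    tested = {best}
    for _ in range(n_tests - 1):
        candidates = [a for a in algorithms if a not in tested]
        if not candidates:
            break
        # Select the competitor most likely to beat the current best,
        # according to the history of prior duels.
        challenger = max(candidates,
                         key=lambda a: prior_wins.get((a, best), 0.0))
        score = evaluate(challenger)
        tested.add(challenger)
        if score > best_score:
            best, best_score = challenger, score
    return best, best_score

# Toy usage: a fixed score table stands in for real cross-validation runs.
true_scores = {'knn': 0.70, 'svm': 0.90, 'tree': 0.80}
prior_wins = {('svm', 'knn'): 0.8, ('tree', 'knn'): 0.6,
              ('svm', 'tree'): 0.7, ('tree', 'svm'): 0.3}
best, score = active_testing(['knn', 'svm', 'tree'], prior_wins,
                             lambda a: true_scores[a], n_tests=3)
# best == 'svm', score == 0.90
```

In the paper, the win-probability estimates are not static: each completed test updates which historical datasets count as similar to the new one, which in turn re-ranks the remaining competitors.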




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leite, R., Brazdil, P., Vanschoren, J. (2012). Selecting Classification Algorithms with Active Testing. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol. 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_10

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer Science (R0)
