ABSTRACT
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Previous calibration methods apply only to two-class problems. Here, we show how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates. We also propose a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples. Using naive Bayes and support vector machine classifiers, we give experimental results from a variety of two-class and multiclass domains, including direct marketing, text categorization and digit recognition.
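A minimal sketch of the two ideas in the abstract: a monotone (ranking-preserving) calibration map fitted with the pool-adjacent-violators algorithm, and a simple normalization to couple one-vs-rest binary estimates into a multiclass distribution. This is an illustration under assumptions, not the paper's exact procedure; the function names and the normalization-based coupling are the editor's choices, and more elaborate coupling schemes exist.

```python
def pav_calibrate(scores, labels):
    """Fit isotonic regression via pool-adjacent-violators (PAV).

    Maps classifier scores to calibrated probabilities that are
    non-decreasing in the score, so only the ranking of examples matters.
    """
    # Sort examples by classifier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # One [mean, weight] block per example; merge adjacent blocks
    # whenever the non-decreasing constraint is violated.
    merged = []
    for i in order:
        merged.append([float(labels[i]), 1.0])
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, w2 = merged.pop()
            m1, w1 = merged.pop()
            w = w1 + w2
            merged.append([(m1 * w1 + m2 * w2) / w, w])
    # Expand block means back to per-example values (in sorted order).
    calibrated = []
    for mean, weight in merged:
        calibrated.extend([mean] * int(weight))
    # Undo the sort so outputs align with the input order.
    result = [0.0] * len(scores)
    for pos, i in enumerate(order):
        result[i] = calibrated[pos]
    return result


def combine_one_vs_rest(binary_probs):
    """Couple per-class one-vs-rest probability estimates by normalization."""
    total = sum(binary_probs)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(binary_probs)] * len(binary_probs)
    return [p / total for p in binary_probs]
```

For example, `pav_calibrate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` pools the two middle examples (whose labels violate monotonicity) into a shared estimate of 0.5. Plain normalization is the simplest way to couple the binary estimates; the coupling question is exactly what makes the multiclass case harder than the two-class one.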