ABSTRACT
Class membership probability estimates are important for many applications of data mining in which classification outputs are combined with other sources of information for decision-making, such as example-dependent misclassification costs, the outputs of other classifiers, or domain knowledge. Previous calibration methods apply only to two-class problems. Here, we show how to obtain accurate probability estimates for multiclass problems by combining calibrated binary probability estimates. We also propose a new method for obtaining calibrated two-class probability estimates that can be applied to any classifier that produces a ranking of examples. Using naive Bayes and support vector machine classifiers, we give experimental results from a variety of two-class and multiclass domains, including direct marketing, text categorization and digit recognition.
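A minimal sketch of the two ideas in the abstract: a monotone (ranking-preserving) calibration map fitted with the pool-adjacent-violators algorithm, and a simple normalization to couple one-vs-rest binary estimates into a multiclass distribution. This is an illustration under assumptions, not the paper's exact procedure; the function names and the normalization-based coupling are the editor's choices, and more elaborate coupling schemes exist.

```python
def pav_calibrate(scores, labels):
    """Fit isotonic regression via pool-adjacent-violators (PAV).

    Maps classifier scores to calibrated probabilities that are
    non-decreasing in the score, so only the ranking of examples matters.
    """
    # Sort examples by classifier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # One [mean, weight] block per example; merge adjacent blocks
    # whenever the non-decreasing constraint is violated.
    merged = []
    for i in order:
        merged.append([float(labels[i]), 1.0])
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, w2 = merged.pop()
            m1, w1 = merged.pop()
            w = w1 + w2
            merged.append([(m1 * w1 + m2 * w2) / w, w])
    # Expand block means back to per-example values (in sorted order).
    calibrated = []
    for mean, weight in merged:
        calibrated.extend([mean] * int(weight))
    # Undo the sort so outputs align with the input order.
    result = [0.0] * len(scores)
    for pos, i in enumerate(order):
        result[i] = calibrated[pos]
    return result


def combine_one_vs_rest(binary_probs):
    """Couple per-class one-vs-rest probability estimates by normalization."""
    total = sum(binary_probs)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(binary_probs)] * len(binary_probs)
    return [p / total for p in binary_probs]
```

For example, `pav_calibrate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` pools the two middle examples (whose labels violate monotonicity) into a shared estimate of 0.5. Plain normalization is the simplest way to couple the binary estimates; the coupling question is exactly what makes the multiclass case harder than the two-class one.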