skip to main content
10.1145/375663.375670acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Space-efficient online computation of quantile summaries

Authors Info & Claims
Published:01 May 2001Publication History

ABSTRACT

An ∈-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of ∈N.

We present a new online algorithm for computing∈-approximate quantile summaries of very large data sequences. The algorithm has a worst-case space requirement of Ο(1÷∈ log(∈N)). This improves upon the previous best result of Ο(1÷∈ log2(∈N)). Moreover, in contrast to earlier deterministic algorithms, our algorithm does not require a priori knowledge of the length of the input sequence.

Finally, the actual space bounds obtained on experimental data are significantly better than the worst case guarantees of our algorithm as well as the observed space requirements of earlier algorithms.

References

  1. 1.Rakesh Agrawal and Arun Swami. A one-pass space-efficient algorithm for finding quantiles. Proc. 7th Int. Conf. Management of Data, COMAD, 28-30 December 1995.Google ScholarGoogle Scholar
  2. 2.Khaled Alsabti, Sanjay Ranka, and Vineet Singh. A one-pass algorithm for accurately estimating quantiles for disk-resident data. Proceedings of the 23rd Intl. Conference onVery Large Data Bases, Athens, Greece, 26-29 August 1997, pages 346-355, Los Altos, CA 94022, USA, 1997. Morgan Kaufmann Publishers. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.Surajit Chaudhuri, Rajeev Motwani, and Vivek Narasayya. Random sampling for histogram construction: how much is enough? In ACM SIGMOD '98, volume 28, pages 436-447, Seattle, WA, June 1-4, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Phillip B. Gibbons, Yossi Matias, and Viswanath Poosala. Fast incremental maintenance of approximate histograms. In Proceedings of the 23rd Intl. Conf. Very Large Data Bases, VLDB, pages 466-475. Morgan Kaufmann, 25-27 August 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.Michael B. Greenwald. Practical algorithms for self scaling histograms or better than average data collection. Performance Evaluation, 27&28:19-40, October 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.R. Jain and I. Chlamtac. The P2 algorithm for dynamic calculation of quantile and histograms without storing observations. Communications of the ACM, 28(10):1076-1085, October 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.I. Pohl. A minimum storage algorithm for computing the median. IBM Research Report RC 2701, November 1969.Google ScholarGoogle Scholar
  8. 8.Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. ACM SIGMOD '98, volume 28, pages 426-435, Seattle, WA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In ACM SIGMOD '99, volume 29, pages 251-262. Philadelphia, PA, June 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.J. I. Munro and M.S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, vol. 12: 315-323; 1980.Google ScholarGoogle ScholarCross RefCross Ref
  11. 11.M.S. Paterson. Progress in selection. Technical Report, University ofWarwick, Coventry, UK, 1997.Google ScholarGoogle Scholar
  12. 12.Viswanath Poosala, Venkatesh Ganti, and Yannis E. Ioannidis. Approximate query answering using histograms. Bulletin of the IEEE Technical Committee on Data Engineering, 22(4):6-15, December 1999.Google ScholarGoogle Scholar
  13. 13.Viswanath Poosala, Peter J. Haas, Yannis E. Ioannidis, and Eugene J. Shekita. Improved histograms for selectivity estimation of range predicates. In ACM SIGMOD 96, volume 26, pages 294-305, Montreal, Quebec, Canada, June 4-6, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Space-efficient online computation of quantile summaries

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data
            May 2001
            630 pages
            ISBN:1581133324
            DOI:10.1145/375663

            Copyright © 2001 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 1 May 2001

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            SIGMOD '01 Paper Acceptance Rate44of293submissions,15%Overall Acceptance Rate785of4,003submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader