Scanning Multiple Sequences

Scan Statistics and Applications

Part of the book series: Statistics for Industry and Technology (SIT)

Abstract

Much of the scanning literature focuses on unusual clusters of a given type of event in a single sequence of trials or time period. In this chapter, we discuss approaches to simultaneously scan multiple series. In one set of problems, there are multiple series corresponding to the occurrence of different types of events over the same period of time; the researcher looks for multiple-type clusters allowing for lagged effects between the different types of events. In the second set of problems, one scans multiple series looking for the largest common perfect or almost perfect match between all or most of the series. This second set of problems is of importance to molecular biologists searching for strong homologies in DNA sequences. Some related problems in two-dimensional scanning are mentioned.
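The second set of problems can be illustrated with a minimal sketch, assuming the sequences are already aligned position by position. The function name `longest_aligned_match` and the parameter `max_mismatch_seqs` are hypothetical and introduced only for illustration; they are not taken from the chapter. The sketch reports the longest run of consecutive positions at which all sequences (or all but a few) carry the same letter. Note the simplification: when mismatches are allowed, the agreeing subset of sequences may differ from one position to the next.

```python
from typing import Sequence


def longest_aligned_match(seqs: Sequence[str], max_mismatch_seqs: int = 0) -> int:
    """Length of the longest run of aligned positions at which all but at most
    `max_mismatch_seqs` of the sequences carry the same letter.

    Illustrative only: assumes aligned sequences and compares them column by column.
    """
    if not seqs:
        return 0
    n = min(len(s) for s in seqs)  # only positions present in every sequence
    best = run = 0
    for i in range(n):
        column = [s[i] for s in seqs]
        # Size of the largest group of sequences agreeing at this position.
        top = max(column.count(c) for c in set(column))
        if len(seqs) - top <= max_mismatch_seqs:
            run += 1
            best = max(best, run)
        else:
            run = 0  # the match is broken; start a new run
    return best


seqs = ["ACGTACGT", "TCGTACGA", "GCGTACCT"]
print(longest_aligned_match(seqs))                       # perfect match in all three: 5
print(longest_aligned_match(seqs, max_mismatch_seqs=1))  # match in at least two: 7
```

Judging whether such a match is unusually long requires its distribution under a random-sequence model, which is the kind of question the chapter addresses; the sketch only computes the observed statistic.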

Copyright information

© 1999 Springer Science+Business Media New York

About this chapter

Cite this chapter

Naus, J.I. (1999). Scanning Multiple Sequences. In: Glaz, J., Balakrishnan, N. (eds) Scan Statistics and Applications. Statistics for Industry and Technology. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-1578-3_4

  • DOI: https://doi.org/10.1007/978-1-4612-1578-3_4

  • Publisher Name: Birkhäuser, Boston, MA

  • Print ISBN: 978-1-4612-7201-4

  • Online ISBN: 978-1-4612-1578-3

  • eBook Packages: Springer Book Archive
