Scanning Multiple Sequences

Naus, Joseph I.

doi:10.1007/978-1-4612-1578-3_4

Part of the book series: Statistics for Industry and Technology (SIT)

Abstract

Much of the scanning literature focuses on unusual clusters of a given type of event in a single sequence of trials or time period. In this chapter, we discuss approaches to simultaneously scan multiple series. In one set of problems, there are multiple series corresponding to the occurrence of different types of events over the same period of time; the researcher looks for multiple-type clusters allowing for lagged effects between the different types of events. In the second set of problems, one scans multiple series looking for the largest common perfect or almost perfect match between all or most of the series. This second set of problems is of importance to molecular biologists searching for strong homologies in DNA sequences. Some related problems in two-dimensional scanning are mentioned.
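
As a rough illustration of the two kinds of scanning mentioned in the abstract, the sketch below computes (a) the ordinary scan statistic of a single 0/1 sequence, that is, the largest number of events in any window of w consecutive trials, and (b) the longest run of positions at which every one of several aligned sequences carries the same letter. The function names `scan_statistic` and `longest_aligned_match` are hypothetical, the sequences are assumed to be already aligned, and lagged effects and imperfect matches are not handled; this is not the chapter's own method, only a minimal sketch of the quantities being scanned for.

```python
from typing import Sequence, Tuple


def scan_statistic(trials: Sequence[int], w: int) -> int:
    """Largest number of events (1s) in any window of w consecutive trials."""
    if w <= 0 or w > len(trials):
        raise ValueError("window length must be between 1 and len(trials)")
    count = sum(trials[:w])          # events in the first window
    best = count
    for i in range(w, len(trials)):
        # Slide the window one trial to the right.
        count += trials[i] - trials[i - w]
        best = max(best, count)
    return best


def longest_aligned_match(seqs: Sequence[str]) -> Tuple[int, int]:
    """Return (length, start) of the longest run of positions at which
    all sequences carry the same letter (assumes aligned sequences)."""
    if not seqs:
        raise ValueError("need at least one sequence")
    n = min(len(s) for s in seqs)    # compare only the common prefix length
    best_len, best_start, run = 0, 0, 0
    for i in range(n):
        if all(s[i] == seqs[0][i] for s in seqs):
            run += 1
            if run > best_len:       # keep the earliest run on ties
                best_len, best_start = run, i - run + 1
        else:
            run = 0
    return best_len, best_start


if __name__ == "__main__":
    print(scan_statistic([0, 1, 1, 0, 0, 1, 0, 1, 1, 1], w=3))
    print(longest_aligned_match(["ACGTTGCA", "TCGTAGCA", "GCGTCGCA"]))
```

Judging whether an observed value is unusual requires the null distribution of these statistics, which the sketch does not attempt to compute.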

Author information

Authors and Affiliations

Rutgers University, New Brunswick, NJ, USA
Joseph I. Naus

Editor information

Editors and Affiliations

Department of Statistics, University of Connecticut at Storrs, Storrs, CT, 06269-3120, USA
Joseph Glaz
Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, L8S 4K1, Canada
N. Balakrishnan

About this chapter

Cite this chapter

Naus, J.I. (1999). Scanning Multiple Sequences. In: Glaz, J., Balakrishnan, N. (eds) Scan Statistics and Applications. Statistics for Industry and Technology. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4612-1578-3_4

DOI: https://doi.org/10.1007/978-1-4612-1578-3_4
Publisher Name: Birkhäuser, Boston, MA
Print ISBN: 978-1-4612-7201-4
Online ISBN: 978-1-4612-1578-3
eBook Packages: Springer Book Archive
