Abstract
High-throughput screening is an early critical step in drug discovery. Its aim is to screen a large number of diverse chemical compounds to identify candidate 'hits' rapidly and accurately. Few statistical tools are currently available, however, to detect quality hits with a high degree of confidence. We examine statistical aspects of data preprocessing and hit identification for primary screens. We focus on concerns related to positional effects of wells within plates, choice of hit threshold and the importance of minimizing false-positive and false-negative rates. We argue that replicate measurements are needed to verify assumptions of current methods and to suggest data analysis strategies when assumptions are not met. The integration of replicates with robust statistical methods in primary screens will facilitate the discovery of reliable hits, ultimately improving the sensitivity and specificity of the screening process.
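As an illustration of the hit-threshold question raised above, the following is a minimal sketch and not the authors' method. It assumes a single plate of raw well measurements, an inhibition-type assay in which lower signal indicates activity, and a cutoff of 3 robust units. The function names, the cutoff, and the sample values are all hypothetical. Each well is centered by the plate median and scaled by the median absolute deviation (MAD), so that a few extreme wells do not inflate the spread estimate as they would with a mean and standard deviation.

```python
import statistics

# Scaling constant that makes the MAD comparable to a standard
# deviation when the bulk of the data is normally distributed.
MAD_TO_SD = 1.4826


def robust_z_scores(values):
    """Return (value - median) / (scaled MAD) for each well measurement."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev) * MAD_TO_SD
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / mad for v in values]


def select_hits(values, threshold=3.0):
    """Return indices of wells whose robust z-score is at or below -threshold.

    Assumption: lower signal means stronger inhibition, so hits lie in the
    lower tail. The default threshold of 3 is illustrative only.
    """
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if z <= -threshold]


if __name__ == "__main__":
    # Hypothetical plate: eight inactive wells near 100 and one strong inhibitor.
    plate = [100, 102, 98, 101, 99, 103, 97, 100, 40]
    print(select_hits(plate))
```

A sketch like this does not address positional effects or error rates. With a single measurement per compound, the assumptions behind any such threshold cannot be checked, which is the motivation for replicates given in the abstract.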
Acknowledgements
We thank Jing Liu and Janie Lapointe for generating the Figure 3 data. This work was supported by the “Informatics and Chemical Genomics” funding to R.N. under the Genome Quebec Phase II Bioinformatics Consortium program.
Competing interests
The authors declare no competing financial interests.
Cite this article
Malo, N., Hanley, J., Cerquozzi, S. et al. Statistical practice in high-throughput screening data analysis. Nat Biotechnol 24, 167–175 (2006). https://doi.org/10.1038/nbt1186
DOI: https://doi.org/10.1038/nbt1186