Abstract
The debate over how many participants constitute a sufficient sample for interaction testing is long-running, with prominent contributions arguing that five users provide a good benchmark when seeking to discover interaction problems. We argue that the five-user figure is often adopted with little understanding of the basis for, or implications of, that decision. We present an analysis of the relevant research to clarify the meaning of the five-user assumption and to examine how the original research that suggested it has been applied, including its uncritical adoption in some studies and complaints about its inadequacy in others. We argue that the five-user assumption is often misunderstood, not only in Human-Computer Interaction but also in fields such as medical device design and business and information applications. The analysis we present allows us to define a systematic approach for monitoring the sample discovery likelihood in formative and summative evaluations, and for gathering the information needed to make critical decisions during interaction testing while respecting the aim of the evaluation and the allotted budget. This approach, which we call the Grounded Procedure, is introduced and its value argued.
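The "sample discovery likelihood" the abstract refers to is usually modeled with the cumulative binomial formula popularized by Virzi and by Nielsen and Landauer, 1 - (1 - p)^n, where p is the average probability that a single participant reveals a given problem and n is the number of participants. A minimal sketch in Python (the function name and the p = 0.31 average, drawn from that literature, are illustrative, not the paper's own procedure):

```python
def discovery_likelihood(p: float, n: int) -> float:
    """Expected proportion of problems discovered by n participants,
    assuming each participant independently reveals a given problem
    with average probability p (the 1 - (1 - p)^n model)."""
    return 1 - (1 - p) ** n

# With the often-cited average of p = 0.31, five users are expected
# to uncover roughly 84% of the problems:
print(round(discovery_likelihood(0.31, 5), 3))  # ~0.844
```

The five-user assumption is precisely this calculation read backwards: if p really is around 0.31 and is homogeneous across problems and users, five participants suffice for most discovery; the paper's point is that those conditions often do not hold, so the likelihood should be monitored rather than assumed.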
Index Terms
- Reviewing and Extending the Five-User Assumption: A Grounded Procedure for Interaction Evaluation