This is the final outline of our book with Morgan Kaufmann. It brings together almost a decade of research on finding the best statistical approaches to the most common issues in user research. The publication date is April 15, 2012.

- Introduction & How to Use this Book
- Visual Guide to What Test
- Skipping the formulas
- Quantifying User Research
- What is User Research?
- Usability Tests (lab and remote)
- Benchmarking
- Comparative Testing
- Qualitative Studies
- Surveys
- Requirements Gathering
- A/B Testing
- Questionnaires
- Using Inferential Statistics with Usability Data
- Sample Size, Normality and Other Statistical Concerns
- Measuring Usability: Quantifiable Aspects of Usability
- Introduction: Metrics as Independent of Formative and Summative Tests
- Completion
- Time
- Satisfaction
- Errors
- Clicks / Page Views
- Combined Scores
- Problems Discovered
- How precise are our estimates: Confidence Intervals
- Confidence Interval = Twice the Margin of Error
- Confidence Intervals Provide Precision & Location
- Three Components of a Confidence Interval
- Confidence Level
- Variability
- Sample Size
- Confidence Interval for a Completion Rate
- Confidence Interval History
- Wald Interval: terribly inaccurate for small samples
- Exact Confidence Interval
- Adjusted-Wald: Add Two Successes & Two Failures
- Best Point Estimates for a Completion Rate
- How accurate are point estimates from small samples?
- Confidence Interval for a Problem Occurrence
- Confidence Interval for Rating Scales and other Continuous Data
- Confidence Interval for Task Time Data
- Mean or Median Task Time?
- The Geometric Mean
- Log Transforming Confidence Intervals for Task Time Data
- Confidence Interval for a Median
- Did we meet or exceed our goal?
- Introduction
- One-Tailed and Two-Tailed Tests
- Comparing a Completion Rate to a Benchmark
- Small Sample Test
- Mid-Probability
- Large Sample Test
- Comparing a Satisfaction Score to a Benchmark
- Do at Least 75% Agree? Converting Continuous Ratings to Discrete
- Disadvantages to Converting Continuous Ratings to Discrete
- Net Promoter Score
- Comparing a Task Time to a Benchmark
- Is there a statistical difference between products?
- Comparing two Means (Rating Scales & Task Times)
- 2-sample t-test (between subjects)
- Confidence Interval around the Difference
- Paired t-test (within subjects)
- Confidence Interval around the Difference
- Comparing Completion Rates
- Small Samples: Fisher Exact Test
- Large Samples: The N-1 Two-Proportion Test
- Confidence Interval around the Difference
- Relationship between Chi-Square Tests and 2-proportion tests
- A/B Testing & Conversion Rates
- What Sample Sizes Do We Need? Part 1: Summative Usability Studies
- Introduction
- Why Do We Care?
- The Type of Usability Study Matters
- Basic Principles of Summative Sample Size Estimation
- Estimating Values
- Example 1: A Realistic Usability Testing Example Given Estimate of Variability
- Example 2: An Unrealistic Usability Testing Example
- Example 3: No Estimate of Variability
- Comparing Values
- Example 4: Comparison with a Benchmark
- Example 5: Within-Subjects Comparison of an Alternative
- Example 6: Between-Subjects Comparison of an Alternative
- Example 7: Where’s the Power?
- What Can I Do to Control Variability?
- Sample Size Estimation for Binomial Confidence Intervals
- Binomial Sample Size Estimation for Small Samples
- Sample Size for Comparison with a Benchmark Proportion
- Sample Size Estimation for Proportions & Chi-Squared Tests
- What Sample Sizes Do We Need? Part 2: Problem Discovery
- Using a Probabilistic Model of Problem Discovery to Estimate Sample Sizes for Formative User Research
- The famous equation: $P(x \geq 1) = 1 - (1 - p)^n$
- Deriving a sample size estimation equation from $1 - (1 - p)^n$
- Using the tables to plan sample sizes for formative user research
- Assumptions of the Binomial Probability Model
- Additional Applications of the Model
- Estimating the composite value of p for multiple problems or other events
- Adjusting small-sample composite estimates of p
- Estimating p
- Adjusting the Initial Estimate of p
- Using the Adjusted Estimate of p
- Investigating Sample Size Effectiveness
- Estimating the Number of Problems Available for Discovery
- What Affects the Value of p?
- Attitudinal Measurement with Questionnaires
- Scales, Labels and Points
- Post-Task Questionnaires
- ASQ, SMEQ, 1-question Likert
- Post-Test
- SUS, SUMI, PSSUQ, Homegrown scales
- Usability and Loyalty
- Net Promoter Scores and SUS
- Controversies in Measurement & Statistics
- Industrial versus Scientific: The purpose of statistics is to support better decision making over the long run
- Multi-Point Scales
- p-values and NHST
- Parametric versus Non-Parametric Statistics
- Which confidence level?
- When x = n or x = 0, what confidence level do you use?
- Multiple testing versus omnibus testing
- 2 x 2 tables
- Final Thoughts on Statistics for User Research
- Appendix A: A Crash Course in Fundamental Statistical Concepts
- Central Tendency: Mean & Median
- Standard Deviation & Variance
- Population Parameters and Sample Statistics
- Standard Deviation
- Margin of Error
- Alpha
- Standard Error of the Mean
- Central Limit Theorem
- The normal distribution
- The Binomial Distribution
- Normal Approximation to the Binomial
- Introduction to Hypothesis Testing
- The Null and Alternative Hypotheses (H₀ and Hₐ)
- Type I and Type II Errors
- Confidence and Power
- Making decisions from p-values
- If p is low, reject H₀
- One and Two Tailed Tests
- Mechanics of Test Statistics
- z statistics
- t-statistics
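
As a small illustration of the problem-discovery equation named in the outline, P(x ≥ 1) = 1 – (1 – p)^n, here is a minimal Python sketch. It is not taken from the book. The function names and the example values (p = 0.31, an 85% discovery goal) are assumptions chosen only for illustration. Solving the equation for n gives n = ln(1 – goal) / ln(1 – p), rounded up to the next whole participant.

```python
import math


def discovery_probability(p: float, n: int) -> float:
    """Chance that a problem affecting a proportion p of users
    is seen at least once among n participants: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


def sample_size_for_discovery(p: float, goal: float) -> int:
    """Smallest n with 1 - (1 - p)^n >= goal.

    Rearranging the equation gives n = ln(1 - goal) / ln(1 - p);
    the result is rounded up because participants come in whole numbers.
    """
    if not (0.0 < p < 1.0 and 0.0 < goal < 1.0):
        raise ValueError("p and goal must both lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - goal) / math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the book).
    print(round(discovery_probability(0.31, 5), 4))
    print(sample_size_for_discovery(0.31, 0.85))
```

The sketch relies on the binomial model's assumptions, which the outline lists as a topic of their own: every participant is treated as having the same independent chance p of encountering the problem.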
