{"id":238,"date":"2014-11-04T23:15:00","date_gmt":"2014-11-04T23:15:00","guid":{"rendered":"http:\/\/measuringu.com\/validity-research\/"},"modified":"2021-01-28T06:29:59","modified_gmt":"2021-01-28T06:29:59","slug":"validity-research","status":"publish","type":"post","link":"https:\/\/measuringu.com\/validity-research\/","title":{"rendered":"Assessing the Validity of Your Research"},"content":{"rendered":"
You often hear that research results are not “valid” or “reliable.”<\/p>\n
Like many scientific terms that have made their way into our vernacular, the two are often used interchangeably.<\/p>\n
In fact, validity and reliability have different meanings with different implications for researchers.<\/p>\n
Validity refers to <b>how well the results of a study measure what they are intended to measure<\/b>. Contrast that with reliability, which means consistent results over time.<\/p>\n
For example, if you weigh yourself four times on a scale and get the values 165, 164, 165, and 166, then you can say that the scale is reasonably reliable since the weights are consistent. If, however, you weigh 175 pounds and not 165, the scale measurement has little validity!<\/p>\n
Reliability is necessary, but not sufficient to establish validity.<\/p>\n
In a similar vein, if we ask 500 customers at various times during a week to rate their likelihood of recommending a product–assuming that no relevant variables have changed during that time–and we get scores of 75%, 76%, and 74%, we could call our measurement reliable.<\/p>\n
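As an aside, likelihood-to-recommend ratings like these can be turned into a Net Promoter Score: respondents answer on a 0–10 scale, those rating 9–10 count as promoters, those rating 0–6 count as detractors, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch in Python, using made-up illustrative ratings (not data from the article):

```python
def net_promoter_score(ratings):
    """Return the NPS (as a percentage) for a list of 0-10 ratings.

    Promoters rate 9-10, detractors rate 0-6;
    NPS = %promoters - %detractors.
    """
    n = len(ratings)
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / n

# Hypothetical sample of ten ratings: 6 promoters, 2 detractors
ratings = [10, 9, 8, 7, 10, 6, 9, 3, 10, 9]
print(net_promoter_score(ratings))  # -> 40.0
```

Asking the same question of comparable samples at different times and getting similar scores, as above, speaks to reliability; whether the score predicts actual recommendations and growth is a question of validity.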
The likelihood-to-recommend question is the one used to compute the Net Promoter Score (NPS)<\/a>. The NPS is intended to predict two things. First, it’s intended to predict how many customers will recommend in the future based on what they say now. Those recommendations, in turn, predict company growth. If the NPS doesn’t differentiate between high-growth and low-growth companies, then the score has little validity.<\/p>\n
Test Validity versus Experimental Validity<\/h3>\n
Don’t confuse this type of validity (often called test validity) with experimental validity<\/a>, which is composed of internal and external validity. Internal validity indicates how much faith we can have in cause-and-effect statements that come out of our research. External validity indicates the extent to which findings can be generalized.<\/p>\n
Test validity gets its name from the field of psychometrics, which got its start over 100 years ago with the measurement of intelligence versus school performance, using those standardized tests we’ve all grown to loathe. Even though we rarely use tests in user research, we use their byproducts: questionnaires, surveys, and usability-test metrics, like task-completion rates, elapsed time, and errors.<\/p>\n
So while we speak of test validity as one overall concept, in practice it’s made up of three components: content validity, criterion validity, and construct validity.<\/p>\n
To determine whether your research has validity, you need to consider all three types of validity using the tripartite model developed by Cronbach & Meehl in 1955<\/a>, as shown in Figure 1 below.<\/p>\n
<b>Figure 1<\/b>: The tripartite view of validity, which includes criterion-related, content, and construct validity.<\/p>\n
Content Validity<\/h3>\n
The idea behind content validity is that the questions administered in a survey, questionnaire, usability test, or focus group come from a larger pool of relevant content. For example, if you’re measuring the vocabulary of third graders, your evaluation includes a subset of the words third graders need to learn.<\/p>\n