Assessing Interrater Reliability in UX Research
Do researchers agree on what the problems in an interface are? Will they group those problems into the same categories? When coding open-ended comments in a survey, will different researchers classify the same comments differently? These discovery and classification activities are common in UX research, but they are often conducted by a single person.