Why Writing Good Survey Questions Is Not Enough: EFA and CFA in Scale Development
- Bahareh Jozranjbar

In UX research, psychology, human factors, and AI evaluation, we often want to measure things we cannot directly observe. Trust in AI, cognitive load, frustration, confidence, perceived control, and motivation are all examples of constructs. They matter a lot, but they cannot be measured with a ruler or directly observed like task time or click count. That is why researchers build questionnaires and scales. The problem is that writing a list of good-sounding questions does not automatically create a valid measure.
This is one of the biggest misunderstandings in survey design. A questionnaire may look polished, clear, and professional, but that does not mean it is truly measuring what the researcher thinks it is measuring. Some items may be too weak. Some may overlap with other ideas. Some may sound different to the researcher but end up meaning the same thing to respondents. This is exactly why Exploratory Factor Analysis, or EFA, and Confirmatory Factor Analysis, or CFA, are so important in scale development.
What EFA and CFA Actually Do
EFA and CFA are part of the core logic of building a defensible scale. EFA helps researchers discover the structure that is actually present in the data. CFA helps researchers test whether that structure holds up when formally specified. In simple terms, EFA is for discovery and CFA is for confirmation.
EFA looks at patterns of correlations across items and identifies which questions tend to move together. If responses to several items rise and fall together across respondents, that usually suggests the items reflect the same underlying factor. This helps the researcher see how many dimensions are present and which items belong to each one. CFA comes later. Once the researcher has a proposed structure, CFA tests whether that exact model fits the data well enough to be trusted.
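To make the discovery step concrete, here is a minimal numpy sketch, not a full EFA (real analyses use rotation and dedicated tools such as the factor_analyzer package or R's psych). It simulates responses driven by two hypothetical latent factors and then counts how many eigenvalues of the item correlation matrix exceed 1, the classic Kaiser criterion for the number of dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
trust = rng.normal(size=n)   # hypothetical latent factor 1
load = rng.normal(size=n)    # hypothetical latent factor 2

# Six observed items: three driven by each factor, plus unique noise
items = np.column_stack(
    [c * trust + rng.normal(scale=0.6, size=n) for c in (0.8, 0.7, 0.9)]
    + [c * load + rng.normal(scale=0.6, size=n) for c in (0.8, 0.7, 0.9)]
)

R = np.corrcoef(items, rowvar=False)   # 6 x 6 item correlation matrix
eigenvalues = np.linalg.eigvalsh(R)    # ascending order

# Kaiser criterion: retain one factor per eigenvalue greater than 1
n_factors = int(np.sum(eigenvalues > 1))
print(n_factors)   # 2: the items cluster into two dimensions
```

The Kaiser criterion is only one heuristic; in practice researchers also use scree plots and parallel analysis before settling on a factor count.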
Why Good Writing Alone Is Not Enough
Researchers often assume that if they wrote items carefully, the items must reflect the intended dimensions. But respondents do not always interpret questions the way researchers expect. For example, a researcher may think they wrote separate items for trust, optimism, and comfort with technology, but the data may show that respondents treat those questions as the same idea. In other cases, a question may not clearly belong anywhere at all. Without factor analysis, these issues stay hidden.
That is why scale development must be treated as an empirical process, not just a writing exercise. You cannot know whether a scale works by reading it and thinking it looks right. You have to administer it, analyze how people respond, and let the structure emerge from the data before you can claim that the scale is measuring something meaningful.
Why EFA Matters Early in Scale Development
EFA is especially useful in the early stage of scale development because it helps uncover problems that theory alone cannot detect. It can show that some items do not load strongly on any factor, which means they are weak and probably should be removed. It can show that some items cross-load on more than one factor, which means they are ambiguous. It can also reveal unexpected dimensions or fewer dimensions than originally planned.
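The weak-item and cross-loading checks above can be sketched in numpy. This simulation plants a deliberately useless pure-noise item among six substantive ones and uses unrotated principal-component loadings as a simplified stand-in for EFA loadings; the 0.4 and 0.3 thresholds are common rules of thumb, not fixed standards:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
f1 = rng.normal(size=n)   # hypothetical factor 1
f2 = rng.normal(size=n)   # hypothetical factor 2

def item(factor, c):
    # standardized item: loading c on the factor plus unique noise
    return c * factor + rng.normal(scale=np.sqrt(1 - c**2), size=n)

# Six substantive items plus one pure-noise item (index 6)
items = np.column_stack(
    [item(f1, c) for c in (0.85, 0.80, 0.90)]
    + [item(f2, c) for c in (0.65, 0.60, 0.70)]
    + [rng.normal(size=n)]
)

R = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)          # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated principal-component loadings on the first two dimensions
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])

# Items whose strongest loading is under 0.4 are candidates for removal;
# items loading above 0.3 on both dimensions are ambiguous (cross-loading)
weak = np.where(np.abs(loadings).max(axis=1) < 0.4)[0]
cross = np.where(np.abs(loadings).min(axis=1) > 0.3)[0]
print("weak:", weak, "cross-loading:", cross)  # weak is [6], the noise item
```

In this clean simulation only the planted noise item is flagged; real data are messier, which is exactly why the check is worth running.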
This is often where researchers realize that their conceptual model and the response data do not align as cleanly as they expected. EFA helps prevent premature confidence. Instead of assuming the structure is there, it forces the researcher to check whether the items actually behave like a coherent measure.
Why CFA Matters for Validation
CFA plays a different role. Once a possible structure has been identified, CFA tests whether that structure is defensible. In CFA, the researcher specifies which items belong to which factors and then evaluates whether the model fits the observed data. This is important because a pattern found in one sample may be sample-specific. What looks clean in an exploratory analysis may not replicate.
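The core logic of CFA, comparing the correlations a hypothesized model implies against the correlations actually observed, can be sketched as follows. This is a simplification: real CFA estimates the loadings by maximum likelihood in tools such as lavaan or semopy and reports indices like CFI and RMSEA, whereas here the standardized loadings are simply fixed in advance and an SRMR-style residual summary is computed:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
f = rng.normal(size=n)
lam = np.array([0.8, 0.7, 0.6, 0.75])   # hypothesized standardized loadings

# Simulate four items that genuinely follow the one-factor model
items = np.column_stack(
    [c * f + rng.normal(scale=np.sqrt(1 - c**2), size=n) for c in lam]
)
S = np.corrcoef(items, rowvar=False)    # observed correlations

# Model-implied correlations under the one-factor structure
implied = np.outer(lam, lam)
np.fill_diagonal(implied, 1.0)

# SRMR-style summary: root mean square of the off-diagonal residuals
resid = S - implied
off = resid[np.triu_indices_from(resid, k=1)]
srmr = np.sqrt(np.mean(off**2))
print(round(srmr, 3))   # small, well under the conventional 0.08 cutoff
```

Large residuals between implied and observed correlations are the statistical signature of a structure that does not replicate in a new sample.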
This is one reason researchers should not treat EFA and CFA as interchangeable. EFA helps generate a structure. CFA helps validate it. Using EFA alone leaves open the risk that the findings are unstable. Using CFA alone can be risky too, because it means the researcher is testing a structure based only on theory without first exploring whether the items behave as expected. The strongest scale development process usually uses both.
The Practical Steps of Building a Good Scale
A strong scale usually starts with a clear definition of the construct. The researcher needs to know exactly what is included, what is excluded, and how the construct differs from related ideas. Then comes a literature review to understand how others have defined the construct and whether good measures already exist. After that, the researcher writes a large pool of items, often more than they expect to keep in the final version.
The next steps are just as important. Experts review the items for content validity. Participants from the target population may be included in cognitive interviews or pilot testing to identify confusing wording. Then the researcher revises the items, collects enough data (rules of thumb suggest at least 5 to 10 respondents per item, and larger samples where possible), and runs EFA to examine the structure. Weak or ambiguous items are removed, and the factor solution is refined. After that, CFA is run on a separate sample, or a held-out part of the sample, to test whether the structure holds up. Reliability and validity evidence are then examined before the scale is finalized.
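Two of the later steps, holding out part of the sample for confirmation and checking reliability, can be sketched in a few lines. The data here are simulated, and Cronbach's alpha is computed with its standard formula; in practice the EFA would run on the exploration half and the CFA on the held-out half:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 600
f = rng.normal(size=n)
items = np.column_stack(
    [0.8 * f + rng.normal(scale=0.6, size=n) for _ in range(5)]
)

# Split respondents: explore on one half, confirm on the held-out half
idx = rng.permutation(n)
explore, confirm = items[idx[: n // 2]], items[idx[n // 2:]]

def cronbach_alpha(x):
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

alpha = cronbach_alpha(confirm)
print(round(alpha, 2))   # high for these strongly related simulated items
```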
Common Mistakes Researchers Make
A common mistake is treating reliability as if it is enough. A scale can be internally consistent and still measure the wrong thing. Another mistake is running EFA and CFA on the same sample and presenting the result as confirmation. That is not true validation. Researchers also sometimes keep weak items because they like how the questions sound, or they remove items based only on statistics without thinking about theory and content validity. These choices damage the scale in different ways: weak items add noise, and purely statistical pruning can narrow or distort the construct the scale was meant to capture.
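The first mistake is easy to demonstrate with simulated data: a questionnaire that accidentally mixes two correlated but distinct factors (the factor labels and numbers here are invented for illustration) can still produce a respectable Cronbach's alpha, even though an eigenvalue check shows it is not unidimensional:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# Two correlated but distinct hypothetical factors (r = 0.3)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
f = rng.multivariate_normal(np.zeros(2), cov, size=n)

# Six items: three per factor, all treated as one scale by mistake
items = np.column_stack(
    [0.9 * f[:, 0] + rng.normal(scale=0.44, size=n) for _ in range(3)]
    + [0.9 * f[:, 1] + rng.normal(scale=0.44, size=n) for _ in range(3)]
)

def cronbach_alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

alpha = cronbach_alpha(items)
R = np.corrcoef(items, rowvar=False)
n_factors = int(np.sum(np.linalg.eigvalsh(R) > 1))
print(round(alpha, 2), n_factors)   # respectable alpha, yet two factors
```

An analyst who stopped at alpha would report a "reliable" scale and never notice that its total score blends two different constructs.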
Another common issue is over-modifying a CFA model just to improve fit. If a researcher keeps adding correlated errors or adjusting the model without a strong theoretical reason, the analysis becomes more exploratory than confirmatory. Good fit indices alone do not prove that a scale is valid. They only show that the internal structure is statistically plausible. External validity still needs to be demonstrated.
Why This Matters for UX and AI Research
This topic is especially important in UX research and AI evaluation. Teams are increasingly trying to measure ideas like trust in AI, transparency, sense of control, perceived safety, and reliance. These are important constructs, but they are easy to measure poorly. If the underlying scale is weak, then the resulting insights may also be weak, even if the data look polished in a dashboard or report.
Better measurement leads to better decisions. When researchers use EFA and CFA thoughtfully, they are not just doing psychometrics for its own sake. They are making sure that product decisions, research claims, and design recommendations are built on something more solid than intuition.
Final Takeaway
A list of survey questions is not automatically a scale. EFA helps researchers discover whether the items form a meaningful structure. CFA helps them test whether that structure is stable and defensible. Together, these methods turn a set of items into something much more trustworthy. If you want to measure a construct seriously, writing the questions is only the start.