How to Decide Your UX Interview Sample Size
- Mohsen Rafiei
You’re walking through a big clothing store, flipping through rack after rack trying to find something interesting. At first everything feels new. Different colors, styles, cuts. After a few minutes, though, you realize you’re seeing almost the same things again and again. Sure, the store is huge, but the variety isn’t unlimited. No matter which aisle you turn into, it’s basically more of what you’ve already seen. At some point you stop searching and think okay, I get it, nothing new here. Congratulations, you just experienced saturation.
This simple everyday moment captures one of the most important concepts in qualitative research and interview methodology. Saturation is the point where continuing to look, ask, or collect more data stops adding genuinely new information. It is not about running out of time, hitting a quota, or feeling tired. It is about realizing that the information space itself has been covered. In UX interviews and qualitative studies, this principle determines when data collection should stop and whether the insights we walk away with are actually dependable. But how do we know when we’ve truly reached that point? To answer that, we first need to tackle a few key questions.

Why “we will just keep interviewing until it feels like enough” is not good enough
In a lot of UX and qualitative projects, people say things like “we will interview until it feels saturated” or “we will stop when no new themes appear.” It sounds reasonable, but the way it is often used is basically a vibe check, not a method. The qualitative literature is very explicit that this kind of unexamined logic can hurt the credibility of the findings and the validity of the conclusions.
Several large reviews of published qualitative studies show the same pattern. Researchers frequently claim they reached saturation, but give almost no detail on what that meant in practice, how they tracked it over time, or how they decided to stop data collection. In many papers, saturation is mentioned as a justification for sample size with no concrete description of the process at all. This is exactly the problem. Saturation gets treated as a magic word that closes the conversation, rather than an observable point in a traceable analytic process.
Methodologists keep warning that this is not just a reporting issue. It directly affects quality. Failing to reach or properly judge data saturation undermines content validity. If you stop too early, you risk missing important themes. If you stop for the wrong reasons, you might simply be exhausting your energy or recruitment channels, not exhausting the phenomenon you are studying.
What saturation actually means
One of the biggest sources of confusion in UX and qualitative work is that saturation gets used as a vague shorthand rather than as a clearly defined methodological concept. Across the literature, saturation does not mean being tired of interviewing or thinking that you have enough data. It means reaching a point in data collection and analysis where additional data stop contributing new information relevant to the research question.
Data saturation is defined as the moment when there is enough data to replicate the study, the ability to gather new information has been exhausted, and further coding no longer produces new insights. This definition matters because it frames saturation as something tied to analytic productivity, not to sample size or researcher fatigue. The question is not how many people have we talked to, but are we still learning anything meaningfully new from the data.
Saturation was first developed within grounded theory under the term theoretical saturation, referring to the point where further data no longer add properties or relationships to the developing theory. Contemporary qualitative research uses the broader term data saturation and applies it to inductive thematic analysis, focusing on the flow of new information rather than formal theory building. It is not about complete coverage of everything possible. It is about diminishing analytic returns. There is also no universal saturation number that applies to all studies. Research designs, populations, methods, and levels of analysis vary too widely for that. Ethnographic fieldwork, phenomenological interviews, multi-site comparative studies, and narrow UX workflow studies all saturate on different timelines because their goals and data structures differ.
Systematic reviews show that several factors shape the pace of saturation, including the use of pre-determined codes, the relevance and homogeneity of participants, the number of qualitative methods combined, and the length or depth of interview sessions. Saturation is therefore an interaction between study design and the complexity of the phenomenon being studied. It is not a simple counting exercise. Saturation should be understood not as a number or a gut feeling, but as a process outcome. It is the point where new data stop expanding the conceptual or thematic space in a meaningful way. In UX research terms, this means saturation is not about pleasing stakeholders or meeting deadlines. It is about reaching the point where further recruitment becomes analytically redundant, not merely operationally inconvenient.
Code saturation vs meaning saturation
Not all saturation is the same thing. Reaching saturation at the level of surface categories does not mean you have reached a genuinely saturated understanding of the user experience. The literature clearly separates two layers. Code saturation is the point where no new high level topics appear. You already have the buckets and new interviews are not introducing additional categories. Meaning saturation goes deeper. It is reached only when continued interviews stop adding nuance, variation, explanations, contradictions, or contextual detail to those existing categories.
This difference matters because many UX teams stop too early. Code saturation usually happens fast. After a handful of interviews teams often feel confident they know the main issues. But meaning saturation takes longer. It comes from hearing how issues vary across user types, situations, pressures, goals, or constraints. Without that depth, themes stay superficial and design decisions are based on shallow summaries rather than grounded understanding.
This is why counting interviews alone is misleading. Ten interviews may stabilize a theme list but still leave major gaps in interpretation. In narrow and homogeneous studies both layers might converge quickly. In more complex or diverse contexts they often do not. Without checking for stabilized meaning, claims of saturation usually refer only to surface coverage and leave uncertainty about the strength and reliability of the insights.
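To make the distinction concrete, here is a minimal sketch in Python, assuming each coded segment is tagged with both a top-level code and a short nuance label. The interviews, codes, and labels below are hypothetical, purely for illustration; the point is that counting both layers per interview usually shows the top-level list flattening out well before the nuances do.

```python
# Minimal sketch: tracking code saturation vs meaning saturation per interview.
# All codes and nuance labels here are hypothetical stand-ins for real coded transcripts.

coded_interviews = [
    [("navigation", "cannot find settings"), ("trust", "unclear data use")],
    [("navigation", "too many menu levels"), ("trust", "unclear data use")],
    [("performance", "slow sync"), ("navigation", "cannot find settings")],
    [("trust", "fear of lock-in"), ("performance", "slow sync")],
    [("navigation", "labels feel ambiguous"), ("trust", "unclear data use")],
]

seen_codes = set()     # top-level categories -> code saturation
seen_nuances = set()   # (code, nuance) pairs -> meaning saturation

for i, interview in enumerate(coded_interviews, start=1):
    new_codes = {code for code, _ in interview} - seen_codes
    new_nuances = set(interview) - seen_nuances
    seen_codes |= {code for code, _ in interview}
    seen_nuances |= set(interview)
    print(f"Interview {i}: {len(new_codes)} new codes, {len(new_nuances)} new nuances")
```

With these made-up interviews, new top-level codes stop appearing after the third session while new nuances keep trickling in, which is exactly the gap between code saturation and meaning saturation.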
What the evidence actually says about interview sample sizes
Once saturation is separated into code saturation and meaning saturation the discussion about sample size becomes clearer. The strongest evidence shows that in focused studies with relatively homogeneous participants code saturation often appears between about nine and seventeen interviews. Focus groups reach it even earlier because multiple perspectives are captured in single sessions.
These numbers are not universal rules. They change when participant diversity increases, when studies span sites or cultures, or when goals shift from simple description to deeper explanation. Cross-site and cross-cultural work often requires totals in the twenty to forty interview range to stabilize patterns. Across studies the consistent conclusion is that sample size does not drive saturation by itself. Study structure does. Homogeneous samples, narrow questions, and focused interview guides hit saturation faster. Heterogeneous samples, complex contexts, and exploratory designs saturate far more slowly. An early plateau of themes does not equal analytic completion. Code saturation tends to arrive first while meaning saturation usually lags.
How saturation can actually be measured during a study
Most teams claim saturation after data collection ends and rarely show how they reached that decision. Several researchers criticize this approach and argue that saturation should be monitored during the study, not guessed afterward.
One practical method uses three elements. Base size refers to an initial set of interviews used to build the core thematic framework. Run length refers to the number of consecutive interviews examined for the appearance of new codes. The new information threshold defines how much novelty is tolerated before declaring saturation. When interviews following the base set consistently fail to add new themes beyond that threshold, saturation is documented rather than assumed. More quantitative work models theme discovery as a growth curve, where each interview contributes some amount of new information until the pool of discoverable themes is exhausted. Saturation becomes a measurable percentage rather than a vague stopping point and can be used to estimate how many more interviews might be needed for fuller coverage.
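As a rough sketch of the first approach, here is what a base size, run length, and new information threshold check could look like in Python. The function name, parameter defaults, and per-interview counts are all hypothetical, loosely in the spirit of published stopping-rule procedures rather than a reproduction of any specific one.

```python
# Sketch of a base-size / run-length / new-information-threshold stopping rule.
# new_codes_per_interview holds how many previously unseen codes each interview added.

def saturation_reached(new_codes_per_interview, base_size=4, run_length=2, threshold=0.05):
    """Return the interview number at which saturation is declared, or None."""
    base_total = sum(new_codes_per_interview[:base_size])
    if base_total == 0:
        return None  # the base set produced no codes, so there is no framework yet
    for i in range(base_size, len(new_codes_per_interview) - run_length + 1):
        run_total = sum(new_codes_per_interview[i:i + run_length])
        # Declare saturation once a full run adds less than `threshold`
        # of the information captured by the base set.
        if run_total / base_total < threshold:
            return i + run_length
    return None

new_codes_per_interview = [12, 7, 4, 3, 1, 1, 0, 0, 1, 0]  # hypothetical counts
print(saturation_reached(new_codes_per_interview))  # -> 7 with these counts
```

The exact parameter values are a design decision to make and report up front; the useful part is that the stopping point is computed from the data rather than asserted after the fact.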
Both approaches show the same principle. Saturation should be tracked through accumulation of new information rather than simple interview counts or researcher intuition.
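For the growth-curve approach, a similarly rough sketch is possible. Assuming cumulative unique-theme counts per interview (hypothetical below) and a simple exponential saturation model, which is one reasonable curve choice rather than a prescribed standard, you can estimate the size of the discoverable theme pool, express current coverage as a percentage, and project how many interviews a target coverage level would take.

```python
# Sketch of the growth-curve view of saturation: fit a curve to cumulative
# unique themes and read off estimated coverage. Counts are hypothetical.

import numpy as np
from scipy.optimize import curve_fit

def theme_growth(n, total_themes, rate):
    # Expected unique themes after n interviews under an exponential saturation model.
    return total_themes * (1 - np.exp(-rate * n))

interviews = np.arange(1, 7)
cumulative_themes = np.array([10, 16, 20, 23, 25, 26])  # hypothetical running totals

(total_est, rate_est), _ = curve_fit(theme_growth, interviews, cumulative_themes, p0=[30.0, 0.5])

coverage = cumulative_themes[-1] / total_est
n_for_95 = np.log(1 - 0.95) / -rate_est  # interviews needed to reach ~95% of the pool

print(f"Estimated discoverable themes: {total_est:.1f}")
print(f"Coverage so far: {coverage:.0%}")
print(f"Interviews for ~95% coverage: {np.ceil(n_for_95):.0f}")
```

With these made-up counts the fit suggests most, but not all, of the theme pool has surfaced and roughly one more interview would be needed to approach 95 percent coverage, which is a far more defensible statement than a bare claim that it felt saturated.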
What pushes saturation earlier or later
Saturation speed depends largely on study design, and reviews consistently identify several drivers. Pre-defined coding frameworks tend to accelerate saturation while fully inductive designs slow it down. Participant homogeneity speeds saturation while diversity in background, role, geography, or usage contexts stretches it out. Using multiple methods increases richness but delays stabilization because different techniques surface different aspects of experience. Longer exploratory interviews can deepen insight per participant but also increase analytic complexity. Theme overlap also matters. When experiences strongly overlap across users, saturation arrives sooner. When experiences differ widely, thematic accumulation slows and may require larger samples.
In simple workflow studies with narrow user segments saturation often arrives early. In research exploring trust, decision making, cross product behavior, or cultural variation saturation arrives later and should be planned for accordingly.
What all this means for UX research practice
Saturation papers point to a different message than the popular UX shortcut of interviewing a few people and calling it saturated. They do not claim massive samples are always required. They argue that saturation must be deliberately designed, tracked, and justified. Well scoped UX studies with homogeneous users can reach useful saturation quickly. Broader or more explanatory projects require larger samples and longer analysis periods. Teams should distinguish whether they have only stabilized their theme list or whether meaning has also settled.
Saturation should never be a vibe. It should be an explicit design parameter decided up front, monitored as interviews progress, and reported transparently. Track new codes cumulatively. Set stopping rules. Be honest about what layer of saturation was reached. When UX teams apply this discipline, small sample qualitative work becomes defensible research rather than opinion backed by transcripts. This is exactly what the saturation literature pushes us toward.
If you are running interview projects right now and are unsure where to start or how to design them rigorously, this is exactly the kind of problem we work on at PUXLab. We help teams design interview protocols, define saturation plans upfront, track new information during collection, and translate qualitative findings into decisions that stand up to real scrutiny. If you want to move beyond gut-driven research and toward truly defensible UX evidence, you can reach us anytime at admin@puxlab.com.