A UX Framework for Measuring Feature Awareness
- Bahareh Jozranjbar

- Dec 11
- 11 min read

Most product teams still assume that if a feature ships, users will naturally discover it. In practice that assumption fails all the time. New capabilities launch, engineering and design celebrate, dashboards show a small bump in traffic, and then adoption plateaus at a level that cannot possibly justify the investment. The real problem is usually not usability in the narrow sense. It is feature awareness: do people even notice that the thing exists, and do they understand what it is for?
Feature awareness as a cognitive process
Awareness is not a simple yes or no state. It is a sequence of cognitive events. A user must first perceive the feature, then allocate attention to it, then make sense of it by fitting it into an existing schema, and finally encode it well enough that they can recall and use it later. If the chain breaks at any of these stages, adoption will look mysteriously low even though the feature is technically “available”.
Inattentional blindness
The classic invisible gorilla experiment shows how people can literally fail to see a highly visible object when their attention is focused elsewhere. In interfaces the same thing happens. When someone is trying to complete a checkout flow or find a specific document, their perceptual load is high. The brain aggressively filters out anything that does not look like it will help with the current goal.
That filtering happens at the level of attention, not at the level of raw sensation. Photons hit the retina, but the visual cortex does not surface the information as conscious content. A tiny “new” badge or subtle icon in the corner of a dense workflow has almost no chance of being processed. From the user’s perspective, it never existed.
Change blindness
Change blindness is a related mechanism. The brain does not maintain a pixel perfect snapshot of the interface. It keeps a rough gist. When a page reloads or a panel opens, new elements can appear without entering that gist. If a familiar “Save” button becomes “Submit”, or a new option appears in a menu without any motion, many users will persist with their old mental model as if nothing changed.
The practical implication is simple. When you introduce a new feature, you need a distinct temporal signal that forces the brain to update its internal model. Motion, temporary spotlight overlays, or micro animations can all exploit the visual system’s sensitivity to change and help overcome change blindness.
Cognitive load and the information store
Cognitive Load Theory gives a helpful vocabulary for why discovery fails even when features are technically visible. Intrinsic load comes from the task itself. Extraneous load comes from the way information is presented. Germane load is the effort the user can spend on learning and building new schemas.
Feature discovery lives in that germane space. If the combination of intrinsic and extraneous load already saturates working memory, there is no capacity left to explore something new. Users enter a state of cognitive tunneling. They stick to familiar flows and ignore novel tools because exploration feels too expensive.
The information store principle adds another layer. People rely heavily on long term schemas when they interact with a product. When a feature violates those expectations, discovery friction increases. Place a critical action in a rarely used footer and it will be treated like an ad or a legal link. Style a useful tool so that it resembles a banner and banner blindness will kill it.
Why recall matters more than recognition
Most awareness studies lean on recognition questions. Did you see this button? Have you noticed this icon? At first glance that seems reasonable. In practice it produces optimistic and misleading data.
Recognition is the feeling of familiarity when a stimulus is presented again. It is relatively shallow. Users can say yes because they vaguely remember the shape, or because they infer the socially desirable answer. Recall is much stricter. It requires people to retrieve the feature from memory without a direct cue.
From a UX perspective recognition is excellent for design. Menus, icons, and visual patterns that are easy to recognize reduce cognitive effort. For measurement, recall is the better test. If users cannot describe what new features they saw, in their own words and without prompts, those features have not been encoded deeply enough to drive real adoption.
A robust awareness interview uses a funnel sequence. You start with open free recall: “Walk me through what you noticed on the screen and how you completed the task.” Then you move to functional prompts: “Were there any tools that could have helped you do X?” Only at the end do you show screenshots and ask whether a specific button or panel was visible. This order lets you distinguish between high salience items that surface spontaneously, features that are understood but not top of mind, and elements that were never perceived at all.
To detect response bias you can even seed a fake feature into awareness checklists. If a nontrivial fraction of respondents claim to have seen a ghost element, you know a portion of your positive responses are driven by acquiescence or memory construction rather than true experience.
Study designs that reveal real discovery
Once you take the cognitive side seriously, the way you design your studies changes quite a bit.
Unmoderated tasks for spontaneous discovery
If you want to know whether people discover a feature on their own, you need to let them work without guidance. That usually points to an unmoderated setup with outcome based goals. Instead of instructing them to use the new bulk edit tool, you ask them to update five items as efficiently as possible and watch what happens.
The key metric here is the spontaneous discovery rate: out of all participants who could have used the feature, what proportion chose it without being told it existed? Only after the task is completed do you ask about other ways they might have solved the problem or what else they noticed on the page. This separates functional success from true awareness.
Retrospective think aloud with behavioral traces
Traditional think aloud protocols ask participants to narrate their thoughts while working. That narration itself changes how they scan the UI. They become more systematic and reflective than real users. Awareness metrics from those sessions are almost always inflated.
A better approach is retrospective think aloud combined with eye tracking or high quality session recording. You let participants complete the task silently, then immediately watch the replay together. You pause at moments of hesitation or gaze shifts and ask what they were looking at or thinking about.
This method tells you whether the feature was never in their visual field, was looked at briefly and dismissed as irrelevant, or was considered but rejected as too complex or risky. These are very different failure modes and each one suggests a different design or content intervention.
A/B placement tests that go beyond clicks
When the key question is where to place a feature, controlled experiments are still powerful. The design pattern is straightforward. Randomly assign users to different placements and instrument the interface carefully. The important part is not to stop at click through rates.
Time to first interaction tells you how quickly the feature becomes part of the workflow. Hover or focus rate can act as an attention proxy. If many users hover near the element or pause their gaze on it but fail to click, you are dealing with comprehension or trust issues rather than pure visibility.
From telemetry to insight
Self reports tell you how users experienced the interface. Telemetry tells you what they actually did. For feature awareness, you need both.
Core adoption metrics
It is helpful to structure behavioral metrics in layers. Breadth of adoption answers how many active users have engaged with the feature at least once. Low breadth is your first signal of discoverability issues.
Depth of adoption captures how often those adopters return to the feature. High breadth with low depth suggests that users find it but do not see enough value to keep using it.
Time to adopt measures how long it takes between a relevant reference point, such as account creation or feature release, and first use. Stickiness can be approximated by ratios like daily active users of the feature over monthly active users. That number helps you distinguish novelty clicks from genuine integration into the user’s routine.
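To make those layers concrete, here is a minimal pandas sketch, assuming a hypothetical telemetry export with user_id, event_date, signup_date, and a used_feature flag; the column names, file name, and the 28-day stickiness window are all illustrative rather than prescriptive.

```python
import pandas as pd

# Hypothetical telemetry: one row per user per active day, with a 0/1 feature-use flag.
events = pd.read_csv("feature_events.csv", parse_dates=["event_date", "signup_date"])
events["used_feature"] = events["used_feature"].astype(bool)

active_users = events["user_id"].nunique()
adopters = events.loc[events["used_feature"], "user_id"].nunique()

# Breadth: share of active users who have touched the feature at least once.
breadth = adopters / active_users

# Depth: among adopters, the average number of distinct days they returned to the feature.
depth = (
    events[events["used_feature"]]
    .groupby("user_id")["event_date"]
    .nunique()
    .mean()
)

# Time to adopt: days from signup to first feature use, summarised by the median.
first_use = (
    events[events["used_feature"]]
    .groupby("user_id")
    .agg(first_use=("event_date", "min"), signup=("signup_date", "min"))
)
time_to_adopt = (first_use["first_use"] - first_use["signup"]).dt.days.median()

# Stickiness: feature DAU / feature MAU over a trailing 28-day window (illustrative).
window_start = events["event_date"].max() - pd.Timedelta(days=27)
feature_recent = events[events["used_feature"] & (events["event_date"] >= window_start)]
dau = feature_recent.groupby("event_date")["user_id"].nunique().mean()
mau = feature_recent["user_id"].nunique()

print(f"breadth={breadth:.1%}, mean active feature days={depth:.1f}, "
      f"median days to adopt={time_to_adopt:.0f}, stickiness={dau / mau:.2f}")
```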
Survival analysis for discovery
Most dashboards report an average time to first use. That metric is fundamentally biased because it only counts people who eventually interact with the feature. The larger the group of users who never discover it at all, the more optimistic that average will look.
Survival analysis treats feature discovery as a time to event problem. You model the probability that a user has not yet discovered the feature as a function of time. The resulting curve is a discovery profile. A sharp early drop means the feature is easy to find for most users. A flat or slowly declining curve means that large segments never encounter it.
Crucially, survival methods handle censored data. Users who churn or who have not yet found the feature by the end of the observation window still contribute information up to the last moment you observed them. That avoids the survivorship bias baked into naive averages.
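As a sketch of what this looks like in practice, the lifelines package makes the Kaplan-Meier estimate a few lines of code. The per-user durations and discovery flags below are invented for illustration; in a real analysis they would come from your telemetry.

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Hypothetical per-user data: days_observed is time to first use, or time to
# churn / end of the observation window for users who never found it (discovered = 0).
df = pd.DataFrame({
    "days_observed": [1, 3, 3, 7, 14, 30, 30, 30],
    "discovered":    [1, 1, 1, 1, 0,  1,  0,  0],
})

kmf = KaplanMeierFitter()
kmf.fit(durations=df["days_observed"], event_observed=df["discovered"])

# The survival function reads as "probability a user has NOT yet discovered the
# feature by day t" — a sharp early drop means fast, widespread discovery.
print(kmf.survival_function_)
print("median time to discovery:", kmf.median_survival_time_)
```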
A Cox proportional hazards model lets you go one step further and ask which factors predict discovery speed. You can include device type, tenure, segment, or acquisition channel as predictors and estimate how much each one increases or decreases the instantaneous probability of finding the feature.
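Here is a hedged sketch of that model, again with lifelines, assuming a per-user table whose columns (days_observed, discovered, is_mobile, tenure_months, from_paid_channel) are hypothetical stand-ins for whatever predictors you actually track.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical per-user frame: duration, discovery indicator, and candidate predictors.
df = pd.read_csv("discovery_durations.csv")

cph = CoxPHFitter()
cph.fit(
    df[["days_observed", "discovered", "is_mobile", "tenure_months", "from_paid_channel"]],
    duration_col="days_observed",
    event_col="discovered",
)

# Hazard ratios above 1 indicate faster discovery for that group, below 1 slower.
cph.print_summary()
```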
Navigation as a Markov process
Another common failure pattern is structural. Users simply never traverse the part of the product where the feature resides. Markov chains provide a natural way to analyze this. You treat screens or key states as nodes and compute the transition probabilities between them.
Once you have a transition matrix, you can spot feature deserts: navigation loops and flows that never lead to the new capability. If nine out of ten journeys bounce between home and search, while your feature lives in settings, no amount of visual polish will fix the problem. The solution lives in information architecture, entry points, and flows, not in button styling.
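A rough sketch of how to build that transition matrix from a navigation log, assuming hypothetical session_id, timestamp, and screen columns, and a feature that happens to live on a settings screen.

```python
import pandas as pd

# Hypothetical navigation log: one row per screen view, ordered within each session.
nav = pd.read_csv("screen_views.csv").sort_values(["session_id", "timestamp"])

# Pair each screen with the next screen viewed in the same session.
nav["next_screen"] = nav.groupby("session_id")["screen"].shift(-1)
pairs = nav.dropna(subset=["next_screen"])

# Row-normalised transition matrix: P(next screen | current screen).
transition_matrix = pd.crosstab(pairs["screen"], pairs["next_screen"], normalize="index")

# Screens that almost never lead toward the feature's home screen mark a feature desert.
feature_screen = "settings"  # hypothetical location of the new capability
inbound = transition_matrix.get(feature_screen, pd.Series(dtype=float))
print(inbound.sort_values(ascending=False))
```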
Predicting discovery with logistic regression
When you care about a binary outcome within a fixed window, such as found versus not found during the first week, logistic regression is a good workhorse. You can include all the usual suspects as predictors and obtain an interpretable model of discovery odds.
Unlike simple correlations, a regression can control for confounding factors. If mobile users have lower discovery, is that because of the device, shorter sessions, or different tasks? A model that includes all three lets you isolate the device effect more cleanly.
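A minimal sketch of such a model with statsmodels, assuming a hypothetical per-user table where discovered is a 0/1 flag for first-week discovery and the predictors mirror the examples above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-user frame: first-week discovery flag plus candidate predictors.
df = pd.read_csv("first_week_discovery.csv")

model = smf.logit(
    "discovered ~ is_mobile + avg_session_minutes + C(primary_task)", data=df
).fit()

print(model.summary())

# Odds ratios are easier to discuss with stakeholders than raw log-odds coefficients.
print(np.exp(model.params))
```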
Getting the statistics right for small UX samples
Many UX teams work with modest sample sizes. Unfortunately, a lot of common statistical recipes assume large samples and break badly in the small-N regime, producing either absurd results, such as confidence intervals that dip below a zero percent discovery rate, or unjustified certainty.
The adjusted Wald interval is a simple fix for proportions like discovery rate. Instead of plugging the raw successes and trials into the usual formula, you add two imaginary successes and two imaginary failures, then compute the interval. This small correction dramatically improves coverage when sample sizes are small or proportions are near zero or one.
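A small sketch of that correction in plain Python; the example counts are made up.

```python
import math

def adjusted_wald(successes: int, trials: int, z: float = 1.96):
    """Adjusted Wald interval for a proportion.

    Adds two imaginary successes and two imaginary failures before computing
    the usual Wald interval, which keeps coverage reasonable for small samples
    and for proportions near zero or one.
    """
    n_adj = trials + 4
    p_adj = (successes + 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

# Example: 3 of 12 participants discovered the feature on their own.
print(adjusted_wald(3, 12))
```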
For timing metrics like time to first use, distributions are often skewed with a long tail of very slow discoverers. Bootstrapping is a robust nonparametric way to estimate medians and confidence intervals without assuming normality. You resample your data with replacement many times, compute the statistic in each resample, and use the resulting distribution to form an interval.
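Here is a sketch of a percentile bootstrap for the median time to first use; the timing values below are invented to show the long right tail.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical times to first use, in days; note the long right tail.
times = np.array([0.5, 1, 1, 2, 2, 3, 4, 6, 9, 15, 31, 58])

# Resample with replacement many times and recompute the median each time.
boot_medians = np.array([
    np.median(rng.choice(times, size=times.size, replace=True))
    for _ in range(10_000)
])

# Percentile bootstrap interval for the median time to first use.
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(times):.1f} days, 95% CI [{lower:.1f}, {upper:.1f}]")
```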
When you compare two designs with small samples and noisy data, permutation tests can be more honest than t-tests. You pool the data, repeatedly shuffle the condition labels at random, and calculate how often you would see a difference as large as the observed one if the designs were truly equivalent.
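A sketch of a two-sample permutation test on discovery rates, with invented 0/1 outcomes for two placements.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical discovery indicators (1 = found the feature) for two placements.
design_a = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
design_b = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

observed_diff = design_b.mean() - design_a.mean()
pooled = np.concatenate([design_a, design_b])

# Shuffle condition labels and count how often the shuffled difference
# is at least as extreme as the one we actually observed.
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    shuffled = rng.permutation(pooled)
    diff = shuffled[design_a.size:].mean() - shuffled[:design_a.size].mean()
    if abs(diff) >= abs(observed_diff):
        extreme += 1

print(f"observed difference = {observed_diff:.2f}, permutation p ≈ {extreme / n_perm:.3f}")
```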
Finally, when the same users perform multiple tasks or discover multiple features, their observations are correlated. Linear mixed effects models account for that structure by modeling both fixed effects of the design and random effects for users and tasks. This prevents a single very skilled participant from dominating your conclusions.
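As a sketch, here is a linear mixed model in statsmodels with a random intercept per participant, assuming a hypothetical long-format table and using log time to discovery as a continuous outcome; a binary found/not-found outcome would call for a mixed logistic model instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per participant x feature,
# with time to discovery in seconds recorded for each task.
df = pd.read_csv("discovery_times_long.csv")
df["log_time"] = np.log(df["seconds_to_discovery"])

# Fixed effect of design variant, random intercept for each participant,
# so one very skilled participant cannot dominate the estimate.
model = smf.mixedlm("log_time ~ design_variant", data=df, groups=df["participant_id"])
result = model.fit()
print(result.summary())
```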
Bayesian architectures and adaptive optimization
Classical A/B testing is built around null hypothesis significance testing and fixed sample sizes. It was not designed for continuous monitoring or fast iteration, and it does not answer the question stakeholders actually care about: how likely it is that one variant is genuinely better than the other.
A Bayesian framing treats each variant’s performance as a probability distribution. Instead of saying the difference is or is not significant, you can say there is a certain probability that variant B is better than variant A, and quantify the expected loss if you were to choose the wrong one. This naturally supports sequential decision making. You can monitor results in real time and stop the test as soon as the probability of a meaningful advantage crosses your chosen threshold.
Priors make this even more realistic. If you know that most UI tweaks historically have tiny effects on discovery, you can encode that expectation as a weakly informative prior centered near zero. Early data spikes are then treated with skepticism, reducing false positives. As more data accumulates, the influence of the prior fades and the posterior converges toward the truth.
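A compact sketch of that Beta-Binomial framing, with invented counts; the Beta(2, 20) prior is an assumption standing in for “discovery rates are usually low”, and should be tuned to your own history.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical discovery counts per variant.
success_a, n_a = 48, 500
success_b, n_b = 63, 500

# Weakly informative prior that expects low discovery rates (an assumption).
prior_alpha, prior_beta = 2, 20

post_a = stats.beta(prior_alpha + success_a, prior_beta + n_a - success_a)
post_b = stats.beta(prior_alpha + success_b, prior_beta + n_b - success_b)

draws_a = post_a.rvs(100_000, random_state=rng)
draws_b = post_b.rvs(100_000, random_state=rng)

# Probability that B beats A, and the expected loss if we ship B and are wrong.
prob_b_better = (draws_b > draws_a).mean()
expected_loss_b = np.maximum(draws_a - draws_b, 0).mean()

print(f"P(B > A) = {prob_b_better:.3f}")
print(f"expected loss if we ship B = {expected_loss_b:.4f}")
```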
For products with heavy traffic, contextual multi-armed bandits go one step further by turning experimentation into an always-on allocation problem. Instead of holding a strict fifty-fifty split, the algorithm gradually routes more traffic to better performing placements while still exploring alternatives. Netflix’s evolving thumbnails are a canonical example, but the same machinery can be used for feature cues, entry points, or onboarding treatments.
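As a simplified sketch, here is plain (non-contextual) Thompson sampling over three hypothetical placements; a contextual bandit layers user features on top of the same idea. The true discovery rates are invented and unknown to the algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true discovery rates per placement (hidden from the algorithm).
true_rates = [0.04, 0.07, 0.05]
successes = np.ones(len(true_rates))   # Beta(1, 1) priors per arm
failures = np.ones(len(true_rates))

for _ in range(20_000):
    # Thompson sampling: draw a plausible rate per arm, serve the best draw.
    sampled = rng.beta(successes, failures)
    arm = int(np.argmax(sampled))
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

traffic_share = (successes + failures - 2) / 20_000
print("traffic share per placement:", np.round(traffic_share, 3))
print("estimated rates:", np.round(successes / (successes + failures), 3))
```

Over time the loop routes most traffic to the strongest placement while still spending a small share on the others, which is exactly the exploration-exploitation trade-off described above.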
Avoiding common traps
Several recurring mistakes undermine feature awareness work. Priming is one of them. Leading questions like “Do you like the new save feature?” presuppose both the existence and value of the feature. They activate the concept in memory before you measure it. Neutral, task-oriented instructions are safer. Ask users to ensure data is preserved, then see whether they choose the save feature on their own.
Another trap is uncontrolled multiple testing. It is easy to slice telemetry by device, geography, tenure, and dozens of other attributes, then celebrate whichever differences show a small p-value. Without controlling the false discovery rate, a good portion of those “insights” will be random noise. Procedures like Benjamini-Hochberg help keep the share of false positives among your declared findings under control.
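A short sketch of that correction with statsmodels, using invented p-values from hypothetical segment slices.

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from slicing discovery rate by many user segments.
p_values = [0.003, 0.012, 0.021, 0.034, 0.040, 0.210, 0.430, 0.610, 0.720, 0.940]

# Benjamini-Hochberg keeps the expected share of false positives among
# declared findings at or below alpha.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

for p, p_adj, keep in zip(p_values, p_adjusted, reject):
    print(f"raw p = {p:.3f}  BH-adjusted = {p_adj:.3f}  declared finding: {keep}")
```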
Vanity metrics are the third trap. Total clicks on a feature will almost always grow as your user base grows, even if the proportion of people who ever find it is shrinking. Ratios, time to first use, and retention of feature use are much more informative. A pattern of high discovery but low reuse tells a very different story than low discovery with strong repeat engagement among those who do find it.
Connecting awareness to ROI
Treating awareness as an economic variable changes how it shows up in prioritization discussions. A feature with five percent awareness means the engineering and design investment is effectively invisible to ninety-five percent of the user base.
You can quantify the upside of better awareness by comparing lifetime value between adopters and non-adopters, then multiplying the difference by the change in discovery rate and the size of the user base. You can account for support savings by estimating how many tickets a feature should eliminate once people actually know it exists. And you can treat the cost of the research itself as part of a straightforward ROI equation rather than as an abstract overhead.
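A back-of-the-envelope sketch of that calculation follows; every number is invented, and the LTV gap between adopters and non-adopters is correlational, so treat the output as a rough upper bound rather than a forecast.

```python
# Hypothetical inputs -- replace with your own LTV, cost, and telemetry figures.
user_base = 200_000
current_discovery_rate = 0.05
projected_discovery_rate = 0.18          # after the awareness work
ltv_adopter = 320.0                      # lifetime value of a feature adopter
ltv_non_adopter = 260.0
tickets_avoided_per_adopter = 0.15
cost_per_ticket = 12.0
research_and_design_cost = 60_000.0

additional_adopters = user_base * (projected_discovery_rate - current_discovery_rate)
ltv_uplift = additional_adopters * (ltv_adopter - ltv_non_adopter)
support_savings = additional_adopters * tickets_avoided_per_adopter * cost_per_ticket

roi = (ltv_uplift + support_savings - research_and_design_cost) / research_and_design_cost
print(f"additional adopters: {additional_adopters:,.0f}")
print(f"estimated return on the awareness work: {roi:.1f}x the research cost")
```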
Putting it all together
The question “Do users notice the new feature?” sounds simple, but answering it rigorously is not. You are dealing with attention limits, memory systems, navigation structures, behavioral variability, and statistical noise all at once.
A serious framework for feature awareness does four things. It respects the cognitive architecture of perception and memory, so your designs and questions do not fight human nature. It uses study designs that separate true spontaneous discovery from primed recognition. It analyzes telemetry with methods that handle censoring, skewed data, and small samples instead of smoothing everything into misleading averages. And it reports results in a way that connects awareness to product strategy and financial outcomes.
At PUXLab this is exactly how we approach feature awareness work. We combine cognitive psychology, careful study design, and advanced analytics to move teams beyond “did they click it” toward a defensible answer to “did users truly notice, understand, and adopt this feature.” That means designing unmoderated tasks that capture spontaneous discovery, pairing retrospective think aloud with behavioral traces, using survival analysis and mixed effects models instead of fragile averages, and framing A/B tests in a Bayesian way that supports real product decisions. If your team is wrestling with invisible features or low adoption after launch, this is the kind of end-to-end, research-first framework we bring into partnerships.


