In recent weeks, the debate has intensified: tools claiming to measure “AI visibility”, promises of “ranking in ChatGPT”, and a growing sense that we are repeating the mistakes we made in previous marketing waves.
Two recent pieces have brought solid data and serious arguments to the table: the study by Rand Fishkin (SparkToro) on the inconsistency of AI recommendations, and the technical analysis by Natzir Turrado on the methodological limitations of prompt trackers.
At Mindset Digital, we want to take a clear position: AI visibility is not a KPI. But it is a signal. And in a probabilistic environment, learning how to work with signals (without selling them as certainties) is a competitive advantage.
First: what AI visibility is NOT
To avoid misunderstandings, let’s start with what AI visibility is not:
- It is not “positioning” in the classic SEO sense (there is no stable SERP).
- It is not a “ranking” (the order of LLM responses varies dramatically).
- It is not a direct business proxy (it does not imply traffic, leads, or sales on its own).
- It is not the real user experience (memory, context, geography, history, session, and connected tools all affect the output).
These clarifications are not opinions: SparkToro’s own study shows that, given the same prompt, recommendation lists are rarely repeated and almost never appear in the same order.
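To make “rarely repeated” tangible, here is a minimal sketch of how run-to-run inconsistency can be quantified, assuming you have already parsed the brand lists returned by several executions of the same prompt (the `runs` data below is invented for illustration):

```python
from itertools import combinations

# Hypothetical output of 5 executions of the same prompt
# (ordered brand lists, as an answer-parsing step might return them).
runs = [
    ["BrandA", "BrandB", "BrandC"],
    ["BrandB", "BrandA", "BrandD"],
    ["BrandA", "BrandC", "BrandE"],
    ["BrandB", "BrandD", "BrandA"],
    ["BrandC", "BrandA", "BrandB"],
]

def jaccard(a, b):
    """Set overlap between two brand lists, ignoring order."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

pairs = list(combinations(runs, 2))
avg_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
same_order = sum(1 for a, b in pairs if a == b) / len(pairs)

print(f"Average set overlap between runs: {avg_overlap:.0%}")
print(f"Share of run pairs with identical order: {same_order:.0%}")
```

Low overlap and few identical orderings across runs is exactly the pattern described above.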
So why do we say it is not a KPI?
Because a KPI, by definition, should meet (at least) the following conditions:
- Stable definition (the same concept across teams and vendors).
- Comparable measurement (across periods, markets, and tools).
- Reasonable relationship with outcomes (a demonstrable impact on business, even if indirect).
In AI visibility today, these conditions often fail:
- Each tool calculates “scores” differently (and rarely explains the formula clearly).
- API and interface results can differ.
- Personalization means that the “same question” is not the same question in practice.
As a result, turning an “AI visibility score” into a corporate KPI usually creates two problems: false precision and noise-driven decisions.
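A quick back-of-the-envelope calculation shows where the false precision comes from. Assuming, purely for illustration, that a “visibility score” is the share of sampled prompt executions in which the brand appears, the margin of error at typical sample sizes is large:

```python
from statistics import NormalDist

def presence_margin(presence_rate: float, n_prompts: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for a presence share
    estimated from n_prompts sampled prompt executions."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (presence_rate * (1 - presence_rate) / n_prompts) ** 0.5

# Illustrative: a "score" of 42% measured over 50 sampled prompts.
rate, n = 0.42, 50
moe = presence_margin(rate, n)
print(f"Presence: {rate:.0%} ± {moe:.0%} at 95% confidence")
# Prints roughly "42% ± 14%": a month-over-month move of a few points
# sits well inside the noise, which is the false-precision trap.
```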
What it actually is: a radar (exploratory signal)
Not being a KPI does not mean it is useless.
The SparkToro study highlights an important nuance: while rankings are largely random, aggregated presence (how often a brand appears across many executions) can capture a statistically reasonable pattern in certain contexts.
This is where AI visibility has value: as a radar. A radar does not tell you “how many sales you will make”, but it helps you:
- Gap detection: topics or intents where your brand does not enter the “consideration set” (see the sketch after this list).
- Coverage diagnosis: which areas are overrepresented and which are undercovered.
- Competitive signals: which topics competitors appear in (and you do not).
- Volatility alerts: sudden changes that force you to revisit assumptions.
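To make “aggregated presence” concrete, here is a minimal sketch, assuming you already log, per intent cluster, whether the brand was mentioned in each execution (cluster names and data below are invented):

```python
from collections import defaultdict

# Hypothetical log of executions: (intent_cluster, brand_mentioned)
executions = [
    ("crm pricing", True), ("crm pricing", False), ("crm pricing", True),
    ("crm for smb", False), ("crm for smb", False), ("crm for smb", False),
    ("crm integrations", True), ("crm integrations", True), ("crm integrations", False),
]

totals = defaultdict(int)
hits = defaultdict(int)
for cluster, mentioned in executions:
    totals[cluster] += 1
    hits[cluster] += mentioned

for cluster in totals:
    share = hits[cluster] / totals[cluster]
    flag = "  <- gap" if share == 0 else ""
    print(f"{cluster:>18}: present in {share:.0%} of runs{flag}")
```

The output is a distribution of presence per cluster, not a ranking: clusters stuck at or near zero are the gaps worth investigating.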
Why “noise” is structural
Part of the “noise” is not a bug: it is the system.
- LLMs are probabilistic: by design, the same prompt can legitimately produce different responses.
- Real prompts are highly variable: people with the same intent phrase questions very differently.
- Citation and source selection systems evolve over time.
On this last point, there is one data point worth internalizing: a study by Profound measured “citation drift” month over month and observed very large changes in cited domains even for identical prompts (in their analysis, Google AI Overviews and ChatGPT showed monthly variations of dozens of percentage points).
This does not invalidate working with AI. What it invalidates is expecting ranking-like stability where none exists.
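Profound’s exact methodology is not reproduced here, but a simple way to quantify this kind of drift yourself is to compare the sets of domains cited for the same prompt set in two different months (the domains below are invented):

```python
# Hypothetical cited domains for the same prompt set, two months apart.
january = {"example-reviews.com", "big-publisher.com", "vendor-a.com", "forum.example.org"}
february = {"big-publisher.com", "vendor-b.com", "newsletter.example.net", "vendor-a.com"}

shared = january & february
drift = 1 - len(shared) / len(january | february)  # 1 - Jaccard similarity

print(f"Domains cited in both months: {sorted(shared)}")
print(f"Citation drift (share of the combined set that changed): {drift:.0%}")
```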
When it DOES deserve attention (and when it does not)
Cases where it usually adds value
- Initial diagnosis: understanding whether a brand exists or is invisible for certain intents.
- Topic-based opportunity detection: especially in sectors with heavy comparison and recommendation behavior.
- Narrow markets: when the “option space” is limited, patterns tend to be more interpretable.
- Brand and reputation strategy: when visibility depends on third parties (rankings, lists, mentions), not just your own website.
Cases where caution is required
- Very early-stage brands: the signal may be too volatile and sampling-sensitive.
- Investment decisions based on a single score: without triangulating it against Search Console, PR, mentions, branded demand, and analytics (see the sketch after this list).
- Few hand-picked prompts: high risk of measuring team intuition rather than the market.
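Triangulation does not require heavy tooling. As a minimal sketch (topic names and figures invented), this is the kind of side-by-side view we mean: AI presence next to branded demand and mentions, so no single score is read in isolation.

```python
# Hypothetical per-topic signals pulled from different sources.
ai_presence = {"pricing": 0.62, "integrations": 0.10, "migration": 0.45}  # share of runs
branded_clicks = {"pricing": 1300, "integrations": 90, "migration": 40}   # e.g. Search Console
mentions = {"pricing": 12, "integrations": 2, "migration": 1}             # PR / media tracking

print(f"{'topic':<14}{'AI presence':>12}{'branded clicks':>16}{'mentions':>10}")
for topic in ai_presence:
    print(f"{topic:<14}{ai_presence[topic]:>12.0%}{branded_clicks[topic]:>16}{mentions[topic]:>10}")

# A topic with decent AI presence but no branded demand (or the reverse)
# is a prompt to investigate, not a reason to reallocate budget on its own.
```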
How we approach it at Mindset Digital (without hype)
Our approach starts from a simple premise: AI visibility is a strategic signal, not a performance KPI.
That is why, when we measure it, we do so with clear rules:
- We do not sell “AI rankings”. We talk about presence and probability.
- We prioritize intent-based analysis and thematic clusters, not isolated prompts.
- We look for aggregated consistency, not easily manipulated “snapshots”.
- We triangulate with traditional SEO (demand, content, authority, PR, brand) and business objectives (a sample measurement plan follows below).
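In practice, these rules become a small measurement plan agreed before any tool is run. The structure below is purely illustrative; field names and values are assumptions, not a product specification:

```python
# Illustrative measurement plan, agreed before running anything.
measurement_plan = {
    "intent_clusters": {
        "comparison": ["best X for smb", "X vs Y", "alternatives to Z"],
        "pricing": ["how much does X cost", "X pricing tiers"],
    },
    "runs_per_prompt": 20,             # aggregated presence, not one-off snapshots
    "models": ["model_a", "model_b"],  # whichever assistants matter for the market
    "regions": ["ES", "MX"],
    "cadence": {"hypotheses": "weekly", "trend_validation": "monthly"},
    "report_as": "presence distribution per cluster",  # never a single ranking
    "triangulate_with": ["Search Console", "branded demand", "PR mentions", "analytics"],
}
```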
In 2026, the advantage is not in “having a tool”. It is in knowing how to interpret imperfect signals and turn them into sound decisions.
FAQs
How do we present AI visibility to a management team without overselling it?
Present it as an exploratory indicator (radar) with a clear objective: gap detection, intent-level presence, and change alerts. Always pair it with 2–3 business metrics (leads, MQLs, branded demand) and traditional SEO metrics for context.
What should we look for in an AI visibility tool?
Methodological transparency (models, frequency, regions, API vs web access, brand/citation detection), the ability to group by intent or topics, and reporting focused on distributions/presence rather than rankings.
How do we choose which prompts or intents to monitor?
By starting from demand signals (recurring sector topics, real FAQs, pain points, support and sales queries) and working with clusters. The goal is to cover intents, not to write “perfect” prompts.
Can scores from different tools be compared?
With caution: each tool typically uses different prompt universes and methodologies, so absolute numbers are not comparable. If you compare, focus on trends and qualitative findings (thematic gaps) rather than scores.
How often should AI visibility be measured?
It depends on the sector and the source of knowledge (whether it is more model “memory” or more “search/citation” driven). As a practical guideline, work in weekly cycles for hypotheses and reviews, and monthly cycles to validate trends.


