How to Actually Measure AI Visibility (And What Everyone Gets Wrong)

May 18, 2026

If you’ve been trying to figure out how to measure AI visibility and see where your brand shows up in AI search results, you are not alone. You might also be wondering whether any of the tools claiming to track these metrics are actually trustworthy.

The problem isn’t that there are no tools. The problem is that there are too many, most of them are measuring the wrong things, and the numbers they give you are almost meaningless without proper context. We’ve been building our own AI visibility tracking framework at the agency level. Here is what we’ve learned: what to actually track, what to ignore, and how to approach the entire ecosystem.

BLUF: Key Takeaways

Track Commercial, Not Informational: AI tools answer informational queries directly, meaning zero clicks for you. Only track commercial queries that result in a brand recommendation.
Visibility is Non-Deterministic: AI models rarely give the exact same answer twice. Single-snapshot tracking tools provide incomplete data; you need frequency and averaging.
The “Three Levels” of Tracking: Effective tracking focuses on Brand Recommendations, Share of Source (citations), and AI Referral Traffic.
Offsite Validation is Crucial: AI doesn’t decide to recommend you based solely on your website. It looks to third-party reviews, Reddit threads, and industry blogs to validate your brand first.

Why Is AI Visibility Non-Deterministic (And How Does It Change Everything)?

AI visibility is non-deterministic because AI models rarely give the exact same answer twice; each response is generated fresh, so sources, brands, and rankings shift from run to run, which makes single-point tracking unreliable.

Ask the same question on Monday and Tuesday, and you’ll get different results from different sources, sometimes different brands, and different rankings. There’s overlap, sure. But the results are not static. This is baked into how these systems work, and it’s something most AI visibility tools gloss over completely.

What this means practically: any single data snapshot is incomplete. A tool that pulls one query, one time, and reports your “AI score” is not giving you real intelligence. To truly measure AI visibility, you need averaging, you need frequency, and you need to know exactly what questions are being asked.
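To make "averaging and frequency" concrete, here is a minimal Python sketch that turns repeated runs of one query into a mention rate. The brand names and answer texts are hypothetical, and the simple substring match is an assumption; a real setup would need stricter brand matching.

```python
def mention_rate(answers, brand):
    """Fraction of answers that mention the brand (case-insensitive substring match)."""
    if not answers:
        return 0.0
    hits = sum(1 for text in answers if brand.lower() in text.lower())
    return hits / len(answers)

# Hypothetical answers collected by running the same query five times.
runs = [
    "Top picks: Acme CRM, Beta Suite, and Gamma Desk.",
    "We recommend Beta Suite and Gamma Desk.",
    "Acme CRM and Beta Suite lead this category.",
    "Consider Gamma Desk, then Acme CRM.",
    "Beta Suite is the most common recommendation.",
]

rate = mention_rate(runs, "Acme CRM")
print(f"Acme CRM appeared in {rate:.0%} of {len(runs)} runs")
```

A single snapshot would have reported either "visible" or "invisible" depending on which run it happened to capture; the rate across runs is the number worth watching.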

Why Do Most AI Visibility Tools Give You Useless Numbers?

Most AI visibility tools give useless numbers because they track irrelevant informational queries across varying models, completely ignoring the critical commercial queries where AI actually makes brand recommendations.

You can throw a rock right now and hit an AI visibility tool. They’re everywhere. But here’s the problem: they all use different query sets, different models, and different definitions of what “visibility” even means. If a tool tells you it’s tracking 100 prompts, that should immediately raise a flag. The bigger question is: what are those prompts?

Most tools are tracking informational queries like “what is,” “how to,” or “explain this.” These are the questions that used to drive blog traffic in the SEO era. But AI just answers them directly now. The user gets the summary, the search stops, and no one goes to your website.

Tracking informational visibility is measuring the wrong thing. The only queries worth tracking are commercial ones: questions that end with a brand recommendation. Those are the decision points. That’s where AI makes up its mind.

Beyond the query set, you also need to know which model the tool is using. ChatGPT and Gemini are not the same. Asking one versus the other is like asking two different people with different training data and tendencies. Don’t assume a score from one model carries over to another.

What Are the Three Levels Used to Measure AI Visibility?

We measure AI visibility across three distinct levels: Brand Recommendations (direct mentions), Share of Source (domain citations), and AI Referral Traffic (actual visits from AI platforms).

Here is a breakdown of how we analyze these levels:

Visibility Level	What It Measures	Why It Matters
1. Brand Recommendations	How often your brand is explicitly named and ranked in the AI’s answer.	This is the ultimate goal. Top-position recommendations carry immense weight and drive direct commercial decisions.
2. Share of Source	How often your domain is cited as a reference or linked source, even if your brand isn’t named.	Highly diagnostic. If you are cited but not recommended, you likely lack offsite trust signals such as reviews. If competitors are cited, those URLs become your research targets.
3. AI Referral Traffic	Traffic recorded in Google Analytics from sources like ChatGPT, Perplexity, and Claude.	Shows which content AI finds valuable enough to send users to. (Note: Data is directional, as privacy-focused browsers and apps can strip referrers and UTM parameters.)

Note on Referral Traffic: We’ve seen ChatGPT consistently drive the largest share of this traffic, which is why our strategy is built around ChatGPT first. Perplexity typically comes in around 20% behind it. The others are smaller but growing.
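If you export raw referrer URLs from your analytics, a small Python sketch like the one below can bucket them by AI platform. The hostname map is an assumption based on the platforms named above, not an exhaustive list, and the sample referrers are made up; extend the map with whatever hostnames appear in your own reports.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed hostname-to-platform map; extend it from your own referral reports.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
}

def classify_referrer(referrer_url):
    """Return the AI platform for a referrer URL, or None if it is not in the map."""
    host = (urlparse(referrer_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return AI_REFERRERS.get(host)

def ai_referral_counts(referrer_urls):
    """Count sessions per AI platform, ignoring unknown or missing referrers."""
    platforms = (classify_referrer(url) for url in referrer_urls)
    return Counter(p for p in platforms if p)

# Hypothetical referrers; the empty string stands for a stripped referrer.
sample = [
    "https://chatgpt.com/",
    "https://www.perplexity.ai/search",
    "https://www.google.com/",
    "",
    "https://chatgpt.com/c/abc",
]
print(ai_referral_counts(sample))
```

Sessions with a stripped referrer fall out of the count entirely, which is exactly why these numbers should be read as directional.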

How Do We Run Our Own Tracking Process?

We run our tracking process by manually testing 5 to 10 highly relevant commercial queries first to understand context, then automating them through official APIs to track mentions, positions, and competing brands at scale.

Precision beats volume here. We’d rather have ten highly relevant, commercial-intent queries that genuinely reflect how customers make decisions than a hundred generic ones padded with informational fluff.

Here is how our tracking workflow operates:

Manual Testing First: Reading the answers yourself, as a user, gives you context no dashboard can. You see which competitors are recommended, what YouTube videos or Reddit threads appear, and the overall tone of the response.
Automating via Make: Once we validate the queries, we use Make (formerly Integromat) to pipe query data automatically into Google Sheets.
Using Official APIs: We use the official OpenAI API because it gives us access to a large volume of source data that mirrors the real ChatGPT search experience.
Cross-Referencing Models: We use Perplexity via OpenRouter (which allows multiple models through a single API key) to spot patterns. If a brand consistently appears across ChatGPT and Perplexity, the underlying content strategy is working.
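The automated half of a workflow like this can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact pipeline described above: `ask_model` is a hypothetical callable you would wire to whichever API you use, `fake_model` is a stand-in so the sketch runs offline, and "position" is approximated by the order in which brands are first mentioned.

```python
from datetime import date

def brand_positions(answer, brands):
    """Rank brands by first appearance in the answer text (1 = mentioned first)."""
    lowered = answer.lower()
    hits = [(lowered.find(b.lower()), b) for b in brands]
    ordered = sorted((idx, b) for idx, b in hits if idx != -1)
    return {b: rank for rank, (_, b) in enumerate(ordered, start=1)}

def track(queries, brands, ask_model, runs=3):
    """Ask each query several times and return one row per (query, run, brand)."""
    rows = []
    for query in queries:
        for run in range(1, runs + 1):
            positions = brand_positions(ask_model(query), brands)
            for brand in brands:
                rows.append({
                    "date": date.today().isoformat(),
                    "query": query,
                    "run": run,
                    "brand": brand,
                    "position": positions.get(brand),  # None = not mentioned
                })
    return rows

# Hypothetical stand-in for a real API call, so the sketch runs offline.
def fake_model(query):
    return "1. Beta Suite 2. Acme CRM 3. Gamma Desk"

rows = track(["best CRM for small nonprofits"], ["Acme CRM", "Beta Suite"],
             fake_model, runs=2)
for row in rows:
    print(row)
```

Each row is flat on purpose, so it can be appended to a spreadsheet as-is. Note that ranks only cover the brands you pass in: a competitor you forgot to list will not push your position down.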

What Are Fan-Out Queries (And How Do They Power AI Search)?

Fan-out queries are the hidden background searches an AI model runs before answering your prompt, revealing the authoritative sources and sub-topics it relies on to build its final response.

When you ask a question, the AI doesn’t just answer it. Ask ChatGPT “what’s the best AEO strategy for a SaaS brand in 2026” and it will quietly run four or five background searches, things like “2026 AI search optimization guidance” or “Google AI overviews structured data,” before building its response.

This is important for two reasons:

Validation preference: It tells you that AI is trying to find validated sources, gravitating toward authoritative references like developer blogs and official documentation.
Underlying intent: It shows you that the questions you’re optimizing for have underlying questions beneath them. Identify those sub-queries and answer them in your content to increase your chances of showing up in the research layer.

(Pro Tip: Use the free Chrome extension “ChatGPT Search and Final Queries” to see these background searches in real time).
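Once you have a list of background searches, one rough way to find content gaps is to compare them against your page headings. The Python sketch below uses simple word overlap, which is an assumption chosen for brevity rather than a proven method; the headings, the stopword list, and the 0.5 threshold are all illustrative.

```python
import re

# Small illustrative stopword list; an assumption, not a standard.
STOPWORDS = {"a", "an", "the", "for", "in", "of", "to", "and", "is", "what", "how", "best"}

def tokens(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def uncovered_subqueries(subqueries, headings, min_overlap=0.5):
    """Return sub-queries where no heading shares at least min_overlap of their words."""
    heading_tokens = [tokens(h) for h in headings]
    gaps = []
    for query in subqueries:
        q_tokens = tokens(query)
        if not q_tokens:
            continue
        best = max((len(q_tokens & h) / len(q_tokens) for h in heading_tokens),
                   default=0.0)
        if best < min_overlap:
            gaps.append(query)
    return gaps

subqueries = [
    "2026 AI search optimization guidance",
    "Google AI overviews structured data",
]
headings = ["How structured data affects Google AI Overviews", "Pricing"]
print(uncovered_subqueries(subqueries, headings))
```

Word overlap misses synonyms and paraphrases, so treat the output as a list of sub-queries to review by hand, not a verdict.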

Why Is Your Website Only One-Third of the Battle (The Offsite Signal Problem)?

Your website is only one-third of the battle because AI relies heavily on third-party offsite signals, such as industry blogs, Reddit threads, and reviews, to validate your brand before deciding to recommend it.

Your website is not where AI decides whether to recommend you. It’s where AI goes to fetch information after it’s already decided. The recommendation happens earlier, in the brand validation layer. Research suggests AI follows a three-step process:

Discovery: It goes to industry blogs and content to discover which brands exist in a category.
Validation: It moves to UGC platforms, forums, and review sites to validate which of those brands are actually worth recommending.
Population: It pulls from your brand website to populate the final answer with specifics.

If you’re being cited but not recommended, step two is where you’re falling short. Reviews matter. Third-party mentions matter. The practical implication? Look at the sources AI is using when it answers the queries you care about. If Reddit keeps showing up, you need Reddit visibility. If YouTube keeps showing up, you need video content.
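To see which sources keep showing up, you can tally the domains cited across the answers you collect. This Python sketch assumes you already have a flat list of cited URLs; the URLs shown are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse

def domain_of(url):
    """Hostname of a URL, lowercased, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def share_of_source(cited_urls):
    """Map each cited domain to its share of all citations, largest first."""
    counts = Counter(d for d in map(domain_of, cited_urls) if d)
    total = sum(counts.values())
    if not total:
        return {}
    return {domain: n / total for domain, n in counts.most_common()}

# Placeholder citations gathered from several answers to the same query.
cited = [
    "https://www.reddit.com/r/crm/a",
    "https://www.reddit.com/r/crm/b",
    "https://www.youtube.com/watch?v=1",
    "https://example.com/pricing",
]
for domain, share in share_of_source(cited).items():
    print(f"{domain}: {share:.0%}")
```

If one platform dominates the list, that is where the validation is happening, and that is where your brand needs to be present.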

How Can You Build the Right Queries to Track?

You can build the right queries to track by filtering for prompts that produce specific brand recommendations, align exactly with your core offerings, feature competitor visibility, and have validated search volume.

Not all queries are worth tracking. Use this checklist to filter your targets:

Query Criteria	What to Look For
Produces a Brand Recommendation	The answer must end with specific brand names, not just educational steps or definitions.
Specific to Your Core Offerings	Track niche queries where you logically fit. Don’t track “best CRM” if you are a niche “CRM for small nonprofits.”
Competitor Visibility	If your competitors show up in the answer, you are in the right place. You want your name where theirs are.
Validated Search Volume	Use standard SEO tools to confirm the query has actual search traffic. Volume validates intent.
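The checklist above translates directly into a filter. In this Python sketch the field names, the example queries, the volume figures, and the `min_volume` threshold are all hypothetical; the first three flags are things you would fill in by hand after reading the answers.

```python
from dataclasses import dataclass

@dataclass
class QueryCandidate:
    text: str
    names_brands: bool         # the answer ends with specific brand names
    fits_offering: bool        # the query matches your core offering
    competitors_present: bool  # competitors appear in the answer
    monthly_volume: int        # search volume from your SEO tool

def worth_tracking(candidate, min_volume=10):
    """True only if the candidate passes all four checklist criteria."""
    return (
        candidate.names_brands
        and candidate.fits_offering
        and candidate.competitors_present
        and candidate.monthly_volume >= min_volume
    )

# Hypothetical candidates for a niche "CRM for small nonprofits" brand.
candidates = [
    QueryCandidate("best CRM for small nonprofits", True, True, True, 320),
    QueryCandidate("what is a CRM", False, True, False, 5400),
    QueryCandidate("best CRM", True, False, True, 9000),
]
tracked = [c.text for c in candidates if worth_tracking(c)]
print(tracked)
```

Note that the high-volume queries are the ones that get dropped here: volume only validates intent once the other three criteria are met.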

What Should You Avoid When Tracking AI Visibility?

When tracking AI visibility, you should avoid chasing informational queries, blindly trusting single-score AI visibility metrics, and relying solely on incomplete Google Analytics referral data.

It is incredibly easy to get distracted in this new landscape. Here is what you should explicitly avoid:

Chasing Informational Queries: These used to drive SEO traffic, but now AI simply answers them on-screen. Optimizing for informational visibility targets a user journey that no longer exists.
Treating “AI Scores” as Hard Benchmarks: AI scores from SaaS tools are directional at best. They are built on hidden assumptions regarding what counts as visibility and how often queries are run. Use them to track trends over time, not as absolute gospel.
Relying Only on Referral Traffic: While referral traffic is a great signal for identifying citable content, privacy tools suppress a massive portion of it. Do not make sweeping strategic decisions based solely on what Google Analytics attributes to AI sources.

What Is the Simplest Framework to Start With Today?

The simplest framework to start with today is to manually run five to ten commercial-intent queries weekly in ChatGPT, document the results, and analyze the patterns to guide your PR and content strategy.

You don’t need a custom API setup to get started. Here is the minimum viable version:

Pick 5–10 commercial-intent queries that should logically end with your brand being recommended.
Run them manually in ChatGPT.
Analyze the answers: See who is being recommended, what pages are being cited, and where your brand does or doesn’t show up.
Repeat weekly and track patterns: Keep notes in a spreadsheet. Look for which sources keep appearing and what competitors have that you don’t.
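If your weekly notes live in a spreadsheet, exporting them as CSV lets you spot recurring sources with a few lines of Python. The column names and rows below are an assumed layout for illustration; adapt them to however you keep your own notes.

```python
import csv
import io
from collections import defaultdict

def recurring_sources(rows, min_weeks=2):
    """Return domains cited in at least min_weeks distinct weeks, most persistent first."""
    weeks_by_domain = defaultdict(set)
    for row in rows:
        weeks_by_domain[row["source_domain"]].add(row["week"])
    ranked = sorted(weeks_by_domain.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [domain for domain, weeks in ranked if len(weeks) >= min_weeks]

# Hypothetical spreadsheet export; in practice, open your own CSV file instead.
LOG = """week,query,source_domain,recommended_brands
2026-W20,crm for small nonprofits,reddit.com,Beta Suite
2026-W20,crm for small nonprofits,reviews.example.com,Beta Suite
2026-W21,crm for small nonprofits,reddit.com,Beta Suite;Acme CRM
2026-W22,crm for small nonprofits,reddit.com,Acme CRM
2026-W22,crm for small nonprofits,youtube.com,Acme CRM
"""

rows = list(csv.DictReader(io.StringIO(LOG)))
print(recurring_sources(rows))
```

Sources that persist across weeks are better targets than ones that appear once, since a one-off citation may just be run-to-run variation.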

Over time, those patterns become your roadmap. The brands that are going to win in AI search aren’t necessarily the ones with the biggest budgets or the most content. They’re the ones being talked about by the right people, in the right places, in a way that AI can find and trust. Building that kind of presence takes time, but it starts with knowing exactly how to measure your baseline.