Why we stopped measuring Google rank.
Seven months ago we killed the weekly Google rank report for our own marketing and for client retainers.
The decision was not made by ideology. It was made by a graph. We had been tracking our top 40 buyer-intent queries in Google, weekly, for eighteen months. Over the same period, we had started tracking the same 40 queries in ChatGPT and Claude's web-connected modes. Our Google rank moved up 11 positions on average over 18 months. Our citation rate in ChatGPT moved from 0% to 34%. Our citation rate in Claude moved from 0% to 41%.
The question is not which number is better. The question is: when a prospect types "what is the best AI-native consulting firm in Canada" into a search bar in 2026, where does the answer come from? Increasingly not from the blue links. Increasingly from a synthesis that either cites us or does not.
We built "share of answer" as the replacement metric. It is the percentage of queries, within a defined set, where a given brand is cited at least once in the first paragraph of the answer across a specific set of engines. We have been running it weekly since the third week of February. This essay is the first public slice of the data.1
Methodology.
This is a small study. Four hundred queries, four engines, six weeks. We are reporting what we saw, not what we predict.
Query set. 400 queries spanning 8 buyer-intent categories: AI consulting firm selection, pentest vendor selection, SaaS positioning, GEO/LLM optimization, product discovery sprints, SEO audit, marketing retainer shapes, and fractional CMO hire criteria. Queries were balanced across intent types: 100 brand-seeking, 150 category-defining, 100 comparison, and 50 how-to.
Engines. Claude (claude.ai with web search enabled), ChatGPT (GPT-4 with browsing), Perplexity (default Pro), Gemini (default with search). All run from a Canadian IP via a clean profile, new session per query, no memory.
Frequency. Each query run once per week for six weeks. 2,400 query-engine-week observations per engine, 9,600 in total across the four engines.
Recording. For each query, we capture: the first-paragraph citations, the first ten citations by position, the first three domains cited, the presence or absence of NexcurAI, our positioning in the answer, and the sentiment. Google SERP rank captured at the same moment via a separate SERP tool for comparison.
What we are not measuring. Click-through rate (the engines do not provide that data for non-Google surfaces). Conversion rate (too early). Voice-assistant answers (separate surface, separate study). Localized answers (we held locale constant).
The full dataset is available at /samples/share-of-answer-400-queries.json. Each row is one query-engine-week observation. The charts in this essay and the dashboard are generated from the same data.
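As a minimal sketch of how the headline metric falls out of that file - the field names here are illustrative stand-ins, not the exact keys in the published JSON:

```python
import json
from collections import defaultdict

# Load the per-observation rows; each row is one query-engine-week.
# Field names ("engine", "cited_in_first_paragraph") are illustrative.
with open("share-of-answer-400-queries.json") as f:
    rows = json.load(f)

cited = defaultdict(int)   # engine -> observations with a first-paragraph citation
total = defaultdict(int)   # engine -> all observations for that engine

for row in rows:
    engine = row["engine"]
    total[engine] += 1
    if row["cited_in_first_paragraph"]:
        cited[engine] += 1

# Share of answer per engine, averaged across the six weekly runs.
for engine in sorted(total):
    print(f"{engine}: {cited[engine] / total[engine]:.0%} share of answer")
```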
Headline numbers.
Some of these were expected. One of them was not.
Chart: share of answer by engine (NexcurAI) - the percentage of the 400 queries in which NexcurAI is cited in the first paragraph, averaged across the six weeks.
Claude at 41%, ChatGPT at 34%, Perplexity at 26%, Gemini at 17%. These are our numbers, a small Canadian consulting firm with a specific corpus. Your numbers will be different. The interesting thing is not the absolute levels; it is the spread. The highest-citing engine cites us 2.4x as often as the lowest.
Our Google SERP rank over the same period: top-3 for 14% of queries, top-10 for 34%, top-20 for 61%. If we had used SERP rank as our only KPI, we would have under-counted our presence in Claude and ChatGPT by a large margin, and over-counted our presence in Gemini (where we rank better in SERP than we get cited in answer).
The unexpected finding: Gemini's citations track our SERP rank far more closely than any other engine's. Gemini behaves more like classical SERP than like a synthesizing engine. Claude and ChatGPT behave least like SERP. Perplexity is in between, and tilts toward high-authority domains even when those domains rank lower in classical SERP.2
Engine by engine, what gets cited.
4.1 Claude.
Claude cites our essays more often than our service pages, at a ratio of 2.7 to 1. Pages that get cited most are the ones with named sub-sections, explicit claim-evidence-source structure, and first-person plural ("we found" / "we measured"). Claude seems to reward documents that behave like primary sources. The top-5 cited pages of ours: the handbook sample (/handbook.html), the GEO playbook (business-plan/07), the handbook-thesis essay, the pricing page, and the security service page.
Claude also cites fewer sources per answer than the other engines. Average: 3.2 sources per answer. The citations feel curated. If you are not in the top 4 or 5 sources Claude considers credible for a query, you are not cited at all.
4.2 ChatGPT.
ChatGPT is the most generous with citations. Average: 5.8 sources per answer. This means you are more likely to appear, but your position in the citation list matters more than with Claude. Positions 1-3 get most of the attention. Position 6 and beyond is cosmetic.
ChatGPT has a bias toward recently-updated pages and pages with clear publication dates. We A/B tested this by adding explicit updated-on metadata to four of our essays. Those essays moved up 2 positions on average in ChatGPT answers over the following two weeks. Sample size is too small to publish a causal claim, but the directional signal is consistent.
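For readers who want to replicate the check, a sketch of the before/after comparison is below. The treated URLs and field names are stand-ins, not the actual pages or keys in the published dataset.

```python
import json
from statistics import mean

# Essays that received explicit updated-on metadata (stand-in URLs),
# and the week in which that change shipped.
TREATED = {"/blog/essay-a.html", "/blog/essay-b.html",
           "/blog/essay-c.html", "/blog/essay-d.html"}
CHANGE_WEEK = 3

with open("share-of-answer-400-queries.json") as f:
    rows = json.load(f)

def avg_position(before: bool):
    """Average citation position of the treated essays in ChatGPT answers."""
    positions = []
    for row in rows:
        if row["engine"] != "chatgpt":
            continue
        if (row["week"] < CHANGE_WEEK) != before:
            continue
        for citation in row["citations"]:   # the first ten citations, by position
            if citation["url"] in TREATED:
                positions.append(citation["position"])
    return mean(positions) if positions else None

print("average position before:", avg_position(before=True))
print("average position after: ", avg_position(before=False))
```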
4.3 Perplexity.
Perplexity is the most domain-authority-sensitive engine we tested. Our citation rate on Perplexity correlates tightly with the domain authority score of the source page. Our essays get cited by Perplexity much less often than our service pages, because the service pages link to and from more external sources. Perplexity rewards the "hub and spoke" topology more than the other engines.
Perplexity also has the most consistent behavior week-over-week. Week-over-week variation in our citation count across the six weeks: 4.2% for Perplexity, 11% for Claude, 14% for ChatGPT, 19% for Gemini. This matters for client reporting because it makes Perplexity the most stable baseline.
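The stability number is the spread of weekly citation counts relative to their mean; a sketch, again with illustrative field names:

```python
import json
from collections import defaultdict
from statistics import mean, pstdev

with open("share-of-answer-400-queries.json") as f:
    rows = json.load(f)

# Count first-paragraph citations per engine per week.
weekly = defaultdict(lambda: defaultdict(int))   # engine -> week -> citation count
for row in rows:
    if row["cited_in_first_paragraph"]:
        weekly[row["engine"]][row["week"]] += 1

# Report week-over-week variation as a percentage of the mean weekly count.
for engine, counts_by_week in weekly.items():
    counts = list(counts_by_week.values())
    print(f"{engine}: {pstdev(counts) / mean(counts):.1%} week-over-week variation")
```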
4.4 Gemini.
Gemini is the closest to classical Google SERP behavior of the four engines. This is not surprising. Our Gemini citation rate correlates with our Google SERP rank at r=0.78. Our Claude citation rate correlates with our Google SERP rank at r=0.32.
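The correlation is computed per query, comparing how often a query cites us against how well we rank for it in classical SERP. A sketch under stated assumptions: we map SERP rank to a reciprocal-rank score so a better position means a higher number, and the field names are illustrative.

```python
import json
from collections import defaultdict
from statistics import mean, correlation   # correlation() needs Python 3.10+

with open("share-of-answer-400-queries.json") as f:
    rows = json.load(f)

def citation_vs_serp(engine: str) -> float:
    """Pearson r between per-query citation rate and a SERP-derived score."""
    cited = defaultdict(list)   # query -> [1.0 if cited that week else 0.0]
    serp = defaultdict(list)    # query -> [reciprocal SERP rank that week]
    for row in rows:
        if row["engine"] != engine:
            continue
        q = row["query"]
        cited[q].append(1.0 if row["cited_in_first_paragraph"] else 0.0)
        rank = row["google_serp_rank"]      # None when outside the tracked range
        serp[q].append(1.0 / rank if rank else 0.0)
    xs = [mean(serp[q]) for q in cited]
    ys = [mean(cited[q]) for q in cited]
    return correlation(xs, ys)

print("gemini:", round(citation_vs_serp("gemini"), 2))
print("claude:", round(citation_vs_serp("claude"), 2))
```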
The implication is that SEO, as classically practiced, still produces Gemini results, and increasingly produces nothing else. Gemini seems to be the most convergent with "the existing search engine" - which makes it the least differentiated among the four synthesizing engines.
The five patterns that get cited everywhere.
Across all four engines, five content patterns got cited at a rate at least 2x their baseline.
5.1 First-person plural observations with a measurement.
"We ran 400 queries across four engines" is cited more often than "A study of 400 queries across four engines." First-person plural reads as primary. Passive narration reads as aggregation. Aggregation is penalized.
5.2 Explicit numbers with context.
"Claude cited us 41% of the time across 400 queries" is cited more often than "Claude cited us more often than the other engines." The bare claim is citable; the comparative summary is not. Engines appear to prefer quoting specific measurable statements.
5.3 Named sections and anchors.
Pages with h2 and h3 sections that have meaningful text and stable id attributes are cited more often and at deeper anchors. An engine that cites nexcur.ai/blog/essay-x.html#section-5 is linking to the specific section that answered the query. Pages without anchor structure get cited at the page level only, which is less useful to the engine and to the reader.
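As an illustration of what "stable, meaningful" looks like in practice, here is a minimal slug-generation sketch; it is our illustration of the idea, not a specific tool we use:

```python
import re

def heading_slug(text: str) -> str:
    """Turn a heading into a stable, human-readable anchor id."""
    slug = text.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)       # drop punctuation
    slug = re.sub(r"[\s-]+", "-", slug).strip("-")  # collapse spaces to hyphens
    return slug

print(heading_slug("The five patterns that get cited everywhere."))
# -> the-five-patterns-that-get-cited-everywhere
```

The id stays readable for humans and stays stable as long as the heading wording does not change, which is what lets an engine cite the section rather than the page.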
5.4 Claim, evidence, source triples.
We put this in our GEO playbook a year ago and the data has confirmed it. Paragraphs that state a claim, cite evidence, and name a source (even if the source is "our internal measurement") are cited at roughly 1.8x the rate of paragraphs that state a claim alone. Engines appear to be biased toward provable assertions.
5.5 Date stamps and revision history.
An explicit publication date and, where applicable, a corrections log or revision history raises citation probability in ChatGPT and Gemini. Claude and Perplexity are less sensitive to this. Our hypothesis: engines that lean on freshness signals value the stamp; engines that lean on intrinsic quality are less swayed.
The four patterns that never get cited.
6.1 Ungated listicles.
"The 10 best AI consulting firms of 2026" and similar listicle content is cited at a rate well below baseline across all four engines. Claude in particular appears to filter listicles actively. We have a hypothesis that the engines have learned to recognize the pattern and deprioritize it.
6.2 Thin comparison pages.
"X vs Y" pages under 800 words with no primary evidence underperform their longer counterparts by a wide margin. The short comparison page seems to trigger the same heuristics that filter listicles.
6.3 Marketing copy without claims.
Pages that say "industry-leading," "best-in-class," "world-class" without any supporting measurement are nearly never cited. This matches our prior intuition but the data is clearer than we expected - the citation rate on pages heavy in superlatives is close to zero.
6.4 Behind-login content.
Obvious but worth stating: if the content is behind a login wall, it does not get indexed and it does not get cited. Several of our competitors put their most substantive work behind email gates. The gates cost them share of answer. We publish our essays free-to-read and we can measure the citation uplift.
What this changes about the editorial calendar.
We changed four things in our own publishing based on this data. They apply directly to our clients' GEO programs.
First, we write primary-source essays now, not survey essays. If we do not have our own measurement or field observation, we do not publish. This essay is an example - 400 queries we ran ourselves, not a reference to somebody else's study.
Second, we add anchor IDs to every h2 and h3 in every essay, and we write them as stable, meaningful slugs. Engines can then cite the anchor, not the page.
Third, we ship a data file with every data-driven essay. The JSON sample accompanying this essay is 400 rows. The HTML view renders the same data as charts. Our hypothesis is that data-accompaniment measurably raises citation stickiness; we are tracking this for the next study.
Fourth, we dropped our listicle and short-comparison content from the roadmap. We had six listicles in the queue. We killed all six. The editorial time is going into longer-form essays that are primary sources.
Limitations and what we are measuring next.
This is a small study about one brand in one vertical.
We held too many variables constant. One locale, one query-author voice, one IP, one time of day. The absolute citation rates would be different for other brands, other verticals, other geographies. What we are most confident in are the cross-engine comparisons and the pattern-level findings. What we are least confident in are the absolute levels for any individual brand.
The next cycle of this study is running now. We expanded from 400 to 1,000 queries, added Brave Search's Leo mode, added Mistral's Le Chat, and are rotating three English-speaking locales (US, UK, CA). We are adding one non-English locale (fr-CA) as a control. The full expanded study ships in late May 2026, with the same data-accompaniment discipline.
If you want the full dataset for your own analysis, it is at /samples/share-of-answer-400-queries.json. If you want to run this methodology on your own brand, we do share-of-answer audits as a fixed-scope engagement; see /services/seo-geo.html for pricing.
1. We described our definition of "share of answer" and the rationale for replacing SERP rank as a primary metric in "Citations are the new backlinks" (forthcoming, 2026-02-28). The playbook is at /guides/seo-geo/ranking-in-answer-engines.html.
2. This is our interpretation, not a causal claim. Perplexity publishes some description of how it chooses sources, but the actual ranking function is proprietary and changes frequently. All conclusions here are inference from observation.