I audited 38 Shopify stores in 2025 for agentic citation readiness. Nine of them blocked at least one AI crawler at the robots.txt or Cloudflare layer without realizing it. Every blocked bot is a lost citation surface for that engine. The fix is a short robots.txt.liquid override and a 2-minute curl verification.
TL;DR: Six AI user-agents matter for Shopify citation in 2026: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot. Shopify’s default robots.txt allows them on /products/, /collections/, /blogs/, and /pages/. Audit your specific store with curl, or paste your robots.txt into my free Robots.txt and AI Crawler Checker. If any bot returns a 403 or block page, override via templates/robots.txt.liquid and ship the allow rules below.
Why this matters for your store
- Each blocked AI crawler equals zero citations from that engine for every URL it cannot fetch
- Cloudflare bot-management defaults aggressively block “unknown” user-agents, which catches new AI bots before vendors register them
- Shopify auto-generates robots.txt, so the only override path is robots.txt.liquid in the theme; no admin toggle exists
What Shopify’s default robots.txt actually does for AI bots
Shopify ships a generated robots.txt per store with broad allow rules for the public-facing catalog and explicit Disallow blocks for transactional surfaces (/cart, /checkout, /account, /admin, /policies, /search). There are no explicit per-bot rules, so all crawlers (including AI bots) fall under the default User-agent: * block.
That works in most cases. Product, collection, and blog URLs are reachable. Cart and checkout are blocked. AI bots that respect robots.txt (all the major ones do) see clean catalog content and skip the noise.
The trouble starts in three patterns I see repeatedly in audits:
- Cloudflare bot rules. A store on the Cloudflare proxy with default WAF rules treats AI bot user-agents as “unknown bots” and serves a 403 challenge page. The bot logs the error and stops re-crawling for days.
- Customized robots.txt.liquid blocks. A merchant or agency adds broad "Disallow: /" rules to fix duplicate-content issues and accidentally catches AI bots in the net. (Robots.txt is the wrong tool for duplicate content anyway: see my Shopify duplicate content and canonical tags guide for the canonical-first fix.)
- App middleware injection. Some store-protection apps inject middleware that filters non-standard user-agents. The AI bot sees a generic block page instead of the product HTML.
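To see which rules a given bot actually obeys in a customized file, you can resolve its group offline from the robots.txt text. Below is a minimal POSIX shell and awk sketch. It assumes a simplified reading of RFC 9309: groups whose User-agent line equals the bot's token (case-insensitive) apply, and if none exist, the * group applies. rules_for and has_blanket_disallow are hypothetical helper names. The sketch ignores wildcard paths and longest-match precedence.

```shell
# rules_for TOKEN: read robots.txt on stdin and print the Allow/Disallow
# lines of the group(s) TOKEN obeys. Simplified RFC 9309 selection:
# groups whose User-agent equals TOKEN (case-insensitive) win and are
# merged; if none exist, the "*" group(s) apply instead.
rules_for() {
  awk -v token="$1" '
    BEGIN { token = tolower(token); g = 0; in_agents = 0 }
    {
      line = $0
      sub(/\r$/, "", line)          # tolerate CRLF line endings
      sub(/#.*/, "", line)          # strip comments
      i = index(line, ":")
      if (i == 0) next              # blank or malformed line
      key = tolower(substr(line, 1, i - 1))
      gsub(/[ \t]/, "", key)
      val = substr(line, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "user-agent") {
        if (!in_agents) { g++; in_agents = 1 }   # a new group starts
        if (tolower(val) == token) mine[g] = 1
        if (val == "*") star[g] = 1
      } else if ((key == "allow" || key == "disallow") && g > 0) {
        in_agents = 0
        rules[g] = rules[g] (key == "allow" ? "Allow" : "Disallow") ": " val "\n"
      }
    }
    END {
      found = 0
      for (k = 1; k <= g; k++) if (mine[k]) { printf "%s", rules[k]; found = 1 }
      if (!found) for (k = 1; k <= g; k++) if (star[k]) printf "%s", rules[k]
    }'
}

# has_blanket_disallow TOKEN: succeed if TOKEN's group contains a bare
# "Disallow: /". Longer Allow rules can still open specific paths, so a
# hit means "review this group", not proof that the bot is fully blocked.
has_blanket_disallow() {
  rules_for "$1" | grep -qx 'Disallow: /'
}
```

For example, curl -s https://yourstore.com/robots.txt | rules_for GPTBot prints the rules GPTBot follows on a live store. On Shopify's untouched default, every AI token falls back to the * group.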
Result in all three: the agentic storefront I covered in my Shopify agentic storefronts guide loses entire engines without warning. The fix is per-bot explicit rules in robots.txt.liquid plus a Cloudflare allow-list.
The 6 AI crawler user-agents to allow on Shopify in 2026
Each major engine runs at least one named crawler. Block any one and you drop out of that engine’s citation flows.
| Engine | User-agent | Purpose |
|---|---|---|
| OpenAI ChatGPT (training) | GPTBot | Catalog and content training |
| OpenAI ChatGPT Search (real-time) | OAI-SearchBot | Real-time grounding for search queries |
| Anthropic Claude | ClaudeBot | Training and Claude.ai web search |
| Perplexity | PerplexityBot | Real-time grounding |
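| Google Gemini | Google-Extended | Control token, not a separate crawler: governs Gemini training and grounding on content Googlebot fetches (allow = opt-in); does not affect Google Search, including AI Overviews |
| Common Crawl | CCBot | Downstream LLM training corpus |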
Two more are sometimes mentioned but matter less for Shopify in 2026: Bytespider (TikTok / ByteDance) and Diffbot (structured-data extraction). Allow them if you want maximum reach; the six above cover the engines that actually drive shopping-intent traffic.
For the canonical user-agent strings and IP ranges, see OpenAI’s GPTBot documentation, Anthropic’s bot docs, Perplexity’s bot page, and Google’s Google-Extended page.
How to override Shopify’s robots.txt in 30 minutes
Shopify stores cannot edit robots.txt directly. The override lives in templates/robots.txt.liquid in your theme.
In Shopify admin: Online Store > Themes > Actions menu on your live theme (or a duplicate first) > Edit code > Templates > Add a new template > robots.txt > Create.
The template Shopify creates already contains a Liquid scaffold that loops over robots.default_groups to reproduce the default rules. Keep that loop and append the per-bot groups as plain text after it:
```liquid
{% comment %} templates/robots.txt.liquid {% endcomment %}
{% for group in robots.default_groups %}
{{- group.user_agent -}}
{%- for rule in group.rules -%}
{{ rule }}
{%- endfor -%}
{%- if group.sitemap != blank -%}
{{ group.sitemap }}
{%- endif -%}
{% endfor %}
{% comment %} Explicit allow rules for AI crawlers (2026) {% endcomment %}
User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /search
User-agent: OAI-SearchBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /search
User-agent: ClaudeBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /search
User-agent: PerplexityBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /search
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /products/
Allow: /collections/
Allow: /blogs/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /admin
Disallow: /search
```
Save the template, then open https://yourstore.com/robots.txt in a browser. The new rules should appear at the bottom of the file within about 60 seconds.
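Instead of eyeballing the file, you can confirm that each token now has its own group. Below is a minimal POSIX shell sketch. missing_groups is a hypothetical helper name. It only checks that a User-agent line exists for each token and does not inspect the rules under it.

```shell
# missing_groups TOKEN...: read robots.txt on stdin and print every TOKEN
# that has no User-agent line of its own (case-insensitive exact match).
# Empty output means every token has an explicit group.
missing_groups() {
  agents=$(awk '{
    sub(/\r$/, ""); sub(/#.*/, "")        # drop CR and comments
    i = index($0, ":")
    if (i == 0) next
    key = tolower(substr($0, 1, i - 1)); gsub(/[ \t]/, "", key)
    if (key != "user-agent") next
    val = tolower(substr($0, i + 1)); gsub(/^[ \t]+|[ \t]+$/, "", val)
    print val                              # one lowercased agent per line
  }')
  for token in "$@"; do
    lower=$(printf '%s' "$token" | tr '[:upper:]' '[:lower:]')
    printf '%s\n' "$agents" | grep -qxF -- "$lower" || printf '%s\n' "$token"
  done
}
```

Usage: curl -s https://yourstore.com/robots.txt | missing_groups GPTBot OAI-SearchBot ClaudeBot PerplexityBot Google-Extended CCBot. Any token it prints did not make it into the live file.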
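The first block (the for group in robots.default_groups loop) reproduces Shopify's auto-generated baseline rules. The per-bot groups below it add explicit allow-listing for each AI crawler. A crawler that finds a group naming it obeys only that group and ignores User-agent: *. Each per-bot group therefore has to carry its own Disallow lines, because Shopify's default rules no longer apply to those bots. Google-Extended is a control token rather than a crawler (allow = opt in to Google's Gemini uses of your content); the others are real crawler user-agents.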
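Every per-bot group has to repeat the full Disallow list, so copying it by hand five times invites drift between groups. Below is a minimal POSIX shell sketch that prints identical groups from one list. It uses, for every token, the same paths as the GPTBot group in the template above. emit_ai_groups is a hypothetical helper; you would paste its output below the endfor line yourself.

```shell
# emit_ai_groups: print one robots.txt group per AI crawler token with a
# shared Allow/Disallow set, then the Google-Extended opt-in group.
# Edit the three lists to change every group at once.
emit_ai_groups() {
  for bot in GPTBot OAI-SearchBot ClaudeBot PerplexityBot CCBot; do
    printf 'User-agent: %s\n' "$bot"
    for path in /products/ /collections/ /blogs/ /pages/; do
      printf 'Allow: %s\n' "$path"
    done
    for path in /cart /checkout /account /admin /search; do
      printf 'Disallow: %s\n' "$path"
    done
  done
  # Google-Extended is a control token, so a single Allow line suffices
  printf 'User-agent: Google-Extended\nAllow: /\n'
}
```

Running emit_ai_groups > ai-groups.txt gives you a block to paste. Because every group is generated from the same lists, the groups cannot drift apart.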
How to verify the fix in 2 minutes
Run these checks from your terminal, one request per bot against your top product page:
curl -H "User-Agent: GPTBot" -I https://yourstore.com/products/your-top-product
curl -H "User-Agent: ClaudeBot" -I https://yourstore.com/products/your-top-product
curl -H "User-Agent: PerplexityBot" -I https://yourstore.com/products/your-top-product
What to look for in the response:
- HTTP/2 200 at the top = bot allowed
- HTTP/2 403 = blocked at the WAF or Shopify layer; debug Cloudflare bot rules first, since robots.txt is advisory and never returns a 403 by itself
- HTTP/2 200 with a tiny Content-Length (under 1000 bytes) = soft block page returned; some app middleware is filtering
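For a full audit, repeat with the other three tokens (OAI-SearchBot, Google-Extended, CCBot) and also test a collection URL (/collections/all) and a blog URL (/blogs/news/sample) for each bot. Google does not crawl with a Google-Extended user-agent, so that check only confirms that nothing on your side filters the string.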
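The full audit is six tokens times three URLs, so a loop saves typing. Below is a minimal POSIX shell sketch with several assumptions. The sample paths are the placeholders from the text. It sends bare tokens as the user-agent, like the commands above, while real crawlers send longer strings that contain these tokens. It uses GET rather than -I so that curl's size_download variable reflects the actual body size. The 1000-byte soft-block threshold is the rule of thumb from the checklist above, not a Shopify constant. classify and audit_store are hypothetical helper names.

```shell
# classify CODE BYTES: turn an HTTP status and body size into a verdict,
# treating a 200 with a body under 1000 bytes as a likely soft block page.
classify() {
  if [ "$1" = "403" ]; then
    echo "blocked"
  elif [ "$1" = "200" ] && [ "$2" -lt 1000 ]; then
    echo "likely soft block"
  elif [ "$1" = "200" ]; then
    echo "allowed"
  else
    echo "check manually ($1)"
  fi
}

# audit_store BASE_URL: request three sample paths as each token and print
# token, path, status, body bytes and verdict, tab-separated.
audit_store() {
  base=$1
  for ua in GPTBot OAI-SearchBot ClaudeBot PerplexityBot Google-Extended CCBot; do
    for path in /products/your-top-product /collections/all /blogs/news/sample; do
      # -L follows redirects; -o /dev/null discards the body but
      # size_download still counts it
      result=$(curl -sL -o /dev/null -A "$ua" \
        -w '%{http_code} %{size_download}' "$base$path")
      code=${result% *}
      bytes=${result#* }
      printf '%s\t%s\t%s\t%s\t%s\n' "$ua" "$path" "$code" "$bytes" \
        "$(classify "$code" "$bytes")"
    done
  done
}
```

Usage: replace the sample paths with real handles, then run audit_store https://yourstore.com. If curl cannot connect at all, it reports status 000, which shows up as "check manually (000)".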
If any bot returns 403, two likely culprits in order:
- Cloudflare bot management. Open the Cloudflare dashboard > Security > Bots > Configure and add the bot user-agents to the allow-list. If you are on the free tier, create a WAF custom rule (the replacement for the older Firewall Rules) with the expression (http.user_agent contains "GPTBot") and the Skip action.
- A storefront protection app. Uninstall the offending app or allow-list the bots in its settings. Common culprits are bot-blocker apps marketed to “stop scrapers.” They block AI agents indiscriminately.
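Writing one rule per bot gets tedious, and a single custom rule can match all of them by joining clauses with or. Below is a minimal POSIX shell sketch that builds that expression string. The http.user_agent field and the contains and or operators come from Cloudflare's Rules language. cf_expression is a hypothetical helper. Matching on the user-agent string alone also admits any scraper that spoofs these names, so prefer Cloudflare's verified-bot fields where your plan offers them.

```shell
# cf_expression TOKEN...: print a Cloudflare Rules-language expression that
# matches requests whose User-Agent header contains any of the tokens.
cf_expression() {
  out=""
  for token in "$@"; do
    clause="(http.user_agent contains \"$token\")"
    if [ -z "$out" ]; then out=$clause; else out="$out or $clause"; fi
  done
  printf '%s\n' "$out"
}
```

Run cf_expression GPTBot OAI-SearchBot ClaudeBot PerplexityBot CCBot and paste the result into the rule's expression editor.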
If all six bots return 200, wait 2 to 3 weeks and check Microsoft Clarity AI Visibility (My cited pages panel) for new citations on product URLs. That is the same recovery window I described in the posts on GTIN coverage (audit and fix) and Product schema errors.
What allowing bots does not solve
Bot reachability is necessary but not sufficient. After clean robots.txt access, agents still need three things to actually cite you:
- Valid Product schema. Covered in my post on the 4 Product schema errors that block AI citation. Schema validity is the entry ticket.
- GTIN coverage above 95 percent. Covered in my Shopify GTIN coverage post. Without identifiers, agents cannot match your SKUs.
- Substantive product descriptions. Under 50 words and the agent has nothing factual to extract for citation snippets.
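Two of these prerequisites reduce to arithmetic you can spot-check. Below is a minimal POSIX shell sketch. It assumes you already have a count of variants with a GTIN and the total variant count (for example from a product CSV export), plus a product's description HTML. The 95 percent and 50-word thresholds are the ones stated above. Both function names are hypothetical, and the tag stripping does not handle tags split across lines.

```shell
# gtin_coverage_ok WITH TOTAL: succeed if WITH/TOTAL is at least 95 percent.
# Compares WITH*100 against TOTAL*95 to stay in integer arithmetic.
gtin_coverage_ok() {
  [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -ge $(( $2 * 95 )) ]
}

# description_words: read description HTML on stdin, replace tags with
# spaces, and print the remaining word count.
description_words() {
  sed 's/<[^>]*>/ /g' | wc -w | tr -d ' '
}
```

For example, gtin_coverage_ok 1180 1250 fails because 1180 of 1250 is 94.4 percent. A description for which description_words prints less than 50 falls below the threshold above.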
The broader operational playbook is in the Shopify agentic storefronts guide and the enable agentic storefronts setup walkthrough.
Audit your robots.txt with curl this week. Three minutes of testing tells you whether your store is reachable to the engines driving the fastest-growing slice of Shopify discovery.
The takeaway
- Test each of the 6 AI bot user-agents against a top PDP with curl this week; any 403 is an emergency
- Override templates/robots.txt.liquid with explicit Allow rules for the 6 bots if Shopify's default does not suffice
- Audit Cloudflare bot management for default-deny rules that catch AI user-agents
- Uninstall storefront protection apps that filter non-standard user-agents indiscriminately
- After fixes, wait 2 to 3 weeks and verify product URL citations in Microsoft Clarity AI Visibility