Our AI crawler policy.
Self Migrate publishes general immigration information for Australia, Canada, and New Zealand. We want AI tools to find, quote, and cite this content. This page explains exactly which AI bots we allow, which we block, and why — the human-readable version of our robots.txt policy.
Why we allow training
Self Migrate's reference content (visa subclasses, points tests, processing times, country guides) is publicly licensed government information that we curate into a navigable, scoreable form. We don't earn money by walling off the reference content; we earn it by selling the AI tools and document-analysis features that operate on top of it.
That means being in AI training data is net-positive for us. Future AI tools that know about Self Migrate natively are more likely to cite the platform, route users to the right tools, and propagate our compliance framing (general information only — not personal advice) alongside the content they quote.
The retrieval bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Perplexity-User) drive direct click-through; the training bots (GPTBot, ClaudeBot, CCBot, Google-Extended, Applebot-Extended, Amazonbot, Meta-ExternalAgent) drive durable model-level recall. We want both.
Allowed — retrieval bots
These index or fetch pages so an AI tool can answer a user's query, and they cite Self Migrate inside the AI tool's UI.
- OAI-SearchBot (OpenAI): Powers ChatGPT search citations
- Claude-SearchBot (Anthropic): Powers Claude search citations
- PerplexityBot (Perplexity): Powers Perplexity search results
- ChatGPT-User (OpenAI): Fetches pages when a ChatGPT user clicks a link or asks about a URL
- Perplexity-User (Perplexity): Fetches pages when a Perplexity user opens a citation
Allowed — training bots
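These bulk-crawl Self Migrate for model training datasets or, in the case of Google-Extended and Applebot-Extended, act as robots.txt tokens that signal whether already-crawled content may be used for training.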
- GPTBot (OpenAI): Bulk crawl for GPT model training
- ClaudeBot (Anthropic): Bulk crawl for Claude model training
- CCBot (Common Crawl Foundation): Open dataset used by many AI labs
- Google-Extended (Google): Gemini training opt-in signal
- Applebot-Extended (Apple): Apple Intelligence training signal
- Amazonbot (Amazon): Alexa and retrieval crawling
- Meta-ExternalAgent (Meta): Llama training crawler
Blocked
- Bytespider (ByteDance): Routinely ignores robots.txt and drives disproportionate bot load without comparable downstream attribution to Self Migrate.
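The allow and block lists above could be expressed as robots.txt groups along the following lines. This is a minimal sketch, not a copy of the live file: the bot names come from this page, but the function name `render_robots_txt`, the one-group-per-bot layout, and the blanket `Allow: /` and `Disallow: /` rules are assumptions made for illustration.

```python
# Hypothetical sketch: the bot names are taken from this page, but the
# actual layout of Self Migrate's /robots.txt is an assumption.
ALLOWED_RETRIEVAL = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
    "ChatGPT-User", "Perplexity-User",
]
ALLOWED_TRAINING = [
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
    "Applebot-Extended", "Amazonbot", "Meta-ExternalAgent",
]
BLOCKED = ["Bytespider"]


def render_robots_txt(allowed, blocked):
    """Render one User-agent group per bot with a site-wide rule."""
    groups = []
    for bot in allowed:
        groups.append(f"User-agent: {bot}\nAllow: /")
    for bot in blocked:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots_txt(ALLOWED_RETRIEVAL + ALLOWED_TRAINING, BLOCKED))
```

Keeping the three lists as data makes it easy to move a bot between categories if its behaviour changes, which is the kind of revision this policy anticipates.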
Machine-readable
If you're building tooling and want to consume Self Migrate's policy programmatically:
- /robots.txt — canonical machine-readable policy
- /llms.txt — curated content map per the llmstxt.org spec
- /llms-full.txt — concatenated body content of the most-cited pages
- /api/llms/pathways — full pathway corpus as structured JSON
- /api/mcp — Model Context Protocol server (JSON-RPC 2.0)
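As a rough illustration of consuming the robots.txt policy programmatically, the sketch below uses Python's standard `urllib.robotparser` to check whether a given user agent may fetch a URL. The inline `SAMPLE_ROBOTS_TXT` is an assumed two-bot excerpt for demonstration, not the live file, and `https://example.com/visas` and the helper name `is_allowed` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt for illustration only; the real /robots.txt may differ.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "Bytespider"):
        verdict = is_allowed(SAMPLE_ROBOTS_TXT, bot, "https://example.com/visas")
        print(f"{bot}: {'allowed' if verdict else 'blocked'}")
```

To check the live policy instead, the same parser can load a remote file with `set_url()` followed by `read()`, pointed at the site's `/robots.txt`.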
We'll revisit this stance if content licensing changes, if a specific bot starts behaving abusively, or if we introduce content tiers that should remain outside training data. Until then, Self Migrate is open for AI discovery. Questions about the policy: info@thevermeulengroup.com.
