Public policy

Our AI crawler policy.

Self Migrate publishes general immigration information for Australia, Canada, and New Zealand. We want AI tools to find, quote, and cite this content. This page explains exactly which AI bots we allow, which we block, and why. It is the human-readable version of our robots.txt policy.

Why we allow training

Self Migrate's reference content (the visa subclasses, points tests, processing times, country guides) is publicly licensed government information that we curate into a navigable, scoreable form. We don't earn money by walling off the reference content; we earn it by selling the AI tools and document-analysis features that operate on top of it.

That means being in AI training data is net-positive for us. Future AI tools that know about Self Migrate natively are more likely to cite the platform, route users to the right tools, and propagate our compliance framing (general information only, not personal advice) alongside the content they quote.

The retrieval bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Perplexity-User) drive direct click-through; the training bots (GPTBot, ClaudeBot, CCBot, Google-Extended, Applebot-Extended, Amazonbot, Meta-ExternalAgent) drive durable model-level recall. We want both.

Allowed: retrieval bots

These index pages for AI search or fetch them at a user's request, and cite Self Migrate inside the AI tool's UI.

OAI-SearchBot (OpenAI) : Powers ChatGPT search citations
Claude-SearchBot (Anthropic) : Powers Claude search citations
PerplexityBot (Perplexity) : Powers Perplexity search results
ChatGPT-User (OpenAI) : Fetches pages when a ChatGPT user clicks a link or asks about a URL
Perplexity-User (Perplexity) : Fetches pages when a Perplexity user opens a citation

Allowed: training bots

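These bulk-crawl Self Migrate for model training datasets. Two of them (Google-Extended, Applebot-Extended) are robots.txt tokens rather than separate crawlers: they signal whether content already fetched by the operator's main crawler may be used for training.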

GPTBot (OpenAI) : Bulk crawl for GPT model training
ClaudeBot (Anthropic) : Bulk crawl for Claude model training
CCBot (Common Crawl Foundation) : Open dataset used by many AI labs
Google-Extended (Google) : Gemini training opt-in signal
Applebot-Extended (Apple) : Apple Intelligence training signal
Amazonbot (Amazon) : Alexa + retrieval crawling
Meta-ExternalAgent (Meta) : Llama training crawler

Blocked

Bytespider (ByteDance) : Routinely ignores robots.txt and drives disproportionate bot load without comparable downstream attribution to Self Migrate.
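
For readers who think in robots.txt terms, the allow and block lists above could be expressed as one User-agent group per bot. The Python sketch below renders such a file and checks it with the standard library's urllib.robotparser. The bot names come from this page; the layout of the rendered file, and the helper names render_robots_txt and can_fetch, are illustrative assumptions and may not match the real /robots.txt line for line.

```python
from urllib.robotparser import RobotFileParser

# Bot names are taken from the lists on this page. The exact layout of the
# real /robots.txt is an assumption made for illustration.
ALLOWED_BOTS = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
    "ChatGPT-User", "Perplexity-User",
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
    "Applebot-Extended", "Amazonbot", "Meta-ExternalAgent",
]
BLOCKED_BOTS = ["Bytespider"]


def render_robots_txt(allowed, blocked):
    """Render one User-agent group per bot: 'Allow: /' or 'Disallow: /'."""
    lines = []
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    return "\n".join(lines)


def can_fetch(robots_txt, user_agent, url="/"):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    policy = render_robots_txt(ALLOWED_BOTS, BLOCKED_BOTS)
    for bot in ALLOWED_BOTS + BLOCKED_BOTS:
        print(f"{bot}: {'allowed' if can_fetch(policy, bot) else 'blocked'}")
```

A robots.txt rule is only a request. As noted for Bytespider, a crawler that ignores the file has to be handled by other means, which this sketch does not cover.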

Machine-readable

If you're building tooling and want to consume Self Migrate's policy programmatically:

/robots.txt : canonical machine-readable policy
/llms.txt : curated content map per the llmstxt.org spec
/llms-full.txt : concatenated body content of the most-cited pages
/api/llms/pathways : full pathway corpus as structured JSON
/api/mcp : Model Context Protocol server (JSON-RPC 2.0)
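
As a rough starting point for tooling, the sketch below resolves the paths listed above against a site origin and builds a JSON-RPC 2.0 request envelope of the kind the /api/mcp endpoint would receive. BASE_URL is a placeholder and not the real origin, the helper names are hypothetical, and the method name tools/list comes from the Model Context Protocol specification; this page does not state which methods the server supports or whether an initialization handshake is required first.

```python
import json
from urllib.parse import urljoin

# Placeholder origin (assumption): substitute the site's real base URL.
BASE_URL = "https://example.com"

# Paths exactly as listed on this page.
POLICY_PATHS = {
    "robots": "/robots.txt",
    "llms": "/llms.txt",
    "llms_full": "/llms-full.txt",
    "pathways": "/api/llms/pathways",
    "mcp": "/api/mcp",
}


def endpoint_url(name, base_url=BASE_URL):
    """Return the absolute URL for one of the listed policy endpoints."""
    return urljoin(base_url, POLICY_PATHS[name])


def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request envelope as a JSON string."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return json.dumps(body)


if __name__ == "__main__":
    print(endpoint_url("mcp"))
    # 'tools/list' is an MCP method name; server support is an assumption.
    print(jsonrpc_request("tools/list"))
```

The envelope would be sent as the body of an HTTP POST to the MCP endpoint; the transport details (headers, session handling) depend on the server and are left out here.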

We'll revisit this stance if content licensing changes, if a specific bot starts behaving abusively, or if we introduce content tiers that should remain outside training data. Until then, Self Migrate is open for AI discovery. Questions about the policy: info@thevermeulengroup.com.
