Our AI crawler policy.

Self Migrate publishes general immigration information for Australia, Canada, and New Zealand. We want AI tools to find, quote, and cite this content. This page explains exactly which AI bots we allow, which we block, and why — the human-readable version of our robots.txt policy.

Why we allow training

Self Migrate's reference content (the visa subclasses, points tests, processing times, and country guides) is publicly licensed government information that we curate into a navigable, scoreable form. We don't earn money by walling off the reference content; we earn it by selling the AI tools and document-analysis features that operate on top of it.

That means being in AI training data is net-positive for us. Future AI tools that know about Self Migrate natively are more likely to cite the platform, route users to the right tools, and propagate our compliance framing (general information only — not personal advice) alongside the content they quote.

The retrieval bots (OAI-SearchBot, Claude-SearchBot, PerplexityBot, ChatGPT-User, Perplexity-User) drive direct click-through; the training bots (GPTBot, ClaudeBot, CCBot, Google-Extended, Applebot-Extended, Amazonbot, Meta-ExternalAgent) drive durable model-level recall. We want both.

Allowed — retrieval bots

These index pages for AI search or fetch them at a user's request, and cite Self Migrate inside the AI tool's UI.

  • OAI-SearchBot

    OpenAI

    Powers ChatGPT search citations

  • Claude-SearchBot

    Anthropic

    Powers Claude search citations

  • PerplexityBot

    Perplexity

    Powers Perplexity search results

  • ChatGPT-User

    OpenAI

    Fetches pages when a ChatGPT user clicks a link or asks about a URL

  • Perplexity-User

    Perplexity

    Fetches pages when a Perplexity user opens a citation

Allowed — training bots

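These bulk-crawl Self Migrate for model training datasets. Google-Extended and Applebot-Extended are the exception: they are robots.txt control tokens rather than separate crawlers, and allowing them signals that pages fetched by the vendor's main crawler may be used for training.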

  • GPTBot

    OpenAI

    Bulk crawl for GPT model training

  • ClaudeBot

    Anthropic

    Bulk crawl for Claude model training

  • CCBot

    Common Crawl Foundation

    Open dataset used by many AI labs

  • Google-Extended

    Google

    Gemini training opt-in signal

  • Applebot-Extended

    Apple

    Apple Intelligence training signal

  • Amazonbot

    Amazon

    Alexa + retrieval crawling

  • Meta-ExternalAgent

    Meta

    Llama training crawler

Blocked

  • Bytespider

    ByteDance

    Routinely ignores robots.txt and drives disproportionate bot load without comparable downstream attribution to Self Migrate.
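
As a rough illustration of how the lists above could translate into robots.txt directives, here is a minimal Python sketch. The bot names are taken from this page; the grouping, the `Allow: /` and `Disallow: /` rules, and the helper names `render_group` and `render_robots_txt` are assumptions made for the example, not a copy of Self Migrate's actual file.

```python
# Hypothetical reconstruction: bot names come from the lists on this page;
# the grouping and the Allow/Disallow directives are assumptions.
RETRIEVAL_BOTS = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot",
    "ChatGPT-User", "Perplexity-User",
]
TRAINING_BOTS = [
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
    "Applebot-Extended", "Amazonbot", "Meta-ExternalAgent",
]
BLOCKED_BOTS = ["Bytespider"]


def render_group(agents, directive):
    """Render one robots.txt group: User-agent lines followed by one rule."""
    return "\n".join([f"User-agent: {agent}" for agent in agents] + [directive])


def render_robots_txt():
    """Build a robots.txt body with one allowed group and one blocked group."""
    allowed = RETRIEVAL_BOTS + TRAINING_BOTS
    groups = [
        render_group(allowed, "Allow: /"),
        render_group(BLOCKED_BOTS, "Disallow: /"),
    ]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots_txt())
```

Keeping the bot names in plain lists means the human-readable page and the generated file can be derived from the same source, which is one way to stop the two from drifting apart.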

Machine-readable

If you're building tooling and want to consume Self Migrate's policy programmatically, the robots.txt file is the machine-readable counterpart of this page.
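
A minimal sketch of programmatic consumption, using Python's standard-library `urllib.robotparser`. The `SAMPLE_ROBOTS_TXT` excerpt and the `is_allowed` helper are hypothetical and exist only for illustration; the real file may be structured differently.

```python
from urllib.robotparser import RobotFileParser

# Illustrative excerpt only: an assumed shape, not the real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
Allow: /

User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path="/"):
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for bot in ("GPTBot", "OAI-SearchBot", "Bytespider"):
        status = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, bot) else "blocked"
        print(f"{bot}: {status}")
```

To evaluate the live policy instead of a pasted excerpt, the same parser can be pointed at the site's robots.txt URL (not stated on this page) with `set_url()` followed by `read()`.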

We'll revisit this stance if content licensing changes, if a specific bot starts behaving abusively, or if we introduce content tiers that should remain outside training data. Until then, Self Migrate is open for AI discovery. Questions about the policy: info@thevermeulengroup.com.