How Mastodon Crawls Your Pages

Per-instance fetching, user agents, and why the fediverse is different

Per-instance fetching

There is no single Mastodon crawler. Every instance fetches pages independently when a user on that instance shares a URL. If your link spreads across 500 instances, your server gets 500 separate requests, each from a different server at unpredictable times. This is the “fediverse hug of death,” and it can flatten small servers when a post takes off.
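
Because each request comes from a different instance, per-IP rate limiting barely helps; caching the identical HTML does. A minimal sketch, assuming a Flask app, of marking pages cacheable so a CDN or reverse proxy can absorb the burst (render_post() is a hypothetical helper):

  # Sketch (assumes Flask): serve cacheable HTML so a shared cache can absorb
  # one request per instance. Per-IP limits don't help, because every request
  # arrives from a different server.
  from flask import Flask, make_response

  app = Flask(__name__)

  @app.route("/posts/<slug>")
  def post(slug):
      html = render_post(slug)  # hypothetical helper returning the full page HTML
      resp = make_response(html)
      # Let any shared cache serve the same HTML to every instance for 5 minutes.
      resp.headers["Cache-Control"] = "public, max-age=300"
      return resp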

The user agent

Each instance includes its own domain in the user agent:

http.rb/5.2.0 (Mastodon/4.3.0; +https://mastodon.social/) Bot

The http.rb version, Mastodon version, and instance URL vary per server. If you need to allowlist Mastodon's fetches, match on the Mastodon/ token anywhere in the string rather than assuming it appears at the start.
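
A small sketch of that check in Python; the regex just looks for a Mastodon/<version> token somewhere in the User-Agent header:

  # Sketch: identify Mastodon fetches by the "Mastodon/" token in the
  # User-Agent header. Version and instance domain vary per server.
  import re

  MASTODON_UA = re.compile(r"\bMastodon/\d+(\.\d+)*\b")

  def is_mastodon_fetch(user_agent: str) -> bool:
      return bool(MASTODON_UA.search(user_agent or ""))

  # is_mastodon_fetch("http.rb/5.2.0 (Mastodon/4.3.0; +https://mastodon.social/) Bot")
  # -> True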

No JavaScript execution

Mastodon fetches raw HTML with http.rb and parses it with Nokogiri. No headless browser, no JS engine. OG tags must be in the initial server response.
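
To see roughly what Mastodon sees, fetch your page without a browser and pull the og: properties out of the raw HTML. This is an approximation using Python's requests and the standard html.parser, not Mastodon's actual Nokogiri pipeline, and the URL is a placeholder:

  # Sketch: approximate a no-JS fetch and list the og: meta tags found in the
  # raw HTML. Anything injected by client-side JavaScript will be missing.
  from html.parser import HTMLParser
  import requests

  class OGParser(HTMLParser):
      def __init__(self):
          super().__init__()
          self.og = {}

      def handle_starttag(self, tag, attrs):
          if tag == "meta":
              a = dict(attrs)
              prop = a.get("property") or ""
              if prop.startswith("og:"):
                  self.og[prop] = a.get("content") or ""

  html = requests.get("https://example.com/your-post", timeout=10).text
  parser = OGParser()
  parser.feed(html)
  print(parser.og)  # only what a raw HTML fetch can actually see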

robots.txt is not respected

Mastodon’s link preview fetcher does not check robots.txt. This is a known open issue. Blocking Mastodon in your robots.txt has no effect on card generation.
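
If you do want to opt out of preview cards, the block has to happen at the application layer instead. One possible approach, sketched here with Flask (returning 403 is a choice, not anything Mastodon requires), is to refuse requests whose user agent contains the Mastodon/ token:

  # Sketch: robots.txt won't stop card generation, so refuse the request itself.
  from flask import Flask, request, abort

  app = Flask(__name__)

  @app.before_request
  def block_mastodon_previews():
      ua = request.headers.get("User-Agent", "")
      if "Mastodon/" in ua:
          abort(403)  # no body returned, so no card is generated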

Request limits

Mastodon enforces strict limits on preview fetches:

  • Connect timeout: 5 seconds
  • Read timeout: 10 seconds
  • Total timeout: 30 seconds
  • Max HTML size: 1 MB
  • Max redirects: 3 hops

If your page exceeds the 1 MB HTML limit, Mastodon truncates the response and may miss meta tags that appear late in <head>. Put your <meta> tags early.
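
A rough self-check against these limits using Python's requests; the numbers simply mirror the list above, and the URL is a placeholder:

  # Sketch: fetch your own page and compare it against the documented limits.
  import requests

  MAX_BYTES = 1024 * 1024   # 1 MB HTML cap
  MAX_REDIRECTS = 3

  resp = requests.get(
      "https://example.com/your-post",  # placeholder: point at your own page
      timeout=(5, 10),                  # connect / read, mirroring the limits
  )
  print("redirect hops:", len(resp.history), "limit:", MAX_REDIRECTS)
  print("html bytes:", len(resp.content), "limit:", MAX_BYTES)
  print("og:title within first 1 MB:",
        b'property="og:title"' in resp.content[:MAX_BYTES])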

Concurrent fetch protection

Redis locking prevents duplicate fetches of the same URL on the same instance. If two users post the same link within seconds, only one fetch runs and both statuses get the same preview card.
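
The pattern is a classic short-lived Redis lock. A sketch of the general idea, not Mastodon's actual code; the key name, TTL, and fetch callback are illustrative:

  # Sketch: take a short-lived Redis lock keyed by URL so only one worker
  # fetches a given link at a time. Uses redis-py.
  import redis

  r = redis.Redis()

  def fetch_card_once(url: str, fetch):
      lock_key = f"preview_card_lock:{url}"
      # SET NX EX: only the first caller gets the lock; it expires on its own.
      if r.set(lock_key, "1", nx=True, ex=60):
          try:
              return fetch(url)   # do the actual HTTP fetch and parse
          finally:
              r.delete(lock_key)
      return None  # another worker is already fetching; reuse its card when it lands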