Two separate crawlers
ByteDance runs two crawlers that are frequently confused:
TikTokSpider – fetches pages to generate TikTok link previews. Respects robots.txt, crawls on demand, and limits its scope to Open Graph (OG) metadata extraction.
Bytespider – general-purpose crawler for AI training, search indexing, and content analysis. Aggressive crawling patterns. Commonly blocked.
Why the distinction matters
Many site operators block Bytespider to prevent AI training on their content. If the block is too broad (blocking all ByteDance IPs, or using a wildcard that catches both bots), TikTok link previews break as collateral damage.
How to allow TikTokSpider while blocking Bytespider
User-agent: TikTokSpider
Allow: /

User-agent: Bytespider
Disallow: /
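Crawlers that follow RFC 9309 obey the group that names their own user agent, so group order does not change the outcome for them. Putting the specific TikTokSpider group before broader disallow rules (such as a User-agent: * group) still makes the intent obvious and may help with simpler parsers that stop at the first matching group.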
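One way to sanity-check the rules above before deploying them is Python's standard urllib.robotparser. The sketch below parses the recommended file and, for contrast, a wildcard-only file that blocks every crawler, then asks whether each bot may fetch a page. The URL and the helper name check are illustrative assumptions; only the product tokens TikTokSpider and Bytespider come from the text.

```python
from urllib.robotparser import RobotFileParser

# The recommended robots.txt: TikTokSpider allowed, Bytespider disallowed.
RECOMMENDED = """\
User-agent: TikTokSpider
Allow: /

User-agent: Bytespider
Disallow: /
"""

# A wildcard-only file, shown for contrast: it blocks both bots.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /
"""


def check(robots_txt: str, agent: str, url: str = "https://example.com/article") -> bool:
    """Return True if `agent` (a product token) may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, rules in [("recommended", RECOMMENDED), ("wildcard only", WILDCARD_ONLY)]:
        for agent in ("TikTokSpider", "Bytespider"):
            print(f"{name:14} {agent:13} allowed={check(rules, agent)}")
```

With the recommended file, TikTokSpider is allowed and Bytespider is not; with the wildcard-only file, both are refused, which is exactly the collateral damage to link previews. urllib.robotparser is a simple first-match parser rather than a reference implementation of RFC 9309, so treat this as a quick check, not proof of how every crawler will behave.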
Identifying each bot in server logs
TikTokSpider – look for “TikTokSpider” in the user agent string.
Bytespider – look for “Bytespider” in the user agent string.
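The user-agent checks described above can be applied to an access log with a few lines of Python. This is a minimal sketch assuming logs in the common "combined" format, where the user agent is the last double-quoted field. The sample lines, IP addresses, and simplified user-agent strings are hypothetical placeholders, not real ByteDance traffic.

```python
import re
from collections import Counter

# Hypothetical combined-format log lines. IPs are documentation addresses and the
# user agents are simplified placeholders, not verbatim ByteDance strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/May/2024:12:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; TikTokSpider)"',
    '203.0.113.9 - - [10/May/2024:12:00:03 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.9 - - [10/May/2024:12:00:04 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [10/May/2024:12:01:00 +0000] "GET /post HTTP/1.1" 200 512 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"',
]

# In the combined format the user agent is the last double-quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def classify(user_agent: str) -> str:
    """Label a user-agent string by the crawler token it contains (case-insensitive)."""
    ua = user_agent.lower()
    if "tiktokspider" in ua:
        return "TikTokSpider"
    if "bytespider" in ua:
        return "Bytespider"
    return "other"


def count_bots(lines) -> Counter:
    """Count requests per crawler label across access-log lines."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    for label, hits in count_bots(SAMPLE_LOG).most_common():
        print(f"{label}: {hits}")
```

Counting the two tokens separately makes it easy to confirm that a new rule cut Bytespider requests without also silencing TikTokSpider.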
If your WAF or CDN blocks by bot category, check whether it distinguishes these two or groups them as “ByteDance bots.”
Firewall considerations
TikTokSpider and Bytespider may share ByteDance IP ranges. IP-based blocking can inadvertently kill link previews. Prefer user-agent-based rules for granular control.
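To illustrate a user-agent rule rather than an IP rule, here is a sketch written as Python WSGI middleware. In practice this logic usually lives in a WAF or CDN with its own rule syntax, so treat the middleware, the BLOCKED_UA_TOKENS tuple, and the stand-in site app as hypothetical. It returns 403 to requests whose User-Agent contains "bytespider" and passes everything else, including TikTokSpider, through to the site. Keep in mind that the user agent is whatever the client chooses to send.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical block list of lowercase tokens matched against the User-Agent header.
# TikTokSpider is deliberately absent so link previews keep working.
BLOCKED_UA_TOKENS = ("bytespider",)


def block_by_user_agent(app):
    """Wrap a WSGI app so requests whose User-Agent names a blocked crawler get 403."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BLOCKED_UA_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Stand-in for the real application, serving a page with an OG tag."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b'<html><head><meta property="og:title" content="Example"></head></html>']


app = block_by_user_agent(site)


def status_for(user_agent: str) -> str:
    """Run one fake request through `app` and return the response status line."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["HTTP_USER_AGENT"] = user_agent
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)
        return lambda data: None  # WSGI write() callable, unused here

    app(environ, start_response)
    return statuses[0]


if __name__ == "__main__":
    for ua in ("Mozilla/5.0 (compatible; TikTokSpider)",
               "Mozilla/5.0 (compatible; Bytespider)"):
        print(ua, "->", status_for(ua))
```

Because the decision depends only on the header and not on the source address, a shared IP range does not matter: TikTokSpider gets 200 and Bytespider gets 403 even if both come from the same network.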