Right now, robots.txt on lemmy.ca is configured this way

User-Agent: *
  Disallow: /login
  Disallow: /login_reset
  Disallow: /settings
  Disallow: /create_community
  Disallow: /create_post
  Disallow: /create_private_message
  Disallow: /inbox
  Disallow: /setup
  Disallow: /admin
  Disallow: /password_change
  Disallow: /search/
  Disallow: /modlog

Would it be a good idea privacy-wise to deny GPTBot from scrapping content from the server?

User-agent: GPTBot
Disallow: /

Thanks!

  • Shadow@lemmy.caM
    link
    fedilink
    arrow-up
    3
    ·
    edit-2
    1 year ago

    I would have thought so too, but == failed the syntax check

    2023/08/07 15:36:59 [emerg] 2315181#2315181: unexpected "==" in condition in /etc/nginx/sites-enabled/lemmy.ca.conf:50
    

    You actually want ~ though because GPTBot is just in the user agent, it’s not the full string.