Anubis is awesome! Stopping (AI)crawlbots

zoey@lemmy.librebun.com · edited 2 months ago

Mora@pawb.social · edited 2 months ago

Besides that point: why tf do they even crawl Lemmy? They could just as well create a “read only” instance with an account that subscribes to all communities … and the other instances would send their data. Oh, right, for most companies AI has to be as unethical as possible for some reason.

wizardbeard@lemmy.dbzer0.com · 2 months ago

They crawl Wikipedia too, adding significant extra load to its servers, even though Wikipedia offers a regularly updated torrent of all its content.

ZombiFrancis@sh.itjust.works · 2 months ago

See, your brain went immediately to a solution based on knowing how something works. That’s not in the AI wheelhouse.

dan@upvote.au · 2 months ago

They’re likely not intentionally crawling Lemmy. They’re probably just crawling all sites they can find.

AmbitiousProcess (they/them)@piefed.social · 2 months ago

Because the easiest solution for them is a simple web scraper. If they don’t give a shit about ethics, then something that just crawls every page it can find is loads easier for them to set up than a custom implementation: fetching torrent downloads for Wikipedia, running Lemmy/Mastodon/Pixelfed instances for the fediverse, using RSS feeds and checking whether they carry full or only partial articles, implementing proper checks to prevent double (or more) downloading of the same content, etc.
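
For a sense of what the "proper checks to prevent double downloading" part could involve, here is a minimal sketch in Python. Everything in it is an assumption for illustration (the `SeenContent` class, the `normalize_url` helper, the choice of SHA-256); it is not taken from any real crawler.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical helper: collapse trivially different URLs into one key."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the trailing slash and the #fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class SeenContent:
    """Hypothetical tracker for URLs and page bodies a crawler already has."""

    def __init__(self) -> None:
        self._urls: set[str] = set()
        self._hashes: set[str] = set()

    def should_fetch(self, url: str) -> bool:
        # Skip the request entirely if this URL was already recorded.
        return normalize_url(url) not in self._urls

    def record(self, url: str, body: bytes) -> bool:
        """Remember the URL; return True only if the body is new content."""
        self._urls.add(normalize_url(url))
        digest = hashlib.sha256(body).hexdigest()
        if digest in self._hashes:
            return False  # same content already stored under another URL
        self._hashes.add(digest)
        return True
```

Even this small amount of bookkeeping is more than a "fetch every link you see" scraper bothers with, which is the point being made above.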

Incoherent rant.

Behold, Anubis.

“Weighs the soul of incoming HTTP requests to stop AI crawlers”
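
The "weighing" is, as far as I understand it, a proof-of-work challenge that the visitor's browser has to solve before the page is served. The sketch below shows the general idea only. It assumes a SHA-256 scheme where the hash must start with a number of zero hex digits; the function names, the challenge string, and the difficulty value are all made up for illustration and are not Anubis's actual code or parameters.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce until the hash has enough leading zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Assumed example values: each extra zero digit makes solving about 16x more expensive.
nonce = solve("example-challenge", 3)
print(verify("example-challenge", nonce, 3))
```

The asymmetry is what matters: one visitor pays a small cost once, while a crawler hitting every page it can find pays it over and over, and the server only ever computes one hash per check.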