The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)
Posted by Davriellelouna@lemmy.world to Technology@lemmy.world · English · 2 days ago · 219 comments
ubergeek@lemmy.today · 13 hours ago
And I’m assuming if the robots.txt states their UserAgent isn’t allowed to crawl, it obeys it, right? :P

Kissaki@feddit.org · 12 hours ago
No. As per the article, their argument is that they are not web crawlers generating an index; they are user-action-triggered agents working live for the user.

ubergeek@lemmy.today · 10 hours ago
Except it’s not a live user hitting 10 sites all at the same time, trying to crawl the entire site… Live users cannot do that.
That said, if my robots.txt forbids them from hitting my site, as a proxy, they obey that, right?
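For anyone wondering what "obeying robots.txt" looks like on the client side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, and the `is_allowed` helper are all made up for illustration; this says nothing about what Perplexity's agents actually do. It only shows the check a well-behaved client would run before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named bot is disallowed everywhere; other agents fall through to "*".
blocked = is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/page")
allowed = is_allowed(ROBOTS_TXT, "SomeBrowser", "https://example.com/page")
print(blocked, allowed)
```

Note that the check happens entirely in the client. robots.txt is advisory, so it only helps if the client identifies itself honestly and chooses to run a check like this, which is why site owners end up relying on something like a firewall for actual enforcement.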