• MonkderVierte@lemmy.zip
    7 upvotes · 8 hours ago (edited)

    How does archive get the unpaywalled version? I don’t think they pay the subscription for every single tabloid out there?

    Asking for a friend.

    • stoly@lemmy.world
      3 upvotes · 4 hours ago

      The paywall is done in JavaScript, but the content is still in plain text in the page underneath. The crawlers don't run the JavaScript.
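      A rough sketch of that idea, assuming a made-up page where the article text ships in the HTML and a script only adds the overlay afterwards. `SAMPLE_HTML`, `ParagraphText`, and `extract_paragraphs` are hypothetical names for illustration, not how any particular archive or crawler actually works. A parser that never executes scripts still sees every paragraph:

```python
from html.parser import HTMLParser

# Hypothetical page: the full text is in the markup, and the script
# would only lock the view once a browser runs it.
SAMPLE_HTML = """
<html><body>
<article>
<p>First paragraph of the story.</p>
<p>Second paragraph, normally hidden behind the overlay.</p>
</article>
<script>document.body.classList.add("paywall-locked");</script>
</body></html>
"""


class ParagraphText(HTMLParser):
    """Collects the text inside <p> tags; script contents are never executed."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Script source also arrives here, but it is outside any <p>, so it is skipped.
        if self.in_p:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    """Return the stripped text of every <p> element in the given HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs]


if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_HTML):
        print(paragraph)
```

      This only helps when the server really sends the whole article. If the server itself trims the response to the first paragraph, there is nothing extra in the markup to read.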

      • MonkderVierte@lemmy.zip
        4 upvotes · 3 hours ago

        With third-party JS disabled I get no paywall, but also only the first paragraph. Do crawlers get full access?

    • AnarchistArtificer@slrpnk.net
      3 upvotes · 7 hours ago

      I think they use the same thing that web crawlers use. If Google's crawler couldn't access the content of the page (or could only access a limited amount of it), the page would likely rank far lower in search results.

      • MonkderVierte@lemmy.zip
        2 upvotes · 3 hours ago (edited)

        Btw, how come there is no search engine where you can sort and filter the way you want instead of the way they want? (Except self-hosted, I mean.)

        Pornhub has better searchability than, uh, all the search sites I know.