Inside The Internet Archive's Infrastructure

(hackernoon.com)

86 points | by dvrp 1 day ago

6 comments

  • schmuckonwheels 0 minutes ago
    Disappointed with the lack of pictures.
  • hedora 41 minutes ago
    It's frustrating that there's no way for people to (selectively) mirror the Internet Archive. $25-30M per year is a lot for a non-profit, but it's nothing for government agencies or private corporations building Gen AI models.

    I suspect having a few different teams competing (for funding) to provide mirrors would rapidly reduce the hardware cost too.

    The density + power dissipation numbers quoted are extremely poor compared to enterprise storage. Hardware costs for the enterprise systems are also well below AWS (even assuming a short 5-year depreciation cycle on the enterprise boxes). Neither this article nor the vendors publish enough pricing information to do a thorough total cost of ownership analysis, but I can imagine someone the size of IA would not be paying normal margins to their vendors.

  • BryantD 1 hour ago
    They have come a very long way since the late 1990s when I was working there as a sysadmin and the data center was a couple of racks plus a tape robot in a back room of the Presidio office with an alarmingly slanted floor. The tape robot vendor had to come out and recalibrate the tape drives more often than I might have wanted.
    • textfiles 48 minutes ago
      There is a fundamental resistance to tape technology that exists to this day as a result of all those troubles.
  • cowhax 34 minutes ago
    >And the rising popularity of generative AI adds yet another unpredictable dimension to the future survival of the public domain archive.

    I'd say the nonprofit has found itself a profitable reason for its existence.

  • brcmthrowaway 1 hour ago
    Does IA do deduplication?
    • textfiles 48 minutes ago
      Not in the way I think you're talking about. The archive has always tried to maintain a situation where the racks could be pushed out of the door or picked up after being somewhere and the individual drives will contain complete versions of the items. We have definitely reached out to people who seem to be doing redundant work and asked them to stop or for permission to remove the redundant item. But that's a pretty curatorial process.
    • HumanOstrich 48 minutes ago
      Yes[1].

      [1]: The Article, Paragraph 2

      • sltkr 9 minutes ago
        I don't think the article mentions anything about deduplication. Can you be less snarky and actually quote the relevant sentence?