Package managers keep using Git as a database, it never works out

Using git as a database is a seductive idea. You get version history for free. Pull requests give you a review workflow. It's distributed by design. GitHub will host it for free. Everyone already knows how to use it.

Package managers keep falling for this. And it keeps not working out.

Cargo

The crates.io index started as a git repository. Every Cargo client cloned it. This worked fine when the registry was small, but the index kept growing. Users would see progress bars like "Resolving deltas: 74.01%, (64415/95919)" hanging for ages, the visible symptom of Cargo's libgit2 library grinding through delta resolution on a repository with thousands of historic commits.

The problem was worst in CI. Stateless environments would download the full index, use a tiny fraction of it, and throw it away. Every build, every time.

RFC 2789 introduced a sparse HTTP protocol. Instead of cloning the whole index, Cargo now fetches files directly over HTTPS, downloading only the metadata for dependencies your project actually uses. (This is the "full index replication vs on-demand queries" tradeoff in action.) By April 2025, 99% of crates.io requests came from Cargo versions where sparse is the default. The git index still exists, still growing by thousands of commits per day, but most users never touch it.

Homebrew

GitHub explicitly asked Homebrew to stop using shallow clones. Updating them was "an extremely expensive operation" due to the tree layout and traffic of homebrew-core and homebrew-cask. Users were downloading 331MB just to unshallow homebrew-core. The .git folder approached 1GB on some machines. Every brew update meant waiting for git to grind through delta resolution.

Homebrew 4.0.0 in February 2023 switched to JSON downloads for tap updates. The reasoning was blunt: "they are expensive to git fetch and git clone and GitHub would rather we didn't do that... they are slow to git fetch and git clone and this provides a bad experience to end users." Auto-updates now run every 24 hours instead of every 5 minutes, and they're much faster because there's no git fetch involved.

CocoaPods

CocoaPods is the package manager for iOS and macOS development. It hit the limits hard. The Specs repo grew to hundreds of thousands of podspecs across a deeply nested directory structure. Cloning took minutes. Updating took minutes. CI time vanished into git operations.

GitHub imposed CPU rate limits. The culprit was shallow clones, which force GitHub's servers to compute which objects the client already has. The team tried various band-aids: stopping auto-fetch on pod install, converting shallow clones to full clones, sharding the repository.

The CocoaPods blog captured it well: "Git was invented at a time when 'slow network' and 'no backups' were legitimate design concerns. Running endless builds as part of continuous integration wasn't commonplace."

CocoaPods 1.8 gave up on git entirely for most users. A CDN became the default, serving podspec files directly over HTTP. The migration saved users about a gigabyte of disk space and made pod install nearly instant...
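
Both the Cargo sparse protocol and the CocoaPods CDN replace a full clone with one small HTTP fetch per package. As a rough illustration of the Cargo side, the sketch below maps a crate name to the file a client would request from a sparse index. It assumes the directory layout used by the crates.io index (names of 1, 2, and 3 characters get special prefixes, longer names are bucketed by their first four characters) and the base URL `https://index.crates.io`. The function names `sparse_index_path` and `sparse_index_url` are hypothetical, chosen for this example, and the code only builds the URL without performing any network request.

```python
def sparse_index_path(crate_name: str) -> str:
    """Return the relative path of a crate's metadata file in the index.

    Assumed layout (crates.io index convention):
      1-char names  -> 1/<name>
      2-char names  -> 2/<name>
      3-char names  -> 3/<first char>/<name>
      longer names  -> <chars 1-2>/<chars 3-4>/<name>
    """
    name = crate_name.lower()  # index paths use the lowercased name
    if not name:
        raise ValueError("crate name must not be empty")
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    return f"{name[:2]}/{name[2:4]}/{name}"


def sparse_index_url(crate_name: str,
                     base: str = "https://index.crates.io") -> str:
    """Build the URL a client would fetch for one crate (base is an assumption)."""
    return f"{base.rstrip('/')}/{sparse_index_path(crate_name)}"


if __name__ == "__main__":
    for crate in ["serde", "syn", "a"]:
        print(crate, "->", sparse_index_url(crate))
```

A project with a few dozen dependencies would issue a few dozen such requests, each cacheable by ordinary HTTP machinery, rather than replicating metadata for every crate ever published.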