đŸ“±

Read on Your E-Reader

Thousands of readers get articles like this delivered straight to their Kindle or Boox. New articles arrive automatically.

Learn More

This is a preview. The full article is published at news.ycombinator.com.

Package managers keep using Git as a database, it never works out

By Andrew NesbittHacker News: Front Page

Using git as a database is a seductive idea. You get version history for free. Pull requests give you a review workflow. It’s distributed by design. GitHub will host it for free. Everyone already knows how to use it. Package managers keep falling for this. And it keeps not working out. Cargo The crates.io index started as a git repository. Every Cargo client cloned it. This worked fine when the registry was small, but the index kept growing. Users would see progress bars like “Resolving deltas: 74.01%, (64415/95919)” hanging for ages, the visible symptom of Cargo’s libgit2 library grinding through delta resolution on a repository with thousands of historic commits. The problem was worst in CI. Stateless environments would download the full index, use a tiny fraction of it, and throw it away. Every build, every time. RFC 2789 introduced a sparse HTTP protocol. Instead of cloning the whole index, Cargo now fetches files directly over HTTPS, downloading only the metadata for dependencies your project actually uses. (This is the “ full index replication vs on-demand queries ” tradeoff in action.) By April 2025, 99% of crates.io requests came from Cargo versions where sparse is the default. The git index still exists, still growing by thousands of commits per day, but most users never touch it. Homebrew GitHub explicitly asked Homebrew to stop using shallow clones. Updating them was “an extremely expensive operation” due to the tree layout and traffic of homebrew-core and homebrew-cask. Users were downloading 331MB just to unshallow homebrew-core. The .git folder approached 1GB on some machines. Every brew update meant waiting for git to grind through delta resolution. Homebrew 4.0.0 in February 2023 switched to JSON downloads for tap updates. The reasoning was blunt: “they are expensive to git fetch and git clone and GitHub would rather we didn’t do that... they are slow to git fetch and git clone and this provides a bad experience to end users.” Auto-updates now run every 24 hours instead of every 5 minutes, and they’re much faster because there’s no git fetch involved. CocoaPods CocoaPods is the package manager for iOS and macOS development. It hit the limits hard. The Specs repo grew to hundreds of thousands of podspecs across a deeply nested directory structure. Cloning took minutes. Updating took minutes. CI time vanished into git operations. GitHub imposed CPU rate limits. The culprit was shallow clones, which force GitHub’s servers to compute which objects the client already has. The team tried various band-aids: stopping auto-fetch on pod install , converting shallow clones to full clones, sharding the repository . The CocoaPods blog captured it well: “Git was invented at a time when ‘slow network’ and ‘no backups’ were legitimate design concerns. Running endless builds as part of continuous integration wasn’t commonplace.” CocoaPods 1.8 gave up on git entirely for most users. A CDN became the default, serving podspec files directly over HTTP. The migration saved users about a gigabyte of disk space and made pod install nearly instant...

Preview: ~500 words

Continue reading at Hacker News

Read Full Article

More from Hacker News: Front Page

Subscribe to get new articles from this feed on your e-reader.

View feed

This preview is provided for discovery purposes. Read the full article at news.ycombinator.com. LibSpace is not affiliated with Hacker News.

Package managers keep using Git as a database, it never works out | Read on Kindle | LibSpace