📱

Read on Your E-Reader

Thousands of readers get articles like this delivered straight to their Kindle or Boox. New articles arrive automatically.

Learn More

This is a preview. The full article is published at news.ycombinator.com.

Instant database clones with PostgreSQL 18

Instant database clones with PostgreSQL 18

By Radim MarekHacker News: Front Page

Have you ever watched long running migration script, wondering if it's about to wreck your data? Or wish you can "just" spin a fresh copy of database for each test run? Or wanted to have reproducible snapshots to reset between runs of your test suite, (and yes, because you are reading boringSQL) needed to reset the learning environment? When your database is a few megabytes, pg_dump and restore works fine. But what happens when you're dealing with hundreds of megabytes/gigabytes - or more? Suddenly "just make a copy" becomes a burden. You've probably noticed that PostgreSQL connects to template1 by default. What you might have missed is that there's a whole templating system hiding in plain sight. Every time you run CREATE DATABASE dbname; PostgreSQL quietly clones standard system database template1 behind the scenes. Making it same as if you would use CREATE DATABASE dbname TEMPLATE template1; The real power comes from the fact that you can replace template1 with any database. You can find more at Template Database documentation . In this article, we will cover a few tweaks that turn this templating system into an instant, zero-copy database cloning machine. CREATE DATABASE ... STRATEGY Before PostgreSQL 15, when you created a new database from a template, it operated strictly on the file level. This was effective, but to make it reliable, Postgres had to flush all pending operations to disk (using CHECKPOINT ) before taking a consistent snapshot. This created a massive I/O spike - a "Checkpoint Storm" - that could stall your production traffic. Version 15 of PostgreSQL introduced new parameter CREATE DATABASE ... STRATEGY = [strategy] and at the same time changed the default behaviour how the new databases are created from templates. The new default become WAL_LOG which copies block-by-block via the Write-Ahead Log (WAL), making I/O sequential (and much smoother) and support for concurrency without facing latency spike. This prevented the need to CHECKPOINT but made the database cloning operation potentially significantly slower. For an empty template1 , you won't notice the difference. But if you try to clone a 500GB database using WAL_LOG, you are going to be waiting a long time. The STRATEGY parameter allows us to switch back to the original method FILE_COPY to keep the behaviour, and speed. And since PostgreSQL 18, this opens the whole new set of options. FILE_COPY Because the FILE_COPY strategy is a proxy to operating system file operations, we can change how the OS handles those files. When using standard file system (like ext4 ), PostgreSQL reads every byte of the source file and writes it to a new location. It's a physical copy. However starting with PostgreSQL 18 - file_copy_method gives you options to switch that logic; while default option remains copy . With modern filesystems (like ZFS, XFS with reflinks, APFS, etc.) you can switch it to clone and leverage CLONE ( FICLONE on Linux) operation for almost instant operation. And it won't take any additional space. All you have to do is:...

Preview: ~500 words

Continue reading at Hacker News

Read Full Article

More from Hacker News: Front Page

Subscribe to get new articles from this feed on your e-reader.

View feed

This preview is provided for discovery purposes. Read the full article at news.ycombinator.com. LibSpace is not affiliated with Hacker News.

Instant database clones with PostgreSQL 18 | Read on Kindle | LibSpace