It's always TCP_NODELAY. Every damn time. - Marc's Blog

By Marc Brooker

It's not the 1980s anymore, thankfully. The first thing I check when debugging latency issues in distributed systems is whether TCP_NODELAY is enabled. And it's not just me. Every distributed system builder I know has lost hours to latency issues quickly fixed by enabling this simple socket option, suggesting that the default behavior is wrong, and perhaps that the whole concept is outmoded.

First, let's be clear about what we're talking about. There's no better source than John Nagle's RFC896 from 1984. First, the problem statement:

    There is a special problem associated with small packets. When TCP is used for the transmission of single-character messages originating at a keyboard, the typical result is that 41 byte packets (one byte of data, 40 bytes of header) are transmitted for each byte of useful data. This 4000% overhead is annoying but tolerable on lightly loaded networks.

In short, Nagle was interested in better amortizing the cost of TCP headers, to get better throughput out of the network. Up to 40x better throughput! These tiny packets had two main causes: human-interactive applications like shells, where folks were typing a byte at a time, and poorly implemented programs that dribbled messages out to the kernel through many write calls.

Nagle's proposal for fixing this was simple and smart:

    A simple and elegant solution has been discovered. The solution is to inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged.

When many people talk about Nagle's algorithm, they talk about timers, but RFC896 doesn't use any kind of timer other than the round-trip time on the network.

Nagle's Algorithm and Delayed Acks

Nagle's nice, clean proposal interacted poorly with another TCP feature: delayed ACK. The idea behind delayed ACK is to delay sending the acknowledgement of a packet at least until there's some data to send back (e.g. a telnet session echoing back the user's typing), or until a timer expires. RFC813 from 1982 is the first that seems to propose delaying ACKs:

    The receiver of data will refrain from sending an acknowledgement under certain circumstances, in which case it must set a timer which will cause the acknowledgement to be sent later. However, the receiver should do this only where it is a reasonable guess that some other event will intervene and prevent the necessity of the timer interrupt.

which is then formalized further in RFC1122 from 1989. The interaction between these two features causes a problem: Nagle's algorithm blocks sending more data until an ACK is received, but delayed ACK delays that ACK until a response is ready. Great for keeping packets full, not so great for latency-sensitive pipelined applications. This is a point Nagle has made himself several times. For example, in this Hacker News comment:

    That still irks me. The real problem is not tinygram prevention. It's ACK delays, and that stupid...
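The post itself contains no code, but the rule quoted from RFC896 is compact enough to sketch. Below is a rough, illustrative rendering of the sender-side decision in Python; the function name, parameters, and the idea of tracking unacknowledged bytes are assumptions made for illustration, not part of any real TCP stack.

```python
def nagle_allows_send(segment_len: int, mss: int, unacked_bytes: int) -> bool:
    """Illustrative sketch of the RFC896 rule; not a real TCP implementation.

    A new segment may go out immediately if it is full-sized, or if nothing
    previously sent is still waiting for an ACK. Otherwise the small segment
    is held back and coalesced with later writes until the ACK arrives.
    """
    if segment_len >= mss:
        return True   # full segment: header overhead is already well amortized
    if unacked_bytes == 0:
        return True   # nothing in flight: no chance to coalesce, send now
    return False      # small segment with data in flight: wait for the ACK
```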
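To make the Nagle-plus-delayed-ACK stall concrete, here is a hypothetical write-write-read pattern in Python. The second small write can sit in the kernel until the peer acknowledges the first segment, and a peer doing delayed ACK may hold that acknowledgement on a timer. The host, port, and request are placeholders, and the delay actually observed depends on the peer's ACK behavior and the path RTT.

```python
import socket
import time

# Hypothetical example: one logical request split across two small writes,
# the classic pattern that interacts badly with Nagle + delayed ACK.
sock = socket.create_connection(("example.com", 80))

start = time.monotonic()
sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\n")  # small segment, nothing in flight: sent immediately
sock.sendall(b"Connection: close\r\n\r\n")                # small segment with data in flight: Nagle holds it
                                                          # until the first segment is ACKed, and the peer's
                                                          # delayed ACK may hold that ACK on a timer
reply = sock.recv(4096)
print(f"first response bytes after {time.monotonic() - start:.3f}s")
sock.close()
```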
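And the fix the title refers to: enabling TCP_NODELAY on the socket, which disables Nagle's algorithm for that connection. A minimal sketch in Python, with a placeholder host and port:

```python
import socket

sock = socket.create_connection(("example.com", 80))  # placeholder host/port

# TCP_NODELAY disables Nagle's algorithm on this connection: small segments
# are sent as soon as the application writes them, rather than being held
# until previously sent data has been acknowledged.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

The option can be set any time after the socket is created and applies to all subsequent writes on that connection.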
