
Scaling LLMs to larger codebases
How do we scale LLMs to larger codebases? Nobody knows yet. But by understanding how LLMs contribute to engineering, we realize that investments in guidance and oversight are worthwhile.

Guidance: the context, the environment.

Oversight: the skill set needed to guide, validate, and verify the implementor's choices.

Investing in guidance

When an LLM can generate a working, high-quality implementation in a single try, that is called one-shotting. This is the most efficient form of LLM programming.

The opposite of one-shotting is rework: when you fail to get a usable output from the LLM and must manually intervene. This often takes longer than just doing the work yourself.

So how do we create more opportunities for one-shotting? Better guidance.

Better guidance

LLMs are choice generators. Every set of tokens is a choice added to your codebase: how a variable is named, where to organize a function, whether to reuse, extend, or duplicate functionality to solve a problem, whether Postgres should be chosen over Redis, and so on.

Often, these choices are best left up to the designer (e.g., via the prompt). However, it's not efficient to exhaustively list all of these choices in a prompt. It's also not efficient to rework an LLM output whenever it gets these choices wrong.

In the ideal world, the prompt captures only the business requirements of a feature. The rest of the choices are either inferrable or encoded.

Write a prompt library

A prompt library is a set of documentation that can be included as context for an LLM. Writing one is simple: collate documentation, best practices, a general map of the codebase, and other context an engineer needs to be productive in your codebase (a minimal sketch of assembling one appears at the end of this section).

Making a prompt library useful requires iteration. Every time the LLM is slightly off target, ask yourself, "What could've been clarified?" Then, add that answer back into the prompt library. A prompt library needs to strike the right balance between comprehensive and lean.

The environment is your context

A peer at Meta told me that they weren't in a position to make Zuckerberg's engineering-automation claims a reality. The reason: their codebase is riddled with technical debt. He wasn't surprised by this; Meta has (apparently) historically not prioritized paying down its debts.

Compare this to the mentality from the Cursor team:

"I think ultimately the principles of clean software are not that different when you want it to be read by people and by models. When you are trying to write clean code you want to not repeat yourself, not make things more complicated than they need to be. I think taste in code... is actually gonna become even more important as these models get better, because it will be easier to write more and more code, and so it'll be more and more important to structure it in a tasteful way."

This is the garbage in, garbage out principle in action. The utility of a model is bottlenecked by the quality of its environment.
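To make the prompt-library idea concrete, here is a minimal sketch in Python of how such a library might be collated into a single system prompt. The `docs/prompt-library` directory, the file names, and the `build_system_prompt` helper are all hypothetical stand-ins, not anything the article prescribes; the point is only that the library is ordinary documentation assembled into context.

```python
from pathlib import Path

# Hypothetical layout (not from the article): docs/prompt-library/ holds
# short markdown files such as conventions.md, architecture-map.md, and
# testing.md -- whatever an engineer needs to be productive in the codebase.
LIBRARY_DIR = Path("docs/prompt-library")


def build_system_prompt(library_dir: Path = LIBRARY_DIR) -> str:
    """Concatenate every prompt-library file into one system prompt,
    with a header per file so the model can tell the sources apart."""
    sections = []
    for doc in sorted(library_dir.glob("*.md")):
        sections.append(f"## {doc.stem}\n\n{doc.read_text().strip()}")
    return (
        "You are working in our codebase. Follow these notes:\n\n"
        + "\n\n".join(sections)
    )


if __name__ == "__main__":
    system_prompt = build_system_prompt()
    # Pass system_prompt as the system/context message to whichever LLM
    # client you use; the per-feature prompt then only needs to carry
    # the business requirements.
    print(f"{len(system_prompt.splitlines())} lines of guidance loaded")
```

Under this sketch, the iteration loop becomes mechanical: when the model misses, add the missing clarification as another file (or line) in the library, and every future prompt inherits it.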