This is a preview. The full article is published at news.ycombinator.com.

Unix "find" expressions compiled to bytecode

By rcarmo | Hacker News: Front Page

nullprogram.com/blog/2025/12/23/

In preparation for a future project, I was thinking about the unix find utility. It operates on file system hierarchies, with basic operations selected and filtered using a specialized expression language. Users compose operations using unary and binary operators, grouping with parentheses for precedence. find may apply the expression to a great many files, so compiling it into a bytecode, resolving as much as possible ahead of time, and minimizing the per-element work, seems like a prudent implementation strategy. With some thought, I worked out a technique to do so, which was simpler than I expected, and I’m pleased with the results. I was later surprised that all the real-world find implementations I examined use tree-walk interpreters instead. This article describes how my compiler works, with a runnable example, and lists ideas for improvements.

For a quick overview, the syntax looks like this:

    $ find [-H|-L] path... [expression...]

Technically at least one path is required, but most implementations imply . when none are provided. If no expression is supplied, the default is -print, i.e. print everything under each listed path. This prints the whole tree, including directories, under the current directory:

    $ find .

To only print files, we could use -type f:

    $ find . -type f -a -print

Here -a is the logical AND binary operator, and -print always evaluates to true. It’s never necessary to write -a, because adjacent operations are implicitly joined with -a. We can keep chaining them, such as finding all executable files:

    $ find . -type f -executable -print

If no -exec, -ok, or -print (or similar side-effect extensions like -print0 or -delete) is present, the whole expression is wrapped in an implicit ( expr ) -print. So we could also write this:

    $ find . -type f -executable

Use -o for logical OR. To print all files with the executable bit or with a .exe extension:

    $ find . -type f \( -executable -o -name '*.exe' \)

I needed parentheses because -o has lower precedence than -a, and because parentheses are shell metacharacters I also needed to escape them for the shell. It’s a shame find didn’t use [ and ] instead! There’s also a unary logical NOT operator, !. To print all non-executable files:

    $ find . -type f ! -executable

Binary operators are short-circuiting, so this:

    $ find -type d -a -exec du -sh {} +

only lists the sizes of directories: for anything else, -type d fails, so the whole expression evaluates to false without evaluating -exec. Or equivalently with -o:

    $ find ! -type d -o -exec du -sh {} +

If it’s not a directory then the left-hand side evaluates to true, and the right-hand side is not evaluated.

All three implementations I examined (GNU, BSD, BusyBox) have a -regex extension, and eagerly compile the regular expression even if the operation is never evaluated:

    $ find . -print -o -regex [
    find: bad regex '[': Invalid...
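The author's compiler is not included in this preview, so the following is only an independent sketch, in Python, of how the rules described above (implicit -a, -o binding looser than -a, unary !, short-circuiting, and the implicit ( expr ) -print wrapper) could be compiled into a flat instruction list. The instruction names (jump_if_false, jump_if_true, not), the single result register, and the in-memory entries standing in for a real directory walk are all assumptions made for illustration. Only -type, -name, and -print are handled.

```python
import fnmatch
import posixpath

# Hypothetical primaries for this sketch: name -> number of operands.
ARITY = {"-type": 1, "-name": 1, "-print": 0}
ACTIONS = {"-print"}  # primaries with side effects

def compile_find(args):
    """Compile find expression tokens into a flat list of instructions."""
    code, pos, has_action = [], 0, False

    def peek():
        return args[pos] if pos < len(args) else None

    def take():
        nonlocal pos
        if pos >= len(args):
            raise ValueError("unexpected end of expression")
        pos += 1
        return args[pos - 1]

    def parse_or():  # lowest precedence: A -o B
        parse_and()
        while peek() == "-o":
            take()
            hole = len(code)
            code.append(None)  # patched below: skip B when A is true
            parse_and()
            code[hole] = ("jump_if_true", len(code))

    def parse_and():  # A -a B, or just "A B"
        parse_unary()
        while peek() not in (None, "-o", ")"):
            if peek() == "-a":
                take()
            hole = len(code)
            code.append(None)  # patched below: skip B when A is false
            parse_unary()
            code[hole] = ("jump_if_false", len(code))

    def parse_unary():
        nonlocal has_action
        tok = take()
        if tok == "!":
            parse_unary()
            code.append(("not",))
        elif tok == "(":
            parse_or()
            if take() != ")":
                raise ValueError("expected )")
        elif tok in ARITY:
            has_action = has_action or tok in ACTIONS
            code.append((tok, *[take() for _ in range(ARITY[tok])]))
        else:
            raise ValueError(f"unknown primary: {tok}")

    if args:
        parse_or()
        if pos != len(args):
            raise ValueError(f"unexpected token: {args[pos]}")
    if not has_action:  # implicit "( expr ) -print"
        if code:
            code.append(("jump_if_false", len(code) + 2))
        code.append(("-print",))
    return code

def run(code, entry, out):
    """Evaluate compiled code against one entry; printed paths go to out."""
    result, pc = True, 0
    while pc < len(code):
        op, *operands = code[pc]
        pc += 1
        if op == "-type":
            result = entry["type"] == operands[0]
        elif op == "-name":
            name = posixpath.basename(entry["path"])
            result = fnmatch.fnmatchcase(name, operands[0])
        elif op == "-print":
            out.append(entry["path"])
            result = True
        elif op == "not":
            result = not result
        elif op == "jump_if_false" and not result:
            pc = operands[0]
        elif op == "jump_if_true" and result:
            pc = operands[0]
    return result

# Made-up entries standing in for a directory walk.
entries = [
    {"path": ".", "type": "d"},
    {"path": "./run.sh", "type": "f"},
    {"path": "./tool.exe", "type": "f"},
    {"path": "./notes.txt", "type": "f"},
]
code = compile_find(["-type", "f", "(", "-name", "*.sh", "-o", "-name", "*.exe", ")"])
out = []
for entry in entries:
    run(code, entry, out)
print(out)  # ['./run.sh', './tool.exe']
```

Because a jump leaves the result register untouched, a short-circuited -a lands on its target with a false result and a short-circuited -o with a true one, so chained operators need no extra bookkeeping. In this sketch the per-entry loop does no parsing or precedence handling; that work happens once in compile_find.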
