Bonus Drop #121 (2026-07-05): More Is More!

moreutils: the rest of what’s inside the tin

Back in June we gave sponge its own (brief) moment (to refresh memories, it’s the little tool that soaks up stdin before it clobbers the file you’re reading from). As noted in that post, this utility is the headliner of a package called moreutils, but it has a few siblings that pull their own respective weight in a pipeline, and most folks never meet them (especially if an LLM agent is doing all the shell work). Today’s Bonus Drop covers the ones worth knowing.

But first: a smattering of lore!

moreutils is Joey Hess’s project. Hess is a name you’ve run into whether you knew it or not: the Debian Installer, debhelper, debconf, alien, git-annex, ikiwiki. He describes moreutils as a collection of the unix tools nobody thought to write long ago when unix was young. This full-on project started as a blog post around 2006 where he wondered aloud whether the unix toolbox had any room left for new general-purpose commands. What he learned: people were still writing basic tools, but those tools kept falling through the cracks and never got noticed by the people who’d benefit from them. moreutils exists to catch them.

sponge was the first proof the concept worked. The collection grew from there and now sits at a dozen or so utilities. It’s been closed to new submissions for years (Hess froze intake rather than let it sprawl).

Though Hess has made prolific contributions, this isn’t a one-author project. Tools such as errno and isutf8 came from Lars Wirzenius, the venerable parallel from Tollef Fog Heen (apologies for a LI link), mispipe from Nathanael Nerode, pee from the always awesome Miek Gieben, ifdata from Benjamin Bayart (no useful URL to give you), and lckdo from Michael Tokarev.

We’ll cover each in alphabetical order from here. Each one gets the “what”, the “does”, and a usage y’all can put to work.

Aside: I try to keep the Drops mostly “AI-free” except when we’re covering something important-ish on the subject. That’s one of the reasons the Bonus Drops have been the only ones for the past few weeks: the lesser-known-but-cool-and-useful resources I’ve found have either added agents as co-authors (which I know AI-detractors loathe) or involve AI directly.

I have some new “free range” resources that I’m hoping will get me back into a daily cadence.

However, for this Drop, I do feel compelled to mention — for the AI-neutral and/or AI-users — this skill which will help all the brain-dead agents install and use components of moreutils vs. a ridiculously bonkers set of bash commands with hastily crafted awk/tr/etc. incantations.

chronic

What it is: a wrapper that runs a command and swallows its output unless the command fails.

What it does: run something under chronic and its stdout and stderr get buffered and thrown away on success. Exit nonzero (or crash) and chronic dumps everything it held back. The -v flag makes it verbose about the separation between stdout and stderr and reports the return value. The -e flag flips the trigger to fire on any stderr output, not just a nonzero exit.

The canonical use is cron. cron mails you whatever a job prints, so a chatty-but-successful job trains you to ignore its mail, which means you also ignore it the day it breaks. The usual fix is >/dev/null 2>&1, which throws the output away even when you need it.

Practical usage: wrap a nightly enrichment job so it stays silent when it works and screams when it doesn’t.

			
# crontab
15 2 * * * chronic /opt/censys/pull-cert-feed.sh

pull-cert-feed.sh can be as loud as it likes. Log lines, progress counters, rsync chatter – none of it reaches your inbox on a clean run. The morning it exits nonzero because the upstream endpoint moved or a disk filled, you get the full transcript (chronic buffers the two streams separately, so stdout arrives first and then stderr), and you already know what happened before you SSH in.
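To make the hold-then-release behavior concrete, here’s a rough plain-shell approximation of chronic’s default mode. Treat it as a sketch for illustration and not chronic’s actual implementation: the function name quiet_unless_fail and the sample commands are made up, and it skips the -v and -e modes entirely.

```shell
# Hypothetical helper: approximates chronic's default behavior in plain shell.
quiet_unless_fail() {
  local out err rc=0
  out=$(mktemp) || return 1
  err=$(mktemp) || { rm -f "$out"; return 1; }

  # Run the command, holding stdout and stderr back in temp files.
  "$@" >"$out" 2>"$err" || rc=$?

  if [ "$rc" -ne 0 ]; then
    # Failure: release everything that was held back.
    cat "$out"
    cat "$err" >&2
  fi

  rm -f "$out" "$err"
  return "$rc"
}

# Clean run: prints nothing.
quiet_unless_fail sh -c 'echo "pulled 1200 certs"'

# Failing run: both streams are released and the exit status is passed along.
quiet_unless_fail sh -c 'echo "pulling"; echo "disk full" >&2; exit 3' \
  || echo "job failed with status $?"
```

In practice you’d still reach for chronic itself; the point is only that success costs you nothing and failure costs you nothing you needed.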

combine

What it is: boolean set operations on the lines of two files.

What it does: combine file1 OP file2 where OP is and, not, or, or xor. and gives you lines in file1 that also appear in file2. not gives you lines in file1 absent from file2. or is the union, xor is the symmetric difference. A - reads stdin for either file.

One detail worth remembering: it is not commutative. Output follows the order in which lines occur in file1 (for or and xor, file2’s lines come after), so combine a and b won’t necessarily match combine b and a line for line. Inputs don’t need to be sorted. If you want order-independent output, pipe the result through sort -u.

Practical usage: daily diffing of a host inventory without writing a comm/sort/join incantation you’ll misremember next week.

			
# hosts responding on 8443 today that weren't yesterday
combine today.txt not yesterday.txt > newly-exposed.txt
# hosts in your scan list that also show up in a known-bad feed
combine scan-targets.txt and threat-feed.txt > priority-review.txt

The first line is the “what changed” question you ask every morning. The second is a cheap intersection against intel. No sorting ceremony, no remembering which column join wants keyed.
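For comparison (and for boxes without moreutils), the and/not operations can be approximated with grep’s fixed-string, whole-line matching. This is a stand-in, not combine itself, and the file names and addresses are invented for the demo. Like combine, the output follows the first file’s order, which is why swapping the operands changes the result.

```shell
# Demo data (hypothetical hosts); note the inputs are deliberately unsorted.
dir=$(mktemp -d)
printf '%s\n' 10.0.0.9 10.0.0.2 10.0.0.5 > "$dir/today.txt"
printf '%s\n' 10.0.0.5 10.0.0.7 10.0.0.9 > "$dir/yesterday.txt"

# Rough equivalent of: combine today.txt not yesterday.txt
# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file
new_hosts=$(grep -Fxv -f "$dir/yesterday.txt" "$dir/today.txt")

# Rough equivalent of: combine today.txt and yesterday.txt
still_up=$(grep -Fx -f "$dir/yesterday.txt" "$dir/today.txt")

# Same intersection with the operands swapped: same lines, different order.
still_up_swapped=$(grep -Fx -f "$dir/today.txt" "$dir/yesterday.txt")

printf 'new:\n%s\n' "$new_hosts"
printf 'still up:\n%s\n' "$still_up"
printf 'still up (swapped):\n%s\n' "$still_up_swapped"

rm -rf "$dir"
```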

errno

What it is: a lookup table for errno names, codes, and descriptions.

What it does: hand it a name and it prints the code and description. Hand it a number and it prints the name and description. errno ENOENT and errno 2 both land on “No such file or directory.” -l lists everything. -s WORD searches descriptions case-insensitively; -S does the same across every installed locale.

Practical usage: decoding syscall failures out of strace or a crash log without a browser tab.

			
$ errno ECONNREFUSED
ECONNREFUSED 111 Connection refused
$ errno -s timed
ETIMEDOUT 110 Connection timed out
ETIME 62 Timer expired

When a scanner or resolver spews a bare errno=111 and you’re three panes deep in a terminal, errno 111 is faster than remembering that 111 is a refused connection. The search mode — e.g., errno -s timed — surfaces every timeout-flavored error at once, which is handy when you’re staring at flaky connect behavior and want to know your options.
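If the codes are buried in a log, a small helper can pull them out and hand each one to errno. A sketch only: the errno=NNN token format, the sample log lines, and the errno_codes function name are assumptions for illustration, and the lookup falls back to a plain message when errno isn’t installed.

```shell
# Hypothetical helper: extract the distinct numeric codes from "errno=NNN" tokens.
errno_codes() {
  grep -oE 'errno=[0-9]+' | cut -d= -f2 | sort -un
}

# Made-up log lines standing in for scanner output.
sample_log='connect 10.0.0.5:443 failed errno=111
connect 10.0.0.7:443 failed errno=110
connect 10.0.0.9:443 failed errno=111'

codes=$(printf '%s\n' "$sample_log" | errno_codes)

# Decode each distinct code once.
for code in $codes; do
  if command -v errno >/dev/null 2>&1; then
    errno "$code"
  else
    echo "errno $code (install moreutils to decode)"
  fi
done
```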

ifdata

What it is: interface information with output built for scripts instead of humans.

What it does: ifconfig and ip print for eyeballs; scraping them means regexes that break across versions and locales. ifdata prints one clean value per flag. -pa is the IPv4 address, -pn the netmask, -pN the network address, -pb the broadcast, -pm the MTU. -e tests existence and exits nonzero if the interface isn’t there. -pe prints a plain yes/no. The Linux-only stat flags (-si*, -so*) pull packet, byte, error, and drop counters straight out.

Practical usage: grab the source address a scan will egress from, with no parsing.

			
SRC=$(ifdata -pa eth0)
masscan -e eth0 --source-ip "$SRC" -p443 10.0.0.0/8 -oJ scan.json

No awk '{print $2}' fragile against whatever ip addr decides its output looks like this release. If you’re stamping scan metadata, ifdata -pm eth0 gives you the MTU as a bare integer to log alongside results. Fair warning: this tool is from the ifconfig era and its worldview is one-address-per-interface, so it’s a poor fit for modern multi-address setups. For the simple “what’s my v4 and MTU” question in a script, it’s still the least annoying answer.

ifne

What it is: run a command only if stdin has something in it. “if not empty.”

What it does: ifne COMMAND runs COMMAND only when its standard input carries at least one byte. -n reverses the logic – run the command when stdin is empty, and pass the input through otherwise.

Practical usage: kill alert spam at the source. A cron check that finds nothing should send nothing.

			
# only mail when there's actually something to report
grep -f watchlist-ips.txt /var/log/proxy/access.log \
  | ifne mail -s "watchlist hits" soc@censys.io

grep produces no output on a quiet day, so mail never runs, so the SOC doesn’t get an empty “watchlist hits” message they’ll start ignoring. The moment a watchlist IP shows up, the mail fires with the matching lines already in the body. Same pattern works for Slack webhooks, ticket creation, paging – anything you want gated on “there is real signal here.”
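The gating logic can be sketched in plain shell: hold stdin, check whether anything arrived, and only then run the command. This is an approximation to show the idea, not how ifne is implemented; the function name run_if_input is made up, and it has no counterpart to -n.

```shell
# Hypothetical helper: run "$@" only when stdin carries at least one byte.
run_if_input() {
  local buf rc=0
  buf=$(mktemp) || return 1
  cat >"$buf"                 # hold whatever arrives on stdin
  if [ -s "$buf" ]; then      # -s: file exists and is larger than zero bytes
    "$@" <"$buf" || rc=$?     # feed the held input to the command
  fi
  rm -f "$buf"
  return "$rc"
}

# Input present: the command runs and sees the original lines.
printf 'hit 203.0.113.7\n' | run_if_input sed 's/^/ALERT: /'

# No input: the command is never started, so nothing is printed.
printf '' | run_if_input echo "this never runs"
```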

isutf8

What it is: a validator that tells you whether files are syntactically valid UTF-8.

What it does: point it at files, or pipe into it. It reports which files aren’t valid UTF-8 and sets exit status accordingly. -q stays quiet and lets you branch on exit status alone. -l prints only the names of bad files. -i inverts to list the clean ones. -v gives you a hexdump-style view of the offending bytes with context, which is the flag you want when you need to know what broke.

Practical usage: gate a data-ingest step. Banner grabs, scraped TLS fields, and scan JSON pick up mojibake and stray bytes, and a single bad record can wreck a load.

			
# find the junk before DuckDB does
isutf8 -l raw/*.jsonl > bad-files.txt
# or branch in a loader
for f in raw/*.jsonl; do
  if isutf8 -q "$f"; then
    duckdb intel.db "COPY banners FROM '$f' (FORMAT json)"
  else
    mv "$f" quarantine/
  fi
done

The -l pass gives you a fast triage list. The loop quarantines anything that would otherwise throw a decode error mid-COPY and leave you guessing which of four hundred files was the culprit. When you need to fix a file, isutf8 -v badfile.jsonl shows the byte offset and the surrounding context so you can see whether it’s a truncated multibyte sequence or someone’s Latin-1 that wandered in.

lckdo

What it is: run a program while holding a lock, so two copies don’t run at once.

What it does: lckdo LOCKFILE PROGRAM [args], used like nice or nohup. -w waits for the lock instead of failing; -W sec caps the wait. -s takes a shared lock, -x an exclusive one.

Read this part carefully: lckdo is deprecated. util-linux ships flock, which does the same job, and moreutils’ own documentation says lckdo will be dropped from a future release. Reach for flock. (’Tis sad that this name now has such a negative connotation thanks to late-stage capitalism.)

Practical usage: the thing you’d have used lckdo for, done with flock.

			
# don't let a slow scan overrun its next scheduled run
flock -n /var/lock/nightly-scan.lock /opt/censys/nightly-scan.sh

-n fails immediately if the lock’s held, so an already-running scan blocks the new one instead of stacking a second copy on top. If you want the new run to queue behind the old one, drop -n and it waits. lckdo lingers in old scripts. Use flock in anything you write today.
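A quick way to watch the -n behavior for yourself. This is a sketch that assumes util-linux flock is on the PATH; the throwaway lock file and the sleep stand in for the real lock path and a slow scan.

```shell
lock=$(mktemp)

# First holder: grab the lock and sit on it for a couple of seconds.
flock -n "$lock" sleep 2 &
sleep 0.5   # give the background job time to take the lock

# Second attempt with -n: refuses immediately instead of stacking up.
if flock -n "$lock" true; then
  second="ran"
else
  second="skipped"
fi

wait        # let the first holder finish and release the lock

# With the lock free again, the same command goes through.
if flock -n "$lock" true; then
  third="ran"
else
  third="skipped"
fi

echo "while locked: $second; after release: $third"
rm -f "$lock"
```

Dropping -n from the second attempt would make it block until the first holder exits, which is the queueing behavior described above.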

mispipe

What it is: pipes two commands together but returns the exit status of the first one.

What it does: mispipe "command1" "command2" behaves like command1 | command2 at the shell, except the shell hands you command2’s exit status and mispipe hands you command1’s. If command1 dies from a signal, mispipe adds 128 to the status the way a shell would.

This is a sharper tool than it might appear at first glance. The man page calls out a significant trap folks may fall into: bash’s pipefail is not the same. pipefail returns failure if any command in the pipe fails. mispipe cares only about the first.

Practical usage: you’re piping a scanner into a logger, and the exit code that matters is the scanner’s.

mispipe "zmap -p 443 -o - 10.0.0.0/8" "tee results.csv"

You want to know whether zmap succeeded. tee almost always succeeds, so plain-pipe semantics (last command wins) tell you nothing useful, and pipefail would muddy the scanner’s status with tee’s. mispipe reports exactly the code you care about, so your if / || handling keys off the scan and not the plumbing behind it.
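The difference is easy to see with bash’s own status plumbing. A sketch using false and true as stand-ins for a failing scanner and a healthy tee; it needs bash, since PIPESTATUS and pipefail are bash features. ${PIPESTATUS[0]} is the closest built-in analogue to what mispipe hands back.

```shell
# Requires bash: PIPESTATUS and pipefail are bash features.

# Plain pipe: the last command's status wins, so the first stage's failure is hidden.
false | true
plain=$?

# PIPESTATUS[0]: the status of the first command only, which is what mispipe reports.
false | true
first=${PIPESTATUS[0]}

# pipefail: fails if ANY stage fails, so a healthy first stage can still look broken.
set -o pipefail
any=0
true | false || any=$?
set +o pipefail

echo "plain=$plain first=$first pipefail=$any"   # plain=0 first=1 pipefail=1
```

If you control the shell and it’s bash, PIPESTATUS gets you there without moreutils; mispipe earns its keep when the pipeline is launched from something that only sees a single exit code.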

parallel

I am not going to waste your time telling you about parallel since I refuse to believe there’s a single Drop reader who does not know.

For macOS folks, one caveat: Homebrew’s moreutils formula ships its own parallel, so you’ll need to uninstall any parallel you already have from Homebrew before installing it.

pee

What it is: tee, but the copies go to pipes instead of files.

What it does: pee "command1" "command2" ... runs each command and feeds every one of them a full copy of stdin. Their outputs all land on stdout. Unlike tee, pee does not also pass a copy of the input to stdout – if you want that, add cat as one of the commands: pee cat .... By default it ignores SIGPIPE and write errors, which you can flip with --no-ignore-sigpipe and --no-ignore-write-errors.

The payoff: you read the input once and split it to several consumers, which matters when the input is large, expensive to produce, or not seekable – a live capture, a stream off the network, a slow decompress.

Practical usage: hash a sample every way you need in a single read.

cat suspicious.bin | pee 'md5sum' 'sha1sum' 'sha256sum'

One pass over the file, three digests, instead of reading it three times. The stream version is dope:

			
# one read of the pcap, split to several analyzers
tcpdump -r capture.pcap -w - 'tcp' \
  | pee 'tshark -r - -Y http.request -T fields -e http.host > hosts.txt' \
        'tshark -r - -q -z conv,tcp > conversations.txt'

The capture streams through once and fans out to independent analyses, each getting the full input. For a multi-gigabyte capture, reading it once instead of once-per-question is the difference between a coffee break and a lunch.

ts

What it is: a timestamp tool for pipeline output.

What it does: prepends a timestamp to each line of stdin. The default format is %b %d %H:%M:%S (strftime(3)). %.S, %.s, and %.T extend to subsecond resolution. -r converts existing timestamps to relative times like 15m5s ago. -i reports the delta since the last line. -s reports the time since the program started. -m switches to the monotonic clock.

The relative mode is the killer sleeper feature. Just point it at a log with timestamps and it tells you how long ago each line was written, no mental subtraction required.

Practical usage: stamp a long-running probe so you can trace duration between events at a glance.

			
# timestamp every line of a scan
zmap -p 443 -o - 10.0.0.0/8 | ts > timed-scan.log
# convert existing timestamps to elapsed times
cat run.log | ts -r

The first line annotates a raw scan with wall-clock times, so you know whether the gap between two results is a slow network hop or the scanner pausing between batches. The second rewrites the absolute timestamps in an existing log as ages (15m5s ago), so you can see at a glance how far back each event was. For the gap between consecutive lines of live output, reach for ts -i.
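If you’re curious what the default mode amounts to, here’s a rough plain-shell stand-in that prefixes each line with the same %b %d %H:%M:%S format. It’s an illustration only: the function name stamp_lines and the sample lines are made up, it spawns date once per line (so it’s slow), and it has none of ts’s subsecond, -r, -i, -s, or -m modes.

```shell
# Hypothetical stand-in for plain `ts`: prefix each stdin line with the current time.
stamp_lines() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date '+%b %d %H:%M:%S')" "$line"
  done
}

# Made-up probe output standing in for a scanner's stdout.
stamped=$(printf 'probe sent\nprobe answered\n' | stamp_lines)
printf '%s\n' "$stamped"
```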

vidir

What it is: edit a directory’s contents and filenames in your text editor.

What it does: opens the current directory (or specified files/directories) in $EDITOR as a numbered list. Delete a line to remove the file. Edit a filename to rename. Swap two numbers to exchange names. Pass - to read a file list from stdin. --verbose prints what it did.

No scripting required, no for f in *; do mv "$f" "${f//foo/bar}"; done incantation. Just the editor and a numbered list.

Practical usage: bulk-rename a batch of scanner output files without writing a rename loop.

			
# rename all pcap files at once
vidir *.pcap
# clean up stale scan artifacts
vidir /tmp/old-scans/

Open the editor, cw through each line, change the names, :wq. Done. The delete-to-remove behavior is the feature people discover second and use just as often – triage a directory of results by deleting the lines for files you no longer need.

vipe

What it is: insert your fav editor into the middle of a pipeline.

What it does: command1 | vipe | command2 captures command1’s stdout, opens it in $EDITOR, and pipes whatever you save back into command2. --suffix csv gives the temp file a .csv extension so your editor applies syntax highlighting.

Practical usage: hand-edit a data stream between pipeline stages.

			
# review and clean scanner output before loading it
zmap -p 443 -o - 10.0.0.0/8 | vipe | duckdb intel.db "COPY banners FROM '/dev/stdin'"
# filter a config list and hand-tweak before using
grep -r "listen_addr" /etc/ | vipe --suffix csv | column -t

The first line lets you scrub a scan result before it hits the database – drop misparsed lines, fix mojibake, trim fields – without touching the scanner or writing a cleanup script. The second pipes a grep result through your editor for fast interactive filtering before formatting.

zrun

What it is: transparently decompress compressed file arguments for any command.

What it does: zrun command file.gz decompresses file.gz to a temp file and runs command against it. Supports gz, bz2, Z, xz, lzma, lzo, zstd. If you create a symlink like zprog -> zrun, running zprog file.gz is equivalent to zrun prog file.gz. Temp files are not written back, so it is read-only.

Practical usage: run a tool against compressed data without decompressing by hand.

			
# grep compressed logs without gunzip first
zrun grep "192.168.1.1" access.log.gz
# count unique IPs across compressed files
zrun awk '{print $1}' scan-results.json.xz | sort -u | wc -l

The symlink trick is worth remembering: if you reach for a compressed file often enough, sudo ln -s $(which zrun) /usr/local/bin/zless gives you zless bigfile.gz without thinking about it.
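The temp-file behavior described above boils down to three steps, sketched here by hand with gzip. This is an illustration of the idea and not zrun’s code; the log contents and paths are made up, and real zrun handles every compressed argument on the command line, not just one.

```shell
# Build a tiny compressed log to play with (contents are made up).
work=$(mktemp -d)
printf '192.168.1.1 GET /\n10.0.0.5 GET /admin\n' > "$work/access.log"
gzip "$work/access.log"                    # leaves only access.log.gz

# Roughly what `zrun grep -c "192.168.1.1" access.log.gz` does for you:
tmp=$(mktemp)
gzip -dc "$work/access.log.gz" > "$tmp"    # 1. decompress to a temp file
hits=$(grep -c '192\.168\.1\.1' "$tmp")    # 2. run the command on the plain file
rm -f "$tmp"                               # 3. throw the temp file away

echo "matching lines: $hits"
rm -rf "$work"
```

Because the command only ever sees the temp copy, anything it writes back is discarded along with it, which is the read-only caveat in practice.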

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:

🐘 Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
🦋 Bluesky via https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy

☮️