Drop #775 (2026-02-23): Another Regularly Expressive Drop

portit; MinRX; regexle

Today, we take an unexpectedly themed ride from an lsof TUI to…playing games? o_o

TL;DR

(This is an LLM/GPT-generated summary of today’s Drop. Ollama and MiniMax M2.1.)

Portit is a minimal Rust TUI for inspecting listening TCP ports and killing processes, which can be largely replicated with a custom Bash shell function using lsof, ps, and awk (https://github.com/odysa/portit)
GNU gawk 5.4.0 introduces MinRX, a new non-backtracking POSIX ERE regex engine by Mike Haertel that uses Stacked NFAs to provide polynomial-time matching with genuine POSIX compliance (https://github.com/mikehaertel/minrx)
Regexle is a daily hexagonal crossword puzzle game combining Wordle mechanics with regular expressions, where players fill cells with alphanumeric sequences that match the regex patterns along each edge (https://regexle.com/)

portit

portit is a “minimal Rust TUI for inspecting listening TCP ports and killing processes”. It is essentially a TUI over the venerable lsof, designed to speed up the “I need to kill whatever is listening on X” workflow.

The TUI also uses ps and (if filtering and killing) grep + kill. It shells out to lsof -iTCP -sTCP:LISTEN -P -n, parses the whitespace-delimited output line by line, then makes a second call to ps -ww -p <pids> -o pid=,command= to get full command lines. Results are deduped by (PID, port) and sorted by port number.
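The dedupe-and-sort step is easy to approximate in shell. This is a minimal sketch, not portit's actual code: it assumes lsof's default column layout (PID in column 2, address:port in the next-to-last column), and `dedupe_ports` is a hypothetical helper name.

```shell
# Hypothetical helper: read `lsof -iTCP -sTCP:LISTEN -P -n` output on stdin and
# emit one "port pid process" line per unique (PID, port) pair, sorted by port.
dedupe_ports() {
  awk 'NR > 1 {
    n = split($(NF-1), ap, ":")      # the last ":"-separated piece is the port
    key = $2 SUBSEP ap[n]            # composite (PID, port) key
    if (!(key in seen)) {
      seen[key] = 1
      print ap[n], $2, $1
    }
  }' | sort -n -k1,1 -k2,2
}
```

Something like `sudo lsof -iTCP -sTCP:LISTEN -P -n | dedupe_ports` would then collapse the separate IPv4 and IPv6 rows a single process often produces for the same port.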

You can get most of the way there without adding another binary dependency to your system. I modified my listening() shell function:

			
listening() {
  if [ $# -eq 0 ]; then
    sudo lsof -iTCP -sTCP:LISTEN -n -P
  elif [ $# -eq 1 ]; then
    sudo lsof -iTCP -sTCP:LISTEN -n -P | grep -i --color "$1"
  else
    echo "Usage: listening [pattern]"
  fi
}

		

to work a bit more like portit:

			
listening() {
  if [ $# -gt 1 ]; then
    printf "Usage: listening [pattern]\n" >&2
    return 1
  fi
  local raw
  raw=$(sudo lsof -iTCP -sTCP:LISTEN -n -P 2>/dev/null) || return 1
  local pids
  pids=$(printf '%s\n' "$raw" | awk 'NR>1 {print $2}' | sort -u | paste -sd, -)
  if [ -z "$pids" ]; then
    printf "No listening ports found.\n"
    return 0
  fi
  local cmds
  cmds=$(ps -ww -p "$pids" -o pid=,command= 2>/dev/null)
  printf '%s\n' "$raw" | CMDS="$cmds" awk '
    BEGIN {
      n = split(ENVIRON["CMDS"], lines, "\n")
      for (i = 1; i <= n; i++) {
        line = lines[i]
        gsub(/^[ \t]+/, "", line)
        split(line, parts, /[ \t]+/)
        p = parts[1] + 0
        sub(/^[ \t]*[0-9]+[ \t]+/, "", line)
        if (p > 0) pidcmd[p] = line
      }
    }
    NR == 1 {
      printf "%-12s %-8s %-6s %-22s %-7s %s\n", "PROCESS", "PID", "PROTO", "ADDRESS", "PORT", "COMMAND"
      next
    }
    {
      pid = $2 + 0
      addrport = $(NF-1)
      n2 = split(addrport, ap, ":")
      port = ap[n2]
      addr = substr(addrport, 1, length(addrport) - length(port) - 1)
      if (addr == "") addr = "*"
      cmd = (pid in pidcmd) ? pidcmd[pid] : "-"
      printf "%-12s %-8s %-6s %-22s %-7s %s\n", $1, pid, "TCP", addr, port, cmd
    }
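  ' | if [ $# -eq 1 ]; then
    # Keep the header, then filter the rest. `read` consumes exactly one line;
    # `head -1` may buffer past the header and leave nothing for grep.
    local header
    IFS= read -r header && printf '%s\n' "$header"
    grep -i --color -- "$1"
  else
    cat
  fi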
}

		

With a few tweaks and some Charm utils you may already have installed, you can likely fully replicate the TUI in Bash.
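One such tweak is the kill step. This is a rough sketch under the same assumptions about lsof's column layout as the function above; `pids_on_port` and `killport` are hypothetical names, and the confirmation prompt is just one possible design.

```shell
# Hypothetical helper: read lsof listing output on stdin and print the unique
# PIDs that are listening on the given port.
pids_on_port() {
  awk -v port="$1" 'NR > 1 {
    n = split($(NF-1), ap, ":")
    if (ap[n] == port && !($2 in seen)) { seen[$2] = 1; print $2 }
  }'
}

# Hypothetical wrapper: ask before sending SIGTERM to each listener on a port.
killport() {
  local pid answer
  for pid in $(sudo lsof -iTCP -sTCP:LISTEN -n -P 2>/dev/null | pids_on_port "$1"); do
    printf 'Kill PID %s listening on port %s? [y/N] ' "$pid" "$1"
    read -r answer
    if [ "$answer" = "y" ]; then
      sudo kill "$pid"
    fi
  done
}
```

Keeping the parsing in its own function means it can be checked against canned lsof output without touching any real process.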

Still, it is a focused app that does one thing well, and we at the Drop do like those types of utilities.

The neat thing about me “having” to modify my shell function is that it provided a segue to the middle section, since my shell hack uses awk and the “MinRX” section is all about GNU awk’s new regex engine.

MinRX

When I read the announcement for the release of gawk 5.4.0, I learned about the switch to a new regular expression library, MinRX, by Mike Haertel (author of GNU grep). This is a major change, and since it may break existing scripts, the GAWK_GNU_MATCHERS environment variable lets folks switch back to the now-legacy engine (until 5.5.0 comes out).
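A quick way to find out whether a script is sensitive to the engine switch is to run the same program under both matchers and compare the output. This sketch assumes that merely having GAWK_GNU_MATCHERS present in the environment selects the legacy engine (check the gawk manual for the exact semantics); `compare_matchers` and the `AWK` override are hypothetical names.

```shell
# Hypothetical helper: run one awk program against one input file with the
# default matcher and again with GAWK_GNU_MATCHERS set, then report the result.
# Assumption: the variable only needs to exist; the value 1 is arbitrary.
compare_matchers() {
  local prog=$1 file=$2 awk_bin=${AWK:-gawk}
  local new old
  new=$("$awk_bin" "$prog" "$file")
  old=$(GAWK_GNU_MATCHERS=1 "$awk_bin" "$prog" "$file")
  if [ "$new" = "$old" ]; then
    printf 'same output under both matchers\n'
  else
    printf 'output differs between matchers\n'
    return 1
  fi
}
```

On an awk that does not know the variable, both runs use the same engine, so the check trivially passes.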

I hadn’t heard about MinRX before, so I hit up the repo and the intertubes to learn a bit more about it, since a new regex engine in GNU land is a pretty big deal.

MinRX is a POSIX Extended Regular Expression (ERE) matcher. The current development focus is on correctness and simplicity, with performance improvements planned later. Arnold Robbins (gawk maintainer) apparently pestered Haertel for years to write it and then enthusiastically tested early versions against gawk’s test suite.

It introduces what Haertel calls a “stacked NFA,” which is (as one might expect) an NFA (nondeterministic finite automaton) augmented with a stack of arbitrary integers. An NFA is a type of finite state machine where, given a current state and an input symbol, there can be multiple possible next states (or none at all). Conversely, in a deterministic finite automaton (DFA), every state has exactly one transition for each input symbol. So, in an NFA, you can think of the machine as exploring all possible paths simultaneously. If any path leads to an accept state, the input is accepted.

This new library is a non-backtracking matcher, which means it makes a single forward scan through the input, one character at a time, considering all possible matches in parallel. This is a key feature, as it means one can’t craft a pathological regex that causes exponential blowup the way one can with backtracking engines (PCRE, Python’s re, etc.).
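The “all paths in parallel” idea can be sketched in a few lines of awk. This is a textbook state-set simulation of a plain NFA, not MinRX’s stacked NFA: the automaton is hand-built for the ERE ^(a|b)*abb$, and `nfa_accepts` is a hypothetical name.

```shell
# Hypothetical demo: simulate a hand-built NFA for ^(a|b)*abb$ by tracking the
# set of active states during a single left-to-right scan of the input.
nfa_accepts() {
  awk -v input="$1" 'BEGIN {
    # Transition table: delta[state, symbol] = space-separated next states.
    delta[0, "a"] = "0 1"   # stay in the loop, or guess that "abb" starts here
    delta[0, "b"] = "0"
    delta[1, "b"] = "2"
    delta[2, "b"] = "3"     # state 3 is the only accept state
    active[0] = 1
    for (i = 1; i <= length(input); i++) {
      c = substr(input, i, 1)
      split("", next_set)                  # clear the next-state set
      for (s in active) {
        if ((s, c) in delta) {
          n = split(delta[s, c], t, " ")
          for (j = 1; j <= n; j++) next_set[t[j]] = 1
        }
      }
      split("", active)                    # replace the active set
      for (s in next_set) active[s] = 1
    }
    exit((3 in active) ? 0 : 1)
  }'
}
```

`nfa_accepts aabb` succeeds and `nfa_accepts abab` fails. The work per input character is bounded by the number of states, which is why this style of matcher has no exponential worst case.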

The SNFA is more powerful than a traditional NFA because the stack lets it track subgroup match positions (perhaps the most difficult part of POSIX regex matching). A regular NFA can tell you if a string matches, but POSIX semantics require reporting which subgroups matched what, following leftmost-longest rules. The stack machinery handles this without backtracking.
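The overall-match half of the leftmost-longest rule is easy to observe from awk itself. This is a small illustration using only the POSIX match() function and its RSTART/RLENGTH variables; the input string and the `longest_match` name are made up for the demo.

```shell
# Hypothetical demo: report where an ERE matches and how long the match is.
# Leftmost-longest semantics pick the earliest start, then the longest match at
# that start, regardless of the order in which the alternatives are written.
longest_match() {
  printf '%s\n' "$2" | awk -v re="$1" '{
    if (match($0, re)) print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    else print "no match"
  }'
}

longest_match 'a|ab|abcd' 'xabcdy'   # prints "2 4 abcd"
```

A first-alternative-wins backtracking engine would stop at “a” here. Reporting which parenthesized subgroup matched what is the harder part the stack machinery addresses, and this sketch does not cover it.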

Unlike existing GNU matchers, MinRX aims to fully follow the POSIX 2024 standard for extended regular expressions. In the process, it reveals a spot where the POSIX spec actually contradicts itself: when a pattern like ABC can be split up multiple ways and still produce the same overall match, the spec’s grammar says “feed the left part first” but one of its own examples says “feed the right part first.” MinRX goes with left-first, since it’s faster and matches how repetition operators like * and + already behave.

Of note: it deliberately does not support POSIX Basic Regular Expressions (BREs), and Haertel’s reasoning is principled: BREs include backreferences, which make them not true regular expressions. They don’t correspond to finite automata at all, and matchers for them have exponential worst-case complexity. Haertel argues a modified SNFA could probably match BREs but would have exponential space complexity, making it broadly useless.
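To see why backreferences fall outside the regular languages, consider a BRE that demands the same run of characters on both sides of a separator. Here is a quick illustration with grep’s default BRE syntax; the function name and sample strings are invented for the demo.

```shell
# \(a*\) captures a run of "a"s and \1 demands the identical run again, so with
# -x the pattern accepts exactly the lines of the form a^n b a^n. Checking that
# the two counts agree needs unbounded memory, which a finite automaton lacks.
same_run_both_sides() {
  grep -x '\(a*\)b\1'
}

printf '%s\n' 'aabaa' 'aaba' 'b' 'abaa' | same_run_both_sides
# prints:
# aabaa
# b
```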

The library is C++20, but Haertel plans to rewrite it in C once the algorithm stabilizes. He explicitly says folks using the library anywhere should not have any expectations regarding performance (including the rxgrep tool that comes along for the ride), as speed optimizations are planned, but not yet implemented. The gawk integration means this is now running in production across a huge number of systems, which should flush out remaining correctness issues quickly.

So, gawk now has a regex engine that is provably non-backtracking with polynomial time guarantees and genuine POSIX compliance, written by someone with deep expertise in this exact problem space. This is a spiffy foundation for any part of the GNU ecosystem that uses regex to build upon.

regexle

The MinRX rabbit hole ate up way more time than it should have, so we’ll keep this section tight.

As the name implies, regexle is a homunculus of Wordle and regular expressions. You get a daily hexagonal crossword where you fill cells with alphanumeric sequences that must fully match the regular expressions listed along each edge (empty cells count as spaces). Rules turn bold green when satisfied, red when not; selecting a cell underlines the rules along its axes, and you navigate with Tab/Shift-Tab (left-to-right, top-to-bottom by default), or click a rule to constrain traversal to that axis. A lightbulb icon toggles hint mode, highlighting correct cells green and incorrect ones yellow while tracking hint usage. For regex help, try regex101.com.
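The “must fully match” rule is the part that is easy to get wrong: a rule is satisfied only when the whole row matches, not just a substring. This is a rough sketch using grep -E with -x; it assumes the puzzle’s patterns stay within POSIX ERE syntax, which regexle does not promise, and the rule and rows below are invented.

```shell
# Hypothetical checker: succeed only if the entire row matches the rule.
# -E selects ERE syntax, -x anchors the match to the whole line, -q is quiet.
rule_satisfied() {
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

rule_satisfied '(AB|C)+D?' 'ABCD'  && echo "ABCD satisfies the rule"
rule_satisfied '(AB|C)+D?' 'ABCDX' || echo "ABCDX does not"
```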

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:

Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
Bluesky via <https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy>

☮️