Drop #499 (2024-07-12): Hulk BASH!

Some Quick Bash Tricks For Fun And Profit¹

I had to do some grungy work at the CLI this week. While digging through Bear for some Bash references I’d stashed away there over the past few months, I realized I’d actually kept up the habit of jotting things down in my notes (I’ve been increasingly relying on Bear as a brain extension due to the long covid stuff).

So, rather than some third-party tool references, today’s Drop is just a collection of some Bash idioms I rarely see used in most scripts I come across, but that can be quite useful. Each section will start with an example and then work through it.

Please note that some of these require Bash 4+. macOS folks will need to rely on Homebrew (et al.) for that.


Process Substitution

while read line; do
  echo "Processing: $line"
done < <(ls -1 *.txt)

I’m not sure why we don’t see the <(COMMAND) / < <(COMMAND) idioms more. They’re pretty handy! They avoid creating temporary files; unlike piping into while, the loop runs in the current shell, not a subshell, so variables set inside the loop persist after the loop ends; and they allow command output to be processed directly, line by line.

Process substitution allows a command’s input or output to appear as a file, which enables direct reading from or writing to another program. It exists to make it possible to provide data to programs that only accept files as inputs/outputs.

Let’s break down the example:

  1. ls -1 *.txt: This command lists all .txt files in the current directory, one file per line.
  2. <(ls -1 *.txt): This process substitution runs the ls command and makes its output available as if it were a file.
  3. < <(ls -1 *.txt): The outer < redirects the output of the process substitution as input to the while loop.
  4. while read line; do ... done: This loop reads each line from the input (in this case, each filename output by ls).
  5. echo "Processing: $line": For each filename read, this line prints a message indicating it’s being processed. You can, of course, do anything you want with that data.
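To make the "current shell, not a subshell" point concrete, here is a minimal sketch that counts lines twice: once by piping into the loop and once with process substitution. The file names are made up, printf stands in for ls so the snippet runs in any directory, and it assumes Bash’s default settings (the lastpipe option is off).

```bash
#!/usr/bin/env bash
# printf is a stand-in for `ls -1 *.txt`; the file names are hypothetical.
list_files() { printf '%s\n' a.txt b.txt c.txt; }

# Pipe version: the loop body runs in a subshell, so the increments are lost.
piped_count=0
list_files | while IFS= read -r line; do
  piped_count=$((piped_count + 1))
done

# Process substitution version: the loop runs in the current shell.
subst_count=0
while IFS= read -r line; do
  subst_count=$((subst_count + 1))
done < <(list_files)

echo "pipe: $piped_count"                  # pipe: 0
echo "process substitution: $subst_count"  # process substitution: 3
```

The only difference between the two loops is where the input comes from, which is exactly what decides whether the counter survives.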

Note that the space between the two < characters is important (without it, Bash reads << as the start of a heredoc), and the input version has a sibling output version: > >(COMMAND). One more common use for the output version is something like:

SOMEOTHERPROCESSTHATMAKESALOTOFBYTES | tee >(gzip > file.gz) >(bzip2 > file.bz2) > file

This uses output process substitution with the tee command to simultaneously output data to multiple destinations, including compressed files. Let’s break this example down, too:

  1. tee: This command reads from standard input and writes to both standard output and files.
  2. >(gzip > file.gz): This is process substitution. It runs the gzip command and redirects its output to file.gz. The >() syntax makes this appear as a file to tee.
  3. >(bzip2 > file.bz2): Similar to the previous one, but uses bzip2 compression and outputs to file.bz2.
  4. > file: This is a regular output redirection to file.

What’s happening with the data is:

  1. Some data is piped into this command (not shown in the example).
  2. tee takes this input and:
    • Sends it to gzip, which compresses it and writes to file.gz
    • Sends it to bzip2, which compresses it and writes to file.bz2
    • Writes it uncompressed to its own stdout, which the trailing > file redirects into file (so nothing is printed to the terminal)

This allows you to create three versions of the same data simultaneously:

  • An uncompressed version in file
  • A gzip-compressed version in file.gz
  • A bzip2-compressed version in file.bz2

This is super useful when you want to process or store data in multiple formats without having to read it multiple times or create intermediate files. It’s an efficient way to create backups, compressed versions of data streams, or, say, turn some JSON into multiple formats all at once.
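As a small, hedged illustration of where the bytes end up, the sketch below runs the same shape of pipeline with seq standing in for the data producer and a scratch directory from mktemp (both are choices made here, not part of the original example). It uses only gzip to keep the dependencies down. One caveat worth knowing: Bash does not wait for the commands inside >(...) to finish, so a compressed file may still be mid-write for a moment after the pipeline returns.

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)   # scratch directory for this demo (an assumption)

# Capture whatever the pipeline prints to stdout. Because tee's stdout is
# redirected into "$workdir/file", there is nothing left over to capture.
captured=$(seq 1 5 | tee >(gzip > "$workdir/file.gz") > "$workdir/file")

# The uncompressed copy is written by tee itself, so it is complete here.
file_contents=$(cat "$workdir/file")

echo "captured stdout: '${captured}'"   # captured stdout: ''
echo "uncompressed copy:"
echo "$file_contents"
```

If you also want the data on the terminal, dropping the trailing redirection (and adding a plain file name as another tee argument) is one way to get it.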

Indented Heredocs

if true; then
    cat <<-EOF
    I am a heredoc inside an if statement!
    I'm indented to match the surrounding code.
    This makes the script more readable.
EOF
fi

An indented heredoc uses the <<- syntax instead of the standard << for heredocs. The hyphen (-) tells Bash to ignore leading tab characters in the heredoc content and the closing delimiter. I really wish I used them more in the past, and also wish more folks used them now.

Please note that:

  • only tab characters are ignored (you can’t copy/paste that example due to “WordPress”); if you use spaces for indentation, they will be preserved in the output.
  • the closing delimiter (EOF in these examples) must be on a line by itself; it may be indented with tabs (which get stripped), but not with spaces.
  • using <<- instead of << is what enables this tab-stripping behavior.
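Since literal tabs rarely survive copy/paste, here is a sketch that builds a tiny script with printf (where \t is guaranteed to be a real tab) and feeds it to a fresh bash. It shows the tab-indented line and the tab-indented closing delimiter being stripped while the space-indented line is left alone. The text of the lines is invented for the demo.

```bash
#!/usr/bin/env bash
# The heredoc script as an escaped string: \t is a tab, \n is a newline.
script='cat <<-EOF\n\ttab-indented line\n    space-indented line\n\tEOF\n'

# %b expands the backslash escapes, then a child bash runs the result.
output=$(printf '%b' "$script" | bash)

printf '%s\n' "$output"
# tab-indented line
#     space-indented line
```

In a real script you would simply type the tabs; the printf detour is only there so the example stays intact on its way to your terminal.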

readarray + jq == 💙

readarray -t tags < <(jq -c '.tags[]' tags.json)
for tag in "${tags[@]}"; do
  echo "$tag"
done

If you’ve got the memory (and Bash 4+), readarray can save you extra operations by reading data into an indexed array (the keys are integers, starting at 0), which lets you use that data again without re-reading it.

I had to do a bunch of different ops with our tags (I talk about our work data a lot, apologies), and this made the operations way more efficient. Let’s break this down, too:

  1. readarray -t tags < <(jq -c '.tags[]' tags.json)
    • jq -c '.tags[]' tags.json: This command uses jq to process the tags.json file. It extracts all elements from the tags array in the JSON file and outputs them in compact form (the -c flag).
    • <(...): This is process substitution! It makes the output of the jq command appear as a file.
    • readarray -t tags < ...: This reads the output from the process substitution into an array named tags. The -t option trims trailing newlines from each line read.
  2. for tag in "${tags[@]}"; do
    This starts a loop that iterates over each element in the tags array.
  3. echo "$tag" (do something more useful than this pls)
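A quick way to see what readarray leaves you with is to inspect the array afterwards. This sketch swaps the jq call for printf so it runs without jq or a tags.json file; the three tag strings are made up, and they are quoted the way jq -c prints JSON strings.

```bash
#!/usr/bin/env bash
# Stand-in for: jq -c '.tags[]' tags.json  (hypothetical sample output)
fake_jq() { printf '%s\n' '"malware"' '"web crawler"' '"vpn"'; }

readarray -t tags < <(fake_jq)

echo "count: ${#tags[@]}"     # count: 3
echo "first: ${tags[0]}"      # first: "malware"
echo "indices: ${!tags[*]}"   # indices: 0 1 2

# The data is already in memory, so a second pass needs no re-read.
for tag in "${tags[@]}"; do
  echo "again: $tag"
done
```

Note that the element with a space in it ("web crawler") stays a single array entry because readarray splits on newlines only.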

The @ symbol in "${tags[@]}" references all elements of the tags array. Inside double quotes, each element expands to its own word, so elements that contain spaces or special characters stay intact (no word splitting) and the loop processes them one at a time. You could use * instead, but "${tags[*]}" joins every element into a single word (separated by the first character of IFS), so the element boundaries are lost.
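The difference between the two quoted forms is easiest to see by counting the words each one produces. A minimal sketch, with made-up tag values and a throwaway helper (count_args) that exists only for this demo:

```bash
#!/usr/bin/env bash
tags=("needs review" "bug" "good first issue")   # made-up values

count_args() { echo "$#"; }   # prints how many words it received

at_count=$(count_args "${tags[@]}")     # one word per element
star_count=$(count_args "${tags[*]}")   # everything joined into one word

echo "@ -> $at_count"    # @ -> 3
echo "* -> $star_count"  # * -> 1
```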

mapfile / readarray has some cool options that you may want to check out.
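As one example of those options, the builtin (readarray and mapfile are the same command) can skip leading lines with -s and cap how many it reads with -n. A small sketch, assuming a made-up input with a header line followed by four rows:

```bash
#!/usr/bin/env bash
# Hypothetical input: a header line followed by four rows.
rows_source() { printf '%s\n' name alpha beta gamma delta; }

# -s 1 skips the header, -n 2 stops after two lines, -t drops the newlines.
mapfile -t -s 1 -n 2 rows < <(rows_source)

printf '%s\n' "${rows[@]}"
# alpha
# beta
```

Running help mapfile in your own shell lists the rest of the options for your Bash version.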

View The Source, Dude!

readarray is one of Bash’s “builtins”. They’re just magically there, waiting for you to use them. They’re written in C, and to get a better idea of how they really operate, you can view their source code. As hinted in the previous section, readarray is really mapfile (defined in builtins/mapfile.def in the Bash source tree), and you can see what it does right here.

Checking out the source may help you work around any issues you run into, or learn a ton more about the functionality of each builtin.
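Before reaching for the C, you can ask the shell itself what it knows. This sketch uses type -t to confirm that both names are builtins and help to print the usage text that is compiled in; the exact help wording differs between Bash versions, so no output is shown for it.

```bash
#!/usr/bin/env bash
kind_readarray=$(type -t readarray)
kind_mapfile=$(type -t mapfile)

echo "readarray is a $kind_readarray"   # readarray is a builtin
echo "mapfile is a $kind_mapfile"       # mapfile is a builtin

# Print the built-in usage text (wording varies by Bash version).
help mapfile
```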

And, if you’re not a C expert, you can always phone an AI friend.

FIN

¹ Warning: Use of these Bash tricks may result in sudden bursts of productivity, uncontrollable urges to automate everything, and an inexplicable desire to speak in command-line syntax. Side effects may include elevated geek status, spontaneous terminal opening, and the ability to bend computers to your will. Profits not guaranteed, but fun is virtually assured. The author accepts no responsibility for late-night coding sessions, neglected social lives, or the inevitable “Just one more script” syndrome. Use at your own risk, and may the force of the command line be with you.

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️
