Bonus Drop #5 (2023-03-05): Search. Store. Extract

ugrep; Foam; textra

Today’s Bonus Drop features three diverse apps. The last two hit the cutting room floor[1] this week, and the first one is included thanks to a short exchange on Mastodon.

ugrep

In the world of command line utilities, there are few tools more ubiquitous than the beloved grep. As you very likely know, grep is a seriously powerful utility that enables you to search for specific patterns within text files, making it an essential tool for developers, system administrators, and anyone who works with text-based data. However, there are times when the standard grep utility falls short in terms of functionality. In fact, even the modern (and beloved) ripgrep — which many of us use instead of grep — is not “all powerful”, and lacks many of the batteries that come included with the utility that this section features.

The ugrep program, by Genivia Research Labs, is a search tool available for Linux, macOS, and Windows that offers several advantages over the aforementioned tools. It’s fast and highly configurable, and it supports a wide range of features, including advanced regular expressions, colorized output, searching binary files, and (as we will see) much more.

Here are some of the batteries included with ugrep that set it apart from other CLI search tools:

  • advanced regular expressions: you can search for complex patterns within files. This includes support for features like lookarounds, named captures, and more.

  • support for searching binary files: you are not limited to searching just within text files, which can be super handy for cybersecurity folks and other researchers.

  • colorized output: while some greppy tools are stuck in the early 1940s, ugrep can display search results in color, making it easier to identify matches and other important information at a glance.
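
If lookarounds and named captures are new to you, here is a tiny sketch of what they buy you. It uses Python’s re module rather than ugrep itself, and the log lines are invented for the illustration, so treat it as a demo of the regex features and not of ugrep’s engine or exact syntax.

```python
import re

# Hypothetical log lines, made up for this illustration.
LOG_LINES = [
    "2023-03-05 ERROR disk full on /dev/sda1",
    "2023-03-05 INFO backup finished",
    "2023-03-04 ERROR (ignored) flaky sensor",
]

# Named captures pull out the date and the message. The negative lookahead
# (?!\(ignored\)) rejects errors that were already marked as ignorable.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) ERROR (?!\(ignored\))(?P<message>.+)$"
)


def find_errors(lines):
    """Return (date, message) pairs for every line that matches PATTERN."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match.group("date"), match.group("message")))
    return hits


if __name__ == "__main__":
    for date, message in find_errors(LOG_LINES):
        print(date, message)
```

Only the first line survives: the second is not an ERROR, and the lookahead throws out the third.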

It also has some crazy cool extra features/options, such as:

  • --cpp: enables C++ syntax highlighting, making it easier to identify matches within C++ code.

  • --csv & --json: formats search results as CSV or JSON, making it easy to import search results into other tools.

  • --compressed: allows ugrep to search through compressed files (e.g., gzip, bzip2) without the need to manually decompress them first.

  • --hexdump: displays search results as a hex dump, making it easier to identify binary patterns within files.

  • --hex: enables hexadecimal output, making it easier to search for specific byte sequences within binary files.

  • --fuzzy: enables fuzzy matching, allowing users to search for approximate matches of a specific pattern.

  • --decompress: allows ugrep to automatically decompress files before searching, making it easier to search through large archives or compressed data sets.
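
To make “approximate matches” a bit more concrete, here is a minimal sketch of fuzzy line matching built on edit distance (insertions, deletions, and substitutions). It is a textbook dynamic-programming approach written in Python, not ugrep’s implementation, and the function names and the max_errors parameter are hypothetical names chosen for the example.

```python
def best_fuzzy_distance(pattern, text):
    """Smallest edit distance between pattern and any substring of text."""
    m = len(pattern)
    # Row for "no text consumed yet": matching j pattern chars costs j edits.
    prev = list(range(m + 1))
    best = prev[m]
    for ch in text:
        # Column 0 stays 0: a match may start at any position in the text.
        curr = [0]
        for j in range(1, m + 1):
            cost = 0 if ch == pattern[j - 1] else 1
            curr.append(min(
                prev[j - 1] + cost,  # match or substitute
                prev[j] + 1,         # extra character in the text
                curr[j - 1] + 1,     # pattern character missing from the text
            ))
        best = min(best, curr[m])
        prev = curr
    return best


def fuzzy_grep(pattern, lines, max_errors=1):
    """Keep the lines that contain pattern with at most max_errors edits."""
    return [
        line for line in lines
        if best_fuzzy_distance(pattern, line) <= max_errors
    ]


if __name__ == "__main__":
    sample = ["fast serch tool", "search exact", "nothing to see"]
    for line in fuzzy_grep("search", sample):
        print(line)
```

With one allowed error, the misspelled “serch” line is kept alongside the exact match, while the unrelated line is dropped.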

But wait, there’s more!!

There is also a TUI (text user interface) query mode (--query) in ugrep. This feature provides an interactive way to search for patterns within files. With it, you can refine search criteria on the fly, without having to exit and re-run the search command.

The top panel displays the current search pattern, along with any search options that have been selected. The middle panel shows the search results, with matched lines highlighted in color. The bottom panel provides a command prompt where users can enter search options or refine the search pattern.

To begin a search, simply enter a search pattern at the command prompt and press enter/return; ugrep will then display any matching results in the middle panel, and you can then refine the search by entering additional options or modifying the search pattern.

I could go on (yes, there are even more features!), but I know y’all are 100% capable of typing man ugrep or ugrep --help (or, even, ug --help, since most package managers will drop that alias for ugrep on your system when you install it).

Foam


I’ve mentioned before that, at $WORK, we use Notion as our knowledge graph manager of choice.

I detest (yes, detest) Notion’s UX. I find it very clunky, and end up spending 1.25x the effort to get things to look the way I want them to look. This could be me, but I’ve played with a ton of similar tools, and most have a far more well-thought-out UX.

Much like how Alton Brown, the host of the popular show “Good Eats”, tries to find and use kitchen tools/gadgets that have multiple functions, I do my best to avoid “app sprawl”, and seek out ones that resemble a “Swiss army knife”.

One of the knowledge graph keepers I came across recently is Foam [GH]. It’s named in homage to Roam Research’s knowledge graph app, but it is built upon Visual Studio Code (VS Code from now on) and GitHub. I’m, sadly[4], in VS Code and GitHub all the time. So, when I can find something that works in it, the “digital Alton Brown” side of me is very pleased.

Just like other knowledge graph tools, Foam can be used to organize your research, keep re-discoverable notes, craft long-form content and even publish content from the knowledge graph to the web.

I am only now beginning to work with Foam, so look for a longer section in an upcoming M-F Drop. Still, I figured I’d share it with you, extra spiffy Friends of The Drop, so you could get an advance look at this new find of mine.

textra

This is a macOS 13+ app, so if that’s not something you run with, you may not get much value from this section.

Apple was one of the first companies to bake some “data science” frameworks into their operating system (and chips). They expose these features in fairly powerful frameworks (Apple’s fancy word for APIs/libraries) to developers.

One example of the utility of these frameworks is swiftspeech, a small R package I threw together a few years ago. It lets researchers (who use R) classify parts of speech using Apple’s CoreML and NaturalLanguage Libraries.

Another, more recent and more general purpose example is textra, a command line utility, crafted by Dylan Freedman, which enables you to extract text from images, PDFs, and audio files using Apple’s Vision and Speech APIs.

Here are some examples, ripped from Dylan’s README:

  • textra audio.mp3: Extract the text from “audio.mp3” and output to stdout

  • textra page1.png page2.png -o combined.txt: Extract the text from “page1.png” and “page2.png” and output the combined text to “combined.txt”

  • textra doc.pdf -o doc.txt -t doc/page-{}.txt: Extract text from “doc.pdf” and output in two formats: 1) combined text of all the pages stored in “doc.txt” and 2) positional text from each page extracted at the pattern “doc/page-{}.txt” (e.g., “doc/page-1.txt”, “doc/page-2.txt”, etc.)

  • textra image1.png -o text1.txt image2.png -o text2.txt: Extract text from “image1.png” and output at “text1.txt”; extract text from “image2.png” and output at “text2.txt”

  • textra image.png --outputPositions positionalText.json: Extract positional text from “image.png” and output at “positionalText.json”
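
If you would rather drive textra from a script (say, to batch a folder of scans), a thin wrapper is enough. The sketch below is in Python and only uses the -o option shown in the examples above. The helper names are hypothetical, and it assumes textra is on your PATH; when it is not, it simply reports that nothing ran.

```python
import shutil
import subprocess


def textra_command(inputs, output):
    """Build the argument list for: textra <inputs...> -o <output>."""
    if not inputs:
        raise ValueError("need at least one input file")
    return ["textra", *inputs, "-o", output]


def run_textra(inputs, output):
    """Run textra if it is installed; return True only on a clean exit."""
    if shutil.which("textra") is None:
        return False  # textra not found (for example, not on macOS 13+)
    result = subprocess.run(textra_command(inputs, output), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Show the command that would be run for the two-page example.
    print(" ".join(textra_command(["page1.png", "page2.png"], "combined.txt")))
```

Building the argument list separately from running it keeps the command easy to inspect before anything touches your files.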

While I mentioned I like “Swiss army knife” apps, there is always a place in my bin for useful ones — like textra — that do one thing, and do it very well.

FIN

Thank ye all for your extra support! Catch y’all Monday! ☮

[1] It occurred to me that this expression may no longer be familiar to folks who’ve only experienced the modern vernacular. If you’re unfamiliar with it, head on over to Webster for some clarification.

[4] I have orders of magnitude more disdain for Microsoft than I do for Notion.
