Bonus Drop #46 (2024-03-24): Method To The Madness

madonctl; csvlens; Doing Great Things With DuckDB

The three resources on tap for today’s Bonus Drop are linked, but you won’t know how until the final section. As such, no TL;DR (I mean, peeking is no fair!). I will provide hints throughout as to how they are linked together.

madonctl

I do not know how many folks who read these Drops are on Mastodon (trying to get the terminology right so as not to invoke the wrath of the daft purists in the Fediverse), but I suspect more than a few are. All of us likely use a diverse array of GUI clients (Ivory/Mona here), but Mastodon can be experienced just fine in the terminal, and can even be used as a data source, of sorts (HINT).

One client that that is great for both human and coder consumption is madonctl, a robust and nigh feature complete Golang client for interacting with Mastodon.

The README and places it points do do a fine job showcasing what it can do. I will offer up a few assertions to tack on to them.

First, use access tokens, not your password. You can revoke the token if it is compromised, and it is something the system generates, not you, so it will be fairly unique. The credentials go in ~/.config/madonctl/madonctl.yaml, and here’s mine, sans private info:

instance: "mastodon.social"
login: "hrbrmstr"
token: "JHMnkS_---TOKEN-GOES-HERE_upE"

I will, occasionally, use madonctl timeline to peruse my timeline. I do this on especially busy days when I’m knee deep in terminal land.

One cool feature of madonctl is that it knows you’re cool, and may want the output in a way you can manipulate it. As such, I can get my timeline — which can be the reverse chronological view of posts, the public timeline of the instance, a list’s timeline or a hashtag timeline (HINT). And, you can have it scrollback prett far with the --limit # CLI parameter (HINT).

You never know when you’re going to want or need access to the underbelly of Mastodon data, and this is a great cross-platform tool to do so.

csvlens

An rg -i csvlens says we haven’t covered this before, and I feel horrible about that.

csvlens is a CLI tool for viewing CSV files.

It can take output from a file or stdin (HINT), handle different delimiters than a ,, reacts beautifully to terminal resizing, and is very, very performant.

We can use regular expressions to select columns to display, filter rows to display, and find + highlight matching strings.

It does one thing, and does it well.

Doing Great Things With DuckDB

I, once more, came across folks who do not know about DuckDB and all the amazing things it can do. I feel compelled to do something about this, so I’m putting together a “cookbook”/recipes”-esque epub dubbed “Cooking With DuckDB“. It will cover all the bumps, bruises, and AHA!’s I’ve had along the way in my DuckDB journey, and will include sections on the CLI, use in things like R, WASM DuckDB and making apps with it.

I haven’t linked the Codeberg home to the Quarto book yet, but will. And, you can and should file issues there if you want any specifics covered.

What does this have to do with today’s Drop? Well, how about we combine madonct, DuckDB, and csvlens to see who posts what and when to the #RStats hashtag on Mastodon?

Let’s get some toots!

madonctl timeline "#rstats" --limit 2000 --output json > 2024-03-25-rstats-fediverse.json

You can totally pipe JSON/CSV/etc. to DuckDB via stdin, but making unnecessary repeated calls to Mastodon servers is disrespectful, so we save them off.

Now, let’s pull the username, and date of #RStats toot from the JSON, count up the times of tooting to that hashtag per-day, per-user, then pivot the data from long to wide:

CREATE TABLE x AS (
  WITH ts AS (
    FROM read_json_objects('/dev/stdin')
    SELECT 
      json_extract_string(json, '$.account.username') AS user,
      substr(json.created_at, 2, 10) AS day
  )
  FROM ts
  SELECT
    day,
    user,
    COUNT(*) AS n
  GROUP BY 1, 2
  ORDER BY 1, 3 DESC
);

PIVOT x ON day USING sum(n) GROUP BY user

Gosh that looks quite a bit like some {tidyverse} operations, doesn’t it?

But, that display is truncated, and most terminals do not horizontally scroll.

I know! Let’s output it to csvlens where we can scroll around to our heart’s content (this is all the steps in one Bash view, and the result is the section header):

$ madonctl timeline "#rstats" --limit 2000 --output json | tee 2024-03-23-rstats-fediverse.json

$ duckdb -csv -c "
CREATE TABLE x AS (
  WITH ts AS (
    FROM read_json_auto('2024-03-24-rstats-fediverse.json')
    SELECT 
      account.username AS user,
      substring(created_at, 1, 10) AS day
  )
  SELECT
    day,
    user,
    COUNT(*) AS n
  FROM ts
  GROUP BY 1, 2
  ORDER BY 1, 3 DESC
);

PIVOT x ON day USING sum(n) GROUP BY user;
" | csvlens

Do we even need R or Python anymore?! (I kid.)

Give those steps a go on your own, submit requests for the mini-book, and start having fun with data again!

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.