Bonus Drop #59 (2024-08-31): A Flare For Change

Keeping Up With DuckDB Core Extension Changes

This is a bit of a single-issue Bonus Drop, this weekend, one that walks you down the same rabbit hole I wandered into.

As I have indicated, I’m building a DuckDB extension to read packet capture (PCAP) files, and actually used it to convert some PCAPs from our new sensors to look at the data. While said extension can be used for ad-hoc queries directly against PCAPs, folks should strongly consider turning PCAPs into Parquet or, even better, DuckDB databases, since all operations will be much faster.

While I usually prefer Parquet files (for interoperability), DuckDB databases have a native INET data type thanks to a core extension provided by the DuckDB team. Under the hood, IPs are just (big) numbers, so you get to do way faster comparisons and sorting than one would using just plain strings. Many other databases support INET/IPADDRESS data types, and I went looking in the DuckDB docs for how to test if an IP is in a particular subnet. It turns out there is no way to do this in DuckDB v1.0.0. O_O

I’m pretty handy with C++, so I went to the extension source to see how hard it would be to do a PR, only to discover that it most certainly does have said functionality, as well as a couple of fancy new functions to perform HTML escaping (I’m not convinced this extension is where those belong, but I don’t own DuckDB, so who am I to say?).

Now I was curious what else is “coming soon” in core extensions!

The src/include/duckdb/main/extension_entries.hpp file enumerates all the core extension functions, and the entries look like this:

{"->>", "json", CatalogType::SCALAR_FUNCTION_ENTRY},
{"<<=", "inet", CatalogType::SCALAR_FUNCTION_ENTRY},
{">>=", "inet", CatalogType::SCALAR_FUNCTION_ENTRY},

Thank the Great Maker for source code auto-formatting rules!

Sure, I could diff this file between the release version and the current main trunk version, but who wants “easy”?

Why not build a small app that visualizes what extension functions have been added since the last release?

You can check it out here before reading on.

Wiring Up GitHub’s API To Observable Plot

GitHub’s API has a very generous CORS policy. That means we can write browser-based, client-side JavaScript, and avoid any server work.

We’re going to need a few items of data to make our “visual diff’r”:

  • the current source file: https://api.github.com/repos/duckdb/duckdb/contents/src/include/duckdb/main/extension_entries.hpp
  • metadata about the most recent release: https://api.github.com/repos/duckdb/duckdb/releases/latest
  • the same source file from ^^ (https://api.github.com/repos/duckdb/duckdb/contents/src/include/duckdb/main/extension_entries.hpp?ref=LATEST_RELEASE_TAG_NAME)
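Fetching those three items is plain `fetch()` work. Here's a minimal sketch — the helper names are mine, not from the app — assuming browser globals (`fetch`, `atob`; Node 18+ has both too). One wrinkle: the GitHub contents endpoint wraps the file body in a base64-encoded `content` field, so it needs decoding:

```javascript
// Repo and file path for DuckDB's extension function registry
const GH = "https://api.github.com/repos/duckdb/duckdb";
const SRC_PATH = "src/include/duckdb/main/extension_entries.hpp";

// Build a contents-API URL, optionally pinned to a ref (tag or branch)
const contentsURL = (ref) =>
  `${GH}/contents/${SRC_PATH}${ref ? `?ref=${encodeURIComponent(ref)}` : ""}`;

// The contents API returns JSON with a base64-encoded `content` field
const fetchSource = async (ref) => {
  const json = await (await fetch(contentsURL(ref))).json();
  return atob(json.content); // decode to the raw C++ source text
};

// Tag name of the most recent release (e.g. "v1.0.0")
const latestTag = async () =>
  (await (await fetch(`${GH}/releases/latest`)).json()).tag_name;

// Usage (top-level await in a module):
//   const currentSrc = await fetchSource();                  // main branch
//   const releaseSrc = await fetchSource(await latestTag()); // last release
```

Unauthenticated requests are rate-limited (60/hour per IP at the time of writing), which is plenty for a page that makes three calls per visit.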

Provided the DuckDB team keeps the start and end markers consistent, we can target, extract, and turn the extension + function items into a slash (/) separated “path” string, suitable for use in D3 hierarchy contexts, like Observable Plot’s Tree mark and transforms:

const extractExtensionFunctionsFlare = (input) => {
  // Markers bounding the EXTENSION_FUNCTIONS array in extension_entries.hpp
  const startMarker =
    "static constexpr ExtensionFunctionEntry EXTENSION_FUNCTIONS[] = {";
  const endMarker = "}; // END_OF_EXTENSION_FUNCTIONS";
  // Escape regex metacharacters so the markers match literally
  const escapedStartMarker = startMarker.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const escapedEndMarker = endMarker.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp(
    `${escapedStartMarker}\\s*\\n([\\s\\S]*?)\\n\\s*${escapedEndMarker}`,
    "m",
  );
  const [, match] = input.match(regex) || [];

  if (match) {
    return match
      .split("\n")
      .map((line) => line.trim())
      .filter(Boolean) // drop blank lines
      .map((d) =>
        d
          .split('", ', 3) // {"func", "ext", CatalogType::…} → three pieces
          .slice(0, 2) // keep function name + extension name
          .map((d) => d.replace(/^{|"/g, "")), // strip leading { and quotes
      )
      .map(([func, namespace]) => `DuckDB/${namespace}/${func}`);
  }
  return [];
};

That function will make an array of strings like this:

DuckDB/json/->>
DuckDB/parquet/add_parquet_key
DuckDB/json/array_to_json
DuckDB/fts/create_fts_index
DuckDB/icu/current_localtime
DuckDB/icu/current_localtimestamp

And, all we have to do is feed the data to Plot.tree:

Plot.tree(ddb_functions, {
  textStroke: "var(--text-stroke-color)",
  fill: (d) =>
    rls_functions.includes(d)
      ? "var(--fill-color-existing)"
      : "var(--fill-color-new)",
  fontFamily: "monospace",
}),

Now, those var(…)s help us avoid bloating either CSS or JS code to handle dark/light mode, with a little help from a neat root CSS trick:

:root {
  color-scheme: dark light;
  --text-stroke-color: light-dark(white, black);
  --fill-color-new: light-dark(black, white);
  --fill-color-existing: gray;
}

that I learned from Roma Komarov’s post on “Querying the Color Scheme”.

And, now, all one has to do to see what core extension functions have been added since the last release of DuckDB is visit: https://playground.hrbrmstr.dev/duckdb-functions/.

It’s still just vanilla HTML/CSS/JS, so “view-source:” gets you the code, but you can also hit up the Codeberg repo for it.


FIN

The mini-app started as a single, sparse index.html file with CDN-loaded Plot. I separated the concerns (HTML/CSS/JS) and used my esmdl tool to un-CDN-ify the vendor JS libs.

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️

One response to “Bonus Drop #59 (2024-08-31): A Flare For Change”

  1. Matt Peyton

    Apologies if you’ve covered this already and I missed it, but I recently saw that Quarto can’t use ojs blocks in stand alone files because of a change in the way they do module imports.

    https://github.com/quarto-dev/quarto-cli/issues/6371

    This was a big disappointment to me because I used to love generating interactive reports using Quarto in RStudio and being able to just email those files around the company.

    I’m wondering if something like your esmdl could help me get around this limitation or if you have other thoughts. It’s really forced me to move away from Quarto which I otherwise love.

    Best,

    Matt
