Drop #767 (2026-02-04): If It Walks Like A…

duckdb_mcp; (It’s always) dns; 🪵

Monday was a fairly overwhelming day (the entirety of 2026 has kind of been overwhelming tbh), hence no Drop, but I can use the normally Drop-free Wednesday to make up for it.

As I noted on Mastodon a bit ago, there are SO MANY AMAZING NEW DuckDB COMMUNITY EXTENSIONS!

Today, we cover two more, and also take a look at one likely underutilized DuckDB capability.


TL;DR

(This is an LLM/GPT-generated summary of today’s Drop. Ollama and MiniMax M2.1.)


duckdb_mcp

Photo by Designecologist on Pexels.com

Long time Drop readers know I’m a yuge fan of the Model Context Protocol (MCP). I build new ones all the time for work, and mostly use them in an LLM context. Those same readers will also know that I almost always note that the utility of MCPs is not limited to an LLM context. They’re generic local (stdio) or remote (HTTP) servers that speak a well-defined JSON protocol, and usually emit actual structured JSON records in responses to tool calls.

Teague Sterling groks this and built something I can finally reference when pimping MCPs: duckdb_mcp (GH). It’s an MCP extension for DuckDB that enables seamless integration between SQL databases and MCP servers. The extension provides both client capabilities for accessing remote MCP resources via SQL and server capabilities for exposing database content as MCP resources. We’re covering the first use case today.

One of my MCP servers processes JA4 hashes, and I’ve been itching to make a DuckDB extension for it. I haven’t dipped my toes back into those DuckDB extension development waters in a while, and I believe some things have changed, but all I remember is that it was a hades-like experience with C++, vcpkg, and slow build times due to the way it had to pretty much rebuild parts of DuckDB every time.

But now I don’t have to build any extensions for DuckDB since I can just wire them up to MCP servers! Here’s an example of using it to see what tools my JA4 MCP server has available:

INSTALL duckdb_mcp FROM community;
LOAD duckdb_mcp;

WITH tools AS (
  SELECT unnest(from_json(
    (SELECT (mcp_list_tools('ja4')).tools),
    '["JSON"]'
  ), recursive:=true) AS tool
)
SELECT
  tool->>'name' AS name,
  tool->'inputSchema' AS input_schema
FROM tools;
┌──────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ name │ input_schema │
│ varchar │ json │
├──────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ analyze_ja4 │ {"type":"object","properties":{"fingerprint":{"description":"The JA4 fingerprint to analyze (format: a_b_c)","type":"s… │
│ analyze_ja4h │ {"type":"object","properties":{"fingerprint":{"description":"The JA4H fingerprint to analyze (format: a_b_c_d)","type"… │
│ analyze_ja4s │ {"type":"object","properties":{"fingerprint":{"description":"The JA4S fingerprint to analyze (format: a_b_c)","type":"… │
│ analyze_ja4t │ {"type":"object","properties":{"fingerprint":{"description":"The JA4T fingerprint to analyze (format: windowsize_tcpop… │
│ analyze_ja4x │ {"type":"object","properties":{"fingerprint":{"description":"The JA4X fingerprint to analyze (format: a_b_c)","type":"… │
│ compare_fingerprints │ {"type":"object","properties":{"fingerprint1":{"description":"First fingerprint to compare","type":"string"},"fingerpr… │
│ database_stats │ {"type":"object"} │
│ detect_patterns │ {"type":"object","properties":{"fingerprints":{"description":"Array of fingerprints to analyze (minimum 2)","type":"ar… │
│ get_investigation_… │ {"type":"object","properties":{"fingerprint":{"description":"The fingerprint to analyze for investigation tips","type"… │
│ refresh_database │ {"type":"object"} │
│ search_database │ {"type":"object","properties":{"limit":{"description":"Maximum number of results to return (default: 50)","type":"numb… │
├──────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 11 rows 2 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Most MCP servers provide that list in their documentation, and you can have an LLM give it to you:

analyze_ja4(fingerprint: string, include_database_lookup?: boolean = true)
- Analyze a JA4 (TLS Client) fingerprint with format a_b_c
analyze_ja4h(fingerprint: string, include_database_lookup?: boolean = true)
- Analyze a JA4H (HTTP Client) fingerprint with format a_b_c_d
analyze_ja4s(fingerprint: string, include_database_lookup?: boolean = true)
- Analyze a JA4S (TLS Server) fingerprint with format a_b_c
analyze_ja4t(fingerprint: string, include_database_lookup?: boolean = true)
- Analyze a JA4T (TCP Client) fingerprint with format windowsize_tcpoptions_mss_windowscale
analyze_ja4x(fingerprint: string, include_database_lookup?: boolean = true)
- Analyze a JA4X (X.509 Certificate) fingerprint with format a_b_c
compare_fingerprints(fingerprint1: string, fingerprint2: string)
- Compare two fingerprints and identify similarities and differences
database_stats()
- Get statistics about the JA4 database including record counts and last update time
detect_patterns(fingerprints: array)
- Analyze multiple fingerprints (minimum 2) to detect common patterns, outliers, and groupings
get_investigation_tips(fingerprint: string, fingerprint_type: "ja4" | "ja4s" | "ja4h" | "ja4x" | "ja4t")
- Generate investigation recommendations and threat indicators for a fingerprint
refresh_database()
- Force refresh of the JA4 database from ja4db.com
search_database(query: string, search_type: "app" | "os", limit?: number = 50)
- Search the JA4 database for fingerprints by application name or operating system

Now, I can analyze various hashes right in DuckDB:

WITH raw AS (
  SELECT mcp_call_tool('ja4', 'analyze_ja4t', '{"fingerprint": "64240_2-1-3-1-1-4_1460_8"}') AS result
),
parsed AS (
  SELECT json(result->'content'->0->>'text') AS analysis
  FROM raw
)
SELECT
  analysis->'analysis'->'breakdown'->>'window_size' AS window_size,
  --analysis->'analysis'->'breakdown'->>'tcp_options' AS tcp_options,
  analysis->'analysis'->'breakdown'->>'mss' AS mss,
  analysis->'analysis'->'breakdown'->>'window_scale' AS window_scale,
  analysis->'analysis'->>'fingerprint' AS fingerprint,
  analysis->'analysis'->>'format' AS format
FROM parsed;
┌─────────────┬─────────┬──────────────┬──────────────────────────┬───────────────────────────────┐
│ window_size │ mss │ window_scale │ fingerprint │ format │
│ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────┼─────────┼──────────────┼──────────────────────────┼───────────────────────────────┤
│ 64240 │ 1460 │ 8 │ 64240_2-1-3-1-1-4_1460_8 │ JA4T (TCP Client Fingerprint) │
└─────────────┴─────────┴──────────────┴──────────────────────────┴───────────────────────────────┘

As I hack this into some workflows, I’ll report back on how it has changed up anything (if it does end up doing that).


(It’s always) dns

Photo by SevenStorm JUHASZIMRUS on Pexels.com

DNS is one of the most undertapped resources in an organization when it comes to cybersecurity. Even outside that context, most folks aren’t aware of all the fun things one can store in various DNS record types.

Tobias Müller groks that DNS can and should be used as a data source, and built the dns (GH) DuckDB extension.

The extension page has lots of examples, as does the README, but the TL;DR is that you can perform any kind of DNS record lookup, do reverse DNS lookups, and use the special corey function (which is a soundalike for “query”) to slurp up all the TXT records for a domain into a table.

INSTALL dns FROM community;
LOAD dns;
FROM corey("apple.com");
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ txt_record │
│ varchar │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ yahoo-verification-key=Ay+djyw0qWQgXKWGA/jstjYryTMrKb+PBXI5l8u5/jw= │
│ cisco-ci-domain-verification=6f3bfb849796a518061f8e8c4356f687a138502d86db742791685059176547dd │
│ cerner-client-id=22dd1d8a-5e8b-4e1e-80ef-39bcdfd42798 │
│ v=spf1 include:_spf.apple.com include:_spf-txn.apple.com ~all │
│ google-site-verification=L5kkMdiFI8npvb6KlHui84fJaCw5G64DWhaDRIAT4_c │
│ json:eyJ3aHkiOiJUaGlzIGlzIHRvIHRydW5jYXRlIFVEUCByZXNwb25zZXMgZm9yIFRYVCBxdWVyaWVzIHRvIGFwcGxlLmNvbSIsInBhZGRpbmciOiJxdWFoMGVpamFhNGVlajh0aWVk… │
│ cerner-client-id=ce3abf18-ee87-43b9-9927-9eb24b4bac4a │
│ google-site-verification=8M6XjQCzydT62jk8HY3VXPAG-nKDllTRV-JpA3-Ktyw │
│ atlassian-domain-verification=mLabq99iaT8kquJechF6l31FAYoNUe3WB7tLpLFUiUYVJCse9SKq83hOJzFkwqrh │
│ webexdomainverification.8C462=b728ec3f-dfc9-42f9-92cb-9ba8853cbee8 │
│ ValidationTokenValue=77a4a6de-da14-449c-83c4-85366e0f55f9 │
│ apple-domain-verification=X5Jt76bn3Dnmgzjj │
│ json:eyJ3aHkiOiJUaGlzIGlzIHRvIHRydW5jYXRlIFVEUCByZXNwb25zZXMgZm9yIFRYVCBxdWVyaWVzIHRvIGFwcGxlLmNvbSIsInBhZGRpbmciOiJpZW4wYWVHaGF0aG9oNmhhaHZp… │
│ facebook-domain-verification=n6cqjfucq6plswmtfbwnbbeu1qiq3v │
│ google-site-verification=zBSq1mG5ssu2If-C17UAz_MzSZDcx03MVxmeDwMNc5w │
│ miro-verification=2494d255c4c50b1e521650a0659cbf3fa08b0072 │
│ 77a4a6de-da14-449c-83c4-85366e0f55f9 │
│ Dynatrace-site-verification=7d881a7c-c13f-4146-9d27-2731459e2509__iqls0105tagglcsaul0m16ibrf │
│ adobe-idp-site-verification=6bd5e74c-a3a0-4781-b2e1-e95399b5e11c │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 19 rows │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

You can also just use the regular functions in queries:

FROM read_csv('https://gist.githubusercontent.com/jgamblin/62fadd8aa321f7f6a482912a6a317ea3/raw/33c6752125188cfdacdeee3f4fd6e01909e50eef/urls.txt', header:=false, columns := { 'domain': 'varchar'})
SELECT
  trim(domain) AS domain,
  dns_lookup(trim(domain)) AS A
LIMIT 10;
┌───────────────┬────────────────┐
│ domain │ A │
│ varchar │ varchar │
├───────────────┼────────────────┤
│ google.com │ 142.251.41.174 │
│ youtube.com │ 142.250.64.78 │
│ facebook.com │ 157.240.245.35 │
│ baidu.com │ 111.63.65.103 │
│ yahoo.com │ 98.137.11.163 │
│ amazon.com │ 98.87.170.74 │
│ wikipedia.org │ 208.80.154.224 │
│ google.co.in │ 142.251.40.131 │
│ twitter.com │ 151.101.66.146 │
│ qq.com │ 123.150.76.218 │
├───────────────┴────────────────┤
│ 10 rows 2 columns │
└────────────────────────────────┘

If you decide to try that query at home and remove the limit (it’s a list of 1,000 domains made by Jerry Gamblin, a salt-of-the-earth security data scientist), you may want to bump up how many queries it’ll make at one time via set_dns_concurrency_limit.
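
Something like this should do it (a sketch: I’m assuming set_dns_concurrency_limit is invoked as a plain scalar function, and 64 is just an illustrative value, not a recommendation):

-- assumption: set_dns_concurrency_limit() is called as a scalar function;
-- 64 is an arbitrary illustrative value
SELECT set_dns_concurrency_limit(64);

FROM read_csv('https://gist.githubusercontent.com/jgamblin/62fadd8aa321f7f6a482912a6a317ea3/raw/33c6752125188cfdacdeee3f4fd6e01909e50eef/urls.txt', header:=false, columns := { 'domain': 'varchar'})
SELECT
  trim(domain) AS domain,
  dns_lookup(trim(domain)) AS A;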


🪵

Photo by Barbara Barbosa on Pexels.com

DuckDB supports logging, just like a proper query engine should.

That page has tons of examples, so we’ll focus on the “what” here, with some light “how”s tossed in for good measure.

To enable logging, you do CALL enable_logging(); and query the results from duckdb_logs. You can turn that off with CALL disable_logging(); and clear the buffer with CALL truncate_duckdb_logs();.
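
Stitched together, the basic loop looks something like this:

CALL enable_logging();

-- do a little work so there's something in the log buffer
SELECT 42 AS answer;

-- inspect, then clean up
FROM duckdb_logs LIMIT 10;
CALL truncate_duckdb_logs();
CALL disable_logging();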

Five levels are available: ERROR, WARN, INFO (default), DEBUG, and TRACE, and you can focus the logging on a specific level with CALL enable_logging(level = 'debug');.

The real utility here is filtering by log type. Three types ship with core DuckDB: QueryLog (tracks executed queries), FileSystem (filesystem interactions), and HTTP (all HTTP traffic from DuckDB’s internal client). Just enable specific individual types by doing CALL enable_logging('HTTP'); or multiple ones with CALL enable_logging(['HTTP', 'QueryLog']);. The HTTP and FileSystem types support structured logging, meaning you can parse fields directly: SELECT request.headers FROM duckdb_logs_parsed('HTTP');.
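
Here’s a rough sketch of that in practice; request.headers is straight from the docs, but request.url and response.status are my assumptions about the same structs (do a SELECT * on duckdb_logs_parsed('HTTP') to see the actual shape):

CALL enable_logging('HTTP');

-- read anything remote so DuckDB's internal HTTP client has something to log
FROM read_csv('https://gist.githubusercontent.com/jgamblin/62fadd8aa321f7f6a482912a6a317ea3/raw/33c6752125188cfdacdeee3f4fd6e01909e50eef/urls.txt', header:=false, columns := { 'domain': 'varchar'}) LIMIT 1;

-- request.url / response.status are assumed field names; request.headers is documented
SELECT request.url, response.status, request.headers
FROM duckdb_logs_parsed('HTTP');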

Logs go to an in-memory buffer by default, and you can redirect to stdout (CSV format) with CALL enable_logging(storage = 'stdout'); or to files with CALL enable_logging(storage_path = '/path/to/logs');. File storage supports normalized (separate context and entry files) or denormalized (single CSV) output. A path ending in .csv gets you denormalized; a directory path gets normalized output.
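
Pulling that together (the paths here are placeholders):

-- single denormalized CSV (path ends in .csv)
CALL enable_logging(storage_path = 'duckdb_log.csv');

-- normalized output (separate context and entry files) in a directory
CALL enable_logging(storage_path = 'duckdb_log_dir');

-- straight to stdout as CSV
CALL enable_logging(storage = 'stdout');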

The default buffer is 2,048 entries for memory and file storage (it’s disabled for stdout). If you’re debugging crashes and need immediate writes, set storage_buffer_size = 0. If stdout logging is slow, bump the buffer up.
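
For the crash-debugging case, that would be something along the lines of the following (the path is a placeholder, and I’m assuming the two parameters can be combined in one call):

-- flush each entry immediately instead of buffering 2,048 of them
CALL enable_logging(storage_path = 'crash_debug.csv', storage_buffer_size = 0);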

This ability is super useful for debugging HTTP issues with remote data sources, tracking query patterns, or understanding filesystem access during development.


FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:

  • Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
  • Bluesky via <https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy>

☮️
