Drop #514 (2024-08-09): DuckDB Vector Search

Kicking The VSS Tyres With CISA’s KEV

Since we showcased some fancy vector search ops with SQLite in Drop #510, it’s only fair to do the same thing with DuckDB’s nascent VSS superpowers, too.

The main difference is that we will need to lean a bit more heavily on R for some of the work (it should be straightforward to translate the ops to some lesser language).

You’re going to want to have Ollama around and, since I prefer using llamafiles for generating embeddings, grab a copy of mixedbread and keep it handy, too.

We’re going to create a vector search index for CISA KEV vulnerabilities, use it to identify vulnerabilities in KEV that are similar to each other, then see if the embeddings give us natural groupings that Ollama can title for us.
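To make "similar to each other" a bit more concrete before heading over to the real code: the core idea is just nearest-neighbour search over embedding vectors. Below is a minimal, brute-force sketch in plain Python. It is not DuckDB's VSS index (which exists to avoid scanning everything), and the IDs and 3-dimensional vectors are made up purely for illustration; real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=2):
    """Return the k entries closest to query_id, best match first."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id  # skip the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: placeholder IDs and invented vectors.
toy_embeddings = {
    "vuln-a": [0.9, 0.1, 0.0],
    "vuln-b": [0.8, 0.2, 0.1],
    "vuln-c": [0.0, 0.1, 0.9],
    "vuln-d": [0.1, 0.0, 1.0],
}

neighbours = most_similar("vuln-a", toy_embeddings, k=2)
for other_id, score in neighbours:
    print(other_id, round(score, 3))
```

A vector index trades this full scan for an approximate lookup, which is what makes the same question practical once there are many rows.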

You can grab the KEV JSON from CISA’s site (that’s all you’re going to need third-party data-wise).
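If you want a feel for what goes into the embedding step, here is a rough Python sketch of turning KEV entries into one string per CVE. The field names (`vulnerabilities`, `cveID`, `vulnerabilityName`, `shortDescription`) are an assumption about the feed's layout, so check them against the file you download. The record itself is a placeholder, not real KEV data, and `embedding_text` is a hypothetical helper.

```python
import json

# Placeholder document shaped like the KEV feed (assumed layout, invented record).
sample_kev = json.loads("""
{
  "count": 1,
  "vulnerabilities": [
    {
      "cveID": "CVE-0000-0000",
      "vendorProject": "ExampleVendor",
      "product": "ExampleProduct",
      "vulnerabilityName": "ExampleVendor ExampleProduct Placeholder Vulnerability",
      "shortDescription": "Placeholder description used only for illustration."
    }
  ]
}
""")


def embedding_text(entry):
    """Join the descriptive fields into a single string to embed."""
    parts = [entry.get("vulnerabilityName", ""), entry.get("shortDescription", "")]
    return " ".join(p for p in parts if p)


# One text blob per CVE, keyed by its ID.
docs = {v["cveID"]: embedding_text(v) for v in sample_kev["vulnerabilities"]}
print(docs)
```

Which fields to concatenate is a judgment call; the name plus the short description is just one plausible choice for this sketch.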

Since this is a code block-heavy post, we’ll have to shunt y’all over to the Daily Drop Companion site, where it’ll be a little easier on the eyes.

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️