Adding CIDR Handling To DuckDB’s INET Extension
I’m getting “OK” with building personal-use extensions for DuckDB, but today is my first time trying to PR some new functionality into an existing extension.

The inet extension defines the INET data type for storing IPv4 and IPv6 Internet addresses. It supports the CIDR (Classless Inter-Domain Routing) notation for subnet masks (e.g., 198.51.100.0/22, 2001:db8:3c4d::/48).
CIDR notation comes up quite a bit in my $DAYJOB, and I had need to add this functionality to the DuckDB INET extension (well, “desire” more than “need”), so let’s walk through how I made it, since DuckDB extensions can — at first glance — be pretty daunting. While most of the extensions are in C++ (like this one), there are numerous efforts to bring extension building to Rust, Zig, and other languages.
When working with existing or a new template extension, it is quite important to ensure you’re collecting the source from the submodules used in the extension:
$ git clone --recurse-submodules git@github.com:hrbrmstr/duckdb_inet.git
We care mostly about the src directory in there. This is the structure with the files we’re modifying highlighted:
├── include
│ ├── html_charref.hpp
│ ├── inet_extension.hpp
│ ├── inet_functions.hpp 👈🏼
│ └── ipaddress.hpp 👈🏼
├── inet_escape_functions.cpp
├── inet_extension.cpp 👈🏼
├── inet_functions.cpp 👈🏼
└── ipaddress.cpp 👈🏼
First up: CIDR notation combines an IP address with a network prefix length (mask) to specify a range of IP addresses. For example:
192.168.1.0/24represents all IP addresses from192.168.1.0to192.168.1.2552001:db8::/32represents a range of IPv6 addresses
We’ll first need a way (in ipaddress.[ch]pp) to test if an IP address value includes a CIDR (i.e., is a “network”):
bool IPAddress::IsCIDR() {
return (type == IPAddressType::IP_ADDRESS_V4 && mask < IPV4_DEFAULT_MASK) ||
(type == IPAddressType::IP_ADDRESS_V6 && mask < IPV6_DEFAULT_MASK);
}
This method determines if an IP address represents a CIDR range by checking:
- For IPv4: if the mask is less than 32 (
IPV4_DEFAULT_MASK) - For IPv6: if the mask is less than 128 (
IPV6_DEFAULT_MASK)
If true, the address represents a network range rather than a single host.
We’re now going to want a function to take in an IP address and expand it into a DuckDB list/array (which may be one element if we have a bare IPv4 or IPv6). Let’s register that function (in inet_extension.cpp):
ExtensionUtil::RegisterFunction(
*db.instance,
ScalarFunction("expand_cidr", {inet_type}, LogicalType::LIST(inet_type),
INetFunctions::ExpandCIDR));
This registers a scalar function that:
- Takes an
INETtype as input - Returns a
LISTofINETvalues - Uses
INetFunctions::ExpandCIDRas the implementation
The main implementation in INetFunctions::ExpandCIDR (in inet_functions.[ch]pp) does the heavy lifting.
This part.
auto & ipaddress_vector = args.data[0];
UnifiedVectorFormat ipaddress_data;
ipaddress_vector.ToUnifiedFormat(args.size(), ipaddress_data);
gets the input vector and converts it to a unified format for processing.
We then need to extract the components of each INET value (type, address, mask).
auto & entries = StructVector::GetEntries(ipaddress_vector);
auto ip_type_data = FlatVector::GetData<uint8_t>(*entries[0]);
auto address_data = FlatVector::GetData<hugeint_t>(*entries[1]);
auto mask_data = FlatVector::GetData<uint16_t>(*entries[2]);
DuckDB gives us batches of rows to process (hence the for (idx_t i = 0; i < args.size(); i++) { … } loop).
Next up, we have to figure out the number of hosts based on the address type and mask (2^(bits-mask)):
if (inet.IsCIDR()) {
IPAddress network = inet.Network();
IPAddress broadcast = inet.Broadcast();
hugeint_t hosts;
if (addr_type == IPAddressType::IP_ADDRESS_V4) {
hosts = hugeint_t(1) << (32 - mask);
} else {
hosts = hugeint_t(1) << (128 - mask);
}
Then, it’s a boring for loop to do the expansion:
for (hugeint_t j = 0; j < hosts && current.address <= broadcast.address; j = j + one) {
ip_list.emplace_back(Value::STRUCT(inet_type, {
Value::UTINYINT(uint8_t(current.type)),
Value::HUGEINT(ToCompatAddr(current.address, current.type)),
Value::USMALLINT(current.type == IPAddressType::IP_ADDRESS_V4 ? 32 : 128)
}));
current.address = current.address + one;
}
NOTE: We will have to work out what to do when a crazy request is made for expanding a gazillion IPv6 addresses.
If it wasn’t a CIDR we just emplace_back the original value.
The resultant LIST is added to return response:
result.SetValue(i, Value::LIST(inet_type, std::move(ip_list)));
With the above context handy, the full source for the function should be a bit easier to grok.
See It In Action
Here’s a sample of how it works:
🦆>SELECT UNNEST(expand_cidr('192.168.1.0/32'::INET)) ip;
┌─────────────┐
│ ip │
│ inet │
├─────────────┤
│ 192.168.1.0 │
└─────────────┘
🦆>SELECT UNNEST(expand_cidr('192.168.1.0/30'::INET)) ip;
┌─────────────┐
│ ip │
│ inet │
├─────────────┤
│ 192.168.1.0 │
│ 192.168.1.1 │
│ 192.168.1.2 │
│ 192.168.1.3 │
└─────────────┘
🦆>FROM (SELECT UNNEST(expand_cidr('192.168.1.0/16'::INET)) ip) SELECT COUNT(ip);
┌───────────┐
│ count(ip) │
│ int64 │
├───────────┤
│ 65536 │
└───────────┘
🦆>SELECT expand_cidr('2001:db8::/126'::INET);
┌─────────────────────────────────────────────────────┐
│ expand_cidr(CAST('2001:db8::/126' AS INET)) │
│ inet[] │
├─────────────────────────────────────────────────────┤
│ [2001:db8::, 2001:db8::1, 2001:db8::2, 2001:db8::3] │
└─────────────────────────────────────────────────────┘
🦆>SELECT expand_cidr('2603:3005:1507:5900:9c2b:2430:c08e:addf'::INET);
┌──────────────────────────────────────────────────────────────────────┐
│ expand_cidr(CAST('2603:3005:1507:5900:9c2b:2430:c08e:addf' AS INET)) │
│ inet[] │
├──────────────────────────────────────────────────────────────────────┤
│ [2603:3005:1507:5900:9c2b:2430:c08e:addf] │
└──────────────────────────────────────────────────────────────────────┘
I’ve got a PR up for this, so it will hopefully make it into the official extension soon-ish.
FIN
We all will need to get much, much better at sensitive comms, and Signal is one of the only ways to do that in modern times. You should absolutely use that if you are doing any kind of community organizing (etc.). Ping me on Mastodon or Bluesky with a “🦇?” request (public or faux-private) and I’ll provide a one-time use link to connect us on Signal.
Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:
- 🐘 Mastodon via
@dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev - 🦋 Bluesky via
https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy
Also, refer to:
to see how to access a regularly updated database of all the Drops with extracted links, and full-text search capability. ☮️
Leave a comment