Bonus Drop #86 (2025-06-15): I Think You May Be Projecting

Rocky Weather; Webbed Fit

Tech headlines would have you believe the only things worth investing your time in are “AI” and “big” projects that meet some universal “need”.

To counter that, today we’re featuring two recent projects caught in the wild, both of which use DuckDB in a very accessible way to do something focused and pretty spiffy. And, since it’s Father’s Day, I’ll lay off the Python bashing this one time!

Fear not: I’m not bringing back “Weekend Project Editions”, but I can attest to the fact that having something else to focus on can be pretty helpful during these times of ours.

Also, we had #NoKings work to do on Friday, hence no Friday Drop.


Rocky Weather

Photo by RDNE Stock project on Pexels.com

If you’re a rock-climber or even “just” an outdoor enthusiast, you must know the frustration of planning the perfect weekend trip, driving hours to your favorite spot, only to find it rain-soaked, windy, and generally unpleasant. What if you could match your passion — in this case climbing routes — with precise weather forecasts before leaving home?

That’s exactly what this data engineering project (GH) tackles: a pipeline that ingests 127,000+ UK climbing routes and pairs them with 7-day hourly weather forecasts. Built with familiar tools like Pandas, DuckDB, and Airflow, it’s a practical example of how thoughtful data infrastructure can solve a real problem.

The architecture follows classic ETL principles: extract climbing data from nested JSON files scraped from UKC (UKClimbing), transform and clean the route information, then enrich it with location-based weather data from Open-Meteo’s API. The result gets validated through Great Expectations and stored in a DuckDB schema, all of which is orchestrated by a daily Airflow DAG running in Docker containers.
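If you want a feel for the shape of that flow without spelunking the repo, here’s a minimal, hand-wavy sketch of the extract/enrich steps. To be clear: the table names, file paths, and column names below are mine, not the project’s; I’m just assuming DuckDB’s read_json_auto over the scraped JSON plus Open-Meteo’s hourly forecast endpoint.

```python
# Minimal sketch (not the repo's actual code): load scraped route JSON into
# DuckDB, then enrich each distinct location with a 7-day hourly forecast
# from Open-Meteo. Paths/table/column names are made up for illustration.
import duckdb
import pandas as pd
import requests

con = duckdb.connect("climb_weather.duckdb")

# Extract + transform: DuckDB can flatten nested JSON scrapes directly.
con.execute("""
    CREATE OR REPLACE TABLE routes AS
    SELECT name, grade, latitude, longitude
    FROM read_json_auto('data/ukc_routes/*.json')
    WHERE latitude IS NOT NULL AND longitude IS NOT NULL
""")

def hourly_forecast(lat: float, lon: float) -> pd.DataFrame:
    """Fetch a 7-day hourly forecast for one location from Open-Meteo."""
    resp = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": lat,
            "longitude": lon,
            "hourly": "temperature_2m,precipitation,wind_speed_10m",
            "forecast_days": 7,
        },
        timeout=30,
    )
    resp.raise_for_status()
    df = pd.DataFrame(resp.json()["hourly"])
    df["latitude"], df["longitude"] = lat, lon
    return df

# Enrich: one forecast per distinct crag location, appended to a weather
# table (the real DAG presumably batches/limits these API calls).
locations = con.execute(
    "SELECT DISTINCT latitude, longitude FROM routes"
).fetchall()
forecasts = pd.concat(hourly_forecast(lat, lon) for lat, lon in locations)
con.execute("CREATE OR REPLACE TABLE weather AS SELECT * FROM forecasts")
```

The real pipeline wraps this in Airflow tasks and Great Expectations checks, but the core join of “routes with coordinates” to “hourly forecast per location” really is about this simple.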

What made this project particularly interesting to me isn’t just the technical implementation, but the honest documentation of its challenges (something I’m trying to get better at in the Drops, though I usually tend to just not cover resources that ended up being gnarly). The developer openly discusses Docker configuration headaches, Great Expectations validation bottlenecks, and the scalability limits of the current single-machine architecture. It’s pretty refreshing to see a project that acknowledges its rough edges while also demonstrating that the pain is worth it for the end result.

The choice of DuckDB over PostgreSQL was another thoughtful tradeoff between analytical efficiency and transactional needs, and it ended up being perfect for this read-heavy, analytics-focused use case. Though, as the creator notes, scaling beyond the UK’s 127,000+ routes may require rethinking the architecture entirely.

The whole endeavor is a practical demonstration of how engineering decisions get made in the real world, where perfect solutions take a backseat to working systems that solve actual problems. It’s also a fine example of how the best projects are the ones that scratch your own itch. In this case, said “itch” is literally helping you avoid getting rained on while hanging off a cliff.


Webbed Fit

Photo by NaturEye Conservation on Pexels.com

The post I’m referring to in this section is super short, so I don’t want to steal any thunder from it with a detailed breakdown.

This second post chronicles a personal project to wrangle years of Garmin activity data into a clean, queryable database. I fondly remember doing something similar with R ages ago (before there were a billion .fit converters out there). The author started by requesting a full export from Garmin Connect, then wrote a Python script to standardize and rename .fit files using their timestamps. This process revealed that many files in the export were duplicates or lacked valid activity data, reducing the dataset to a more realistic size. (“Go to the cloud!”, they said…; FWIW, I’ve had the same experience with Connect and .fit files.)
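For flavor, here’s roughly what that cleanup step could look like. This is my own sketch (assuming the fitparse package for reading the embedded time_created field), not the author’s script, and the directory names are made up.

```python
# Rough sketch of the rename/dedup step: skip byte-identical duplicates,
# read each file's embedded creation timestamp, and rename to a sortable
# name. Files without a usable timestamp get dropped.
import hashlib
from pathlib import Path

from fitparse import FitFile  # pip install fitparse

export_dir = Path("garmin_export")   # hypothetical paths
clean_dir = Path("fit_clean")
clean_dir.mkdir(exist_ok=True)

seen_hashes = set()
for fit_path in sorted(export_dir.glob("*.fit")):
    raw = fit_path.read_bytes()
    digest = hashlib.sha256(raw).hexdigest()
    if digest in seen_hashes:          # byte-identical duplicate
        continue
    seen_hashes.add(digest)

    try:
        file_id = next(FitFile(str(fit_path)).get_messages("file_id"))
        created = file_id.get_value("time_created")
    except Exception:
        continue                       # no valid activity data; drop it
    if created is None:
        continue

    new_name = created.strftime("%Y-%m-%dT%H%M%S") + ".fit"
    (clean_dir / new_name).write_bytes(raw)
```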

With the files cleaned, the author loaded them into a DuckDB database using custom scripts. For analysis and exploration, they used the Harlequin SQL IDE, finding it approachable even with limited SQL experience. The project is ongoing, with plans to automate future data imports, generate GPX files for spatial analysis, and eventually sync the enriched dataset to a website. The post emphasizes hands-on learning, tool experimentation, and the potential for future GIS integration.
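The load step is similarly small. Again, this is just my guess at the shape of it (fitparse plus the duckdb Python package), not the author’s code; Harlequin then simply gets pointed at the resulting database file.

```python
# Sketch of the load step: flatten each activity's "record" messages into
# rows and write them to a DuckDB table you can then poke at with Harlequin.
from pathlib import Path

import duckdb
import pandas as pd
from fitparse import FitFile

con = duckdb.connect("garmin.duckdb")

frames = []
for fit_path in Path("fit_clean").glob("*.fit"):
    rows = [
        {f.name: f.value for f in msg.fields}
        for msg in FitFile(str(fit_path)).get_messages("record")
    ]
    if rows:
        df = pd.DataFrame(rows)
        df["source_file"] = fit_path.name
        frames.append(df)

activities = pd.concat(frames, ignore_index=True)
con.execute("CREATE OR REPLACE TABLE records AS SELECT * FROM activities")
# Then: `harlequin garmin.duckdb` to explore the table interactively.
```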

It’s a stellar example of learning what you need to get a focused project out the door.


FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:

  • 🐘 Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
  • 🦋 Bluesky via https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy

☮️