Optimized DuckDB Over Raw Tidyverse Ops; Safe(r) GitHub Actions; Surveillance Self-Defense
Today’s Drop explores three distinct topics centered around safety. We take a deep look at DuckDB’s sophisticated query optimization capabilities, examine a new security tool for GitHub Actions workflows, and review the Electronic Frontier Foundation’s comprehensive guide to surveillance self-defense. Each section provides practical insights and actionable guidance for its respective domain.
TL;DR
(This is an AI-generated summary of today’s Drop using Ollama + Llama 3.2 and a custom prompt.)
- DuckDB’s query optimizer provides substantial performance improvements over hand-optimized queries through multiple optimization phases, including Filter Pushdown, Join Order Optimizer, TopN Optimizer, Expression Rewriter, Statistics Propagation, and Join Filter Pushdown optimizations (https://duckdb.org/2024/11/14/optimizers.html)
- Zizmor is a static analysis tool that scans GitHub Actions CI/CD workflow configurations for potential security vulnerabilities (https://woodruffw.github.io/zizmor/)
- The Electronic Frontier Foundation’s Surveillance Self-Defense guide provides comprehensive resources for protecting online privacy, including device protection, secure messaging, network privacy, and physical security guides (https://ssd.eff.org/)
In Defense Of Optimized DuckDB Over Raw Tidyverse Ops

It’s more than fair to say R’s {tidyverse} has revolutionized the art and science of data processing and statistical analyses. {dplyr} (et al.) chains make code more readable and easier to visually debug, and they help our little grey cells design more efficient and sensible workflows. Yet, when used as “just” a native R tool, it does exactly what we tell it to do, and nothing more. That can mean slower operations as our data scales. Thankfully, we can pair it seamlessly with DuckDB and take advantage of the query optimizations that the equally brilliant minds behind DuckDB have baked in.
DuckDB’s query optimizer is a sophisticated component that provides substantial performance improvements over hand-optimized queries. The optimizer transforms naïve query plans into efficient execution strategies through multiple optimization phases.
The Filter Pushdown optimizer reduces intermediate data processing by pushing filters like Borough = 'Manhattan' (ref. first link in this section) directly down to the table scans. It can also duplicate filters across equality join conditions (a filter on one side of a join key can be copied to the other side), significantly reducing data volume early in query execution.
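To make the idea concrete, here is a toy Python sketch (not DuckDB’s implementation) that compares filtering after a join with filtering at scan time, plus a version where a filter on the join key is copied to both sides. The rides/zones tables, their columns, and the hash_join helper are hypothetical stand-ins for the article’s examples.

```python
# Toy illustration of filter pushdown (not DuckDB's implementation).
# The tables and column names below are hypothetical.

rides = [
    {"ride_id": 1, "zone_id": 10, "fare": 12.5},
    {"ride_id": 2, "zone_id": 20, "fare": 30.0},
    {"ride_id": 3, "zone_id": 10, "fare": 8.0},
    {"ride_id": 4, "zone_id": 30, "fare": 15.0},
]
zones = [
    {"zone_id": 10, "borough": "Manhattan"},
    {"zone_id": 20, "borough": "Brooklyn"},
    {"zone_id": 30, "borough": "Queens"},
]


def hash_join(left, right, key):
    """Inner join two lists of dicts on `key`, hashing the right side."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def naive_plan():
    # Join everything first, then filter: every ride flows through the join.
    joined = hash_join(rides, zones, "zone_id")
    return [row for row in joined if row["borough"] == "Manhattan"]


def pushed_down_plan():
    # Filter the zones "scan" first; the join only sees Manhattan zones.
    manhattan = [z for z in zones if z["borough"] == "Manhattan"]
    return hash_join(rides, manhattan, "zone_id")


def duplicated_filter_plan(zone_id):
    # A filter rides.zone_id = zone_id plus the join condition
    # rides.zone_id = zones.zone_id implies zones.zone_id = zone_id,
    # so the same filter can be applied to both scans.
    r = [x for x in rides if x["zone_id"] == zone_id]
    z = [x for x in zones if x["zone_id"] == zone_id]
    return hash_join(r, z, "zone_id")
```

All three plans return the same rows; the pushed-down versions simply let fewer rows reach the join.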
The Join Order Optimizer handles complex transformations by recognizing filter conditions that can become join conditions, eliminating expensive cross products that could otherwise generate trillions of intermediate rows.
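A rough way to see why this matters: a query such as SELECT ... FROM a, b WHERE a.id = b.id is, read literally, a cross product followed by a filter. The toy Python sketch below (hypothetical tables and helper names, not DuckDB code) counts how much work each strategy does when the equality is treated as a filter versus as a hash join condition.

```python
def cross_then_filter(a, b):
    """Literal reading of FROM a, b WHERE a.id = b.id: check every pair."""
    examined = 0
    out = []
    for x in a:
        for y in b:
            examined += 1
            if x["id"] == y["id"]:
                out.append((x, y))
    return out, examined


def equality_as_hash_join(a, b):
    """Treat a.id = b.id as a join condition and use a hash join instead."""
    index = {}
    for y in b:  # build phase: one pass over b
        index.setdefault(y["id"], []).append(y)
    # probe phase: one pass over a, one hash lookup per row
    out = [(x, y) for x in a for y in index.get(x["id"], [])]
    return out, len(a) + len(b)  # rows touched across both phases


# Hypothetical inputs: 1,000 rows per side.
a = [{"id": i} for i in range(1000)]
b = [{"id": i} for i in range(0, 2000, 2)]
```

With 1,000 rows per side, the literal reading checks 1,000,000 pairs while the hash join touches each input row once. At a million rows per side, the literal reading would be a trillion pairs, which is exactly the blowup the optimizer avoids.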
The TopN Optimizer specifically handles ORDER BY + LIMIT combinations by maintaining only the required top N values in memory, transforming O(M * log M) operations into O(M + N * log N) operations.
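Here is one common way to implement a TopN operator, sketched in Python: a bounded min-heap that never holds more than N values. This is an illustrative assumption rather than DuckDB’s actual operator. A heap-based scan like this does O(M log N) work in the worst case, but it shares the key property described above: only N values are kept in memory instead of sorting all M rows.

```python
import heapq


def top_n_full_sort(values, n):
    # Naive plan for ORDER BY v DESC LIMIT n: sort all M values
    # (O(M log M)), then keep the first N.
    return sorted(values, reverse=True)[:n]


def top_n_bounded_heap(values, n):
    # TopN-style plan: a min-heap holds at most N values, so memory is O(N)
    # regardless of M; the heap's root is the smallest value still "in".
    if n <= 0:
        return []
    heap = []
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)  # evict the current smallest
    # Only the N survivors need a final sort.
    return sorted(heap, reverse=True)
```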
Using DuckDB through R packages like {duckdb} or {duckplyr} along with Parquet files and the {tidyverse} creates a highly effective analytical environment. The tidyverse provides an intuitive grammar for data manipulation, while DuckDB’s optimizer handles the heavy computational lifting. Since DuckDB can read Parquet files directly and the optimizer can push predicates into Parquet reads, you get automatic partition elimination and column pruning without writing any special code.
The Expression Rewriter performs sophisticated transformations like constant folding, arithmetic simplification, and conjunction optimization. For example, it can transform x + 1 = 6 into x = 5 and (x AND b) OR (x AND c) OR (x AND d) into x AND (b OR c OR d).
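The two rewrites mentioned above can be sketched as small rule functions over an expression tree. The tuple-based representation and the function names below are hypothetical; DuckDB’s rewriter is a much larger rule-based system, and this toy only handles the exact shapes shown in the examples.

```python
# Expressions are nested tuples ("op", arg1, arg2, ...); column names are
# strings and constants are ints. This representation is hypothetical.


def rewrite_add_equals(expr):
    """Rewrite (col + c1) = c2 into col = (c2 - c1) when both are constants."""
    if (
        isinstance(expr, tuple) and expr[0] == "="
        and isinstance(expr[1], tuple) and expr[1][0] == "+"
        and isinstance(expr[1][2], int) and isinstance(expr[2], int)
    ):
        col, c1 = expr[1][1], expr[1][2]
        return ("=", col, expr[2] - c1)  # constant folded at plan time
    return expr


def factor_common_conjuncts(expr):
    """Rewrite (x AND b) OR (x AND c) ... into x AND (b OR c ...)."""
    if not (isinstance(expr, tuple) and expr[0] == "or"):
        return expr
    disjuncts = expr[1:]
    if not all(isinstance(d, tuple) and d[0] == "and" for d in disjuncts):
        return expr
    # Terms that appear in every disjunct can be pulled out front.
    common = [t for t in disjuncts[0][1:] if all(t in d[1:] for d in disjuncts[1:])]
    if not common:
        return expr
    remainders = []
    for d in disjuncts:
        rest = [t for t in d[1:] if t not in common]
        if not rest:
            # This disjunct is exactly the common part, so the OR holds
            # whenever the common terms do: (x) OR (x AND c) == x.
            return ("and", *common) if len(common) > 1 else common[0]
        remainders.append(rest[0] if len(rest) == 1 else ("and", *rest))
    return ("and", *common, ("or", *remainders))
```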
The Statistics Propagation capability is particularly powerful as it creates new filters by analyzing column statistics across join conditions. When joining tables on column a, if one table’s values range from 25 to 50, the optimizer automatically applies these bounds as filters on the other table.
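A minimal sketch of the idea, assuming each table carries precomputed min/max statistics for the join column a (the Table class and helper functions are hypothetical, not DuckDB APIs). The planner intersects the two ranges and turns the result into a filter without reading any rows.

```python
from dataclasses import dataclass


@dataclass
class Table:
    # Hypothetical stand-in for a table plus the min/max statistics an engine
    # keeps for join column `a` (e.g., in its catalog or Parquet metadata).
    name: str
    a: list
    a_min: int
    a_max: int


def make_table(name, values):
    values = list(values)
    # Statistics are computed once here to simulate stored metadata.
    return Table(name, values, min(values), max(values))


def derive_join_filter(left, right):
    """Plan-time step: intersect both sides' [min, max] ranges on column a.

    Rows outside the intersection can never find a join partner, so the
    range becomes a new filter that can be applied to both scans.
    """
    return max(left.a_min, right.a_min), min(left.a_max, right.a_max)


def scan_with_filter(table, bounds):
    lo, hi = bounds
    return [v for v in table.a if lo <= v <= hi]
```

With one side spanning 25 to 50 and the other spanning 0 to 999, the derived filter keeps only 26 of the 1,000 values on the larger side.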
The Join Filter Pushdown optimization tracks minimum and maximum join key values during hash table building, creating implicit range filters that can eliminate up to 40% of probe-side table scans before they occur.
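The runtime variant can be sketched as a hash join that records the smallest and largest build-side keys while filling its hash table, then uses that range to reject probe rows cheaply. In DuckDB the range filter is pushed into the probe-side scan itself, so data can be skipped before it is read; this toy Python version, with hypothetical names, just applies the check per row and counts how many rows it rejected.

```python
def hash_join_with_minmax(build_rows, probe_rows, key):
    """Inner hash join that also tracks min/max of the build-side keys.

    The min/max gathered while building the hash table is applied as a
    cheap range check on probe rows, standing in for the implicit range
    filter that an engine would push into the probe-side scan.
    """
    table = {}
    lo = hi = None
    for row in build_rows:
        k = row[key]
        table.setdefault(k, []).append(row)
        lo = k if lo is None else min(lo, k)
        hi = k if hi is None else max(hi, k)
    if lo is None:  # empty build side: no probe row can match
        return [], len(probe_rows)
    results, skipped = [], 0
    for row in probe_rows:
        k = row[key]
        if not (lo <= k <= hi):
            skipped += 1  # rejected without a hash lookup
            continue
        for match in table.get(k, []):
            results.append({**row, **match})
    return results, skipped
```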
These optimizations make DuckDB particularly well-suited for analytical workloads, especially when compared to traditional data frame libraries like pandas or data.table, which largely leave optimization to the user. The first linked article goes into much more (accessible) detail, with plenty of examples. It’s well worth a read.
In Defense Of Safe(r) GitHub Actions

Zizmor (GH) is a specialized security analysis tool designed to identify potential security vulnerabilities in GitHub Actions CI/CD configurations. Currently in beta development, it focuses on examining workflow setups to uncover potential security issues that could impact your GitHub Actions pipeline.
The tool operates as a security scanner specifically targeting GitHub Actions workflows, helping teams spot weaknesses (such as template injection or overly broad permissions) in their continuous integration and deployment processes. While the project is actively being developed and may contain bugs, it provides a structured approach to CI/CD security analysis.
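For a feel of what such scanners look for, consider template injection, one well-known class of GitHub Actions issue that zizmor audits for: interpolating attacker-controlled values such as ${{ github.event.issue.title }} directly into a run: script. The Python sketch below is a deliberately crude, line-based heuristic with hypothetical names; it is not how zizmor works and is no substitute for running the real tool.

```python
import re

# Matches expressions like ${{ github.event.issue.title }} (a small,
# hypothetical subset of the contexts a real tool would consider risky).
EXPR = re.compile(r"\$\{\{\s*(github\.event\.[A-Za-z0-9_.]+)\s*\}\}")


def find_run_injections(workflow_text):
    """Return (line_number, expression) pairs for github.event.* expressions
    that appear inside `run:` steps.

    Crude heuristic: a `run:` block is the `run:` line plus any following
    lines indented deeper than the `run` key. Real tools parse the YAML.
    """
    findings = []
    run_indent = None  # indentation of the active `run` key, if any
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if run_indent is not None and stripped and indent <= run_indent:
            run_indent = None  # dedent ends the run block
        if stripped.startswith(("run:", "- run:")):
            # Indent of the `run` key itself (past any "- " list marker).
            run_indent = indent + (2 if stripped.startswith("- ") else 0)
        if run_indent is not None:
            findings.extend((lineno, m.group(1)) for m in EXPR.finditer(line))
    return findings
```

The commonly recommended fix is to pass such values through an env: entry and reference the resulting environment variable in the script, rather than interpolating the expression into the script text.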
The main site hosts comprehensive documentation, including installation instructions and detailed usage patterns. The tool also provides specific “recipes” (predefined analysis patterns) to help folks get started with security scanning.
The beta status of Zizmor means we should expect to encounter some issues during usage, and the developers actively encourage bug reporting to improve the tool’s functionality.
It’s never too late (well, that’s kind of not exactly true) to start taking the security of your GHAs a bit more seriously, especially in these dangerous, uncertain times.
In Defense Of Surveillance Self-Defense

I’ve promised continued coverage of personal safety, and today we have another installment.
The Electronic Frontier Foundation (EFF), which has been defending online privacy for over thirty years, maintains a comprehensive Surveillance Self-Defense guide.
The guide is structured into three primary sections that build upon each other. The Basics section establishes fundamental surveillance concepts and encryption principles. The Tool Guides provide practical, hands-on instructions for implementing security measures. The Further Learning section offers deep dives into advanced topics.
The Device Protection guide covers core device security, including how to enable encryption on Windows, Mac, Linux, iPhone, and Android, along with secure data deletion procedures and malware protection strategies.
Secure messaging receives significant attention with detailed guides for Signal and WhatsApp usage. The Communication Security guide emphasizes the importance of end-to-end encryption and metadata protection in communications.
The Network Privacy section offers comprehensive coverage of using Tor across all major operating systems, helping users understand and circumvent network surveillance and censorship. VPN selection guidance helps users make informed choices about network privacy tools.
And, the Physical Security guide addresses real-world scenarios like protest attendance and border crossings, providing concrete steps for protecting digital assets during high-risk situations.
Advanced users can explore detailed technical content on public key encryption systems, key verification protocols, and device fingerprinting. The guide also covers specialized topics like Bluetooth tracker detection and secure password creation using physical dice.
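The dice method boils down to simple arithmetic: five rolls of a six-sided die select one of 6^5 = 7,776 words, the size of EFF’s long word list, so each word adds log2(7776) ≈ 12.9 bits of entropy. The Python sketch below (hypothetical helper names) shows the roll-to-index mapping and the entropy math. roll_five is only a software stand-in for testing; the point of the guide’s method is to use physical dice.

```python
import math
import secrets


def diceware_index(rolls):
    """Map five d6 rolls (each 1-6) to a word-list index in [0, 7775]."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("need exactly five rolls between 1 and 6")
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)  # treat the rolls as base-6 digits
    return index


def roll_five():
    # Software stand-in for physical dice; `secrets` draws from the OS CSPRNG.
    return [secrets.randbelow(6) + 1 for _ in range(5)]


def passphrase_entropy_bits(words, list_size=6**5):
    """Entropy of `words` words chosen independently and uniformly."""
    return words * math.log2(list_size)
```

For example, a six-word passphrase comes out to about 77.5 bits.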
The guide also offers actionable how-tos on:
- two-factor authentication deployment
- password manager selection and usage
- phishing attack prevention
- social network privacy optimization
This resource should be shared far and wide.
FIN
As noted in the third section, we will all need to get much, much better at sensitive comms, and Signal is one of the few practical ways to do that today. You should absolutely use it if you are doing any kind of community organizing (etc.). Ping me on Mastodon or Bluesky with a “🦇?” request (public or faux-private) and I’ll provide a one-time use link to connect us on Signal.
Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev.
Also, refer to this post to see how to access a database of all the Drops with extracted links, and full text search capability. ☮️