Checking CSVs; Lightweight CSV SQL Ops
The fairly dreadful symptoms of my thumb paronychia are subsiding (🙏🏽), but typing is still just enough of a chore that y’all are getting two resources this time ’round. However, one also introduces a new companion project to the Daily Drop!
Resources covered:
RBQL
Love it or hate it, SQL is the lingua franca of data ops. We may have niceties such as the bonkers useful {tidyverse} in R and Substrait has the potential to help make friendlier data ops grammars in all environments, nothing is going to replace SQL in my lifetime.
Having lightweight tools that let us use SQL ops on data in as many contexts as possible can be helpful for data scientists, data analysts, visualization crafters, and many other “user personas” (that phrase is a bit ‘tell’ that I work at a startup).
RBQL (GH), or “Rainbow Query Language”, is a CLI and multi-language library that provides a SQL-like language for data-transformation and data-analysis queries for structured data, such as CSV files, log files, Python lists, and JavaScript arrays. It supports all main SQL keywords, including SELECT, UPDATE, WHERE, ORDER BY, JOIN, DISTINCT, GROUP BY, TOP, and LIMIT. And, even cooler, it also allows the use of non-SQL expressions inside SQL statements (“User-Defined Functions”/UDFs in tech-speak), making it even handier in language-specific contexts.
The Python version/docs are pretty solid, and the JS counterparts are almost as good, so I’ll be filling in a usage gap for them in just a mo’.
There are editor plugins (like this VS Code one) for it, a handy cheat sheet, and it can work as a CLI tool. But, the context I find it most helpful in is in JavaScript when I don’t need DuckDB, SQLite, or even frameworks like arquero. But, also where it would be spiffy if I could machinate data with SQL vs. vanilla JS array ops. That’s where RBQL comes in.
I’ve put together a fully documented/working post on the Drop’s new proper-companion site that will help you incorporate and use RBQL in JS projects where you don’t need the power of the previously mentioned frameworks, but would rather avoid dealing with arcane JS array ops.
Give it a go and let me know if you end up using it in any projects.
csvlint

This is an unusually short section, but csvlint does one thing and does it pretty well: tells you if your CSV files are borked before you jump on a data analysis bender.
It’s old enough that I need to provide alternate installation instructions:
$ go install github.com/Clever/csvlint/cmd/csvlint@latestonce you do that, you just csvlint FILE.
I broke a sample mtcars CSV file for you and ran it:
$ csvlint mtcars-bad.csv
Record #10 has error: wrong number of fields
Record #17 has error: wrong number of fieldsthe exit code is non-0, so you can perform scripted tests as well.
There are other CSV linters we’ll cover in future Drops, but this one is helpful to at least ensure your file is at least rectangular.
FIN
So much for the dreaded Lee 🌀! It pretty much missed Southern Maine. I hope the Nova Scotians also come away without much damage. ☮
Leave a comment