Bonus Drop #44 (2024-04-07): Combined Make-Up Bonus + WPE Drop

Universal ctags; Geany; Quickwit ctags

Rather than separate out the content into a make-up Friday Drop and the weekend Bonus Drop, we’ll just make this one a bit longer to make up for the snowmageddon-caused publication interruptions this past week.

Speaking of snowmageddon, we’re not slated to get mains power back until Tuesday, but the generator is holding out really well. This is the thing causing our power woes right now:

TL;DR

(This is an AI-generated summary of today’s Drop.)

Universal ctags

Photo by Miguel Á. Padriñán on Pexels.com

Universal Ctags (GH) is a tool that generates an index (or “tag”) file of language objects found in source files for various programming languages. This index facilitates quick and easy navigation within text editors and other tools, making it a handy resource for anyone who hacks on code. Think of it as a roadmap, enabling text editors and other tools to quickly locate the indexed items.

This “tagging” usually happens behind the scenes when you use IDEs or clever, code-centric text editors. But, there’s nothing stopping us from generating and using these tags ourselves. An example helps show the utility, so here’s a short, hacky bit of Golang code we’ll run ctags on:

package main

import (
	"fmt"
	"math"
)

type Circle struct {
	radius float64
}

func (c Circle) Area() float64 {
	return math.Pi * c.radius * c.radius
}

func add(a, b int) int {
	return a + b
}

var global = 42

func main() {
	c := Circle{radius: 5.0}
	area := c.Area()
	fmt.Println("Area of the circle:", area)
	sum := add(3, 4)
	fmt.Println("Sum:", sum)
	fmt.Println("Global variable:", global)
}
$ ctags --output-format=json example.go | jq
{
  "_type": "tag",
  "name": "Area",
  "path": "example.go",
  "pattern": "/^func (c Circle) Area() float64 {$/",
  "typeref": "typename:float64",
  "kind": "func",
  "scope": "main.Circle",
  "scopeKind": "struct"
}
{
  "_type": "tag",
  "name": "Circle",
  "path": "example.go",
  "pattern": "/^type Circle struct {$/",
  "kind": "struct",
  "scope": "main",
  "scopeKind": "package"
}
{
  "_type": "tag",
  "name": "add",
  "path": "example.go",
  "pattern": "/^func add(a, b int) int {$/",
  "typeref": "typename:int",
  "kind": "func",
  "scope": "main",
  "scopeKind": "package"
}
{
  "_type": "tag",
  "name": "global",
  "path": "example.go",
  "pattern": "/^var global = 42$/",
  "kind": "var",
  "scope": "main",
  "scopeKind": "package"
}
{
  "_type": "tag",
  "name": "main",
  "path": "example.go",
  "pattern": "/^func main() {$/",
  "kind": "func",
  "scope": "main",
  "scopeKind": "package"
}
{
  "_type": "tag",
  "name": "main",
  "path": "example.go",
  "pattern": "/^package main$/",
  "kind": "package"
}
{
  "_type": "tag",
  "name": "radius",
  "path": "example.go",
  "pattern": "/^\tradius float64$/",
  "typeref": "typename:float64",
  "kind": "member",
  "scope": "main.Circle",
  "scopeKind": "struct"
}

That’s great information to ask questions of. In a slightly larger codebase that provides some CLI utilities over VulnCheck CVE data, I was able to ask it what and where all my Golang structs were defined:

$ ctags \
  --output-format=json \
  --languages=go -R * | \
  jq -c 'select(.kind == "struct") | { name, path }'
{"name":"Configuration","path":"pkg/types.go"}
{"name":"CpeMatch","path":"pkg/types.go"}
{"name":"CvssData","path":"pkg/types.go"}
{"name":"CvssMetricV31","path":"pkg/types.go"}
{"name":"Description","path":"pkg/types.go"}
{"name":"Meta","path":"pkg/types.go"}
{"name":"Metrics","path":"pkg/types.go"}
{"name":"Node","path":"pkg/types.go"}
{"name":"Nvdv2","path":"pkg/types.go"}
{"name":"Nvdv2Datum","path":"pkg/types.go"}
{"name":"Parameter","path":"pkg/types.go"}
{"name":"Reference","path":"pkg/types.go"}
{"name":"Vccve","path":"pkg/types.go"}
{"name":"Vckev","path":"pkg/types.go"}
{"name":"VckevDatum","path":"pkg/types.go"}
{"name":"VulnData","path":"cmd/vccve/main.go"}
{"name":"VulncheckReportedExploitation","path":"pkg/types.go"}
{"name":"VulncheckXdb","path":"pkg/types.go"}
{"name":"Weakness","path":"pkg/types.go"}

For an even larger example, check out the tags for my {hrbrthemes} R package in this “gist”.
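If you would rather do the filtering in Go than in jq, here is a minimal sketch of the same "show me the structs" question. It assumes the records look like the JSON shown earlier; the `Tag` type and the `parseTags` and `filterKind` helpers are names made up for this illustration, not part of ctags. The sample data is three records trimmed from the example.go output so the program runs without ctags installed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// Tag holds a subset of the fields that appear in the ctags JSON records.
// (Hypothetical type for this sketch; unknown fields are simply ignored.)
type Tag struct {
	Type string `json:"_type"`
	Name string `json:"name"`
	Path string `json:"path"`
	Kind string `json:"kind"`
}

// parseTags decodes a stream of JSON objects, whether they arrive
// one per line or pretty-printed.
func parseTags(data []byte) ([]Tag, error) {
	var tags []Tag
	dec := json.NewDecoder(bytes.NewReader(data))
	for {
		var t Tag
		err := dec.Decode(&t)
		if err == io.EOF {
			return tags, nil
		}
		if err != nil {
			return tags, err
		}
		tags = append(tags, t)
	}
}

// filterKind keeps only "tag" records of the requested kind.
func filterKind(tags []Tag, kind string) []Tag {
	var out []Tag
	for _, t := range tags {
		if t.Type == "tag" && t.Kind == kind {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Three records trimmed from the example.go output.
	sample := []byte(`{"_type":"tag","name":"Area","path":"example.go","kind":"func"}
{"_type":"tag","name":"Circle","path":"example.go","kind":"struct"}
{"_type":"tag","name":"add","path":"example.go","kind":"func"}`)

	tags, err := parseTags(sample)
	if err != nil {
		panic(err)
	}
	for _, t := range filterKind(tags, "struct") {
		fmt.Printf("%s\t%s\n", t.Name, t.Path)
	}
}
```

For real use, you could swap the embedded sample for the bytes read from the ctags output and change the kind to whatever you are hunting for.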

A great feature of ctags is optlib, which lets us define new language parsers from the command line (perfect if you have your own DSL or mini-language associated with a given project).

Now that you know how some of the sausage is made behind many of the code-tools you may use, let’s look at a somewhat nascent IDE that is very ctags-centric.

Geany

I came across Geany (GH) in my quest to abandon Visual Studio Code (I run VSCodium, now). It has a decent fraction of all the crunchy goodness of many other modern IDEs, including:

  • syntax highlighting
  • code completion
  • auto completion of often used constructs like if, for and while
  • auto completion of XML and HTML tags
  • call tips
  • folding
  • many supported filetypes like C, Java, PHP, HTML, Python, Perl, Pascal
  • symbol lists
  • embedded terminal emulation
  • extensibility through plugins, including
    • a command palette
    • filesystem browser
    • extended auto-completion
    • and, more!

It’s not something I’d likely run on any regular basis, but it is a showcase of what you can do with the aforementioned ctags, and also that an IDE with some bonkers decent functionality does not have to be bloated.

The section header image shows the project I used as an example in the ctags section, and you can pretty easily see how tags-centric it is. Rather than a filesystem view (which you can add via a plugin), you get a symbols view by default, which does indeed help with finding struct needles in a Go project haystack.

If you’re looking to help make this IDE more functional, or took a look at it and are playing with it in a more extended fashion than I am, give the HACKING file a go for how to get started making plugins or contributing directly to the editor’s development.

Quickwit ctags

Photo by Ruiyang Zhang on Pexels.com

Ripgrep (rg) and/or ctags+jq are great tools for poking for things in your codebase, but if you have many projects, or work with a larger team, they may be inadequate. While there are tools such as OpenGrok and Sourcegraph to help you retrieve code-level information from your projects, you can also roll your own without too much effort by using a fav of the Drop: Quickwit. Let’s get Quickwit set up.

First, we’ll pull the image just to have it handy:

$ docker pull quickwit/quickwit 

I’ll make ctags for all my Go “projects” (this Bash line assumes I’m in the projects directory):

$ ctags --output-format=json -R --languages=go * | jq -c > ~/Data/tags.json

Now we’ll need an index schema for our extracted tags records (I’m calling it tags.yaml and sticking it in ~/Data as well, but you do you):

version: 0.7
index_id: tags
doc_mapping:
  field_mappings:
    - name: _type
      type: text
      tokenizer: raw
    - name: name
      type: text
      tokenizer: default
    - name: path
      type: text
      tokenizer: default
    - name: pattern
      type: text
      tokenizer: default
    - name: typeref
      type: text
      tokenizer: raw
    - name: kind
      type: text
      tokenizer: raw
    - name: scope
      type: text
      tokenizer: raw
    - name: scopeKind
      type: text
      tokenizer: raw
  mode: dynamic
  dynamic_mapping:
    indexed: true
    stored: true
  store_source: true
search_settings:
  default_search_fields: [name, path]

Add more default search fields if you feel so moved.

My tags.json is close to 100 MB, and Quickwit only accepts record batches of 10 MB or less, so we need to split the data before loading it up:

$ mkdir load-tags
$ split -b 9M tags.json load-tags/chunk-
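One caveat: `split -b` cuts at byte boundaries, so the record that straddles each boundary ends up in two pieces. If losing a handful of records bothers you, here is a minimal Go sketch that only breaks after a newline. The `splitRecords` helper and the three-digit chunk suffix are made up for this illustration, and it reads the whole file into memory, which is fine at this size.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// splitRecords cuts newline-delimited data into chunks of at most limit
// bytes, breaking only after a newline so no record is split in two.
// A single record longer than limit is emitted as its own oversized chunk.
func splitRecords(data []byte, limit int) [][]byte {
	var chunks [][]byte
	start := 0 // beginning of the current chunk
	pos := 0   // beginning of the next unread line
	for pos < len(data) {
		end := len(data)
		if i := bytes.IndexByte(data[pos:], '\n'); i >= 0 {
			end = pos + i + 1 // include the newline
		}
		// Adding this line would exceed the limit: close the chunk first.
		if end-start > limit && pos > start {
			chunks = append(chunks, data[start:pos])
			start = pos
		}
		pos = end
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	const limit = 9 * 1024 * 1024 // stay under the 10 MB batch limit

	if len(os.Args) == 3 { // e.g. go run . tags.json load-tags/chunk-
		data, err := os.ReadFile(os.Args[1])
		if err != nil {
			panic(err)
		}
		for i, c := range splitRecords(data, limit) {
			name := fmt.Sprintf("%s%03d", os.Args[2], i)
			if err := os.WriteFile(name, c, 0o644); err != nil {
				panic(err)
			}
		}
		return
	}

	// No arguments: run on a tiny in-memory sample instead.
	sample := []byte("{\"name\":\"a\"}\n{\"name\":\"b\"}\n{\"name\":\"c\"}\n")
	for i, c := range splitRecords(sample, 30) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

The output directory still needs to exist first, same as with `split`.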

Now, we start up Quickwit (pick your own directories/ports):

$ docker run \
  --detach \
  --volume "$(pwd)/tags-qwdata:/quickwit/qwdata" \
  --publish 127.0.0.1:7281:7280 \
  --name quickwit-tags \
  quickwit/quickwit run

Then, load up the tags index schema (this assumes we’re in the ~/Data directory):

$ curl --silent \
  --request POST \
  --url "http://127.0.0.1:7281/api/v1/indexes/" \
  --header "Content-Type: application/yaml" \
  --data-binary @"./tags.yaml" | jq -c
{"version":"0.8","index_uid":"tags:01HTWN5H2WQEGDYDKDSKNCWDYT","index_config":{"version":"0.8","index_id":"tags","index_uri":"file:///quickwit/qwdata/indexes/tags","doc_mapping":{…

And, finally, load up the tags records:

$ fd . load-tags/ | while read -r chunk; do 
	curl --silent \
	  --request POST \
	  --url "http://127.0.0.1:7281/api/v1/tags/ingest?commit=force" \
	  --header "Content-Type: application/json" \
	  --data-binary @"./${chunk}" | jq -c
done
{"num_docs_for_processing":40375}
{"num_docs_for_processing":38062}
{"num_docs_for_processing":39936}
{"num_docs_for_processing":39965}
{"num_docs_for_processing":38253}
{"num_docs_for_processing":38499}
{"num_docs_for_processing":40951}
{"num_docs_for_processing":41495}
{"num_docs_for_processing":39257}
{"num_docs_for_processing":10302}

From here, we can use the search UI:

or API:

$ curl \
  --silent \
  --url 'http://localhost:7281/api/v1/tags/search?query=name:Cve' | jq
{
  "num_hits": 1,
  "hits": [
    {
      "_source": {
        "_type": "tag",
        "kind": "member",
        "name": "Cve",
        "path": "vccve/pkg/types.go",
        "pattern": "/^\tCve                           []string                        `json:\"cve,omitempty\"`$/",
        "scope": "pkg.VckevDatum",
        "scopeKind": "struct",
        "typeref": "typename:[]string"
      },
      "_type": "tag",
      "kind": "member",
      "name": "Cve",
      "path": "vccve/pkg/types.go",
      "pattern": "/^\tCve                           []string                        `json:\"cve,omitempty\"`$/",
      "scope": "pkg.VckevDatum",
      "scopeKind": "struct",
      "typeref": "typename:[]string"
    }
  ],
  "elapsed_time_micros": 12567,
  "errors": []
}

to find Go needles in the haystack of projects.
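If you want to call the search API from your own tooling rather than curl, something like this rough Go sketch could be a starting point. It assumes the response shape shown above and only decodes a few of its fields; `hit`, `searchResponse`, `searchURL`, and `parseSearch` are names invented for this example. With no arguments it parses a trimmed copy of the response above, so it runs without a Quickwit instance; pass a base URL to query a live one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// hit holds the flattened tag fields returned for each match.
type hit struct {
	Name  string `json:"name"`
	Path  string `json:"path"`
	Kind  string `json:"kind"`
	Scope string `json:"scope"`
}

// searchResponse covers only the response fields this sketch uses.
type searchResponse struct {
	NumHits int   `json:"num_hits"`
	Hits    []hit `json:"hits"`
}

// searchURL builds the search endpoint URL, escaping the query string.
func searchURL(base, index, query string) string {
	v := url.Values{}
	v.Set("query", query)
	return base + "/api/v1/" + index + "/search?" + v.Encode()
}

// parseSearch decodes a search response body.
func parseSearch(body []byte) (searchResponse, error) {
	var r searchResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Trimmed copy of the response above, used when no base URL is given.
	body := []byte(`{"num_hits":1,"hits":[{"kind":"member","name":"Cve","path":"vccve/pkg/types.go","scope":"pkg.VckevDatum"}]}`)

	if len(os.Args) > 1 { // e.g. go run . http://127.0.0.1:7281
		resp, err := http.Get(searchURL(os.Args[1], "tags", "name:Cve"))
		if err != nil {
			panic(err)
		}
		defer resp.Body.Close()
		if body, err = io.ReadAll(resp.Body); err != nil {
			panic(err)
		}
	}

	res, err := parseSearch(body)
	if err != nil {
		panic(err)
	}
	for _, h := range res.Hits {
		fmt.Printf("%s (%s) in %s, scope %s\n", h.Name, h.Kind, h.Path, h.Scope)
	}
}
```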

You can dig into the Universal Ctags manual to add or remove fields from the records if you need to search on more of them or want additional context in the search results.

Periodic re-indexing of your projects is also a good idea, and the easiest method is to drop the index and re-load the updated JSON.
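For the "drop the index" half, here is a small Go sketch. It assumes Quickwit's REST API removes an index with a DELETE to `/api/v1/indexes/<index id>`, so check that against the docs for your Quickwit version; `dropIndexRequest` is a name made up for this example. By default it only prints the request it would send, and it contacts the server only when run with the `run` argument.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// dropIndexRequest builds the DELETE call that removes an index by ID.
// The endpoint path is an assumption about the Quickwit REST API.
func dropIndexRequest(base, index string) (*http.Request, error) {
	return http.NewRequest(http.MethodDelete, base+"/api/v1/indexes/"+index, nil)
}

func main() {
	req, err := dropIndexRequest("http://127.0.0.1:7281", "tags")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)

	// Only touch the server when asked to, e.g. `go run . run`.
	if len(os.Args) > 1 && os.Args[1] == "run" {
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			panic(err)
		}
		defer resp.Body.Close()
		fmt.Println("status:", resp.Status)
	}
}
```

After the index is gone, re-POST tags.yaml and re-run the ingest loop from earlier with the freshly generated chunks.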

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️