Drop #543 (2024-10-20): Dev-Mode Engage

JavaScript Import Maps For Fun And Profit; A Super-Tiny PDF-to-Text Microservice; 🥞 SQL

No “Bonus” moniker despite this being a weekend edition: weekend work, plus a continued resurgence of long-covid insomnia and, sadly, brain fog, made focusing on work, home, and the Drop more difficult than expected.

Part of the work-work involved coding up a prototype app for some new capabilities we’ll be launching “soon”.


TL;DR

(This is an AI-generated summary of today’s Drop using Ollama + llama 3.2 and a custom prompt.)

  • JavaScript Import Maps For Fun And Profit: A relatively recent addition to the EcmaScript specification, import maps allow for shorter, more meaningful aliases for module imports, making code cleaner and easier to read, while also providing a centralized location to manage all module dependencies and their versions. (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/script/type/importmap)
  • Super-Tiny PDF-to-Text Microservice: A Deno-based microservice that converts PDFs to text using the Hono library for routing and a tiny JS module, allowing for improved caching dynamics and easier migration to Node.js or other environments. (https://deno.land/)
  • 🥞 SQL: The author recommends following the “pancake SQL pattern” blog post on Quesma for more information on combining multiple SQL queries into a single, more efficient query using window functions. (https://quesma.com/blog-detail/pancake-sql-pattern)

JavaScript Import Maps For Fun And Profit

Photo by Andrew Neel on Pexels.com

A relatively recent (2023, in terms of broad browser support) addition to the web platform — specified in the WHATWG HTML standard rather than in ECMAScript proper — is something called an import map. I’m a big fan of prototyping with CDN-hosted or locally-cached JavaScript modules without a build system, and import maps have all sorts of benefits in that mode.

They’re just JSON objects that control how browsers resolve module specifiers in JavaScript module imports. They map the text used as module specifiers in import statements or import() calls to the actual values used for resolution.

The JSON must follow the Import map JSON representation format, and these import maps affect both static and dynamic imports, so they must be declared and processed before any <script> elements that use the mapped specifiers.

Import maps do not affect <script> src attributes or modules loaded into workers/worklets. Also, you cannot use the src, async, nomodule, defer, crossorigin, integrity, or referrerpolicy attributes on an import map <script> element.

Given those somewhat draconian limitations, why bother with them?

First, import maps allow you to use shorter, more meaningful aliases for module imports instead of long URLs. This makes code cleaner and easier to read. For example, instead of:

import * as Plot from 'https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6/+esm'
import { LitElement, html, css, unsafeCSS } from 'https://cdn.jsdelivr.net/gh/lit/dist@2/core/lit-core.min.js';

you can use something like:

import * as Plot from 'plot';
import { LitElement, html, css, unsafeCSS } from 'lit';

To do so requires setting up this <script> tag

<script type="importmap">
{
  "imports": {
    "lit": "https://cdn.jsdelivr.net/gh/lit/dist@2/core/lit-core.min.js",
    "plot": "https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6/+esm"
  }
}
</script>

in the context of the HTML source where you’re loading all the modules from.
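
If it helps to see everything in one place, here’s a minimal, self-contained sketch of such a page (the chart bit at the end is just a placeholder I’m using to prove the aliases resolve; it isn’t from the app):

<!DOCTYPE html>
<html>
  <head>
    <!-- The import map must come before any module script that uses the aliases. -->
    <script type="importmap">
    {
      "imports": {
        "lit": "https://cdn.jsdelivr.net/gh/lit/dist@2/core/lit-core.min.js",
        "plot": "https://cdn.jsdelivr.net/npm/@observablehq/plot@0.6/+esm"
      }
    }
    </script>
    <script type="module">
      // These bare specifiers resolve through the import map above.
      import * as Plot from 'plot';
      import { LitElement } from 'lit'; // resolves via the map, too

      // Render a trivial chart just to show the 'plot' alias worked.
      document.body.append(Plot.lineY([1, 3, 2, 5, 4]).plot());
    </script>
  </head>
  <body></body>
</html>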

That alone really isn’t a good enough reason to use them, though, since the map itself adds characters to type. But using these specifiers means you get a centralized location to manage all your module dependencies and their versions. This is similar to how package.json works in Node.js environments. You can easily update or change versions of dependencies in one place without modifying import statements throughout your codebase.

And, by using import maps instead of bundling, you can take advantage of better caching dynamics: when you update a single module, only that module’s cache entry is invalidated, rather than the cache for an entire bundled file. This can lead to improved performance over time, especially for larger applications.

While there are more benefits, one final one to note here is that using import maps can make it easier to migrate to a full-on Node.js, Deno, Bun, etc. application, where you’ll be using the shorter specifiers (in most cases) anyway.


Super-Tiny PDF-to-Text Microservice

One component of this new app needed to take a PDF and convert it to text (similar to what Jina AI does for HTML to Markdown). It’s built and served with Deno 2 and relies on the epic Hono library for routing and a tiny/handy JS module for PDF conversion.

It’s small enough to include the entire setup and code here.

First, create a new Hono app with Deno 2:

$ deno run -A npm:create-hono@latest

I use Zed as much as possible, and this is a handy Zed settings configuration for Deno projects:

.zed/settings.json

{
  "languages": {
    "TypeScript": {
      "language_servers": [
        "deno",
        "!typescript-language-server",
        "!vtsls",
        "!eslint"
      ],
      "formatter": "language_server"
    },
    "TSX": {
      "language_servers": [
        "deno",
        "!typescript-language-server",
        "!vtsls",
        "!eslint"
      ],
      "formatter": "language_server"
    },
    "JavaScript": {
      "language_servers": [
        "deno",
        "!typescript-language-server",
        "!vtsls",
        "!eslint"
      ],
      "formatter": "language_server"
    },
    "JSX": {
      "language_servers": [
        "deno",
        "!typescript-language-server",
        "!vtsls",
        "!eslint"
      ],
      "formatter": "language_server"
    }
  }
}

Now, we just need to add our PDF helper library:

$ deno add jsr:@pdf/pdftext

and your deno.json should look like this (you’ll need to add --allow-env to the start task, since I like having flexibility in which port HTTP apps start on):

deno.json:

{
  "imports": {
    "hono": "jsr:@hono/hono@^4.6.5",
    "pdftext": "jsr:@pdf/pdftext@^1.2.4"
  },
  "tasks": {
    "start": "deno run --allow-env --allow-net main.ts"
  },
  "compilerOptions": {
    "jsx": "precompile",
    "jsxImportSource": "hono/jsx"
  }
}

This is the only other file you need, and it’s the core of the app:

main.ts:

import type { Context } from "hono";
import { Hono } from "hono";
import { pdfText } from "pdftext";

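// Merge the per-page text map returned by pdfText() into a single string.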
async function processPdf(pdfBuffer: ArrayBuffer) {
  const pages = await pdfText(pdfBuffer);
  return Object.entries(pages)
    .map(([_, pageText]) => pageText)
    .join("\n");
}

const app = new Hono();

app.get("/", (c: Context) => {
  return c.text("Requires POST");
});

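// POST /: accept a raw PDF body and return the extracted plain text.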
app.post("/", async (c: Context) => {
  const contentType = c.req.header("Content-Type");

  if (contentType !== "application/pdf") {
    return c.json({ error: "Content-Type must be application/pdf" }, 415);
  }

  try {
    const pdfBuffer = await c.req.arrayBuffer();

    if (pdfBuffer.byteLength === 0) {
      return c.json({ error: "No PDF data provided" }, 400);
    }

    const text = await processPdf(pdfBuffer);

    return c.text(text);
  } catch (error) {
    console.error("Error processing PDF:", error);
    return c.json({ error: "Failed to process PDF" }, 500);
  }
});

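// Default to port 9107 unless a PORT env var is set (hence --allow-env).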
const port = parseInt(Deno.env.get("PORT") || "9107");

Deno.serve({ port }, app.fetch);

Fire it up:

$ deno task start
Task start deno run --allow-env --allow-net main.ts
Listening on http://0.0.0.0:9107

and give it a go:

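# Fetch a sample PDF, then POST it to the service and print the extracted text.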
$ curl \
  --silent \
  --output test.pdf \
  --url https://rud.is/dl/test.pdf && 
  curl \
  --silent \
  --request POST \
  --header "Content-Type: application/pdf" \
  --url http://localhost:9107/ \
  --data-binary @test.pdf
Ut duis laboris aliquip aliquip dolor amet culpa. Sit consectetur reprehenderit velit ipsum. Aliqua qui commodo excepteur
incididunt mollit anim aliquip id proident nisi ex minim aliqua nulla. Aliquip culpa fugiat Lorem commodo ut proident id.
Reprehenderit qui aute fugiat tempor adipisicing labore ipsum laborum. Magna ullamco velit mollit minim commodo elit
minim incididunt fugiat. Enim nostrud deserunt velit irure reprehenderit nulla et officia reprehenderit ullamco voluptate aliqua
culpa. Occaecat velit consectetur deserunt ad et voluptate voluptate.
Commodo consequat cupidatat non mollit mollit adipisicing aliquip cillum officia ad nulla pariatur ullamco. Quis eu proident
deserunt do excepteur. Irure excepteur consequat ullamco non elit do aliquip aute duis ex ex dolore elit. Eu deserunt dolor
mollit voluptate proident dolor nulla. Tempor est eu laborum ipsum ad culpa ea Lorem do adipisicing nulla ullamco aliquip
duis pariatur. Deserunt eu reprehenderit id labore consequat pariatur qui cupidatat incididunt duis commodo ex aute.
E
…

Hopefully, I’ll get to show off what this is part of in the not-too-distant future.


🥞 SQL

Photo by Monserrat Soldú on Pexels.com

The first two sections are long enough that I think I can just shunt y’all over to a neat Quesma blog post on the “pancake SQL pattern”. It’s a method they developed that combines multiple SQL queries into a single, more efficient query using window functions. The approach was inspired by the need to reduce database load and improve dashboard performance for a design partner.
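
Before you click through, here’s my own illustrative sketch of the general shape (hypothetical table and columns, and not necessarily Quesma’s exact formulation): instead of issuing separate GROUP BY queries per metric, you compute several aggregates in one scan with window functions:

-- Hypothetical "requests" table; a sketch of the general
-- many-queries-into-one idea, not Quesma's exact pattern.
-- Before: two round trips:
--   SELECT host, COUNT(*) FROM requests GROUP BY host;
--   SELECT COUNT(*) FROM requests;
-- After: both answers from a single scan:
SELECT DISTINCT
  host,
  COUNT(*) OVER (PARTITION BY host) AS requests_per_host,
  COUNT(*) OVER ()                  AS total_requests
FROM requests;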


FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️
