Bonus Drop #15 (2023-05-21): Quality Time

Quality; Quantity; Responsiveness

Your friendly neighbourhood hrbrmstr is safe and sound back at the Maine compound after a rather horrible, rain-filled drive up the U.S. east coast. As such, I’m taking another “quick hit” liberty with this Bonus Drop, but only in the sense that we’ll cover a single topic in some depth vs three mid-depth sections.

Today’s focus is all about quality, and we’ll hit which “quality” we’re talking about after the break.

Responsiveness Under Working Conditions


A couple of years ago, folks at Apple heeded the cries of users throughout the lands begging for more “actionable information” about the state of their network connections. They did some noodling and came up with a proposal for a new metric, “RPM” or “Round-trips Per Minute”, a signal which would express the level of responsiveness under working conditions:

This document specifies the “RPM Test” for measuring responsiveness. It uses common protocols and mechanisms to measure user experience especially when the network is under working conditions. The measurement is expressed as “Round-trips Per Minute” (RPM) and should be included with throughput (up and down) and idle latency as critical indicators of network quality.

It’s on an IETF standards track with the somewhat opaque title of “Responsiveness Under Working Conditions”. Said draft describes a technique for measuring how quickly a network responds to requests while it is busy carrying other traffic, rather than while it is sitting idle. Here’s the abstract:

For many years, a lack of responsiveness, variously called lag, latency, or bufferbloat [— the undesirable latency that comes from a router or other network equipment buffering too much data —] has been recognized as an unfortunate, but common, symptom in today’s networks. Even after a decade of work on standardizing technical solutions, it remains a common problem for the end users.

Everyone “knows” that it is “normal” for a video conference to have problems when somebody else at home is watching a 4K movie or uploading photos from their phone. However, there is no technical reason for this to be the case. In fact, various queue management solutions have solved the problem.

Our networks remain unresponsive, not from a lack of technical solutions, but rather a lack of awareness of the problem and deployment of its solutions. We believe that creating a tool that measures the problem and matches people’s everyday experience will create the necessary awareness, and result in a demand for solutions.

Now, you are most likely familiar with the “ping” command, which is used to test the quality and speed of a network connection by measuring the round-trip time (RTT) of the packets. You are also likely familiar with the “traceroute” command, which traces the path a packet takes from your computer to a host and shows the number of steps (hops) required to get there, along with the time taken at each step. And, I’d be very surprised if you don’t run some “speedtest” client at least once a month, since it’s useful for determining the Internet speed your ISP provides to you.

None of those common diagnostic tools truly measures responsiveness. While the IETF describes the technical components of this measurement, I like to think of it as a vibe metric, since it provides a quantification of what we are “feeling” during any network request.

The draft is one of the most readable IETF documents I’ve ever read, so I’ll respect your ability to give it a quick read and just hit the high points.

Essentially, the test is “just” performing either:

  • An HTTP GET request on a separate connection (“foreign probes”). This test mimics the time it takes for a web browser to connect to a new web server and request the first element of a web page (e.g., “index.html”), or the startup time for a video streaming client to launch and begin fetching media.

  • An HTTP GET request multiplexed on the load-generating connections (“self probes”). This test mimics the time it takes for a video streaming client to skip ahead to a different chapter in the same video stream, or for a navigation client to react and fetch new map tiles when the user scrolls the map to view a different area. In a well-functioning system fetching new data over an existing connection should take less time than creating a brand new TLS connection from scratch to do the same thing.

then doing a bunch of stats work to come up with the final quantitative details.
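To make the "stats work" a bit more concrete, here is a minimal sketch of the core idea: turn probe round-trip times into a round-trips-per-minute figure. The probe timings below are made up, and lumping foreign and self probes into one plain mean is my simplification; the draft's actual aggregation is more involved, so treat this as an illustration only.

```python
from statistics import mean

def rpm_from_latencies(latencies_s):
    """Convert probe round-trip times (in seconds) into round-trips per minute."""
    if not latencies_s:
        raise ValueError("need at least one probe latency")
    return 60.0 / mean(latencies_s)

# Hypothetical timings collected while the link is under load.
foreign_probes = [0.310, 0.295, 0.340]  # fresh connection for every probe
self_probes = [0.120, 0.135, 0.110]     # multiplexed on a load-generating connection

print(round(rpm_from_latencies(foreign_probes + self_probes)))
```

The takeaway is simply that RPM is latency turned upside down: the longer each round trip takes under load, the fewer of them fit into a minute.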

macOS Quick Start

macOS users on recent versions can just run networkQuality, which uses Apple servers and Apple’s implementation. Run the command in a terminal session and wait a bit (it shows “progress”) to get results.

On an M1 Max MacBook Pro sitting outside (pretty far from the nearest access point) this is what I get:

==== SUMMARY ====
Uplink capacity: 34.823 Mbps
Downlink capacity: 172.013 Mbps
Responsiveness: Low (189 RPM)
Idle Latency: 33.500 milliseconds

SSH’ing from it to my M1 server with an ethernet connection, this is what I get:

==== SUMMARY ====
Upload capacity: 34.737 Mbps
Download capacity: 653.099 Mbps
Upload flows: 12
Download flows: 12
Responsiveness: Medium (781 RPM)
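
If RPM feels abstract, it can be converted back into "milliseconds per round trip" with a bit of arithmetic (60,000 ms in a minute). The helper names below are mine, not part of any tool; this is just a quick sketch using the numbers from the two summaries above.

```python
def rpm_to_ms(rpm):
    """Average time per round trip, in milliseconds, implied by an RPM score."""
    return 60_000 / rpm

def ms_to_rpm(ms):
    """RPM score implied by an average round-trip time in milliseconds."""
    return 60_000 / ms

for label, rpm in [("Wi-Fi laptop", 189), ("Ethernet server", 781)]:
    print(f"{label}: {rpm} RPM is about {rpm_to_ms(rpm):.0f} ms per round trip")

# The laptop's idle latency, expressed on the same scale for comparison.
print(f"Idle 33.5 ms would be about {ms_to_rpm(33.5):.0f} RPM")
```

Put that way, the laptop's 33.5 ms idle latency balloons to roughly 317 ms per round trip once the link is busy, which is exactly the gap this metric is meant to expose.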

If you are on macOS and give the built-in utility a go, please feel encouraged to drop your results in the comments.

Less Quick Start For Everyone Else

Despite Apple leading the charge, you can run your own client and server to test your network quality.

The network-quality GitHub organization has a Golang client and server that build cleanly (you need Golang 1.19.x due to how QUIC and Go work together) and work as described on the tin.

The client has some basic options:

  • config string name/IP of responsiveness configuration server. (default “networkquality.example.com“)

  • connect-to string address (hostname or IP) to connect to (overriding DNS). Disabled by default.

  • debug Enable debugging.

  • extended-stats Enable the collection and display of extended statistics — may not be available on certain platforms.

  • insecure-skip-verify Enable server certificate validation. (default true)

  • logger-filename string Store granular information about tests results in files with this basename. Time and information type will be appended (before the first.) to create separate log files. Disabled by default.

  • path string path on the server to the configuration endpoint. (default “config”)

  • port int port number on which to access responsiveness configuration server. (default 4043)

  • probe-interval-time uint Time (in ms) between probes (foreign and self). (default 100)

  • profile string Enable client runtime profiling and specify storage location. Disabled by default.

  • prometheus-stats-filename string If filename specified, prometheus stats will be written. If specified file exists, it will be overwritten.

  • rpmtimeout int Maximum time to spend calculating RPM (i.e., total test time.). (default 10)

  • ssl-key-file string Store the per-session SSL key files in this file.

  • url string configuration URL (takes precedence over other configuration parts)

  • version Show version.

as does the server:

  • announce announce this server using DNS-SD

  • cert-file string cert to use

  • config-name string domain to generate config for (default “networkquality.example.com“)

  • context-path string context-path if behind a reverse-proxy

  • create-cert generate self-signed certs

  • debug enable debug mode

  • enable-cors enable CORS headers

  • enable-h2c enable h2c (non-TLS http/2 prior knowledge) mode

  • enable-http2 enable HTTP/2 (default true)

  • enable-http3 enable HTTP/3

  • insecure-public-port int The port to listen on for HTTP measurement accesses

  • key-file string key to use

  • listen-addr string address to bind to (default “localhost”)

  • public-name string host to generate config for (same as -config-name if not specified)

  • public-port int The port to listen on for HTTPS/H2C/HTTP3 measurement accesses (default 4043)

  • socket-send-buffer-size uint The size of the socket send buffer via TCP_NOTSENT_LOWAT. Zero/unset means to leave unset

  • tos string set TOS for listening socket (default “0”)

  • version Show version

However, you don’t need to be a parameter expert to run either.

I’ve fired up a server on the Hetzner ARM64 server I mentioned the other week via:

./networkqualityd \
  --create-cert \
  --public-name thats.hrbrmstr.dev \
  --listen-addr 0.0.0.0

and left it running for anyone to tap.

You can do said tapping via:

./networkQuality \
  --url https://thats.hrbrmstr.dev:4043/.well-known/nq \
  --insecure-skip-verify

and these are the results on the two systems above (laptop first, then server):

Test did not run to stability, these results are estimates:
RPM:   149 (P90)
RPM:   240 (Double-Sided 10% Trimmed Mean)
Download: 177.231 Mbps ( 22.154 MBps), using 9 parallel connections.
Upload:    16.250 Mbps (  2.031 MBps), using 9 parallel connections.

Test did not run to stability, these results are estimates:
RPM:   196 (P90)
RPM:   326 (Double-Sided 10% Trimmed Mean)
Download: 611.908 Mbps ( 76.489 MBps), using 9 parallel connections.
Upload:    16.500 Mbps (  2.062 MBps), using 9 parallel connections.

The results are different, mostly because I hit a very under-powered server in Europe vs. the “best” server Apple gave me.
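
The Go client reports two RPM flavours: a P90 figure and a "Double-Sided 10% Trimmed Mean". Here is a rough sketch of what those two statistics do, under the assumption that they are computed over per-probe latencies and then converted to RPM. I have not checked the client's exact percentile method, and the sample latencies are invented.

```python
def trimmed_mean(values, trim_fraction=0.10):
    """Mean after dropping trim_fraction of the samples from EACH end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; pct is an integer such as 90."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling division
    return ordered[rank - 1]

# Hypothetical probe latencies in milliseconds, with one nasty outlier.
latencies_ms = [180, 190, 200, 210, 220, 230, 250, 300, 400, 900]

p90_ms = percentile_nearest_rank(latencies_ms, 90)
mean_ms = trimmed_mean(latencies_ms)
print(f"RPM (P90): {60_000 / p90_ms:.0f}")
print(f"RPM (trimmed mean): {60_000 / mean_ms:.0f}")
```

If that assumption holds, it also explains why the P90 line is always the lower of the two: it is driven by the slow tail of the probes, while the trimmed mean throws the extremes away on both sides.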

If you’re wondering what’s at https://thats.hrbrmstr.dev:4043/.well-known/nq, all you’re missing is this configuration file:

{
    "version": 1,
    "urls": {
        "small_download_url": "https://thats.hrbrmstr.dev:4043/small",
        "large_download_url": "https://thats.hrbrmstr.dev:4043/large",
        "upload_url": "https://thats.hrbrmstr.dev:4043/slurp",
        "small_https_download_url": "https://thats.hrbrmstr.dev:4043/small",
        "large_https_download_url": "https://thats.hrbrmstr.dev:4043/large",
        "https_upload_url": "https://thats.hrbrmstr.dev:4043/slurp"
    }
}
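
If you want to poke at a config like this from your own code, here is a small sketch that pulls out the three measurement URLs. The function name and the "prefer the *_https_* keys, fall back to the plain ones" ordering are my assumptions for illustration, not documented client behaviour.

```python
import json

def measurement_urls(config_text):
    """Return the small/large download and upload URLs from a config document."""
    urls = json.loads(config_text)["urls"]

    def pick(https_key, plain_key):
        # Assumed preference order: the *_https_* key first, then the plain key.
        return urls.get(https_key) or urls.get(plain_key)

    return {
        "small": pick("small_https_download_url", "small_download_url"),
        "large": pick("large_https_download_url", "large_download_url"),
        "upload": pick("https_upload_url", "upload_url"),
    }

sample = """
{
    "version": 1,
    "urls": {
        "small_download_url": "https://thats.hrbrmstr.dev:4043/small",
        "large_download_url": "https://thats.hrbrmstr.dev:4043/large",
        "upload_url": "https://thats.hrbrmstr.dev:4043/slurp",
        "small_https_download_url": "https://thats.hrbrmstr.dev:4043/small",
        "large_https_download_url": "https://thats.hrbrmstr.dev:4043/large",
        "https_upload_url": "https://thats.hrbrmstr.dev:4043/slurp"
    }
}
"""

for name, url in measurement_urls(sample).items():
    print(f"{name}: {url}")
```

The fallback means the same function copes with a config that only carries one of the two key styles.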

For comparison, this is the config Apple’s utility uses by default:

{
  "version": 1,
  "urls": {
    "small_https_download_url": "https://mensura.cdn-apple.com/api/v1/gm/small",
    "large_https_download_url": "https://mensura.cdn-apple.com/api/v1/gm/large",
    "https_upload_url": "https://mensura.cdn-apple.com/api/v1/gm/slurp"
  },
  "test_endpoint": "usqas2-edge-bx-023.aaplimg.com"
}

Bonus points for folks who run either client against either server (though you’ll need a Golang server with a proper certificate vs. the self-signed one).

FIN

Definitely use this newfound tool/ability to look extra clever at work the next time someone has Zoom connectivity issues.

Many thanks, once more, for your support! ☮
