Drop #618 (2025-03-10): Nothin’ But Bluesky

The Blue Report; Understanding AT Protocol: A Practical Approach; Bluesky: Network Topology, Polarization, And Algorithmic Curation

The Weekend Bonus Drop turned out to be a great catalyst to dig into some Bluesky resource links I’ve saved over the past couple weeks. So, y’all finally get a topic-based Drop!


Aside: if you have a GitLab account, this might be a good week to delete it:

“At a Morgan Stanley conference this month, Brian Robins, finance chief for San Francisco-based software maker GitLab, said GitLab is aligned with the goals of DOGE, because the company’s software tools aim to help people do more with less. “What the Department of Government Efficiency is trying to do is what GitLab does,” Robins said.”

There’s a script to back up all your GitLab repos, and another to delete them all, at the end of this Drop.


TL;DR

(This is an AI-generated summary of today’s Drop using Ollama + llama 3.2 and a custom prompt.)

  • The Blue Report ranks trending links on Bluesky using a scoring formula that combines posts, reposts, and likes, with filtering for English-only content and one interaction per user per link (https://theblue.report/)
  • Samuel Newman’s “atproto by example” series explains how the AT Protocol works through practical examples, highlighting lexicons (schemas), the distinction between records (raw data) and views (enriched versions), and how developers can build applications (https://bsky.app/profile/samuel.bsky.team)
  • A research paper by Quelle and Bovet analyzed Bluesky’s network structure through May 2024, finding that despite its decentralized design, it mirrors traditional social media patterns with power-law distributions and shows a significant left-center political bias (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0318034)

The Blue Report

When I decided to build CVESky, I abandoned the URL-collection project I’d started as an experiment in drinking from Bluesky’s firehose. I’m glad others, like George Black, continued and expanded that type of experiment.

The Blue Report (GH) is “a website that shows trending links on Bluesky over the previous 24 hours. Links are scored based on the number of posts, reposts, and likes that reference them”.

It ranks Top Links on Bluesky using a scoring formula: score = (10 × posts) + (10 × reposts) + (1 × likes). Links appear in hourly, daily, and weekly lists. Only English posts count, and each user can contribute at most one interaction of each type per link. Removed interactions still count, but @theblue.report’s own activity is excluded from calculations.
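
To make that concrete, here’s a back-of-the-napkin shell rendering of the formula (my own sketch, not the project’s actual code):

# hypothetical counts for a single link
posts=12 reposts=30 likes=450

# score = (10 × posts) + (10 × reposts) + (1 × likes)
score=$(( (10 * posts) + (10 * reposts) + likes ))

echo "score: $score"  # => 870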

The Top Sites section showcases domains with the most interactions over 30 days. “Interactions” combine posts, reposts, and likes referencing URLs from that domain. The same filtering rules apply: English-only content, maximum three interactions per user per link (one post, one repost, one like), and @theblue.report’s interactions are, again, excluded.

It’s a neat project, and super straightforward to riff from.


Understanding AT Protocol: A Practical Approach

As longer-time readers of the Drop will know all too well, the AT Protocol (“ATproto”, or, as is increasingly common, just all-lowercase “atproto”) is a decentralized networking protocol created by Bluesky. While many discussions focus on the philosophy and high-level architecture of atproto, the real value comes from understanding how to build practical applications with it.

In “atproto by example part 1: records and views”, Samuel Newman walks us through some of the key concepts with a well-crafted narrative accompanied by a healthy dose of practical examples. This series should help illuminate how folks like the Tangled and WhiteWind developers built completely different services on top of atproto. Hopefully this overview will whet a few curiosity appetites and enable some intrepid friends of the Drop to build even cooler things on top of atproto. And, while this is a totally biased opinion, I think Samuel’s series pairs nicely with this old Drop.

In atproto, a lexicon is like a schema that defines what data should look like. For example, a simple “status” record might contain just two fields: an emoji status and a timestamp. What makes atproto lexicons special is their flexibility. They define only the bare minimum required structure, allowing for “open unions” where additional fields can appear without breaking the record. This means:

  • Schemas can evolve naturally over time
  • Applications don’t break when they encounter unexpected fields
  • Innovation can happen without requiring central coordination

For instance, if someone adds comments or labels to a status record that originally didn’t include these fields, applications can simply ignore the extra information they don’t understand.
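
To make this tangible, here’s roughly what such a lexicon looks like. This sketch is modeled on the Statusphere example’s xyz.statusphere.status schema; treat the exact constraint fields as illustrative rather than gospel:

{
  "lexicon": 1,
  "id": "xyz.statusphere.status",
  "defs": {
    "main": {
      "type": "record",
      "key": "tid",
      "record": {
        "type": "object",
        "required": ["status", "createdAt"],
        "properties": {
          "status": { "type": "string", "minLength": 1, "maxGraphemes": 1 },
          "createdAt": { "type": "string", "format": "datetime" }
        }
      }
    }
  }
}

Only the fields listed under required are mandatory; anything else a writer tacks on can simply be ignored by readers that don’t understand it.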

There’s an important distinction between two data concepts within the protocol:

  • Records: Raw, minimal data stored on the decentralized network
  • Views: Enriched versions of records used by applications

While a status record might contain just an emoji and timestamp, a “statusView” might include additional information like the author’s profile details. Views make records more practical for application use.
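
A (made-up, but representative) side-by-side shows the difference. First, the record as stored:

{ "status": "🦋", "createdAt": "2025-03-10T12:00:00Z" }

And the statusView an application would actually consume (the author shape here is my own guess, not a spec):

{
  "status": "🦋",
  "createdAt": "2025-03-10T12:00:00Z",
  "author": {
    "did": "did:plc:…",
    "handle": "example.bsky.social",
    "displayName": "Example"
  }
}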

To connect records and views, developers write transformation functions called “hydrators” that fetch associated data and build more comprehensive view objects. This approach simplifies frontend development since UIs can work with complete data structures.
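
In spirit (and very much not Samuel’s actual TypeScript), a hydrator is just a merge step. Here’s a toy jq version that stitches a stored record together with a separately fetched profile; the two input filenames are placeholders:

# toy "hydrator": combine a status record with its author's profile into a view
jq -n \
  --slurpfile rec status-record.json \
  --slurpfile prof author-profile.json \
  '$rec[0] + { author: { did: $prof[0].did, handle: $prof[0].handle, displayName: $prof[0].displayName } }'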

Those aforementioned lexicons don’t just describe data—they also define APIs through:

  • Queries: For fetching data (similar to HTTP GET)
  • Procedures: For submitting data (similar to HTTP POST)

By defining API interfaces within lexicons, developers ensure consistency between client and server implementations. This provides built-in type safety and clear contracts that can be automatically code-generated.
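
On the wire, those lexicon-defined endpoints surface as XRPC calls over plain HTTP. The endpoint names below are real com.atproto.repo ones; the host, DID, and token variables are placeholders you’d fill in yourself:

# a query: fetch records (HTTP GET)
curl -s "https://$PDS_HOST/xrpc/com.atproto.repo.listRecords?repo=$DID&collection=xyz.statusphere.status"

# a procedure: create a record (HTTP POST)
curl -s -X POST "https://$PDS_HOST/xrpc/com.atproto.repo.createRecord" \
  -H "Authorization: Bearer $ACCESS_JWT" \
  -H "Content-Type: application/json" \
  -d "{\"repo\":\"$DID\",\"collection\":\"xyz.statusphere.status\",\"record\":{\"status\":\"🦋\",\"createdAt\":\"2025-03-10T12:00:00Z\"}}"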

The typical development workflow involves:

  1. Defining lexicons for your data and APIs
  2. Using tools like @atproto/lex-cli to generate code
  3. Writing transformation functions to create views
  4. Building frontend components that consume these views

This approach reduces boilerplate, speeds up development, and ensures consistency across teams and applications. Lexicons serve as a single source of truth, driving backend handlers, frontend clients, and data models simultaneously.
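
Step 2, in practice, looks something like this (the subcommand names are from my recollection of @atproto/lex-cli and may have drifted, so check the tool’s own help output before trusting them):

# generate a typed client and server stubs from lexicon JSON files
npx @atproto/lex-cli gen-api ./src/client ./lexicons/*.json
npx @atproto/lex-cli gen-server ./src/lexicon ./lexicons/*.json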

To explore atproto firsthand, you can start with example projects like Statusphere and extend them by:

  • Adding queries to fetch account-specific statuses
  • Building richer view objects
  • Creating analytical features like emoji usage statistics
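
On that last bullet: if you dump an account’s status records to JSON (say, via the listRecords call shown earlier), the emoji tally is a classic shell one-liner. This assumes the standard listRecords response shape, where each record’s data lives under .value:

jq -r '.records[].value.status' statuses.json | sort | uniq -c | sort -rn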

This hands-on experimentation should help bridge the gap between abstract decentralized concepts and practical software engineering. I’m really looking forward to future posts by Samuel and what y’all might build!


Bluesky: Network Topology, Polarization, And Algorithmic Curation

This paper by Dorian Quelle and Alexandre Bovet provides the first detailed analysis of Bluesky. The authors examined data from five million users (as Drop readers know I detest that word, but I have to use it since that’s the term the authors use) between the platform’s launch and May 2024, focusing on network structure, user behavior, political polarization, and Bluesky’s algorithmic content features.

Much has changed since May of 2024. Those five million users turned into ten million in just four months, and the service now sports close to thirty-five million accounts. Yet, the findings in the paper appear (on the surface, at least) to still hold, hence this introduction to it and suggestion to read it.

The findings show that Bluesky’s growth closely correlated with events at Twitter (now X), particularly during controversies involving potential subscription fees, rate-limiting, and service outages under Elon Musk’s ownership. These events triggered noticeable increases in new user registrations, suggesting many joined Bluesky due to dissatisfaction with Twitter.

Despite its decentralized design, Bluesky’s network characteristics mirror those of traditional social media platforms. Analysis of user interactions (follows, replies, reposts, likes) revealed common patterns such as power-law distributions, high clustering, and short connection paths typical of small-world networks. A small group of highly active “power users” disproportionately influence interactions on the platform.

Bluesky offers a distinctive feature in user-created “feeds,” customizable algorithmic curations. The authors identified 39,639 feeds created by 18,352 users, covering a wide range of interests. However, fewer than 3% of users engaged with these feeds, indicating low adoption of this novel functionality.

Politically, Bluesky shows a significant left-center bias. About 63.4% of political links shared were from left-leaning sources, compared to only 7.9% from right-leaning sources. Individually, 75.3% of users mostly shared left-leaning content, with just 4.8% sharing predominantly right-leaning sources. Notably, the platform showed minimal dissemination of misinformation, conspiracy theories, or fake news.

Yet, despite this overall political alignment, significant polarization emerged around specific issues. The analysis of discussions on the Israel-Palestine conflict revealed clearly divided communities. Engagement on this topic sharply increased after the Hamas attacks on October 7, 2023, with pro-Palestinian perspectives eventually becoming dominant by January 2024.

Overall, the study illustrates how Bluesky, despite innovative decentralization and unique features, reflects typical social media patterns regarding network behavior and political polarization, offering valuable opportunities for transparent and accessible social media research.


GitLab Purge

glab-clone-all.sh
#!/usr/bin/env bash

# Exit on error
set -e

function log_json() {
  local level="$1"
  local message="$2"
  local data="${3:-{}}"
  echo "{\"timestamp\":\"$(date -u +"%Y-%m-%dT%H:%M:%SZ")\",\"level\":\"$level\",\"message\":\"$message\",\"data\":$data}"
}

# Configuration
GITLAB_URL="https://gitlab.com"  # Change to your GitLab instance URL if needed

# Check if GAT is set
if [ -z "$GAT" ]; then
  log_json "error" "GitLab access token not set" "{\"variable\":\"GAT\",\"resolution\":\"Please set the GAT environment variable with your GitLab access token\"}"
  exit 1
fi

TOKEN="$GAT"
OUTPUT_DIR="${1:-$(pwd)}"
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

log_json "info" "Starting GitLab clone operation" "{\"output_dir\":\"$OUTPUT_DIR\"}"
mkdir -p "$OUTPUT_DIR"
cd "$OUTPUT_DIR"

# Initialize variables for pagination
page=1
per_page=100
total_repos=0

# Paginate through all projects you have access to
log_json "info" "Fetching repositories from GitLab with pagination"
while true; do
  log_json "debug" "Fetching page $page" "{\"page\":$page,\"per_page\":$per_page}"

  page_data=$(curl --silent --header "PRIVATE-TOKEN: $TOKEN" "$GITLAB_URL/api/v4/projects?per_page=$per_page&page=$page&membership=true")
  page_count=$(echo "$page_data" | jq length)

  if [ "$page_count" -eq 0 ]; then
    break
  fi

  # Process this page of repos
  echo "$page_data" | jq -r '.[] | "\(.id) \(.path_with_namespace) \(.ssh_url_to_repo)"' > "$TEMP_DIR/page_$page.txt"

  total_repos=$((total_repos + page_count))
  log_json "info" "Fetched page $page" "{\"repos_on_page\":$page_count,\"total_so_far\":$total_repos}"

  if [ "$page_count" -lt "$per_page" ]; then
    break
  fi

  page=$((page + 1))
done

log_json "info" "Fetched $total_repos repositories in total"

# Combine all pages into a single file, no need to use jq for this
cat "$TEMP_DIR"/page_*.txt > "$TEMP_DIR/all_repos.txt"

# Process repositories
while read -r id namespace repo_url; do
  # Create directory structure matching the namespace
  dir_path=$(dirname "$namespace")
  mkdir -p "$dir_path"

  # Clone the repository if it doesn't exist, otherwise skip it
  if [ -d "$namespace" ]; then
    if [ -d "$namespace/.git" ]; then
      log_json "info" "Repository already cloned, skipping" "{\"id\":$id,\"namespace\":\"$namespace\"}"
    else
      log_json "info" "Directory exists but not a git repo, cloning" "{\"id\":$id,\"namespace\":\"$namespace\",\"url\":\"$repo_url\"}"
      rm -rf "$namespace"
      git clone "$repo_url" "$namespace" && \
        log_json "info" "Repository cloned" "{\"id\":$id,\"namespace\":\"$namespace\",\"status\":\"cloned\"}"
    fi
  else
    log_json "info" "Cloning repository" "{\"id\":$id,\"namespace\":\"$namespace\",\"url\":\"$repo_url\"}"
    git clone "$repo_url" "$namespace" && \
      log_json "info" "Repository cloned" "{\"id\":$id,\"namespace\":\"$namespace\",\"status\":\"cloned\"}"
  fi
done < "$TEMP_DIR/all_repos.txt"

log_json "info" "Operation completed" "{\"repositories_processed\":$total_repos}"

glab-delete-all.sh
#!/usr/bin/env bash

# Exit on error
set -e

function log_json() {
  local level="$1"
  local message="$2"
  local data="${3:-{}}"
  echo "{\"timestamp\":\"$(date -u +"%Y-%m-%dT%H:%M:%SZ")\",\"level\":\"$level\",\"message\":\"$message\",\"data\":$data}"
}

# Configuration
GITLAB_URL="https://gitlab.com"  # Change to your GitLab instance URL if needed

# Check if GAT is set
if [ -z "$GAT" ]; then
  log_json "error" "GitLab access token not set" "{\"variable\":\"GAT\",\"resolution\":\"Please set the GAT environment variable with your GitLab access token\"}"
  exit 1
fi

TOKEN="$GAT"
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

# Initialize variables for pagination
page=1
per_page=100
total_repos=0

# Confirmation to prevent accidental deletion
read -p "WARNING: This will DELETE ALL repositories you have access to! Type 'DELETE ALL' to confirm: " confirmation
if [ "$confirmation" != "DELETE ALL" ]; then
  log_json "info" "Operation aborted by user" "{\"reason\":\"confirmation_failed\"}"
  exit 0
fi

# Paginate through all projects you have access to
log_json "info" "Fetching repositories from GitLab for deletion"
while true; do
  log_json "debug" "Fetching page $page" "{\"page\":$page,\"per_page\":$per_page}"

  page_data=$(curl --silent --header "PRIVATE-TOKEN: $TOKEN" "$GITLAB_URL/api/v4/projects?per_page=$per_page&page=$page&membership=true")
  page_count=$(echo "$page_data" | jq length)

  if [ "$page_count" -eq 0 ]; then
    break
  fi

  # Process this page of repos
  echo "$page_data" | jq -r '.[] | "\(.id) \(.path_with_namespace)"' > "$TEMP_DIR/page_$page.txt"

  total_repos=$((total_repos + page_count))
  log_json "info" "Fetched page $page" "{\"repos_on_page\":$page_count,\"total_so_far\":$total_repos}"

  if [ "$page_count" -lt "$per_page" ]; then
    break
  fi

  page=$((page + 1))
done

log_json "info" "Fetched $total_repos repositories for deletion"

# Combine all pages into a single file
cat "$TEMP_DIR"/page_*.txt > "$TEMP_DIR/all_repos.txt"

# Second confirmation with specific count
read -p "About to DELETE $total_repos repositories. Type 'CONFIRM $total_repos' to proceed: " final_confirmation
if [ "$final_confirmation" != "CONFIRM $total_repos" ]; then
  log_json "info" "Operation aborted by user" "{\"reason\":\"final_confirmation_failed\"}"
  exit 0
fi

# Process repositories for deletion
deleted_count=0
failed_count=0

while read -r id namespace; do
  log_json "info" "Deleting repository" "{\"id\":$id,\"namespace\":\"$namespace\"}"

  # Delete the repository
  response=$(curl --write-out "%{http_code}" --silent --output /dev/null \
    --request DELETE \
    --header "PRIVATE-TOKEN: $TOKEN" \
    "$GITLAB_URL/api/v4/projects/$id")

  if [ "$response" = "202" ] || [ "$response" = "204" ]; then
    log_json "info" "Repository deleted successfully" "{\"id\":$id,\"namespace\":\"$namespace\",\"status\":\"deleted\"}"
    deleted_count=$((deleted_count + 1))
  else
    log_json "error" "Failed to delete repository" "{\"id\":$id,\"namespace\":\"$namespace\",\"status\":\"failed\",\"http_code\":\"$response\"}"
    failed_count=$((failed_count + 1))
  fi

  # Add a small delay to prevent rate limiting
  sleep 0.5
done < "$TEMP_DIR/all_repos.txt"

log_json "info" "Deletion operation completed" "{\"repositories_processed\":$total_repos,\"deleted\":$deleted_count,\"failed\":$failed_count}"

FIN

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on:

  • 🐘 Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev
  • 🦋 Bluesky via https://bsky.app/profile/dailydrop.hrbrmstr.dev.web.brid.gy

Also, refer to:

to see how to access a regularly updated database of all the Drops with extracted links, and full-text search capability. ☮️
