Bonus Drop #39 (2024-02-05): Perplexing Throughline Arcs

Rabbit R1 (Privacy); Stract (Search); Arc’s Dystopian? (Future)

Today’s Bonus Drop is also Monday’s regular Drop as one of the clan had a bit of a serious health scare at the start of the weekend, and it drained all mental compute resources. (In other news, I’m still fighting with Substack + Stripe + WordPress…the non-technical parts of this hosting move were more painful than expected.)

I’m answering a question from Mastodon in the first section, following up on my experiment with Perplexity as Arc’s default search engine (spoiler: I’ve reverted it), and briefly continuing the Arc discussion. It turns out both Arc and Perplexity have a throughline right down the middle of this edition. I’ll also do my best to just get to the heart of the matter in each.

Reading time estimate: ~15 minutes.

Rabbit R1 (Privacy)

This is the link to the post mentioned in the preamble. The topic is, essentially, “privacy considerations for consumers with emerging AI/GPT tech like Rabbit R1”. We’ll use the Rabbit R1 — a device designed to use AI to “simplify user interaction with technology” — to at least start the discussion in this issue, but the topic is broad enough that I won’t DoS you all in one go.

The R1 is phone-sized, requires your phone for internet (for now), and has a:

  • push-to-talk button
  • far-field mic
  • 360° rotational eye
  • touchscreen
  • speaker

that it uses for input/output. We touched on this device in a previous Drop.

What sets this “destined for the landfill” creation apart from all our other devices destined to meet a similar end is something called a Large Action Model (LAM). The gist is that you can ask it to perform a task in English (like how you’d interact with ChatGPT), and it can be trained to take a series of actions based on that task. Think of it as an upgraded Alexa that (theoretically) doesn’t require explicit if-this-then-that step programming. I say “theoretically” since I haven’t played with one yet (they’re not generally available), and I don’t trust anything anyone in Silicon Valley says.
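rabbit hasn’t published any technical details of how the LAM actually works, so here’s a purely hypothetical Python sketch (every name and structure below is invented for illustration) of the core idea: free-form text goes in, and a learned plan of concrete app actions comes out.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "large action model" idea: none of these names
# come from rabbit; the point is only the shape (text in, action plan out).
@dataclass
class Action:
    app: str        # which app the action targets
    operation: str  # what to do in that app
    params: dict    # arguments for the operation

def plan_actions(request: str) -> list[Action]:
    # A real LAM would infer this plan from training on recorded UI sessions;
    # we hard-code one case just to show the idea.
    if "ride" in request.lower():
        return [
            Action("uber", "set_destination", {"place": "airport"}),
            Action("uber", "request_ride", {"tier": "UberX"}),
        ]
    return []

plan = plan_actions("Get me a ride to the airport")
print([a.operation for a in plan])  # ['set_destination', 'request_ride']
```

The interesting (and privacy-relevant) part is that the plan executes against your real, logged-in accounts — which is exactly why the access questions below matter.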

They’ve partnered with Perplexity, and a year of Perplexity Pro is included with the purchase price. This makes sense since Perplexity is very good at distilling what one wants from free-form input, and the Rabbit folks needed to figure out a business model since there’s no initial mention of a monthly fee (which there will have to be, and that looks to be Perplexity).

The device gains access to your apps via a web portal called “rabbit hole,” which is a:

… “cloud hub where users can relay access to their existing apps to rabbits, users can enable different features and functionalities for their r1 device. Similar to handing one’s unlocked phone to a friend who will help order takeout, rabbit OS performs tasks for users with their permission, without preemptively storing their identity information or passwords. rabbit OS does not create any proxy accounts, or require users to purchase additional subscriptions for their existing services, which leads to increased safety, security, and efficiency.”

None of that is yet verifiable, but focus on the fragment about handing one’s unlocked phone to a friend: you need to give this device and service full control over your smartphone. And, it will need to interact with all your accounts (Uber is used quite a bit as an example).

Before we get to a summary of what privacy concerns you should have with the blending of AI/GPT tech into everyday items, I’ll give you something to listen to on this subject.

The Behind The Bastards (BtB) podcast recently did a segment on it — Part One: Tech Bros Have Built A Cult Around AI. Here’s the breakdown (I had Perplexity make up the bullets from the show’s transcript):

  • Skepticism is expressed about the Rabbit R1’s practicality and security, particularly regarding its access to all user apps.
  • Concerns are raised about AI’s impact on creativity and commerce, with a suggestion that commerce may suppress creativity for profit.
  • Criticism is directed at the Rabbit R1’s design, which appears to overlook left-handed users and includes a questionable swiveling camera.
  • The device is not yet a smartphone replacement, requiring users to carry both the Rabbit R1 and their phone.
  • Despite criticisms, the Rabbit R1’s pre-order models sold out quickly at the Consumer Electronics Show, indicating significant interest.
  • The tech industry’s obsession with AI is noted, with AI potentially serving as an “AI agent” for customers.
  • The speakers are unconvinced that the Rabbit R1 substantially improves upon existing technology and note the inconvenience of an additional device.
  • A portion of the keynote presentation is mocked, where the CEO’s attempt at humor during a demonstration of the Rabbit R1 is deemed unsuccessful.

All of that tracks with what I heard. I should also note that BtB episodes usually contain quite a bit of profanity. I generally only listen to ones with topics I’m interested in and otherwise consume the transcripts (which also helps avoid adverts).

Rather than emit a half dozen more prose paragraphs, let’s walk through some things you can ponder regarding how you already value your privacy…

  • You already (likely) give Siri and Alexa (both legacy AI bots) access to tons of things, so you are used to interacting with tech this way.
  • If you use apps like Shop, or other shipping apps, or IFTTT-like services, you’ve likely given them access to your email, calendar, and more. (This is a good week to review all the OAuth apps you’ve let use your central credentials, too.)
  • Unlike the majority of Apple’s AI tooling, all of the data processed through this Rabbit device is going to be done “in the cloud” (vs. on-device), likely by Perplexity. Just like your Google searches and ChatGPT prompts aren’t truly private.
  • Regardless of any fine-print text, everything you say, type, and record on the R1 will be used for analysis and training.
  • Whatever is distilled from that data will turn into a commodity.
  • Technology like this will most certainly need broad access to all account data and all apps to be of general use.
  • Almost no startup (or non-startup, for that matter) takes security and privacy seriously.
  • You have received at least one (likely many more) breach notifications in the past 18-24 months.

With that in mind, here are some questions you should ask whenever you sign up for some new LLM/GPT service or use a device like this one:

  • What data was it trained and retrained on?
  • Will my inputs be used for training?
  • Does it have deep access to personal information (e.g., email, finance, etc.)?
  • Who are the company’s processing and content delivery partners?
  • What happens to my data if the company shuts down or is sold?
  • What is the worst that could happen to you or your information in the event of a data breach or misuse by an internal company actor?
  • Will law enforcement have access to this data and what are the implications if they do?
  • If every chat log or app use became public information, what is the maximum damage that could occur to you?

Finally, and most importantly, what actual IRL problem is this device or service solving for you and is there a safer alternative that may just not be as convenient, but gets the job done?

Every new service you sign up for introduces risk into your digital life.

Every piece of information you enter into a textbox on the internet and hit “submit” on reduces your privacy.

Every app you install on your computer or device increases your attack surface.

When you end up ignoring all ^^, at least take 24 hours to wait and see if you really do want to install that app, enter that information, or try that new service/device.

Stract (Search)

I recently noted that Arc partnered with Perplexity to make it a default search selection/offering in-browser. Since I already use and subscribe to Perplexity, I figured it’d be neat to see if this proved beneficial.

While Perplexity does perform actual internet searches — and returns the source list so you can tap through to the sites behind the information its model and code curated (the model doesn’t do all the work) — the experience of having it on by default was less than stellar.

For starters, it’s too slow. Until hardware gets better, there’s no way a model that has to interpret the meaning of what you type, then go retrieve site data, then process said data, and then return some paragraphs of English to you will be faster than a traditional search engine for a large percentage of query types.
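As a back-of-envelope illustration of why (every number below is an assumption made up for the arithmetic, not a measurement of Perplexity), the serial steps add up fast:

```python
# Illustrative latency budget for an answer-engine query; all figures are
# made-up round numbers, not benchmarks.
steps = {
    "interpret the query (LLM pass)": 1.0,  # seconds
    "retrieve site data":             1.5,
    "process/condense the pages":     1.0,
    "generate paragraphs of English": 2.0,
}
total = sum(steps.values())
print(f"~{total:.1f}s end-to-end vs. well under a second for a keyword engine")
```

Even if each assumed step shrank by half, the pipeline is inherently serial, so a keyword index that skips straight to retrieval keeps a structural head start.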

Other reasons are found in the third section.

So, what replaces Perplexity in Arc for me?

I’m still looking for a full-on Kagi replacement (they’ve turned out to be narcissistic Silicon Valley bros), and have been giving Stract (GH) a go. It’s “an open-source search engine where the user has the ability to see exactly what is going on and customize almost everything about their search results.” Here are the features (so far):

  • Keyword search that respects your search query.
  • Fully independent search index.
  • Advanced query syntax (site:, intitle:, etc.).
  • DDG-style !bang syntax.
  • Wikipedia and Stack Overflow sidebar.
  • De-rank websites with third-party trackers.
  • Use optics to almost endlessly customize your search results.
    • Limit your searches to blogs, indieweb, educational content etc.
    • Customize how signals are combined during search for the final search result
  • Prioritize links (centrality) from the sites you trust.
  • Explore the web and find sites similar to the ones you like.

It also has an API, so you can do things like this:

$ curl -X 'POST' \
  'https://stract.com/beta/api/search' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "characters of the \"The Expanse\" novels"
}'
{
  "type": "websites",
  "webpages": [
    {
      "title": "How Amazon Saved 'The Expanse' | Space",
      "url": "https://www.space.com/the-expanse-how-amazon-jeff-bezos-saved-scifi.html",
      "site": "space.com",
      "domain": "space.com",
      "prettyUrl": "https://www.space.com › the-expanse-how-amazon-jeff-bezos-saved-scifi.html",
      "snippet": {
        "type": "normal",
        "date": "03. Jan. 2020",
        "text": {
          "fragments": [
            {
              "kind": "normal",
              "text": "(Image credit: Amazon) Cast your mind back to December 2015, when a series called \""
            },
            {
              "kind": "highlighted",
              "text": "The"
            },
            …
            {
              "kind": "highlighted",
              "text": "the"
            },
            …
            {
              "kind": "normal",
              "text": "t take long for sci-fi fans to sit up and take notice; it was well-written and had amazing production design and a good "
            }
          ]
        }
      },
      "rankingSignals": null,
      "score": 3.3790412828921705,
      "likelyHasAds": true,
      "likelyHasPaywall": true
    },
    …
  ],
  "numHits": null,
  "searchDurationMs": 1721,
  "hasMoreResults": true
} 
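Reassembling a readable snippet from those normal/highlighted fragments is straightforward once you have the JSON. A minimal sketch, using sample data abbreviated from the response above (field names could shift while the API is in beta):

```python
# Walk a Stract /beta/api/search-shaped response and stitch each snippet
# back together from its fragments. Sample data is abbreviated/paraphrased
# from the response shown above, not a live API call.
resp = {
    "webpages": [
        {
            "title": "How Amazon Saved 'The Expanse' | Space",
            "url": "https://www.space.com/the-expanse-how-amazon-jeff-bezos-saved-scifi.html",
            "snippet": {
                "text": {
                    "fragments": [
                        {"kind": "normal", "text": 'a series called "'},
                        {"kind": "highlighted", "text": "The Expanse"},
                        {"kind": "normal", "text": '" debuted'},
                    ]
                }
            },
        }
    ]
}

for page in resp["webpages"]:
    snippet = "".join(f["text"] for f in page["snippet"]["text"]["fragments"])
    print(f"{page['title']}\n  {page['url']}\n  {snippet}")
```

You could just as easily bold the `highlighted` fragments instead of flattening them, which is presumably what Stract’s own result page does.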

The API will eventually cost money, and there will eventually be ads.

You can use their code and set up your own search indexer, too (if you can afford that).

And, you’ll also have to be more old-school smart about how you enter queries, but the results come back faster than Perplexity’s, and I may not be losing as much privacy for basic searches (one can never fully trust an internet service).

As usual, I’ll continue using it and report back after I kick the tyres a bit longer.

Arc’s Dystopian? (Future)

Photo by Andre Frueh on Unsplash

Arc also seems to be partnering with Perplexity to fuel their new “browse for me” feature, which will do pretty much what Perplexity does now, just with a different interface. My experience of this on mobile (it’s only available there for now) has been terrible, and others have had similar experiences. I’m not sure who they’re testing these concepts out with, so this may well work great for the average internet human, but I’m not sure removing us humans from interacting with the information sources is a good thing.

I do not trust Perplexity to make sense of things for me. For instance, I used it to summarise the transcript for the first section, and use it to create the “TL;DR” section that appears in most editions. But, I would not have done the former without either having listened to the podcast or read the transcript. And, I write this newsletter, so I know what’s in it 🙃.

Perplexity, ChatGPT, and all the other retrieval augmented generation (RAG) tools break the contract the information owner (or displayer) has with us, the visitors. Now, I loathe the pop-up-ridden recipe sites (on mobile…I never see those things on my nigh-bulletproof desktops) as much as the next person. And, news sites aren’t much better. But, the solution to the invasion of those sanity snatchers is not to balkanize us from the sites; the real solution is to change the incentive model that rewards those practices. These sites rely on direct traffic for revenue, often through advertisements or subscriptions. By summarizing and curating content, Arc + Perplexity could reduce the number of direct visits to them, which could mean less ad revenue and less engagement with each site’s own content and interface. For content creators and publishers (that’s us, btw), this shift could force a reevaluation of how we monetize and/or measure engagement.

I also loathe the “search engine optimization” (SEO) space, which continues to suffer thanks to the en💩ification of the internet, especially with all this AI-generated content. In the near term, SEO isn’t going away (most folks still use Google for search). However, with Arc and Perplexity summarizing content, the nuances of SEO may become less relevant if we humans interact primarily with AI-curated summaries instead of visiting sites directly. That could lead to content effectively disappearing, since relevance would be measured by what the models surface rather than by human visits.

Continuing (so much for brevity): when we task these agents with curation and summarization, there is a real risk that we’re getting bad information back. These are still probabilistic models, so randomness is baked in. And, some topics require degrees of nuance that these models are far from grokking. If summaries do not accurately reflect the original intent and depth of the sources, using that information — and, worse, citing the original source despite it being randomly paraphrased — could be very problematic (potentially costly in real-world terms).

There’s more, but this edition is already a ~15 minute read (if you made it this far, thank you!).

FIN

I will continue to use Perplexity, but I did not see this throughline coming, and am concerned that even more is in store. Time to start threat modeling!

Remember, you can follow and interact with the full text of The Daily Drop’s free posts on Mastodon via @dailydrop.hrbrmstr.dev@dailydrop.hrbrmstr.dev ☮️