Drop #351 (2023-10-12): 🛡️ Happy (Secure) ThursdAI

nbdefense; modelscan; rebuff; Perplexity.ai API

We’re a week early with our ~monthly recurring glance at advancements in the gradual subjugation of humanity, but when two of my worlds collide — cybersecurity and data science — I gotta make exceptions. Plus, I’m still in travel mode — and all 3 + 1 (the last one has nothing to do with cybersecurity) resources have excellent documentation — which means I can let them do most of the heavy lifting when it comes to providing examples.

TL;DR

This is an AI-generated summary of today’s Drop.

I switched Perplexity over to Claude-2, today, with a prompt that explicitly said, “four sections” and it only gave me three, but the three were superb and quite concise.

Here is a concise three bullet summary of the key sections in the attached blog post:

  • NBDefense helps secure Jupyter notebooks by detecting leaked credentials, PII, and licensing issues

  • ModelScan scans AI models for malicious code before deployment or loading

  • Rebuff protects against prompt injection attacks on large language models like GPT-3

  • Perplexity has a new/second API endpoint.


nbdefense

Captain America shield

This is the first of three open-source tools that Protect AI has released to help orgs and individuals stay safe as they (I guess I have to say “we”) crank on environmentally-destructive data science tasks.

If you are in the “notebook cult”, you are a target. Attackers are clever, and have figured out that anything “Jupyter” can be (pretty easily) pwnd.

NBDefense (GH) is an open-source tool created by Protect AI to help secure Jupyter Notebooks. It is the first security solution designed specifically for Jupyter Notebooks.

NBDefense can scan Jupyter notebooks and detect potential security issues such as:

It can be used as either a JupyterLab extension that scans notebooks within JupyterLab or as a CLI tool that can scan notebooks and projects. The CLI is especially useful for scanning many notebooks at once or setting up automated scanning workflows. The JupyterLab extension enables continuous scanning of notebooks as y’all work on them interactively.

Jupyter Notebooks often contain an organization’s most sensitive code and data. According to research from Protect AI, public notebooks from major tech companies frequently contain credentials, PII, and licensing issues.

Attackers can exploit these notebooks to steal data, hijack accounts, or reverse engineer models. Notebooks may also unintentionally expose your organization to legal risks if they use restrictive open-source licenses improperly.

Detailed installation instructions are available in the NBDefense documentation.

modelscan

brown bear plush toy on white textile

AI models are increasingly being used to make critical decisions across various industries. However, like any software application, AI models can have vulnerabilities that malicious actors could exploit. A new type of attack called “model serialization attacks” poses a particular risk for organizations using AI.

Model serialization attacks involve inserting malicious code into AI model files that are shared between teams or deployed into production. For example, an attacker could add code that steals credentials or sensitive data when the model file is loaded. This is similar to a Trojan horse attack.

To defend against model serialization attacks, organizations need to scan AI models before use to detect any malicious code that may have been inserted. However, until recently, there were no open-source tools available to easily scan models from various frameworks like PyTorch, TensorFlow, and scikit-learn.

ModelScan is a project from Protect AI that provides protection against model serialization attacks. It is the first model scanning tool that supports multiple model formats including H5, Pickle, and SavedModel.

ModelScan scans model files to detect unsafe code without actually loading the models. This approach keeps the environment safe even when scanning a potentially compromised model. ModelScan can quickly scan models in just seconds and classify any unsafe code it finds as critical, high, medium or low risk.

With ModelScan, we can:

  • scan models from PyTorch, TensorFlow, Keras, scikit-learn, XGBoost, and more

  • check models in seconds by reading files instead of loading models

  • see which bits of code are unsafe (i.e., categorized as critical, high, medium or low risk)

It can also be used at any stage in the development process:

  • before loading models: Scan all pre-trained models from third parties before use in case they have been compromised.

  • during model development: Scan models after training to detect any poisoning attacks on new models.

  • before model deployment: Verify models contain no unsafe code before deployment to production.

  • in ML pipelines: Integrate ModelScan scans into CI/CD pipelines and at each stage of ML workflows.

Provided your Python environment is not woefully busted, you are a quick:

$ python3 -m pip install modelscan
$ modelscan -p /path/to/model_file

away from leveling up your safety.

rebuff

gray stainless steel armor

It’s no secret that prompt injection attacks have emerged as a serious threat to AI systems built on large language models (LLMs) like GPT-3 and ChatGPT. In these attacks, adversaries manipulate the prompts fed into the LLM to make it behave in unintended ways.

Prompt injection can provide a means for attackers to exfiltrate sensitive data, take unauthorized actions, or cause the model to generate harmful content. Recent examples include bypassing content filters, extracting training data, and stealing API keys. Prompt injection has been ranked as the number one threat to LLMs (direct PDF) by OWASP.

The core vulnerability arises from the fact that LLMs process instructions and input text in the same way. There is no built-in mechanism to distinguish harmless user input from malicious instructions designed to manipulate the model.

Rebuff (GH) is the third open-source framework released by Protect AI to detect and prevent prompt injection attacks. It:

  • filters potentially malicious inputs before they reach the LLM

  • uses a separate LLM to analyze prompts for attacks

  • stores embeddings of past attacks to recognize similar ones

  • plants canary words to detect data leaks

This new tool makes it possible to integrate prompt injection protection into our LLM apps with minimal code changes. We simply need to pass the prompt through Rebuff’s detect_injection method, which returns whether an attack was detected. (NOTE: “simply” is doing an awful lot of heavy lifting in that sentence).

While Rebuff mitigates many prompt injection risks, it is not foolproof. Skilled attackers may still find ways to bypass protections. However, Rebuff offers a pretty spiffy first line of defense.

Perplexity API

I pay my AI tax to Perplexity (and, also — indirectly — to Kagi, too, I guess) and they’ve added replit-code-v1.5-3b (ref 1/ref 2) via a new /completions endpoint. They’re mimicking the OpenAI API, but for those not familiar with that, this endpoint generates text to complete a non-conversational prompt. They’ve had a /chat/completions endpoint for a few weeks — now.

Find out more at the API docs.

FIN

In unrelated — and, arguably, far more important — news: those looking to provide some practical help to folks in need during this time of crisis and conflict in the Middle East, CharityWatch has a list of “legitimate, efficient, and accountable charities involved in efforts to aid and assist the people of Israel – Palestine during active conflict in the region”. ☮️

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.