AI Security: Understanding Prompt Injection

With the growth of more and more products leveraging technologies like ChatGPT, GPT-3, and other AIaaS (AI as a Service) platforms, we’re going to see a new security hole with prompt injection. Products using AIaaS take an existing platform (such as OpenAI’s ChatGPT) and add a proprietary process to get specific results. The problem is, modern machine learning is smart in some ways, but breathtakingly stupid in others. The right prompt might circumvent security protocols or allow abuse of the system.

Take for instance, the ChatGPT based search with Bing. It isn’t just ChatGPT; it’s more than that. The specific instructions, limitations, etc. added to ChatGPT are what provides value to the search process over just using ChatGPT yourself.

Providing Direction to AI and Machine Learning Platforms

Even AIaaS products which are relatively “open” will have constraints. Ideally, you don’t want an AI platform quoting racist materials or other hate speech, or an image generating product creating deep faked, compromising images of celebrities or similar.

Stable Diffusion 2.0 added safeguards to try and prevent the generation of NSFW content in images. Various GPT based generative text models try to prevent certain types of content which can be damaging. Almost every model has some level of checks and balances to prevent abuse (or at least reduce the tendency to create awful content).

Platforms which leverage a model will add further shaping to give constraints when they get an arbitrary prompt which can be damaging. Bing’s “smart” search (it’s still Bing after all) won’t help write a cover letter. The more derivative the platform, the more likely there are more and more restrictions on how it can be used and what it allows.

Solutions built on a platform will also potentially have extra training or similar proprietary methods which justify their added value. A solution might use GPT-3 or similar as a base, but uses a specific training set to target only certain content. Each of these additions may be something the model is “aware” of.

How Prompt Injection Works

Prompt injection abuses the fact that machine learning is “smart” and can take instructions, but abuses the fact it is stupid and doesn’t actually understand the what or why. You’re basically social engineering a computer when performing prompt injection.

Most chat type machine learning platforms or other text based generative processes have special safeguards to prevent exploitation. The problem is, the model might prioritize something a user says which can override the instructions. A user was able to tell GPT-3 to “ignore the above directions” and get it to do something else.

Another user was able to exploit the Bing Chat to get it to detail its rules and similar conditions. “Sydney” got tricked into giving away some of the magic of what makes it tick.

There’s a trend with most prompt injection. It won’t (intentionally) directly contradict its instructions, but sometimes just telling it to “ignore previous directions” or to “detail text after [known text]” might get it to spill the magic of what makes it work. People have been doing similar with ChatGPT to get an idea of how it works at some level as well. The goal is to get it to bend the rules without breaking them based on whatever “logic” the model has.

Why Prompt Injection Is a Security Concern

It’s kind of funny to see how Bing search works and what prompts make it, but as AIaaS and other commoditized machine learning platforms become more widespread, this is going to become a major security issue. If a platform uses a machine learning infrastructure and ties into a database, how do you prevent the machine learning component from revealing proprietary or confidential data?

The more constraints which are added to make a platform valuable, the more important preventing prompt injection is. AI and machine learning are becoming more widespread as platforms to build on which means the value a product adds isn’t necessarily the underlying model, but how it uses said model. Anyone can use GPT-3, but how do you add value to make an AI business which can justify charging extra?

Prompt injection becomes a security concern for proprietary data. A copycat can potential steal the methodology you use for an application, or a hacker can escalate access to data which they shouldn’t have. As more and more offerings leverage AI and machine learning, there are going to be more and more holes to be exploited via prompt injection and similar.

There are ways to mitigate these issues and to reduce some of the impact, but it requires understanding that the AI system is going to be a liability like a user would be. You aren’t dealing with a person though, you’re dealing with an abstract machine which feels human without any of the thoughts a human has. It can’t understand the why (at present), it just follows its directions. If those orders conflict and there isn’t handling, you have a potential exploit.

Conclusion

Prompt injection is something which needs to be considered at every level when looking to leverage any kind of human interfacing machine learning solution. If a product uses machine learning and has confidential data, should you even trust it? I would argue that, at present, you shouldn’t.

AI and machine learning have a massive amount of promise, but the technology is in its infancy for actual application. NLP (Natural Language Processing) is incredible technology, but no current technology actually understands. There is no cognizance. You can tell the system to not divulge certain information, but it can be tricked since it doesn’t understand the purpose of why. It takes instructions and acts on them, but without understanding why, you get a literal application.

If you’re designing tools which leverage AI and machine learning, you need to understand the limitations as well as the strengths. If you can’t trust the AI portion, why would you give it unfettered access? Use other technical restrictions to make prompt injection not impact your security. Prevent prompt injection from becoming an issue by restricting how and what an AI can access.

If you’d like to learn more about how machine learning works, read up on the 3 basic paradigms of machine learning.

Image by Liz Masoner from Pixabay