Logo

Prompt Injection – Practical Mitigations

By Rajiv Bahl / November 6, 2024

Unveiling the future

Our experts analyze the latest tech trends and industry breakthroughs.

Prompt Injection

Mitigations

Prompt Injection occurs when an attacker manipulates a large language model (LLM) through crafted inputs, causing the LLM to unknowingly execute the attacker's intentions. This can be done directly by "jailbreaking" the system prompt or indirectly through manipulated external inputs, potentially leading to data exfiltration, social engineering, and other issues. (Ref: OWASP Top 10 for LLM Applications) Prompt injection consistently has been at the top of the vulnerability listed in OWASP Top 10 for LLM applications. It finds its mention in MITRE ATLAS under technique (AML.T0051). Instead of focusing on the specifics of the attack, we focus on practical implementable mitigation steps leading to security by design, this keeping defender's in viewpoint.

Prompt Injection Mitigations

GenAI attacks is a relatively new and evolving area, the challenges are compounded by the natural language nature of the prompts. Secure by design principles can help mitigate the risks associated with prompt injection. Some of the mitigations which can be looked at are as follows:

1. Input Filtering -Filter inputs for keywords or indicators of attempted prompt injection. This is similar to input filtering for SQL injection.
2. Expected Input Token CountFor applications where the input is restricted to a defined set of keywords in the application, we should monitor the expected input token count and compare it with the actual input token count. When the deviation is very significant, checks should be done and prevent the model response.
3. Avoid short prompts -Short prompts are more guessable and should be avoided.
4. Use unique and complex prompts -Structure the prompts to make them less guessable. When possible, include domain-specific contents to make it harder to do prompt injection.
5. Use labels and tags to segregate uploaded content from user prompts -Separate and denote where untrusted content is being used using tags or labels to limit their influence on user prompts.
6. Prompt chaining -Feed output from one model to another model only after verification for prompt injection. This limits the attacker's visibility of the impact of their injected prompt making it more difficult to achieve malicious objectives.
7. Least privilege access -Restrict the LLM access to only the least level of access necessary for achieving its objectives.
8. Segment the LLM infrastructure -Treat the LLM as a vulnerable untrusted domain. Segment the LLM infra to restrict lateral movement to the organization’s crown jewels.
9. Keep human in loop -Maintain final human user control on decision-making processes to avoid excessive agency issues.
10. Model inbuilt guardrails and moderation services -Use guardrails and moderation services when the foundation model presents these features to avoid undesired behaviors.
11. Red Teaming LLMs -Conduct red teaming specific to LLMs to test for vulnerabilities like bias, offensive behavior, prompt injections, etc. Engage consultants with expertise in LLM testing for critical public-facing applications leveraging the LLMs.

As AI continues to evolve and LLMs are rapidly adopted, the attack surface for the Organizations increases. Prompt injection represent a key risk introduced by this technological advancement. However, these risks can be mitigated to an extend by following secure-by-design principles as mentioned.