TL;DR: We evaluated NVIDIA Nemotron-3-Nano-30B against Claude Opus 4.5 as the attack-prompt generator in Noma’s red teaming engine. Nemotron achieved a 70% higher attack success rate at roughly one-tenth the cost, running fully self-hosted via NIM. We’re integrating it into production via Amazon Bedrock, with a self-hosted NIM deployment to follow.

Background: How Noma’s Red Teaming Engine Works

Noma’s platform automates red teaming for enterprise AI applications. At its core, the red teaming engine works by deploying an attacker agent, or an LLM tasked with generating adversarial prompts designed to make a target AI application behave in unintended or unsafe ways.

This isn’t a one-time audit. Noma’s red teaming is embedded directly into enterprise CI/CD pipelines, running hundreds of adversarial tests per deployment across agents and application endpoints. For Fortune 500 customers, that means vulnerabilities get caught before they ever reach production automatically and at every release.

The quality of that attacker agent matters a whole lot. The better the model generating attack prompts, the more vulnerabilities get surfaced. And at tens of thousands of requests per day, cost efficiency isn’t just nice to have, it’s a core engineering constraint.

The Evaluation: Prompt Extraction

We tested NVIDIA Nemotron-3-Nano-30B and Claude Opus 4.5 a on the Prompt Extraction attack plugin, one of the most common and high-value attack vectors in production AI systems. The goal: get the target model to reveal its own system prompt and internal capabilities. This is a meaningful benchmark because system prompts often contain sensitive business logic, instructions, and guardrails that operators explicitly want to protect.

The setup was straightforward: generate hundreds of attack prompts using each model, then fire them at the same target model and measure how often the target leaked its system prompt.

 

Metric Claude Opus 4.5 (API) NVIDIA Nemotron-3-Nano-30B (NIM)
Attack Success Rate 8.33% 14.17%
Relative Cost Baseline ~10× cheaper
Hosting External API Self-hosted

 

A note on the absolute numbers: prompt extraction is genuinely difficult. Target models are designed to resist it. An attack success rate in the range of 8–14% represents real, meaningful vulnerability exposure, and the gap between the two models is substantial.

Nemotron delivered a 70% higher attack success rate at ~10× lower cost.

Why This Result Makes Sense

Nemotron-3-Nano-30B is purpose-built for high-throughput inference with strong instruction-following. In a red teaming context, the attacker model needs to be creative, persistent, and capable of generating diverse prompt variations at scale, not just produce a single clever jailbreak. Nemotron’s architecture and training appear well-suited to exactly this kind of adversarial generation task.

Running via NVIDIA NIM also brings two additional engineering advantages beyond cost:

  • Data privacy: Attack prompts and target responses stay within your own infrastructure — no data leaves to an external API.
  • Latency control: Self-hosted inference removes external API variability from your CI/CD pipeline timing.

Integration Plan

Phase 1 — Amazon Bedrock (immediate): We’re integrating Nemotron via Bedrock as the first production step. This is the lowest-friction path: no new infrastructure, familiar deployment model, immediate cost and performance gains.

Phase 2 — Self-hosted NIM: We’ll follow with a self-hosted NIM deployment for customers who require full data isolation or need tighter latency SLAs. NIM’s containerized serving makes this straightforward to operationalize at scale.

What’s Next

This evaluation covered one attack plugin: Prompt Extraction. Noma’s red teaming engine supports a broad taxonomy of attack types, including prompt injection, goal hijacking, PII extraction, toxic content generation, and more. We’ll be extending the Nemotron evaluation across the full attack surface and publishing results as they come in.

The integration with NVIDIA is part of a broader commitment to giving Noma’s red teaming engine the best available models for the job, not defaulting to a single provider, but continuously benchmarking and optimizing for attack effectiveness and operational efficiency.

If you want to see what Noma’s red teaming surfaces, reach out for a demo.

 

5 min read

Category:

Table of Contents

Share this: