How a scan works
A full scan is a genuine agentic penetration test. It doesn't run a fixed checklist — it understands your product and then attacks it, the way a skilled human tester would.
1. Map the app
Pentu launches a real Chromium browser and crawls the public site. It records every page, form, and cookie — and, crucially, the API calls the app makes, so it learns your real backend surface, not just the marketing pages.
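To make "backend surface" concrete, here is a minimal sketch of how recorded traffic could be reduced to a list of API endpoints. The `RecordedRequest` shape and the `backend_surface` helper are hypothetical names for illustration, not Pentu's internals; the resource-type labels mirror the categories browser devtools use.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class RecordedRequest:
    """One request observed during the crawl (hypothetical record shape)."""
    method: str
    url: str
    resource_type: str  # e.g. "document", "xhr", "fetch", "image"


def backend_surface(requests: list[RecordedRequest]) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (METHOD, path) pairs for XHR/fetch traffic."""
    seen = set()
    for req in requests:
        if req.resource_type in {"xhr", "fetch"}:
            # Drop the query string so variants of one endpoint collapse together.
            seen.add((req.method.upper(), urlsplit(req.url).path))
    return sorted(seen)


crawl_log = [
    RecordedRequest("GET", "https://example.test/pricing", "document"),
    RecordedRequest("GET", "https://example.test/api/v1/plans?currency=usd", "fetch"),
    RecordedRequest("post", "https://example.test/api/v1/signup", "xhr"),
    RecordedRequest("GET", "https://example.test/api/v1/plans?currency=eur", "fetch"),
    RecordedRequest("GET", "https://example.test/logo.png", "image"),
]

surface = backend_surface(crawl_log)
```

In this toy log, the marketing page and the image are ignored, and the two `plans` calls collapse into a single endpoint.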
2. Sign up like a real user
This is the part conventional scanners can't do. Pentu drives your actual signup and onboarding (multi-step wizards, "choose your role" modals, "add your first website" flows) with an agentic loop that reads the DOM, decides the next action, and performs it. Nothing is hardcoded per app, so the same loop works on arbitrary apps.
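The observe, decide, act cycle can be sketched as follows. This is only an illustration of the loop's shape: `FakeWizard` stands in for a real browser, and `rule_based_decide` stands in for the model that actually chooses actions. All names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "fill", "click", or "done"
    target: str = ""


@dataclass
class FakeWizard:
    """Toy signup flow: email screen, role screen, then the dashboard."""
    screen: str = "email"
    email_filled: bool = False
    history: list = field(default_factory=list)

    def observe(self) -> tuple[str, bool]:
        return (self.screen, self.email_filled)

    def act(self, action: Action) -> None:
        self.history.append((action.kind, action.target))
        if action.kind == "fill":
            self.email_filled = True
        elif action.kind == "click":
            self.screen = {"email": "role", "role": "dashboard"}[self.screen]


def rule_based_decide(observation: tuple[str, bool]) -> Action:
    """Stand-in policy; a real agent would reason over the DOM instead."""
    screen, email_filled = observation
    if screen == "email":
        return Action("click", "submit") if email_filled else Action("fill", "email")
    if screen == "role":
        return Action("click", "role-developer")
    return Action("done")


def run_onboarding(observe, decide, act, max_steps: int = 20) -> bool:
    """Loop until the policy reports it is done or the step budget runs out."""
    for _ in range(max_steps):
        action = decide(observe())
        if action.kind == "done":
            return True
        act(action)
    return False


wizard = FakeWizard()
finished = run_onboarding(wizard.observe, rule_based_decide, wizard.act)
```

The step budget (`max_steps`) is an assumption added so the sketch always terminates.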
When your app sends a verification link, magic link, or OTP, Pentu reads its own inbox (a controlled mailbox on our infrastructure) to complete the flow. It signs up two separate accounts so that it can later test whether one tenant can reach another's data.
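As a rough illustration of the inbox step, the sketch below pulls either a verification link or a six-digit code out of an email body. The patterns and the `extract_verification` helper are assumptions for this example; real emails vary widely, and Pentu's actual parsing is not documented here.

```python
import re
from typing import Optional

# Assumed heuristics: six-digit codes, and links whose URL hints at verification.
OTP_PATTERN = re.compile(r"\b(\d{6})\b")
LINK_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
LINK_HINTS = ("verify", "confirm", "magic", "token")


def extract_verification(body: str) -> Optional[tuple[str, str]]:
    """Return ("link", url) or ("otp", code), preferring links; None if neither."""
    for url in LINK_PATTERN.findall(body):
        if any(hint in url.lower() for hint in LINK_HINTS):
            # Trim punctuation that often trails a URL in running text.
            return ("link", url.rstrip(".,)"))
    match = OTP_PATTERN.search(body)
    if match:
        return ("otp", match.group(1))
    return None


otp_result = extract_verification("Your code is 482913. It expires in 10 minutes.")
link_result = extract_verification(
    "Click https://app.example.test/verify?token=abc123 to confirm."
)
no_result = extract_verification("Welcome aboard!")
```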
3. Explore the product, logged in
Now inside, it clicks through the real app and captures the authenticated surface — the endpoints, the object IDs, the settings, the API keys. This is the surface that actually matters, and it's invisible from the outside.
4. Understand what your product is
Pentu reasons over everything it has seen and writes a deep, specific description of your product: what it does, who uses it, and — most importantly — where the sensitive actions are (billing, permissions, data exports, API keys). A tester who understands the business finds the bugs that matter.
5. Test relentlessly (explore → hypothesize → test → repeat)
The tester doesn't plan everything up front. It starts from a handful of leads, then works like a human: poke at something, watch the response, form a hypothesis, test it, learn, and follow the thread. It keeps notes so it can chain findings ("here's account A's object ID → now try it as account B").
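The chained check in that example ("account A's object ID, tried as account B") boils down to comparing two responses. A minimal sketch, assuming a simplified `Response` record and a deliberately strict rule (both hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    status: int
    body: str


def is_cross_tenant_leak(as_owner: Response, as_other: Response) -> bool:
    """True when account B receives the same successful payload as owner A."""
    owner_ok = 200 <= as_owner.status < 300
    other_ok = 200 <= as_other.status < 300
    return owner_ok and other_ok and as_owner.body == as_other.body


# Same object ID requested under two different sessions.
owner_view = Response(200, '{"invoice": 1042, "total": "99.00"}')
leaked_view = Response(200, '{"invoice": 1042, "total": "99.00"}')
denied_view = Response(403, '{"error": "forbidden"}')

leak = is_cross_tenant_leak(owner_view, leaked_view)
no_leak = is_cross_tenant_leak(owner_view, denied_view)
```

A real tester would also look at partial leaks and differing-but-sensitive bodies; exact equality is used here only to keep the rule unambiguous.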
Between short bursts, a principal-tester reviewer reads the entire transcript, names what was missed or given up on too easily, and sends the tester back with specific new probes. Testing stops only when the reviewer is satisfied, so depth scales with how interesting your app turns out to be.
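The burst-and-review rhythm can be sketched as a simple loop: the tester works through its pending probes, the reviewer reads the whole transcript, and an empty list of new probes means the reviewer is satisfied. The function names, the stub tester and reviewer, and the `max_rounds` safety cap are all assumptions for illustration.

```python
from typing import Callable


def run_engagement(
    leads: list[str],
    run_burst: Callable[[list[str]], list[str]],
    review: Callable[[list[str]], list[str]],
    max_rounds: int = 10,
) -> list[str]:
    """Alternate tester bursts and reviewer passes until no new probes remain."""
    transcript: list[str] = []
    pending = list(leads)
    for _ in range(max_rounds):
        if not pending:
            break  # reviewer returned nothing new: it is satisfied
        transcript.extend(run_burst(pending))
        pending = review(transcript)  # reviewer sees the entire transcript
    return transcript


def stub_burst(probes: list[str]) -> list[str]:
    return [f"tested {probe}" for probe in probes]


def stub_review(transcript: list[str]) -> list[str]:
    # Stand-in reviewer: insists on one follow-up the tester skipped.
    if "tested export as account B" not in transcript:
        return ["export as account B"]
    return []


transcript = run_engagement(["invoice IDs", "role field"], stub_burst, stub_review)
```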
And it proves things. When it finds a privilege-escalation vector, it doesn't stop at "the field was accepted" — it escalates the account and then performs an action only an admin should be able to do, to demonstrate real impact.
6. Verify (low false positives)
Every candidate finding carries exact reproduction steps. Before anything reaches the report, Pentu replays those steps deterministically, and a finding that doesn't reproduce is discarded. Findings confirmed by a tool (like a template match or a time-based SQL probe) pass straight through, because the tool's result is the proof.
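The filtering rule described above can be sketched in a few lines. `Candidate`, `verify`, and the stub `replay` function are hypothetical names; in practice the replay would re-issue the recorded requests.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    title: str
    steps: list[str]
    tool_confirmed: bool = False


def verify(
    candidates: list[Candidate],
    replay: Callable[[list[str]], bool],
) -> list[Candidate]:
    """Keep tool-confirmed findings and those whose steps reproduce on replay."""
    confirmed = []
    for candidate in candidates:
        if candidate.tool_confirmed or replay(candidate.steps):
            confirmed.append(candidate)
    return confirmed


def stub_replay(steps: list[str]) -> bool:
    # Stand-in: pretend only this one sequence reproduces.
    return steps == ["login as B", "GET /api/invoices/1042"]


candidates = [
    Candidate("Cross-tenant invoice read", ["login as B", "GET /api/invoices/1042"]),
    Candidate("Flaky redirect", ["GET /old-path"]),
    Candidate("Time-based SQL injection", ["POST /search"], tool_confirmed=True),
]

report = [c.title for c in verify(candidates, stub_replay)]
```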
7. Report
Finally it scores the result, writes the findings with business impact, reproduction, and a ready-to-paste fix prompt, and delivers an interactive report, a PDF, a slideshow, an agent-friendly Markdown version, and an email — all on completion. See Reports & score.
The models behind it
- Reasoning (Opus 4.8): the deep thinking, covering product understanding, planning, the principal-tester review, and writing up findings the way a human would present them.
- Executor (Sonnet 5): the relentless tool loop that actually issues the requests and chains findings.
- Grunt (Haiku 4.5): cheap, high-volume reading tasks.
Prompt caching keeps the AI cost of a full scan at roughly $1.50 to $2.00, and every scan reports its exact cost, broken down by model.
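A per-model breakdown is simple arithmetic over token counts. The sketch below shows the shape of that calculation and why cached input matters; the rates in `RATES` are made-up placeholders, not real prices, and the `Usage` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    model: str
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int


# PLACEHOLDER USD rates per million tokens: (input, cached input, output).
# These numbers are invented for the example and are not actual pricing.
RATES = {
    "reasoning": (10.0, 1.0, 50.0),
    "executor": (2.0, 0.25, 10.0),
    "grunt": (1.0, 0.125, 5.0),
}


def cost_by_model(usages: list[Usage]) -> dict[str, float]:
    """Sum the cost of each usage record under its model's rates."""
    totals: dict[str, float] = {}
    for usage in usages:
        rate_in, rate_cached, rate_out = RATES[usage.model]
        cost = (
            usage.input_tokens * rate_in
            + usage.cached_input_tokens * rate_cached
            + usage.output_tokens * rate_out
        ) / 1_000_000
        totals[usage.model] = totals.get(usage.model, 0.0) + cost
    return totals


breakdown = cost_by_model([Usage("executor", 100_000, 900_000, 20_000)])
```

With these placeholder rates, the 900,000 cached tokens cost about as much as the 100,000 uncached ones, which is the effect prompt caching is relied on for.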
Built to never hang
Long-running browser work is bounded at every phase, and a scan-level watchdog guarantees a scan can't get stuck forever. If anything stalls, the scan fails cleanly with a diagnostic naming the exact phase, and a self-healing reaper auto-clears any hung job. You can also cancel a running scan at any time.
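One common way to get this behavior is nested timeouts: a limit per phase, plus an outer limit on the whole scan. The sketch below uses Python's `asyncio.wait_for` to show the idea; it is an assumption about the general technique, not a description of Pentu's implementation, and the phase names and limits are invented.

```python
import asyncio


class PhaseTimeout(Exception):
    """Raised when a single phase exceeds its own time limit."""

    def __init__(self, phase: str, limit: float):
        super().__init__(f"phase '{phase}' exceeded {limit}s")
        self.phase = phase


async def run_phase(name: str, coro, limit: float):
    try:
        return await asyncio.wait_for(coro, timeout=limit)
    except asyncio.TimeoutError:
        # Re-raise with the phase name so the diagnostic is specific.
        raise PhaseTimeout(name, limit) from None


async def run_scan(phases, scan_limit: float) -> str:
    async def all_phases() -> str:
        for name, make_coro, limit in phases:
            await run_phase(name, make_coro(), limit)
        return "completed"

    try:
        # Outer watchdog: bounds the scan even if every phase limit is generous.
        return await asyncio.wait_for(all_phases(), timeout=scan_limit)
    except PhaseTimeout as exc:
        return f"failed: {exc}"
    except asyncio.TimeoutError:
        return "failed: scan-level watchdog fired"


async def fast() -> None:
    await asyncio.sleep(0)


async def stuck() -> None:
    await asyncio.sleep(10)  # simulates a stalled browser step


result = asyncio.run(
    run_scan([("map", fast, 1.0), ("signup", stuck, 0.05)], scan_limit=5.0)
)
```

Here the stalled phase is cut off after its own limit and the scan reports which phase failed, rather than hanging until the outer watchdog fires.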