Overview
ISMS-P Assist is a reference and case study on how to engineer an LLM agent harness for a domain where wrong answers are expensive.
The vehicle is ISMS-P (Korea’s Information Security & Privacy Management System certification, 102 criteria). Claude Code reviews code, infrastructure, and documents through a multi-role pipeline (domain experts → interviewer → auditor) and outputs the results as a single self-contained HTML report with zero external dependencies.
The value is not the ISMS-P tool itself but the reusable harness patterns behind it: evidence router, deterministic/LLM split, agents that build agents, information-boundary guardrail, abstention-first, and more. All 10 patterns are cataloged in docs/PATTERNS.en.md.
⚠️ This is an ISMS-P self-assessment / harness reference. It does not guarantee KISA’s official certification results. For legal judgments, consult KISA’s official guide and the source regulations.
Architecture
Rather than trying to “finish everything with code,” the harness closes out each criterion with its strongest available evidence. A 4-tier evidence router sits at the center.
```mermaid
flowchart TD
U([User]) -->|service profile| SETUP[isms-setup: intake]
SETUP -->|project.json| GEN[make_experts.py: generate experts]
GEN -. generates .-> EXPERTS
ROUTER[router.py: 4-tier routing<br/>102 criteria, A/B/C/D]
subgraph EXPERTS[Domain experts — generated, run in parallel]
BE[BE expert]
DO[DevOps expert]
FE[FE expert]
MOB[iOS / AOS expert]
end
ROUTER -->|assigns criteria| EXPERTS
BE -. uses .-> SCAN[scan.py: deterministic rules]
EXPERTS -->|evidence + open questions| STORE[(evidence_store<br/>versioned, append-only)]
EXPERTS -->|ambiguous / non-code| INT[Interviewer<br/>asks user, demands artifacts + sign-off]
INT --> U
INT --> STORE
STORE --> AUD[Auditor: judge from evidence only]
AUD --> REP[build_report.py: self-contained HTML]
GUARD{{Guardrail: NO external research on the audit target}}
GUARD -. constrains .-> EXPERTS
GUARD -. constrains .-> AUD
classDef det fill:#e7f0fd,stroke:#1a56c4;
classDef human fill:#fff4cc,stroke:#8a6d00;
classDef guard fill:#fce8e6,stroke:#b3261e;
class SCAN,ROUTER,STORE,REP det;
class INT,U human;
class GUARD guard;
```
Blue = deterministic (code/rules) · Yellow = human (interview) · Red = guardrail.
4-Tier Evidence Router
Each criterion is routed to its strongest source of truth (counts as measured by router.py):
| Tier | Evidence Source | Criteria | Tools |
|---|---|---|---|
| A Code/IaC | Static analysis | 35 | scan.py · Checkov · OPA |
| B Runtime | Read-only cloud API | 15 | Prowler (KISA-ISMS-P pack) · Steampipe |
| C Op records | Logs · registers · history upload | 9 | evidence.py |
| D Interview | Conversational Q + evidence + sign-off | 43 | isms-interview · interview.py |
→ Automated (A+B): 50 of 102 criteria (49%) · Interview (D): 43 of 102 (42%). The interviewer is nearly half the product. Only about half of the criteria can be closed out by code; most of the rest require evidence from a human. The key is not to hide this but to make it explicit in the architecture.
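The percentages above follow directly from the tier counts in the table. A minimal sketch of that arithmetic is shown here; the counts are copied from the table, while `TIER_COUNTS` and `tier_shares` are illustrative names and are not taken from router.py.

```python
# Tier counts copied from the table above; the names are hypothetical.
TIER_COUNTS = {"A": 35, "B": 15, "C": 9, "D": 43}


def tier_shares(counts):
    """Return each tier's share of all criteria as a whole-number percentage."""
    total = sum(counts.values())
    return {tier: round(100 * n / total) for tier, n in counts.items()}


total = sum(TIER_COUNTS.values())
shares = tier_shares(TIER_COUNTS)
# Automated share is computed on the combined A+B count, not by adding
# the two rounded per-tier percentages.
automated = round(100 * (TIER_COUNTS["A"] + TIER_COUNTS["B"]) / total)

print(f"total={total} automated={automated}% interview={shares['D']}%")
```

Computing the automated share from the combined count avoids the small drift that comes from summing two already-rounded percentages.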
Reusable Harness Patterns
What remains after you strip away ISMS-P is a catalog of agent design patterns for high-stakes domains (full catalog in docs/PATTERNS.en.md):
| Pattern | Core | Implementation |
|---|---|---|
| Evidence router | Route work to its strongest source of truth (deterministic / runtime / records / human) | router.py |
| Deterministic/LLM split | Rule-decidable cases in code, judgment-needed cases via LLM | scan.py |
| Agents that build agents | Intake profile → generate target-specific expert skills | make_experts.py |
| Information-boundary guardrail | No external research about the audit target (prevents contamination/hallucination) | docs/AGENTS.md |
| Abstention-first | When unsure, needs-review. Reliability via architecture, not prompting | docs/RELIABILITY.md |
| Versioned evidence store | Append-only, source and timestamp traceable | store.py |
| Self-verifying harness | Scan results carry confidence + false-positive reasoning → second pass removes false positives | scan.py |
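Two of the patterns in the table, the versioned evidence store and abstention-first, can be sketched together. This is a sketch under stated assumptions, not the project's implementation: store.py and docs/RELIABILITY.md are not reproduced here, so `EvidenceRecord`, `EvidenceStore`, `verdict_input`, and the criterion codes are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a stored record cannot be mutated afterwards
class EvidenceRecord:
    criterion: str
    source: str
    content: str
    version: int
    recorded_at: str


class EvidenceStore:
    """Append-only: records are added, never updated or deleted."""

    def __init__(self):
        self._records = []

    def append(self, criterion, source, content):
        # The version number counts prior records for the same criterion.
        version = 1 + sum(1 for r in self._records if r.criterion == criterion)
        record = EvidenceRecord(
            criterion, source, content, version,
            datetime.now(timezone.utc).isoformat(),
        )
        self._records.append(record)
        return record

    def history(self, criterion):
        return [r for r in self._records if r.criterion == criterion]


def verdict_input(store, criterion):
    """Abstention-first: with no evidence on file, return needs-review."""
    records = store.history(criterion)
    return "needs-review" if not records else "ready-for-audit"


store = EvidenceStore()
store.append("crit-1", "scan.py", "weak hash candidate at app.py:12")
latest = store.append("crit-1", "interview", "owner confirmed migration plan")
print(latest.version, verdict_input(store, "crit-2"))
```

Because a correction is a new version rather than an overwrite, an auditor can always trace which source said what, and when.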
Verification (Hybrid)
Each criterion has a classified verification method (knowledge/criteria.json):
- Code (17): Inspect source/config directly. e.g. password hashing, encryption, logging, access control, masking, dependency vulnerabilities
- Mixed (39): Code + documentary evidence together
- Document (46): Administrative criteria invisible to code → collected via evidence templates. e.g. executive involvement, training, risk assessment
The code scanner detects weak hashes, hardcoded keys, SQL injection, plaintext logging, vulnerable dependencies, etc. at file:line, attaching confidence, surrounding context, and false-positive reasoning to each candidate. These are defect candidates only; the final verdict comes from Claude or a reviewer reading the code.
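The candidate shape described above (file:line, confidence, context, false-positive reasoning) might look like the following. This is a minimal sketch with a single weak-hash rule; the regex, the confidence values, the checksum heuristic, and all names are assumptions for illustration and do not reflect the actual rules in scan.py.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: flag direct calls to MD5 or SHA-1 via hashlib.
WEAK_HASH = re.compile(r"hashlib\.(md5|sha1)\(")


@dataclass
class Candidate:
    file: str
    line: int
    rule: str
    confidence: float
    context: str
    false_positive_reason: str


def scan_text(path, text):
    """Return defect candidates only; a reader still decides the verdict."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not WEAK_HASH.search(line):
            continue
        # Illustrative heuristic: checksum-like uses are less likely defects.
        is_checksum = "checksum" in line.lower() or "etag" in line.lower()
        findings.append(Candidate(
            file=path,
            line=number,
            rule="weak-hash",
            confidence=0.4 if is_checksum else 0.8,
            context=line.strip(),
            false_positive_reason=(
                "may be a non-security checksum" if is_checksum else ""
            ),
        ))
    return findings


SAMPLE = '''import hashlib
pw = hashlib.md5(password.encode()).hexdigest()
checksum = hashlib.sha1(blob).hexdigest()
'''

for c in scan_text("app.py", SAMPLE):
    print(f"{c.file}:{c.line} {c.rule} conf={c.confidence} {c.false_positive_reason}")
```

Carrying the false-positive reasoning on the candidate itself gives a second pass something concrete to confirm or reject.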
Claude Code Skills
| Skill | Description |
|---|---|
| isms-setup | Service intake interview → generate target-specific expert skills |
| isms-collect | Orchestration: domain experts → evidence store → interview → audit |
| isms-review | Check compliance against 102 criteria (code inspection + documentary evidence) |
| isms-audit | Mock audit from a certified auditor’s perspective → defect report |
| isms-interview | Conversationally collect evidence for code-invisible criteria (attachment + sign-off) |
| isms-qa | Answer questions on criteria, evidence, structure |
Web Report
Review/audit results are output as a single HTML file (opens with no internet). It has two tabs:
- Review/Audit Results tab — verdicts, compliance scores, per-domain progress bars, status color badges, filter/search, print→PDF. Click a criterion code to jump to and highlight it in the All Criteria tab.
- All Criteria tab — the 102 criteria source (summary, checklist items, evidence, verification method) + data date and KISA official source links. You can cross-check the tool’s encoded interpretation against the source in the other tab.
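One way to get a report that opens with no internet is to inline the styles and data into the page. The sketch below shows that idea only; it is not build_report.py, and the function name, the result fields (`code`, `status`), and the markup are hypothetical.

```python
import html
import json


def build_report(results):
    """Render results into one HTML string with no external references."""
    rows = "\n".join(
        f"<tr><td>{html.escape(r['code'])}</td>"
        f"<td>{html.escape(r['status'])}</td></tr>"
        for r in results
    )
    # Embed the raw data for in-page filtering; escape "</" so the JSON
    # cannot terminate the script element early.
    data = json.dumps(results).replace("</", "<\\/")
    return f"""<!doctype html>
<html><head><meta charset="utf-8"><title>Report</title>
<style>td{{padding:4px 8px}}</style></head>
<body><table>
{rows}
</table>
<script type="application/json" id="data">{data}</script>
</body></html>"""


page = build_report([
    {"code": "crit-1", "status": "needs-review"},
    {"code": "crit-2", "status": "pass"},
])
print(len(page))
```

Writing `page` to a file yields a document that a browser can open from disk and print to PDF without fetching anything.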
Honest Limits
The harness skeleton works (smoke CI). Golden-set regression, source grounding (RAG), and runtime checks are at the design/partial stage.
For a high-stakes-domain tool, the most important thing is to state precisely “what works and what doesn’t yet.” The non-inflated status is documented in PATTERNS.en.md.
Tech Stack
| Category | Tech | Why |
|---|---|---|
| Language | Python 3 (zero deps) | Runs anywhere, no extra install |
| Agent Runtime | Claude Code (skills · subagents) | Multi-role orchestration |
| Static analysis | scan.py · Checkov · OPA | Deterministic code/IaC checks |
| Runtime checks | Prowler (KISA-ISMS-P pack) · Steampipe | Read-only cloud API |
| Report | Single HTML (zero external deps) | Opens anywhere, print→PDF |
| License | Apache-2.0 | Open source |
Data Source
- KISA “Information Security & Privacy Management System Certification Criteria Guide” (2023.11 revision)
- Domain 1: 16 / Domain 2: 64 / Domain 3: 22 = 102 criteria
- Source metadata & links: meta.sources in knowledge/criteria.json