A trustworthy workflow needs to show its controls, not just its result.
AI-assisted delivery changes the review problem. The output can look complete, confident, and well-structured before anyone has proved that the right risks were checked. In a payments system, that gap is not cosmetic. It can mean a forged webhook, a replayed event, a misleading payout status, or a release owner approving a change without seeing the residual risk.
The evidence pack is the control surface: a structured record that ties each release risk to the proof required to ship.
```mermaid
flowchart LR
    A[Confident AI-assisted PR] --> B{Controls visible?}
    B -->|No| C[Reviewer infers safety]
    B -->|Yes| D[Reviewer inspects evidence]
    C --> E[Hidden release risk]
    D --> F[Auditable release decision]
```
The value is not a better PR description. The value is changing the release conversation from “does this look right?” to “which risks are controlled, which risks remain, and what evidence supports the decision?”
| Pain | Without Evidence Pack | With Evidence Pack |
|---|---|---|
| Reviewers infer risk from scattered comments | Slow, inconsistent review | Risk, controls, and evidence are explicit |
| Security asks repeat questions late | Release delay | Required proof is attached before approval |
| Release owner cannot see residual risk | Binary ship/no-ship judgment | Staged release with rollback triggers |
| Audit trail is reconstructed after the fact | Expensive incident review | Decision record exists at merge time |
| Agent-assisted work looks confident but opaque | Low trust | Agent findings become structured review signals |

| Operating Question | Without This Pattern | With This Pattern |
|---|---|---|
| Which AI-assisted changes are safe to delegate? | Debated case by case | Tracked by risk class and evidence completeness |
| Where do reviews get stuck? | Anecdotal | Visible missing-evidence categories |
| Which controls are repeatedly absent? | Found late by senior reviewers | Aggregated across packs |
| Can release owners trust agent output? | Only after a manual re-review | Only when required controls are evidenced |
| Can incidents be reconstructed? | Pull comments, logs, and memory | Start from the release evidence record |
Most PR tooling shows checks. This pack shows judgment.
| Usual PR Surface | Evidence Pack Surface |
|---|---|
| CI passed or failed | Which risk each check controls |
| Reviewer comments | Review signals classified by severity |
| Deployment status | Rollout stage, rollback trigger, residual risk |
| Security approval | Proof required for that approval |
| PR description | Machine-readable release evidence |
The differentiating claim is scale. A senior engineer can build one good PR checklist. A governance system turns every high-risk PR into the same queryable evidence object, so review, release readiness, audit, and agent evaluation all read from the same record.
```mermaid
flowchart LR
    A[PR diff] --> B[Risk classifier]
    B --> C[Required evidence]
    C --> D[Review signals]
    D --> E[Release decision]
    E --> F[Audit-ready record]
```
The artifact below is the output. The system value comes from producing that output consistently without asking every team to invent a release packet by hand.
```mermaid
flowchart LR
    P[Policy sources] --> C[Control catalog]
    O[Ownership + service metadata] --> R[Risk classifier]
    D[PR diff] --> R
    R --> E[Evidence requirements]
    CI[CI results] --> EP[Evidence pack]
    RT[Review threads] --> EP
    AG[Agent review findings] --> EP
    E --> EP
    EP --> PR[PR review surface]
    EP --> RD[Release decision]
    EP --> AU[Audit record]
    EP --> EV[Agent evaluation]
```
```yaml
system_capabilities:
  classify:
    input: pr_diff + service_metadata + policy_catalog
    output: risk_class + required_controls
  collect:
    input: ci_results + review_threads + agent_findings + rollout_plan
    output: normalized_evidence_record
  enforce: high_risk_pr_requires_complete_required_evidence
  aggregate:
    - recurring_missing_controls
    - review_bottlenecks
    - agent_false_confidence_patterns
    - exception_frequency
```
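The `classify` and `enforce` capabilities can be sketched in a few lines. This is a minimal illustration, not the system's implementation: the path prefixes, control names, and the `POLICY_CATALOG` structure are all hypothetical, and a real classifier would also read service metadata.

```python
from dataclasses import dataclass

# Hypothetical policy catalog: risk class -> controls that must be evidenced.
POLICY_CATALOG = {
    "high": {"negative_signature_tests", "replay_tests", "rollback_plan", "security_approval"},
    "low": {"unit_tests"},
}

# Hypothetical rule: a change touching any of these paths is high risk.
HIGH_RISK_PATH_PREFIXES = ("payments/webhooks/", "payments/ledger/")


@dataclass
class Classification:
    risk_class: str
    required_controls: set


def classify(changed_paths):
    """Assign a risk class from the changed paths and look up required controls."""
    is_high = any(path.startswith(HIGH_RISK_PATH_PREFIXES) for path in changed_paths)
    risk_class = "high" if is_high else "low"
    return Classification(risk_class, set(POLICY_CATALOG[risk_class]))


def missing_evidence(classification, evidenced_controls):
    """Return the required controls that have no attached evidence yet."""
    return classification.required_controls - set(evidenced_controls)


def may_merge(classification, evidenced_controls):
    """enforce: a high-risk PR requires complete required evidence."""
    if classification.risk_class != "high":
        return True
    return not missing_evidence(classification, evidenced_controls)
```

Keeping `missing_evidence` separate from `may_merge` means the same gap list that blocks a merge can also feed the `aggregate` view of recurring missing controls.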
| Claim | Status | What Would Close It |
|---|---|---|
| The evidence categories are useful | Demonstrated by representative pack | Run against multiple real high-risk PRs |
| The schema can express release judgment | Demonstrated by representative payments example | Validate against actual review threads and CI output |
| The system can generate packs repeatedly | Product hypothesis | Automated generator wired to PR metadata, CI, and review comments |
| Aggregation creates governance value | Product hypothesis | Dashboard of missing controls, exceptions, and agent review gaps |
| The public artifact is safe to inspect | Demonstrated | Keep raw identifiers and private implementation detail out of public view |

| Field | Why It Exists | Failure Without It |
|---|---|---|
| Risk class | Selects the evidence required before review | Same checklist for low-risk and high-risk work |
| Changed controls | Shows what protection actually changed | Review focuses on code shape, not safety impact |
| Unchanged contracts | Protects downstream assumptions | Review misses accidental behavior drift |
| Validation map | Links checks to risks | CI passes without proving the relevant control |
| Review signals | Captures what humans and agents challenged | Important objections disappear into comment history |
| Release decision | Records why ship, block, defer, or stage was chosen | Approval becomes an undocumented judgment call |
| Rollback triggers | Makes staged release operational | “Rollback available” remains vague |
| Disclosure limits | Separates proof from sensitive detail | Public evidence overexposes or becomes unusably vague |

| Approach | Why It Fails |
|---|---|
| Longer PR template | Adds prose but not structured evidence |
| More CI checks | Shows pass/fail, not release judgment |
| Security review label | Shows approval, not what proof supported it |
| Post-merge audit note | Too late to guide the release decision |
| Agent summary only | Can repeat confidence without proving controls |

| Field | Value |
|---|---|
| Pattern | Risk-to-evidence release pack |
| Domain | Payments platform |
| Change | Webhook signature rotation for payout events |
| Risk class | High integrity, medium availability |
| Primary failure mode | Incorrect payout status from unauthenticated or stale webhook |
| Release decision | Ship behind staged rollout |
| Public posture | Sanitized, representative |
```mermaid
flowchart LR
    PSP[Payment Provider] -->|signed payout webhook| API[Webhook API]
    API --> V[Signature Verifier]
    V --> Q[Payout Event Queue]
    Q --> W[Worker]
    W --> DB[(Ledger Status)]
    DB --> OPS[Operations View]
    V --> AUD[(Audit Log)]
```
```yaml
change:
  summary: Rotate payout webhook signature verification from static shared secret to versioned key set.
  added:
    - key_id_header_validation
    - dual_key_verification_window
    - timestamp_tolerance_check
    - replay_nonce_store
    - audit_event_for_rejected_webhooks
  changed:
    - payout_webhook_handler
    - provider_webhook_config
    - payout_event_validation_tests
  not_changed:
    - payout_execution_logic
    - ledger_write_contract
    - operations_status_schema
```
```mermaid
flowchart TD
    A[Incoming webhook] --> B{Known key id?}
    B -->|No| X[Reject + audit]
    B -->|Yes| C{Timestamp valid?}
    C -->|No| X
    C -->|Yes| D{Nonce unused?}
    D -->|No| X
    D -->|Yes| E{Signature valid?}
    E -->|No| X
    E -->|Yes| F[Accept payout event]
    F --> G[Queue processing]
```
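The verification flow above can be sketched as a single function that applies the four checks in order and audits every rejection. Everything concrete here is an assumption made for illustration: the HMAC-SHA256 scheme, the signed message layout, the key ids, the 300-second tolerance, and the in-memory nonce set standing in for a real replay nonce store.

```python
import hashlib
import hmac
import time

# Hypothetical versioned key set. During the rotation window both keys verify.
KEY_SET = {"k1": b"old-secret", "k2": b"new-secret"}
TIMESTAMP_TOLERANCE_SECONDS = 300  # assumed value

seen_nonces = set()  # in-memory stand-in for the replay nonce store
audit_log = []       # stand-in for the audit log


def sign(secret, timestamp, nonce, body):
    """Assumed signing scheme: HMAC-SHA256 over 'timestamp.nonce.body'."""
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def reject(reason, key_id):
    # Record the reason and key id only, never the payload.
    audit_log.append({"reason": reason, "key_id": key_id})
    return False


def verify_webhook(key_id, timestamp, nonce, signature, body, now=None):
    """Apply the checks in diagram order and audit every rejection."""
    now = time.time() if now is None else now
    secret = KEY_SET.get(key_id)
    if secret is None:
        return reject("unknown_key_id", key_id)
    if abs(now - timestamp) > TIMESTAMP_TOLERANCE_SECONDS:
        return reject("stale_timestamp", key_id)
    if nonce in seen_nonces:
        return reject("nonce_reused", key_id)
    expected = sign(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):  # constant-time comparison
        return reject("bad_signature", key_id)
    seen_nonces.add(nonce)  # stored only after the signature verifies
    return True
```

The nonce is recorded only after the signature check passes, so a forged request cannot consume a nonce that a legitimate delivery still needs.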
| Control Area | Requirement | Evidence |
|---|---|---|
| Integrity | Reject forged payout webhooks | Negative signature tests |
| Replay protection | Reject duplicate webhook deliveries outside allowed retry model | Nonce-store tests |
| Availability | Preserve provider retry compatibility during rotation window | Dual-key rollout plan |
| Auditability | Record rejected webhook reason without sensitive payload leakage | Audit event schema |
| Rollback | Restore previous verification key without code redeploy | Versioned key config |
| Human approval | Security-sensitive payout path requires explicit approval | Release decision record |

| Risk | Severity | Control | Residual Status |
|---|---|---|---|
| Forged webhook marks payout complete | High | Signature + key id verification | Controlled |
| Replay changes payout status twice | High | Nonce store + idempotent worker | Controlled |
| Clock skew rejects valid provider events | Medium | Timestamp tolerance + alert | Accepted |
| Key rotation breaks live provider callbacks | Medium | Dual-key window + staged rollout | Controlled |
| Logs expose sensitive payload | Medium | Redacted audit schema | Controlled |

| Evidence Class | Artifact | Result |
|---|---|---|
| Unit tests | signature verifier accepts valid current key | Pass |
| Unit tests | signature verifier rejects unknown key id | Pass |
| Unit tests | verifier rejects stale timestamp | Pass |
| Unit tests | replay nonce cannot be reused | Pass |
| Integration tests | provider retry remains idempotent | Pass |
| Contract tests | accepted event schema unchanged | Pass |
| Static review | no secrets written to logs | Pass |
| Manual review | rollout and rollback plan inspected | Pass |
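A validation map is what links rows like these back to the risk register. The sketch below shows one way to flag risks whose evidence is missing or failing. The pairing of each risk with specific artifacts is an assumption made for illustration; the tables list risks and artifacts but do not state which artifact controls which risk.

```python
# Assumed pairing of risks to evidence artifacts (names taken from the tables).
VALIDATION_MAP = {
    "forged_webhook_marks_payout_complete": [
        "signature verifier rejects unknown key id",
    ],
    "replay_changes_payout_status": [
        "replay nonce cannot be reused",
        "provider retry remains idempotent",
    ],
    "key_rotation_breaks_callbacks": [
        "rollout and rollback plan inspected",
    ],
    "logs_expose_sensitive_payload": [
        "no secrets written to logs",
    ],
}

# Results as reported in the evidence table.
RESULTS = {
    "signature verifier rejects unknown key id": "pass",
    "replay nonce cannot be reused": "pass",
    "provider retry remains idempotent": "pass",
    "rollout and rollback plan inspected": "pass",
    "no secrets written to logs": "pass",
}


def uncontrolled_risks(validation_map, results):
    """Return risks with no mapped evidence, or with any evidence not passing."""
    return sorted(
        risk
        for risk, artifacts in validation_map.items()
        if not artifacts or any(results.get(a) != "pass" for a in artifacts)
    )
```

A risk with an empty artifact list is reported as uncontrolled, which is the case a green CI run alone would hide.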
```mermaid
sequenceDiagram
    participant Dev as Author
    participant Agent as Review Agent
    participant CI as CI
    participant Sec as Security Reviewer
    participant Rel as Release Owner
    Dev->>CI: Open PR
    CI-->>Dev: Unit + integration pass
    Agent-->>Dev: Flag missing replay evidence
    Dev->>CI: Add nonce reuse tests
    CI-->>Dev: Replay tests pass
    Sec-->>Dev: Ask for log redaction proof
    Dev->>Sec: Add audit schema evidence
    Rel-->>Dev: Approve staged rollout
```
| Review Signal | Severity | Response |
|---|---|---|
| Replay test missing | Blocker | Added nonce reuse tests |
| Redaction proof unclear | Blocker | Added audit event schema check |
| Rollback path too implicit | Required clarification | Added key config rollback step |
| Provider retry behavior | Question | Linked to idempotency evidence |
```yaml
release_decision:
  decision: approve_staged_rollout
  approvers:
    engineering_owner: approved
    security_reviewer: approved
    release_owner: approved
  required_conditions:
    - dual_key_window_enabled
    - rejected_webhook_alert_enabled
    - rollback_key_available
    - audit_log_redaction_verified
  rollout:
    stage_1: internal_provider_test_endpoint
    stage_2: five_percent_live_callbacks
    stage_3: full_provider_traffic
  rollback:
    trigger:
      - rejected_webhook_rate_above_threshold
      - payout_status_lag_above_threshold
      - provider_retry_spike
    action: restore_previous_key_config
```
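The rollback triggers only become operational once each one has a number behind it. A minimal sketch of that evaluation follows. The threshold values and metric names are hypothetical, since the record names the triggers but does not publish their values.

```python
# Hypothetical thresholds, one per rollback trigger in the release decision.
THRESHOLDS = {
    "rejected_webhook_rate": 0.02,      # fraction of callbacks rejected
    "payout_status_lag_seconds": 120,   # delay before ledger status updates
    "provider_retry_rate": 0.10,        # fraction of callbacks that are retries
}


def breached_triggers(metrics):
    """Return the names of all triggers whose metric exceeds its threshold.

    A metric absent from `metrics` is treated as 0 here; a stricter variant
    could treat missing telemetry as a breach in its own right.
    """
    return [
        name
        for name, limit in THRESHOLDS.items()
        if metrics.get(name, 0.0) > limit
    ]


def rollback_action(metrics):
    """Return the rollback action if any trigger is breached, else None."""
    if breached_triggers(metrics):
        return "restore_previous_key_config"
    return None
```

Returning the breached trigger names, and not just a boolean, lets the audit record state why a rollback happened.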
```mermaid
stateDiagram-v2
    [*] --> ReadyForStaging
    ReadyForStaging --> Stage1: controls pass
    Stage1 --> Stage2: no alert breach
    Stage2 --> FullRollout: no alert breach
    Stage1 --> Rollback: alert breach
    Stage2 --> Rollback: alert breach
    FullRollout --> Monitor
    Rollback --> Monitor
```
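The staged rollout above can be expressed as an explicit transition table, so that any move the diagram does not allow fails loudly. This is a sketch under assumptions: the event names are derived from the diagram labels, and `settled` is a made-up name for the two unlabeled transitions into `Monitor`.

```python
# Transition table copied from the state diagram: (state, event) -> next state.
TRANSITIONS = {
    ("ReadyForStaging", "controls_pass"): "Stage1",
    ("Stage1", "no_alert_breach"): "Stage2",
    ("Stage2", "no_alert_breach"): "FullRollout",
    ("Stage1", "alert_breach"): "Rollback",
    ("Stage2", "alert_breach"): "Rollback",
    ("FullRollout", "settled"): "Monitor",  # unlabeled in the diagram
    ("Rollback", "settled"): "Monitor",     # unlabeled in the diagram
}


def next_state(state, event):
    """Return the next rollout state, or raise if the move is not in the diagram."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events, state="ReadyForStaging"):
    """Replay a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state
```

As drawn, the diagram has no rollback edge out of `FullRollout`, so `next_state("FullRollout", "alert_breach")` raises. Whether that gap is intended is a question for the release owner, not something the sketch decides.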
Decision log: shipping without replay protection, audit evidence, or staged rollout was rejected. The accepted path used temporary dual-key support, staged rollout, and rollback triggers; an indefinite dual-key window was rejected because it would expand the attack window.
```json
{
  "id": "sterlingpay-payout-webhook-signature-rotation-v1",
  "domain": "payments",
  "risk_class": "high_integrity_medium_availability",
  "change": {
    "surface": "payout_webhook_ingestion",
    "control_upgrade": "versioned_signature_verification",
    "unchanged_contracts": [
      "payout execution",
      "ledger status schema",
      "operations status view"
    ]
  },
  "governance": {
    "approval_required": true,
    "rollback_required": true,
    "audit_record_required": true,
    "residual_risks": [
      "provider clock skew within tolerance window"
    ]
  },
  "validation": {
    "unit": "pass",
    "integration": "pass",
    "contract": "pass",
    "manual_release_review": "pass"
  },
  "release": {
    "decision": "approve_staged_rollout",
    "rollback_available": true
  }
}
```
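Because the record is machine-readable, a release gate can read it directly. The sketch below checks a pack shaped like the one above. The specific gate rules are assumptions chosen for illustration, not a published policy, and the embedded pack is abbreviated to the fields the gate reads.

```python
import json

# The validation entries this hypothetical gate expects to see as "pass".
REQUIRED_VALIDATION = ("unit", "integration", "contract", "manual_release_review")


def gate(pack):
    """Return a list of problems; an empty list means the pack passes the gate."""
    problems = []
    validation = pack.get("validation", {})
    for name in REQUIRED_VALIDATION:
        if validation.get(name) != "pass":
            problems.append(f"validation.{name} is not pass")
    governance = pack.get("governance", {})
    release = pack.get("release", {})
    if governance.get("rollback_required") and not release.get("rollback_available"):
        problems.append("rollback is required but not available")
    if not release.get("decision"):
        problems.append("release.decision is missing")
    return problems


# Abbreviated copy of the pack, limited to the fields the gate reads.
PACK_JSON = """
{
  "governance": {"approval_required": true, "rollback_required": true},
  "validation": {"unit": "pass", "integration": "pass",
                 "contract": "pass", "manual_release_review": "pass"},
  "release": {"decision": "approve_staged_rollout", "rollback_available": true}
}
"""

pack = json.loads(PACK_JSON)
print(gate(pack))
```

Returning a list of problems, not a single boolean, gives the missing-evidence categories that an aggregate view could count across packs.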
| Limit | Handling |
|---|---|
| Organization label | SterlingPay example |
| Raw PR URL | Withheld |
| Repository and service names | Withheld |
| Provider identifiers | Fictionalized |
| Secrets and key material | Never shown |
| Production metrics | Represented as thresholds only |