Governed PR Evidence Pack

The Risk

A trustworthy workflow needs to show its controls, not just its result.

AI-assisted delivery changes the review problem. The output can look complete, confident, and well-structured before anyone has proved that the right risks were checked. In a payments system, that gap is not cosmetic. It can mean a forged webhook, a replayed event, a misleading payout status, or a release owner approving a change without seeing the residual risk.

The evidence pack is the control surface: a structured record that ties each release risk to the proof required to ship.

flowchart LR
  A[Confident AI-assisted PR] --> B{Controls visible?}
  B -->|No| C[Reviewer infers safety]
  B -->|Yes| D[Reviewer inspects evidence]
  C --> E[Hidden release risk]
  D --> F[Auditable release decision]

Value

The value is not a better PR description. The value is changing the release conversation from “does this look right?” to “which risks are controlled, which risks remain, and what evidence supports the decision?”

Pain	Without Evidence Pack	With Evidence Pack
Reviewers infer risk from scattered comments	Slow, inconsistent review	Risk, controls, and evidence are explicit
Security asks repeat questions late	Release delay	Required proof is attached before approval
Release owner cannot see residual risk	Binary ship/no-ship judgment	Staged release with rollback triggers
Audit trail is reconstructed after the fact	Expensive incident review	Decision record exists at merge time
Agent-assisted work looks confident but opaque	Low trust	Agent findings become structured review signals

Six Months In

Operating Question	Without This Pattern	With This Pattern
Which AI-assisted changes are safe to delegate?	Debated case by case	Tracked by risk class and evidence completeness
Where do reviews get stuck?	Anecdotal	Visible missing-evidence categories
Which controls are repeatedly absent?	Found late by senior reviewers	Aggregated across packs
Can release owners trust agent output?	Only after manual re-review	Only when required controls are evidenced
Can incidents be reconstructed?	Pull comments, logs, and memory	Start from the release evidence record
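Aggregation is what turns individual packs into an operating signal. The sketch below is minimal and rests on two assumptions. First, each pack records the controls it required and the controls it evidenced as plain sets of strings. Second, the field names `required_controls` and `evidenced_controls` and the sample packs are hypothetical. Neither comes from the pack format itself.

```python
from collections import Counter

def missing_control_counts(packs):
    """Count how often each required control lacked evidence across packs."""
    counts = Counter()
    for pack in packs:
        # Controls that were required but never evidenced in this pack.
        missing = set(pack["required_controls"]) - set(pack["evidenced_controls"])
        counts.update(missing)
    return counts

# Hypothetical packs, for illustration only.
packs = [
    {"required_controls": {"replay_protection", "log_redaction"},
     "evidenced_controls": {"log_redaction"}},
    {"required_controls": {"replay_protection", "rollback_plan"},
     "evidenced_controls": {"rollback_plan"}},
    {"required_controls": {"log_redaction"},
     "evidenced_controls": set()},
]

for control, count in missing_control_counts(packs).most_common():
    print(f"{control}: missing in {count} pack(s)")
```

Ranking by frequency is what makes "which controls are repeatedly absent?" answerable without relying on senior reviewers' memory.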

Differentiation

Most PR tooling shows checks. This pack shows judgment.

Usual PR Surface	Evidence Pack Surface
CI passed or failed	Which risk each check controls
Reviewer comments	Review signals classified by severity
Deployment status	Rollout stage, rollback trigger, residual risk
Security approval	Proof required for that approval
PR description	Machine-readable release evidence

The differentiating claim is scale. A senior engineer can build one good PR checklist. A governance system turns every high-risk PR into the same queryable evidence object, so review, release readiness, audit, and agent evaluation all read from the same record.

flowchart LR
  A[PR diff] --> B[Risk classifier]
  B --> C[Required evidence]
  C --> D[Review signals]
  D --> E[Release decision]
  E --> F[Audit-ready record]

System Model

The representative pack below is that output. The system's value comes from producing it consistently, without asking every team to invent a release packet by hand.

flowchart LR
  P[Policy sources] --> C[Control catalog]
  O[Ownership + service metadata] --> R[Risk classifier]
  D[PR diff] --> R
  R --> E[Evidence requirements]
  CI[CI results] --> EP[Evidence pack]
  RT[Review threads] --> EP
  AG[Agent review findings] --> EP
  E --> EP
  EP --> PR[PR review surface]
  EP --> RD[Release decision]
  EP --> AU[Audit record]
  EP --> EV[Agent evaluation]

system_capabilities:
  classify:
    input: pr_diff + service_metadata + policy_catalog
    output: risk_class + required_controls
  collect:
    input: ci_results + review_threads + agent_findings + rollout_plan
    output: normalized_evidence_record
  enforce: high_risk_pr_requires_complete_required_evidence
  aggregate:
    - recurring_missing_controls
    - review_bottlenecks
    - agent_false_confidence_patterns
    - exception_frequency
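The classify and enforce capabilities above can be sketched as a rule table plus a merge gate. This is an illustration, not the system's classifier. The path patterns, rule table and control names are hypothetical. A real classifier would also read service metadata and the policy catalog, which this sketch omits.

```python
from fnmatch import fnmatch

# Hypothetical rule table: path pattern -> (risk class, controls that must be evidenced).
RULES = [
    ("payouts/webhooks/*", "high", {"signature_tests", "replay_tests",
                                    "log_redaction", "rollback_plan"}),
    ("docs/*", "low", set()),
]
RANK = {"low": 0, "medium": 1, "high": 2}

def classify(changed_paths):
    """Return the highest risk class touched and the union of required controls."""
    risk, required = "low", set()
    for path in changed_paths:
        for pattern, rule_risk, controls in RULES:
            if fnmatch(path, pattern):
                if RANK[rule_risk] > RANK[risk]:
                    risk = rule_risk
                required |= controls
    return risk, required

def enforce(risk, required, evidenced):
    """High-risk PRs may merge only when every required control has evidence."""
    missing = required - evidenced
    return (risk != "high" or not missing), missing

risk, required = classify(["payouts/webhooks/handler.py", "docs/rotation.md"])
ok, missing = enforce(risk, required, {"signature_tests", "log_redaction"})
print(risk, ok, sorted(missing))  # high False ['replay_tests', 'rollback_plan']
```

The gate returns the missing controls alongside the verdict. That is what the aggregation step needs to count recurring gaps.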

Proof Status

Claim	Status	What Would Close It
The evidence categories are useful	Demonstrated by representative pack	Run against multiple real high-risk PRs
The schema can express release judgment	Demonstrated by representative payments example	Validate against actual review threads and CI output
The system can generate packs repeatedly	Product hypothesis	Automated generator wired to PR metadata, CI, and review comments
Aggregation creates governance value	Product hypothesis	Dashboard of missing controls, exceptions, and agent review gaps
The public artifact is safe to inspect	Demonstrated	Keep raw identifiers and private implementation detail out of public view

Why These Fields

Field	Why It Exists	Failure Without It
Risk class	Selects the evidence required before review	Same checklist for low-risk and high-risk work
Changed controls	Shows what protection actually changed	Review focuses on code shape, not safety impact
Unchanged contracts	Protects downstream assumptions	Review misses accidental behavior drift
Validation map	Links checks to risks	CI passes without proving the relevant control
Review signals	Captures what humans and agents challenged	Important objections disappear into comment history
Release decision	Records why ship, block, defer, or stage was chosen	Approval becomes an undocumented judgment call
Rollback triggers	Makes staged release operational	“Rollback available” remains vague
Disclosure limits	Separates proof from sensitive detail	Public evidence overexposes or becomes unusably vague

Adjacent Approaches Ruled Out

Approach	Why It Fails
Longer PR template	Adds prose but not structured evidence
More CI checks	Shows pass/fail, not release judgment
Security review label	Shows approval, not what proof supported it
Post-merge audit note	Too late to guide the release decision
Agent summary only	Can repeat confidence without proving controls

Representative Pack

Field	Value
Pattern	Risk-to-evidence release pack
Domain	Payments platform
Change	Webhook signature rotation for payout events
Risk class	High integrity, medium availability
Primary failure mode	Incorrect payout status from unauthenticated or stale webhook
Release decision	Ship behind staged rollout
Public posture	Sanitized, representative

System Context

flowchart LR
  PSP[Payment Provider] -->|signed payout webhook| API[Webhook API]
  API --> V[Signature Verifier]
  V --> Q[Payout Event Queue]
  Q --> W[Worker]
  W --> DB[(Ledger Status)]
  DB --> OPS[Operations View]
  V --> AUD[(Audit Log)]

Change Record

change:
  summary: Rotate payout webhook signature verification from static shared secret to versioned key set.
  added:
    - key_id_header_validation
    - dual_key_verification_window
    - timestamp_tolerance_check
    - replay_nonce_store
    - audit_event_for_rejected_webhooks
  changed:
    - payout_webhook_handler
    - provider_webhook_config
    - payout_event_validation_tests
  not_changed:
    - payout_execution_logic
    - ledger_write_contract
    - operations_status_schema
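The `not_changed` list is a claim, and a contract test is what backs it. A minimal sketch compares the field set of an accepted event before and after the change. The schemas below are hypothetical stand-ins, since the real event fields are withheld. Treating even an added field as drift is a deliberately strict choice.

```python
# Hypothetical accepted-event schemas before and after the change: field -> type name.
BEFORE = {"payout_id": "str", "status": "str", "occurred_at": "int"}
AFTER = {"payout_id": "str", "status": "str", "occurred_at": "int"}

def contract_drift(before, after):
    """Describe removed, added, and retyped fields between two schemas."""
    removed = sorted(before.keys() - after.keys())
    added = sorted(after.keys() - before.keys())
    retyped = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"removed": removed, "added": added, "retyped": retyped}

print(contract_drift(BEFORE, AFTER))  # {'removed': [], 'added': [], 'retyped': []}
```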

flowchart TD
  A[Incoming webhook] --> B{Known key id?}
  B -->|No| X[Reject + audit]
  B -->|Yes| C{Timestamp valid?}
  C -->|No| X
  C -->|Yes| D{Nonce unused?}
  D -->|No| X
  D -->|Yes| E{Signature valid?}
  E -->|No| X
  E -->|Yes| F[Accept payout event]
  F --> G[Queue processing]
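The decision flow above can be sketched with the Python standard library. This is not the production verifier, and several details are hypothetical stand-ins:

- The signature is assumed to be HMAC-SHA256 over `timestamp.nonce.body`.
- The tolerance is assumed to be 300 seconds.
- The nonce store is in-memory. A real one would be shared and would expire entries after the tolerance window.

The key set holds both the current and the previous key. This is how the dual-key window keeps deliveries signed with the previous key verifiable during rotation, and removing `k1` from the set would close the window. The returned reason is what would feed the rejection audit event.

```python
import hashlib
import hmac
import time

# Hypothetical versioned key set: both keys are valid during the dual-key window.
KEYS = {"k2": b"current-secret", "k1": b"previous-secret"}
TOLERANCE_SECONDS = 300  # assumed timestamp tolerance
seen_nonces = set()      # in-memory stand-in for the replay nonce store

def sign(secret, timestamp, nonce, body):
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(key_id, timestamp, nonce, body, signature, now=None):
    """Return (accepted, reason), checking in the same order as the flowchart."""
    now = time.time() if now is None else now
    secret = KEYS.get(key_id)
    if secret is None:
        return False, "unknown_key_id"
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False, "stale_timestamp"
    if nonce in seen_nonces:
        return False, "replayed_nonce"
    expected = sign(secret, timestamp, nonce, body)
    if not hmac.compare_digest(expected, signature):
        return False, "bad_signature"
    # Record the nonce only after the signature checks out, so forged
    # requests cannot consume nonces belonging to genuine deliveries.
    seen_nonces.add(nonce)
    return True, "accepted"

now = 1_700_000_000
body = b'{"payout_id": "po_123", "status": "paid"}'
sig = sign(KEYS["k1"], now, "n-1", body)
print(verify("k1", now, "n-1", body, sig, now=now))  # (True, 'accepted')
print(verify("k1", now, "n-1", body, sig, now=now))  # (False, 'replayed_nonce')
print(verify("k9", now, "n-2", body, sig, now=now))  # (False, 'unknown_key_id')
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a forged signature matched.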

Governance Model

Control Area	Requirement	Evidence
Integrity	Reject forged payout webhooks	Negative signature tests
Replay protection	Reject duplicate webhook deliveries outside allowed retry model	Nonce-store tests
Availability	Preserve provider retry compatibility during rotation window	Dual-key rollout plan
Auditability	Record rejected webhook reason without sensitive payload leakage	Audit event schema
Rollback	Restore previous verification key without code redeploy	Versioned key config
Human approval	Security-sensitive payout path requires explicit approval	Release decision record
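The auditability requirement means a rejection record carries the reason and enough metadata to investigate, but never the payload or key material. The sketch below is minimal and assumes a hypothetical field allowlist. It also assumes a SHA-256 digest of the body serves as a correlation handle.

```python
import hashlib
import json

# Hypothetical allowlist: the only fields a rejection audit event may carry.
AUDIT_FIELDS = {"event", "reason", "key_id", "received_at", "body_sha256"}

def rejected_webhook_event(reason, key_id, received_at, body):
    """Serialize an audit event for a rejected delivery without its payload."""
    event = {
        "event": "webhook_rejected",
        "reason": reason,
        "key_id": key_id,
        "received_at": received_at,
        # A digest correlates repeated deliveries without storing their content.
        "body_sha256": hashlib.sha256(body).hexdigest(),
    }
    # Guard against later edits adding fields outside the allowlist.
    if set(event) != AUDIT_FIELDS:
        raise ValueError("audit event drifted from the allowlist")
    return json.dumps(event, sort_keys=True)

record = rejected_webhook_event(
    "replayed_nonce", "k1", 1_700_000_000, b'{"payout_id": "po_123", "status": "paid"}'
)
print(record)
```

If payloads are low-entropy and guessable, even a digest can leak content. In that case, a provider delivery identifier, where one exists, would be a safer correlation handle.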

Risk Register

Risk	Severity	Control	Residual Status
Forged webhook marks payout complete	High	Signature + key id verification	Controlled
Replay changes payout status twice	High	Nonce store + idempotent worker	Controlled
Clock skew rejects valid provider events	Medium	Timestamp tolerance + alert	Accepted
Key rotation breaks live provider callbacks	Medium	Dual-key window + staged rollout	Controlled
Logs expose sensitive payload	Medium	Redacted audit schema	Controlled

Evidence Matrix

Evidence Class	Artifact	Result
Unit tests	signature verifier accepts valid current key	Pass
Unit tests	signature verifier rejects unknown key id	Pass
Unit tests	verifier rejects stale timestamp	Pass
Unit tests	replay nonce cannot be reused	Pass
Integration tests	provider retry remains idempotent	Pass
Contract tests	accepted event schema unchanged	Pass
Static review	no secrets written to logs	Pass
Manual review	rollout and rollback plan inspected	Pass
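The evidence matrix becomes a validation map once each risk names the artifacts that control it. This sketch assumes a hypothetical pairing of risk-register rows to evidence-matrix artifacts; the pairing is illustrative and not taken from the pack. A risk is reported as uncovered if any of its linked artifacts is missing or not passing, even when CI as a whole is green.

```python
# Hypothetical pairing of risk-register rows to evidence-matrix artifacts.
VALIDATION_MAP = {
    "forged_webhook": ["signature verifier rejects unknown key id"],
    "replayed_event": ["replay nonce cannot be reused",
                       "provider retry remains idempotent"],
    "clock_skew": ["verifier rejects stale timestamp"],
    "rotation_breaks_callbacks": ["rollout and rollback plan inspected"],
    "payload_in_logs": ["no secrets written to logs"],
}

def uncovered_risks(validation_map, results):
    """Return risks with at least one linked artifact missing or not passing."""
    return sorted(
        risk
        for risk, artifacts in validation_map.items()
        if any(results.get(a) != "pass" for a in artifacts)
    )

# Hypothetical results with the nonce reuse test absent.
partial_results = {
    "signature verifier rejects unknown key id": "pass",
    "provider retry remains idempotent": "pass",
    "verifier rejects stale timestamp": "pass",
    "rollout and rollback plan inspected": "pass",
    "no secrets written to logs": "pass",
}
print(uncovered_risks(VALIDATION_MAP, partial_results))  # ['replayed_event']
```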

Review Timeline

sequenceDiagram
  participant Dev as Author
  participant Agent as Review Agent
  participant CI as CI
  participant Sec as Security Reviewer
  participant Rel as Release Owner

  Dev->>CI: Open PR
  CI-->>Dev: Unit + integration pass
  Agent-->>Dev: Flag missing replay evidence
  Dev->>CI: Add nonce reuse tests
  CI-->>Dev: Replay tests pass
  Sec-->>Dev: Ask for log redaction proof
  Dev->>Sec: Add audit schema evidence
  Rel-->>Dev: Approve staged rollout

Review Signal	Severity	Response
Replay test missing	Blocker	Added nonce reuse tests
Redaction proof unclear	Blocker	Added audit event schema check
Rollback path too implicit	Required clarification	Added key config rollback step
Provider retry behavior	Question	Linked to idempotency evidence
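Review signals only carry weight if their severity gates approval. This sketch assumes a hypothetical rule: every `Blocker` and `Required clarification` needs a recorded response before the release decision, while a `Question` may stay open. The signal records mirror the table above.

```python
from dataclasses import dataclass
from typing import Optional

GATING = {"Blocker", "Required clarification"}  # assumed gating severities

@dataclass
class ReviewSignal:
    signal: str
    severity: str
    response: Optional[str] = None  # None means no recorded response yet

def open_gating_signals(signals):
    """Return gating signals that still lack a recorded response."""
    return [s.signal for s in signals if s.severity in GATING and not s.response]

signals = [
    ReviewSignal("Replay test missing", "Blocker", "Added nonce reuse tests"),
    ReviewSignal("Redaction proof unclear", "Blocker", "Added audit event schema check"),
    ReviewSignal("Rollback path too implicit", "Required clarification",
                 "Added key config rollback step"),
    ReviewSignal("Provider retry behavior", "Question", "Linked to idempotency evidence"),
]
print(open_gating_signals(signals))  # []
```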

Release Decision

release_decision:
  decision: approve_staged_rollout
  approvers:
    engineering_owner: approved
    security_reviewer: approved
    release_owner: approved
  required_conditions:
    - dual_key_window_enabled
    - rejected_webhook_alert_enabled
    - rollback_key_available
    - audit_log_redaction_verified
  rollout:
    stage_1: internal_provider_test_endpoint
    stage_2: five_percent_live_callbacks
    stage_3: full_provider_traffic
  rollback:
    trigger:
      - rejected_webhook_rate_above_threshold
      - payout_status_lag_above_threshold
      - provider_retry_spike
    action: restore_previous_key_config
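The rollback triggers become operational only once each has a concrete threshold. The pack deliberately withholds production values, so the thresholds and metric names below are hypothetical placeholders. Reading `provider_retry_spike` as a retry rate above a limit is also an assumption. The sketch shows the shape of the check: any single breach selects `restore_previous_key_config`.

```python
# Hypothetical thresholds; the pack represents production values as thresholds only.
THRESHOLDS = {
    "rejected_webhook_rate": 0.02,       # fraction of deliveries rejected
    "payout_status_lag_seconds": 120.0,  # delay before ledger status reflects an event
    "provider_retry_rate": 0.05,         # fraction of deliveries that are retries
}
TRIGGER_NAMES = {
    "rejected_webhook_rate": "rejected_webhook_rate_above_threshold",
    "payout_status_lag_seconds": "payout_status_lag_above_threshold",
    "provider_retry_rate": "provider_retry_spike",
}

def evaluate_rollback(metrics):
    """Return (action, breached triggers); any single breach selects rollback."""
    breached = []
    for name, limit in THRESHOLDS.items():
        # A missing metric counts as a breach: no telemetry, no promotion.
        if metrics.get(name, float("inf")) > limit:
            breached.append(TRIGGER_NAMES[name])
    action = "restore_previous_key_config" if breached else "continue_rollout"
    return action, breached

print(evaluate_rollback({"rejected_webhook_rate": 0.001,
                         "payout_status_lag_seconds": 30.0,
                         "provider_retry_rate": 0.09}))
# ('restore_previous_key_config', ['provider_retry_spike'])
```

Failing closed on missing telemetry is a design choice. The alternative, treating silence as healthy, would let a broken metrics pipeline promote a stage.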

stateDiagram-v2
  [*] --> ReadyForStaging
  ReadyForStaging --> Stage1: controls pass
  Stage1 --> Stage2: no alert breach
  Stage2 --> FullRollout: no alert breach
  Stage1 --> Rollback: alert breach
  Stage2 --> Rollback: alert breach
  FullRollout --> Monitor
  Rollback --> Monitor
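The state diagram can be encoded as a transition table so that a rollout controller cannot skip a stage. This is a minimal sketch. The event names `controls_pass`, `no_alert_breach`, `alert_breach` and `complete` are hypothetical labels for the diagram's edges.

```python
# Transition table mirroring the state diagram: (state, event) -> next state.
TRANSITIONS = {
    ("ReadyForStaging", "controls_pass"): "Stage1",
    ("Stage1", "no_alert_breach"): "Stage2",
    ("Stage1", "alert_breach"): "Rollback",
    ("Stage2", "no_alert_breach"): "FullRollout",
    ("Stage2", "alert_breach"): "Rollback",
    ("FullRollout", "complete"): "Monitor",
    ("Rollback", "complete"): "Monitor",
}

def advance(state, event):
    """Apply one event; undefined transitions raise instead of silently skipping."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state}") from None

def run(events, state="ReadyForStaging"):
    """Replay a sequence of events and return every state visited."""
    path = [state]
    for event in events:
        state = advance(state, event)
        path.append(state)
    return path

print(run(["controls_pass", "alert_breach", "complete"]))
# ['ReadyForStaging', 'Stage1', 'Rollback', 'Monitor']
```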

Decision log: shipping without replay protection, audit evidence, or staged rollout was rejected. The accepted path used temporary dual-key support, staged rollout, and rollback triggers; an indefinite dual-key window was rejected because it would expand the attack window.

Pack Object

{
  "id": "sterlingpay-payout-webhook-signature-rotation-v1",
  "domain": "payments",
  "risk_class": "high_integrity_medium_availability",
  "change": {
    "surface": "payout_webhook_ingestion",
    "control_upgrade": "versioned_signature_verification",
    "unchanged_contracts": [
      "payout execution",
      "ledger status schema",
      "operations status view"
    ]
  },
  "governance": {
    "approval_required": true,
    "rollback_required": true,
    "audit_record_required": true,
    "residual_risks": [
      "provider clock skew within tolerance window"
    ]
  },
  "validation": {
    "unit": "pass",
    "integration": "pass",
    "contract": "pass",
    "manual_release_review": "pass"
  },
  "release": {
    "decision": "approve_staged_rollout",
    "rollback_available": true
  }
}
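Because the pack is machine-readable, its internal consistency can be checked before it reaches a release owner. This sketch assumes three hypothetical rules:

- An approval decision requires every validation entry to pass.
- A rollback requirement must be matched by an available rollback.
- `residual_risks` must be present, even if empty, so that "no residual risk" is an explicit statement rather than an omission.

The pack is an abbreviated copy of the object above.

```python
import json

def pack_problems(pack):
    """Return consistency problems; an empty list means the pack is coherent."""
    problems = []
    governance = pack.get("governance", {})
    release = pack.get("release", {})
    validation = pack.get("validation", {})
    if release.get("decision", "").startswith("approve"):
        failing = sorted(k for k, v in validation.items() if v != "pass")
        if not validation:
            problems.append("approval without validation results")
        elif failing:
            problems.append("approval with non-passing validation: " + ", ".join(failing))
    if governance.get("rollback_required") and not release.get("rollback_available"):
        problems.append("rollback required but not available")
    if "residual_risks" not in governance:
        problems.append("residual_risks not recorded")
    return problems

# Abbreviated copy of the pack object above (only the fields the checks read).
pack = json.loads("""
{
  "governance": {"rollback_required": true,
                 "residual_risks": ["provider clock skew within tolerance window"]},
  "validation": {"unit": "pass", "integration": "pass",
                 "contract": "pass", "manual_release_review": "pass"},
  "release": {"decision": "approve_staged_rollout", "rollback_available": true}
}
""")
print(pack_problems(pack))  # []
```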

Presentation Limits

Limit	Handling
Organization label	SterlingPay example
Raw PR URL	Withheld
Repository and service names	Withheld
Provider identifiers	Fictionalized
Secrets and key material	Never shown
Production metrics	Represented as thresholds only