Governed pull request evidence pack diagram showing risk, controls, validation, review, decision, and limits.

Governed PR Evidence Pack

Completed
AI-Native SDLC, Engineering Governance, Evidence Design, Developer Experience, Engineering Leadership

The Risk

A trustworthy workflow needs to show its controls, not just its result.

AI-assisted delivery changes the review problem. The output can look complete, confident, and well-structured before anyone has proved that the right risks were checked. In a payments system, that gap is not cosmetic. It can mean a forged webhook, a replayed event, a misleading payout status, or a release owner approving a change without seeing the residual risk.

The evidence pack is the control surface: a structured record that ties each release risk to the proof required to ship.

flowchart LR
  A[Confident AI-assisted PR] --> B{Controls visible?}
  B -->|No| C[Reviewer infers safety]
  B -->|Yes| D[Reviewer inspects evidence]
  C --> E[Hidden release risk]
  D --> F[Auditable release decision]

Value

The value is not a better PR description. The value is changing the release conversation from “does this look right?” to “which risks are controlled, which risks remain, and what evidence supports the decision?”

| Pain | Without Evidence Pack | With Evidence Pack |
| --- | --- | --- |
| Reviewers infer risk from scattered comments | Slow, inconsistent review | Risk, controls, and evidence are explicit |
| Security asks repeat questions late | Release delay | Required proof is attached before approval |
| Release owner cannot see residual risk | Binary ship/no-ship judgment | Staged release with rollback triggers |
| Audit trail is reconstructed after the fact | Expensive incident review | Decision record exists at merge time |
| Agent-assisted work looks confident but opaque | Low trust | Agent findings become structured review signals |

Six Months In

| Operating Question | Without This Pattern | With This Pattern |
| --- | --- | --- |
| Which AI-assisted changes are safe to delegate? | Debated case by case | Tracked by risk class and evidence completeness |
| Where do reviews get stuck? | Anecdotal | Visible missing-evidence categories |
| Which controls are repeatedly absent? | Found late by senior reviewers | Aggregated across packs |
| Can release owners trust agent output? | Only after manual rereview | Only when required controls are evidenced |
| Can incidents be reconstructed? | Pull comments, logs, and memory | Start from the release evidence record |

Differentiation

Most PR tooling shows checks. This pack shows judgment.

| Usual PR Surface | Evidence Pack Surface |
| --- | --- |
| CI passed or failed | Which risk each check controls |
| Reviewer comments | Review signals classified by severity |
| Deployment status | Rollout stage, rollback trigger, residual risk |
| Security approval | Proof required for that approval |
| PR description | Machine-readable release evidence |

The differentiating claim is scale. A senior engineer can build one good PR checklist. A governance system turns every high-risk PR into the same queryable evidence object, so review, release readiness, audit, and agent evaluation all read from the same record.

flowchart LR
  A[PR diff] --> B[Risk classifier]
  B --> C[Required evidence]
  C --> D[Review signals]
  D --> E[Release decision]
  E --> F[Audit-ready record]
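
Because every pack shares one shape, a question such as "which controls are repeatedly absent?" reduces to a query over records. The following is a minimal Python sketch of that idea, not the system's implementation. The `missing_controls` field and the function name are hypothetical and are not part of the pack object shown later in this document.

```python
from collections import Counter


def recurring_missing_controls(packs, min_count=2):
    """Count how many packs lack each control and keep the recurring ones.

    Each pack is a dict; `missing_controls` is a hypothetical list field.
    A control is counted at most once per pack.
    """
    counts = Counter(
        control
        for pack in packs
        for control in set(pack.get("missing_controls", []))
    )
    return [(control, n) for control, n in counts.most_common() if n >= min_count]
```

A dashboard could call this over all packs from a given period. The `min_count` threshold is an arbitrary illustration, not a recommended value.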

System Model

The artifact below is the output. The system value comes from producing that output consistently without asking every team to invent a release packet by hand.

flowchart LR
  P[Policy sources] --> C[Control catalog]
  O[Ownership + service metadata] --> R[Risk classifier]
  D[PR diff] --> R
  R --> E[Evidence requirements]
  CI[CI results] --> EP[Evidence pack]
  RT[Review threads] --> EP
  AG[Agent review findings] --> EP
  E --> EP
  EP --> PR[PR review surface]
  EP --> RD[Release decision]
  EP --> AU[Audit record]
  EP --> EV[Agent evaluation]

system_capabilities:
  classify:
    input: pr_diff + service_metadata + policy_catalog
    output: risk_class + required_controls
  collect:
    input: ci_results + review_threads + agent_findings + rollout_plan
    output: normalized_evidence_record
  enforce: high_risk_pr_requires_complete_required_evidence
  aggregate:
    - recurring_missing_controls
    - review_bottlenecks
    - agent_false_confidence_patterns
    - exception_frequency
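
The `classify` and `enforce` capabilities above can be sketched as a small merge gate. This is an illustrative Python sketch under stated assumptions: the path prefixes, the control names per risk class, and every function name are hypothetical, and a real classifier would also read service metadata and the policy catalog.

```python
from dataclasses import dataclass, field

# Hypothetical control catalog: risk class -> controls that must be evidenced.
REQUIRED_CONTROLS = {
    "high": {
        "negative_signature_tests",
        "replay_tests",
        "rollback_plan",
        "audit_redaction_check",
    },
    "low": set(),
}

# Hypothetical service metadata: path prefixes that mark a high-risk surface.
HIGH_RISK_PREFIXES = ("payments/webhooks/", "payments/ledger/")


@dataclass
class Classification:
    risk_class: str
    required_controls: set = field(default_factory=set)


def classify(changed_paths):
    """Assign a risk class from the changed paths and look up required controls."""
    is_high = any(path.startswith(HIGH_RISK_PREFIXES) for path in changed_paths)
    risk_class = "high" if is_high else "low"
    return Classification(risk_class, set(REQUIRED_CONTROLS[risk_class]))


def missing_evidence(classification, evidence):
    """Return required controls that lack a passing evidence entry, sorted."""
    return sorted(
        control
        for control in classification.required_controls
        if evidence.get(control) != "pass"
    )


def may_merge(classification, evidence):
    """Enforce the rule: a high-risk PR needs complete required evidence."""
    return not missing_evidence(classification, evidence)
```

The gate returns the list of missing controls rather than a bare boolean so that the review surface can show what is absent, which is the point of the pack.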

Proof Status

| Claim | Status | What Would Close It |
| --- | --- | --- |
| The evidence categories are useful | Demonstrated by representative pack | Run against multiple real high-risk PRs |
| The schema can express release judgment | Demonstrated by representative payments example | Validate against actual review threads and CI output |
| The system can generate packs repeatedly | Product hypothesis | Automated generator wired to PR metadata, CI, and review comments |
| Aggregation creates governance value | Product hypothesis | Dashboard of missing controls, exceptions, and agent review gaps |
| The public artifact is safe to inspect | Demonstrated | Keep raw identifiers and private implementation detail out of public view |

Why These Fields

| Field | Why It Exists | Failure Without It |
| --- | --- | --- |
| Risk class | Selects the evidence required before review | Same checklist for low-risk and high-risk work |
| Changed controls | Shows what protection actually changed | Review focuses on code shape, not safety impact |
| Unchanged contracts | Protects downstream assumptions | Review misses accidental behavior drift |
| Validation map | Links checks to risks | CI passes without proving the relevant control |
| Review signals | Captures what humans and agents challenged | Important objections disappear into comment history |
| Release decision | Records why ship, block, defer, or stage was chosen | Approval becomes an undocumented judgment call |
| Rollback triggers | Makes staged release operational | “Rollback available” remains vague |
| Disclosure limits | Separates proof from sensitive detail | Public evidence overexposes or becomes unusably vague |

Adjacent Approaches Ruled Out

| Approach | Why It Fails |
| --- | --- |
| Longer PR template | Adds prose but not structured evidence |
| More CI checks | Shows pass/fail, not release judgment |
| Security review label | Shows approval, not what proof supported it |
| Post-merge audit note | Too late to guide the release decision |
| Agent summary only | Can repeat confidence without proving controls |

Representative Pack

| Field | Value |
| --- | --- |
| Pattern | Risk-to-evidence release pack |
| Domain | Payments platform |
| Change | Webhook signature rotation for payout events |
| Risk class | High integrity, medium availability |
| Primary failure mode | Incorrect payout status from unauthenticated or stale webhook |
| Release decision | Ship behind staged rollout |
| Public posture | Sanitized, representative |

System Context

flowchart LR
  PSP[Payment Provider] -->|signed payout webhook| API[Webhook API]
  API --> V[Signature Verifier]
  V --> Q[Payout Event Queue]
  Q --> W[Worker]
  W --> DB[(Ledger Status)]
  DB --> OPS[Operations View]
  V --> AUD[(Audit Log)]

Change Record

change:
  summary: Rotate payout webhook signature verification from static shared secret to versioned key set.
  added:
    - key_id_header_validation
    - dual_key_verification_window
    - timestamp_tolerance_check
    - replay_nonce_store
    - audit_event_for_rejected_webhooks
  changed:
    - payout_webhook_handler
    - provider_webhook_config
    - payout_event_validation_tests
  not_changed:
    - payout_execution_logic
    - ledger_write_contract
    - operations_status_schema

flowchart TD
  A[Incoming webhook] --> B{Known key id?}
  B -->|No| X[Reject + audit]
  B -->|Yes| C{Timestamp valid?}
  C -->|No| X
  C -->|Yes| D{Nonce unused?}
  D -->|No| X
  D -->|Yes| E{Signature valid?}
  E -->|No| X
  E -->|Yes| F[Accept payout event]
  F --> G[Queue processing]
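
The check order in the flowchart above can be sketched in Python as follows. This is a hedged illustration, not the handler described in the change record: the key ids, secrets, five-minute tolerance, signing scheme (HMAC-SHA256 over timestamp, nonce, and body), and in-memory nonce set are all assumptions made for the example.

```python
import hashlib
import hmac
import time

# Hypothetical versioned key set. During the dual-key window both ids verify.
# Real key material would come from a secret store, never from source code.
KEYS = {"k1": b"previous-secret", "k2": b"current-secret"}
TOLERANCE_SECONDS = 300  # assumed timestamp tolerance


def sign(key, timestamp, nonce, body):
    """HMAC-SHA256 over timestamp, nonce, and body (an assumed signing scheme)."""
    message = f"{timestamp}.{nonce}.".encode() + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_webhook(key_id, timestamp, nonce, signature, body, seen_nonces, now=None):
    """Return (accepted, reason), following the check order in the flowchart."""
    now = time.time() if now is None else now
    key = KEYS.get(key_id)
    if key is None:
        return False, "unknown_key_id"
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False, "stale_timestamp"
    if nonce in seen_nonces:
        return False, "replayed_nonce"
    expected = sign(key, timestamp, nonce, body)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        return False, "bad_signature"
    # Record the nonce only after the signature verifies, so forged requests
    # cannot consume nonces that a legitimate delivery will need.
    seen_nonces.add(nonce)
    return True, "accepted"
```

The returned reason is the kind of value a rejected-webhook audit event could carry without including the payload. A production nonce store would need expiry and shared storage, which this sketch omits.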

Governance Model

| Control Area | Requirement | Evidence |
| --- | --- | --- |
| Integrity | Reject forged payout webhooks | Negative signature tests |
| Replay protection | Reject duplicate webhook deliveries outside allowed retry model | Nonce-store tests |
| Availability | Preserve provider retry compatibility during rotation window | Dual-key rollout plan |
| Auditability | Record rejected webhook reason without sensitive payload leakage | Audit event schema |
| Rollback | Restore previous verification key without code redeploy | Versioned key config |
| Human approval | Security-sensitive payout path requires explicit approval | Release decision record |

Risk Register

| Risk | Severity | Control | Residual Status |
| --- | --- | --- | --- |
| Forged webhook marks payout complete | High | Signature + key id verification | Controlled |
| Replay changes payout status twice | High | Nonce store + idempotent worker | Controlled |
| Clock skew rejects valid provider events | Medium | Timestamp tolerance + alert | Accepted |
| Key rotation breaks live provider callbacks | Medium | Dual-key window + staged rollout | Controlled |
| Logs expose sensitive payload | Medium | Redacted audit schema | Controlled |

Evidence Matrix

| Evidence Class | Artifact | Result |
| --- | --- | --- |
| Unit tests | signature verifier accepts valid current key | Pass |
| Unit tests | signature verifier rejects unknown key id | Pass |
| Unit tests | verifier rejects stale timestamp | Pass |
| Unit tests | replay nonce cannot be reused | Pass |
| Integration tests | provider retry remains idempotent | Pass |
| Contract tests | accepted event schema unchanged | Pass |
| Static review | no secrets written to logs | Pass |
| Manual review | rollout and rollback plan inspected | Pass |
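
A validation map links each risk to the evidence that must pass before the risk counts as controlled. The Python sketch below shows one possible shape. The pairing of risks to artifacts and all identifiers are illustrative assumptions, not taken from the actual pack.

```python
# Hypothetical validation map: risk -> evidence artifacts that must pass.
VALIDATION_MAP = {
    "forged_webhook": ["rejects_unknown_key_id", "accepts_valid_current_key"],
    "replayed_event": ["nonce_cannot_be_reused", "provider_retry_idempotent"],
    "sensitive_logs": ["no_secrets_in_logs"],
}


def unproven_risks(validation_map, results):
    """Return risks that have at least one mapped artifact missing or not passing.

    `results` maps artifact name -> "pass" or any other status.
    The return value maps each unproven risk to its failing artifacts.
    """
    gaps = {}
    for risk, artifacts in validation_map.items():
        failing = [a for a in artifacts if results.get(a) != "pass"]
        if failing:
            gaps[risk] = failing
    return gaps
```

With this shape, a green CI run does not count as proof on its own: an artifact that was never run is treated the same as one that failed.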

Review Timeline

sequenceDiagram
  participant Dev as Author
  participant Agent as Review Agent
  participant CI as CI
  participant Sec as Security Reviewer
  participant Rel as Release Owner

  Dev->>CI: Open PR
  CI-->>Dev: Unit + integration pass
  Agent-->>Dev: Flag missing replay evidence
  Dev->>CI: Add nonce reuse tests
  CI-->>Dev: Replay tests pass
  Sec-->>Dev: Ask for log redaction proof
  Dev->>Sec: Add audit schema evidence
  Rel-->>Dev: Approve staged rollout

| Review Signal | Severity | Response |
| --- | --- | --- |
| Replay test missing | Blocker | Added nonce reuse tests |
| Redaction proof unclear | Blocker | Added audit event schema check |
| Rollback path too implicit | Required clarification | Added key config rollback step |
| Provider retry behavior | Question | Linked to idempotency evidence |
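
Classifying review signals by severity makes it possible to check mechanically whether anything still blocks approval. The following Python sketch is one way to express that. The class, the severity strings, and the choice to treat "required clarification" as blocking are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReviewSignal:
    title: str
    severity: str  # "blocker", "required_clarification", or "question"
    response: str = ""  # stays empty until the author responds


# Assumption: both of these severities hold up approval until answered.
BLOCKING_SEVERITIES = {"blocker", "required_clarification"}


def open_blocking_signals(signals):
    """Return titles of blocking signals that have no recorded response."""
    return [
        s.title
        for s in signals
        if s.severity in BLOCKING_SEVERITIES and not s.response.strip()
    ]
```

An empty result means every blocking objection has a recorded response. It does not judge whether the response is adequate, which remains a human decision.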

Release Decision

release_decision:
  decision: approve_staged_rollout
  approvers:
    engineering_owner: approved
    security_reviewer: approved
    release_owner: approved
  required_conditions:
    - dual_key_window_enabled
    - rejected_webhook_alert_enabled
    - rollback_key_available
    - audit_log_redaction_verified
  rollout:
    stage_1: internal_provider_test_endpoint
    stage_2: five_percent_live_callbacks
    stage_3: full_provider_traffic
  rollback:
    trigger:
      - rejected_webhook_rate_above_threshold
      - payout_status_lag_above_threshold
      - provider_retry_spike
    action: restore_previous_key_config

stateDiagram-v2
  [*] --> ReadyForStaging
  ReadyForStaging --> Stage1: controls pass
  Stage1 --> Stage2: no alert breach
  Stage2 --> FullRollout: no alert breach
  Stage1 --> Rollback: alert breach
  Stage2 --> Rollback: alert breach
  FullRollout --> Monitor
  Rollback --> Monitor

Decision log: shipping without replay protection, audit evidence, or staged rollout was rejected. The accepted path uses a temporary dual-key window, staged rollout, and rollback triggers. An indefinite dual-key window was also rejected because it would keep the retiring key valid and extend the attack window.
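
The staged rollout states above can be expressed as a transition table, which makes illegal moves such as skipping a stage fail loudly. This is a minimal Python sketch. The event names are hypothetical labels, and `settled` stands in for the two unlabeled transitions into `Monitor` in the diagram.

```python
# Transition table mirroring the state diagram. Event names are assumptions.
TRANSITIONS = {
    ("ReadyForStaging", "controls_pass"): "Stage1",
    ("Stage1", "no_alert_breach"): "Stage2",
    ("Stage1", "alert_breach"): "Rollback",
    ("Stage2", "no_alert_breach"): "FullRollout",
    ("Stage2", "alert_breach"): "Rollback",
    ("FullRollout", "settled"): "Monitor",
    ("Rollback", "settled"): "Monitor",
}


def advance(state, event):
    """Return the next rollout state, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None


def run(events, state="ReadyForStaging"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = advance(state, event)
    return state
```

In this sketch an `alert_breach` event would be raised when any rollback trigger from the release decision fires. How those thresholds are measured is outside the scope of the example.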

Pack Object

{
  "id": "sterlingpay-payout-webhook-signature-rotation-v1",
  "domain": "payments",
  "risk_class": "high_integrity_medium_availability",
  "change": {
    "surface": "payout_webhook_ingestion",
    "control_upgrade": "versioned_signature_verification",
    "unchanged_contracts": [
      "payout execution",
      "ledger status schema",
      "operations status view"
    ]
  },
  "governance": {
    "approval_required": true,
    "rollback_required": true,
    "audit_record_required": true,
    "residual_risks": [
      "provider clock skew within tolerance window"
    ]
  },
  "validation": {
    "unit": "pass",
    "integration": "pass",
    "contract": "pass",
    "manual_release_review": "pass"
  },
  "release": {
    "decision": "approve_staged_rollout",
    "rollback_available": true
  }
}
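
Because the pack is machine-readable, simple consistency rules can be checked before a release owner reads it. The Python sketch below uses the field names from the object above. The two rules and the function name are illustrative assumptions, not a schema the source defines.

```python
import json


def pack_problems(pack):
    """Return a list of human-readable inconsistencies found in a pack dict."""
    problems = []
    governance = pack.get("governance", {})
    release = pack.get("release", {})
    validation = pack.get("validation", {})

    # Rule 1 (assumed): an approving decision needs every validation to pass.
    not_passing = sorted(name for name, result in validation.items() if result != "pass")
    approving = release.get("decision", "").startswith("approve")
    if approving and not_passing:
        problems.append("approved with non-passing validation: " + ", ".join(not_passing))

    # Rule 2 (assumed): a required rollback must actually be available.
    if governance.get("rollback_required") and not release.get("rollback_available"):
        problems.append("rollback required but not available")

    return problems


def load_pack(text):
    """Parse a pack from its JSON text."""
    return json.loads(text)
```

Calling `pack_problems(load_pack(text))` on the object above would return an empty list under these two rules, since every validation entry is `pass` and `rollback_available` is `true`.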

Presentation Limits

| Limit | Handling |
| --- | --- |
| Organization label | SterlingPay example |
| Raw PR URL | Withheld |
| Repository and service names | Withheld |
| Provider identifiers | Fictionalized |
| Secrets and key material | Never shown |
| Production metrics | Represented as thresholds only |
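
These limits can be applied mechanically when a public view is derived from an internal record. The following Python sketch shows the idea under stated assumptions: the field names in both sets are hypothetical, and a real record may need fictionalization rules that this example does not cover.

```python
# Hypothetical field names. Withheld fields keep a placeholder so readers can
# see that something exists; forbidden fields are dropped entirely.
WITHHELD_FIELDS = {"pr_url", "repository", "service_name"}
FORBIDDEN_FIELDS = {"secret", "key_material"}


def public_view(record):
    """Return a copy of `record` that is safe to publish under the limits above."""
    out = {}
    for key, value in record.items():
        if key in FORBIDDEN_FIELDS:
            continue  # never shown, not even as a placeholder
        if key in WITHHELD_FIELDS:
            out[key] = "withheld"
        elif isinstance(value, dict):
            out[key] = public_view(value)  # apply the same limits to nested objects
        else:
            out[key] = value
    return out
```

The function builds a new dictionary and leaves the internal record untouched, so the full evidence remains available for audit.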