Operational Hazard Analysis · OHA

The first SRE platform
with proactive hazard
analysis built in.

Monitoring tells you what broke. Catalogs tell you what exists. Neither tells you what is unsafe right now — or what regulators expect you to have already identified.

OHA operationalizes STPA (System-Theoretic Process Analysis) into a workflow engineers actually use — AI-assisted, audit-ready, and continuously refreshed from your incident record.

Request OHA demo · See the hazard catalogue

flagship · differentiator · STPA-derived

"Catalog tools tell you what services exist. Observability tells you what broke. OHA tells you what's unsafe right now — and gives the CISO the documentation regulators ask for."

— Mithris platform thesis

What it is

Proactive safety engineering for production software.

Before 2024, applying STPA to a 200-app portfolio required a team of safety engineers and 6–12 months. Mithris compresses that into a workflow an SRE can run on an app in 15 minutes — and the AI improves with every incident PIR it ingests.

Identify: 4-step wizard captures hazards, unsafe control actions, and business losses per application.
Score: HazardRisk = L × I × (1 − ē), where ē is the average control effectiveness. Updates live as constraints are added.
Mitigate: Missing safety constraints auto-create gaps. Effective constraints auto-close them. Bi-directional link to PRR evidence.
Learn: Incident records feed CAST-style AI extraction. Every PIR strengthens the hazard register.

[Figure: 5×5 hazard heatmap · portfolio residual risk · 214 hazards across 38 apps · axes: likelihood × impact · legend: low, moderate, elevated, high, critical; marked cells have hazards]

Math you can defend

Quantified hazard exposure, not vibes.

Every score in OHA is computed from explicit, auditable inputs. No black boxes. Every number traces back to a hazard, a control, and an effectiveness rating you can review.

F·01 · Hazard risk

HazardRisk = L × I × (1 − ē)

Per-hazard residual risk on a 0–25 scale. L = likelihood (1–5), I = impact (1–5), ē = average control effectiveness (0–1).
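As a minimal sketch of F·01, assuming likelihood and impact are integers on the 1–5 scale implied by the 5×5 heatmap and each control's effectiveness is a fraction between 0 and 1. The function name and the rule that a hazard with no controls has zero effectiveness are illustrative assumptions, not the product's API.

```python
def hazard_risk(likelihood: int, impact: int, effectiveness: list[float]) -> float:
    """Residual risk on a 0-25 scale: L x I x (1 - mean effectiveness)."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be on the 1-5 scale")
    if any(not 0.0 <= e <= 1.0 for e in effectiveness):
        raise ValueError("effectiveness values must be between 0 and 1")
    # Assumption: a hazard with no controls has zero average effectiveness.
    avg_eff = sum(effectiveness) / len(effectiveness) if effectiveness else 0.0
    return likelihood * impact * (1.0 - avg_eff)


# An L=4, I=4 hazard before and after two controls are rated.
unmitigated = hazard_risk(4, 4, [])           # 16.0
mitigated = hazard_risk(4, 4, [0.5, 0.75])    # 16 x (1 - 0.625) = 6.0
```

Adding a control can only lower the score, which is why the number can update live as constraints are attached.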

F·02 · Control effectiveness

CES = ē_w × 100

Per-application Control Effectiveness Score (0–100). Weighted average across MISSING / PARTIAL / EFFECTIVE / STRONG ratings on every active constraint.
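One way F·02 could be computed is sketched below. The numeric value attached to each rating and the per-constraint weights are assumptions chosen for illustration; the source names only the four rating labels and the 0–100 range.

```python
from enum import Enum


class Rating(Enum):
    # Hypothetical effectiveness values; only the labels come from the source.
    MISSING = 0.0
    PARTIAL = 0.5
    EFFECTIVE = 0.8
    STRONG = 1.0


def control_effectiveness_score(constraints: list[tuple[Rating, float]]) -> float:
    """CES = weighted mean effectiveness x 100 over (rating, weight) pairs."""
    total_weight = sum(weight for _, weight in constraints)
    if total_weight == 0:
        return 0.0  # assumption: no active constraints scores zero
    weighted_mean = sum(r.value * w for r, w in constraints) / total_weight
    return weighted_mean * 100


# Two equally weighted constraints, one PARTIAL and one STRONG.
ces = control_effectiveness_score([(Rating.PARTIAL, 1.0), (Rating.STRONG, 1.0)])  # 75.0
```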

F·03 · Residual risk score

RRS = ū × (1 − CES/100) × 4

Portfolio-ranking Residual Risk Score. ū = mean unmitigated hazard score (L × I, on the 0–25 scale); the factor of 4 rescales the result to 0–100. Drives executive ranking and Board Pack output.
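A small sketch of F·03 and the ranking it drives. It assumes each unmitigated hazard score is L × I on the 0–25 scale, so that the factor of 4 brings the result onto 0–100; the function name and application names are hypothetical.

```python
def residual_risk_score(unmitigated: list[float], ces: float) -> float:
    """RRS = mean unmitigated hazard x (1 - CES/100) x 4."""
    if not 0.0 <= ces <= 100.0:
        raise ValueError("CES must be between 0 and 100")
    if not unmitigated:
        return 0.0  # assumption: an app with no hazards ranks at zero
    mean_unmitigated = sum(unmitigated) / len(unmitigated)
    return mean_unmitigated * (1.0 - ces / 100.0) * 4.0


# Hypothetical portfolio: (unmitigated L x I scores, CES) per application.
portfolio = {
    "app-a": ([16, 20, 12], 75.0),  # mean 16 -> 16 x 0.25 x 4 = 16.0
    "app-b": ([10, 10], 0.0),       # mean 10 -> 10 x 1.00 x 4 = 40.0
}
ranking = sorted(
    ((name, residual_risk_score(scores, ces)) for name, (scores, ces) in portfolio.items()),
    key=lambda item: item[1],
    reverse=True,  # highest residual risk first
)
```

Here the app with milder hazards but no effective controls outranks the app with harsher hazards and strong controls, which is the behavior a residual-risk ranking is meant to surface.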

The OHA lifecycle

Identify · Score · Mitigate · Learn.

A continuous loop, not a one-time exercise. Hazards are surfaced before incidents, controls are rated, and every incident report feeds back into the register.

STAGE 01

Identify

A 4-step wizard captures hazards, unsafe control actions, and business losses per application.

→ Hazard record · sources: MANUAL · CATALOG · AI · INCIDENT

STAGE 02

Score

Live HazardRisk updates as constraints are added. A 5×5 heatmap surfaces residual risk for every application.

→ 0–25 hazard score · portfolio rollup
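The portfolio rollup behind a 5×5 heatmap can be sketched as a count of hazards per (likelihood, impact) cell. The grid orientation (impact increasing upward, likelihood increasing to the right) is an assumption, as is the function name.

```python
from collections import Counter


def heatmap(hazards: list[tuple[int, int]]) -> list[list[int]]:
    """Count hazards per cell; each hazard is a (likelihood, impact) pair on 1-5."""
    for likelihood, impact in hazards:
        if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
            raise ValueError("likelihood and impact must be on the 1-5 scale")
    counts = Counter(hazards)
    # Rows run from impact 5 (top) to impact 1 (bottom); columns from likelihood 1 to 5.
    return [[counts[(l, i)] for l in range(1, 6)] for i in range(5, 0, -1)]


grid = heatmap([(4, 4), (4, 4), (1, 5)])
for row in grid:
    print(" ".join(str(cell) for cell in row))
```

A cell with a non-zero count is one that "has hazards"; colouring by severity band would need thresholds that the page does not state.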

STAGE 03

Mitigate

Missing safety constraints auto-create gaps. Effective constraints auto-close them. Bi-directional link to PRR evidence.

→ Gap register · PRR citation badges
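The auto-create and auto-close behavior can be sketched as a reconciliation over constraint ratings. Treating STRONG like EFFECTIVE and leaving PARTIAL constraints untouched are assumptions; the source states only that missing constraints open gaps and effective ones close them. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    id: str
    rating: str  # MISSING | PARTIAL | EFFECTIVE | STRONG


def sync_gaps(constraints: list[Constraint], open_gaps: set[str]) -> set[str]:
    """Return the updated set of open gap ids, keyed by constraint id."""
    gaps = set(open_gaps)  # copy so the caller's register is not mutated
    for constraint in constraints:
        if constraint.rating == "MISSING":
            gaps.add(constraint.id)       # missing constraint opens a gap
        elif constraint.rating in ("EFFECTIVE", "STRONG"):
            gaps.discard(constraint.id)   # effective constraint closes it
        # Assumption: PARTIAL neither opens nor closes a gap.
    return gaps


gaps = sync_gaps([Constraint("C-1", "MISSING"), Constraint("C-2", "PARTIAL")], set())
gaps_after_fix = sync_gaps([Constraint("C-1", "EFFECTIVE")], gaps)
```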

STAGE 04

Learn

Incident records (ServiceNow + free-form PIR) feed CAST-style AI extraction. Every incident strengthens the register.

→ Hazards tagged INCIDENT_DERIVED · audit-trailed

AI where it matters

Five AI features. One LLM-cost discipline.

Every AI suggestion ships with token-Jaccard duplicate detection — flagging similar existing hazards before the operator accepts a suggestion, so the LLM call is never wasted on a duplicate. Runs against any OpenAI-compatible endpoint, including self-hosted Ollama for air-gapped deployments.
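Token-Jaccard similarity is the size of the intersection of two token sets divided by the size of their union. A minimal sketch follows; the tokenizer, the threshold, and the hazard id are assumptions for illustration and may differ from how OHA tokenizes or scores.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; hyphens and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_similar(candidate: str, register: dict[str, str], threshold: float) -> list[tuple[str, float]]:
    """Existing hazards whose similarity to the candidate meets the threshold, best first."""
    scored = ((hazard_id, jaccard(candidate, title)) for hazard_id, title in register.items())
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


register = {"HAZ-001": "Stale cache serving expired data"}  # hypothetical id
matches = find_similar(
    "Stale-cache delivery of expired financial offers", register, threshold=0.3
)
# Shared tokens: stale, cache, expired (3); union has 9 tokens, so the score is 3/9.
```

Because the comparison is plain set arithmetic, it runs locally and needs no model call of its own.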

AI · 01

Hazard suggester

Proposes hazards per application from telemetry shape + service metadata.

AI · 02

Constraint recommender

Generates safety controls for any identified hazard.

AI · 03

Feedback-gap detector

Finds missing observability between a hazard and its control.

AI · 04

Incident extractor

CAST-style hazard extraction from incident records or free-form PIR.

AI · 05

Executive summary

Generates board-ready narratives from the portfolio dashboard. Copy as markdown for slide decks.

ai · hazard suggestion · streaming

Stale-cache delivery of expired financial offers

offers-svc · category: Operations · L=4 · I=4

unsafe control action

Edge-cached offer payloads served past their commercial expiry window, causing customer-visible pricing inconsistency and downstream reconciliation drift.

recommended constraint

CDN purge SLA < 90s post-expiry. Synthetic monitor verifying offer-validity headers at 60s cadence. Auto-disable serving when purge-confirm absent.

⚠ similar to HAZ-184 · 0.71 jaccard

Incident learning loop

Every incident strengthens the register.

OHA implements a simplified CAST workflow. The AI reads the incident record and applies CAST principles — identify the hazard (not what went wrong), name the contributing factors, propose a safety constraint and an observability control.

Sources: ServiceNow incidents · free-form post-mortem text · existing PIR documents
Review states: PENDING · REVIEWED · DISMISSED — tracked separately from the incident record
Source tagging: INCIDENT_DERIVED hazards link back to the originating sys_id for full traceability
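The review states and source tagging above could be modelled roughly as follows. The field names, the transition rule (only PENDING items can be resolved), and the placeholder sys_id are assumptions for illustration.

```python
from dataclasses import dataclass, replace
from enum import Enum


class ReviewState(Enum):
    PENDING = "PENDING"
    REVIEWED = "REVIEWED"
    DISMISSED = "DISMISSED"


@dataclass(frozen=True)
class ExtractedHazard:
    title: str
    incident_sys_id: str  # originating ServiceNow sys_id, kept for traceability
    source: str = "INCIDENT_DERIVED"
    review_state: ReviewState = ReviewState.PENDING


def resolve(hazard: ExtractedHazard, accept: bool) -> ExtractedHazard:
    """Move a PENDING hazard to REVIEWED or DISMISSED; the incident record is untouched."""
    if hazard.review_state is not ReviewState.PENDING:
        raise ValueError("only PENDING hazards can be resolved")
    new_state = ReviewState.REVIEWED if accept else ReviewState.DISMISSED
    return replace(hazard, review_state=new_state)


hazard = ExtractedHazard(
    title="Session-id collision in auth",
    incident_sys_id="0123456789abcdef0123456789abcdef",  # placeholder, not a real sys_id
)
reviewed = resolve(hazard, accept=True)
```

Keeping the review state on the extracted hazard rather than on the incident is what lets the two be tracked separately.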

incident · review queue · 3 pending

INC0294711 · 14:08 · P1

Payment authorization timeout cascade

→ 2 hazards extracted · 1 constraint proposed

INC0294602 · 09:34 · P2

Stale CDN cache for promotional pricing

→ 1 hazard extracted · feedback-gap detected

INC0294557 · 03:11 · P2

Free-form PIR · session-id collision in auth

→ awaiting AI extraction

Curated hazard catalogue

34 hazards. 5 categories. A reliable starting point.

Each catalogue entry is a best-practice template — clone it into your application, tune it to your system, and you have a hazard record in seconds. Or skip the catalogue and let the AI suggest hazards from scratch.

Observability

Silent metric loss after deploy
Missing distributed-trace propagation
Alert fatigue (top-10 noisy rules)
Dashboard truth drift from rename
Telemetry cardinality overrun

Resiliency

Single-region failover not exercised
Retry storm without circuit breaker
Stale read replica during failover
Cross-AZ dependency hidden in libraries
Recovery-time-objective drift

Security

Stale credential rotation policy
Privileged-account access drift
Encryption-at-rest exception decay
Service-to-service mTLS lapse
Audit-log gap during deploy window

Operations

Stale cache serving expired data
Runbook drift after refactor
On-call coverage gap during rotation
Manual deployment race condition
Feature-flag fanout misconfiguration

Capacity

Untested peak-event scaling
Connection pool saturation
Async queue backpressure cascade
Memory-budget creep across releases
Rate-limit policy not exercised

Custom hazards

Author your own catalogue entries
AI suggests from telemetry shape
Incident extractor mines PIRs
Industry overlay packs (roadmap)
Versioned · diffable · reviewable

Why regulators ask for this

OHA maps directly to four regulatory frameworks.

When the CISO asks "show me your hazard register," Mithris answers in one click — and the artifact is the document regulators are actually asking for.

EU · BANKING · ENFORCED

DORA · Articles 6 & 8

EU Digital Operational Resilience Act. Article 6 requires a documented ICT risk-management framework. Article 8 requires financial entities to identify, classify, and document their ICT assets and sources of ICT risk, which is the kind of record OHA produces.

In application since 17 Jan 2025 · ~22,000 EU financial entities in scope

EU · CRITICAL INFRA · TRANSPOSING

NIS2 · Article 21

Risk-management measures for essential and important entities. Member states are transposing through 2025–26. Hazard analysis and operational-risk treatment are central requirements.

Telcos · energy · transport · healthcare · digital infrastructure

US · FRAMEWORK · DE-FACTO STD

NIST CSF 2.0 · Identify

The Identify function (ID.RA — Risk Assessment) is foundational. OHA produces the per-asset risk identification, classification, and treatment evidence the framework expects.

Adopted by US banking regulators (FFIEC), CISA, and most critical-infrastructure operators

ISO · GLOBAL · STANDARD

ISO 27005

Information security risk management. OHA's risk-treatment workflow (identify → score → mitigate) maps directly to ISO 27005's recommended process — with audit-ready evidence at every step.

Foundation for ISO 27001 risk programs · referenced by HITRUST, SOC 2, and most enterprise audit regimes
