Operational Hazard Analysis · OHA

The first SRE platform with proactive hazard analysis built in.

Monitoring tells you what broke. Catalogs tell you what exists. Neither tells you what is unsafe right now — or what regulators expect you to have already identified.

OHA operationalizes STPA (System-Theoretic Process Analysis) into a workflow engineers actually use — AI-assisted, audit-ready, and continuously refreshed from your incident record.

flagship · differentiator · STPA-derived
"Catalog tools tell you what services exist. Observability tells you what broke. OHA tells you what's unsafe right now — and gives the CISO the documentation regulators ask for."
— Mithris platform thesis
What it is

Proactive safety engineering for production software.

Before 2024, applying STPA to a 200-app portfolio required a team of safety engineers and 6–12 months. Mithris compresses that into a workflow an SRE can run on an app in 15 minutes, and the AI improves with every post-incident review (PIR) it ingests.

Identify
4-step wizard captures hazards, unsafe control actions, and business losses per application.
Score
HazardRisk = L × I × (1 − ē), where ē is the average control effectiveness. Updates live as constraints are added.
Mitigate
Missing safety constraints auto-create gaps. Effective constraints auto-close them. Bi-directional link to PRR evidence.
Learn
Incident records feed CAST-style AI extraction. Every PIR strengthens the hazard register.
portfolio · residual risk
5×5 hazard heatmap · 214 hazards · 38 apps
Hazard count per cell (rows: impact, columns: likelihood)

impact ↑     1    2    3    4    5
    5        2    5    7    9    4
    4        4    8   14   11    3
    3        7   16   22   12    5
    2        9   18   14    8    4
    1       12    9    6    3    2
             likelihood →

Legend: low · moderate · elevated · high · critical · has hazards
Math you can defend

Quantified hazard exposure, not vibes.

Every score in OHA is computed from explicit, auditable inputs. No black boxes. Every number traces back to a hazard, a control, and an effectiveness rating you can review.

F·01 · Hazard risk
HazardRisk = L × I × (1 − ē)
Per-hazard residual risk on a 0–25 scale. L = likelihood, I = impact, ē = average control effectiveness.
F·02 · Control effectiveness
CES = ē_w × 100
Per-application Control Effectiveness Score (0–100). ē_w is the weighted average effectiveness across the MISSING / PARTIAL / EFFECTIVE / STRONG ratings on every active constraint.
F·03 · Residual risk score
RRS = ū × (1 − CES/100) × 4
Portfolio-ranking Residual Risk Score (0–100). ū = mean unmitigated hazard score (L × I, 0–25); the factor of 4 normalizes the result to 0–100. Drives executive ranking and Board Pack output.
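
For readers who want to check the arithmetic, here is a minimal Python sketch of F·01 to F·03. The numeric value attached to each rating is hypothetical (the page names the ratings but not their weights), and the "weighted average" is assumed to be a plain mean of those values. Function names are illustrative, not part of the product.

```python
from statistics import mean

# Hypothetical effectiveness value per rating; not taken from the product.
RATING_EFFECTIVENESS = {
    "MISSING": 0.0,
    "PARTIAL": 0.5,
    "EFFECTIVE": 0.75,
    "STRONG": 0.9,
}


def avg_effectiveness(ratings):
    """Mean effectiveness of the given constraint ratings; 0.0 when there are none."""
    if not ratings:
        return 0.0
    return mean(RATING_EFFECTIVENESS[r] for r in ratings)


def hazard_risk(likelihood, impact, ratings):
    """F·01: HazardRisk = L × I × (1 − ē); 0–25 when L and I are in 1..5."""
    return likelihood * impact * (1 - avg_effectiveness(ratings))


def control_effectiveness_score(all_ratings):
    """F·02: CES = average effectiveness × 100 over an application's active constraints."""
    return avg_effectiveness(all_ratings) * 100


def residual_risk_score(hazards, ces):
    """F·03: RRS = ū × (1 − CES/100) × 4, with ū the mean of L × I over (L, I) pairs."""
    u_bar = mean(l * i for l, i in hazards)
    return u_bar * (1 - ces / 100) * 4
```

Under these assumed values, a hazard with L = 4 and I = 4 covered by one PARTIAL and one EFFECTIVE constraint has ē = 0.625 and a HazardRisk of 6.0, down from an unmitigated 16.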
The OHA lifecycle

Identify · Score · Mitigate · Learn.

A continuous loop, not a one-time exercise. Hazards are surfaced before incidents, controls are rated, and every incident report feeds back into the register.

STAGE 01

Identify

A 4-step wizard captures hazards, unsafe control actions, and business losses per application.

→ Hazard record · sources: MANUAL · CATALOG · AI · INCIDENT
STAGE 02

Score

Live HazardRisk updates as constraints are added. A 5×5 heatmap surfaces residual risk for every application.

→ 0–25 hazard score · portfolio rollup
STAGE 03

Mitigate

Missing safety constraints auto-create gaps. Effective constraints auto-close them. Bi-directional link to PRR evidence.

→ Gap register · PRR citation badges
STAGE 04

Learn

Incident records (ServiceNow + free-form PIR) feed CAST-style AI extraction. Every incident strengthens the register.

→ Hazards tagged INCIDENT_DERIVED · audit-trailed
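
The Mitigate stage's gap behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's implementation: the `GapRegister` class, the constraint ID format, and the choice that EFFECTIVE and STRONG close a gap while PARTIAL leaves it unchanged are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumption: these two ratings count as "effective" for closing a gap.
CLOSING_RATINGS = {"EFFECTIVE", "STRONG"}


@dataclass
class GapRegister:
    """Tracks which safety constraints currently have an open gap."""
    open_gaps: set = field(default_factory=set)

    def sync(self, constraint_id, rating):
        """Open a gap for a MISSING constraint and close it once it is rated effective.

        Any other rating (PARTIAL here) leaves the gap state as it was.
        Returns True if the constraint still has an open gap.
        """
        if rating == "MISSING":
            self.open_gaps.add(constraint_id)
        elif rating in CLOSING_RATINGS:
            self.open_gaps.discard(constraint_id)
        return constraint_id in self.open_gaps
```

Keeping the register as a function of the latest rating means a gap can never be closed by hand while its constraint is still rated MISSING.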
AI where it matters

Five AI features. One LLM-cost discipline.

Every AI suggestion ships with token-Jaccard duplicate detection — flagging similar existing hazards before the operator accepts a suggestion, so the LLM call is never wasted on a duplicate. Runs against any OpenAI-compatible endpoint, including self-hosted Ollama for air-gapped deployments.
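
A token-Jaccard check of this kind can be sketched as follows. The tokenizer (lowercased alphanumeric words), the 0.6 threshold, and the hazard IDs in the usage note are assumptions for illustration; the page does not specify how OHA tokenizes or where it sets the cut-off.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_hazards(candidate, register, threshold=0.6):
    """Return (hazard_id, score) pairs at or above the threshold, best match first.

    `register` maps hazard IDs to hazard titles.
    """
    scored = [(hid, jaccard(candidate, title)) for hid, title in register.items()]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

For example, "Stale cache serving expired data" and "Stale cache serving expired offers" share four of six distinct tokens, so they score about 0.67 and would be flagged at this threshold.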

AI · 01
Hazard suggester
Proposes hazards per application from telemetry shape + service metadata.
AI · 02
Constraint recommender
Generates safety controls for any identified hazard.
AI · 03
Feedback-gap detector
Finds missing observability between a hazard and its control.
AI · 04
Incident extractor
CAST-style hazard extraction from incident records or free-form PIR.
AI · 05
Executive summary
Generates board-ready narratives from the portfolio dashboard. Copy as markdown for slide decks.
ai · hazard suggestion · streaming
Stale-cache delivery of expired financial offers
offers-svc · category: Operations · L=4 · I=4
unsafe control action
Edge-cached offer payloads served past their commercial expiry window, causing customer-visible pricing inconsistency and downstream reconciliation drift.
recommended constraint
CDN purge SLA < 90s post-expiry. Synthetic monitor verifying offer-validity headers at 60s cadence. Auto-disable serving when purge-confirm absent.
⚠ similar to HAZ-184 · 0.71 jaccard
Incident learning loop

Every incident strengthens the register.

OHA implements a simplified CAST workflow. The AI reads the incident record and applies CAST principles: identify the underlying hazard (not just what went wrong), name the contributing factors, and propose a safety constraint and an observability control.

Sources
ServiceNow incidents · free-form post-mortem text · existing PIR documents
Review states
PENDING · REVIEWED · DISMISSED — tracked separately from the incident record
Source tagging
INCIDENT_DERIVED hazards link back to the originating sys_id for full traceability
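
One way to picture the review states and source tagging described above is the small Python model below. The class and field names are hypothetical, and the rule that only PENDING items can be reviewed is an assumption; the page states only the three states, the INCIDENT_DERIVED tag, and the link to the originating sys_id.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewState(Enum):
    PENDING = "PENDING"
    REVIEWED = "REVIEWED"
    DISMISSED = "DISMISSED"


@dataclass
class ExtractedHazard:
    """A hazard proposed by the incident extractor, awaiting human review."""
    title: str
    incident_sys_id: str  # sys_id of the originating incident, kept for traceability
    source: str = "INCIDENT_DERIVED"
    review_state: ReviewState = ReviewState.PENDING


def review(hazard, accept):
    """Move a PENDING extraction to REVIEWED (accepted) or DISMISSED.

    The review state lives on the hazard, so the incident record is never modified.
    """
    if hazard.review_state is not ReviewState.PENDING:
        raise ValueError("only PENDING extractions can be reviewed")
    hazard.review_state = ReviewState.REVIEWED if accept else ReviewState.DISMISSED
    return hazard
```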
incident · review queue · 3 pending
INC0294711 · 14:08 · P1
Payment authorization timeout cascade
→ 2 hazards extracted · 1 constraint proposed
INC0294602 · 09:34 · P2
Stale CDN cache for promotional pricing
→ 1 hazard extracted · feedback-gap detected
INC0294557 · 03:11 · P2
Free-form PIR · session-id collision in auth
→ awaiting AI extraction
Curated hazard catalogue

34 hazards. 5 categories. A reliable starting point.

Each catalogue entry is a best-practice template — clone it into your application, tune it to your system, and you have a hazard record in seconds. Or skip the catalogue and let the AI suggest hazards from scratch.

Observability · 8 hazards
  • Silent metric loss after deploy
  • Missing distributed-trace propagation
  • Alert fatigue (top-10 noisy rules)
  • Dashboard truth drift from rename
  • Telemetry cardinality overrun
Resiliency · 7 hazards
  • Single-region failover not exercised
  • Retry storm without circuit breaker
  • Stale read replica during failover
  • Cross-AZ dependency hidden in libraries
  • Recovery-time-objective drift
Security · 6 hazards
  • Stale credential rotation policy
  • Privileged-account access drift
  • Encryption-at-rest exception decay
  • Service-to-service mTLS lapse
  • Audit-log gap during deploy window
Operations · 7 hazards
  • Stale cache serving expired data
  • Runbook drift after refactor
  • On-call coverage gap during rotation
  • Manual deployment race condition
  • Feature-flag fanout misconfiguration
Capacity · 6 hazards
  • Untested peak-event scaling
  • Connection pool saturation
  • Async queue backpressure cascade
  • Memory-budget creep across releases
  • Rate-limit policy not exercised
+ Custom hazards
  • Author your own catalogue entries
  • AI suggests from telemetry shape
  • Incident extractor mines PIRs
  • Industry overlay packs (roadmap)
  • Versioned · diffable · reviewable
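
Cloning a catalogue template into an application and then tuning it might look like the sketch below. The record shapes, the default likelihood and impact on the template, and the `clone_template` helper are hypothetical; only the CATALOG source tag and the clone-then-tune flow come from the page.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HazardTemplate:
    """A catalogue entry: a reusable starting point, never edited in place."""
    title: str
    category: str
    likelihood: int
    impact: int


@dataclass(frozen=True)
class HazardRecord:
    """A hazard attached to one application."""
    application: str
    title: str
    category: str
    likelihood: int
    impact: int
    source: str = "CATALOG"


def clone_template(template, application, **overrides):
    """Copy a template into an application's register, then apply per-system tuning."""
    record = HazardRecord(
        application=application,
        title=template.title,
        category=template.category,
        likelihood=template.likelihood,
        impact=template.impact,
    )
    return replace(record, **overrides)
```

Because both types are frozen, tuning always produces a new record and the catalogue entry stays unchanged, which keeps templates versionable and diffable.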
Why regulators ask for this

OHA maps directly to four regulatory and standards frameworks.

When the CISO asks "show me your hazard register," Mithris answers in one click — and the artifact is the document regulators are actually asking for.

EU · BANKING · ENFORCED

DORA · Articles 6 & 8

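EU Digital Operational Resilience Act. Article 6 requires a documented ICT risk-management framework. Article 8 requires financial entities to identify, classify, and document their ICT-supported business functions and assets, and to identify sources of ICT risk on a continuous basis, which is the output OHA is built to produce.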

In application since 17 Jan 2025 · ~22,000 EU financial entities in scope

EU · CRITICAL INFRA · TRANSPOSING

NIS2 · Article 21

Risk-management measures for essential and important entities. Member states are transposing through 2025–26. Hazard analysis and operational-risk treatment are central requirements.

Telcos · energy · transport · healthcare · digital infrastructure

US · FRAMEWORK · DE-FACTO STD

NIST CSF 2.0 · Identify

The Identify function (ID.RA — Risk Assessment) is foundational. OHA produces the per-asset risk identification, classification, and treatment evidence the framework expects.

Adopted by US banking regulators (FFIEC), CISA, and most critical-infrastructure operators

ISO · GLOBAL STANDARD

ISO 27005

Information security risk management. OHA's risk-treatment workflow (identify → score → mitigate) maps directly to ISO 27005's recommended process — with audit-ready evidence at every step.

Foundation for ISO 27001 risk programs · referenced by HITRUST, SOC 2, and most enterprise audit regimes

See OHA on your portfolio

A live OHA walkthrough in 30 minutes.

We'll run the hazard wizard on a real service from your portfolio, surface its current residual risk, and show you the Board Pack PDF an auditor would actually accept.

Request OHA demo · See the full platform