Azure Application Gateway with WAF v2 is the front door for most SaaS workloads I review. It is also the single most consistently mis-deployed component in those reviews. The pattern repeats across organisations of every size:

  1. CloudOps stands up Application Gateway with WAF in Prevention mode on day one to "tick the security box."
  2. Within days, the WAF blocks legitimate traffic — JSON bodies, file uploads, customer SSO callbacks, the marketing team's redirect chains.
  3. Engineers respond by flipping WAF into Detection mode "temporarily."
  4. Six months later, Detection is still on. The platform has compliance signage that says WAF, and protection that is purely cosmetic.

I have walked into too many environments where this is the steady state. The fix is not a product — it is a disciplined rollout. This article catalogs the mistakes I see most often and lays out the playbook I use to take SaaS apps from no WAF to full Prevention without breaking real users.

Who this is for: solution architects responsible for the front door of a SaaS platform, and especially those of us working in healthcare, where WAF mis-configuration is not just a security gap but a HIPAA and HITRUST control failure. The general playbook applies to every workload; the healthcare addendum (§8) is mandatory reading if any byte of ePHI flows through the gateway.

At a glance: 2–4 wk Detection baseline · 95%+ false positives tuned · 0 PHI in WAF logs · 6 yr HIPAA log retention

1. The 10 Mistakes I See Most Often

Before the design pattern, the anti-patterns. If you recognise three or more of these in your environment, your WAF is not doing what your compliance team thinks it is doing.

2. Reference Architecture — WAF in Front of SaaS

The right architecture has a single ingress, private backends, end-to-end TLS, and one WAF Policy per application surface. Diagram first, then the playbook for getting there safely.

[Diagram: single ingress · WAF v2 · private backends. User / SSO / API → App Gateway WAF v2 (TLS 1.2+, cert from Key Vault; WAF Policy on CRS 3.2; autoscale 2 → 10) → App Service / AKS (private), API backend (Private Endpoint), static portal (Storage Private Endpoint); logs to Log Analytics + Sentinel.]
One public IP, one WAF, multiple private backends. Diagnostics to Sentinel; certificates from Key Vault.

3. The Rollout Playbook — Detection → Tune → Prevention

This is the four-phase rollout I use every time. It is boring, it works, and it keeps both the security team and the application owners on side.

Phase 1 · Week 0: Deploy in Detection. WAF v2 enabled, OWASP CRS 3.2, mode = Detection. Diagnostic logs to Log Analytics from minute one.

Phase 2 · Weeks 1–3: Baseline the traffic. Daily KQL review of AzureDiagnostics. Cluster matches by rule ID, host, URI. Catch every real-traffic anomaly.

Phase 3 · Weeks 3–4: Tune surgically. Add scoped exclusions per rule ID + selector (never wildcard, never disable groups). Re-test in a staging gateway.

Phase 4 · Week 4+: Flip to Prevention. Per-listener cutover, smallest blast radius first. Alert on every block for 7 days. Then steady state.
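
One way to express the per-listener cutover in Terraform, sketched under assumptions: Application Gateway v2 lets an `http_listener` carry its own `firewall_policy_id`, which overrides the gateway-level policy for that listener only. The `pilot` policy, the `status-https` listener, and the frontend names are hypothetical placeholders, and the pilot policy must carry the same exclusions as the main policy or it will block traffic you have already tuned for.

```hcl
# Hypothetical pilot policy: same managed rules as the gateway-level policy, but in
# Prevention mode. Copy every exclusion from the main policy into this one.
resource "azurerm_web_application_firewall_policy" "pilot" {
  name                = "wafp-${var.app}-pilot"
  resource_group_name = var.rg
  location            = var.location

  policy_settings {
    enabled            = true
    mode               = "Prevention"
    request_body_check = true
  }

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }
  }
}

# Inside azurerm_application_gateway.this, attach the pilot policy to the
# lowest-risk listener only; every other listener keeps the Detection policy.
#
#   http_listener {
#     name                           = "status-https" # hypothetical listener
#     frontend_ip_configuration_name = "public-ip"    # hypothetical
#     frontend_port_name             = "https-443"    # hypothetical
#     protocol                       = "Https"
#     ssl_certificate_name           = "tls-cert"
#     firewall_policy_id             = azurerm_web_application_firewall_policy.pilot.id
#   }
```

Once that listener runs clean through its seven-day alerting window, move the next one over; when every listener has moved, set `var.waf_mode` to "Prevention" on the gateway-level policy and drop the per-listener overrides.
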
💡 Architect's Tip

"Whitelist" is the wrong mental model. You are excluding specific WAF rules for specific request attributes on specific URIs — never permitting traffic. Frame every exclusion request to the security team that way and the conversation gets a lot easier.

4. Tuning Without Weakening the WAF

Most WAF horror stories trace back to a single bad habit: when something gets blocked, the team disables the whole rule group. That converts a precision instrument into a placebo. The right pattern is targeted exclusions.

KQL to baseline blocks before you tune

// Top WAF rule IDs triggering, by host + URI
AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS" and Category == "ApplicationGatewayFirewallLog"
// Detection mode logs "Detected" where Prevention mode would log "Blocked"
| where action_s in ("Matched", "Detected", "Blocked")
| summarize hits = count() by ruleId_s, hostname_s, requestUri_s, action_s
| order by hits desc
| take 50
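
The queries in this article read from the `AzureDiagnostics` table, so the diagnostic setting has to send WAF logs there rather than to the resource-specific tables. A minimal sketch, assuming the gateway resource from the Terraform baseline in §7 and a workspace ID passed in as the hypothetical variable `var.law_id`:

```hcl
# Send WAF and access logs to Log Analytics from day one.
resource "azurerm_monitor_diagnostic_setting" "agw" {
  name                       = "diag-agw-${var.app}"
  target_resource_id         = azurerm_application_gateway.this.id
  log_analytics_workspace_id = var.law_id # hypothetical variable

  # "AzureDiagnostics" keeps the AzureDiagnostics-based KQL in this article working;
  # "Dedicated" would route the logs to resource-specific tables instead.
  log_analytics_destination_type = "AzureDiagnostics"

  enabled_log {
    category = "ApplicationGatewayFirewallLog"
  }

  enabled_log {
    category = "ApplicationGatewayAccessLog"
  }
}
```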

Scoped exclusion (right way)

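Exclude rule 942100 only for the argument named filter, not for every argument. A managed-rule exclusion applies to every request its policy evaluates, so to confine it to the search API, put it in a WAF policy attached to the search API's listener or path rule: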

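# Terraform: per-rule, per-selector exclusion (inside azurerm_web_application_firewall_policy)
managed_rules {
  # managed_rules always needs its managed_rule_set; the exclusion sits alongside it
  managed_rule_set {
    type    = "OWASP"
    version = "3.2"
  }

  exclusion {
    match_variable          = "RequestArgNames"
    selector                = "filter"
    selector_match_operator = "Equals"
    excluded_rule_set {
      type    = "OWASP"
      version = "3.2"
      rule_group {
        rule_group_name = "REQUEST-942-APPLICATION-ATTACK-SQLI"
        excluded_rules  = ["942100"]
      }
    }
  }
}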

Anti-pattern (do not ship this)

# Disables the entire SQLi rule group for every request - do NOT do this
disabled_rule_group {
  rule_group_name = "REQUEST-942-APPLICATION-ATTACK-SQLI"
}

5. Risk Register — Threat, Blast Radius, Mitigation

Every WAF mistake maps to a specific risk. This is the table I bring to architecture reviews:

| Threat / mistake | Blast radius | Mitigation |
|---|---|---|
| Detection left on indefinitely | WAF is a logger, not a control. SQLi, XSS, RFI reach the app. | Time-boxed rollout with a Prevention date in the change record |
| Whole rule group disabled | Entire OWASP category off; silent over-permission | Scoped exclusions per rule ID + selector; weekly review |
| Day-one Prevention, no baseline | Legitimate traffic blocked; emergency rollback erodes the security mandate | Mandatory 2–4 week Detection baseline gate |
| Backend reachable directly | Attacker bypasses the WAF | Private Endpoint backends; NSG deny from internet; Azure Policy enforcement |
| TLS terminates at gateway only | Plaintext between gateway and backend | End-to-end TLS; backend HTTPS settings with pinned root CA |
| No diagnostics | No tuning possible; incident forensics blind | Diagnostic setting to Log Analytics on day one; 90-day retention minimum |
| Fixed instance count | Capacity saturation under a marketing spike; 502s to customers | Autoscale on WAF_v2, at least 2 → 10 instances |
| PHI in WAF diagnostic logs (URI / body capture) | HIPAA breach: ePHI written to Log Analytics and downstream Sentinel/SIEM, with a broader access scope than the application database | Disable request_body_check capture on PHI endpoints OR scrub at ingestion; segregate the WAF workspace; RBAC + customer-managed keys |
| No geo-filter on US-only healthcare SaaS | Expands the HIPAA threat surface; non-US recon traffic inflates WAF noise and complicates BAA scope | WAF Policy GeoMatch custom rule allowing only contracted geographies; alert on denied geos |

6. Monitoring & Alerting That Actually Works

Detection mode is useless without someone watching. Set these alerts on day one — Detection or Prevention:

Wire all of these into Microsoft Sentinel or your existing SIEM. WAF logs without correlation are noise; correlated with sign-in logs, NSG flow logs, and backend app logs, they are gold.
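
As one example of a day-one alert, here is a sketch of a log search alert that fires whenever the WAF records a block, which also covers the "alert on every block" step of the Prevention cutover. It assumes the `azurerm_monitor_scheduled_query_rules_alert_v2` resource, WAF logs landing in `AzureDiagnostics`, and hypothetical `var.law_id` and `var.action_group_id` variables; frequency and severity are placeholders to tune to your SOC's capacity.

```hcl
# Log search alert: fire when the WAF blocks at least one request in a 5-minute window.
resource "azurerm_monitor_scheduled_query_rules_alert_v2" "waf_blocks" {
  name                 = "alert-waf-blocks-${var.app}"
  resource_group_name  = var.rg
  location             = var.location
  scopes               = [var.law_id] # hypothetical: Log Analytics workspace ID
  severity             = 2
  evaluation_frequency = "PT5M"
  window_duration      = "PT5M"

  criteria {
    query                   = <<-KQL
      AzureDiagnostics
      | where ResourceType == "APPLICATIONGATEWAYS" and Category == "ApplicationGatewayFirewallLog"
      | where action_s == "Blocked"
    KQL
    time_aggregation_method = "Count"
    operator                = "GreaterThan"
    threshold               = 0
  }

  action {
    action_groups = [var.action_group_id] # hypothetical: SOC action group ID
  }
}
```

Paging on every block suits the first week after a cutover; at steady state, raising `threshold` or lengthening `window_duration` keeps the alert from drowning the SOC.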

Continuous tuning lifecycle — review, exclude, delete

A WAF is not a “deploy once” control. Traffic shifts, applications ship new endpoints, ruleset versions change, and attackers probe what worked yesterday. Without a continuous loop the exclusions you added in week three quietly become permanent over-permissions — and that is precisely where attackers find the gap. The discipline I run on every healthcare gateway:

Daily · SOC: Triage blocks and matches. Sentinel workbook: top rule IDs, new host/URI combinations, geo anomalies. Anything new gets a ticket the same day.

Weekly · AppOps: Tune false positives. Add scoped exclusions in IaC (rule ID + selector, with URI scoping through a per-path WAF policy where needed). Every exclusion ships with a review date and an owner.

Monthly · SecOps: Hunt and diff rule versions. Threat-hunt across 30 days of WAF logs. Diff managed-ruleset version notes and update custom rules to cover newly published CVEs.

Quarterly · Architect: Delete stale exclusions. Any exclusion past its review date is removed by default. If the false positive comes back, it gets re-added with fresh evidence. No silent permanent allowances.
⚠️ Stale exclusions are an attack surface

Every exclusion is, by definition, a hole in the managed ruleset. An exclusion added two years ago for an endpoint that no longer exists is a free pass for an attacker who finds a way to reach a matching URI or argument name. Treat the exclusion list like firewall rules — reviewed quarterly, deleted aggressively, owned by a named engineer, and tracked in Git so every change is auditable.
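
Azure WAF exclusions have no built-in owner or expiry field, so one way to enforce "every exclusion has an owner and a review date" is to keep that metadata in Git next to the exclusion and let Terraform refuse to plan once a date has passed. A sketch, assuming Terraform 1.5 or later (for `plantimestamp()`); the `waf_exclusions` map, its key, the contact address, and the date are hypothetical placeholders, and the resource is a variant of the §7 policy with `policy_settings` trimmed for brevity.

```hcl
locals {
  # Hypothetical exclusion register: one entry per rule ID + selector, with an owner
  # and an RFC 3339 review date. Past that date the plan fails until someone
  # re-validates the exclusion (new date) or deletes it.
  waf_exclusions = {
    search_filter_942100 = {
      match_variable = "RequestArgNames"
      selector       = "filter"
      rule_group     = "REQUEST-942-APPLICATION-ATTACK-SQLI"
      rule_ids       = ["942100"]
      owner          = "search-team@example.com" # placeholder owner
      review_by      = "2026-06-30T00:00:00Z"    # placeholder date
    }
  }
}

resource "azurerm_web_application_firewall_policy" "this" {
  name                = "wafp-${var.app}"
  resource_group_name = var.rg
  location            = var.location

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }

    # One scoped exclusion block per register entry.
    dynamic "exclusion" {
      for_each = local.waf_exclusions
      content {
        match_variable          = exclusion.value.match_variable
        selector                = exclusion.value.selector
        selector_match_operator = "Equals"
        excluded_rule_set {
          type    = "OWASP"
          version = "3.2"
          rule_group {
            rule_group_name = exclusion.value.rule_group
            excluded_rules  = exclusion.value.rule_ids
          }
        }
      }
    }
  }

  lifecycle {
    # Fail the plan when any exclusion is past its review date.
    precondition {
      condition = alltrue([
        for e in values(local.waf_exclusions) : timecmp(plantimestamp(), e.review_by) < 0
      ])
      error_message = "A WAF exclusion is past its review_by date: re-validate it or delete it."
    }
  }
}
```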

KQL to flag rule, host, and URI combinations that matched within the last 90 days but have been quiet for more than 60. Treat the results as review prompts, not proof: an exclusion stops its rule from matching the field it covers, so a working exclusion goes quiet too. Confirm the endpoint no longer receives the traffic in question before deleting the exclusion:

// Rule / host / URI combinations seen in the last 90 days but silent for the last 60
AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS" and Category == "ApplicationGatewayFirewallLog"
| where TimeGenerated > ago(90d)
| summarize last_hit = max(TimeGenerated), hits = count() by ruleId_s, hostname_s, requestUri_s
| where last_hit < ago(60d)
| order by last_hit asc

7. Terraform Baseline — App Gateway + WAF Policy

Deploy the same baseline to every environment. Tuning that happens only in production never gets validated, and exclusions that live only in the portal disappear at the next redeploy.

# WAF Policy: start in Detection, flip to Prevention via variable
resource "azurerm_web_application_firewall_policy" "this" {
  name                = "wafp-${var.app}"
  resource_group_name = var.rg
  location            = var.location

  policy_settings {
    enabled                     = true
    mode                        = var.waf_mode    # "Detection" until baseline complete
    request_body_check          = true
    file_upload_limit_in_mb     = 100
    max_request_body_size_in_kb = 128
  }

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }
  }
}

# Application Gateway WAF v2 with autoscale + Key Vault cert
resource "azurerm_application_gateway" "this" {
  name                = "agw-${var.app}"
  resource_group_name = var.rg
  location            = var.location
  firewall_policy_id  = azurerm_web_application_firewall_policy.this.id

  sku {
    name = "WAF_v2"
    tier = "WAF_v2"
  }
  autoscale_configuration {
    min_capacity = 2
    max_capacity = 10
  }

  ssl_certificate {
    name                = "tls-cert"
    key_vault_secret_id = azurerm_key_vault_certificate.tls.secret_id
  }
  # ... listeners, backend pools, end-to-end TLS settings ...
}

Pair this with an Azure Policy assignment at the management group that denies creation of Application Gateways without an attached WAF Policy. That single guardrail prevents 90% of the day-one mistakes from ever shipping.
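
A sketch of that guardrail in Terraform, assuming you assign Microsoft's built-in definition rather than authoring a custom one. The display name below, "Web Application Firewall (WAF) should be enabled for Application Gateway", is the built-in I would look for; confirm it and its `effect` parameter in your tenant before relying on this. `var.management_group_id` is a hypothetical variable. The built-in checks that a WAF is attached, not which mode it runs in, so it complements rather than replaces the Prevention date in your change record.

```hcl
# Look up the built-in definition by display name (verify it in your tenant).
data "azurerm_policy_definition" "appgw_waf_enabled" {
  display_name = "Web Application Firewall (WAF) should be enabled for Application Gateway"
}

# Deny (not just audit) at the management group, so it applies to every subscription below it.
resource "azurerm_management_group_policy_assignment" "appgw_waf_enabled" {
  name                 = "deny-appgw-no-waf"
  display_name         = "Deny Application Gateways without WAF"
  management_group_id  = var.management_group_id # hypothetical variable
  policy_definition_id = data.azurerm_policy_definition.appgw_waf_enabled.id

  parameters = jsonencode({
    effect = { value = "Deny" }
  })
}
```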

8. Healthcare Addendum — HIPAA · HITRUST · FHIR (Mandatory)

If your gateway terminates traffic for a healthcare SaaS — a payer portal, a provider app, a claims API, a FHIR endpoint — every architectural decision above gets a compliance dimension on top of the security one. Skipping this section in a healthcare design review is, in my experience, the single fastest way to fail a HITRUST audit on a perfectly good Azure platform.

⚠️ The PHI-in-logs gotcha

Application Gateway WAF logs the request URI in full and, when request_body_check is enabled, can capture the matched portion of the request body. On a healthcare API where the URI carries identifiers (/Patient/12345, /claims/MRN/...) or the body carries ePHI, those values land in Log Analytics, Sentinel, and any downstream SIEM. That is an unintended PHI store with its own access model. Plan for it before you turn WAF on, not after the auditor finds it.
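
One mitigation worth evaluating is the WAF's sensitive-data log scrubbing, which masks selected request values in WAF logs while leaving body inspection switched on. A sketch, assuming an azurerm provider version that exposes a `log_scrubbing` block inside `policy_settings` (check the block and attribute names against the provider docs for your version); the selectors `patientId` and `mrn` are hypothetical. Scrubbing targets headers, cookies, arguments, and the client IP; as far as I know it does not rewrite path segments such as /Patient/12345, so identifiers carried in the URI path still need their own design answer.

```hcl
# Excerpt of azurerm_web_application_firewall_policy.this from §7:
# policy_settings gains a log_scrubbing block; everything else is unchanged.
policy_settings {
  enabled                     = true
  mode                        = var.waf_mode
  request_body_check          = true # keep inspecting bodies; scrub the logs instead
  file_upload_limit_in_mb     = 100
  max_request_body_size_in_kb = 128

  log_scrubbing {
    enabled = true

    # Mask a hypothetical JSON body field that carries a patient identifier.
    rule {
      enabled                 = true
      match_variable          = "RequestJSONArgNames"
      selector_match_operator = "Equals"
      selector                = "patientId"
    }

    # Mask a hypothetical query-string argument that carries an MRN.
    rule {
      enabled                 = true
      match_variable          = "RequestArgNames"
      selector_match_operator = "Equals"
      selector                = "mrn"
    }
  }
}
```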

HIPAA §164.312 — how the gateway satisfies each technical safeguard

| HIPAA control | Gateway / WAF implementation |
|---|---|
| §164.312(a)(1) Access Control | Private Endpoint backends; NSG deny-from-internet; the gateway is the only ingress; whether the edge is Front Door or App Gateway WAF, no path may bypass it |
| §164.312(a)(2)(iv) Encryption & Decryption | TLS 1.2+ at the listener; end-to-end TLS to the backend; restricted cipher suite policy (no TLS 1.0/1.1, no legacy ciphers) |
| §164.312(b) Audit Controls | Diagnostic settings → Log Analytics → Sentinel; 6-year retention minimum; immutable archive tier after 90 days |
| §164.312(c)(1) Integrity | WAF Prevention mode enforces request integrity (no tampered headers/bodies passed to the backend); OWASP CRS 3.2 + Bot Manager |
| §164.312(d) Person or Entity Authentication | Gateway enforces mTLS / OAuth at the listener where required; AAD-backed SSO for the admin plane; never anonymous to the backend |
| §164.312(e)(1) Transmission Security | End-to-end TLS: terminating at the gateway and sending plaintext to a private subnet still fails this control; pin the backend root CA |
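
The encryption and transmission rows translate into a handful of gateway settings. A sketch of the relevant `azurerm_application_gateway` blocks, assuming a predefined TLS policy (`AppGwSslPolicy20220101S` is one of the 2022 predefined policies with a TLS 1.2 minimum; check the current predefined-policy list before choosing). The certificate file path, settings name, and backend host name are hypothetical placeholders.

```hcl
# Excerpt of azurerm_application_gateway.this: TLS floor at the listener plus
# end-to-end TLS to the backend.

# Listener side: predefined policy with a TLS 1.2 minimum and a restricted cipher list.
ssl_policy {
  policy_type = "Predefined"
  policy_name = "AppGwSslPolicy20220101S"
}

# Backend side: the only root CA the gateway trusts when validating backend certificates.
trusted_root_certificate {
  name = "backend-root-ca"
  data = filebase64("${path.module}/certs/backend-root-ca.cer") # hypothetical path
}

backend_http_settings {
  name                           = "fhir-https" # hypothetical
  cookie_based_affinity          = "Disabled"
  port                           = 443
  protocol                       = "Https" # re-encrypt between gateway and backend
  request_timeout                = 30
  host_name                      = "fhir.internal.example.com" # must match the backend certificate
  trusted_root_certificate_names = ["backend-root-ca"]
}
```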

HITRUST CSF — the controls auditors actually look at

FHIR & claims API tuning patterns

These are the exclusions I add early in every healthcare WAF rollout — they show up in the baseline within hours and they are safe when scoped tightly:
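
Managed-rule exclusions apply to every request a policy evaluates, so keeping FHIR exclusions tightly scoped also means scoping the policy itself. Application Gateway v2 can attach a WAF policy to an individual path rule. A sketch of a FHIR-only policy, assuming the FHIR API is routed under /fhir/*; the policy, map, pool, and settings names are hypothetical, and the exclusions themselves should come from your own Detection baseline.

```hcl
# FHIR-only policy: exclusions added here apply to /fhir/* traffic, not the whole gateway.
resource "azurerm_web_application_firewall_policy" "fhir" {
  name                = "wafp-${var.app}-fhir"
  resource_group_name = var.rg
  location            = var.location

  policy_settings {
    enabled            = true
    mode               = var.waf_mode
    request_body_check = true
  }

  managed_rules {
    managed_rule_set {
      type    = "OWASP"
      version = "3.2"
    }
    # Add FHIR-specific exclusion blocks here, each scoped to one rule ID + selector.
  }
}

# Inside azurerm_application_gateway.this (names are placeholders):
#
#   url_path_map {
#     name                               = "api-paths"
#     default_backend_address_pool_name  = "api-pool"
#     default_backend_http_settings_name = "api-https"
#
#     path_rule {
#       name                       = "fhir"
#       paths                      = ["/fhir/*"]
#       backend_address_pool_name  = "fhir-pool"
#       backend_http_settings_name = "fhir-https"
#       firewall_policy_id         = azurerm_web_application_firewall_policy.fhir.id
#     }
#   }
```

As I understand the precedence model, a per-path policy takes the place of the gateway-level policy for the requests it matches, so custom rules such as the geo restriction and any log-scrubbing settings have to be carried into it as well.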

Geo-restriction is a compliance lever, not just security

For US-only healthcare SaaS the WAF policy should explicitly allow only the contracted geographies. It shrinks the noise the SOC tunes against, and it reduces the BAA threat surface you must defend. A single custom rule does it:

# Allow US traffic only (add any other contracted geographies to match_values); block everything else at the WAF
custom_rules {
  name      = "geo-allow-us-only"
  priority  = 10
  rule_type = "MatchRule"
  action    = "Block"
  match_conditions {
    match_variables { variable_name = "RemoteAddr" }
    operator           = "GeoMatch"
    negation_condition = true
    match_values       = ["US"]
  }
}
🏥 Healthcare design rule

For any listener carrying ePHI: Prevention mode is the only acceptable steady state, end-to-end TLS is non-negotiable, the WAF Log Analytics workspace is treated as a regulated data store (CMK, private link, RBAC, 6-year retention), and every exclusion has an expiry date that re-validates against current FHIR / X12 traffic.
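
A sketch of the private-link and retention parts of that rule, assuming a dedicated WAF workspace and table-level total retention on `AzureDiagnostics`. 2191 days is roughly six years; the provider has historically accepted only specific long-retention values above 730 days, so confirm what your version allows. The workspace name is hypothetical, and CMK (which needs a dedicated Log Analytics cluster) is out of scope here.

```hcl
# Dedicated workspace for WAF diagnostics, closed to public-network ingestion and queries.
resource "azurerm_log_analytics_workspace" "waf" {
  name                       = "law-waf-${var.app}" # hypothetical name
  resource_group_name        = var.rg
  location                   = var.location
  sku                        = "PerGB2018"
  retention_in_days          = 90    # interactive (analytics) retention
  internet_ingestion_enabled = false # no public-network ingestion
  internet_query_enabled     = false # no public-network queries; use private link
}

# Keep WAF log rows for about six years in total (interactive + long-term retention).
resource "azurerm_log_analytics_workspace_table" "azure_diagnostics" {
  workspace_id            = azurerm_log_analytics_workspace.waf.id
  name                    = "AzureDiagnostics"
  retention_in_days       = 90
  total_retention_in_days = 2191
}
```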

9. Architect's Checklist

Healthcare workloads — add these to the checklist:


Key Takeaway

Application Gateway WAF is not a checkbox — it is an operational discipline. Stand it up in Detection, baseline real traffic for weeks, tune surgically, then flip to Prevention with a date you committed to in advance. For healthcare that discipline becomes a compliance obligation: Prevention is the only acceptable steady state for ePHI listeners, every exclusion is scoped and expires, every diagnostic byte is treated as PHI, and HIPAA §164.312 / HITRUST evidence is generated by the architecture — not added by the audit team six months later. Do that and you ship a WAF that protects the application and the BAA.