# Three-Layer Deployment Architecture
## Context and Problem Statement
The CIMPL deployment includes Azure infrastructure (AKS cluster, networking, RBAC), platform middleware (Elasticsearch, PostgreSQL, RabbitMQ, Istio config), and OSDU application services (Partition, Entitlements, etc.). These have fundamentally different change frequencies and blast radii. A Helm chart upgrade should not risk the AKS cluster, and an application service rollout should not trigger re-evaluation of platform middleware. We need a deployment architecture that isolates these concerns while maintaining a single `azd up` experience.
The reference implementation (cimpl-azd ADR-0006) uses a two-layer split (infra and software/stack). This project extends that to three explicit layers to provide finer-grained lifecycle control as the platform matures.
## Decision Drivers
- Cluster infrastructure changes are infrequent and high-risk
- Platform middleware changes are moderate frequency and medium-risk
- Application service changes are frequent and lower-risk
- `azd up` must orchestrate all layers in the correct order via lifecycle hooks
- Cross-layer values (cluster name, OIDC issuer URL, resource group) must flow between layers via environment variables
- Blast radius of a `terraform destroy` must be containable per layer
- Teardown must proceed in reverse order (software → platform → infra)
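The cross-layer value flow named in the drivers above can be sketched as a post-provision step: the infra layer's Terraform outputs are exported into the azd environment, and the next layer reads them as `TF_VAR_` inputs. Output and variable names below are illustrative assumptions, not the project's actual identifiers:

```shell
# Post-provision sketch (hypothetical output names): export infra-layer
# Terraform outputs into the azd environment so later layers can consume
# them as TF_VAR_* inputs. Run from the repo root.
set -euo pipefail

# Read outputs from the infra layer's state
AKS_NAME=$(terraform -chdir=infra output -raw aks_cluster_name)
OIDC_URL=$(terraform -chdir=infra output -raw oidc_issuer_url)
RG_NAME=$(terraform -chdir=infra output -raw resource_group_name)

# Persist them in the azd environment for subsequent hooks
azd env set AKS_CLUSTER_NAME "$AKS_NAME"
azd env set OIDC_ISSUER_URL "$OIDC_URL"
azd env set RESOURCE_GROUP "$RG_NAME"

# A later layer consumes them as Terraform input variables
export TF_VAR_aks_cluster_name="$AKS_NAME"
terraform -chdir=software/foundation apply -auto-approve
```

This is the explicit hand-off the drivers call for: each layer's inputs are plain environment variables, so no layer needs read access to another layer's state file.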
## Considered Options
- Three-layer architecture (infra / platform / software)
- Two-layer architecture (infra / stack)
- Single monolithic Terraform state
- Terraform workspaces
## Decision Outcome
Chosen option: "Three-layer architecture", because it provides the finest-grained lifecycle isolation, aligns with the natural boundaries between Azure resources, Kubernetes middleware, and application services, and enables independent deploy/rollback per layer.
| Layer | Path | Manages | Triggered by |
|---|---|---|---|
| 1. Infrastructure | `infra/` | Resource group, AKS cluster, networking, managed identities | `azd provision` |
| 1a. Access | `infra-access/` | Privileged RBAC, policy exemptions (see ADR-0020) | `scripts/bootstrap-access.ps1` |
| 2. Foundation | `software/foundation/` | cert-manager, CloudNativePG, Elasticsearch, ExternalDNS, Gateway | post-provision hook |
| 3. Stack | `software/stack/` | OSDU services, Airflow, Keycloak, RabbitMQ, Redis, MinIO, Istio routing | `azd deploy` (pre-deploy hook) |
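The "Triggered by" column maps onto azd lifecycle hooks. A minimal `azure.yaml` sketch of that wiring follows; the hook script paths are hypothetical, not the project's actual files:

```yaml
# Illustrative azure.yaml hook wiring (script paths are assumptions).
# `azd provision` applies infra/ directly; hooks chain the other layers.
name: cimpl
infra:
  provider: terraform
  path: infra
hooks:
  postprovision:          # Layer 2: foundation, after infra is up
    shell: sh
    run: ./scripts/apply-foundation.sh
  predeploy:              # Layer 3: stack, before service deployment
    shell: sh
    run: ./scripts/apply-stack.sh
  predown:                # Teardown in reverse: stack, then foundation
    shell: sh
    run: ./scripts/destroy-software.sh
```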
## Consequences
- Good, because each layer can be independently planned, applied, and destroyed
- Good, because `terraform apply` in `software/` cannot accidentally modify platform middleware or the AKS cluster
- Good, because platform middleware changes (e.g., an Elasticsearch upgrade) don't trigger re-evaluation of all OSDU services
- Good, because teardown hooks can cleanly reverse the order: software → platform → infra
- Good, because it aligns with the azd lifecycle: `azd provision` (infra), post-provision hook (foundation), `azd deploy` via pre-deploy hook (stack)
- Bad, because cross-layer values must be explicitly passed via environment variables; there are no direct Terraform state references between layers
- Bad, because multiple state files to manage and reason about
- Bad, because debugging requires understanding which layer owns each resource
- Bad, because adds orchestration complexity in lifecycle hook scripts compared to a two-layer model
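The reverse-order teardown listed in the consequences could be implemented as a pre-down hook script along these lines (a sketch, assuming the layer paths from the decision table):

```shell
# Pre-down sketch (paths assumed from the decision table): destroy layers
# in reverse order so workloads release cloud resources before the
# cluster they run on disappears.
set -euo pipefail

# Layer 3 first: OSDU services and supporting stack
terraform -chdir=software/stack destroy -auto-approve

# Then layer 2: platform middleware
terraform -chdir=software/foundation destroy -auto-approve

# Layer 1 (infra/) is torn down by `azd down` itself after this hook.
```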
## Pros and Cons of the Options
### Three-Layer Architecture (infra / platform / software)
Separate Terraform roots for Azure infrastructure, Kubernetes middleware, and application services.
- Good, because finest-grained blast radius isolation
- Good, because application deploys are fast — only evaluating OSDU service Helm releases
- Good, because platform middleware can be upgraded independently of both infra and applications
- Good, because maps cleanly to team responsibilities (infra team, platform team, app team)
- Neutral, because adds one more layer than the reference implementation
- Bad, because three state files and three sets of cross-layer variable passing
- Bad, because lifecycle hook scripts must coordinate three layers
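The cross-layer variable passing counted against this option might look like the following in a consuming layer; the file path and variable names are assumptions for illustration:

```hcl
# software/foundation/variables.tf (illustrative): cross-layer inputs are
# declared explicitly instead of being read from another layer's state.
variable "aks_cluster_name" {
  type        = string
  description = "Set via TF_VAR_aks_cluster_name from an infra-layer output"
}

variable "oidc_issuer_url" {
  type        = string
  description = "Workload identity federation issuer from the AKS cluster"
}
```

The cost is boilerplate per shared value; the benefit is that each layer's external surface is visible in one file rather than implicit in remote-state lookups.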
### Two-Layer Architecture (infra / stack)
Separate Terraform roots for Azure infrastructure and all Kubernetes resources (middleware + services combined), as used by the reference implementation (cimpl-azd ADR-0006).
- Good, because simpler orchestration — only two layers to coordinate
- Good, because proven in reference implementation
- Good, because middleware and services share `depends_on` relationships naturally
- Bad, because a Helm chart upgrade to Elasticsearch triggers re-evaluation of all OSDU services
- Bad, because blast radius includes both middleware and application services
- Bad, because conflates different change frequencies into one state
### Single Monolithic State
One Terraform root managing everything from resource group to application services.
- Good, because simplest to understand — everything in one place
- Good, because direct resource references (no cross-layer variable passing)
- Bad, because `terraform plan` evaluates every resource on every change (slow)
- Bad, because a single `terraform destroy` removes everything with no granularity
- Bad, because it has the highest blast radius: any change can affect any resource
### Terraform Workspaces
Single Terraform configuration with workspace-based isolation.
- Good, because workspace switching is simpler than directory-based separation
- Bad, because workspaces are designed for environment isolation (dev/staging/prod), not layer isolation
- Bad, because all resources still share one configuration — no blast radius reduction
- Bad, because not supported by azd's Terraform integration
## More Information
- Reference implementation: cimpl-azd ADR-0006 (two-layer variant)
- Azure Developer CLI lifecycle hooks
- Related: ADR-0001 (Terraform + azd choice enables lifecycle hook orchestration)