Use AKS Automatic as Deployment Target
Context and Problem Statement
The CIMPL platform needs a Kubernetes runtime on Azure that balances operational simplicity with enterprise security compliance. The platform hosts stateful services (Elasticsearch, PostgreSQL, RabbitMQ) alongside an Istio service mesh and must be deployable via azd up with minimal manual intervention. The reference implementation (cimpl-azd) validated AKS Automatic for this workload profile and documented the trade-offs extensively. Choosing the wrong platform increases operational burden and delays delivery.
Decision Drivers
- Minimize day-2 operational burden (patching, scaling, upgrades)
- Built-in security baseline without manual Gatekeeper policy authoring
- Managed Istio service mesh integration (no manual Istio installation)
- Cost optimization through automatic node provisioning (Karpenter/NAP)
- Single azd up deployment experience
- Alignment with proven reference implementation
Considered Options
- AKS Automatic
- Standard AKS with manual Gatekeeper policies
- Azure Container Apps
- Red Hat OpenShift on Azure (ARO)
Decision Outcome
Chosen option: "AKS Automatic", because it provides the best balance of operational simplicity, built-in security, and managed Istio — all critical for a small team deploying a complex platform stack. This decision is directly informed by the reference implementation (cimpl-azd ADR-0001) which validated this choice in production.
Consequences
- Good, because auto-scaling, auto-upgrade, and auto-repair reduce operational toil
- Good, because Deployment Safeguards enforce security baseline (probes, resource limits, seccomp profiles) without manual policy authoring
- Good, because managed Istio provides service mesh without installation or lifecycle management
- Good, because Karpenter-based Node Auto-Provisioning (NAP) dynamically selects VM SKUs per zone, eliminating OverconstrainedZonalAllocationRequest failures
- Good, because reference implementation has already solved the major pain points (postrender patterns, deployment gates, namespace layout)
- Bad, because strict Deployment Safeguards require workarounds for every Helm chart that doesn't expose probe/resource/seccomp configuration — this is the single largest source of platform complexity
- Bad, because NET_ADMIN and NET_RAW capabilities are blocked, preventing Istio sidecar injection in some namespaces
- Bad, because Azure Policy eventual consistency means fresh clusters need a deployment gate before platform workloads can be applied
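The deployment gate mentioned in the last consequence can be sketched as a polling loop. This is a minimal illustration, not the gate used by the reference implementation: the function names `wait_for_gate` and `constraint_templates_present`, the timeout values, and the choice of "at least `min_count` ConstraintTemplates exist" as the readiness signal are all assumptions made here.

```python
import json
import subprocess
import time
from typing import Callable


def constraint_templates_present(min_count: int = 1) -> bool:
    """Hypothetical readiness check: True once kubectl reports at least
    min_count Gatekeeper ConstraintTemplates (threshold is an assumption)."""
    result = subprocess.run(
        ["kubectl", "get", "constrainttemplates", "-o", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # CRD not installed yet or API server unreachable: treat as not ready.
        return False
    items = json.loads(result.stdout).get("items", [])
    return len(items) >= min_count


def wait_for_gate(
    check: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses.

    Returns True if the gate opened, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A deployment script could call `wait_for_gate(constraint_templates_present)` and abort if it returns False, so that platform workloads are only applied after policies have synced. Passing the check as a callable keeps the loop testable without a cluster.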
Validation
- kubectl get constrainttemplates: verify Gatekeeper policies are active
- az aks show --query "sku.name": confirm cluster is "Automatic"
- All pods running with readinessProbe, livenessProbe, resource requests/limits, and seccompProfile: RuntimeDefault
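The third check can be automated against pod objects in the shape returned by kubectl get pods -o json. The following is a rough sketch under that assumption; `pod_violations` and `_seccomp_type` are hypothetical helper names, and the rules only approximate what Deployment Safeguards actually enforce.

```python
from typing import Any, Dict, List, Optional


def _seccomp_type(security_context: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return seccompProfile.type from a securityContext dict, if set."""
    profile = (security_context or {}).get("seccompProfile") or {}
    return profile.get("type")


def pod_violations(pod: Dict[str, Any]) -> List[str]:
    """List the validation criteria a pod fails, one string per finding.

    Only regular containers are checked; init containers do not carry
    readiness or liveness probes.
    """
    spec = pod.get("spec", {})
    pod_seccomp = _seccomp_type(spec.get("securityContext"))
    violations: List[str] = []
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        for probe in ("readinessProbe", "livenessProbe"):
            if not container.get(probe):
                violations.append(f"{name}: missing {probe}")
        resources = container.get("resources") or {}
        for key in ("requests", "limits"):
            if not resources.get(key):
                violations.append(f"{name}: missing resources.{key}")
        # A container-level seccomp profile overrides the pod-level one.
        seccomp = _seccomp_type(container.get("securityContext")) or pod_seccomp
        if seccomp != "RuntimeDefault":
            violations.append(f"{name}: seccompProfile is not RuntimeDefault")
    return violations
```

Running this over every item in the pod list and failing when any list is non-empty gives a simple pass/fail signal for the validation step.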
Pros and Cons of the Options
AKS Automatic
Fully managed Kubernetes with opinionated defaults, Deployment Safeguards, managed Istio, and Karpenter-based node provisioning.
- Good, because zero-touch node management (auto-scale, auto-upgrade, auto-repair)
- Good, because Deployment Safeguards enforce pod security baseline at admission
- Good, because managed Istio removes mesh lifecycle burden
- Good, because NAP (Karpenter) enables dynamic VM SKU selection per zone
- Neutral, because relatively new GA offering (less community documentation)
- Bad, because safeguards require postrender/kustomize workarounds for most Helm charts
- Bad, because blocking NET_ADMIN/NET_RAW breaks the Istio sidecar istio-init container in some workloads
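To illustrate the kind of postrender workaround referred to above, the sketch below fills in missing probe, resource, and seccomp settings on an already-parsed workload manifest. It is not the reference implementation's pattern: `apply_safeguard_defaults`, `DEFAULT_RESOURCES`, and the default values are hypothetical, and a real Helm post-renderer would also need to parse the rendered YAML from stdin and write the result to stdout.

```python
import copy
from typing import Any, Dict

# Hypothetical fallback values; real defaults would be chosen per chart.
DEFAULT_RESOURCES: Dict[str, Dict[str, str]] = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}


def apply_safeguard_defaults(
    manifest: Dict[str, Any],
    default_probe: Dict[str, Any],
    default_resources: Dict[str, Dict[str, str]] = DEFAULT_RESOURCES,
) -> Dict[str, Any]:
    """Return a patched copy of a Deployment/StatefulSet-style manifest.

    Values already set by the chart are kept; only missing ones are filled.
    """
    patched = copy.deepcopy(manifest)
    pod_spec = patched["spec"]["template"]["spec"]

    # Pod-level seccomp profile, unless the chart already sets one.
    security = pod_spec.setdefault("securityContext", {})
    security.setdefault("seccompProfile", {"type": "RuntimeDefault"})

    for container in pod_spec.get("containers", []):
        container.setdefault("readinessProbe", copy.deepcopy(default_probe))
        container.setdefault("livenessProbe", copy.deepcopy(default_probe))
        resources = container.setdefault("resources", {})
        for key in ("requests", "limits"):
            if not resources.get(key):
                resources[key] = copy.deepcopy(default_resources[key])
    return patched
```

For example, passing `{"tcpSocket": {"port": 5672}}` as `default_probe` for a RabbitMQ chart would add a TCP check on the AMQP port wherever the chart defines no probe. Returning a copy rather than mutating the input keeps the original render available for diffing.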
Standard AKS with Manual Gatekeeper
Traditional AKS with user-managed Gatekeeper policies and optional Istio add-on.
- Good, because full control over which policies are enforced
- Good, because can selectively exempt workloads without Azure Policy
- Bad, because requires authoring and maintaining Gatekeeper ConstraintTemplates
- Bad, because no automatic enforcement — policies can drift or be disabled
- Bad, because node management is manual (VMSS pools)
Azure Container Apps
Serverless container platform abstracting away Kubernetes.
- Good, because zero infrastructure management
- Good, because built-in Dapr integration
- Bad, because no support for StatefulSets (Elasticsearch, PostgreSQL, RabbitMQ)
- Bad, because no Istio service mesh or Gateway API
- Bad, because limited control over storage, networking, and scheduling
Red Hat OpenShift on Azure (ARO)
Managed OpenShift with built-in security context constraints (SCCs).
- Good, because mature security model with SCCs
- Good, because built-in monitoring (Prometheus/Grafana)
- Neutral, because reference architecture exists (ROSA OSDU deployment)
- Bad, because significantly higher cost per cluster
- Bad, because heavier operational overhead (OpenShift-specific tooling)
- Bad, because not aligned with Azure-native tooling (azd, Azure RBAC)
More Information
- AKS Automatic documentation
- Deployment Safeguards
- Reference implementation: cimpl-azd ADR-0001