I wanted to share something I've been working on to solve a gap in Kubernetes: its scheduler has no awareness of the network topology of external services that workloads communicate with. If a pod talks to a database (e.g., AWS RDS), K8s does not know it should schedule the pod in the same AZ as the database. A pod placed in a different AZ generates unnecessary cross-AZ network traffic, which adds latency (and cost).
I've made a tool called "Automatic Zone Placement", which automatically aligns pod placement with the pods' external dependencies.
Testing shows that placing the pod in the same AZ as the database resulted in a ~175-375% performance increase, measured with small, frequent SQL requests. That's not really surprising: same-AZ latency is much lower than cross-AZ latency, and for this kind of workload lower latency means higher performance.
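To put those percentages in perspective, here is a small back-of-the-envelope sketch. It assumes the workload is fully latency-bound, i.e. queries are issued one after another and throughput is simply 1 / round-trip time. That model and the helper names are my assumptions for illustration, not part of the measurements.

```python
def speedup_factor(percent_increase):
    """Convert an 'X% increase' into a multiplier (a 100% increase means 2x)."""
    return 1.0 + percent_increase / 100.0


def sequential_qps(round_trip_ms):
    """Queries per second for one connection issuing queries back to back.

    Assumes each query costs exactly one round trip and nothing else.
    """
    return 1000.0 / round_trip_ms


# The reported range of ~175-375% corresponds to these multipliers.
low = speedup_factor(175)   # 2.75x
high = speedup_factor(375)  # 4.75x

# Under the latency-bound assumption, the same factors describe how much
# longer a cross-AZ query took compared with a same-AZ query.
print(f"throughput multiplier: {low:.2f}x to {high:.2f}x")
```

In other words, under this simplified model, the cross-AZ round trip would have been roughly 2.75 to 4.75 times as long as the same-AZ one.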
The tool has two components:
1) A lightweight lookup service: A dependency-free Python service that takes a domain name (e.g., your RDS endpoint) and resolves its IP to a specific AZ.
2) A Kyverno mutating webhook: This policy intercepts pod creation requests. If a pod has a specific annotation, the webhook calls the lookup service and injects the required nodeAffinity to schedule the pod onto a node in the correct AZ.
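The lookup step can be sketched roughly like this, using only the Python standard library. This is a minimal illustration and not the tool's actual code: the subnet table, the AZ names, and the function names are hypothetical, and a real service would have to get the subnet-to-AZ mapping from the cloud provider.

```python
import ipaddress
import socket

# Hypothetical subnet-to-AZ table (assumption for this sketch). In practice
# this would come from the cloud provider, e.g. the VPC's subnet list.
SUBNET_TO_AZ = {
    "10.0.0.0/20": "eu-west-1a",
    "10.0.16.0/20": "eu-west-1b",
    "10.0.32.0/20": "eu-west-1c",
}


def az_for_ip(ip, subnet_to_az=SUBNET_TO_AZ):
    """Return the AZ whose subnet contains ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for cidr, az in subnet_to_az.items():
        if addr in ipaddress.ip_network(cidr):
            return az
    return None


def az_for_hostname(hostname, subnet_to_az=SUBNET_TO_AZ):
    """Resolve hostname to its first IPv4 address and map that address to an AZ."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    ip = infos[0][4][0]  # sockaddr is (address, port) for AF_INET
    return az_for_ip(ip, subnet_to_az)
```

For example, `az_for_ip("10.0.17.5")` falls inside `10.0.16.0/20` and returns `"eu-west-1b"` with the table above. `az_for_hostname` needs working DNS from inside the VPC, so that the endpoint resolves to its private IP.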
The goal is to make this an automatic process. The alternative is to manually add a nodeAffinity spec to your workloads, but resources move between AZs, e.g. during maintenance events for RDS instances. I built this with AWS services in mind, but the concept is generic enough to be used for on-premises clusters to make scheduling decisions based on rack, row, or data center properties.
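For reference, this is roughly what the nodeAffinity looks like, whether it's added by hand or injected. The sketch builds it as a Python dict and prints it as JSON. It assumes nodes carry the well-known `topology.kubernetes.io/zone` label. The function name and the `required` switch are mine, for illustration only, and the exact spec the tool injects may differ.

```python
import json

# Well-known Kubernetes node label that holds the node's zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def zone_affinity(zone, required=True):
    """Build a pod 'affinity' stanza that pins (or prefers) nodes in zone."""
    term = {
        "matchExpressions": [
            {"key": ZONE_LABEL, "operator": "In", "values": [zone]}
        ]
    }
    if required:
        # Hard constraint: the pod stays Pending if no node in the zone fits.
        node_affinity = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [term]
            }
        }
    else:
        # Soft constraint: the scheduler prefers the zone but may fall back.
        node_affinity = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "preference": term}
            ]
        }
    return {"nodeAffinity": node_affinity}


if __name__ == "__main__":
    # This is what would go under spec.affinity in the pod spec.
    print(json.dumps(zone_affinity("eu-west-1b"), indent=2))
```

A hand-written affinity like this is static, so it goes stale as soon as the database moves to another AZ.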
I'd love some feedback on this, and I'm happy to answer questions :)
toredash•2h ago