How about, don't use Kubernetes? The lack of control over where the workload runs is a problem caused by Kubernetes. If you deploy an application as e.g. systemd services, you can pick the optimal host for the workload, and it will not suddenly jump around.
This project literally sets the affinity. That's precisely the control you claim is missing.
You can specify just about anything, including exact nodes, for Kubernetes workloads.
This is just injecting some of that automatically.
I'm not knocking systemd, it's just not relevant.
Being able to move workloads around is kinda the point. The need exists irrespective of what you use to deploy your app.
Any AWS service whose hostname can start resolving to a different AZ (for whatever reason) can benefit from this.
Fine-grained control over workload scheduling is one of K8s's core features?
Affinity, anti-affinity, priority classes, node selectors, scheduling gates - all of which affect scheduling for different use cases, and all under the operator's control.
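As a minimal illustration (not part of this project), pinning a Pod to an exact zone or even an exact node is a one-line nodeSelector; the label keys below are Kubernetes' standard well-known labels, the values are made up.

```python
# Illustrative only: a Pod spec fragment showing manual, fine-grained placement
# control via nodeSelector, using Kubernetes' well-known node labels.
pod_spec = {
    "nodeSelector": {
        # pin to a specific availability zone ...
        "topology.kubernetes.io/zone": "eu-west-1a",
        # ... or even to one exact node (hostname is a placeholder)
        "kubernetes.io/hostname": "ip-10-0-12-34.eu-west-1.compute.internal",
    }
}
```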
> To gather zone information, use this command ...
Why couldn't most of this information be gathered by the lookup service itself? A point could be made about excessive IAM permissions, but the simple case of an RDS reader residing in a given AZ could easily be handled by listing the subnets and finding which one a given IP belongs to.
This service is published more as a concept to build on than as a complete solution.
You wouldn't even need IAM rights to read RDS information, only subnet information. Since subnets are zonal, it doesn't matter whether the service is RDS or Redis/ElastiCache: the IP returned from the hostname lookup, at the time your Pod is scheduled, determines which AZ that Pod should (optimally) be deployed to.
This solution was created in a multi-account AWS environment, where doing describe-subnets API calls across multiple accounts is a hassle. It was "good enough" to have a static mapping of subnets, as they didn't change frequently.
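A minimal sketch of that static approach, using only the standard library (the CIDRs and AZ names are made up, not from the project): resolve the endpoint's hostname and match the IP against a hard-coded subnet-to-AZ table.

```python
import socket
import ipaddress

# Hypothetical static subnet -> AZ mapping; in a real setup this would mirror
# your VPC subnet layout (subnets are zonal, so CIDR membership implies the AZ).
SUBNET_TO_AZ = {
    "10.0.0.0/19": "eu-west-1a",
    "10.0.32.0/19": "eu-west-1b",
    "10.0.64.0/19": "eu-west-1c",
}

def zone_for_endpoint(hostname: str) -> str | None:
    """Resolve a hostname (e.g. an RDS endpoint) and return the AZ of the subnet its IP falls in."""
    ip = ipaddress.ip_address(socket.gethostbyname(hostname))
    for cidr, zone in SUBNET_TO_AZ.items():
        if ip in ipaddress.ip_network(cidr):
            return zone
    return None

# e.g. zone_for_endpoint("mydb.xxxx.eu-west-1.rds.amazonaws.com") -> "eu-west-1b"
```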
The Kyverno requirement makes it limited. There is no "automatic-zone-placement-disabled" option for temporarily disabling zone placement without removing the label. How do we handle the RDS zone changing after the workload is scheduled? There is no automatic lookup of IPs and zones. What if we only have one node in a specific zone? Are we willing to handle an EC2 failure, or should we trigger a scale-out?
You don't have to use Kyverno. You could use a standard mutating webhook, but you would have to generate your own certificate and mutate on every Pod CREATE operation. Not really a problem but, it depends.
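For anyone curious, this is roughly what the plain-webhook route looks like, sketched as a manifest-shaped Python dict (service name, namespace and webhook name are placeholders). The caBundle field is the certificate management you otherwise get for free from Kyverno.

```python
# Sketch of a standard MutatingWebhookConfiguration, the Kyverno alternative
# mentioned above. Names and namespaces are placeholders, not the project's.
webhook_config = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "MutatingWebhookConfiguration",
    "metadata": {"name": "automatic-zone-placement"},
    "webhooks": [{
        "name": "azp.example.com",
        "admissionReviewVersions": ["v1"],
        "sideEffects": "None",
        "failurePolicy": "Ignore",  # don't block Pod creation if the webhook is down
        "clientConfig": {
            "service": {"namespace": "azp", "name": "azp-webhook", "path": "/mutate"},
            "caBundle": "<base64 CA cert you have to provision and rotate yourself>",
        },
        # Fires on every Pod CREATE in the cluster; the webhook itself must then
        # decide (e.g. based on an annotation) whether to inject affinity or do nothing.
        "rules": [{
            "apiGroups": [""],
            "apiVersions": ["v1"],
            "resources": ["pods"],
            "operations": ["CREATE"],
        }],
    }],
}
```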
> There is no "automatic-zone-placement-disabled"
True. That's why I chose preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution. In my case, where this solution originated, the cluster was already multi-AZ with at least one node in each AZ. It was nice if the Pod could be scheduled into the same AZ, but it was not a hard requirement.
> No automatic look up of IPs and Zones.
Yup, it would generate a lot of extra "stuff" to mess with: IAM roles, and how to look up IP/subnet information in a multi-account AWS setup with VPC peerings. In our case the static approach was "good enough"; the subnet/network topology didn't change frequently enough to justify another layer of complexity.
> What if we only have one node in specific zone?
That's why we defaulted to preferredDuringSchedulingIgnoredDuringExecution and not required.
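To make the "preferred, not required" point concrete, this is roughly the shape of the affinity the webhook would inject, shown as a Python dict (the zone value is whatever the lookup returned; the weight and label key are the common defaults, the project's actual patch may differ).

```python
# Roughly the nodeAffinity a mutating webhook would add to the Pod spec.
# "preferred" means the scheduler tries the matching zone first, but will still
# place the Pod elsewhere if that zone has no capacity (or no node at all).
injected_affinity = {
    "nodeAffinity": {
        "preferredDuringSchedulingIgnoredDuringExecution": [{
            "weight": 100,
            "preference": {
                "matchExpressions": [{
                    "key": "topology.kubernetes.io/zone",
                    "operator": "In",
                    "values": ["eu-west-1b"],  # AZ returned by the lookup service
                }]
            },
        }]
    }
}
```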
toredash•6d ago
I wanted to share something I've been working on to solve a gap in Kubernetes: its scheduler has no awareness of the network topology of external services that workloads communicate with. If a Pod talks to a database (e.g. AWS RDS), K8s does not know it should schedule it in the same AZ as the database. Placed in the wrong AZ, it generates unnecessary cross-AZ network traffic, adding latency (and costing $).
I've made a tool I've called "Automatic Zone Placement", which automatically aligns Pod placements with their external dependencies.
Testing shows that placing the Pod in the same AZ resulted in a ~175-375% performance increase, measured with small, frequent SQL requests. It's not really that strange: same-AZ latency is much lower than cross-AZ latency, and lower latency means increased performance.
The tool has two components:
1) A lightweight lookup service: A dependency-free Python service that takes a domain name (e.g., your RDS endpoint) and resolves its IP to a specific AZ.
2) A Kyverno mutating webhook: this policy intercepts Pod creation requests. If a Pod has a specific annotation, the webhook calls the lookup service and injects the required nodeAffinity to schedule the Pod onto a node in the correct AZ.
The goal is to make this an automatic process; the alternative is to manually add a nodeAffinity spec to your workloads. But resources move between AZs, e.g. during maintenance events for RDS instances. I built this with AWS services in mind, but the concept is generic enough to be used in on-premise clusters to make scheduling decisions based on rack, row, or data center properties.
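As a usage sketch (the annotation key below is made up for illustration; check the repo for the real one): you opt a workload in by annotating its pod template with the external endpoint, and the policy turns that into the injected affinity.

```python
# Hypothetical opt-in: the pod template carries an annotation naming the external
# endpoint; the Kyverno policy asks the lookup service for that hostname's AZ and
# injects the matching preferred nodeAffinity before the Pod reaches the scheduler.
pod_template_metadata = {
    "annotations": {
        # annotation key is illustrative, not the project's actual key
        "azp.example.com/endpoint": "mydb.cluster-xxxx.eu-west-1.rds.amazonaws.com",
    }
}
```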
I'd love some feedback on this, happy to answer questions :)
toredash•3h ago
If yes, that's a simple update of the manifest to run 3 replicas with an affinity setting that spreads them across AZs (see the sketch below). Kyverno would use the internal Service object this service provides to have an HA endpoint to send queries to.
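A rough sketch of that manifest change, as a dict; a topologySpreadConstraint on the zone label is one way to do it (podAntiAffinity would also work), and the app label is a placeholder.

```python
# One way to spread 3 replicas of the lookup service across AZs:
# a topologySpreadConstraint keyed on the zone label.
deployment_patch = {
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {
                "topologySpreadConstraints": [{
                    "maxSkew": 1,
                    "topologyKey": "topology.kubernetes.io/zone",
                    "whenUnsatisfiable": "ScheduleAnyway",
                    "labelSelector": {"matchLabels": {"app": "azp-lookup"}},  # placeholder label
                }]
            }
        },
    }
}
```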
If we are not talking about this AZP service, I don't understand what we are talking about.
toredash•2h ago
I would create a similar policy where Kyverno, at intervals, checks the Deployment spec to see if the endpoint has changed, and updates the affinity rules accordingly. It would then be a normal update of the Deployment spec to reflect the desire to run in another AZ, if that makes sense?