A few things in the article I think might help the author:
1. Podman 4 and newer (which FCOS should definitely have) uses netavark for networking. A lot of older tutorials and articles were written back when Podman used CNI for its networking and didn't have DNS enabled unless you specifically installed it. I think the default `podman` network is still set up with DNS disabled. Either way, you don't have to use a pod anymore if you don't want to: you can just attach both containers to the same network and it should Just Work.
2. You can run the generator manually with "/usr/lib/systemd/system-generators/podman-system-generator --dryrun" to check Quadlet validity and output. Should be faster than daemon-reload'ing all the time or scanning the logs.
And as a bit of self-promotion: for anyone who wants to use Quadlets like this but doesn't want to rebuild their server whenever they make a change, I've created a tool called Materia[0] that can install, remove, template, and update Quadlets and other files from a Git repository.
Anyone know why this is? Or, for that matter, why Kubernetes seems to work like this too?
I have an application for which the natural solution would be to create a pod and then, as needed, create and destroy containers within the pod. (Why? Because I have some network resources that don’t really virtualize, so they can live in one network namespace. No bridges.)
But despite containerd and Podman and Kubernetes kind-of-sort-of supporting this, they don’t seem to actually want to work this way. Why not?
Podman was changing pretty fast for a while so it could be an older version thing, though I'd assume FCOS is on Podman 5 by now.
In Podman, a pod is essentially just a single container; each "container" within a pod is just a separate rootfs. So from that perspective, it makes sense, since you can't really restart half of a container. (That said, I think it might be possible to restart individual containers within a pod; if any container within the pod fails, though, I think the whole pod will automatically restart.)
> Why? Because I have some network resources that don’t really virtualize, so they can live in one network namespace.
You can run separate containers in the same network namespace with the "--network" option [0]. You can either start one container with its own automatic netns and then join the other containers to it with "--network=container:<name>", or you can manually create a new netns with "podman network create <name>" and then join all the containers to it with "--network=<name>".
[0]: https://docs.podman.io/en/latest/markdown/podman-run.1.html#...
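To make the two options above concrete, here is a minimal Python sketch that only builds and prints the `podman` command lines rather than running them. The container names, network name, and image names are hypothetical placeholders. Note the difference between the two: `--network=container:<name>` joins the first container's network namespace, whereas `--network=<name>` attaches each container to the same named network while each still gets its own namespace.

```python
import shlex


def share_netns_commands(first, second, image_a, image_b):
    """Second container joins the network namespace of the first."""
    return [
        ["podman", "run", "-d", "--name", first, image_a],
        ["podman", "run", "-d", "--name", second,
         f"--network=container:{first}", image_b],
    ]


def shared_network_commands(network, names_and_images):
    """All containers attach to one named network (separate namespaces)."""
    cmds = [["podman", "network", "create", network]]
    for name, image in names_and_images:
        cmds.append(["podman", "run", "-d", "--name", name,
                     f"--network={network}", image])
    return cmds


# Hypothetical names and images, for illustration only.
for cmd in share_netns_commands("web", "sidecar",
                                "localhost/web:latest",
                                "localhost/sidecar:latest"):
    print(shlex.join(cmd))

for cmd in shared_network_commands("appnet", [
        ("web", "localhost/web:latest"),
        ("db", "localhost/db:latest")]):
    print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; piping the output to a shell on a host with Podman installed would be the next step.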
Oh, right, thanks. I think I did notice that last time I dug into this. But:
> or you can manually create a new netns with "podman network create <name>" and then join all the containers to it with "--network=<name>".
I don’t think this has the desired effect at all. And the docs for podman network connect don’t mention pods at all, which is odd. In general, I have not been very impressed by podman.
Incidentally, apptainer seems to have a more or less first class ability to join an existing netns, and it supports CNI. Maybe I should give it a try.
Pods are specifically not meant to be treated as VMs, but as single application/deployment units.
Among other things, if a container goes down you don’t know if it corrupted shared state (leaving sockets open or whatever). So you don’t know if the pod is healthy after restart. Also reviving it might not necessarily work, if the original startup process relied on some boot order. So to guarantee a return to healthy you need to restart the whole thing.
You are normally running several instances of your frontend so that it can crash without impacting the user experience, or so it can get deployed to in a rolling manner, etc.
As a result, I think developers are forgetting filesystem cleanliness because if you end up destroying an entire instance, well it’s clean isn’t it?
It also results in people not knowing how to do basic sysadmin work, because everything becomes devops.
The bigger problem I have with this is that the logical conclusion is to use “distroless” operating system images with vmlinuz, an init, and the minimal set of binaries and filesystem structure you need for your specific deployment, and rarely do I see anyone actually doing this.
Instead, people are using a hodgepodge of containers with significant management overhead, that actually just sit on like Ubuntu or something. Maybe alpine. Or whatever Amazon distribution is used on ec2 now. Or of course, like in this article, Fedora CoreOS.
One day, I will work with people who have a network issue and don’t know how to look up ports in use. Maybe that’s already the case, and I don’t know it.
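For anyone who has not done it before, the usual way to look up listening ports is something like `ss -tlnp`. What sits underneath can be sketched in a few lines of Python that parse the text format of `/proc/net/tcp`. This is a minimal illustration, assuming IPv4 only and a little-endian host (which determines how the kernel prints the address); the sample rows are made up.

```python
import socket
import struct

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp


def listening_sockets(proc_net_tcp_text):
    """Return (ip, port) pairs for sockets in the LISTEN state."""
    result = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_LISTEN:
            continue
        addr_hex, port_hex = fields[1].split(":")
        # Assumption: little-endian host, so the address bytes are reversed.
        ip = socket.inet_ntoa(struct.pack("<I", int(addr_hex, 16)))
        result.append((ip, int(port_hex, 16)))
    return result


# Made-up sample in the /proc/net/tcp layout: one listener, one established.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0A00020F:0016 0A000202:C350 01 00000000:00000000 00:00000000 00000000     0        0 23456 1 0000000000000000 20 4 30 10 -1
"""

print(listening_sockets(SAMPLE))  # → [('127.0.0.1', 8080)]
```

On a real Linux host, passing the contents of `/proc/net/tcp` to `listening_sockets` instead of `SAMPLE` should list the IPv4 listeners; IPv6 sockets live in `/proc/net/tcp6` and use a longer address format that this sketch does not handle.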
In the few jobs I’ve had over 20 years, this is common in the embedded space, usually using yocto. Really powerful, really obnoxious tool chain.
The tool that manages all my tools is the shell. It is where I attach a debugger, it is where I install iotop and use it for the first time. It is where I cat out mysterious /proc and /sys values to discover exotic things about cgroups I only learned about 5 minutes prior in obscure system documentation. Take it away and you are left with a server that is resilient against things you have seen before but lacks the tools to deal with the future.
But instead we go with multiple moving parts all configured independently? CoreOS, Terraform, and a dependence on some Vultr thing. Lol.
Never in a million years would I think it's a good idea to disable SSH access. Like why? Keys and a non-standard port already bring China login attempts to like 0 a year.
There are tools which show what happens per process/thread and inside the kernel. Profiling and tracing.
Check Yandex's Perforator, Google Perfetto. Netflix also has one, forgot the name.
You’ll never attach a debugger in production. Not going to happen. Shell into what? Your container died when it errored out and was restarted as a fresh state. Any “Sherlock Holmes” work would be met with a clean room. We have 10,000 nodes in the cluster - which one are you going to ssh into to find your container to attach a shell to it to somehow attach a debugger?
You would connect to any of the nodes having the problem.
I've worked both ways; IMHO, it's a lot faster to get to understanding in systems where you can inspect and change the system as it runs than in systems where you have to iterate through adding logs and trying to reproduce somewhere else where you can use interactive tools.
My work environment changed from an Erlang system where you can inspect and change almost everything at runtime to a Rust system in containers where I can't change anything and can hardly inspect the system. It's so much harder.
It is, SSH is indeed the tool for that, but that's because until recently we did not have better tools and interfaces.
Once you try newer tools, you don't want to go back.
Here's the example of my fairly recent debug session:
- Network is really slow on the home server, no idea why
- Try to just reboot it, no changes
- Run kernel perf, check the flame graph
- Kernel spends A LOT of time in nf_* (netfilter functions, iptables)
- Check iptables rules
- sshguard has banned 13000 IP addresses in its table
- Each network packet travels through all the rules
- Fix: clean the rules/skip the table for established connections/add timeouts
You don't need debugging facilities for many issues. You need observability and tracing. Instead of debugging the issue for tens of minutes at least, I just used an observability tool which showed me the path in 2 minutes.
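Why 13000 ban entries hurt, and why skipping the table for established connections helps, can be shown with a toy model. The Python sketch below is not netfilter and uses made-up addresses; it simply assumes rules are checked one after another and counts how many checks a single packet costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    established: bool


def rules_checked(packet, banned, established_first):
    """Return (verdict, number of rules evaluated) for one packet."""
    checked = 0
    if established_first:
        # One early rule: accept traffic on already-established connections.
        checked += 1
        if packet.established:
            return "ACCEPT", checked
    for ip in banned:  # linear walk through the ban list
        checked += 1
        if packet.src == ip:
            return "DROP", checked
    return "ACCEPT", checked


# 13000 made-up banned addresses, matching the count in the story above.
banned = [f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}" for i in range(13000)]
pkt = Packet(src="192.0.2.10", established=True)

print(rules_checked(pkt, banned, established_first=False))  # → ('ACCEPT', 13000)
print(rules_checked(pkt, banned, established_first=True))   # → ('ACCEPT', 1)
```

In this model, every packet on a healthy connection pays for the full ban list unless an early accept rule short-circuits it, which is the shape of the fix described in the list above.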
lawrencegripper•2h ago
The predictability and drop in toil is so nice.
https://blog.gripdev.xyz/2024/03/16/in-search-of-a-zero-toil...