I have AdGuard Home but one of my spouse’s streaming services wouldn’t work. “There was a problem.” Gee thanks. Eventually figured out that I had to unblock a few hosts so it would work. Only found which ones by googling and finding some other poor soul who fixed it and documented it.
It's been years since I've significantly used Apple software, but when I had to use a Mac at work, or helped friends or family troubleshoot some problem on Mac OS, I had a similar experience. When things didn't "just work", it was very difficult to figure out why.
Apple is all about walled-off, locked-down, black box, just-works (when it does) etc. It's supposed to seem like magic. You're not supposed to tinker with magic; that makes it pedestrian. Apple as a brand is a lifestyle, a feeling. The slick, polished brand. Remember "I'm a Mac, and I'm a PC"? PC is where you tinker, and there are screws and nuts and bolts and jargon and troubleshooting. In Apple land, you just take it to a slick Genius Bar and they do their magic. Or you just buy a new one.
As a European I'm always baffled how Apple got so much market share among the actual techies and power users in the US. You do it to yourself by buying this stuff. It's for people who don't want to spend one second thinking about actual technical issues.
But paint the nerds who like MacOS and the wonderful third-party app ecosystem of developers who care about fit and finish as a bunch of mindless rubes if it makes you feel better.
When I view the logs on my Apple systems they make sense to me. One does have to understand the logs which implies understanding the system under diagnosis.
iPads are a completely different world: not just restrictive, but the whole ecosystem constantly pushes you towards subscriptions for everything, including the OS, which conveniently offers the only sane backup solution that covers all apps. It incentivises content consumption and giving up control over one's data. Not my cup of tea.
Linux, historically, was terrible and then some; lots of us simply want to get on with life and not dork with the OS every day. If you didn't want to use Windows at your day job, that left OS X.
And, for a while, Apple hardware was quite nice. For a remarkably long time, you could get way cheaper high resolution laptop displays than the competition. The trackpads have always been far superior on Apple than Linux. And then the M-series came along and was also quite nice.
However, over time Linux has gotten better so it's now functional as a daily driver and reasonably reliable. And macOS has deteriorated until it's now probably below Linux in terms of reliability.
So, here we are. macOS and Windows do seem to be losing share to Linux, but only Linux cares. At this point, desktop/laptop revenue is dwarfed by everything else at both Microsoft and Apple.
Meanwhile, I'm baffled why any techie would voluntarily use an OS that force-enables telemetry and advertising. The fight for privacy and ad-free experiences is hard enough without your OS fundamentally working against you.
https://sneak.berlin/20210202/macos-11.2-network-privacy/
None of this can be turned off; the boot volume is read-only. It can only be deactivated by jumping through hoops.
It's almost as if they demand the data, and won't be denied it.
Somehow angry Europeans (at least in this thread) are running into the embrace of Windows as the defender of the tinkerers. Certainly not on my bingo card.
I know exactly how this happened, I was there. It filled a gap for a practical desktop UNIX when none existed.
In the old days, there were many flavors of proprietary UNIX, like Solaris, IRIX, HP-UX, AIX, et al, plus a few open source versions like FreeBSD and early Linux. The early Internet was a purely UNIX world (still mostly is) but UNIX was a fragmented market of dozens of marginally interoperable OSes.
During the dotcom boom, Solaris on Sparc became the gold standard for large servers. These are very expensive machines and not particularly user friendly. If you were a dev in those days, you were either using some type of Sparc workstation or FreeBSD or Linux (which wasn’t very good in those days). You wanted your desktop environment to be UNIX-ish but the good + cheap options were limited. Linux became better on the server and started to displace FreeBSD there but was still very limited as a desktop OS. Linux was much worse than Windows NT on the desktop at the time but Windows NT wasn’t UNIX.
MacOS X came along and offered UNIX on the desktop with a far better experience than Linux (or any other UNIX) on the desktop, and much cheaper than a Solaris workstation. It filled a clear gap in the market, and so Silicon Valley moved from a mix of Solaris and Linux desktops for development to MacOS X desktops, which were better in almost every way for the average dev. It was UNIX and it ran normal business applications like Microsoft Office.
MacOS X was a weaker UNIX than many of the other UNIX OS but it offered a desktop that didn’t suck and it was cheap. For someone that had been using Linux or Solaris at the time, which many devs were, it was a massive upgrade.
MacOS still kind of sucks as a UNIX but that’s okay because we don’t use it as a server. Silicon Valley needed a competent UNIX desktop that didn’t cost a fortune and Apple delivered.
Apple is just a remote UNIX system for manipulating the other UNIX systems your code actually runs on.
Why only the US? I'm in Europe and I switched from Linux to Mac OS as my daily driver when I got tired of waiting for the mythical "Linux on the desktop year".
Note that a good part of my career involves ARM Linuxes for industrial applications, so I never actually stopped using Linux when I was paid for it.
Mac OS is indeed becoming more and more annoying, but then so is desktop Ubuntu. And Windows is out of the question. I know firsthand, I have a contract for a windows application right now.
If Apple management continues to not take their dried frog pills as prescribed, I will eventually switch back to Linux, but for the desktop I'll probably have to check out some more niche distributions, or at least Debian.
And even then I'll probably keep the macbook pro and switch to Linux only on the desktop machines.
I don't know. There's a lot of friction like gnome hiding or removing configuration options, kde becoming a third class citizen, different packaging systems every 2 years... app stores being pushed instead of apt-get install...
The command line and server side stuff is fine of course, I wouldn't dream of running anything but linux for that.
So... I don't know about "AI". Might have to still write the config files by hand.
And linux on the server works well enough that I decided to replace the home box only after like 10 years, so I'm not even sure what services I need to migrate, and the safe option is to start from a clean slate and redo all the configuration from scratch.
Probably don't remember what questions to ask. Or whether I should dump Apache and install nginx instead.
For server side software where there is a sysadmin in charge of keeping it running I generally agree.
But for end user software (desktop, mobile, embedded) no one will read the logs, and there the logs can, and probably should, be aimed at the developers. Of course you can and should still provide usable and informative end-user-oriented error messages, but they're not the same thing as logs.
I.e. SEO-optimized
Lots of end user software is used in an enterprise context where the helpdesk staff will have to read those logs. And for B2C (or retail, or amateur, whatever you want to call them) users, often they will go through online tutorials to try to self-diagnose because the developers are most of the time unreachable.
* https://dave.autonoma.ca/blog/2022/01/08/logging-code-smell/
* https://dave.autonoma.ca/blog/2026/02/03/lloopy-loops/
Both of these posts discuss using event-based frameworks to eliminate duplicative (cross-cutting) logging statements throughout a code base.
My desktop Markdown editor[1], uses this approach to output log messages to a dialog box, a status bar, and standard error, effectively "for free".
[1]: https://repo.autonoma.ca/repo/keenwrite/tree/HEAD/src/main/j...
For example, we're experimenting with having Claude Desktop read log files for remote users. It's often able to troubleshoot and solve issues for our users faster than we can, especially after you give it access to your codebase through GH MCP or something like that. It's wild.
The biggest problem is that when you wrote the code behind a 'totally obvious message', you yourself were in the context. Years, months, heck even weeks later you'll stare at it and wonder 'why tf didn't I write something more verbose?'.
Anecdote: I wrote some supporting scripts to 'integrate' two systems three times, totally oblivious the second and third times that I had already done it. Both times I was about 60% done when I thought 'wait, I totally recognize this code, but I just wrote it! What in deja-vu-nation?!'.
This is such a core advantage of JavaScript: it is a living language. The runtime makes it very easy to change and modify systems on the fly, and as an operator that is so, so much better than a statically compiled binary in terms of what is possible.
One of my favorite techniques is using SIGUSR1 to start the node debugger. Performance impact is not that bad. Pick a random container in prod, and... just debug it. Use logpoints instead of breakpoints, since you don't want to halt the world. Takes some scripting to SSH port forward to docker port forward to the container, but an LLM can crack that script out in no time. https://nodejs.org/en/learn/getting-started/debugging#enable...
My cherry on top is to make sure the services my apps consume are attached to globalThis, so I can just hit my services directly from the running instance, in the repl. Without having to trap them being used here or there.
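A minimal sketch of that globalThis trick (service names here are hypothetical): expose live service instances on `globalThis` at startup, so once you attach an inspector to the running process (SIGUSR1 as above, then Chrome DevTools or `node inspect`) you can call them directly from the console.

```javascript
// Hypothetical service wiring: attach live services to globalThis so an
// attached debugger/REPL can reach them without trapping call sites.
const services = {
  users: {
    // stand-in for a real data-access layer
    find: (id) => ({ id, name: `user-${id}` }),
  },
};

globalThis.services = services;

// From an attached inspector session you could now evaluate, e.g.:
//   services.users.find(42)
// without needing a breakpoint anywhere in the request path.
console.log(globalThis.services.users.find(42).name); // "user-42"
```

The point is that the REPL sees whatever the process sees: real config, real connections, real in-memory state.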
A human, or an 'agent' can use those to figure out why said next step might have gone wrong.
Also, it's helpful to log before operations rather than after, because if a step gets stuck you can see what it's stuck on.
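The log-before pattern can be sketched like this (a toy wrapper, not any particular library): if a step blocks forever, the last log line names the step that is stuck rather than the last one that finished.

```javascript
// Logging *before* each step: a hang leaves "starting: X" as the final
// log line, pointing directly at the stuck operation.
const log = [];

function step(name, fn) {
  log.push(`starting: ${name}`); // emitted before the work begins
  const result = fn();
  log.push(`finished: ${name}`);
  return result;
}

step("load config", () => ({ retries: 3 }));
step("connect to db", () => "connected");

console.log(log.join("\n"));
// Had "connect to db" blocked forever, the log would end with
// "starting: connect to db" -- naming exactly what it is stuck on.
```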
It either gets resolved quicker by the L2 guy or dispatched to the third party hardware fix it guy or sent to some speciality L3 team. Resolution time is down like 60%.
My next goal is to assess disk and battery health in laptops and proactively replace if they hit whatever threshold we can push the vendors to accept. That could eliminate something like 30% of device related issues, which has a super high value.
Not least because log processing SaaS companies seem to be overcharging for their services even versus hosted Grafana services, and really many of us could do away with the rent seeking entirely.
The computational complexity of finding meaning in log files versus telemetry data leans toward this always being the case. It will never change except in brief cases of VC money subsidizing your subscription.
If one error shouldn't trigger operator action but 1000 should, that's a telemetry alert, not a Datadog or Splunk problem.
Reference to where my brain is at: https://www.robustperception.io/cardinality-is-key/
I feel like Splunk's business model favors a healthy system and gives major disadvantages to an unhealthy one. What I mean, by example: when the system is unhealthy, I know it because all my Splunk queries get queued up, because everyone is slamming it with queries. I hate it.
But I’m stuck in knowing how to move some things to Prometheus. Like say we have a CustomerID and we want to track number of times something is done by user. If we have thousands of customers, cardinality breaks that solution.
Is there a good solution for this?
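One common workaround (a partial answer, not Prometheus-specific advice): keep the unbounded CustomerID out of metric labels entirely, and instead hash IDs into a fixed number of buckets, so the metric's cardinality is bounded while exact per-customer counts stay in logs. A toy sketch, with the hash and bucket count purely illustrative:

```javascript
// Bound label cardinality by hashing an unbounded ID space (CustomerID)
// into a fixed number of buckets; the metric only carries the bucket label.
const BUCKETS = 16;

function bucketFor(customerId) {
  // Tiny deterministic string hash (djb2 variant) -- illustration only.
  let h = 5381;
  for (const ch of String(customerId)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  }
  return h % BUCKETS;
}

// Stand-in for a labeled counter: at most BUCKETS distinct time series.
const counts = new Map();
function increment(customerId) {
  const label = `bucket=${bucketFor(customerId)}`;
  counts.set(label, (counts.get(label) ?? 0) + 1);
}

for (let id = 0; id < 10000; id++) increment(`cust-${id}`);
console.log(counts.size <= BUCKETS); // true: cardinality stays bounded
```

You lose per-customer drill-down in the metric itself, but a spike in one bucket tells you where to go grep the logs.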
We got a lot of pushback when migrating our telemetry to AWS: after initially being told to just move it, they saw how OTEL amplified data points and cardinality versus our old StatsD data.
You probably need less cardinality than you think, and there are a mix of stats that work fine with less frequent polling, while others like heap usage are terrible if you use 20 or 30 second intervals. Our Pareto frontier was to reduce the sampling rate of most stats and push per-process things like heap usage into histograms.
An aggregator per box can drop a couple of tags before sending them upstream which can help considerably with the number of unique values. (eg, instanceID=[0..31] isn't that useful outside of the box)
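A sketch of that per-box aggregation (tag names hypothetical): strip high-cardinality tags like instanceID and merge the values before forwarding, so upstream sees one series instead of 32.

```javascript
// Per-box aggregator sketch: drop tags that aren't useful off-box
// (e.g. a hypothetical instanceID) and merge counts before sending upstream.
const DROPPED_TAGS = new Set(["instanceID"]);

function aggregate(points) {
  const merged = new Map();
  for (const { tags, value } of points) {
    const key = Object.entries(tags)
      .filter(([k]) => !DROPPED_TAGS.has(k))
      .sort()
      .map(([k, v]) => `${k}=${v}`)
      .join(",");
    merged.set(key, (merged.get(key) ?? 0) + value);
  }
  return merged;
}

const upstream = aggregate([
  { tags: { metric: "requests", instanceID: "0" }, value: 3 },
  { tags: { metric: "requests", instanceID: "1" }, value: 4 },
]);
console.log(upstream.get("metric=requests")); // 7: two series collapsed into one
```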
But I can give you a partial picture.
You're going to end up with multiple dashboards with duplicate charts on them, because you're showing correlation between two charts via proximity, especially charts in the same column in row n±1 or vice versa. You're trying to show whether correlation is likely to be causation or not. Grafana has a setting that shows the crosshairs on all graphs at the same time, but they need to be in the same viewport for the user to see them. Generally, error rates and request rates are proportional to each other, unless a spike in error rates is triggered by, say, web crawlers that are now hitting you with 300 req/s each when they normally send 50. The difference in the slope of the lines can tell you why an alert fired, or that it's about to. So I let previous RCAs inform whether two graphs need to be swapped because we missed a pattern that spanned a viewport. And sometimes after you fix tech debt, the correlation between two charts goes up or way down. So what was best in May may not be best come November.
There's a reason my third monitor is in portrait mode, and why that monitor is the first one I see when I come back to my desk after being AFK. I could fit 2 dashboards and group chat all on one monitor. One dashboard showed overall request rate and latency data, the other showed per-node stats like load and memory. That one got a little trickier when we started doing autoscaling. The next most common dashboard which we would check at intervals showed per-service tail latencies versus request rates. You'd check that one every couple of hours, any time there was a weird pattern on the other two, or any time you were fiddling with feature toggles.
From there things balkanized a bit. We had a few dashboards that two or three of us liked and the rest avoided.
You’ve got correlation IDs, and if your system isn’t reliably propagating those everywhere you absolutely have to fix that. But you’re going to use those once you already notice an uptick in a weird error you haven’t seen before, and it’s hard to see those when you’re generating 8k log entries per second that are 140-200 characters long, so you’re only seeing twenty of them at a time in Splunk.
If you have some chatty frontend that’s firing off three requests at the same time, you’re going to struggle, period. You’re going to be down to some janky log searches for that, and you don’t need to be paying someone $$ every month to still have it rough.
We used to have QA people for this.
> there’s really no such thing as tracing a->b->c anyway
> and it’s hard to see those when you’re generating 8k log entries per second that are 140-200 characters long and so you’re only seeing twenty of them at a time in Splunk.
Except as you note you can have a tag to correlate logs across distributed services. This is already done for jaeger tracing. It would be insanity to try to look at all logs at once. When you're looking at logs it's because something like "customer A complains they had a problem with request XYZ". And honestly, 8k/s is child's play for logging. A system I was running had to start tuning down the log verbosity at ~30k requests/s and that's because it was generating like 8 logs per request (so ~100k logs/s).
> You’re going to be down to some janky log searches for that and you don’t need to be paying someone $$ every month to still have it rough
That's between you and your log ingestion system. You get to pick where you send your logs and the capabilities it has. All the companies I worked at self-hosted their log infrastructure and it worked fine for not a lot of money. You're conflating best practices with "what can I pay a SaaS company to solve for me". Honeycomb.io may be helpful here btw. Their pricing wasn't exorbitantly egregious here and at low to medium scale tracing the way they do it can supplant the need for logging.
Some users do read log messages, just as some users file useful bug reports. Even when they are a tiny minority, I find their discoveries valuable. They give me a view into problems that my software faces out there in the wilds of real-world use. My log messages enable those discoveries, and have led to improvements not only in my own code, but also in other people's projects that affect mine.
This is part of why I include a logging system and (hopefully) understandable messages in my software. Even standalone tools and GUI applications.
(And since I am among the minority who read log messages, I appreciate good ones in software made by other people. Especially when they allow me to solve a problem immediately, on my own, rather than waiting days or weeks to get the developer's attention.)
hk__2•2d ago
Yes, see all the questions on StackOverflow with people posting their error message without reading it, like “I got an error that says ‘fail! please install package xyz!’, what should I do?!?”.
dylan604•2d ago
If they don't know how to do X, then they should be able to look up how to do X. If it's something like install 3rd party library, then that's not the first party's responsibility. Especially OSS for different arch/distros. They are all different. Look up the 3rd party's repo and figure it out.
But no, it's contact support straight away.
john_strinlai•2d ago
On Wednesday I got a call saying "the CRM won't let me input this note, please fix" when the error message was "you have included an invalid character, '□' found in description. remove invalid characters and resubmit".
tempest_•2d ago
Generally a sysadmin needs to know "is there an action I can do to correct this", whereas a dev has the power to alter source code and thus needs to know what the system is doing and where.
lucianbr•2d ago
I disagree with this view, but it definitely exists.
justinclift•2d ago
i.e. `$HOME/.config/foo/stuff.cfg` rather than `/home/joebloggs/foo/stuff.cfg`?
Terr_•2d ago
Obviously that depends on the messages being infrequent in production logging levels.
PunchyHamster•2d ago
They don't need to. The log message is there so the helpdesk has something actionable, or so it can be copy-pasted into Google to find people with a similar problem who may have a solution.
ragall•2d ago
No, it's very different: developers generally want to know about things they control, so they want detailed debugging info about a program's internal state that could have caused a malfunction. Sysadmins don't care about that (they're fine with coalescing all the errors that developers care about under a general "internal error"), and they care about what in the program environment could have triggered the bug, so that they may entirely avoid it or at least deploy workarounds to sidestep the bug.