frontpage.


Casio F-91W

https://en.wikipedia.org/wiki/Casio_F-91W
1•azhenley•4m ago•0 comments

MCP Security is still Broken

https://forgecode.dev/blog/prevent-attacks-on-mcp/
1•kt_sainicoder•6m ago•0 comments

We stopped believing in progress. Now, we're stagnating. [video]

https://www.youtube.com/watch?v=i7IZ-1bZT_U
1•dartharva•9m ago•0 comments

Show HN: Youchoose.one – A site that helps you narrow down a list, one by one

https://www.youchoose.one/
1•djdmorrison•14m ago•0 comments

Callers are hearing robotic voices when they try to reach relatives in Iran

https://apnews.com/article/iran-israel-war-ai-calls-bots-d83c659b61de1f904b68dc475ddad766
1•clarity8•15m ago•0 comments

0xchat: Secure chat built on Nostr

https://www.0xchat.com/#/
1•Bluestein•16m ago•0 comments

A White Nationalist Wrote a Law Paper Promoting Racist Views. It Won Him an Award

https://www.nytimes.com/2025/06/21/us/white-supremacist-university-of-florida-paper.html
10•_tk_•16m ago•1 comments

Show HN: Lightweight DNS server for Docker containers

https://github.com/ei-sugimoto/wakemae
1•ei-sugimoto•17m ago•0 comments

A Decade On, Has Japan's Corporate Revolution Worked Too Well?

https://www.bloomberg.com/opinion/articles/2025-06-19/m-a-has-japan-s-corporate-revolution-worked-too-well
1•xbmcuser•18m ago•0 comments

Zuckerberg spent billions on an AI 'dream team,' has to deliver for shareholders

https://www.cnbc.com/2025/06/21/metas-zuckerberg-has-to-win-ai-after-billions-spent-on-dream-team.html
1•rntn•19m ago•0 comments

Weird and wonderful PCB business card designs

https://www.techadjacent.io/p/extreme-pcb-business-cards-gone-wild
2•safety_sandals•20m ago•0 comments

Show HN: OSAI-Browser – A P2P Browser for Web3 and HTML Games

2•EvoSync•22m ago•0 comments

Mars orbiter's first pic of volcano above clouds: twice as tall as Mauna Loa

https://www.aol.com/news/mars-orbiter-captures-1st-ever-123751091.html
2•Bluestein•25m ago•0 comments

Hiding Metrics from the Web

https://manualdousuario.net/en/hiding-metrics-from-the-web/
1•rpgbr•26m ago•0 comments

10-Year-Old Linux/Android Performance Tool/Service: Guider 3.9.9

https://github.com/iipeace/guider
1•iipeace•29m ago•1 comments

I built a local TTS add-on using an 82M parameter neural model – Runs on potatoes

https://github.com/pinguy/kokoro-tts-addon
1•pinguy•30m ago•1 comments

We started porting Lego Island to everything? [video]

https://www.youtube.com/watch?v=JUNdWnI5BTk
1•makepanic•31m ago•1 comments

Show HN: MyLineArts – Turn Photos into Bobbie Goods-Style Line Art for Painting

https://mylinearts.com
1•leonardo2204•31m ago•0 comments

Liberty Phone Is Made in America, Creator Explains How

https://www.wsj.com/us-news/climate-environment/heat-dome-wave-record-temperatures-565d6d13
2•next_xibalba•33m ago•0 comments

AI SDK

https://ai-sdk.dev
1•tosh•33m ago•0 comments

Everything I know about good system design

https://www.seangoedecke.com/good-system-design/
2•swah•34m ago•0 comments

American data center tax exemptions "out of control," according to report

https://www.datacenterdynamics.com/en/news/american-data-center-tax-exemptions-out-of-control-according-to-report/
2•belter•36m ago•1 comments

MIT student prints AI polymer masks to restore paintings in hours

https://arstechnica.com/ai/2025/06/mit-student-prints-ai-polymer-masks-to-restore-paintings-in-hours/
2•Brajeshwar•38m ago•0 comments

Record DDoS pummels site with once-unimaginable 7.3Tbps of junk traffic

https://arstechnica.com/security/2025/06/record-ddos-pummels-site-with-once-unimaginable-7-3tbps-of-junk-traffic/
8•Brajeshwar•38m ago•0 comments

Don't Build Multi-Agents

https://cognition.ai/blog/dont-build-multi-agents#principles-of-context-engineering
2•Brajeshwar•38m ago•0 comments

Amakobe Sande: Progress for Children Is Progress for Everyone

http://en.ccg.org.cn/archives/87637
2•mooreds•38m ago•0 comments

Maple Mono: open-source monospace font with round corner

https://github.com/subframe7536/maple-font
1•Bluestein•44m ago•0 comments

Show HN: Simple gitignore generator on the web – GUItignore

https://gitignore.0x00.cl/
1•0x00cl•48m ago•0 comments

AI May Be Listening in on Your Next Doctor's Appointment

https://www.wsj.com/health/healthcare/ai-ambient-listening-doctor-appointment-e7afd587
2•bookofjoe•50m ago•0 comments

Mark Zuckerberg unleashed his inner brawler

https://www.ft.com/content/a86f5ca3-f841-4cdc-9376-304e085c4cfd
4•doener•54m ago•1 comments

Scaling our observability platform by embracing wide events and replacing OTel

https://clickhouse.com/blog/scaling-observability-beyond-100pb-wide-events-replacing-otel
75•valyala•4h ago

Comments

ofrzeta•2h ago
Whenever I read things like this I think: You are doing it wrong. I guess it is an amazing engineering feat for Clickhouse but I think we (as in IT or all people) should really reduce the amount of data we create. It is wasteful.
XorNot•2h ago
The problem with this is generally that you have logs from years ago, but no way to get a live stream of logs which are happening now.

(one of my immense frustrations with kubernetes - none of the commands for viewing logs seem to accept logical aggregates like "show me everything from this deployment").

knutzui•2h ago
Maybe not via kubectl directly, but it is rather trivial to build this, by simply combining all log streams from pods of a deployment (or whatever else).

k9s (k9scli.io) supports this directly.
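
A rough sketch of that approach, assuming the official `kubernetes` Python client (where `watch.Watch().stream()` can follow `read_namespaced_pod_log`) and made-up namespace/deployment names: resolve the deployment's own label selector, then follow each matching pod's log stream in its own thread.

    # Sketch only: interleave logs from every pod of one deployment.
    # Assumes the official `kubernetes` Python client and cluster access;
    # NAMESPACE and DEPLOYMENT are made-up placeholders.
    import threading
    from kubernetes import client, config, watch

    NAMESPACE = "default"
    DEPLOYMENT = "my-deployment"

    def follow_pod(core, pod_name):
        # Stream one pod's log lines, prefixed with the pod name.
        for line in watch.Watch().stream(core.read_namespaced_pod_log,
                                         name=pod_name, namespace=NAMESPACE):
            print(f"[{pod_name}] {line}")

    def main():
        config.load_kube_config()  # or config.load_incluster_config() inside a pod
        apps, core = client.AppsV1Api(), client.CoreV1Api()

        # Reuse the deployment's own label selector to find its pods.
        dep = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE)
        selector = ",".join(f"{k}={v}"
                            for k, v in dep.spec.selector.match_labels.items())
        pods = core.list_namespaced_pod(NAMESPACE, label_selector=selector).items

        threads = [threading.Thread(target=follow_pod,
                                    args=(core, p.metadata.name), daemon=True)
                   for p in pods]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == "__main__":
        main()

This is roughly what tools like stern and k9s do under the hood, minus pod add/remove handling, retries and coloring.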

madduci•2h ago
And what is the sense of keeping years of logs? I could probably understand it for very sensitive industries, but in general I see it as a pure waste of resources. At most you need 60-90 days of logs.
brazzy•2h ago
One nice side effect of the GDPR is that you're not allowed to keep logs indefinitely if there is any chance at all that they contain personal information. The easiest way to comply is to throw away logs after a month (accepted as the maximum justifiable for general error analysis) and be more deliberate about what you keep longer.
Sayrus•2h ago
Access logs and payment information for compliance; troubleshooting and evaluating trends in something you didn't know existed until months or years later; finding out whether an endpoint was exploited in the past through a vulnerability you only now discovered; tracking events that span months. Logs are a very useful tool for many non-dev or longer-term uses.
fc417fc802•1h ago
My home computer has well over 20 TB of storage. I have several LLMs, easily half a TB worth. The combined logs generated by every single program on my system might total 100 GB per year but I doubt it. And that's before compression.

Would you delete a text file that's a few KB from a modern device in order to save space? It just doesn't make any sense.

sureglymop•1h ago
It makes sense to keep a high fidelity history of what happened and why. However, I think the issue is more that this data is not refined correctly.

Even when it comes to logging in the first place, I have rarely seen developers do it well, instead logging things that make no sense just because it was convenient during development.

But that touches on something else. If your logs are important data, maybe logging is the wrong way to go about it. Instead think about how to clean, refine and persist the data you need like your other application data.

I see log and trace collecting in this way almost as a legacy-compatibility thing, analogous to how Kubernetes and containerization let you wrap any old legacy application process into a uniform format: just collecting all logs and traces is backwards compatible with every application. But in order not to be wasteful and to keep only what is valuable, a significant effort would be required afterwards. Well, storage and memory happen to be cheap enough that you never have to care about that.

AlecBG•2h ago
This sounds pretty easy to hack together with 10s of lines of python
Sayrus•2h ago
Stern[1] does that. You can tail deployments, filter by labels and more.

[1] https://github.com/stern/stern

ofrzeta•2h ago
What about "kubectl logs deploy/mydep --all-containers=true" but I guess you want more than that? Maybe https://www.kubetail.com?
CSDude•2h ago
Blanket statements like this miss the point. Not all data is waste. Especially high-cardinality, non-sampled traces. On a 4-core ClickHouse node, we handled millions of spans per minute. Even short retention windows provided critical visibility for debugging and analysis.

Sure, we should cut waste, but compression exists for a reason. Dropping valuable observability data to save space is usually shortsighted.

And storage isn't the bottleneck it used to be. Tiered storage with S3 or similar backends is cheap and lets you keep full-fidelity data without breaking the budget.

ofrzeta•2h ago
> Dropping valuable observability data to save space is usually shortsighted

That's a bit of a blanket statement, too :) I've seen many systems where a lot of stuff is logged without much thought. "Connection to database successful" - does this need to be logged on every connection request? Log level info, warning, debug? Codebases are full of this.

throwaway0665•1h ago
There's always another log that could have been key to getting to the bottom of an incident. It's impossible to know completely what will be useful in advance.
citrin_ru•1h ago
Probably not very useful for prod (non debug) logging, but it’s useful when such events are tracked in metrics (success/failure, connect/response times). And modern databases (including ClickHouse) can compress metrics efficiently so not much space will be spent on a few metrics.
jiggawatts•41m ago
I agree with both you and the person you're replying to, but...

My centrist take is that data can be represented wastefully, which is often ignored.

Most "wide" log formats are implemented... naively. Literally just JSON REST APIs or the equivalent.

Years ago I did some experiments where I captured every single metric Windows Server emits every second.

That's about 15K metrics, down to dozens of metrics per process, per disk, per everything!

There is a poorly documented API for grabbing everything ('*') as a binary blob of a bunch of 64-bit counters. My trick was that I then kept the previous such blob and simply took the binary difference. This set most values to zero, so then a trivial run length encoding (RLE) reduced a few hundred KB to a few hundred bytes. Collect an hour of that, compress, and you can store per-second metrics collected over a month for thousands of servers in a few terabytes. Then you can apply a simple "transpose" transformation to turn this into a bunch of columns and get 1000:1 compression ratios. The data just... crunches down into gigabytes that can be queried and graphed in real time.

I've experimented with Open Telemetry, and its flagrantly wasteful data representations make me depressed.

Why must everything be JSON!?
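
A toy sketch of the delta-plus-RLE step described above (not the commenter's actual code; the struct layout and numbers are made up): subtract consecutive snapshots of 64-bit counters, run-length encode the mostly-zero result, then compress. Batching an hour of these records and transposing them column-wise is what pushes the ratio toward 1000:1.

    # Toy sketch of delta + run-length encoding of counter snapshots.
    import struct
    import zlib

    def delta(prev: list[int], curr: list[int]) -> list[int]:
        # Element-wise difference between two snapshots; unchanged counters become 0.
        return [c - p for p, c in zip(prev, curr)]

    def rle_encode(deltas: list[int]) -> bytes:
        # Emit (zero-run length, nonzero delta) pairs: a 4-byte run length
        # followed by an 8-byte signed value; a trailing zero run gets value 0.
        out = bytearray()
        run = 0
        for v in deltas:
            if v == 0:
                run += 1
            else:
                out += struct.pack("<Iq", run, v)
                run = 0
        if run:
            out += struct.pack("<Iq", run, 0)
        return bytes(out)

    # Example: 15,000 counters, only two of them changed since the last tick.
    prev = [i * 7 for i in range(15_000)]
    curr = list(prev)
    curr[42] += 3
    curr[9_001] += 128

    encoded = rle_encode(delta(prev, curr))
    print(len(prev) * 8, "bytes of raw counters")   # 120000
    print(len(encoded), "bytes after delta + RLE")  # 36
    print(len(zlib.compress(encoded)), "bytes after zlib on top of a single record")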

tjungblut•2h ago
tldr, they now do a zero (?) copy of raw bytes instead of marshaling and unmarshaling json.
the_real_cher•2h ago
What is the trick that this and dynamo use?

Are they just basically large hash tables?

atemerev•2h ago
When I get back from Clickhouse to Postgres, I am always shocked. Like, what is it doing for several minutes while importing this 20G dump? Shouldn't it take seconds?
joshstrange•1h ago
Every time I use Clickhouse I want to blow my brains out, especially knowing that Postgres exists. I'm not saying Clickhouse doesn't have its place or that Postgres can do everything that Clickhouse can.

What I am saying is that I really dislike working in Clickhouse with all of the weird foot guns. Unless you are using it in a very specific, and in my opinion, limited way, it feels worse than Postgres in every way.

mrbluecoat•2h ago
Noteworthy point:

> If a service is crash-looping or down, SysEx is unable to scrape data because the necessary system tables are unavailable. OpenTelemetry, by contrast, operates in a passive fashion. It captures logs emitted to stdout and stderr, even when the service is in a failed state. This allows us to collect logs during incidents and perform root cause analysis even if the service never became fully healthy.

fuzzy2•1h ago
Everything I ever did with OTel was fully active, so I wouldn't say this is very noteworthy. Instead it is wrong/incomplete information.
jurgenkesker•2h ago
So yeah, this is only really relevant for collecting logs from clickhouse. Not for logs from anything else. Good for them, and I really love Clickhouse, but not really relevant.
iw7tdb2kqo9•1h ago
I haven't worked at ClickHouse-level scale.

Can you search log data at this volume? ElasticSearch has query capabilities for small-scale log data, I think.

Why would I use ClickHouse instead of just storing historical log data as JSON files?

sethammons•1h ago
Scale and costs. We are faced with logging scale at my work. A naive "push json into splunk" will cost us over $6M/year, but I can only get maybe 5-10% of that approved.

In the article, they talk about needing 8k cpu to process their json logs, but only 90 cpu afterward.

munchbunny•1h ago
> Can you search log data in this volume?

(Context: I work at this scale)

Yes. However, as you can imagine, the processing costs can be potentially enormous. If your indexing/ordering/clustering strategy isn't set up well, a single query can easily end up costing you on the order of $1-$10 to do something as simple as "look for records containing this string".

My experiences line up with theirs: at the scale where you are moving petabytes of data, the best optimizations are, unsurprisingly, "touch as little data as few times as possible" and "move as little data as possible". Every time you have to serialize/de-serialize, and every time you have to perform disk/network I/O, you introduce a lot of performance cost and therefore overall cost to your wallet.

Naturally, this can put OTel directly at odds with efficiency because the OTel collector is an extra I/O and serialization hop. But then again, if you operate at the petabyte scale, the amount of money you save by throwing away a single hop can more than pay for an engineer whose only job is to write serializer/deserializer logic.

revskill•1h ago
This industry is mostly filled with half-baked or in-progress standards, which leads to segmentation of ecosystems. From GraphQL, to OpenAPI, to MCP... to everything: nothing is perfect, and that's fine.

The problem is that the people who create these specs just follow a trial-and-error approach, which is insane.