
On File Formats

https://solhsa.com/oldernews2025.html#ON-FILE-FORMATS
59•ibobev•4d ago

Comments

adelpozo•4h ago
I would add: make it streamable, or at least allow it to be read remotely efficiently.
flowerthoughts•3h ago
Agreed on that one. With a nice file format, streamable is hopefully just a matter of ordering things appropriately once you know the sizes of the individual chunks. You want to write the index last, but you want to read it first. Perhaps you want the most influential values first if you're building something progressive (a level-of-detail split).

Similar is the discussion of delimited fields vs. length prefix. Delimited fields are nicer to write, but length prefixed fields are nicer to read. I think most new formats use length prefixes, so I'd start there. I wrote a blog post about combining the value and length into a VLI that also handles floating point and bit/byte strings: https://tommie.github.io/a/2024/06/small-encoding
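As a concrete illustration of length prefixes (this is a plain LEB128-style varint, not the combined value-and-length scheme from the linked post), a minimal sketch in Python:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 payload bits per byte;
    the high bit is set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

assert encode_varint(300) == b"\xac\x02"
assert decode_varint(b"\xac\x02") == (300, 2)
```

The reader never needs a delimiter: each byte says whether more follow, which is why this style is so common in new binary formats (Protocol Buffers uses the same trick).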

lifthrasiir•3h ago
I don't think a single encoding is generally useful. A good encoding for a given application depends on the value distribution and the neighboring data. For example, any variable-length scalar encoding makes vectorization much harder.
mjevans•4h ago
Most of that's pretty good.

Compression: for anything that ends up large, it's probably desired, though consider both the algorithm and its 'strength' carefully based on the use case. Even a simple algorithm might make things faster when it comes time to transfer or write to permanent storage. A high-cost search to squeeze out yet more redundancy is probably worth it if something will be copied and/or decompressed many times, but might not be worth it for that locally compiled kernel you'll boot at most ten times before replacing it with another.
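That tradeoff is easy to measure directly; a quick sketch using zlib's compression levels (the input here is artificially repetitive, so real-world ratios will be far less dramatic):

```python
import time
import zlib

# Highly repetitive input compresses extremely well at any level.
data = b"the quick brown fox jumps over the lazy dog " * 50_000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(f"level {level}: {len(data)} -> {len(out)} bytes, {elapsed_ms:.1f} ms")
```

Higher levels spend more CPU searching for redundancy; whether that pays off depends on how many times the result is transferred or decompressed.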

thasso•3h ago
For archive formats, or anything that has a table of contents or an index, consider putting the index at the end of the file so that you can append to it without moving a lot of data around. This also allows for easy concatenation.
charcircuit•3h ago
Why not put it at the beginning, so that it is available at the start of the file stream? That way it is easier to get first, so you know which other ranges of the file you may need.

>This also allows for easy concatenation.

How would it be easier than putting it at the front?

lifthrasiir•3h ago
If the archive is being updated in place, turning ABC# into ABCD#' (where # and #' are indices) is easier than turning #ABC into #'ABCD. The actual position of indices doesn't matter much if the stream is seekable. I don't think the concatenation is a good argument though.
shakna•3h ago
Files are... Flat streams. Sort of.

So if you rewrite an index at the head of the file, you may end up having to rewrite everything that comes after it, pushing it further down in the file, if it overflows any padding offset. That makes appending an extremely slow operation.

Whereas seeking to end, and then rewinding, is not nearly as costly.
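A toy sketch of that layout (the names and the JSON index are hypothetical choices, not any real format): records first, then the index, then a fixed-size footer holding the index offset. Appending overwrites only the trailing index; records are never moved:

```python
import json
import os
import struct

FOOTER = struct.Struct("<Q")  # little-endian u64: file offset of the index

def write_archive(path, records):
    """records: iterable of (name, payload_bytes)."""
    index = []
    with open(path, "wb") as f:
        for name, payload in records:
            index.append({"name": name, "off": f.tell(), "len": len(payload)})
            f.write(payload)
        idx_off = f.tell()
        f.write(json.dumps(index).encode())
        f.write(FOOTER.pack(idx_off))

def read_index(path):
    """Read the footer first, then seek back to the index."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(-FOOTER.size, os.SEEK_END)
        idx_off, = FOOTER.unpack(f.read(FOOTER.size))
        f.seek(idx_off)
        return json.loads(f.read(size - FOOTER.size - idx_off))

def append_record(path, name, payload):
    """Overwrite the old index with the new record, then rewrite
    index + footer. Existing record bytes are untouched."""
    index = read_index(path)
    with open(path, "r+b") as f:
        f.seek(-FOOTER.size, os.SEEK_END)
        idx_off, = FOOTER.unpack(f.read(FOOTER.size))
        f.seek(idx_off)
        index.append({"name": name, "off": idx_off, "len": len(payload)})
        f.write(payload)
        new_off = f.tell()
        f.write(json.dumps(index).encode())
        f.write(FOOTER.pack(new_off))
```

With a head-of-file index, the same append would have to shift every record byte whenever the index outgrew its reserved space.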

charcircuit•2h ago
Most workflows do not modify files in place but rather create new files, as it's safer and allows you to go back to the original if you made a mistake.
shakna•57m ago
If you're writing twice, you don't care about the performance to begin with. Or the size of the files being produced.

But if you're writing indices, there's a good chance that you do care about performance.

charcircuit•44m ago
Files are often authored once and read or used many times. When authoring a file, performance is less important and there is plenty of file space available. Indices exist to make reading the file fast, which matters more than the performance of authoring it.
shakna•24m ago
If storage and performance aren't a concern when writing, then you probably shouldn't be doing workarounds to include the index in the file itself. Follow the dbm approach and separate the two into different files.

Which is what dbm, bdb, Windows search indexes, IBM datasets, and so many, many other standards will do.

PhilipRoman•2h ago
You can do it via fallocate(2) FALLOC_FL_INSERT_RANGE and FALLOC_FL_COLLAPSE_RANGE but sadly these still have a lot of limitations and are not portable. Based on discussions I've read, it seems there is no real motivation for implementing support for it, since anyone who cares about the performance of doing this will use some DB format anyway.

In theory, files should be just unrolled linked lists (or trees) of bytes, but I guess a lot of internal code still assumes full, aligned blocks.

McGlockenshire•3h ago
> How would it be easier than putting it at the front?

Have you ever wondered why `tar` is the Tape Archive? Tape. Magnetic recording tape. You stream data to it, and rewinding is Hard, so you put the list of files you just dealt with at the very end. This now-obsolete hardware expectation touches us decades later.

jclulow•3h ago
tar streams don't have an index at all, actually, they're just a series of header blocks and data blocks. Some backup software built on top may include a catalog of some kind inside the tar stream itself, of course, and may choose to do so as the last entry.
charcircuit•2h ago
But new file formats being developed are most likely not going to be designed to be used with tapes. If you want to avoid rewinds you can write a new concatenated version of the files. This also allows you to keep the original in case you need it.
MattPalmer1086•11m ago
Imagine you have a 12 GB zip file and you want to add one more file to it. Very easy and quick if the index is at the end; very slow if it's at the start (assuming your index now needs more space than is currently available).

Reading the index from the end of the file is also quick; where you read next depends on what you are trying to find in it, which may not be the start.
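Python's zipfile shows this behavior directly: append mode adds entries and rewrites only the trailing central directory, leaving existing entry data in place:

```python
import tempfile
import zipfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "demo.zip"

# Create the archive; the central directory (the index) is written last.
with zipfile.ZipFile(path, "w") as z:
    z.writestr("a.txt", "first")

# "a" mode appends a new entry and rewrites only the trailing index.
with zipfile.ZipFile(path, "a") as z:
    z.writestr("b.txt", "second")

with zipfile.ZipFile(path) as z:
    assert z.namelist() == ["a.txt", "b.txt"]
    assert z.read("a.txt") == b"first"
```

This is exactly why appending to a multi-gigabyte zip is cheap: the old entries are never touched.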

leiserfg•3h ago
If binary, consider just using SQLite.
lifthrasiir•3h ago
Using SQLite as a container format is only beneficial when the file format itself is a composite, like word processor files, which include both the textual data and any attachments. SQLite is just a hindrance otherwise, as with image formats or archival/compressed formats [1].

[1] SQLite's own sqlar format is a bad idea for this reason.

frainfreeze•3h ago
sqlar proved a great solution in the past for me. Where does it fall short in your experience?
lifthrasiir•2h ago
Unless you are using the container file as a database too, sqlar is strictly inferior to ZIP in pretty much every respect [1]. I'm actually more interested in the context where sqlar did prove useful for you.

[1] https://news.ycombinator.com/item?id=28670418

SyrupThinker•2h ago
From my own experience SQLite works just fine as the container for an archive format.

It ends up having some overhead compared to established ones, but the ability to query over the attributes of 10000s of files is pretty nice, and definitely faster than the worst case of tar.

My archiver could even keep up with 7z in some cases (for size and access speed).

Implementing it is also not particularly tricky, and SQLite even allows streaming the blobs.

Making readers for such a format seems more accessible to me.

lifthrasiir•2h ago
The SQLite format itself is not very simple, because it is a database file format at heart. By using SQLite you are unknowingly constraining your use case: for example, you can indeed stream BLOBs, but you can't randomly access them, because SQLite stores a large BLOB as a linked list of pages, at least when I last checked. And BLOBs are limited in size anyway (4GB AFAIK), so streaming itself might not be that useful. Using SQLite also means bringing SQLite into your code base, and it is not very small if you are only using it as a container.

> My archiver could even keep up with 7z in some cases (for size and access speed).

7z might feel slow because it enables solid compression by default, which trades decompression speed for compression ratio. I can't imagine 7z having a similar compression ratio with the correct options though; was your input incompressible?

sureglymop•2h ago
I think it's fine as an image format. I've used the mbtiles format which is basically just a table filled with map tiles. Sqlite makes it super easy to deal with it, e.g. to dump individual blobs and save them as image files.

It just may not always be the most performant option. For example, for map tiles there is alternatively the pmtiles binary format which is optimized for http range requests.
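A sketch of that kind of dump, assuming the standard MBTiles schema (tiles(zoom_level, tile_column, tile_row, tile_data)); TMS-vs-XYZ row flipping is ignored here:

```python
import sqlite3
from pathlib import Path

def dump_tiles(mbtiles_path, out_dir):
    """Write every tile blob from an MBTiles file to out_dir/z/x/y.png.
    Assumes the standard schema:
    tiles(zoom_level, tile_column, tile_row, tile_data)."""
    con = sqlite3.connect(mbtiles_path)
    try:
        query = ("SELECT zoom_level, tile_column, tile_row, tile_data "
                 "FROM tiles")
        for z, x, y, data in con.execute(query):
            path = Path(out_dir, str(z), str(x), f"{y}.png")
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
    finally:
        con.close()
```

Because the container is just SQLite, this needs no format-specific library at all, which is exactly the ease of use being described.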

InsideOutSanta•2h ago
The Mac image editor Acorn uses SQLite as its file format. It's described here:

https://shapeof.com/archives/2025/4/acorn_file_format.html

The author notes that an advantage is that other programs can easily read the file format and extract information from it.

lifthrasiir•1h ago
It is clearly a composite file format [1]:

> Acorn’s native file format is used to losslessly store layer data, editable text, layer filters, an optional composite of the image, and various metadata. Its advantage over other common formats such as PNG or JPEG is that it preserves all this native information without flattening the layer data or vector graphics.

As I've mentioned, this is a good use case for SQLite as a container. But ZIP would work equally well here.

[1] https://flyingmeat.com/acorn/docs/technotes/ACTN002.html

shakna•3h ago
Spent the weekend with an untagged chunked format, and... I rather hate it.

A friend wanted a newer save viewer/editor for Dragonball Xenoverse 2, because there's about a total of two, and they're slow to update.

I thought it'd be fairly easy to spin up something to read it, because I've spun up a bunch of save editors before, and they're usually trivial.

XV2 save files change over versions. They're also just arrays of structs [0], that don't properly identify themselves, so some parts of them you're just guessing. Each chunk can also contain chunks - some of which are actually a network request to get more chunks from elsewhere in the codebase!

[0] Also encrypted before dumping to disk, but the keys have been known since about the second release, and they've never switched them.

lifthrasiir•3h ago
Generally good points. Unfortunately, existing file formats rarely follow these rules. In fact, these rules form naturally once you have dealt with many different file formats anyway. Specific points follow:

- Agreed that human-readable formats have to be dead simple, otherwise binary formats should be used. Note that textual numbers are surprisingly complex to handle, so any formats with significant number uses should just use binary.

- Chunking is generally good for structuring and incremental parsing, but do not expect it to somehow provide reorderability or backward/forward compatibility. Unless those are explicitly designed in, they do not exist. Consider PNG: its chunks were designed to be quite robust, but nowadays some exceptions [1] do exist. Versioning is much more crucial for that.

[1] https://www.w3.org/TR/png/#animation-information

- Making a new file format from scratch is always difficult. As already mentioned, you should really consider using an existing file format as a container first. Some formats are even explicitly designed for this purpose, like sBOX [2] or RFC 9277 CBOR-labeled data tags [3].

[2] https://nothings.org/computer/sbox/sbox.html

[3] https://www.rfc-editor.org/rfc/rfc9277.html
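The PNG chunk layout mentioned above is simple enough to walk in a few lines: an 8-byte signature, then a series of (length, type, data, CRC) records. A minimal reader sketch (CRC checked, ancillary-chunk semantics ignored):

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data: bytes):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    assert data[:8] == PNG_SIG, "not a PNG"
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        crc, = struct.unpack(
            ">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type code and the payload, not the length.
        assert crc == zlib.crc32(ctype + payload), "corrupt chunk"
        yield ctype.decode("ascii"), payload
        pos += 12 + length
```

A reader like this can skip any chunk type it doesn't recognize, which is the robustness chunking buys you; what it cannot do is tell you whether skipping was safe, which is what the explicit ancillary/critical bit and versioning are for.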

constantcrying•2h ago
Also, you should consider the context in which you are developing. Often there are "standard" tools and methods for dealing with the kind of data you want to store.

E.g., if you are interested in storing significant amounts of structured floating-point data, choosing something like HDF5 will not only make your life easier, it will also make it easy to communicate what you have done to others.

InsideOutSanta•2h ago
>Most extensions have three characters, which means the search space is pretty crowded. You may want to consider using four letters.

Is there a reason not to use a lot more characters? If your application's name is MustacheMingle, call the file foo.mustachemingle instead of foo.mumi?

This will decrease the probability of collision to almost zero. I am unaware of any operating systems that don't allow it, and it will be 100% clear to the user which application the file belongs to.

It will be less aesthetically pleasing than a shorter extension, but that's probably mainly a matter of habit. We're just not used to longer file name extensions.

Any reason why this is a bad idea?

delusional•2h ago
> it will be 100% clear to the user which application the file belongs to.

The most popular operating system hides it from the user, so clarity would not improve in that case. At least one other (Linux) doesn't really use "extensions" and instead relies on magic headers inside the files to determine the format.

Otherwise I think the decision is largely aesthetic. If you value absolute clarity, then I don't see any reason it won't work; it'll just be a little "ugly".

hiAndrewQuinn•1h ago
I don't even think it's ugly. I'm incredibly thankful every time I see someone make e.g. `db.sqlite`, it immediately sets me at ease to know I'm not accidentally dealing with a DuckDB file or something.
wvbdmp•29m ago
Yes, oh my god. Stop using .db for Sqlite files!!! It’s too generic and it’s already used by Windows for those thumbnail system files.
dist-epoch•1h ago
> At least one other (Linux) doesn't really use "extensions" and instead relies on magic headers inside the files to determine the format.

mostly for executable files.

I doubt many Linux apps look inside a .py file to see if it's actually a JPEG they should build a thumbnail for.

scrollaway•1h ago
Your doubts are incorrect. There's a fairly standard way of extracting the file type of files on Linux, which relies on a mix of extensions and magic bytes. Here's where you can start reading about it:

https://wiki.archlinux.org/title/XDG_MIME_Applications

A lot of apps implement this (including most file managers)

Hackbraten•1h ago
A 14-character extension might cause UX issues in desktop environments and file managers, where screen real estate per directory entry is usually very limited.

When under pixel pressure, a graphical file manager might choose to prioritize displaying the file extension and truncate only the base filename. This would help the user identify file formats. However, the longer the extension, the less space remains for the base name. So a low-entropy file extension with too many characters can contribute to poor UX.

dist-epoch•1h ago
> call the file foo.mustachemingle

You could go the whole Java way, then: foo.com.apache.mustachemingle

> Any reason why this is a bad idea

The focus should be on the name, not on the extension.

strogonoff•1h ago
Thinking about a file format is a good way to clarify your vision. Even if you don’t want to facilitate interop, you’d get some benefits for free—if you can encapsulate the state of a particular thing that the user is working on, you could, for example, easily restore their work when they return, etc.

Some cop-out (not necessarily in a bad way) file formats:

1. Don’t have a file format, just specify a directory layout instead. Example: CinemaDNG. Throw a bunch of particularly named DNGs (a file for each frame of the footage) in a directory, maybe add some metadata file or a marker, and you’re good. Compared to the likes of CRAW or BRAW, you lose in compression, but gain in interop.

2. Just dump runtime data. Example: Mnemosyne’s old format. Do you use Python? Just dump your state as a Python pickle. (Con: dependency on a particular runtime, good luck rewriting it in Rust.)

3. Almost dump runtime data. Example: Anki, newer Mnemosyne with their SQLite dumps. (Something suggests to me that they might be using SQLite at runtime.) A step up from a pickle in terms of interop, somewhat opens yourself (but also others) to alternative implementations, at least in any runtime that has the means to read SQLite. I hope if you use this you don’t think that the presence of SQL schema makes the format self-documenting.

4. One or more of the above, except also zip or tar it up. Example: VCV, Anki.