Scientists may have found a way to eliminate chromosome linked to Down syndrome

https://academic.oup.com/pnasnexus/article/4/2/pgaf022/8016019
200•MattSayar•4h ago•138 comments

Graphene OS: a security-enhanced Android build

https://lwn.net/SubscriberLink/1030004/898017c7953c0946/
135•madars•4h ago•59 comments

Anthropic teams use Claude Code

https://www.anthropic.com/news/how-anthropic-teams-use-claude-code
57•yurivish•1h ago•43 comments

Inter-Planetary Network Special Interest Group

https://www.ipnsig.org
99•OhMeadhbh•6h ago•27 comments

Alto turns your Apple Notes into a website

https://alto.so/
19•colinprince•2h ago•7 comments

Positron – A next-generation data science IDE

https://positron.posit.co/
113•amai•3d ago•46 comments

I wasted weeks hand optimizing assembly because I benchmarked on random data

https://www.vidarholen.net/contents/blog/?p=1160
231•thunderbong•3d ago•68 comments

There is no memory safety without thread safety

https://www.ralfj.de/blog/2025/07/24/memory-safety.html
271•tavianator•11h ago•250 comments

AMD CEO sees chips from TSMC's US plant costing 5%-20% more

https://www.bloomberg.com/news/articles/2025-07-23/amd-ceo-su-sees-chips-from-us-tsmc-plant-costing-5-to-20-more
262•mfiguiere•1d ago•445 comments

New Aarch64 Back End

https://ziglang.org/devlog/2025/#2025-07-23
65•Bogdanp•5h ago•14 comments

A GPU Calculator That Helps Calculate What GPU to Use

https://calculator.inference.ai/
40•chlobunnee•4h ago•14 comments

PSA: SQLite WAL checksums fail silently and may lose data

https://avi.im/blag/2025/sqlite-wal-checksum/
242•avinassh•11h ago•111 comments

Revisiting Moneyball

https://djpardis.medium.com/revisiting-moneyball-074fc2435b07
59•sebg•5h ago•21 comments

Visa and Mastercard: The global payment duopoly (2024)

https://quartr.com/insights/edge/visa-and-mastercard-the-global-payment-duopoly
235•bilekas•5h ago•129 comments

RE#: High performance derivative-based regular expression matching (2024)

https://arxiv.org/abs/2407.20479
20•fanf2•3d ago•5 comments

Air Force unit suspends use of Sig Sauer pistol after shooting death of airman

https://www.nhpr.org/nh-news/2025-07-23/sig-sauer-pistol-air-force-shooting-death
106•duxup•8h ago•195 comments

Use Your Type System

https://www.dzombak.com/blog/2025/07/use-your-type-system/
228•ingve•11h ago•230 comments

Vet is a safety net for the curl | bash pattern

https://github.com/vet-run/vet
177•mooreds•13h ago•164 comments

Open Source Maintenance Fee

https://github.com/wixtoolset/issues/issues/8974
223•AndrewDucker•14h ago•158 comments

Covers as a way of learning music and code

https://ntietz.com/blog/covers-as-a-way-of-learning/
133•zdw•3d ago•69 comments

Intel CEO Letter to Employees

https://morethanmoore.substack.com/p/intel-ceo-letter-to-employees
189•fancy_pantser•5h ago•329 comments

Low-Temp 2D Semiconductors: A Chipmaking Shift

https://spectrum.ieee.org/cdimensions-2d-semiconductors
4•rbanffy•3d ago•0 comments

Why concatenative programming matters (2012)

http://evincarofautumn.blogspot.com/2012/02/why-concatenative-programming-matters.html
47•azhenley•3d ago•11 comments

Bus Bunching

https://www.futilitycloset.com/2025/07/12/bus-bunching/
52•surprisetalk•4d ago•57 comments

Superfunctions: A universal solution against sync/async fragmentation in Python

https://github.com/pomponchik/transfunctions
25•pomponchik•3d ago•26 comments

Mwm – The smallest usable X11 window manager

https://github.com/lslvr/mwm
132•daureg•3d ago•55 comments

UK: Phone networks down: EE, BT, Three, Vodafone, O2 not working in mass outage

https://www.the-independent.com/tech/ee-bt-three-vodafone-o2-down-phone-networks-outage-latest-b2795260.html
196•oger•13h ago•87 comments

Writing is thinking

https://www.nature.com/articles/s44222-025-00323-4
270•__rito__•3d ago•117 comments

American sentenced for helping North Koreans get jobs at U.S. firms

https://fortune.com/2025/07/24/north-korean-it-workers-chapman-nike/
106•fortran77•6h ago•74 comments

The POSIX specification of vi

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/vi.html
63•exvi•3d ago•20 comments

FastVLM: Efficient Vision Encoding for Vision Language Models

https://machinelearning.apple.com/research/fast-vision-language-models
90•2bit•1d ago

Comments

meatmanek•1d ago
I guess this is the paper / announcement about https://github.com/apple/ml-fastvlm, which was previously discussed in https://news.ycombinator.com/item?id=44661527
yorwba•1d ago
I think you meant to link to https://news.ycombinator.com/item?id=43968897
meatmanek•22h ago
Oops, you are correct.
godelski•1d ago
Personally, I didn't find too much value in this paper. I think it is good as a product demonstration; I just don't know if it added a ton of value to the research space (but maybe it did, because people have been making the same mistake for a while?).

I actually think the linked page makes it very easy to understand my main critique. The main problem here is that downscaling is a destructive process. It destroys information. Zoom in on that sign, can you read it?[0] No! But can you in the high res?[1] Of course!
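
A toy illustration of that point (a sketch, not anything from the paper): with a simple box filter, two clearly different high-res rows can collapse to the same low-res row, so nothing downstream can tell them apart. The function name `box_downsample` and the pixel values are made up for the example.

```python
def box_downsample(row, factor=2):
    """Average each non-overlapping group of `factor` pixels (a simple box filter)."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


# Two different "high-res" rows: fine black/white stripes vs. flat gray.
stripes = [0, 255, 0, 255, 0, 255, 0, 255]
flat = [127.5] * 8

low_stripes = box_downsample(stripes)
low_flat = box_downsample(flat)

# The inputs differ, but the downsampled versions are identical,
# so the stripe pattern cannot be recovered from the low-res row.
print(stripes != flat)          # inputs differ
print(low_stripes == low_flat)  # outputs are indistinguishable
```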

We can of course train the model on those signs alone and then get it to recognize what the sign should say, the same way you might (not by reading the words, but by recognizing the symbol), but we may run into problems when downsampling images, especially with the subtle biases that those algorithms can create[2], which even includes tiling[3].

If the main thesis is "training on larger resolution results in better performance on high resolution images" then this seems to be a conclusion we already knew from a pure mathematical understanding of entropy, and is something many researchers have been discussing for decades.

There are a lot of evaluations here but it is not explicitly clear to me that the architecture is playing the main role. There is very little in the ablation study and a larger focus on dataset coverage. Table 1 is difficult to interpret. While I commend the fine-tuning of ViT, it would not distinguish the entropy problem, as (IIRC) ViT was pretrained on 224x224 resolution images and then fine-tuned to a higher resolution. More fine-tuning isn't going to make that problem go away. Table 2 helps us understand pooling but does more in terms of dataset coverage than coverage of the solution space.

I don't think it is bad in the way of "this is not a useful thing that was built" but "the way this is communicated makes it difficult for me as a researcher to interpret the reason for these results." In a way, my criticism here is much more general than just this paper. I am frustrated with the recent trend in AI research of putting more focus on coverage of datasets than on interpretation. By interpretation I mean things like more in-depth ablations (e.g. holding variables constant, changing specific parameters for a test[4]). There isn't infinite compute, so I'm not expecting the world. But in the trade-off between dataset coverage and more thorough ablations, I'd significantly prefer the latter. It is entirely possible that the architectural changes here are critical to the model's ability to properly encode the information. There are hints at it in the paper but it is difficult to distinguish from training procedures and simply the entropy. There are many moving parts and the information provided is not enough to distinguish (or distinguish to an acceptable threshold). I don't entirely blame researchers for making their choice in trade-offs; we can't encourage more in-depth ablations until reviewers stop using "what about x dataset" as an excuse[5]. This paradigm of dataset coverage really feels like a lot of wasted compute. And honestly, I suspect we'd make far more improvements were we to change paradigms, and many of those improvements would come from much smaller labs without these large compute resources.

[0] Small Res: http://0x0.st/8nU3.png

[1] High Res: https://0x0.st/8nUE.png

[2] https://www.cs.cmu.edu/~clean-fid/

[3] https://arxiv.org/abs/2104.05704

[4] It would be nice to change one parameter at a time but sometimes things are coupled.

[5] "I'm curious about performance on x dataset because x dataset has y quality that I think is important" is a perfectly fine critique. But I rarely see that type of criticism in reviews. They include the demand but not the motivation for the demand. It just leads to noisy reviewing, as an author can't infer whether the reviewer is asking because they're lazy or because they think the lack of inclusion undermines the author's claims.

imtringued•18h ago
>If the main thesis is "training on larger resolution results in better performance on high resolution images" then this seems to be a conclusion we already knew from a pure mathematical understanding of entropy, and is something many researchers have been discussing for decades.

I think you missed the part where the word performance is doing double duty here. Performance as in accuracy of the result and performance as in the time it takes to achieve said result.

The expectation is that training on a larger resolution will worsen performance in the second sense. You also mentioned that downsampling images will destroy information, hence FastVLM should also perform worse in the first sense, since it is clearly running its transformer layers on downsampled images through the patch embedding halving the image resolution with each layer.

To be fair, the presented network architecture does not really look like anything special. Three CNN layers with two transformer layers is just good product engineering. The real insight to be had here is that writing your own custom downsampling algorithm is a waste of time. You should make the downsampling learnable and part of the model.
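
To make "learnable downsampling" concrete, here is a minimal 1D sketch, assuming a strided convolution as the downsampler. It is not FastVLM's actual layer: the kernel values below are hand-picked stand-ins for weights a model would learn, and `strided_conv1d` is a name invented for the example. A fixed box filter is just one particular kernel, and a different kernel can keep exactly the information the box filter throws away.

```python
def strided_conv1d(row, kernel, stride):
    """Slide `kernel` over `row` in steps of `stride`, emitting one value per step."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, row[i:i + k]))
        for i in range(0, len(row) - k + 1, stride)
    ]


stripes = [0, 255, 0, 255, 0, 255, 0, 255]
flat = [127.5] * 8

# Fixed, hand-written downsampler: a box filter is the kernel [0.5, 0.5].
box_kernel = [0.5, 0.5]
# Hypothetical "learned" kernel: responds to local contrast instead of averaging.
contrast_kernel = [-1.0, 1.0]

# Both halve the resolution (stride 2), but only the second one
# still separates the striped row from the flat one.
box_out = (strided_conv1d(stripes, box_kernel, 2), strided_conv1d(flat, box_kernel, 2))
contrast_out = (strided_conv1d(stripes, contrast_kernel, 2), strided_conv1d(flat, contrast_kernel, 2))

print(box_out[0] == box_out[1])            # box filter: indistinguishable
print(contrast_out[0] != contrast_out[1])  # other kernel: still distinguishable
```

In a real model the kernel would be a trained parameter rather than a constant, which is the whole point: the network decides what to keep when it shrinks the image.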

godelski•8h ago
Sorry, I should clarify

  > The expectation is that training on a larger resolution will worsen performance in the second sense.

  > downsampling images will destroy information, hence FastVLM should also perform worse in the first sense
I do not think these are in contention. By training on larger images, the embedding subnetwork can better learn to embed the requisite information. It need not hurt performance in the sense of inference speed. Inference would be slower were everything else held equal or were we to just naively scale. But it can actually be better if the learned algorithm is more efficient at extracting information, where there's the advantage of having access to more information. The larger resolution photo simply contains more information. On the other hand, if you train a model for a different downsampling task, that information may not transfer well to the new downsampling task, which makes fine-tuning tricky and insufficient for a hard conclusion.

Note that their model is smaller. That actually can give us good analysis opportunities, as this suggests what I'm implying: more efficient embedding.

  > Three CNN layers with two transformer layers is just good product engineering. The real insight to be had here is that writing your own custom downsampling algorithm is a waste of time. You should make the downsampling learnable and part of the model.
Actually, the reason I linked [3] is that this reminded me of that paper. They used an overlapping (convolutional) patch-and-embed method in the ViT model as opposed to the hard standard partitioning. Which, in effect, is the same conclusion: learn your downsampler (embedder).
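
A rough 1D sketch of the difference between hard partitioning and overlapping patches, with made-up patch sizes and helper names (`patch_starts`, `extract_patches`), not the configuration from [3]: with hard partitioning, two neighboring pixels that straddle a patch border never land in the same patch, while an overlapping stride puts them together in at least one.

```python
def patch_starts(length, patch, stride):
    """Start indices of every full patch of size `patch` taken every `stride` pixels."""
    return list(range(0, length - patch + 1, stride))


def extract_patches(row, patch, stride):
    """Cut `row` into patches; stride == patch gives hard partitioning, stride < patch overlaps."""
    return [row[s:s + patch] for s in patch_starts(len(row), patch, stride)]


row = list(range(8))  # pixel positions 0..7 stand in for pixel values

hard = extract_patches(row, patch=4, stride=4)         # standard ViT-style tiling
overlapping = extract_patches(row, patch=4, stride=2)  # convolution-style overlap


def seen_together(patches, a, b):
    """True if some single patch contains both positions a and b."""
    return any(a in p and b in p for p in patches)


# Positions 3 and 4 are neighbors, but they sit on either side of the tile border.
print(seen_together(hard, 3, 4))         # never in the same patch
print(seen_together(overlapping, 3, 4))  # share the patch starting at position 2
```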

I think we're pretty much in agreement. I just really want to see more ablations.