I scanned 2,500 Hugging Face models for malware/issues. Here is the data

https://github.com/ArseniiBrazhnyk/Veritensor
22•arseniibr•3d ago
Hi HN,

I built a CLI tool called Veritensor for scanning AI models, because I found out that downloading model weights from third-party websites and loading them with torch.load() can lead to RCE. At the same time, simple regex scanners are easy to bypass.

To test my tool, I ran it against 2500 new and trending models on Hugging Face.

Here is what I found: 86 models failed.

- Broken files: 16 models were actually Git LFS text pointers (several hundred bytes), not binaries. If you try to load them, your code crashes.
- Hidden licenses: 5 models. I found models with Non-Commercial licenses hidden inside the .safetensors headers, even if the repo looked open source.
- Shadow dependencies: 49 models. Many models tried to import libraries I didn't have (like ultralytics or deepspeed). My tool blocked them because I use a strict allowlist of libraries.
- Suspicious code: 11 files used STACK_GLOBAL to build function names dynamically. This is a common way for RCE malware to hide, though in my case it was mostly old numpy files.
- Scan errors: 5 models failed because of missing local dependencies (like h5py for old Keras files).
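To make the "Broken files" category concrete, here is a minimal sketch of how a Git LFS pointer can be told apart from real weights. It relies only on the public pointer layout (a `version` line, an `oid sha256:` line, and a `size` line, in a file under 1024 bytes). The function name `is_lfs_pointer` is hypothetical and this is not necessarily how Veritensor performs the check.

```python
import re

# A Git LFS pointer is a small UTF-8 text file that starts with the spec version line.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1\n"
OID_RE = re.compile(rb"^oid sha256:[0-9a-f]{64}$", re.MULTILINE)
SIZE_RE = re.compile(rb"^size \d+$", re.MULTILINE)


def is_lfs_pointer(data: bytes) -> bool:
    """Return True if the bytes look like a Git LFS pointer, not a model binary."""
    # The LFS spec requires pointer files to be smaller than 1024 bytes.
    if len(data) >= 1024 or not data.startswith(LFS_PREFIX):
        return False
    return bool(OID_RE.search(data)) and bool(SIZE_RE.search(data))


# Example inputs: a synthetic pointer and the first bytes of a binary pickle.
pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:" + b"a" * 64 + b"\n"
    b"size 12345\n"
)
binary_head = b"\x80\x04\x95\x10\x00\x00\x00"

print(is_lfs_pointer(pointer))
print(is_lfs_pointer(binary_head))
```

A pointer detected this way usually means the large file was never fetched, so a scanner could report it as "not downloaded" rather than as a corrupt model.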

I was able to detect some threats because under the hood, Veritensor works differently from common regex scanners. Instead of searching for suspicious text, it simulates how Pickle loads data, which helps it find hidden payloads without running any code. It also checks that the model file is real by hashing it and comparing it with the version from Hugging Face, so fake or changed models can be detected. Veritensor also looks at model metadata in formats like Safetensors and GGUF to spot license restrictions. If everything looks safe, it can sign the container using Sigstore Cosign.
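The idea of inspecting a pickle without running it can be sketched with the standard library's `pickletools.genops`, which yields opcodes without executing them. The allowlist contents and the `scan_pickle` helper below are assumptions for illustration, not Veritensor's actual implementation, and the STACK_GLOBAL handling is a heuristic (it takes the last two string literals pushed, and does not follow memo lookups).

```python
import pickle
import pickletools
from fractions import Fraction

# Hypothetical allowlist of top-level modules a model file may import.
ALLOWED_MODULES = {"collections", "numpy", "torch"}
# Opcodes that push a text string onto the pickle stack.
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def scan_pickle(data: bytes) -> list[str]:
    """Walk the opcode stream without executing it and report unexpected imports."""
    findings = []
    strings = []  # string literals seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            strings.append(arg)
            continue
        if opcode.name == "GLOBAL":
            # genops reports the argument as "module name".
            module, name = arg.split(" ", 1)
        elif opcode.name == "STACK_GLOBAL":
            if len(strings) < 2:
                findings.append(f"offset {pos}: STACK_GLOBAL with non-literal operands")
                continue
            module, name = strings[-2], strings[-1]
        else:
            continue
        if module.split(".")[0] not in ALLOWED_MODULES:
            findings.append(f"offset {pos}: import of {module}.{name}")
    return findings


# Plain data needs no imports; a Fraction pulls in a module outside the allowlist.
safe = scan_pickle(pickle.dumps({"weights": [1.0, 2.0]}))
flagged = scan_pickle(pickle.dumps(Fraction(1, 2), protocol=4))
print(safe)
print(flagged)
```

Because nothing is unpickled, a malicious `__reduce__` payload is only reported, never executed. A production scanner would need to model the stack and memo more faithfully than this heuristic does.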

It supports PyTorch, Keras, and GGUF. Free to use — Apache 2.0.

Repo: https://github.com/ArseniiBrazhnyk/Veritensor
Data of the scan [CSV/JSON]: https://drive.google.com/drive/folders/1G-Bq063zk8szx9fAQ3NN...
PyPI: pip install veritensor

Let me know if you have any feedback, whether you have ever faced similar threats, and whether this tool could be useful to you.

Comments

patrakov•3d ago
The single --force flag is not a good design decision. Please break it up (EDIT: I see you already did it partially in veritensor.yaml). Right now, according to the description, it suppresses detection of both genuinely non-commercial/AGPL models and models with inconsistent licensing data. Also, I might accept AGPL but not CC-BY-NC.

Probably, it would be better to split it into --accept-model-license=AGPL --accept-inconsistent-licensing --ignore-layer-license-metadata --ignore-rce-vector=os.system and so on.
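As a rough illustration of that proposal, the flags could be declared with `argparse` along these lines. This is only a sketch of the suggested interface: the flag names come from the comment above, the program name `scan-sketch` is made up, and none of this reflects Veritensor's actual CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one narrowly scoped override per kind of finding."""
    parser = argparse.ArgumentParser(prog="scan-sketch")
    # Repeatable: each use accepts exactly one named license.
    parser.add_argument("--accept-model-license", action="append", default=[],
                        metavar="LICENSE")
    parser.add_argument("--accept-inconsistent-licensing", action="store_true")
    parser.add_argument("--ignore-layer-license-metadata", action="store_true")
    # Repeatable: each use waives exactly one named callable.
    parser.add_argument("--ignore-rce-vector", action="append", default=[],
                        metavar="CALLABLE")
    return parser


args = build_parser().parse_args(
    ["--accept-model-license=AGPL", "--ignore-rce-vector=os.system"]
)
print(args.accept_model_license)
print(args.ignore_rce_vector)
print(args.accept_inconsistent_licensing)
```

With this shape, accepting AGPL does not silently accept CC-BY-NC, and every waived check is visible in the command line.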

arseniibr•3d ago
Thank you for the valuable feedback. I agree that having granular CLI flags is better for ad-hoc scans or CI pipelines where you don't want to commit a config file. Splitting it into --ignore-license vs --ignore-malware (which should probably never be ignored easily) is a great design decision. Added to the roadmap!
embedding-shape•1h ago
> Broken files — 16 models were actually Git LFS text pointers (several hundred bytes), not binaries. If you try to load them, your code crashes.

Yeah, if you don't know how to use the repositories, they might look broken :) Pointers are fine: the blobs are downloaded after you fetch the git repository itself, and then it's perfectly loadable. Seems like a really basic thing to misunderstand, given the context.

Please, understand how things typically work in the ecosystem before claiming something is broken.

That whatever LLM you used couldn't import some specific libraries also doesn't mean the repository itself has issues.

I think you need to go back to the drawing board here, fully understand how things work, before you set out to analyze what's "broken".

lucrbvi•1h ago
You should know that there is already a solution for this, SafeTensors [0].

But it may be a nice tool for those who download "unsafe" models

[0]: https://huggingface.co/docs/safetensors/index

embedding-shape•59m ago
It seems like this project has decided that .safetensors might not be so safe after all, since it's scanning them too, according to https://drive.google.com/drive/folders/1G-Bq063zk8szx9fAQ3NN... at least.
amelius•41m ago
> loading them with torch.load() can lead to RCE (remote code execution)

Why didn't the Torch team fix this?

embedding-shape•30m ago
OP misunderstands: the issue is specifically with the pickle format and similar ones, as they're essentially code that needs to be executed, not just data to be loaded. Most of the ecosystem has already moved to the .safetensors format, which is just data and doesn't suffer from that issue.
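To illustrate the "just data" point, here is a minimal sketch that reads a safetensors header with nothing but `struct` and `json`. The layout (an 8-byte little-endian header length, then a JSON header with an optional `__metadata__` map) follows the published format. The helper name and the `license` metadata key are assumptions for the example.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a safetensors file without touching tensor data."""
    if len(data) < 8:
        raise ValueError("file too short to contain a header length")
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > len(data) - 8:
        raise ValueError("header length exceeds file size")
    return json.loads(data[8:8 + header_len].decode("utf-8"))


# Build a tiny in-memory file: one float32 tensor plus free-form metadata.
header = {
    "__metadata__": {"license": "cc-by-nc-4.0"},  # hypothetical key and value
    "weight": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]},
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + struct.pack("<f", 1.0)

parsed = read_safetensors_header(blob)
metadata = parsed.get("__metadata__", {})
print(metadata)
print(parsed["weight"]["shape"])
```

Nothing here can execute code, which is why loading is safe. The same header can still carry metadata such as license strings, which is one reason a scanner might inspect these files anyway.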
