Introducing the Massive Legal Embedding Benchmark (MLEB)

https://isaacus.com/blog/introducing-mleb
5•ubutler•3h ago

Comments

afistfullof•3h ago
"We note, however, that there is, unfortunately, a serious risk that Voyage’s models were trained on some of the evaluation sets in MLEB, particularly SCALR and Consumer Contracts QA, which are both also part of MTEB, due to the fact that Voyage trains on their customers’ private data by default (which would invariably include benchmarks). This is also a risk for Cohere and Jina models."

Wow.

ubutler•3h ago
We were unfortunately disappointed to discover that, yes, Voyage, Cohere, and Jina all train on the data of their API customers by default.

Voyage's terms say:

> you grant Voyage AI (and its successors and assigns) a worldwide, irrevocable, perpetual, royalty-free, fully paid-up, right and license to use, copy, reproduce, distribute, prepare derivative works of, display and perform the Customer Content: ... (iii) to train, improve, and otherwise further develop the Service (such as by training the artificial intelligence models we use).

Cohere's terms say:

> YOU GRANT US A ... RIGHT TO ... USE ... ANY DATA ... TO ... IMPROVE AND ENHANCE THE COHERE SOLUTION AND OUR OTHER OFFERINGS AND BENCHMARK THE FOREGOING, INCLUDING BY SHARING API DATA AND FINETUNING DATA WITH THIRD PARTIES ...

Jina's terms say:

> Jina AI shall, subject to applicable mandatory data protection requirements, be entitled to retain data uploaded to the Jina AI Systems or otherwise provided by the Customer or collected by Jina AI in the course of providing the Services and to use such data in anonymized/pseudonymized format for its business purposes including to improve its artificial intelligence applications.

abksaai•3h ago
This is the most interesting part of this article.

abksaai•3h ago
Very interesting and alarming.

GitHub/accessibility-scanner: finds accessibility gaps and attempts to fix them

https://github.com/github/accessibility-scanner
1•robin_reala•20s ago•0 comments

A Stateful Browser Agent Using Self-Healing DOM Maps

https://100x.bot/a/a-stateful-browser-agent-using-self-healing-dom-maps
1•shardullavekar•2m ago•0 comments

Outlook marked Microsoft's Copilot email as spam

https://media.flashblaze.dev/Screenshot_20251016_084256_Outlook.jpg
2•flashblaze•3m ago•1 comments

Cosmic Dust Could Have Helped Get Life Going on Earth

https://astrobiology.arizona.edu/news/cosmic-dust-could-have-helped-get-life-going-earth
1•geox•3m ago•0 comments

Sunshine Kills Bugs

https://squirrelsquadron.substack.com/p/sunshine-kills-bugs
1•squirrel•5m ago•0 comments

OpenAI's ChatGPT will soon allow 'erotica' for adults in major policy shift

https://www.cnbc.com/2025/10/15/erotica-coming-to-chatgpt-this-year-says-openai-ceo-sam-altman.html
1•mgh2•7m ago•1 comments

Four Strategies for Organizing Code (2016)

https://medium.com/@msandin/strategies-for-organizing-code-2c9d690b6f33
1•oftenwrong•7m ago•0 comments

Stop Dismissing 'AI Cognition' as Metaphor – Evidence seems to show it's real

https://github.com/shaunbuswell/cognitive-type-system
1•buzzovich•8m ago•1 comments

Gist of Go: Atomics

https://antonz.org/go-concurrency/atomics/
1•rbanffy•11m ago•0 comments

How The Pentagon Is Blocking Out News Organizations

https://www.nytimes.com/interactive/2025/10/15/business/media/pentatgon-press-rules.html
2•jshprentz•11m ago•0 comments

Nightmare Fuel: What is Skibidi Toilet, How it demos a non-narrative future

https://journal.media-culture.org.au/index.php/mcjournal/article/view/3108
2•mallowdram•12m ago•0 comments

Alex Jones Warns of a Globalist Death Cult Fueling Civil War and the Antichrist [video]

https://www.youtube.com/watch?v=DDD_N6ZcCV4
1•keepamovin•14m ago•1 comments

Measuring Stress

https://www.aidlab.com/blog/measuring-stress
1•guzik•15m ago•0 comments

Std: Introduce `Io` Interface by Andrewrk

https://github.com/ziglang/zig/pull/25592
2•database64128•15m ago•0 comments

Scaling Instruction-Selection Verification Against Authoritative ISA Semantics

https://doi.org/10.1145/3764383
1•mmcloughlin•15m ago•0 comments

Scheme Steering Committee Election

https://r7rs.org/sc/
1•todsacerdoti•16m ago•0 comments

Is Librem Mail Down?

https://status.librem.one/
1•leavenotracks•16m ago•2 comments

Job-Doc Fit

https://quietmoats.substack.com/p/job-doc-fit
1•okossi•17m ago•0 comments

Chat-GPT becomes Sex-GPT for verified adults

https://twitter.com/sama/status/1978129344598827128
3•smartmic•22m ago•1 comments

An experimental NES emulator written in Haskell

https://github.com/Arthi-chaud/FuNes
2•yehoshuapw•22m ago•0 comments

my dotfiles. nothing fancy. 317 total LOCs.

https://github.com/danielfalbo/dotfiles
1•danielfalbo•25m ago•0 comments

Autopoietic Networks (a few more examples)

https://gbragafibra.github.io/2025/05/27/autopoietic_nets2.html
1•Fibra•25m ago•0 comments

Chatbots Are a Waste of A.I.'s Real Potential

https://www.nytimes.com/2025/10/16/opinion/ai-specialized-potential.html
1•tysone•27m ago•0 comments

The DORA 4 key metrics become 5

https://cd.foundation/blog/2025/10/16/dora-5-metrics/
1•gpi•27m ago•0 comments

Drivers Beg for Relief Bill to Allow More Parking for Bathroom Breaks and Rest

https://www.thecity.nyc/2025/09/15/bathroom-relief-bill-council-uber-lyft-drivers/
2•PaulHoule•32m ago•1 comments

What are your thoughts on vibe coding as professionals?

1•eibrahim•32m ago•0 comments

Show HN: Collaborative Music Discovery with AI

https://back2back.ai/
2•pj4533•33m ago•1 comments

How slow is channel-based iteration?

https://www.dolthub.com/blog/2025-10-10-how-slow-is-channel-iteration/
1•Zababa•33m ago•0 comments

Show HN: I made an AI agent write a spoiler-free wiki for the novel Anathem

https://avoutarchive.com/wiki/
1•teoryn•34m ago•0 comments

Keep Your Vue Apps Fresh

https://wedgworth.dev/keep-your-vue-apps-fresh/
1•paltman•35m ago•0 comments