frontpage.

Show HN: Bitcoin wallet on NXP SE050 secure element, Tor-only open source

https://github.com/0xdeadbeefnetwork/sigil-web
1•sickthecat•41s ago•0 comments

White House Explores Opening Antitrust Probe on Homebuilders

https://www.bloomberg.com/news/articles/2026-02-06/white-house-explores-opening-antitrust-probe-i...
1•petethomas•57s ago•0 comments

Show HN: MindDraft – AI task app with smart actions and auto expense tracking

https://minddraft.ai
1•imthepk•5m ago•0 comments

How do you estimate AI app development costs accurately?

1•insights123•6m ago•0 comments

Going Through Snowden Documents, Part 5

https://libroot.org/posts/going-through-snowden-documents-part-5/
1•goto1•7m ago•0 comments

Show HN: MCP Server for TradeStation

https://github.com/theelderwand/tradestation-mcp
1•theelderwand•10m ago•0 comments

Canada unveils auto industry plan in latest pivot away from US

https://www.bbc.com/news/articles/cvgd2j80klmo
1•breve•11m ago•0 comments

The essential Reinhold Niebuhr: selected essays and addresses

https://archive.org/details/essentialreinhol0000nieb
1•baxtr•13m ago•0 comments

Rentahuman.ai Turns Humans into On-Demand Labor for AI Agents

https://www.forbes.com/sites/ronschmelzer/2026/02/05/when-ai-agents-start-hiring-humans-rentahuma...
1•tempodox•15m ago•0 comments

StovexGlobal – Compliance Gaps to Note

1•ReviewShield•18m ago•1 comments

Show HN: Afelyon – Turns Jira tickets into production-ready PRs (multi-repo)

https://afelyon.com/
1•AbduNebu•19m ago•0 comments

Trump says America should move on from Epstein – it may not be that easy

https://www.bbc.com/news/articles/cy4gj71z0m0o
5•tempodox•20m ago•1 comments

Tiny Clippy – A native Office Assistant built in Rust and egui

https://github.com/salva-imm/tiny-clippy
1•salvadorda656•24m ago•0 comments

LegalArgumentException: From Courtrooms to Clojure – Sen [video]

https://www.youtube.com/watch?v=cmMQbsOTX-o
1•adityaathalye•27m ago•0 comments

US moves to deport 5-year-old detained in Minnesota

https://www.reuters.com/legal/government/us-moves-deport-5-year-old-detained-minnesota-2026-02-06/
3•petethomas•30m ago•1 comments

If you lose your passport in Austria, head for McDonald's Golden Arches

https://www.cbsnews.com/news/us-embassy-mcdonalds-restaurants-austria-hotline-americans-consular-...
1•thunderbong•35m ago•0 comments

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

https://github.com/chenyanchen/mermaid-formatter
1•astm•50m ago•0 comments

RFCs vs. READMEs: The Evolution of Protocols

https://h3manth.com/scribe/rfcs-vs-readmes/
2•init0•57m ago•1 comments

Kanchipuram Saris and Thinking Machines

https://altermag.com/articles/kanchipuram-saris-and-thinking-machines
1•trojanalert•57m ago•0 comments

Chinese chemical supplier causes global baby formula recall

https://www.reuters.com/business/healthcare-pharmaceuticals/nestle-widens-french-infant-formula-r...
2•fkdk•1h ago•0 comments

I've used AI to write 100% of my code for a year as an engineer

https://old.reddit.com/r/ClaudeCode/comments/1qxvobt/ive_used_ai_to_write_100_of_my_code_for_1_ye...
2•ukuina•1h ago•1 comments

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•1h ago•1 comments

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•1h ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
3•endorphine•1h ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•1h ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•1h ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
2•computer23•1h ago•0 comments

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

https://publicdomainreview.org/essay/typing-for-love-or-money/
1•prismatic•1h ago•0 comments

Show HN: A longitudinal health record built from fragmented medical data

https://myaether.live
1•takmak007•1h ago•0 comments

CoreWeave's $30B Bet on GPU Market Infrastructure

https://davefriedman.substack.com/p/coreweaves-30-billion-bet-on-gpu
1•gmays•1h ago•0 comments

ETH Zurich and EPFL to release a LLM developed on public infrastructure

https://ethz.ch/en/news-and-events/eth-news/news/2025/07/a-language-model-built-for-the-public-good.html
716•andy99•7mo ago

Comments

k__•7mo ago
"respecting web crawling opt-outs during data acquisition produces virtually no performance degradation"

Great to read that!
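
For the curious, the opt-out check itself is the easy part. A minimal sketch using Python's standard-library robots.txt parser (the user agent and URLs here are made up):

    from urllib import robotparser

    # Hypothetical crawler identity and target page; swap in real values.
    USER_AGENT = "example-research-crawler"
    TARGET = "https://example.com/docs/api.html"

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    if rp.can_fetch(USER_AGENT, TARGET):
        print("allowed to fetch", TARGET)
    else:
        print("opt-out in effect, skipping", TARGET)

The hard part is honoring this consistently across an entire crawl, plus the opt-out signals that have no standard parser, which is what makes the "virtually no degradation" result notable.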

Onavo•7mo ago
No performance degradation on training metrics except for the end user. At the end of the day users and website owners have completely orthogonal interests. Users want answers and content, website owners want attention so they can upsell/push ads. You can only serve one master.
esafak•7mo ago
> Users want answers and content, website owners want attention so they can upsell/push ads. You can only serve one master

How are you going to serve users if web site owners decide to wall their content? You can't ignore one side of the market.

Onavo•7mo ago
You don't. You bypass them with crawlers and don't reveal your training data. And this is exactly why open source models can't surpass open weight models.
diggan•7mo ago
> And this is exactly why open source models can't surpass open weight models.

It is a fair point, but how strong a point remains to be seen. Some architectures are better than others even with the same training data, so it's not impossible that we could at some point see innovative architectures beating the current proprietary ones. It would probably be short-lived though, as the proprietary ones would obviously improve in their next release after that.

jowea•7mo ago
How can open source models respectful of robots.txt possibly perform equally if they are missing information that the other models have access to?
datameta•7mo ago
How can we possibly find out without trying?
jowea•7mo ago
It is logically impossible for an LLM to know, for example, that fooExecute() takes two int arguments if the documentation is blocked by robots.txt and there are no examples of fooExecute() usage in the wild, don't you agree?
tharant•7mo ago
Sure, the model would not “know” about your example, but that’s not the point; the penultimate[0] goal is for the model to figure out the method signature on its own just like a human dev might leverage her own knowledge and experience to infer that method signature. Intelligence isn’t just rote memorization.

[0] the ultimate, of course, being profit.

jowea•7mo ago
I don't think a human dev can divine a method signature and effects in the general case either. Sure the add() function probably takes 2 numbers, but maybe it takes a list? Or a two-tuple? How would we or the LLM know without having the documentation? And yeah sure the LLM can look at the documentation while being used instead of it being part of the training dataset, but that's strictly inferior for practical uses, no?

I'm not sure if we're thinking of the same field of AI development. I think I'm talking about the super-autocomplete with integrated copy of all of digitalized human knowledge, while you're talking about trying to do (proto-)AGI. Is that it?

heavenlyblue•6mo ago
> Sure the add() function probably takes 2 numbers, but maybe it takes a list? Or a two-tuple? How would we or the LLM know without having the documentation?

You just listed the possible options in order of their relative probability. A human would attempt to use them in exactly that order.

diggan•6mo ago
I agree, but I also think it's less important. I don't want a big fat LLM that memorized every API out there, where as soon as an API changes the weights have to be updated. I like the current approach of Codex (and similar) where they can look up the APIs they need as they're doing the work instead, so the same weights will continue to work no matter how much the APIs change.
Dylan16807•7mo ago
Maybe the missing data makes it 3% worse but the architecture is 5% better. Or your respect for robots.txt gets you more funding and you gain a 4% advantage by training longer.

Don't focus too much on a single variable, especially when all the variables have diminishing returns.

lllllm•6mo ago
This is what this paper tries to answer: https://arxiv.org/abs/2504.06219. The quality gap between compliant and non-compliant training turns out to be surprisingly small.
JKCalhoun•7mo ago
Is there not yet a source where the web has already been scraped and boiled down to just the text? It would seem someone would have created such a thing in order to save LLM training from having to reinvent the wheel.

I understand the web is a dynamic thing but still it would seem to be useful on some level.

CaptainFever•6mo ago
Common Crawl, maybe?
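
Its WET files are already stripped down to plain text, too. A rough sketch of reading one with the warcio library (the snapshot path is illustrative, not a real file I've checked):

    import requests
    from warcio.archiveiterator import ArchiveIterator

    # Illustrative path; real WET paths are listed per crawl in wet.paths.gz.
    WET_URL = ("https://data.commoncrawl.org/crawl-data/"
               "CC-MAIN-2024-10/segments/.../wet/....warc.wet.gz")

    with requests.get(WET_URL, stream=True) as resp:
        for record in ArchiveIterator(resp.raw):
            if record.rec_type == "conversion":  # WET records hold extracted text
                url = record.rec_headers.get_header("WARC-Target-URI")
                text = record.content_stream().read().decode("utf-8", "replace")
                print(url, len(text))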
stephen_cagle•7mo ago
I wonder if the reason for these results is that any data on the internet is already copied to other locations by actors who ignore crawling opt-outs. So even if they respect all web crawling opt-outs, they are still effectively getting the data, because someone else who did not respect the opt-out has already copied it to a location that doesn't carry one.
conradkay•7mo ago
My guess is that it doesn't remove that much of the data, and the post-training data (not just randomly scraped from the web) probably matters more
lllllm•7mo ago
Yes this is an interesting question. In our arxiv paper [1] we did study this for news articles, and also removed duplicates of articles (decontamination). We did not observe an impact on the downstream accuracy of the LLM, in the case of news data.

[1] https://arxiv.org/abs/2504.06219

Bengalilol•7mo ago
Looking forward to proof-testing it.
greenavocado•7mo ago
Why would you announce this without a release? Be honest.
wood_spirit•7mo ago
The announcement was at the International Open-Source LLM Builders Summit held this week in Switzerland. Is it so strange that they announced what they are doing and the timeline?
JumpCrisscross•7mo ago
Funding? Deeply biasing European uses to publicly-developed European LLMs (or at least not American or Chinese ones) would make a lot of sense. (Potentially too much sense for Brussels.)
phtrivier•7mo ago
The cliché (at least on my side of the Alps) is that people in Switzerland like to take theiiiir tiiiime.
Bengalilol•7mo ago
"Move as quickly as possible, but as slowly as necessary."
WeirderScience•7mo ago
The open training data is a huge differentiator. Is this the first truly open dataset of this scale? Prior efforts like The Pile were valuable, but had limitations. Curious to see how reproducible the training is.
layer8•7mo ago
> The model will be fully open: source code and weights will be publicly available, and the training data will be transparent and reproducible

This leads me to believe that the training data won’t be made publicly available in full, but merely be “reproducible”. This might mean that they’ll provide references like a list of URLs of the pages they trained on, but not their contents.

WeirderScience•7mo ago
Yeah, I suspect you're right. Still, even a list of URLs for a frontier model (assuming it does turn out to be of that level) would be welcome over the current situation.
glhaynes•7mo ago
That wouldn't seem reproducible if the content at those URLs changes. (Er, unless it was all web.archive.org URLs or something.)
dietr1ch•7mo ago
This is a problem with the Web. It should be easier to download content, like updating a git repo.
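
A minimal sketch of the kind of manifest that would at least make drift detectable (field names and URLs are my own invention, not anything the project has announced): record each URL with a content hash and retrieval timestamp, so a re-crawl can verify it got the same bytes.

    import csv
    import hashlib
    from datetime import datetime, timezone

    import requests

    def manifest_row(url: str) -> dict:
        """Fetch a page and record exactly what was ingested."""
        body = requests.get(url, timeout=30).content
        return {
            "url": url,
            "sha256": hashlib.sha256(body).hexdigest(),
            "retrieved_at": datetime.now(timezone.utc).isoformat(),
            "num_bytes": len(body),
        }

    urls = ["https://example.org/page1", "https://example.org/page2"]
    with open("manifest.csv", "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["url", "sha256", "retrieved_at", "num_bytes"])
        writer.writeheader()
        for u in urls:
            writer.writerow(manifest_row(u))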
TobTobXX•7mo ago
Well, when the actual content is 100s of terabytes big, providing URLs may be more practical for them and for others.
layer8•7mo ago
The difference between content they are allowed to train on vs. being allowed to distribute copies of is likely at least as relevant.
sschueller•6mo ago
No problem, we have 25 Gbit/s home internet here. [1]

[1] https://www.init7.net/en/internet/fiber7/

evolvedlight•7mo ago
Yup, it’s not a dataset packaged like you hope for here, as it still contains traditionally copyrighted material
oytis•7mo ago
The press release talks a lot about how it was done, but very little about how capabilities compare to other open models.
pantalaimon•7mo ago
It's a university, teaching the 'how it's done' is kind of the point
EA-3167•7mo ago
Sure, but usually you teach something that is inherently useful, or can be applied to some sort of useful endeavor. In this case I think it's fair to ask what the collision of two bubbles really achieves, or if it's just a useful teaching model, what it can be applied to.
joot82•7mo ago
The model will be released in two sizes — 8 billion and 70 billion parameters [...]. The 70B version will rank among the most powerful fully open models worldwide. [...] In late summer, the LLM will be released under the Apache 2.0 License.

We'll find out in September if it's true?

k__•7mo ago
I hope DeepSeek R2, but I fear Llama 4.
oytis•7mo ago
Yeah, I was thinking more of a table with benchmark results
wood_spirit•7mo ago
The article says

“ Open LLMs are increasingly viewed as credible alternatives to commercial systems, most of which are developed behind closed doors in the United States or China”

It is obvious that the companies producing big LLMs today have the incentive to try to enshittify them: trying to sell subscriptions while also pushing product placement, ads, etc. Worse, some already have political biases they promote.

It would be wonderful if a partnership between academia and government in Europe could build public-good search and AI that endeavours to serve the user over the company.

klabb3•7mo ago
Yes but it’s a very complicated service to deliver. Even if they train great models, they likely will not operationalize them for inference. Those will still be private actors, and the incentives to enshittify will be the same. Also, for AI generally the incentives is much higher than last tech generation, due to cost of running these things. Basically, the free services where you’re the product must aggressively extract value out of you in order to make a profit.
bee_rider•7mo ago
Is this setting the bar for dataset transparency? It seems like a significant step forward. Assuming it works out, that is.

They missed an opportunity though. They should have called their machine the AIps (AI Petaflops Supercomputer).

philipkglass•7mo ago
I think that the Allen Institute for Artificial Intelligence OLMo models are also completely open:

OLMo is fully open

Ai2 believes in the power of openness to build a future where AI is accessible to all. Open weights alone aren’t enough – true openness requires models to be trained in the open with fully open access to data, models, and code.

https://allenai.org/olmo

lamuswawir•6mo ago
I am a simple man, I see AI2, I upvote.
ekianjo•7mo ago
Smollm is also completely open as far as I know
isusmelj•7mo ago
I hope they do well. AFAIK they’re training or finetuning an older LLaMA model, so performance might lag behind SOTA. But what really matters is that ETH and EPFL get hands-on experience training at scale. From what I’ve heard, the new AI cluster still has teething problems. A lot of people underestimate how tough it is to train models at this scale, especially on your own infra.

Disclaimer: I’m Swiss and studied at ETH. We’ve got the brainpower, but not much large-scale training experience yet. And IMHO, a lot of the “magic” in LLMs is infrastructure-driven.

luke-stanley•7mo ago
When I read "from scratch", I assume they are doing pre-training, not just finetuning, do you have a different take? Do you mean it's normal Llama architecture they're using? I'm curious about the benchmarks!
andy99•7mo ago
Imo, a lot of the magic is also dataset driven, specifically the SFT and other fine tuning / RLHF data they have. That's what has separated the models people actually use from the also-rans.

I agree with everything you say about getting the experience, the infrastructure is very important and is probably the most critical part of a sovereign LLM supply chain. I would hope there will also be enough focus on the data, early on, that the model will be useful.

alfalfasprout•7mo ago
The infra does become pretty complex to get a SOTA LLM trained. People assume it's as simple as loading up the architecture and a dataset + using something like Ray. There's a lot that goes into designing the dataset, the eval pipelines, the training approach, maximizing the use of your hardware, dealing with cross-node latency, recovering from errors, etc.

But it's good to have more and more players in this space.
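
On the error-recovery point specifically: every long run at this scale needs checkpoint/resume baked in from the start. A bare-bones PyTorch-style sketch, not anything from the actual ETH/EPFL stack:

    import os
    import torch

    model = torch.nn.Linear(512, 512)   # stand-in for the real network
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    ckpt_path = "checkpoint.pt"
    start_step = 0

    # Resume if a previous run died mid-training.
    if os.path.exists(ckpt_path):
        state = torch.load(ckpt_path)
        model.load_state_dict(state["model"])
        opt.load_state_dict(state["opt"])
        start_step = state["step"] + 1

    for step in range(start_step, 10_000):
        x = torch.randn(32, 512)         # placeholder batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 500 == 0:              # periodic checkpoint
            torch.save({"model": model.state_dict(),
                        "opt": opt.state_dict(),
                        "step": step}, ckpt_path)

The real versions add sharded/distributed checkpointing and dataloader state, which is where much of the engineering effort goes.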

lllllm•7mo ago
No, the model has nothing to do with Llama. We are using our own architecture, and training from scratch. Llama also does not have open training data, and is non-compliant, in contrast to this model.

Source: I'm part of the training team

blurbleblurble•7mo ago
Are you using dbpedia?
lllllm•7mo ago
No. The main source is fineweb2, but with additional filtering for compliance, toxicity removal, and quality filters such as fineweb2-hq.
PeterStuer•6mo ago
Thx for engaging here.

Can you comment on how the filtering impacted language coverage? E.g. fineweb2 has 1800+ languages, but some with very little actual representation, while fineweb2-hq has just 20, but each with a substantial data set.

(I'm personally most interested in covering the 24 official EU languages)

lllllm•6mo ago
We kept all 1800+ (script/language) pairs, not only the quality-filtered ones. Whether mixing quality-filtered and unfiltered languages affects the final model is still an open question. Preliminary research (Section 4.2.7 of https://arxiv.org/abs/2502.10361) indicates that quality filtering can mitigate the curse of multilinguality to some degree, and so facilitate cross-lingual generalization, but it remains to be seen how strong this effect is at larger scale.
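
For reference, a single (script/language) pair can be streamed with the datasets library roughly like this (the repo and config names should be checked against the dataset card; "fra_Latn" here is just an example):

    from datasets import load_dataset

    ds = load_dataset("HuggingFaceFW/fineweb-2", name="fra_Latn",
                      split="train", streaming=True)

    for i, row in enumerate(ds):
        print(row["text"][:200])
        if i == 2:
            break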
danielhanchen•6mo ago
If you guys need help on GGUFs + Unsloth dynamic quants + finetuning support via Unsloth https://github.com/unslothai/unsloth on day 0 / 1, more than happy to help :)
lllllm•6mo ago
Absolutely! I sent you a LinkedIn message last week, but here seems to work much better, thanks a lot!
danielhanchen•6mo ago
Oh sorry I might have missed it! I think you or your colleague emailed me (I think?) My email is daniel @ unsloth.ai if that helps :)
isusmelj•6mo ago
Thanks for clarifying! I wish you all the best luck!
Al-Khwarizmi•6mo ago
So you're not going to use copyrighted data for training? That's going to be a disadvantage with respect to LLaMA and other well-known models; it's an open secret that everyone is using everything they can get their hands on.

Good luck though, very needed project!

badsectoracula•6mo ago
Not sure about Swiss law, but the EU AI Act, and the 2019/790 copyright directive it piggybacks on for this topic, do allow training on copyrighted data as long as any opt-out mechanisms (e.g. robots.txt) are respected. AFAICT this LLM was trained by respecting those mechanisms (and as linked elsewhere they didn't find any practical difference in performance; note that there is an exception allowing the opt-out mechanisms to be ignored for research purposes, so they could make that comparison).
miraculixx•6mo ago
That is not correct. The EU AI Act has no such provision, and the data mining exemption does not apply, as the EU has made clear. As for Switzerland, copyrighted material cannot be used unless licensed.
moffkalast•6mo ago
L3 has open pretraining data, it's just not official for obvious legal reasons: https://huggingface.co/datasets/HuggingFaceFW/fineweb
menaerus•6mo ago
Wait, the whole (English-speaking) web content dataset is ~50TB?
zX41ZdbW•6mo ago
Yes, if we take the filtered and deduplicated HTMLs of CommonCrawl. I've made a video on this topic recently: https://www.youtube.com/watch?v=8yH3rY1fZEA
menaerus•6mo ago
Fun presentation, thanks! 72min ingestion time for ~81TB of data is ~1TB/min or ~19GB/s. Distributed or single-node? Shards? I see 50 jobs are used for parallel ingestion, and I wonder how ~19GB/s was achieved since ingestion rates were far below that figure last time I played around with CH performance. Granted, that was some years ago.
zX41ZdbW•6mo ago
Distributed across 20 replicas.
d3m0t3p•6mo ago
Hey, really cool project, I'm excited to see the outcome. Is there a blog / paper summarizing how you are doing it? Also, which research group is currently working on it at ETH?
asjir•6mo ago
I'd be more concerned about the size used being 70b (deepseek r1 has 671b) which makes catching up with SOTA kinda more difficult to begin with.
zettabomb•6mo ago
SOTA performance is relative to model size. If it performs better than other models in the 70B range (e.g. Llama 3.3) then it could be quite useful. Not everyone has the VRAM to run the full fat Deepseek R1.
tough•6mo ago
Also, isn't DeepSeek a Mixture of Experts model? Meaning not all params ever get activated on one forward pass?

70B feels like the best balance between usable locally and decent for regular use.

maybe not SOTA, but a great first step.
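
Rough back-of-the-envelope weight-memory math for the dense-vs-MoE comparison (ignoring KV cache and runtime overhead; R1 activates roughly 37B of its 671B parameters per token, but all weights still have to be resident):

    def weight_gb(params_b: float, bits: int) -> float:
        """Approximate weight memory in GB for params_b billion parameters."""
        return params_b * 1e9 * bits / 8 / 1e9

    for label, params in [("dense 8B", 8),
                          ("dense 70B", 70),
                          ("DeepSeek R1, 671B total", 671)]:
        print(f"{label}: ~{weight_gb(params, 16):.0f} GB at fp16, "
              f"~{weight_gb(params, 4):.0f} GB at 4-bit")

So a 4-bit 70B lands around 35 GB of weights, which is why it sits in the "usable locally on a beefy workstation" sweet spot.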

hubraumhugo•7mo ago
Pretty proud to see this at the top of HN as a Swiss (and I know many are lurking here!). These two universities produce world-class founders, researchers, and engineers. Yet we always stay in the shadow of the US. With our top-tier public infrastructure, education, and political stability (+ neutrality), we have a unique opportunity to build something exceptional in the open LLM space.
RHSman2•6mo ago
I work with EPFL alumni. Brilliant minds.
MITSardine•6mo ago
I think EPFL and ETH are generally well known internationally, but Switzerland being rather small (9M pop), it's only natural you don't hear much about it compared to other larger countries!
amelius•7mo ago
Yeah, that's what "democratizing AI" means.
nektro•7mo ago
gross use of public infrastructure
protocolture•7mo ago
I literally can't fault this, even steelmanning anti-AI positions. What makes you say that?
PetitPrince•7mo ago
Some time ago there was a Tom Scott video about the fastest-accelerating car in the world, developed by a team made up mostly of students. One remark stayed with me: "the goal is not to build a car, but to build engineers".

In that regard it's absolutely not a waste of public infra just like this car was not a waste.

herbst•6mo ago
It even used green power. Literally zero complaints or outcry from the public yet. Guess we like progress, especially if it helps independence.
MITSardine•6mo ago
University and research clusters are built to run research code. I can guarantee this project is 10x as impactful and interesting as what usually runs on these machines. This coming from someone in the area that usually hogs these machines (numerical simulation). I'm very excited to see academic actors tackle LLMs.
westurner•7mo ago
Use case for science and code LLMs: Superhydrodynamic gravity (SQR / SQG, )

LLMs do seem to favor general relativity but probably would've favored classical mechanics at the time given the training corpora.

Not-yet unified: Quantum gravity, QFT, "A unified model must: " https://news.ycombinator.com/item?id=44289148

Will be interested to see how this model responds to currently unresolvable issues in physics. Is it an open or a closed world mentality and/or a conditioned disclaimer which encourages progress?

What are the current benchmarks?

From https://news.ycombinator.com/item?id=42899805 re: "Large Language Models for Mathematicians" (2023) :

> Benchmarks for math and physics LLMs: FrontierMath, TheoremQA, Multi SWE-bench: https://news.ycombinator.com/item?id=42097683

Multi-SWE-bench: A Multi-Lingual and Multi-Modal GitHub Issue Resolving Benchmark: https://multi-swe-bench.github.io/

Add'l LLM benchmarks and awesome lists: https://news.ycombinator.com/item?id=44485226

Microsoft has a new datacenter that you don't have to keep adding water to, which spares the aquifers.

How to use this LLM to solve energy and sustainability problems all LLMs exacerbate? Solutions for the Global Goals, hopefully

westurner•6mo ago
(Unbelievable that I need to justify this at -4!)

Is the performance or accuracy on this better on FrontierMath or Multi-SWE-bench, given the training in 1,000 languages?

I just read in the Colab release notes that models uploaded to HuggingFace can be opened in Colab with an "Open in Colab" button.

kordlessagain•6mo ago
It's the word "gravity" that triggers them.
seydor•7mo ago
I wonder if multilingual LLMs are better or worse compared to a single-language model.
tugdual•7mo ago
This is an interesting problem that has various challenges. Currently most tokenization solutions are trained using byte-pair encoding, where the most commonly seen combinations of letters are selected to become a token mapping. Since that training text is mostly English, the majority of the vocabulary ends up being English mappings, meaning your LLM gets better tokenization of English compared to the other languages it is being trained on.

C.f. https://medium.com/@biswanai92/understanding-token-fertility...
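
Token fertility is easy to measure yourself. A quick sketch with a Hugging Face tokenizer (GPT-2's BPE here purely as an example of an English-heavy vocabulary):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")

    samples = {
        "English": "The quick brown fox jumps over the lazy dog.",
        "German":  "Der schnelle braune Fuchs springt über den faulen Hund.",
        "Finnish": "Nopea ruskea kettu hyppää laiskan koiran yli.",
    }

    for lang, text in samples.items():
        n_tokens = len(tok.encode(text))
        n_words = len(text.split())
        print(f"{lang}: {n_tokens} tokens / {n_words} words "
              f"= fertility {n_tokens / n_words:.2f}")

Higher fertility means fewer words fit in the same context window and each word costs more compute, which is the disadvantage non-English languages inherit from an English-heavy tokenizer.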

mukeshyadavnitt•7mo ago
nice
contrarian1234•6mo ago
This seems like the equivalent of a university designing an ICE car...

What does anyone get out of this when we have open weight models already ?

Are they going to do very innovative AI research that companies wouldn't dare try/fund? Seems unlikely ..

Is it a moonshot huge project that no single company could fund..? Not that either

If it's just a bit of fun to train the next generation of LLM researchers, then you might as well make a small-scale toy instead of using up a supercomputer center.

urvader•6mo ago
This model will be one of the few open models where the training data is also open, which makes it ideal for fine-tuning.
chvid•6mo ago
That it will actually be open and reproducible?

Including how it was trained, what data was used, how training data was synthesized, how other models were used, etc. All the stuff that is kept secret in the case of Llama, DeepSeek, etc.

herbst•6mo ago
Why do you think it's about money? IMO it's about much more than that, like independence and actual data freedom through reproducible LLMs.
MITSardine•6mo ago
Super computers are being used daily for much toy-ier codes in research, be glad this at least interests the public and constitutes a foray of academia into new areas.
defraudbah•6mo ago
ETH Zurich is doing so many amazing things that I want to go study there. Unbelievable how many great people are coming from that university
blue_light_man•6mo ago
It's also possible you just think of ETH Zurich as great and automatically associate the people and products as amazing. Could be a circular dependency here.
rtaylorgarlock•6mo ago
That is indeed how things work. I can think of a few 'good' media-relevant examples, including e.g. that recent super-quick cart project [1], which reach beyond the more vanilla startup spin-offs or basic media efforts.

1 https://ethz.ch/en/news-and-events/eth-news/news/2023/09/fro...

datameta•6mo ago
I took courses online from ETH Zurich before the formula was "perfected" and I'd say they were ahead of the curve in quality, concise but info-dense educational content.
defraudbah•6mo ago
I had no idea what ETH meant 2 years ago; I thought it was an Ethereum club in Switzerland or something. Then I kept hearing about it and noticing people wearing ETH stuff.

Obviously I don't know if it's the university or the people there, because I haven't been there, but I keep hearing about ETH Zurich in different areas, and that means something.

Tepix•6mo ago
How does it compare to Teuken and EuroLLM?
sschueller•6mo ago
Yet Switzerland was put on the Tier 2 list [1] of countries, meaning it does not get unlimited access to the top AI chips.

[1] https://www.bluewin.ch/en/news/usa-restricts-swiss-access-to...

[2] https://chplusplus.org/u-s-export-controls-on-ai-chips/

kisamoto•6mo ago
Any info on context length or comparable performance? Press release is unfortunately lacking on technical details.

Also I'm curious if there was any reason to make such a PR without actually releasing the model (due Summer)? What's the delay? Or rather what was the motivation for a PR?

rkrisztian•6mo ago
I'm disappointed. 8B is too small for GPUs with 16 GB of VRAM (still common in affordable PCs), which could easily run most 13B to 16B models, depending on the quantization.
adultSwim•6mo ago
This is such a smart move for the country. Best wishes on their important endeavor.