
Why We Need Arabic Language Models

https://www.natureasia.com/en/nmiddleeast/article/10.1038/nmiddleeast.2025.142
22•thinkingemote•2h ago

Comments

sarabande•2h ago
Does anyone know if they published the dataset?
nakamoto_damacy•2h ago
I wonder if, when you pre-train on Hebrew and Arabic, it will find the similarities between them, like the RTL writing direction. So many similar words. I guess both came from Aramaic? If so, how about the trifecta of ancient languages: Aramaic, then Hebrew, then Arabic.
ch4s3•2h ago
They don’t come from Aramaic. Arabic is a Southwestern Semitic language, and Aramaic and Hebrew are Northwestern Semitic languages. Aramaic and Hebrew are sort of cousins, with Hebrew splitting off from southern Canaanite, which was sort of a sibling language to an older form of Aramaic.
nakamoto_damacy•1h ago
They = Hebrew, Aramaic and Arabic

---

## 1. “They all kept the triconsonantal root system — where word meaning is based on three core consonants (like K-T-B = “write” → Hebrew katav, Arabic kataba, Aramaic ktav).”

*Source evidence:*

* The article “Triliteral Roots / Consonantal Roots” states that many Semitic languages (including Arabic, Hebrew) have roots typically made of three consonants (triliteral) and that words are formed by inserting vowels, etc. ([Transparent Blogs][1])
* A source says: “Both Hebrew and Arabic rely on a triliteral root system, meaning words are formed from three core consonants. Example of the root K-T-B…” ([Biblical Hebrew][2])
* Another general description: “The roots of verbs and most nouns in the Semitic languages are characterized as a sequence of consonants ... such abstract consonantal roots are used…” ([Wikipedia][3])

So this claim is well supported.

*Arabic translation of the claim:*

> احتفظت جميعها بنظام الجذر الثلاثي الحروف، حيث يعتمد معنى الكلمة على ثلاثة حروف صامتة أساسية (مثل ك-ت-ب = “كتب/يكتب” → العبرية כתב (katav)، العربية كتب (kataba)، الآرامية כתב (ktav)).

*Hebrew translation of the claim:*

> כולן שמרו על שיטת השורש התלת-עיצורי, שבה משמעות המילה מבוססת על שלושה עיצורי יסוד (למשל כ־ת־ב = “כתב” → עברית כתב (katav), ערבית كتب (kataba), ארמית כתב (ktav)).

*Citations (for this claim):*

* Semitic linguistics: “The roots of verbs and most nouns in the Semitic languages are characterized as a sequence of consonants …” ([Wikipedia][3])

* “Both Hebrew and Arabic rely on a triliteral root system, meaning words are formed from three core consonants.” ([Biblical Hebrew][2])

* Description of the K-T-B root being used in both Arabic and Hebrew. ([Wikipedia][4])
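The root-and-pattern idea described above can be illustrated with a small sketch. This is only a toy model, not a morphological analyzer: the function name `apply_pattern`, the slot notation (`1`, `2`, `3` for the three root consonants), and the simplified romanizations are assumptions made for illustration. It also covers only Arabic forms, since Hebrew katav involves a sound change (b to v) that plain slot-filling does not capture.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the numbered slots 1, 2, 3 in `pattern` with the root consonants.

    `root` is written as three hyphen-separated consonants, e.g. "k-t-b".
    Every other character in `pattern` (vowels, prefixes) is copied as-is.
    """
    consonants = root.split("-")
    if len(consonants) != 3:
        raise ValueError("expected a three-consonant root such as 'k-t-b'")
    return "".join(
        consonants[int(ch) - 1] if ch in "123" else ch
        for ch in pattern
    )


# Hypothetical pattern table using simplified romanized Arabic.
PATTERNS = {
    "1a2a3a": "he wrote",
    "1ā2i3": "writer",
    "1i2ā3": "book",
    "ma12ū3": "written",
}

if __name__ == "__main__":
    for pattern, gloss in PATTERNS.items():
        print(apply_pattern("k-t-b", pattern), "=", gloss)
```

The same three consonants appear in every output form; only the vowels and affixes supplied by the pattern change, which is the sense in which the meaning is carried by the root.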

---

## 2. “They share similar grammar and sound systems, just evolved differently.”

*Source evidence:*

* A blog post on Duolingo says: “Because Arabic and Hebrew are part of the same large language family, their grammars often ‘work’ in similar ways.” ([Duolingo Blog][5])
* A site “Arabic and Hebrew Compared” states: “Arabic and Hebrew morphology … is based on the consonant root system. …” ([Google Sites][6])
* The Wikipedia article on Semitic languages states that the Semitic languages share many grammatical features (word order, non-concatenative morphology, etc.) ([Wikipedia][7])

So yes, there is support for similar grammar and sound (phonological) systems.

*Arabic translation of the claim:*

> إنهما تشتركان في نحو وصوتيات متشابهة، فقط تطورتا بشكل مختلف.

*Hebrew translation of the claim:*

> הן חולקות דקדוק ומערכות צלילים דומות, רק שהתפתחו באופן שונה.

*Citations (for this claim):*

* “Because Arabic and Hebrew … their grammars often ‘work’ in similar ways.” ([Duolingo Blog][5])
* “Arabic and Hebrew morphology … is based on the consonant root system.” ([Google Sites][6])
* “Semitic languages share a number of grammatical features …” ([Wikipedia][7])

---

## 3. “Many religious and cultural interactions over millennia reinforced overlap (borrowed or re-borrowed vocabulary).”

*Source evidence:*

* The article “Similarities Between Hebrew and Arabic” mentions: “Many Hebrew and Arabic words are cognates, retaining similar meanings and sounds.” ([Biblical Hebrew][2])
* A blog “Halal, Hillul, and the Shared Meanings of Hebrew and Arabic” discusses relationships between similar sounding words (cognates) due to shared roots. ([Hebrew College][8])
* Comparative grammar sources mention that because Hebrew, Arabic and Aramaic are closely related, there has been lexical borrowing and shared vocabulary. ([semiticroots.net][9])

So your statement about religious/cultural interaction reinforcing overlap (vocabulary) is broadly supported.

*Arabic translation of the claim:*

> العديد من التفاعلات الدينية والثقافية عبر الألفيات عزَّزت التداخل (استعارت أو أعادت استعارة مفردات).

*Hebrew translation of the claim:*

> אינספור אינטראקציות דתיות ותרבותיות לאורך אלפי השנים חיזקו את ההשתלבות (השאלה או השאלה מחדש של אוצר מילים).

*Citations (for this claim):*

* “Many Hebrew and Arabic words are cognates …” ([Biblical Hebrew][2])
* “The relationships between similar-sounding words … in the case of the Semitic languages, similar roots.” ([Hebrew College][8])
* “Hebrew, Arabic, and Aramaic … than between Hebrew and any other language …” ([semiticroots.net][9])

---

[1]: https://blogs.transparent.com/hebrew/hebrew-grammar-consonan... "Hebrew Grammar: Consonantal Roots - Transparent Language Blog"
[2]: https://biblicalhebrew.org/similarities-between-hebrew-and-a... "Similarities Between Hebrew and Arabic"
[3]: https://en.wikipedia.org/wiki/Semitic_root?utm_source=chatgp... "Semitic root - Wikipedia"
[4]: https://en.wikipedia.org/wiki/K-T-B?utm_source=chatgpt.com "K-T-B"
[5]: https://blog.duolingo.com/are-arabic-hebrew-persian-related/... "Dear Duolingo: Are Arabic, Hebrew, and Persian related?"
[6]: https://sites.google.com/site/mopclanguages/arabic-and-hebre... "MOPC Languages - Arabic and Hebrew Compared"
[7]: https://en.wikipedia.org/wiki/Semitic_languages?utm_source=c... "Semitic languages"
[8]: https://hebrewcollege.edu/blog/halal-hillul-and-the-shared-m... "Halal, Hillul, and the Shared Meanings of Hebrew and Arabic"
[9]: https://www.semiticroots.net/downloads/Comparative%20Grammar... "Comparative Grammar of the Semitic Languages"

binarymax•2h ago
Do we need language-specific LLMs? I can’t vouch for the data coverage or accuracy of Arabic in the leading models today, but I do know them to be highly capable across languages.
readthenotes1•2h ago
Makes me wonder who did the translation

"This is a translation of the Arabic article published on 3rd August 2025"

Full irony would be from an LLM

tokai•2h ago
Such declarations have become pretty useless without any indicator of the translation method.
nzeid•2h ago
> Clear examples emerge when global language models address culturally sensitive issues, such as social relationships or political debates. They often adopt ambiguous positions that overlook the Arab cultural context, creating a gap between these digital tools and the values and lived experiences of Arab users.

Well I have bad news, my friend. English language models are also terrible at this.

This whole article seems to stem from the premise that it's important for LLMs to engage with cultural issues competently. But... should they even?

Fade_Dance•45m ago
>But... should they even?

I don't see why not?

Also, while I don't have access to this perspective myself, I'd imagine this is an unending annoyance in many areas of the world, since people there are often consuming quite America-centric offerings where localization is an afterthought and contracted out.
