Solving the Issue of Interpretability of AI

2•mikeai686•4h ago
# Making AI Thoughts Understandable Through Separate Translator Models

I want to propose a new approach to the problem of AI opacity.

## The Core Problem

Modern AI systems work as "black boxes": we can't see how they think. Recently, leading researchers warned that we might soon lose even the little transparency we currently have. Here's the difficulty: if we force an AI to "think aloud" in human language, it loses efficiency, but if we let it use efficient mathematical representations, we can't understand what's happening.

## Proposed Solution: A Modular System with Translators

I propose dividing the system into four parts:

*1. Free Internal Thinking* Let AI use any mathematical representations that are most efficient for solving tasks. We don't limit its thinking methods.

*2. Multiple Specialized Translator Models* We use several separate models trained to translate AI's internal representations into human-understandable language. Each translator can:

- explain the logical structure of reasoning
- highlight the main concepts the model is working with
- explain how confident the model is in its conclusions

Each function is performed by several different translators so results can be cross-checked.
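To make the "several translators per function" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names `Explanation`, `run_translators`, and `registry` are hypothetical, the activations are a plain list of floats, and the two "translators" are trivial hand-written rules standing in for trained models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumption: the model's internal representation is exposed as a float vector.
Activations = Sequence[float]
# A translator maps internal activations to a human-readable string.
Translator = Callable[[Activations], str]


@dataclass(frozen=True)
class Explanation:
    function: str     # e.g. "structure", "concepts", "confidence"
    translator: str   # which translator produced the text
    text: str


def run_translators(
    activations: Activations,
    registry: Dict[str, Dict[str, Translator]],
) -> List[Explanation]:
    """Run every translator registered for every function on the same activations."""
    results = []
    for function, group in registry.items():
        for name, translate in group.items():
            results.append(Explanation(function, name, translate(activations)))
    return results


# Toy stand-ins (hypothetical): real translators would be separately trained models.
def confidence_by_mean(acts: Activations) -> str:
    return "high" if sum(acts) / len(acts) > 0.5 else "low"


def confidence_by_max(acts: Activations) -> str:
    return "high" if max(acts) > 0.5 else "low"


registry = {"confidence": {"mean": confidence_by_mean, "max": confidence_by_max}}
explanations = run_translators([0.2, 0.9, 0.1], registry)
for e in explanations:
    print(e.function, e.translator, e.text)
```

On this input the two toy translators disagree about confidence, which is exactly the situation the cross-check is meant to surface.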

*3. Contradiction Resolution Mechanisms* When translators give different explanations, we:

- Highlight areas where they agree (high reliability)
- Emphasize discrepancies (likely complex or ambiguous reasoning)
- Explain why different interpretations arose

If translator results don't contradict each other, we combine non-contradictory aspects into a unified explanation.
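The agree/disagree split could be sketched roughly as follows. This is only an illustration under strong assumptions: each translator is assumed to report a mapping from an aspect name to a short claim, and two claims "agree" only if the strings are identical. A real system would need semantic comparison, and this sketch does not attempt the "explain why interpretations differ" step. The names `reconcile`, `agreed`, and `flagged` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def reconcile(
    claims_by_translator: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Split aspects into agreed claims and flagged discrepancies.

    Assumption: claims are compared by exact string equality.
    """
    votes: Dict[str, Dict[str, str]] = defaultdict(dict)  # aspect -> {translator: claim}
    for translator, claims in claims_by_translator.items():
        for aspect, claim in claims.items():
            votes[aspect][translator] = claim

    agreed: Dict[str, str] = {}
    flagged: Dict[str, Dict[str, str]] = {}
    for aspect, by_translator in votes.items():
        distinct = set(by_translator.values())
        # Agreement requires at least two translators making the same claim;
        # a claim from a single translator was never cross-checked, so it is flagged.
        if len(by_translator) >= 2 and len(distinct) == 1:
            agreed[aspect] = distinct.pop()
        else:
            flagged[aspect] = dict(by_translator)
    return agreed, flagged


claims = {
    "A": {"goal": "sort the list", "confidence": "high"},
    "B": {"goal": "sort the list", "confidence": "low"},
}
agreed, flagged = reconcile(claims)
print("agreed:", agreed)
print("flagged:", flagged)
```

The `agreed` dictionary is the unified explanation from the non-contradictory aspects; `flagged` keeps each translator's version side by side so a human can see where the interpretations diverge.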

*4. Ethics Verification* We use "constitutional AI" (a special rule system, like in Claude.ai) to check:

- Compliance with ethical standards
- Logical consistency
- Alignment with human values

## Main Advantages

- *No delays*: The model can think and produce results without delays (especially important in verbal dialogue), while explanations can be generated in parallel for quality control and, if necessary, future corrections.
- *Moderation*: For critically important decisions requiring human moderation, we can wait for the translation and for the human moderator's decision.
- *Different perspectives*: Different translators show different aspects of thinking.
- *Transparency of complexities*: When translators disagree, we know the reasoning is complex.
- *Ethical safety*: An additional verification layer ensures alignment with values.
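The "no delays" and "moderation" points describe two control flows: return the answer immediately while the explanation is produced in the background, or hold the answer until the explanation exists and a human approves it. A rough sketch using Python's standard `concurrent.futures` follows. The functions `model_answer`, `explain`, and `respond`, and the `approve` callback, are hypothetical placeholders, not part of any existing system.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional, Tuple


def model_answer(prompt: str) -> str:
    # Hypothetical stand-in for the main model.
    return prompt.upper()


def explain(answer: str) -> str:
    # Hypothetical stand-in for the translator pipeline.
    return f"explanation for {answer!r}"


_executor = ThreadPoolExecutor(max_workers=2)


def respond(
    prompt: str,
    critical: bool = False,
    approve: Optional[Callable[[str, str], bool]] = None,
) -> Tuple[Optional[str], Future]:
    """Return the answer plus a Future that will hold its explanation.

    Non-critical: the answer is returned without waiting for the explanation.
    Critical: block on the explanation and release the answer only if the
    moderator callback approves it.
    """
    answer = model_answer(prompt)
    pending = _executor.submit(explain, answer)  # runs in a background thread
    if critical:
        explanation = pending.result()  # wait for the translation
        if approve is None or not approve(answer, explanation):
            return None, pending  # answer withheld
    return answer, pending


answer, pending = respond("hello")
print(answer)            # available immediately
print(pending.result())  # explanation, for later quality control

blocked, _ = respond("hello", critical=True, approve=lambda a, e: False)
print(blocked)
```

In the critical path the latency advantage disappears by design, since the answer waits on both the translators and the human.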

## Open Questions

1. How do we train translators without "correct answers" from humans?
2. How many translators is optimal to use?
3. What to do if all translators cannot clearly explain the reasoning?
4. How to prove that translators accurately reflect internal thinking?

## Next Steps

I would like to:

- Create a simple example of such a system working
- Develop methods to verify translation accuracy
- Combine this approach with existing tools

I would appreciate community feedback, especially regarding potential problems and practical challenges.

Comments

ijk•4h ago
It sounds like you're proposing doing this operation on the tokens in the reasoning. While it would be interesting to know whether allowing it to choose arbitrary tokens helps, the biggest issue is that there's quite a bit of evidence that the tokens it prints have only a loose relationship with the internal model processes.

I question your premise; first demonstrate that having it think aloud in "efficient mathematical representations" is a useful efficiency. Then you can demonstrate that you can do any interpretability work on the output.
