
Show HN: We Evaluate Medical Research Agent Skills

https://github.com/aipoch/medical-research-skills
2•The_resa•1h ago
What is the AIPOCH Medical Skill Auditor?

Medical Skill Auditor is an evaluation framework that AIPOCH uses to assess the quality of its medical research agent skills before they are made available to users. It acts as a gatekeeper, ensuring that skills meet defined standards in reliability, usability, security, and scientific integrity.

How does the Medical Skill Auditor work?

Veto Gates

To enforce strict quality control, the Skill Auditor applies two layers of veto checks: the Skill Veto and the Research Veto. Failing any of these checks may lead to immediate rejection of the skill.

Skill Veto

Operational Stability, Structural Consistency, Result Determinism, System Security

Research Veto

Scientific Integrity, Practice Boundaries, Methodological Ground, Code Usability
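The two veto layers above can be read as a short-circuiting pass/fail check. The sketch below is a minimal illustration under stated assumptions, not AIPOCH's implementation: only the check names come from the post, while the `Skill` record, the function names, and the idea that each check has already produced a boolean result are hypothetical. Because the post says a failure "may" lead to rejection, the sketch takes the strictest reading (any failure rejects), and it also assumes a missing result counts as a failure.

```python
from dataclasses import dataclass, field

# Check names taken from the post. How each check is computed is not described,
# so this sketch assumes every check has already produced a pass/fail result.
SKILL_VETO = ("Operational Stability", "Structural Consistency",
              "Result Determinism", "System Security")
RESEARCH_VETO = ("Scientific Integrity", "Practice Boundaries",
                 "Methodological Ground", "Code Usability")


@dataclass
class Skill:
    # Hypothetical record: maps check name -> True (pass) / False (fail).
    name: str
    check_results: dict[str, bool] = field(default_factory=dict)


def failed_checks(skill: Skill, gate: tuple[str, ...]) -> list[str]:
    """Return the checks in `gate` that failed (assumption: missing = failed)."""
    return [check for check in gate if not skill.check_results.get(check, False)]


def passes_veto_gates(skill: Skill) -> tuple[bool, list[str]]:
    """Apply the Skill Veto layer first, then the Research Veto layer.

    Strictest reading: any failed check rejects the skill immediately,
    and the second layer is only evaluated if the first one passes.
    """
    for gate in (SKILL_VETO, RESEARCH_VETO):
        failures = failed_checks(skill, gate)
        if failures:
            return False, failures
    return True, []
```

Returning the list of failed checks, rather than a bare boolean, is a design choice for the sketch: a rejection is more useful to a skill author when it names the gate that triggered it.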

Core Capability

Evaluates a skill’s design and contract against key dimensions such as Functional Suitability, Reliability, Performance & Context, Agent Usability, Human Usability, Security, Agent-Specific and Maintainability.

Medical Task

Assesses a skill's actual outputs against layered criteria.

For skill testing, test inputs are generated automatically by AI. The number of inputs in each category increases or decreases with the skill's complexity. The seven inputs below make up the most comprehensive version, used for Complex skills.

Canonical, Variant A, Edge, Variant B, Stress, Scope Boundary, Adversarial

Skill Complexity Classification

Simple (S): narrow task scope. 3 inputs.

Moderate (M): moderate branching or multiple task types. 5 inputs.

Complex (C): broad or multi-step specialized skill. 7 inputs.
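The complexity tiers map directly to a number of generated test inputs, which fits a small lookup table. This is a hedged sketch: the tier codes, counts, and category names come from the post, but `Complexity`, `input_count`, and the letter-code lookup are hypothetical names. The post does not say which of the seven categories are dropped for Simple or Moderate skills, so the sketch only returns counts and checks that the Complex count matches the full category list.

```python
from enum import Enum

# The seven input categories listed in the post, in the order given.
INPUT_CATEGORIES = ("Canonical", "Variant A", "Edge", "Variant B",
                    "Stress", "Scope Boundary", "Adversarial")


class Complexity(Enum):
    # Value = number of generated test inputs for this tier (from the post).
    SIMPLE = 3    # S: narrow task scope
    MODERATE = 5  # M: moderate branching or multiple task types
    COMPLEX = 7   # C: broad or multi-step specialized skill


# Hypothetical lookup from the one-letter code to the tier.
_BY_CODE = {"S": Complexity.SIMPLE, "M": Complexity.MODERATE, "C": Complexity.COMPLEX}

# The most comprehensive test set uses every category exactly once.
assert Complexity.COMPLEX.value == len(INPUT_CATEGORIES)


def input_count(code: str) -> int:
    """Map a complexity code (S, M, or C, case-insensitive) to its input count."""
    try:
        return _BY_CODE[code.upper()].value
    except KeyError:
        raise ValueError(f"unknown complexity code: {code!r}") from None
```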

Final Score

The Skill Auditor uses a two-stage scoring system: static evaluation (design quality, weighted 40%) and dynamic evaluation (runtime performance, weighted 60%). The final score is the weighted sum of the two:


Final Score = Static Score × 40% + Dynamic Score × 60%
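The weighting can be checked with a few lines of Python. This is a sketch under one loudly stated assumption: the post does not give a scale, so both stage scores are assumed to share a 0 to 100 range. The function and constant names are illustrative, and the example numbers are not real AIPOCH results.

```python
STATIC_WEIGHT = 0.40   # static evaluation: design quality
DYNAMIC_WEIGHT = 0.60  # dynamic evaluation: runtime performance


def final_score(static_score: float, dynamic_score: float) -> float:
    """Final Score = Static Score * 40% + Dynamic Score * 60%.

    Assumption (not stated in the post): both scores use the same 0 to 100 scale.
    """
    for value in (static_score, dynamic_score):
        if not 0 <= value <= 100:
            raise ValueError(f"score out of range: {value}")
    return STATIC_WEIGHT * static_score + DYNAMIC_WEIGHT * dynamic_score
```

For example, a static score of 80 and a dynamic score of 90 give 0.4 × 80 + 0.6 × 90 = 32 + 54 = 86. Because the dynamic weight is 1.5 times the static weight, a point of runtime performance moves the final score more than a point of design quality.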

You can view evaluation results for selected AIPOCH skills here: https://www.aipoch.com/agent-skills/medical-research-literat....

This framework is still under active development, and we’d love to hear your feedback! Right now it is applied only to a subset of AIPOCH’s skills, but we’re considering applying it more broadly. If this framework could be used to assess third‑party skills in the future, would you consider trying it in your own projects? Are there evaluation frameworks you already use?

Dola – AI Consulting for your tiny firm

https://dolalabs.com/
1•radurevutchi•2m ago•0 comments

Pro-Russian 'doppelganger' campaign exploits DW brand

https://corporate.dw.com/en/hungary-election-pro-russian-doppelganger-campaign-exploits-dw-brand/...
1•doener•5m ago•0 comments

My browser-based static site generator

https://stratts.au/posts/browser-based-ssg/
1•stratts•6m ago•0 comments

Out: Resumes. In: Weeklong In-Office Trials

https://www.businessinsider.com/out-resumes-in-weeklong-in-office-trials-hiring-2026-4
1•KnuthIsGod•7m ago•0 comments

AXI: Agent EXperience Interface

https://axi.md/
1•borisjabes•9m ago•0 comments

I let a agent control my window manager

https://blog.zimengxiong.com/#post/agents-will-need-a-good-window-manager
1•zimengx•13m ago•0 comments

Writers Guild Deal: $321M Health Plan Infusion, Residuals, AI Licensing Language

https://www.hollywoodreporter.com/business/business-news/writers-contract-deal-321m-health-plan-i...
1•mikhael•14m ago•0 comments

Show HN: Proroot – Zero-overhead proot replacement for Android

https://github.com/coderredlab/proroot
1•coderredlab•14m ago•0 comments

America's AI Build-Out Hinges on Chinese Electrical Parts

https://www.bloomberg.com/news/features/2026-04-01/us-ai-data-center-expansion-relies-on-chinese-...
3•doener•17m ago•0 comments

Letting go of climate guilt in 5 easy steps [pdf]

https://hsph.harvard.edu/wp-content/uploads/2024/11/21.08-Letting-go-of-climate-guilt-in-5-easy-s...
1•num42•19m ago•0 comments

Anthropics Mythos Model Sparks Fears of AI Doomsday

https://nypost.com/2026/04/08/business/anthropics-claude-mythos-model-sparks-fears-of-ai-doomsday...
2•silexia•21m ago•0 comments

Under oath, Frank Lloyd Wright introduced himself as "world greatest architect"

https://www.pidgeondigital.com/talks/the-world-s-greatest-architect/chapters/
3•felipevb•24m ago•1 comments

Does Anybody Need Me?

https://ed-wentworth.medium.com/does-anybody-need-me-6fde408000cb
2•gpi•25m ago•0 comments

Navigating the Mythos-haunted world of platform security

https://www.redhat.com/en/blog/navigating-mythos-haunted-world-platform-security
1•LaSombra•27m ago•0 comments

Show HN: Connect with strangers who feel the same as you

https://emotiapp.com/
2•lirongliu•28m ago•1 comments

The Life and Death of the Book Review

https://libertiesjournal.com/articles/the-life-and-death-of-the-book-review/
1•lermontov•29m ago•0 comments

The Usefulness of AI Agents

https://erikjohannes.no/posts/20260408-on-the-usefulness-of-ai-agents/index.html
1•wazHFsRy•32m ago•1 comments

I wish Xcode was more like Visual Studio when coding C++

https://www.lasselaursen.com/post/i-wish-xcode-was-more-like-visual-studio-when-coding-c/
2•Gazoo101•39m ago•0 comments

Formal Verification in Any Language for Everybody (lean 4)

https://www.dev-log.me/formal_verification_in_any_language_for_everybody/
1•wazHFsRy•39m ago•2 comments

Can LLMs accelerate science? An experiment

https://pavpanchekha.com/blog/llk.html
1•pavpanchekha•43m ago•0 comments

Federal Court Denies Anthropic's Motion to Lift 'Supply Chain Risk' Label

https://www.nytimes.com/2026/04/08/technology/anthropic-pentagon-risk-circuit-court.html
2•DeathArrow•44m ago•0 comments

Flatpak: Complete Sandbox Escape

https://github.com/flatpak/flatpak/security/advisories/GHSA-cc2q-qc34-jprg
1•eyberg•44m ago•0 comments

AI #163: Mythos Quest

https://thezvi.substack.com/p/ai-163-mythos-quest
1•paulpauper•50m ago•0 comments

US adults are having fewer kids – and it's forcing schools to close

https://www.theguardian.com/us-news/2026/mar/16/birthrate-schools-closing
1•PaulHoule•53m ago•0 comments

'The Egg' by Andy Weir (2009)

https://www.galactanet.com/oneoff/theegg_mod.html
4•goekjclo•54m ago•0 comments

When AI Day of Reckoning?

https://www.overcomingbias.com/p/when-ai-day-of-reckoning
1•paulpauper•54m ago•0 comments

Keychron has open sourced its hardware

https://github.com/Keychron/Keychron-Keyboards-Hardware-Design/tree/main
5•azhenley•54m ago•0 comments

I rebuilt Claude Code's removed /buddy companion as a permanent MCP app

https://github.com/1270011/claude-buddy
2•1270011•57m ago•0 comments

Violating Copyright, Not the Planet

https://mumumelon.co/
1•Tomte•1h ago•0 comments

Bryson DeChambeau to use a 5-iron he made with 3D printer at Masters

https://www.espn.com/golf/story/_/id/48431238/bryson-dechambeau-using-iron-made-3d-printer-masters
1•1659447091•1h ago•0 comments