LLMs Are Bad Judges. So Use Our Classifier Instead

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5331811

41•lordgrenville•4mo ago

Comments

adlumal•4mo ago

Having done some work in the legal AI field, I wonder how this classifier deals with issues of transparency, explainability and ultimately trust? It’s valuable to have some idea of how a proceeding might unfold, but in my experience most competent lawyers have a high bar when it comes to trusting any AI/ML output.

Taikonerd•4mo ago

I was worried about explainability, too. If the classifier just spat out "INNOCENT" or "GUILTY," it would be useless -- the legal reasoning has to be part of the output.

Looking at the paper, the classifier definitely does output its reasoning:

"The legal issue at hand is whether the 50/50 royalty split in the 1961 contract binds only pre-existing affiliates or if it also includes affiliates that come into being after the agreement..."

leobg•4mo ago

This reads like an ad for Arbitrus.ai. It’s copywriting lingo:

> We built one called Arbitrus. We put it through a mini-Choi test and it mopped the floor with the competition

lordgrenville•4mo ago

True, but they own that:

> Declaration of Interest: [Authors] have financial interests in...Arbitrus.ai. As the title would suggest, the authors are making no effort to obfuscate this fact.

causal•4mo ago

I'd argue that dressing an ad up as an academic paper is obfuscation

Taikonerd•4mo ago

I had thought: what's the business model for Arbitrus? Is it going to be a sort of "suggested finding" tool for judges? Or are law firms going to use it to screen cases, so they can pick winners?

It seems like the answer is neither: on their website, Arbitrus.ai says it's for private arbitration. "Arbitrus is a private court system with an AI judge. Why use the public court system or expensive AAA arbitration to settle your disputes, when you can do it faster, cheaper, and better with Arbitrus?"

nextaccountic•4mo ago

What kind of classifier is this? I mean is it k-NN (for example), or something else?

Even LLMs can be viewed as classifiers, as the paper (ad?) itself admits.

esafak•4mo ago

pg36 "This is proprietary and part of Fortuna’s moat, so we explain it to the extent appropriate."

opwieurposiu•4mo ago

I love the idea of Arbitrus.ai, but they want $2500 a go to test it. I wish they had a demo version to play with.

barbazoo•4mo ago

The margins and line spacing make this hard to read. Is this how you're supposed to typeset a paper? Some pages have three, maybe four sentences on them.

LawKek•4mo ago

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5587611. Related to this post.
