"A+" "B" "C-" "F", etc. feel a lot more intuitive than how stars are used.
I used to rate three stars for "performs as expected" until I realized it was punishing good products. Switching to A-F would produce the same behavior, except it'd be Uber drivers trying to make a living instead of noxious parents declaring that their kid deserves an A.
In US education you are taught that you need to get an A. Anything below a C gets you on the equivalent of a “Performance Improvement Plan” in the corporate world. And B is… well… B.
So with that rating ingrained, people would probably feel bad about rating their ride-share driver a C when they did what was expected. And it wouldn’t stop companies from pushing for A ratings.
Even elsewhere like the food industry where they do have letter ratings, A is the norm with anything lower being an outlier.
Perhaps for this to work, it would need a complete systemic shift where C truly is the average and A and F are the outliers. In school C would need to be “did the student do the assignment.” And A would need to be “the student did the assignment, and then some.”
Consider, for example, the "S" grade ranked above "A", which originated in Japan but is widely applied in gaming.
I wonder if companies are afraid of being accused of "cooking the books", especially in contexts where the individual ratings are visible.
If I saw a product with 3x 5-star reviews and 1x 3-star review, I'd be suspicious if the overall rating was still a perfect 5 stars.
You would start by estimating each driver's rating as the average of their ratings, and then estimate the bias of each rider by comparing the average rating they give to the estimated score of their drivers. Then you repeat the process iteratively until you see both scores (driver rating and rider bias) converge.
[0] https://en.wikipedia.org/wiki/Expectation%E2%80%93maximizati...
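A toy version of that iteration might look like this (the data and names are made up for illustration, not any real platform's API):

```python
# Toy sketch of the alternating estimation described above.
ratings = [  # (rider, driver, stars)
    ("alice", "d1", 5), ("alice", "d2", 5),   # alice rates generously
    ("bob",   "d1", 3), ("bob",   "d2", 4),   # bob rates harshly
]

driver_score = {d: 0.0 for _, d, _ in ratings}
rider_bias   = {r: 0.0 for r, _, _ in ratings}

for _ in range(50):  # iterate until both estimates stabilize
    # Update each driver's score from bias-corrected ratings
    for d in driver_score:
        xs = [s - rider_bias[r] for r, dd, s in ratings if dd == d]
        driver_score[d] = sum(xs) / len(xs)
    # Update each rider's bias: how far above/below the estimated
    # driver scores their ratings sit on average
    for r in rider_bias:
        gaps = [s - driver_score[dd] for rr, dd, s in ratings if rr == r]
        rider_bias[r] = sum(gaps) / len(gaps)
```

On this toy data it separates alice's positive bias from bob's negative one, and d2 ends up ranked above d1 even though both got one 5-star rating.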
Alternatively, there might be some hidden reason why a broken rating system is better than a good one, but if so I don't know it.
Anything really bad can be dealt with via a complaint system.
Anything exceptional could be captured by a free text field when giving a tip.
Who is going to read all those text fields and classify them? AI!
The big rating problem I have is with sites like boardgamegeek, where ratings are treated by different people as either an objective rating of how good the game is within its category or a subjective rating of how much they like (or approve of) the game. They're two very different things, and it makes the ratings much less useful than they could be.
They also suffer a similar problem in that most games score 7 out of 10. 8 is exceptional, 6 is bad, and 5 is disastrous.
2 and 4 are irrelevant: either a wild guess or something user-specific.
Most of the time our rating systems devolve into roughly this state anyways.
E.g.
5 is excellent, 4.x is fine, <4 is problematic.
And then there's a sub-domain of the area between 4 and 5, where a 4.1 is questionable, 4.5 is fine, and 4.7+ is excellent.
In the end, it's just 3 parts nested within 3 parts nested within 3 parts nested within....
Let's just do 3 stars (no decimal) and call it a day
The trick is collecting enough ratings to average out the underlying issues and keeping context. I.e., you want rankings relative to the area, but also on some kind of absolute scale, and also relative to the price point, etc.
A reviewer might round up a 7/10 to a 3 as it’s better than average, while someone else might round down an 8/10 because it’s not at that top tier. Both systems are equally useful with 1 or 10,000 reviews, but I’m not convinced they are equivalent with, say, 10 reviews.
Also, most restaurants that stick around are pretty good but you get some amazingly bad restaurants that soon fail. It’s worth separating overpriced from stay the fuck away.
However, the rounding issue is a big deal both in how people rate stuff and how they interpret the scores to the point where small numbers of responses become very arbitrary.
It doesn't mitigate the effect; the combination of the effect on rating and on interpretation is the source of the issue, which exists whenever the review reader isn't at the cultural midpoint of the raters.
Obviously. Yet when looking at a composite score the scale of the mismatch isn’t total, so the effect is being mitigated.
Further, even without that, the more consistent the cultural mix, the more consistent the ratings. Anyone can understand a consistent system.
Has anyone seen a live system (Uber, Goodreads, etc.) implement per-user z-score normalization?
"Here's your last 5 drivers, please rank them"
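The per-user z-score normalization asked about above could be sketched like this (a minimal sketch, not any live system's implementation):

```python
from statistics import mean, stdev

def normalize_user_ratings(ratings):
    """Map one user's raw stars to z-scores, so a grumpy rater's 3
    and a generous rater's 5 become comparable. Needs >= 2 ratings."""
    mu = mean(ratings)
    sigma = stdev(ratings) or 1.0  # avoid division by zero for uniform raters
    return [(x - mu) / sigma for x in ratings]

normalize_user_ratings([5, 5, 5, 4])  # → [0.5, 0.5, 0.5, -1.5]
```

The mostly-fives rater's single 4 comes out strongly negative, which is exactly the "a 4.1 is questionable" effect made explicit.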
nlh•14h ago
Take any given Yelp / Google / Amazon page and you'll see some distribution like this:
User 1: "5 stars. Everything was great!"
User 2: "5 stars. I'd go here again!"
User 3: "1 star. The food was delicious but the waiter was so rude!!!one11!! They forgot it was my cousin's sister's mother's birthday and they didn't kiss my hand when I sat down!! I love the food here but they need to fire that one waiter!!"
Yelp: 3.6 stars average rating.
One thing I always liked about FourSquare was that they did NOT use this lazy method. Their score was actually intelligent - it checked things like how often someone would return, how much time they spent there, etc. and weighted a review accordingly.
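An engagement-weighted score in that spirit might look like this (a hypothetical illustration; Foursquare's actual model isn't public):

```python
def weighted_score(reviews):
    """reviews: list of (stars, visit_count) tuples. Repeat visitors'
    opinions count proportionally more than a one-time ranter's."""
    total_visits = sum(visits for _, visits in reviews)
    return sum(stars * visits for stars, visits in reviews) / total_visits

# Two regulars at 5 stars vs. one angry first-timer at 1 star:
weighted_score([(5, 12), (5, 8), (1, 1)])  # ≈ 4.81, vs. a naive mean of 3.67
```

The angry one-timer from the example above barely moves the needle, because people who never came back get little say.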
theendisney•14h ago
If you normalized the ratings, they could change without the driver doing anything. A former customer may start giving good ratings elsewhere, making yours worse, or give poor ones, improving yours.
Maybe the relevance of old ratings should decline.
theendisney•9h ago
Alternatively you could apply the same rating to the customer and display it next to their user name along with their own review counter.
What also seems a great option is to simply add up all the stars :) Then the grumpy people won't have to do anything.
ajmurmann•13h ago
This actually touches on another pet peeve of mine with rating systems: I'd like to see ratings for how much I will like it. An extreme but simple example might be that the ratings of a vegan customer of a steak house could be very relevant to other vegans but irrelevant to non-vegans. More subtle versions are simply about shared preferences. I'd love to see ratings normalized and correlated to other users to create a personalized rating. I think Netflix used to do stuff like this back in the day, and you could request your personal predicted score via API, but now that's all hidden and I'm instead shown different covers of the same shows over and over.
Hizonner•13h ago
My favorites: A power supply got one star for not simultaneously delivering the selected limit voltage and the selected limit current into the person's random load. In other words, literally for not violating the laws of physics. An eccentric-cone flare tool got one star for the cone being off center. "Eccentric" is in the name, chum....
esperent•13h ago
I would personally frame that as a review for poor documentation. A device shouldn't expect users to know the laws of physics to understand its limitations.
Hizonner•12h ago
We're talking about a general-purpose device meant to drive a circuit you create yourself. I'm not sure what a good analogy would be. Expecting the documentation for a saw to tell you you have to cut all four table legs the same length?
esperent•9h ago
The saw analogy isn't a good one - saws work within the range of physics that humans have instinctual understanding of. We instinctively know what causes a table to wobble. We do not instinctively know the physical behaviors of electricity.
You might counter that people should know this before messing with electricity, and I'll agree. But what people should know and what they actually know are often very different.
A warning in the manual might prevent some overeager teenager who got their hands on this device from learning this particular law of physics the hard way.
anon7000•12h ago
Why can’t I downvote or comment on it? As a user, I just want more context.
But obviously, it’s not in Amazon’s interest to make me not want to buy something.
nlh•7h ago