We are working on one! Happy to send you a draft if you're interested.
danielmarkbruce•6mo ago
yes please, do send. my username at the very popular google email service.
ACCount36•6mo ago
No implementation details, no samples from an actual reward model in action, no GitHub repo. Looks like a sales page more than anything. Eww.
trhway•6mo ago
it was just a matter of time before the reward/error/etc. became a vector instead of a scalar, whether it's a vector of tokens or a vector of some other qualities the output is evaluated on. So instead of a gradient vector (the derivative of the scalar error with respect to every variable), we'll have a matrix (the Jacobian: the derivative of every error component with respect to every variable), and we'd be matrix-transforming the NN instead of just adding the gradient.
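To make the scalar-vs-vector distinction concrete, here's a rough sketch using central finite differences. The two error functions are made up purely for illustration (hypothetical, not from any actual reward model): the derivative of a scalar error with respect to the parameters comes out as a single row (the gradient), while the derivative of a vector-valued error comes out as a matrix with one row per evaluated quality.

```python
# Toy illustration (hypothetical functions): gradient of a scalar error
# vs. Jacobian of a vector-valued error, via central finite differences.

def scalar_error(params):
    # One number scoring the output (made up for illustration).
    x, y = params
    return x**2 + 3*y

def vector_error(params):
    # Several separately scored qualities of the output (made up for illustration).
    x, y = params
    return [x**2 + 3*y, x*y, 2*x - y]

def as_list(value):
    # Treat a scalar output as a length-1 vector so both cases share one code path.
    return value if isinstance(value, list) else [value]

def jacobian(f, params, eps=1e-6):
    """Numerical Jacobian: one row per output of f, one column per parameter."""
    m, n = len(as_list(f(params))), len(params)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(params)
        minus = list(params)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = as_list(f(plus)), as_list(f(minus))
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

params = [1.0, 2.0]
print(jacobian(scalar_error, params))  # 1 x 2: a single row, i.e. the gradient
print(jacobian(vector_error, params))  # 3 x 2: a matrix, one row per quality
```

With a scalar error the "matrix" collapses to one row, which is the ordinary gradient; each extra evaluated quality adds another row.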
danielmarkbruce•6mo ago