In-browser PPO training demo, made possible by tinygrad: TinyJit -> WebGPU kernels.
Requires WebGPU.
Comments
simedw•44m ago
Cool project!
I noticed that if you go from training to watch and then back, the training score temporarily drops significantly.
neduma•36m ago
More details and implementation notes please?
beardsciences•5m ago
My average eventually made it to about 3900, and then stagnated between 3600-3900. I'm curious if this is universal behavior or not. I'm up to about 5k steps.