I've been working on a completely open source LLM that combines MLA from DeepSeek with PEER from DeepMind's research, plus a couple of performance optimizations for GH200s and for PEER itself (including what I think is a nifty caching strategy). I named it LLM720 because the goal is for it to be the next iteration of what was accomplished with LLM360.
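For anyone unfamiliar with PEER, the core idea is a huge pool of tiny single-neuron experts selected per token via product-key retrieval. Here's a minimal, hypothetical PyTorch sketch of that idea (single head, names like PEERSketch and the hyperparameters are mine, not the repo's actual code, and none of the GH200/caching tricks are shown):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PEERSketch(nn.Module):
    """Illustrative single-head PEER-style layer (not LLM720's implementation).

    Each expert is a single hidden neuron (u_i, v_i); experts are indexed by
    the Cartesian product of two sub-key sets, so scoring stays cheap even
    with n_sub_keys**2 experts.
    """

    def __init__(self, dim, n_sub_keys=32, top_k=8, key_dim=64):
        super().__init__()
        self.n_experts = n_sub_keys ** 2
        self.top_k = top_k
        self.query = nn.Linear(dim, key_dim)
        # Two sub-key tables; a (i, j) pair addresses expert i * n_sub_keys + j.
        self.keys1 = nn.Parameter(torch.randn(n_sub_keys, key_dim // 2) / math.sqrt(key_dim // 2))
        self.keys2 = nn.Parameter(torch.randn(n_sub_keys, key_dim // 2) / math.sqrt(key_dim // 2))
        # Tiny experts: one input vector u_i and one output vector v_i each.
        self.u = nn.Embedding(self.n_experts, dim)
        self.v = nn.Embedding(self.n_experts, dim)

    def forward(self, x):                                   # x: (batch, dim)
        q = self.query(x)
        q1, q2 = q.chunk(2, dim=-1)
        # Score each query half against its sub-key set, keep top-k per half.
        s1, i1 = (q1 @ self.keys1.t()).topk(self.top_k, dim=-1)    # (batch, k)
        s2, i2 = (q2 @ self.keys2.t()).topk(self.top_k, dim=-1)
        # Combine the k x k candidate pairs and keep the overall top-k experts.
        scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(1)  # (batch, k*k)
        n = self.keys2.shape[0]
        idx = (i1.unsqueeze(-1) * n + i2.unsqueeze(-2)).flatten(1)
        top_scores, top_pos = scores.topk(self.top_k, dim=-1)
        expert_idx = idx.gather(1, top_pos)                        # (batch, k)
        w = F.softmax(top_scores, dim=-1)
        # Each selected expert computes v_i * gelu(u_i . x), weighted by its score.
        u = self.u(expert_idx)                                     # (batch, k, dim)
        v = self.v(expert_idx)
        h = F.gelu(torch.einsum('bd,bkd->bk', x, u))
        return torch.einsum('bk,bk,bkd->bd', w, h, v)
```

That's just the retrieval-plus-mixture skeleton; the interesting engineering in the repo is around making the expert lookups fast on GH200s.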
I'm looking for collaborators. We're about to kick off a large training run now that the ablations are wrapping up, and I'd like to have more people along for the ride.