CLaRa (Continuous Latent Reasoning) is an approach Retrieval-Augmented Generation (RAG) that shortens context, reduces double-encoding, and improves quality of responses by compressing documents into a small set of "continuous memory tokens" that preserve the key information in the documents, and optimizes and performs retrieval and generation out of that shared latent space.
anactofgod•1h ago