Here's how we got here:
We started with standard FIM (fill-in-the-middle) autocomplete. Other AI code completion tools in JetBrains can only insert code at your cursor position. That's helpful when writing assert statements in unit tests, but it's much less useful for changes like adding a new parameter to a function.
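To make this concrete, here's a rough sketch of how a FIM prompt is usually built. The sentinel strings are placeholders for model-specific special tokens, not our exact format:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the 'middle' between the text before and after the cursor.
    The <fim_*> sentinels are placeholders for model-specific special tokens."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

document = "def add(a, b):\n    return \n\nprint(add(1, 2))\n"
cursor = document.index("return ") + len("return ")
prompt = build_fim_prompt(document[:cursor], document[cursor:])
# The model can only insert new text at the cursor (e.g. "a + b");
# it has no way to modify the code that's already there.
```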
Autocomplete feels much better when it can rewrite code in addition to simply adding code. This is typically called "next-edit."
To get this capability, we trained a model on granular user actions such as arrow key movements, cursor jumps, and keystroke-level data. This works really well for tasks like adding enumerate to a Python for-loop, refactoring conditionals, and other repetitive changes.
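For example, here's an illustrative before/after of the enumerate case (not actual model output):

```python
items = ["a", "b", "c"]

# Before: the user has a plain loop and wants the index as well.
for item in items:
    print(item)

# After: a next-edit model rewrites the existing loop header instead of only
# inserting new text at the cursor.
for i, item in enumerate(items):
    print(i, item)
```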
Another problem is that this can be very slow. Out of the box with vLLM, each request had a median latency of around 1500ms, which is far too long for autocomplete. To optimize this, we rewrote TensorRT-LLM to support N-gram speculative decoding, which lets us serve completions at a median latency of 94ms. We wrote more about this here: https://blog.sweep.dev/posts/next-edit-jetbrains
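The intuition behind N-gram speculative decoding: for next-edit, most of the output repeats the input code, so we can draft future tokens by matching the most recent n-gram against text already in the context, then verify the whole draft with the model in a single forward pass. Here's a simplified sketch of the drafting side (real implementations work on token IDs inside the inference engine):

```python
def propose_ngram_draft(tokens: list[int], ngram_size: int = 3, max_draft: int = 8) -> list[int]:
    """Propose draft tokens by finding an earlier occurrence of the current
    trailing n-gram and copying the tokens that followed it.

    Because next-edit output mostly repeats the input, these drafts are
    usually accepted, so many decode steps can be skipped."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Search backwards for the most recent earlier occurrence of the trailing n-gram.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            return tokens[start + ngram_size:start + ngram_size + max_draft]
    return []


# Toy example: the n-gram 7, 8, 9 appeared earlier, so we draft the tokens
# that followed it and let the model accept or reject them in one verification pass.
print(propose_ngram_draft([7, 8, 9, 10, 11, 1, 2, 7, 8, 9], max_draft=2))  # -> [10, 11]
```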
Finally, for full codebase context awareness, we have one unique advantage over VS Code: the JetBrains index (exposed through their Program Structure Interface) is exceptionally well-built and already covers the entire project, which means we can quickly look up the definition of any function or class. To get extremely precise codebase context, we fetch the definitions of the code symbols around your cursor and pass them to our model.
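Conceptually, that context step looks something like the sketch below. The lookup_definition interface here is hypothetical (the real resolution happens through PSI on the plugin side); the key idea is that we only pull in definitions for symbols that actually appear near the cursor:

```python
import re
from typing import Callable

def collect_context(
    lines: list[str],
    cursor_line: int,
    lookup_definition: Callable[[str], str | None],  # hypothetical: resolves a symbol to its definition text
    window: int = 10,
    max_chars: int = 4000,
) -> str:
    """Gather definitions of identifiers near the cursor to prepend to the model's prompt."""
    start = max(0, cursor_line - window)
    nearby = "\n".join(lines[start:cursor_line + window + 1])
    seen: set[str] = set()
    chunks: list[str] = []
    for symbol in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", nearby):
        if symbol in seen:
            continue
        seen.add(symbol)
        definition = lookup_definition(symbol)  # in practice, answered by the IDE's index
        if definition:
            chunks.append(definition)
            if sum(len(c) for c in chunks) > max_chars:
                break
    return "\n\n".join(chunks)
```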
We've spent a lot of time getting the details right, and we'd love to get your thoughts and feedback!