I am building a Terminal User Interface (like Claude Code) for self-hosted AI agents on Jetsons. It works in air-gapped environments. Unlike other solutions, it is optimised for unified-memory machines to avoid OOM errors.
The agent can read, edit, and create files, and manage and interpret data locally.
Currently it gets ~17 tok/s on a Jetson Orin Nano 8GB using Qwen3-4B-Instruct-4bit. TensorRT .engine support is planned, which should boost inference further. I am trying to get the memory footprint down, so if anyone has knowledge of KV cache optimisation, that would be great.
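For context on why the KV cache matters on an 8GB unified-memory board, here is a rough back-of-envelope estimator. The model config numbers (36 layers, 8 grouped-query KV heads, head_dim 128 for Qwen3-4B) are my assumptions, not taken from the project; check the model card before relying on them.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: float) -> int:
    # K and V tensors (factor of 2), one pair per layer,
    # kv_heads * head_dim values per token, bytes_per_elem each.
    return int(2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem)

# Assumed Qwen3-4B-style config: 36 layers, 8 KV heads (GQA), head_dim 128.
fp16 = kv_cache_bytes(36, 8, 128, seq_len=8192, bytes_per_elem=2)
q4   = kv_cache_bytes(36, 8, 128, seq_len=8192, bytes_per_elem=0.5)

print(f"fp16 KV cache @ 8k ctx: {fp16 / 2**30:.2f} GiB")
print(f"4-bit KV cache @ 8k ctx: {q4 / 2**30:.2f} GiB")
```

Under those assumptions an fp16 cache at 8k context is roughly 1.1 GiB, which is a big slice of 8GB shared with the OS and the weights; quantising the cache to 8-bit or 4-bit is the usual first lever.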
I would love to get your feedback, and to see people try running it on more capable devices and models; post your results here.
Run:

```
pip install open-jet
open-jet --setup
```
Website: https://www.openjet.dev/
PyPI: https://pypi.org/project/open-jet/
Repo: https://github.com/L-Forster/open-jet/