NVIDIA released Cosmos-Reason2 last month, targeting physical AI workloads (video reasoning, robotics planning, event detection), with official support for DGX Spark, H100, GB200 and Jetson AGX Thor.
We quantized the 2B model to W4A16 and optimized it further to run across the full Jetson lineup, including the most constrained device, the Orin Nano Super (8 GB).
Interested in feedback from others deploying VLMs on Jetson, especially around serving stacks (vLLM vs TensorRT-LLM vs other approaches) and practical bottlenecks!
Embedl-Wilhelm•1h ago
Model, setup instructions, and benchmarks: https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16
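For anyone who wants to poke at the checkpoint behind an OpenAI-compatible server (e.g. `vllm serve embedl/Cosmos-Reason2-2B-W4A16`), here's a minimal sketch of building a multimodal chat request body. This is an illustration, not our exact setup: the endpoint path and port are assumptions about a default local vLLM deployment, and `build_request` is a hypothetical helper.

```python
import json

# Hypothetical helper: builds an OpenAI-style chat-completion request body
# for a vision-language prompt. Assumes the server accepts image_url content
# parts with base64 data URIs (standard OpenAI-compatible multimodal schema).
def build_request(prompt: str, image_b64: str,
                  model: str = "embedl/Cosmos-Reason2-2B-W4A16") -> str:
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        # Keep generation short; long outputs mostly cost KV cache on 8 GB parts.
        "max_tokens": 128,
    }
    return json.dumps(payload)

# POST this body to the (assumed) default local endpoint:
#   http://localhost:8000/v1/chat/completions
```

On the constrained Jetsons the request side is rarely the bottleneck; context length and KV-cache budget are, so we keep prompts and `max_tokens` tight.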