Show HN: Dual YOLOv8n UAV Detection on RK3588S at 42 FPS Using NPU

https://github.com/alebal123bal/khadas_yolov8n_multithread
56•alebal123bal•5h ago

Comments

alebal123bal•5h ago
I built this while trying to understand how much of the RK3588S vision pipeline could be kept off the CPU.

The main trick is not the YOLO model itself, but the pipeline structure: MIPI capture through the ISP, resize/color conversion through RGA, and YOLOv8n inference through all 3 NPU cores with one RKNN context per core. With a 3-thread inference pool the pipeline goes from ~31 FPS to the OS08A10 camera’s 46 FPS ceiling.
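To make the structure concrete, here is a minimal sketch of a 3-worker inference pool in which each worker owns its own context. It is not the repo's code: `FakeContext`, `worker`, and `run_pool` are hypothetical names, and the sleep only stands in for real NPU inference.

```python
import queue
import threading
import time

NUM_CORES = 3  # the RK3588S NPU is exposed as 3 cores


class FakeContext:
    """Hypothetical stand-in for one RKNN context pinned to one NPU core."""

    def __init__(self, core_id):
        self.core_id = core_id

    def infer(self, frame_id):
        time.sleep(0.005)  # placeholder latency, not a measured value
        return (frame_id, self.core_id)


def worker(ctx, frames, results):
    # Each worker owns its context, so no lock is needed around infer().
    while True:
        frame_id = frames.get()
        if frame_id is None:  # sentinel: shut down
            break
        results.put(ctx.infer(frame_id))


def run_pool(num_frames):
    frames, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(FakeContext(core), frames, results))
        for core in range(NUM_CORES)
    ]
    for t in threads:
        t.start()
    for frame_id in range(num_frames):
        frames.put(frame_id)
    for _ in threads:
        frames.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Results can complete out of order, so restore capture order by frame id.
    return sorted(results.get() for _ in range(num_frames))
```

Because workers finish at different times, anything downstream that displays or streams frames has to reorder by frame id, which is what the final `sorted` call illustrates.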

The memory footprint is also small: roughly 137–152 MB RSS for one 1080p stream, using a fixed preallocated buffer pool rather than per-frame allocations. Two streams take roughly 276–304 MB RSS.
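A fixed buffer pool of this kind can be sketched in a few lines. This is an illustration only: the `BufferPool` class, the pool size, and the RGB frame size are assumptions, not details taken from the repo.

```python
import queue

FRAME_BYTES = 1920 * 1080 * 3  # one 1080p RGB frame; the pixel format is an assumption


class BufferPool:
    """Fixed set of reusable frame buffers, allocated once at startup."""

    def __init__(self, count, size=FRAME_BYTES):
        self._free = queue.Queue()
        for _ in range(count):
            self._free.put(bytearray(size))

    def acquire(self, timeout=None):
        # Blocks when every buffer is in flight, which applies back-pressure
        # instead of growing memory.
        return self._free.get(timeout=timeout)

    def release(self, buf):
        self._free.put(buf)

    def available(self):
        return self._free.qsize()
```

A capture thread would call `acquire()` before filling a frame and the last pipeline stage would call `release()` once the frame has been displayed, so steady-state memory stays constant no matter how long the pipeline runs.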

The repo also has a multi-process side of the pipeline: detections are published over Unix-domain sockets to tracking, temporal features, a presence FSM, and an optional Qwen2.5-0.5B summary step. For the LLM step, the camera pipeline can temporarily black out and then resume, so that RKLLM gets the whole NPU.
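As a rough illustration of publishing detections over a Unix-domain socket, the sketch below sends one length-prefixed JSON message between two connected sockets. The wire format and the message fields (`frame`, `label`, `conf`, `box`) are assumptions made for the example; the repo may frame its messages differently.

```python
import json
import socket
import struct


def send_msg(sock, payload):
    """Send one JSON message with a 4-byte big-endian length prefix."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer than requested."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the socket mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def demo():
    # socketpair() gives two connected AF_UNIX stream sockets in one process;
    # separate processes would bind() and connect() to a socket path instead.
    publisher, subscriber = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # Hypothetical detection record, for illustration only.
        detection = {"frame": 17, "label": "uav", "conf": 0.91,
                     "box": [412, 230, 468, 262]}
        send_msg(publisher, detection)
        return recv_msg(subscriber)
    finally:
        publisher.close()
        subscriber.close()
```

The length prefix matters because a stream socket has no message boundaries; without it a consumer such as a tracker could read half a detection or two detections glued together.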

I split the work into three repos:

- runtime dual-stream YOLOv8n RK3588S pipeline: https://github.com/alebal123bal/khadas_yolov8n_multithread

- train/export/INT8 RKNN conversion for YOLOv8/YOLOv5: https://github.com/alebal123bal/RKNN_TRAIN_YOLO

- Qwen on RK3588S, via RKLLM/NPU or llama.cpp/CPU: https://github.com/alebal123bal/RKLLM_LLAMA_QWEN

The demo class is UAV/drone, but this is meant as a general edge-inference pipeline example, not an operational/surveillance/defense system.

throwa356262•37m ago
These NPUs look very interesting.

Sad they are mostly sitting there unused because very few people know how to program them.

robinduckett•4h ago
Is there something special about YOLOv8 compared with later models (9-12)? Most of the research and working examples seem to default to v8 even though it is 3 years old. Or is it simply what fits on this hardware?

snovv_crash•3h ago
Newer versions aren't open source, or at least have murky licencing.

robinduckett•3h ago
Ahh that’ll do it. A shame really, the later models seem to be fairly good just from my idle testing as an enthusiast.

alebal123bal•3h ago
Mainly because YOLOv8 is well-supported by the Rockchip/RKNN toolchain.

The goal here was an end-to-end RK3588S pipeline rather than comparing detector families: training/export, ONNX graph fixing, INT8 RKNN conversion, C++ postprocessing, and runtime inference across the 3 NPU cores. YOLOv8 has known-good export paths and Rockchip examples, so it was the most practical baseline.

Newer YOLO versions may be possible, but usually require more work around RKNN export compatibility.

stefan_•2h ago
More slop again. The way to get more throughput is to bump batch size, not to try and "multithread" job submits to the NPU as if it's a CPU.
alebal123bal•1h ago
Batching is definitely the right answer for some offline / throughput-only cases, but it was not the right tradeoff here.

This pipeline is processing live camera frames and displaying/streaming annotated output, so latency and frame freshness matter. Increasing batch size would add queueing latency and tends to make the output older, especially when the sensor is producing frames continuously.
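The queueing cost of batching can be estimated with simple arithmetic. The helper below is a back-of-the-envelope sketch, assuming frames arrive at a uniform interval and that multiple streams are frame-synchronized; `batch_wait_ms` is a hypothetical name, and the model ignores inference time entirely.

```python
def batch_wait_ms(batch_size, fps, streams=1):
    """Time the oldest frame in a batch waits for the batch to fill.

    Assumes each stream delivers frames at a uniform interval and that
    all streams are frame-synchronized (both are simplifying assumptions).
    """
    frame_interval_ms = 1000.0 / fps
    # Each capture tick contributes one frame per stream.
    ticks_needed = -(-batch_size // streams)  # ceiling division
    return (ticks_needed - 1) * frame_interval_ms
```

Under these assumptions, `batch_wait_ms(4, 46)` is about 65 ms of added wait for the oldest frame from a single 46 FPS stream, while `batch_wait_ms(2, 46, streams=2)` is zero because a batch of two fills on a single capture tick.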

The “multithreading” here is not treating the NPU like a CPU in the usual sense. The RK3588S NPU is exposed as 3 cores, and RKNN supports using separate contexts with `rknn_dup_context` and assigning them with `rknn_set_core_mask`. The point was to keep the 3 NPU cores fed while capture, RGA preprocessing, inference, and display are pipelined.

In the single-context loop I was seeing ~31 FPS. With one context per NPU core and pipelined frame handling, it reaches the camera ceiling, around 42–46 FPS depending on the mode. So in this particular real-time streaming setup, parallel contexts/core masks were the practical way to saturate the hardware without adding batch latency.
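A rough way to read these numbers is as a capped scaling estimate. The function below is an idealized upper bound, assuming throughput scales linearly with the number of contexts until the sensor rate is reached; real scaling is usually below linear, and `pipeline_fps` is a hypothetical helper rather than anything from the repo.

```python
def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


def pipeline_fps(single_context_fps, contexts, camera_fps):
    """Upper-bound estimate: ideal linear scaling, capped by the sensor."""
    return min(single_context_fps * contexts, camera_fps)
```

With the figures quoted above, `pipeline_fps(31, 1, 46)` stays at 31 while `pipeline_fps(31, 3, 46)` is capped at 46, which matches the observation that the camera, not the NPU, becomes the limit once all three cores are in use.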

stefan_•25m ago
Again with it. You have two cameras, so you can batch 2 already with no latency hit. In fact less latency because the fake multithreading is gone.

(You are not even measuring latency correctly)
