Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel

9•Shubham_Amb•4h ago

Hi HN, I'm Shubham, a 3D artist. I learned coding in college as an I.T. graduate, so I know the logic but I'm not an expert. I just wanted to try my hand at AI.

So I built Resilient Workflow Sentinel, an offline AI agent that classifies task urgency (Low, Medium, or High) and dispatches tasks to candidates based on availability. I wanted an offline system that a person can trust to keep their sensitive data completely local.
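To make "dispatch based on availability" concrete, here is a minimal sketch. The post does not describe the actual dispatch rule, so the "least-loaded available candidate" policy, the `Candidate` fields, and the names are all assumptions for illustration.

```python
from dataclasses import dataclass

# Lower number = handled first.
URGENCY_ORDER = {"High": 0, "Medium": 1, "Low": 2}

@dataclass
class Candidate:
    name: str            # hypothetical candidate name
    available: bool      # whether this person can take work right now
    open_tasks: int = 0  # hypothetical load counter

def dispatch(tasks, candidates):
    """Assign each (title, urgency) task to the available candidate
    with the fewest open tasks. Returns {title: name or None}."""
    assignments = {}
    # Handle the most urgent tasks first.
    for title, _urgency in sorted(tasks, key=lambda t: URGENCY_ORDER[t[1]]):
        free = [c for c in candidates if c.available]
        if not free:
            assignments[title] = None  # nobody available, leave unassigned
            continue
        chosen = min(free, key=lambda c: c.open_tasks)
        chosen.open_tasks += 1
        assignments[title] = chosen.name
    return assignments

team = [Candidate("alice", True, 2), Candidate("bob", True, 0), Candidate("carol", False)]
result = dispatch([("fix typo", "Low"), ("db down", "High")], team)
```

In this toy run both tasks go to the least-loaded available person, and the unavailable candidate is never picked.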

I did use AI to help write the code, to speed things up and cut down on labor.

It works on an RTX 3080 system (a basic, affordable setup, not heavy AI machinery), and I want to make it reliable without heavy upgrades. It is a full system and doesn't require Ollama (I am not against it).

In companies, I see tickets being raised on Jira and Slack. Currently, people or managers have to sort those themselves, either reading them manually one by one or sending them to the cloud. The issue is that you can't send everything: there is a lot of sensitive data that companies don't trust the cloud with, and manually sorting through thousands of tickets is a nightmare.

Now imagine you get every task classified by urgency and distributed, so you can selectively see which tasks are urgent and need immediate attention, and, last of all, the information doesn't leave your building. Sending data to an API is not the only issue, either: you are paying a per-token cost for every task, maybe $100 to $1000 a month, so running locally can save a startup or a company a lot of hassle.

There were several biases: positional bias, JSON output bias, and issues with attention. At the start I tried only prompting techniques: chain of thought, RISE (evaluate negatives first), negative examples, and positive examples. In some places the model struggled with common sense, so I added examples for that too. (I later changed the approach.)

Prompting did give the output and worked well, but it took too much time to process: 70 to 90 seconds for a single task.

Then I tried batching, and the biases got worse: they became stronger, the model always tended to favour Alice, and more of the prompt instructions were ignored.

For JSON output I used a constraint so the model can only generate JSON, and if that fails there is also a fallback parser, which I wrote back when I was using prompting only.
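A fallback parser of this kind could look roughly like the sketch below. It is not taken from the repository: the function name and the sample reply are made up, and it simply scans for the first substring that decodes as a JSON object using the standard library.

```python
import json

def extract_first_json_object(text):
    """Return the first JSON object embedded in text, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not open valid JSON, try the next one
        start = text.find("{", start + 1)
    return None

# Hypothetical model reply with extra explanation around the JSON.
reply = 'Sure! Here is the result: {"urgency": "High", "assignee": "bob"} Hope that helps.'
parsed = extract_first_json_object(reply)
```

This only rescues replies that contain a complete JSON object somewhere in the text. Truncated or malformed JSON still returns `None`, so the caller has to decide whether to retry.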

This reduced the time from 90 seconds to roughly 15 to 30 seconds per task. I used steering vectors to correct the attention issues I had seen.

Stack:
Language: Python 3.10
Model: qwen2.5-7b-instruct
Libraries: PyTorch, Hugging Face Transformers (No LangChain, No Ollama)
API: FastAPI
UI: NiceGUI
Hardware: Ryzen 5, 16 GB RAM, RTX 3080

Implementation:

Quantization: The model is loaded with NF4 quantization so that 7B models can fit in the 10 GB of VRAM on an RTX 3080, which is also my hardware.
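A rough back-of-the-envelope calculation shows why 4-bit loading matters on a 10 GB card. The parameter count below is an assumed approximate figure for a "7B" model, and the estimate covers weights only. Real usage is higher because of quantization constants, any layers kept in higher precision, activations, and the KV cache.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

PARAMS_7B = 7.6e9  # assumption: approximate parameter count of a "7B" model
VRAM_GIB = 10      # VRAM figure quoted in the post for the RTX 3080

fp16_gib = weight_memory_gib(PARAMS_7B, 16)  # half-precision weights
nf4_gib = weight_memory_gib(PARAMS_7B, 4)    # 4-bit (NF4) weights

print(f"fp16 weights: {fp16_gib:.1f} GiB, fits in VRAM: {fp16_gib < VRAM_GIB}")
print(f"nf4 weights:  {nf4_gib:.1f} GiB, fits in VRAM: {nf4_gib < VRAM_GIB}")
```

With Hugging Face Transformers, NF4 loading is commonly requested through `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`. The post does not say whether the project does it exactly this way.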

Steering Vectors: Standard prompting wasn't enough. I needed to block or direct certain things at a certain layer of the LLM to make it reliable.
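The post does not say how the steering vectors were built or which layer they target. One common recipe, shown here purely as an assumption, takes the difference between the mean activations of "desired" and "undesired" examples and adds a scaled copy of it to the hidden state at the chosen layer. The sketch uses tiny plain-Python lists in place of real activations.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def steering_vector(desired_acts, undesired_acts):
    """Difference of means: points from undesired toward desired behaviour."""
    pos = mean_vector(desired_acts)
    neg = mean_vector(undesired_acts)
    return [p - n for p, n in zip(pos, neg)]

def apply_steering(hidden_state, vector, strength=1.0):
    """Add the scaled steering vector to one layer's hidden state."""
    return [h + strength * v for h, v in zip(hidden_state, vector)]

# Toy 2-dimensional "activations" (made-up numbers).
desired = [[1.0, 0.0], [3.0, 0.0]]
undesired = [[0.0, 1.0], [0.0, 3.0]]

vec = steering_vector(desired, undesired)
steered = apply_steering([0.5, 0.5], vec, strength=0.5)
```

In a real PyTorch model, the same addition would typically be done inside a forward hook on the chosen transformer layer. The `strength` value usually needs tuning, since too large a push degrades the output.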

JSON Constraints: I used constraints to make the model strictly output JSON and to stop it from over-explaining. This happens at the logits level, where tokens that are not required are blocked.
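Blocking tokens at the logits level can be pictured with a toy example. The vocabulary, the logit values, and the helper names below are invented for illustration. The idea is only that disallowed tokens get a logit of negative infinity before the next token is chosen, so they can never be selected.

```python
import math

def mask_logits(logits, allowed_ids):
    """Set every logit outside allowed_ids to -inf so it can never win."""
    allowed = set(allowed_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Index of the highest logit (greedy decoding for one step)."""
    return max(range(len(logits)), key=lambda i: logits[i])

VOCAB = ["Low", "Medium", "High", "Well", "because"]  # toy vocabulary
LABEL_IDS = [0, 1, 2]  # only the three urgency labels are allowed here

# Made-up scores: unconstrained, the model would start explaining ("Well").
raw_logits = [0.1, 0.3, 1.2, 2.5, 0.9]

unconstrained = VOCAB[greedy_pick(raw_logits)]
constrained = VOCAB[greedy_pick(mask_logits(raw_logits, LABEL_IDS))]
```

A full JSON constraint needs the allowed set to change at every step, depending on what has been generated so far. In Hugging Face Transformers this kind of masking is usually plugged into `generate` through a custom `LogitsProcessor` or the `prefix_allowed_tokens_fn` argument. Which mechanism the project uses is not stated in the post.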

GitHub: https://github.com/resilientworkflowsentinel/resilient-workf...

YouTube: https://youtu.be/tky3eURLzWo
