frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Open Access Qwen3.6-35B-A3B-UD-Q5_K_M with TurboQuant

2•freakynit•1h ago
https://w418ufqpha7gzj-80.proxy.runpod.net

Started for myself, but since Im not using it continuously, sharing it:

Open Access Qwen3.6-35B-A3B-UD-Q5_K_M with TurboQuant (TheTom/llama-cpp-turboquant) on RTX 3090 (Runpod spot instance).

5 parallel requests supported.. full context available (please don't misuse..there are no safety guards in place)

Open till spot instance lasts or max 4 hours.

And yes, no request logging (I don't even know how to do it with llama-server)

Prompt processing and generation speeds (at 8K context): 900t/s and 60t/s. And at 100K context: 450t/s and 30t/s.

Command used:

    ./build/bin/llama-server \
      -m ../Qwen3.6-35B-A3B-UD-Q5_K_M.gguf \
      --alias 'Qwen3-6-35B-A3B-turbo' \
      --ctx-size 262144 \
      --no-mmproj \
      --host 0.0.0.0 \
      --port 80 \
      --jinja \
      --flash-attn on \
      --cache-type-k turbo3 \
      --cache-type-v turbo3 \
      --reasoning off \
      --temp 0.6 \
      --top-p 0.95 \
      --top-k 20 \
      --min-p 0.0 \
      --presence-penalty 0.0 \
      --repeat-penalty 1.0 \
      --parallel 5.0 \
      --cont-batching \
      --threads 16 \
      --threads-batch 16
Thanks..

Comments

freakynit•37m ago
Update: spot terminated
EnthrallingEmil•5m ago
Also 3090. using Q4_XL, reduced max context size, with 100k prompt length, I get 2520 tk/s for prompt processing, 68 token/s generation:

  llama-server \
    --model /mnt/ubuntu/models/llama-cpp-qwen/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf \
    --ctx-size 150000 \
    --n-gpu-layers 99 \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --parallel 3 \
    --kv-unified \
    --ctx-checkpoints 32 \
    --checkpoint-every-n-tokens 8192 \
    --checkpoint-min-tokens 64 \
    --flash-attn on \
    --batch-size 4096 \
    --ubatch-size 1024 \
    --reasoning on \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20
I was wondering if turboquant is worth the effort right now, but I'm not yet seeing it speed wise.

checkpoint-min-tokens is a local patch I have so that small background tasks don't wreck my checkpoint cache.

AWS Security Agent on-demand penetration testing now generally available

https://aws.amazon.com/about-aws/whats-new/2026/03/aws-security-agent-ondemand-penetration/
1•mariuz•1m ago•0 comments

The Strait of Hormuz is now open

https://www.cnn.com/2026/04/17/investing/oil-strait-hormuz-iran
1•ricberw•1m ago•0 comments

AI Tool Blindness

https://www.wespiser.com/posts/2026-04-17-ai-tool-blindness.html
1•wespiser_2018•1m ago•1 comments

Solo founders and indie hackers should have a backup plan

https://alcazarsec.com/deadmanswitch/use-cases/solo-founders
1•alcazar•2m ago•0 comments

Two US citizens sentenced for running North Korean laptop farms

https://www.tomshardware.com/tech-industry/two-us-citizens-get-combined-18-years-in-prison-for-ru...
1•drak0n1c•2m ago•0 comments

Stop Killing Games at the European Parliament Full Hearing [video]

https://www.youtube.com/watch?v=QXdmoeaYZ9Y
1•weli•2m ago•0 comments

Show HN: Using an AI agent to refine a ML model for Zephyr RTOS

https://rufilla.com/the-mlforge-proof-of-concept/
1•OOHehir•3m ago•0 comments

Cloudflare: The Agent Readiness score. Is your site agent-ready?

https://blog.cloudflare.com/agent-readiness/
2•kol3x•4m ago•1 comments

Consider sending a list of everything you did to your coworkers everyday

https://aelerinya.substack.com/p/consider-sending-a-list-of-everything
1•surprisetalk•4m ago•0 comments

Scientists Develop "Molecular Scissors" Alternative to Cas9

https://humanprogress.org/scientists-develop-molecular-scissors-alternative-to-cas9/
1•surprisetalk•4m ago•0 comments

Rejoice: A concatenative multiset language built on Fractran-like primitives

https://wiki.xxiivv.com/site/rejoice
1•surprisetalk•4m ago•0 comments

Why Amazon Is Buying Globalstar–and What It Means for Your iPhone

https://www.wired.com/story/why-amazon-is-buying-globalstar-and-what-it-means-for-your-iphone/
1•smurda•4m ago•0 comments

Chinese fabs import US chipmaking equipment via Singapore and Malaysia

https://www.tomshardware.com/tech-industry/chinese-chip-tool-makers-booked-record-2025-revenues
1•speckx•5m ago•0 comments

How should you change your life if we are being watched by alien drone probes?

https://marginalrevolution.com/marginalrevolution/2026/04/how-should-you-change-your-life-decisio...
1•surprisetalk•6m ago•1 comments

Distill MCP – Turn your reading queue into a podcast, via Claude Code MCP

https://github.com/davidlbatey/distill_mcp
2•davidlbatey•7m ago•1 comments

Is 1 Nit Enough? – Phone Minimum Display Brightness

https://www.lttlabs.com/articles/2026/04/16/phone-minimum-display-brightness
1•LabsLucas•8m ago•0 comments

Linux 7.1 Crypto Code Rework Enables More Optimizations by Default

https://www.phoronix.com/news/Linux-7.1-Crypto
2•Brajeshwar•9m ago•0 comments

What Is Infrastructure from Code?

https://encore.dev/blog/what-is-infrastructure-from-code
2•andout_•10m ago•1 comments

A third of Americans don't drive. So why is our transportation so car-centric?

https://yaleclimateconnections.org/2025/01/american-transportation-revolves-around-cars-many-amer...
3•doener•10m ago•0 comments

Teaching a Model to Code

https://rig.ai/blog/teaching-a-model-to-code
3•adam_patarino•10m ago•1 comments

Replaced Official Release Date Trailer [video]

https://www.youtube.com/watch?v=fuUo7_VaboE
2•doener•12m ago•0 comments

Anthropic Quadruples London Office Amid US Regulatory Tensions

https://www.techbuzz.ai/articles/anthropic-quadruples-london-office-amid-us-tensions
3•gaurangt•14m ago•0 comments

White House Investigating Wave of Missing or Dead Scientists

https://www.newsweek.com/white-house-investigating-wave-mystery-dead-scientists-11836410
3•tejohnso•15m ago•0 comments

High Amplitude Disagreeableness – Stay SaaSy

https://blog.staysaasy.com/p/high-amplitude-disagreeableness
2•kiyanwang•16m ago•0 comments

Reflections on Trusting Trust [pdf]

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf
2•throwpoaster•16m ago•1 comments

Twilio Account Hacked

2•kinj28•16m ago•0 comments

Show HN: Use real handwriting for messages and forums (Write Me, Maybe)

https://writememaybe.com/
2•blemblemblam•18m ago•1 comments

WorldSeed – define a world in YAML, let AI agents live in it

https://github.com/AIScientists-Dev/WorldSeed
2•jay_morphmind•18m ago•0 comments

Great Docs for Python Project Documentation

https://opensource.posit.co/blog/2026-04-15_great-docs-introduction/
2•richmeister•19m ago•0 comments

PostgreSQL MVCC, Byte by Byte

https://boringsql.com/posts/postgresql-mvcc-byte-by-byte/
3•radimm•19m ago•0 comments