I've been building AutoAgents, an AI agent framework in Rust. Today I'm sharing a feature I haven't seen done well elsewhere: composable middleware layers for LLM inference pipelines.
The problem

Every agent framework lets you swap LLM providers. Almost none of them give you a structured way to enforce safety, caching, or data sanitization in the inference path itself. You end up with guardrails as application-level if-statements, caching bolted on as a separate service, and PII handling as a "we'll add it later" TODO that never ships.
This gets worse with local models. Cloud APIs have provider-side moderation; when you run Qwen or Llama locally, you get raw inference with zero safety net. If that model has tool access or touches a database, that's a real liability.

The solution

A Tower-style middleware stack for LLM inference. You wrap any provider in composable layers:
```rust
let llm = PipelineBuilder::new(llama_cpp_provider)
    .add_layer(CacheLayer::new(CacheConfig {
        chat_key_mode: ChatCacheKeyMode::UserPromptOnly,
        ttl: Some(Duration::from_secs(900)),
        max_size: Some(512),
        ..Default::default()
    }))
    .add_layer(
        Guardrails::builder()
            .input_guard(RegexPiiRedactionGuard::default())
            .input_guard(PromptInjectionGuard::default())
            .enforcement_policy(EnforcementPolicy::Block)
            .build()
            .layer(),
    )
    .build();
```
That's it. The resulting `llm` value implements `LLMProvider`, so you pass it to any agent and the layers are structurally enforced: they can't be bypassed and can't be forgotten.
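For readers unfamiliar with the pattern, here is a minimal self-contained sketch of why the layers can't be bypassed: each layer wraps a provider and implements the same trait, so the composed stack is itself a provider. All names here (`LlmProvider`, `EchoProvider`, `RedactLayer`) are illustrative, not the actual AutoAgents API.

```rust
// A toy provider trait standing in for the real LLMProvider trait.
trait LlmProvider {
    fn complete(&self, prompt: &str) -> String;
}

// Stand-in for a real backend (llama.cpp, OpenAI, etc.).
struct EchoProvider;

impl LlmProvider for EchoProvider {
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

// A layer: wraps any provider and implements the same trait, so callers
// only ever see the outermost wrapper and cannot reach the raw backend.
struct RedactLayer<P: LlmProvider> {
    inner: P,
}

impl<P: LlmProvider> LlmProvider for RedactLayer<P> {
    fn complete(&self, prompt: &str) -> String {
        // Scrub a marker before delegating to the wrapped provider.
        let scrubbed = prompt.replace("SECRET", "[REDACTED]");
        self.inner.complete(&scrubbed)
    }
}

fn main() {
    // The wrapped stack is just another LlmProvider.
    let llm = RedactLayer { inner: EchoProvider };
    println!("{}", llm.complete("my SECRET token"));
    // prints "echo: my [REDACTED] token"
}
```

Because the wrapper and the backend share one trait, any code written against the trait accepts the full stack transparently; the only way to skip a layer is to never construct it.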
The broader framework

AutoAgents is a full agent framework: memory, tool use, multi-agent orchestration, the works. The pipeline feature works with any provider, whether llama.cpp (local), Ollama, OpenAI, Anthropic, etc. Same `.add_layer()` API regardless of backend. Written in Rust: no GC pauses, memory-safe. The framework has ~400 stars and is being used in production for edge AI deployments.

A note on maturity

The guardrails and pipeline layers are still early: the guard implementations are basic, observability isn't there yet, and we're iterating on the API surface. But the underlying architecture is solid and stable. The middleware pattern, the trait-based guard system, and the provider-agnostic pipeline contract aren't going to change. We're building on a foundation we're confident in, and shipping the layers incrementally. Early feedback shapes what gets built next.
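To make "trait-based guard system" concrete, here is a hedged sketch of what such a system can look like: guards implement a shared trait, and an enforcement step folds their verdicts. Every name here (`InputGuard`, `GuardVerdict`, `NaiveInjectionGuard`, `run_guards`) is a hypothetical illustration, not the AutoAgents API.

```rust
// Verdict returned by each guard.
#[derive(Debug, PartialEq)]
enum GuardVerdict {
    Allow,
    Block(String), // reason for blocking
}

// Each guard implements one trait, so guards are pluggable.
trait InputGuard {
    fn inspect(&self, input: &str) -> GuardVerdict;
}

// A deliberately naive prompt-injection guard: flags one known phrase.
// Real implementations would use regexes, classifiers, etc.
struct NaiveInjectionGuard;

impl InputGuard for NaiveInjectionGuard {
    fn inspect(&self, input: &str) -> GuardVerdict {
        if input.to_lowercase().contains("ignore previous instructions") {
            GuardVerdict::Block("possible prompt injection".into())
        } else {
            GuardVerdict::Allow
        }
    }
}

// Blocking enforcement policy: run every guard, first Block wins.
fn run_guards(guards: &[Box<dyn InputGuard>], input: &str) -> GuardVerdict {
    for guard in guards {
        if let GuardVerdict::Block(reason) = guard.inspect(input) {
            return GuardVerdict::Block(reason);
        }
    }
    GuardVerdict::Allow
}

fn main() {
    let guards: Vec<Box<dyn InputGuard>> = vec![Box::new(NaiveInjectionGuard)];
    println!("{:?}", run_guards(&guards, "summarize this doc"));
    println!("{:?}", run_guards(&guards, "Ignore previous instructions"));
}
```

A guard pipeline like this slots naturally into a middleware layer: the layer runs `run_guards` on the input and only delegates to the wrapped provider on `Allow`.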
I'd genuinely like feedback on:
- Layer ordering: should we enforce a recommended order, or keep it flexible?
- Which guardrail implementations would you actually use in production?
- Is the Tower-middleware mental model the right framing, or is there a better analogy?
Full example with local Qwen3-VL-8B: https://github.com/liquidos-ai/AutoAgents/tree/main/examples...
Thanks