Well, it wasn't as easy as it seemed... My first attempt at direct synthesis was a nightmare: OpenLane (the RTL-to-GDSII flow I used) reported a setup slack of -7.88 ns. Essentially, the combinational path through the 64-tap window was too slow to settle within a 100 MHz (10 ns) clock cycle. I spent weeks refactoring the architecture in Chisel and moved to a fully unrolled, 4-stage-deep pipeline. The hardest part was balancing the binary adder tree: with 64 taps you get log2(64) = 6 levels of addition, and I had to make sure those levels didn't bottleneck the entire chip. I also realized I could cheat a bit: instead of a resource-heavy divider for the moving average, I used a static bit shift (>> 6), which divides by 64. In hardware that's just re-wiring, costing zero nanoseconds and zero gates. The final result is an 86,443-cell design that is LVS/DRC clean.
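For concreteness, here's a plain-Scala software model of that datapath (illustrative names, not the actual Chisel source): a balanced binary tree over 64 taps, which takes log2(64) = 6 levels of pairwise adds, followed by a right shift standing in for the divide-by-64.

```scala
// Software model of the hardware datapath: a balanced binary adder tree
// over 64 taps (log2(64) = 6 levels), then divide-by-64 as a right shift.
// Names are illustrative, not taken from the repo.
object MovingAvgModel {
  // One tree level: sum adjacent pairs (32 adds, then 16, then 8, ...).
  def treeLevel(xs: Seq[Long]): Seq[Long] =
    xs.grouped(2).map(_.sum).toSeq

  // Full reduction: apply levels until a single value remains.
  def treeSum(taps: Seq[Long]): Long = {
    require(taps.length == 64, "model is fixed at a 64-tap window")
    Iterator.iterate(taps)(treeLevel).dropWhile(_.length > 1).next().head
  }

  // Divide by 64 via a constant shift -- in hardware this is just wiring.
  def movingAvg(taps: Seq[Long]): Long = treeSum(taps) >> 6
}
```

In hardware the pipeline registers would sit between tree levels; the software model only captures the dataflow shape.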
Of course, it runs at 100 MHz on a 130 nm process, but back-of-the-envelope scaling suggests around 1.2 GHz should be achievable on a 7 nm process, which would cut the pipeline latency to about 3.3 ns. (Yes, I did not try ASAP7 in OpenLane; the PDK is too controversial and I wasn't sure it would give realistic results.) I think we're approaching the point where a software-defined interface is becoming too slow for line-rate networking or high-speed control loops. All GDSII layouts and Surfer waveforms can be found in the repo.
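The 3.3 ns figure falls out of the pipeline depth: latency = stages / frequency. A quick sanity check, assuming latency is dominated by the 4 pipeline stages and ignoring I/O delays:

```scala
object PipelineLatency {
  // Latency in ns = depth (cycles) / frequency (GHz, i.e. cycles per ns).
  // Assumes the 4-stage pipeline from the post; I/O delays are ignored.
  def latencyNs(stages: Int, freqGHz: Double): Double = stages / freqGHz
}

// 4 stages at 0.1 GHz (100 MHz) -> 40 ns
// 4 stages at 1.2 GHz           -> ~3.33 ns
```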
If anyone is interested, I'd love feedback from the community on the architecture. There are still some synchronization problems in the chip: power consumption can suddenly spike while it's running, which pushes the chip into reboot mode. And maybe someone knows a more elegant way to handle the summation of large windows in Chisel.
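On large-window summation: one well-known alternative to re-summing all N taps each cycle is a running sum — add the incoming sample and subtract the one falling out of the window, so the per-cycle cost is one add and one subtract regardless of window size, at the price of a shift register (or Mem, in Chisel terms) holding the last N samples. A software sketch of the idea, with hypothetical names:

```scala
// Running-sum moving average: keep sum += newest - oldest instead of
// re-summing the whole window. O(1) work per sample, independent of the
// window size; the trade-off is storage for the last n samples.
class RunningAverage(windowLog2: Int) {
  private val n = 1 << windowLog2
  private val buf = Array.fill(n)(0L) // circular buffer of the last n samples
  private var idx = 0
  private var sum = 0L

  def step(sample: Long): Long = {
    sum += sample - buf(idx)  // add the newest, drop the oldest
    buf(idx) = sample
    idx = (idx + 1) % n
    sum >> windowLog2         // divide by the window size via shift
  }
}
```

With exact integer arithmetic the accumulator never drifts, so this avoids both the 6-level tree and the balancing problem, though the single wide accumulator add can itself become the critical path at very high clock rates.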