Porting an INT8 VHDL CNN from Intel Agilex 3 to Lattice Certus-NX

1•smart_coconut•1h ago

We implemented a small INT8 CNN for handwritten digit classification (NIST SD19 subset) in pure VHDL and built it on two different FPGA families: Intel Agilex 3 and Lattice Certus-NX.

The design originally targeted Agilex 3. We later rebuilt it for a Certus-NX board to see how portable the RTL actually was and how resource usage and timing changed.

## Model

Input: 128×128 grayscale images streamed over UART from a host PC webcam.
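
To get a feel for what UART streaming means for frame rate, here is a minimal Python sketch. The post does not state the baud rate or framing, so the 115200 baud and 8N1 framing (10 bits on the wire per byte) used below are purely hypothetical values for illustration.

```python
WIDTH, HEIGHT = 128, 128          # input size from the post
BITS_PER_BYTE_ON_WIRE = 10        # assumption: 8N1 framing (start + 8 data + stop)

bytes_per_frame = WIDTH * HEIGHT  # one byte per grayscale pixel


def frame_seconds(baud):
    """Time to send one frame at the given baud rate (hypothetical link settings)."""
    return bytes_per_frame * BITS_PER_BYTE_ON_WIRE / baud


# 115200 baud is an assumed example value, not a figure from the post.
print(f"{bytes_per_frame} bytes per frame")
print(f"{frame_seconds(115200):.2f} s per frame at 115200 baud")
```

Under these assumed settings the link, not the CNN datapath, would set the frame rate.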

Architecture:

- 3×3 conv (8 filters) + pooling
- 3×3 conv (12 filters) + pooling
- 3×3 conv (16 filters) + pooling
- 3×3 conv (24 filters) + pooling
- 3×3 conv (32 filters) + pooling
- fully connected 512 → 10
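
The 512 inputs of the fully connected layer can be sanity-checked with a short Python sketch. It assumes "same" padding on each convolution and 2×2 pooling with stride 2; neither is stated in the post, but together they reproduce the 512 figure.

```python
# Assumptions (not stated in the post): "same" padding on every 3x3
# convolution and 2x2 pooling with stride 2.
FILTERS = [8, 12, 16, 24, 32]   # filter counts from the architecture list
INPUT_SIZE = 128                # 128x128 grayscale input


def feature_map_shapes(size, filters, pool=2):
    """Return (height, width, channels) after each conv + pooling stage."""
    shapes = []
    for channels in filters:
        size //= pool           # "same" conv keeps the size; pooling halves it
        shapes.append((size, size, channels))
    return shapes


shapes = feature_map_shapes(INPUT_SIZE, FILTERS)
height, width, channels = shapes[-1]
fc_inputs = height * width * channels   # flattened input to the FC layer

for stage, shape in enumerate(shapes, start=1):
    print(f"stage {stage}: {shape}")
print("fully connected inputs:", fc_inputs)
```

With these assumptions the last stage is 4×4×32, which flattens to 512.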

All arithmetic is INT8. The design is single-clock and streaming; feature maps are buffered in block RAM between layers.
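
As a rough behavioral model of an INT8 multiply-accumulate for one 3×3 window, consider the Python sketch below. The post does not describe accumulator width, rounding, or requantization, so the wide accumulator, the right shift, and the saturation step are assumptions, and `mac_3x3` and `shift` are hypothetical names.

```python
INT8_MIN, INT8_MAX = -128, 127


def saturate_int8(value):
    """Clamp an integer to the signed 8-bit range."""
    return max(INT8_MIN, min(INT8_MAX, value))


def mac_3x3(window, kernel, shift=7):
    """Multiply-accumulate one 3x3 window, then rescale to INT8.

    window, kernel: 9 signed 8-bit values each (row-major).
    shift: hypothetical requantization shift; the post does not give one.
    """
    assert len(window) == len(kernel) == 9
    acc = 0                             # models an accumulator wider than 8 bits
    for x, w in zip(window, kernel):
        acc += x * w                    # each int8 x int8 product fits in 16 bits
    return saturate_int8(acc >> shift)  # arithmetic right shift, then clamp


print(mac_3x3([1] * 9, [1] * 9, shift=0))
print(mac_3x3([127] * 9, [127] * 9))
```

For a single input channel the worst-case sum is 9 × 128 × 128 = 147,456, so a 19-bit signed accumulator would be enough in this model; summing over several input channels needs more bits.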

---

## Platform 1: Intel Agilex 3

- Device: A3CY135BM16AE6S
- Board: Agilex 3 C-Series Development Kit
- Toolchain: Quartus Prime Pro 25.3

Resource usage:

- ALMs: 2,526 / 45,800 (6%)
- RAM blocks: 36 / 353 (10%)
- DSP blocks: 17 / 184 (9%)

Fmax: 146 MHz

---

## Platform 2: Lattice Certus-NX

- Device: LFD2NX-40-7BG196I
- Board: Cruvi CR00103-03
- Toolchain: Radiant 2025.2

Resource usage:

- LUT4: 13,757 / 32,256 (42.6%)
- MULT9: 66 / 112 (59%)
- MULT18: 12 / 56 (21%)
- Block RAM: 20 / 84 (24%)
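
The utilization percentages for both devices can be recomputed from the used / available counts with a few lines of Python. The numbers are copied from the resource reports in this post; the dictionary layout and the `utilization` helper are just illustrative.

```python
# (used, available) pairs copied from the resource reports in the post.
REPORTS = {
    "Agilex 3": {
        "ALMs": (2526, 45800),
        "RAM blocks": (36, 353),
        "DSP blocks": (17, 184),
    },
    "Certus-NX": {
        "LUT4": (13757, 32256),
        "MULT9": (66, 112),
        "MULT18": (12, 56),
        "Block RAM": (20, 84),
    },
}


def utilization(used, available):
    """Return utilization as a percentage."""
    return 100.0 * used / available


for device, resources in REPORTS.items():
    for name, (used, available) in resources.items():
        print(f"{device:10s} {name:11s} {utilization(used, available):5.1f}%")
```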

Fmax: 273 MHz (single clock domain)

The Certus design uses an on-chip PLL (12 → 48 MHz) to adapt the board's input clock. The Agilex board required an external UART adapter; the Certus board has one integrated.

---

## Porting effort

The RTL is written in plain VHDL without vendor IP. No vendor-specific primitives are instantiated in the CNN datapath.

In practice:

- No DSP or RAM wrapper layer was required.
- No changes to arithmetic or pipeline structure were necessary.
- No timing constraint rework beyond board-specific clock definitions.
- Only board-level adaptations (clocking, UART wiring).

The vendor itself was largely irrelevant for this design. The differences were at the board and toolchain level.

---

## Observations

- On Agilex 3, the design is small relative to the device (single-digit % utilization).
- On Certus-NX-40, the same design consumes a significant fraction of LUTs and MULT9 blocks.
- Achieved Fmax on Certus-NX is higher in this configuration (273 MHz vs 146 MHz), though the system clocking and board setup differ.

The DSP usage profile differs noticeably: 59% of the Certus-NX MULT9 blocks are already in use, so the number of parallel MAC units hits the multiplier limit much sooner there than on Agilex 3.
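
A back-of-the-envelope way to see the scaling limit, as a Python sketch: it assumes multiplier usage grows linearly with the number of parallel MAC units, which is a simplification, and it ignores LUT, RAM, and routing limits. The `max_replicas` helper is a hypothetical name.

```python
# Used / available counts as reported in the post.
AGILEX3_DSP = (17, 184)
CERTUS_MULT9 = (66, 112)


def max_replicas(used, available):
    """How many full copies of the current multiplier usage fit on the device.

    Assumes multiplier usage scales linearly with the number of parallel
    MAC units, which is a simplification.
    """
    return available // used


print("Agilex 3 DSP copies:", max_replicas(*AGILEX3_DSP))
print("Certus-NX MULT9 copies:", max_replicas(*CERTUS_MULT9))
```

Under that assumption the current multiplier footprint fits about ten times on the Agilex 3 part but only once on the Certus-NX-40.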

For this size of INT8 CNN, portability at the RTL level was straightforward. The limiting factor when moving to the smaller device was resource headroom rather than functional incompatibility.

---

## Question

For those who have moved similar streaming CNN datapaths across vendors: Have you found cases where DSP inference or block RAM inference diverged enough to require structural RTL changes, or does that mostly appear once designs become more deeply pipelined or multi-clock?
