Neuromorphic computing usually feels locked behind closed silicon and NDA-heavy toolchains. At AI Tech Inspire, we spotted a project that pushes in the opposite direction: two open neuromorphic processors, Catalyst N1 and N2, designed for feature parity with Intel’s Loihi generations and tested on real FPGAs with a full Python SDK. For developers and researchers who’ve wanted to inspect, tweak, and deploy spiking neural networks with modern tooling, this is worth a closer look.
Fast facts
- Two open neuromorphic processors: Catalyst N1 (targets Loihi 1) and Catalyst N2 (targets Loihi 2), both validated on FPGA.
- Complete Python SDK with three backends: CPU (cycle-accurate), GPU via PyTorch, and FPGA, all behind a unified `deploy`/`step`/`get_result` API.
- Licensing: BSL 1.1 (source-available, free for research). Built at the University of Aberdeen.
- N1 highlights: 128 cores; fixed CUBA LIF neuron model; 1,024 neurons/core; ~131K synapses/core (CSR); 24-bit state precision; microcode learning engine; graded spikes (8-bit); delays 0–63; 3× RV32IMF embedded CPUs; open design.
- N2 highlights: programmable neurons (5 shipped models: CUBA LIF, Izhikevich, ALIF, Sigma-Delta, Resonate-and-Fire); 4 spike payload formats (0/8/16/24-bit); weight precision 1/2/4/8/16-bit; 5 spike traces; 4 synapse formats (plus convolutional); persistent reward traces; homeostasis; observability (counters, 25-variable probes, energy metering); 1,024 neurons/core; open design.
- Comparison to Loihi: N1 matches Loihi 1’s functional features and exceeds it on state precision, delays, and graded spikes. N2 matches or exceeds Loihi 2’s programmable features, with a smaller per-core neuron count due to FPGA BRAM limits.
- Benchmark (Spiking Heidelberg Digits): 85.9% float accuracy; 85.4% at 16-bit quantization; 0.4% quantization loss; 1.14M synapses; trained with surrogate gradients (fast sigmoid), AdamW, 300 epochs; surpasses reported 83.2% and 83.4% baselines.
- FPGA validation: N1 — 25 RTL testbenches, 98 scenarios, zero failures (simulation). N2 — 28/28 FPGA integration tests on AWS F2 (VU47P) at 62.5 MHz + 9 RTL-level tests (163K+ spikes) with zero mismatches; 16-core instance; dual-clock CDC (62.5 MHz neuromorphic / 250 MHz PCIe).
- SDK scale-up from N1 to N2: tests 168 → 3,091; modules 14 → 88; neuron models 1 → 5; weight precisions 1 → 5; ~8K → ~52K lines of Python.
Why this matters: open parity with Loihi, without the black box
Most developers who are curious about spiking neural networks hit the same wall: limited hardware access and opaque toolchains. Catalyst N1 and N2 flip that script. They bring feature parity with Loihi’s two generations into a source-available stack, with a documented architecture, FPGA validation, and a Python SDK that looks familiar to anyone comfortable with PyTorch.
Key takeaway: A research-ready, inspectable path to Loihi-class neuromorphic features — with tests and backends you can run today.
For those exploring event-driven inference at the edge, temporal processing, or energy-aware learning, this lowers the barrier to building and benchmarking real SNN workloads. And because the SDK spans CPU, GPU, and FPGA, iteration can start on a laptop and graduate to hardware without rewriting pipelines.
N1 in brief: fixed-function, but full-featured and open
Catalyst N1 mirrors Loihi 1’s fixed pipeline but adds a few thoughtful touches. A 128-core design with 1,024 neurons per core and ~131K CSR synapses/core supports a classic CUBA LIF neuron model. It includes a microcode learning engine (16 registers, 14 ops), compartment trees (4 join ops), and two spike traces (x1, x2). Notably, it supports graded spikes (8-bit) and a slightly wider delay range (0–63). State precision is 24-bit versus Loihi 1’s 23-bit.
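For intuition, the CUBA LIF model N1 fixes in hardware follows the standard discrete-time current-based LIF equations. A minimal NumPy sketch is below; the decay constants, threshold, and hard-reset choice are illustrative defaults, not values from the N1 spec:

```python
import numpy as np

def cuba_lif_step(u, i_syn, in_spikes, w, alpha=0.9, beta=0.8, v_th=1.0):
    """One discrete timestep of current-based (CUBA) LIF dynamics.

    u         -- membrane potentials, shape (n,)
    i_syn     -- synaptic currents, shape (n,)
    in_spikes -- presynaptic spike vector, shape (m,)
    w         -- weight matrix, shape (n, m)
    """
    i_syn = beta * i_syn + w @ in_spikes    # current decays, then integrates input spikes
    u = alpha * u + i_syn                   # membrane leaks, then integrates current
    spikes = (u >= v_th).astype(float)      # threshold crossing emits a spike
    u = np.where(spikes > 0, 0.0, u)        # hard reset after spiking
    return u, i_syn, spikes
```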
That makes N1 a clean platform for:
- Teaching and prototyping SNN algorithms where reproducibility and visibility matter.
- Comparative studies against Loihi 1-class features, with the option to inspect every stage.
- Exploring graded spike pipelines that anticipate programmable-neuron-era workflows.
In short, think of N1 as a practical, open baseline for fixed-function SNNs — with the paperwork and FPGA validation to back it up.
N2: programmable neurons and the “shader moment”
Catalyst N2 represents the pivot many have waited for: moving from fixed LIF pipelines to programmable neurons — much like the jump from fixed-function GPUs to programmable shaders. Out of the box, it ships with five models (CUBA LIF, Izhikevich, ALIF, Sigma-Delta, Resonate-and-Fire), and supports multiple spike payload formats (0/8/16/24-bit) and weight precisions from 1 to 16 bit. There are four synapse formats plus a convolutional option, plasticity at the synapse-group level, persistent reward traces, and epoch-based homeostasis. Observability spans counters, 25-variable probes, and energy metering.
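To make “programmable neuron” concrete, here is roughly what one of the shipped models, ALIF (adaptive LIF), computes each timestep. This is a plain-Python sketch of the textbook dynamics; the parameter names, soft reset, and constants are illustrative assumptions rather than the SDK’s actual definitions:

```python
import numpy as np

def alif_step(u, b, i_in, alpha=0.9, rho=0.97, v_th=1.0, beta_a=1.8):
    """One timestep of adaptive LIF: the effective threshold rises after
    each spike and decays back, producing spike-frequency adaptation."""
    u = alpha * u + i_in                 # leaky membrane integration
    thr = v_th + beta_a * b              # adaptation raises the threshold
    spikes = (u >= thr).astype(float)    # spike where the threshold is crossed
    u = u - thr * spikes                 # soft reset by the effective threshold
    b = rho * b + (1.0 - rho) * spikes   # adaptation variable tracks recent spiking
    return u, b, spikes
```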
Two aspects stand out for practitioners:
- Experiment velocity. Switching neuron models, precisions, and spike encodings enables quick “what works best?” loops for a target task or device budget.
- Deployment realism. Support for low-bit weights (even 1–2 bit) invites aggressive memory/power trade-offs. Paired with event-driven computation, this points toward efficient edge inference.
The one caveat is scale: 1,024 neurons/core versus Loihi 2’s 8,192. The project attributes this to FPGA BRAM constraints rather than an architectural limit — a fair trade-off given the focus on openness and verifiability. If the design maps to an ASIC in the future, expect headroom.
Results that invite replication: SHD at 85.9% and tiny quantization loss
On the Spiking Heidelberg Digits task, the reported metrics are eye-catching: 85.9% float accuracy and 85.4% with 16-bit quantization — just 0.4% loss. The network uses a 700 → 768 (recurrent) → 20 topology with 1.14M synapses and is trained via surrogate gradients (fast sigmoid) and AdamW over 300 epochs. The results outpace referenced baselines (83.2% and 83.4%).
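For anyone looking to replicate the recipe, the fast-sigmoid surrogate is commonly implemented as a custom autograd function: a hard threshold on the forward pass, a smooth pseudo-derivative on the backward pass. A SuperSpike-style PyTorch sketch follows; the steepness constant is a common default, not a value reported by the project:

```python
import torch

class FastSigmoidSpike(torch.autograd.Function):
    """Heaviside spike with a fast-sigmoid surrogate gradient."""
    scale = 10.0  # surrogate steepness (assumed default, not from the source)

    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u >= 0).float()          # hard threshold: spike or no spike

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        sg = 1.0 / (FastSigmoidSpike.scale * u.abs() + 1.0) ** 2
        return grad_output * sg          # smooth gradient where Heaviside has none

# usage: spikes = FastSigmoidSpike.apply(membrane_potential - threshold)
```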
For developers, the interesting angle is how close the quantized model tracks float. With programmable neurons and flexible precisions, N2 gives a playground to validate assumptions about information flow in low-bit SNNs — and to test where precision actually matters. Because there’s a GPU backend via PyTorch, many will prototype locally, profile with CUDA, and then push to FPGA when ready.
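One quick local experiment along those lines: quantize trained weights and re-run evaluation to see where accuracy actually degrades. A minimal per-tensor symmetric quantizer, sketched here as a generic scheme (the SDK’s own quantizer may differ), makes that loop a few lines:

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int = 16) -> torch.Tensor:
    """Round weights to a symmetric uniform grid with 2**bits levels."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 32767 for 16-bit
    scale = w.abs().max() / qmax + 1e-12   # per-tensor scale (epsilon avoids /0)
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

# Compare float vs. quantized weights at several precisions
w = torch.randn(768, 700)
for bits in (16, 8, 4, 2):
    err = (w - quantize_symmetric(w, bits)).abs().max().item()
    print(f"{bits}-bit max weight error: {err:.4f}")
```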
Validation depth and the SDK pipeline
Confidence often comes from test coverage, not just claims. N1 reports 25 RTL testbenches, 98 scenarios, zero failures in simulation. N2 extends this to 28/28 FPGA integration tests on AWS F2 (Xilinx VU47P) at 62.5 MHz, plus nine RTL-level tests that generated 163K+ spikes with zero mismatches. A 16-core instance and dual-clock CDC (62.5 MHz neuromorphic / 250 MHz PCIe) round out the hardware story.
The SDK has also grown significantly: from 168 to 3,091 tests; 14 to 88 modules; and ~8K to ~52K lines of Python. Crucially, the backends share the same API. A typical flow looks like:
```python
# Pseudocode (SDK style); helper names are illustrative,
# only deploy/step/get_result mirror the documented API
net = build_spiking_net(models=["ALIF", "Izhikevich"], weight_bits=4)
handle = backend.deploy(net)      # same call for CPU, GPU (PyTorch), or FPGA
for t in range(T):                # T: number of simulation timesteps
    handle.step(inputs[t])        # advance one timestep with input spikes
outputs = handle.get_result()     # collect output spikes/state after the run
```
No need to juggle separate toolchains per backend — a small yet meaningful quality-of-life improvement. If you enjoy quick iteration, this is the kind of friction reducer that keeps experiments moving.
Where to point this hardware: practical use cases
- Temporal sensing at the edge. Audio keywords, anomaly detection, and low-power wake-word systems can benefit from event-driven compute and low-bit weights.
- Event camera pipelines. With convolutional synapses and programmable neurons, it’s natural to explore sparse spatiotemporal filters for DVS data (see the preprocessing sketch after this list).
- On-device learning signals. Support for reward traces and homeostasis sketches out reinforcement-style adaptation and stability mechanisms.
- Robotics and control. Deterministic timing, counters, and energy metering invite closed-loop control experiments that care about latency and power.
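As a starting point for the event-camera bullet above, DVS streams are usually binned into dense spike tensors before they are mapped onto convolutional synapses. A generic preprocessing sketch, not tied to the Catalyst SDK, with an assumed t/x/y/polarity event layout:

```python
import numpy as np

def events_to_frames(events, n_bins, height, width):
    """Bin DVS events into a (n_bins, 2, H, W) spike-count tensor.

    events -- array of shape (N, 4) with columns (t, x, y, polarity in {0, 1})
    """
    t = events[:, 0]
    bins = ((t - t.min()) / (t.ptp() + 1e-9) * n_bins).astype(int)
    bins = bins.clip(0, n_bins - 1)        # last event lands in the final bin
    frames = np.zeros((n_bins, 2, height, width), dtype=np.float32)
    for b, x, y, p in zip(bins, events[:, 1].astype(int),
                          events[:, 2].astype(int), events[:, 3].astype(int)):
        frames[b, p, y, x] += 1.0          # count events per time bin and polarity
    return frames
```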
Because the design is open and the license is BSL 1.1 (free for research), labs and startups can validate ideas without waiting for closed hardware access. That’s a big deal for reproducibility and for teaching.
Questions worth exploring
- How do different neuron models (ALIF vs. Izhikevich) trade off accuracy and power for a given task?
- What’s the sweet spot between 1–2 bit weights and model robustness on noisy real-world streams?
- Can the convolutional synapse mode scale to larger event-driven CNNs without running into routing or memory pressure?
- What performance and energy profiles emerge when porting surrogate-trained models from PyTorch to FPGA via the SDK?
Bottom line
Open neuromorphic design, feature parity with Loihi-class hardware, quantization-friendly accuracy, and a test-heavy SDK make Catalyst N1/N2 a compelling platform for spiking research and prototyping. It’s not a marketing promise — there’s RTL, FPGA timing, and thousands of tests behind it.
If you’ve been waiting to tinker with programmable neurons and low-bit SNNs without a black box, this is a rare chance to press Enter and actually run it.
At AI Tech Inspire, the takeaway is simple: this project turns neuromorphic curiosity into a practical engineering exercise. Whether you’re aiming for an energy-aware keyword spotter, a DVS pipeline, or an educational lab that needs transparent hardware, these processors and their SDK look ready to explore.