Chimera is a single licensable core—AXI-compliant, scalable from 1 to 864 TOPS. The unified pipeline handles matrix, vector, and scalar operations in one execution stream. No partitioning. No split toolchains. When the graph compiler can't handle an operator, C++ gets you out. Built to outlast your product lifecycle.

The Challenge
The Problem
Separate NPU, DSP, CPU. Multiple vendors. Months wiring them together.

Separate NPU, DSP, and CPU components that must be integrated, debugged, and maintained independently.
Each processor requires its own compiler and debugger. AI workloads partitioned across cores.
Hardware optimized for last year's models. New operators require silicon updates.
The Quadric Approach
Single Core. 100% C++ Programmable. Single Binary.

Matrix, vector, and scalar operations in one execution pipeline. No partitioning.
New operators added via C++ kernels after deployment. Never blocked by silicon.
One codestream, one toolchain, one debug environment. ONNX and C++ merge seamlessly.
Why Chimera
Simplify your SoC design and speed up porting of new AI models
Quadric's solution enables hardware developers to instantiate a single core that handles an entire AI/ML workload plus the digital signal processing and signal-conditioning tasks typically intermixed with inference. Dealing with a single core drastically simplifies hardware integration and eases performance optimization, and system design tasks such as profiling memory usage and estimating system power consumption become far more straightforward.
Quadric's Chimera GPNPU architecture dramatically simplifies software development since matrix, vector, and control code can all be handled in a single code stream. Graph code from the common training toolsets (TensorFlow, PyTorch, and ONNX formats) is compiled by the Quadric SDK and merged with signal processing code written in C++ into a single code stream running on a single processor core. The entire subsystem can be debugged in a single debug console.
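To make the single-code-stream idea concrete, here is a minimal C++ sketch of ordinary signal conditioning and a compiled ONNX graph living in one binary on one core. The header, the GraphModule class, and its run() call are hypothetical stand-ins for illustration, not the actual Quadric SDK interface.

```cpp
// Illustrative sketch only: "chimera/graph.h", GraphModule, and its run()
// method are hypothetical placeholders, not the actual Quadric SDK interface.
#include <cstddef>
#include <cstdio>
#include <vector>
#include "chimera/graph.h"   // hypothetical wrapper emitted by the graph compiler

// Ordinary C++ signal conditioning, compiled into the same binary and executed
// on the same core as the ONNX-derived graph code -- no separate DSP needed.
static void preEmphasis(std::vector<float>& samples, float alpha = 0.97f) {
    for (std::size_t i = samples.size() - 1; i > 0; --i) {
        samples[i] -= alpha * samples[i - 1];
    }
}

int main() {
    std::vector<float> frame(1024, 0.0f);         // stand-in for a captured audio frame
    preEmphasis(frame);                           // scalar/vector DSP code
    GraphModule model("keyword_spotter.bin");     // graph compiled offline by the SDK
    std::vector<float> scores = model.run(frame); // matrix code, same core, same debugger
    std::printf("top score: %f\n", scores.empty() ? 0.0 : scores[0]);
    return 0;
}
```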
A Chimera GPNPU can run any AI/ML graph that can be captured in ONNX, plus anything written in C++. This is incredibly powerful: SoC developers can quickly write code to implement new neural network operators and libraries long after the SoC has been taped out, which eliminates fear of the unknown and dramatically extends a chip's useful life. As ML models continue to evolve, this unified architecture helps future-proof chip design cycles.
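As a rough illustration of that idea (not the Quadric SDK's actual registration mechanism), the sketch below writes a GELU activation as a plain C++ kernel and registers it through a hypothetical hook so the compiler can dispatch an ONNX node to it after tape-out.

```cpp
// Illustrative sketch: the kernel body is plain C++; "chimera/custom_op.h" and
// registerCustomOp() are hypothetical names, not the actual Quadric SDK API.
#include <cmath>
#include <cstddef>
#include "chimera/custom_op.h"   // hypothetical custom-operator registration header

// A GELU activation written as an ordinary C++ kernel. Because the core is fully
// programmable, an operator like this can be added in software long after the
// SoC has taped out -- no silicon change required.
void geluKernel(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float x = in[i];
        out[i] = 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
    }
}

// Hypothetical hook telling the graph compiler to dispatch an ONNX "Gelu" node
// to this C++ kernel whenever it appears in a model.
static const bool kGeluRegistered = registerCustomOp("Gelu", geluKernel);
```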
How It Works
Accelerator-level performance with full processor flexibility
Designed from the ground up to address the constantly evolving AI inference deployment challenges facing system-on-chip (SoC) developers, the Chimera GPNPU family has a simple yet powerful architecture with demonstrated improvements in matrix-computation performance over traditional approaches.
Matrix, vector and scalar code in one execution pipeline. No partitioning required.
Continuously optimize performance throughout a device's lifecycle via software updates.
Runs classic backbones, Transformers, LLMs, and networks not yet invented.
Chimera GPNPU Block Diagram
A hybrid Von Neumann + 2D SIMD architecture that unifies matrix, vector, and scalar operations in a single execution pipeline

The Chimera GPNPU is entirely driven by code, empowering developers to continuously optimize the performance of their models and algorithms throughout the device's lifecycle. That makes it ideal for running classic backbone networks, today's newest Transformers and Large Language Models, and whatever new networks are invented tomorrow.
Modern System-on-Chip architectures deploy complex algorithms that mix traditional C++ based code with newly emerging and fast-changing machine learning inference code. This combination is found in numerous chip subsystems, most prominently in vision and imaging subsystems, radar and lidar processing, communications baseband subsystems, and a variety of other data-rich processing pipelines.
Unlike heterogeneous alternatives, which require splitting AI/ML graph execution and tuning performance across two or three different cores, the Chimera GPNPU operates as a single software-controlled core, allowing complex parallel workloads to be expressed simply.
Technical Specifications
A hybrid Von Neumann + 2D SIMD architecture optimized for AI/ML inference
Product Portfolio
Spanning from single-core QC Nano to 8-way QC-Multi clusters. Fully synthesizable for any process technology.
Performance scaling across process nodes and configurations
The Chimera QC processor family spans a wide range of performance requirements. Because the core is fully synthesizable, a Chimera IP core can be implemented in any process technology, from older nodes to the most advanced. There is a Chimera processor that meets your performance goals for high-volume end applications, including mobile devices, digital home products, automotive, and network edge compute systems.
Power Efficiency
ML inference is a data & memory movement optimization problem, not a compute efficiency problem.
ML/AI inference solutions are most often performance- and power-limited by memory-system bandwidth. With most state-of-the-art AI models having millions or billions of parameters, fitting an entire model into on-chip memory within an advanced System-on-a-Chip is generally not possible. Therefore, smart management of on-chip storage for both weights and activations is a prerequisite to achieving high efficiency.
Key Insight: Compiler optimizations that keep data resident in the Register File or LRM yield significant power savings. The Chimera processor family overcomes these memory-management limitations by being fully programmable, with data movement orchestrated by compiler-driven DMA.
Chimera Graph Compiler (CGC) manages data movement across the memory hierarchy
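As a rough illustration of what compiler-driven DMA management automates, the sketch below shows the classic double-buffered tiling pattern in plain C++. The function names, tile size, and buffer placement here are illustrative assumptions, not Chimera-specific calls or parameters.

```cpp
// Conceptual sketch of the double-buffered tiling pattern that compiler-driven
// DMA management automates. dmaLoadAsync(), dmaWait(), computeTile(), and the
// tile size are illustrative assumptions, not Quadric SDK calls or Chimera parameters.
#include <cstddef>

constexpr std::size_t kTileElems = 4096;   // assumed on-chip tile size (illustrative)

extern void dmaLoadAsync(float* dst, const float* src, std::size_t n, int channel);
extern void dmaWait(int channel);
extern void computeTile(const float* weights, float* activations, std::size_t n);

// While tile t is being computed out of local memory, tile t+1 streams in from
// external DDR, keeping working data resident on-chip and hiding memory latency.
void runLayer(const float* ddrWeights, float* activations, std::size_t numTiles) {
    static float localBuf[2][kTileElems];  // ping-pong buffers in on-chip memory
    if (numTiles == 0) return;
    dmaLoadAsync(localBuf[0], ddrWeights, kTileElems, /*channel=*/0);
    for (std::size_t t = 0; t < numTiles; ++t) {
        dmaWait(static_cast<int>(t % 2));  // wait for tile t to finish loading
        if (t + 1 < numTiles) {            // prefetch the next tile concurrently
            dmaLoadAsync(localBuf[(t + 1) % 2],
                         ddrWeights + (t + 1) * kTileElems,
                         kTileElems, static_cast<int>((t + 1) % 2));
        }
        computeTile(localBuf[t % 2], activations, kTileElems);
    }
}
```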

Many second-generation NPU accelerators are hardwired finite state machines (FSMs) that offload a handful of performance-intensive, building-block AI operators. These FSM solutions deliver high efficiency only if the deployed network stays within the limited scope of operators hard-coded into the silicon. An FSM solution does not allow memory management strategies to be fine-tuned as network workloads evolve.
Technology Comparison
Understanding the key differences between traditional NPUs and General Purpose NPUs
Get the datasheet. Talk to our architects. See the benchmarks.