2026-03-10

In collaboration with NXP, Gateworks is releasing a new M.2 AI Acceleration Card, the GW16168, with NXP's passively cooled Discrete NPU (DNPU), the Ara240. Designed, tested and assembled in the USA, the GW16168 is built to industrial-grade standards with an emphasis on Gateworks' "Decoupled AI Architecture" philosophy.

Deploying your own AI is costly in time, manpower and money. Incorporating newer AI acceleration hardware often entails rethinking your entire hardware stack, from Single Board Computers (SBCs) to custom cooling systems. With current market options this is typically a complex process, made worse by the frequent need to replace or update hardware. Gateworks has spotted this gap in the market and introduced the GW16168 to remove some of the hurdles that engineers and businesses face when deciding to run AI in-house.

"Decoupled AI Architecture" based design philosophy.

"We are ending the era where you must choose your entire compute platform based on the AI chip". It is a powerful sentiment from the team at Gateworks. Its engineers have made design decisions that decouple the M.2 card from specific hardware or environmental constraints: the power profile, the M.2 2280 M-Key form factor and the passively cooled Ara240 DNPU.

So what are these changes, and why do they matter?

Until now, there have been few hardware options for high-performance AI. You were forced to choose between repurposed GPUs that required a full system redesign and running inference directly on embedded CPUs and NPUs at the cost of severe thermal limits and high latency. Earlier USB and M.2 accelerators offered a more modular path, at the significant cost of limited compute and memory capacity. This left developers with an expensive balancing act between performance, power consumption and flexibility, often sacrificing one or more in the process.

Gateworks' new M.2 card revives the modularity of earlier M.2 accelerators while significantly advancing the underlying technology. Future upgrades and revisions no longer require replacing otherwise capable industrial SBCs. For example, dedicated AI acceleration can be added directly to platforms such as the i.MX 8M Plus or i.MX 95 applications processors via the M.2 interface. Typically, these same SBCs would reach 100% utilization when running inference workloads on their own. With its 16 GB of LPDDR4 memory, the GW16168 allows these tasks to be offloaded to the card, freeing the host CPU to focus on system logic and I/O. As an added benefit, the out-of-memory errors that are common when running vision transformers or LLMs on standard edge modules are no longer an issue.
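To make the memory point concrete, here is a rough back-of-the-envelope sketch of whether a model's weights alone would fit in the card's 16 GB. Only the 16 GB figure comes from the announcement. The model sizes, the precision table and the helper names are illustrative assumptions, the estimate treats 1 GB as 10^9 bytes, and it ignores activations, caches and runtime overhead, so real headroom will be smaller.

```python
# Weights-only memory estimate. Model sizes and helper names are hypothetical;
# only the 16 GB capacity is taken from the GW16168 announcement.
CARD_MEMORY_BYTES = 16 * 10**9  # 16 GB LPDDR4 (decimal gigabytes assumed)

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_bytes(num_params: int, precision: str) -> int:
    """Return the bytes needed to hold the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[precision]


def fits_on_card(num_params: int, precision: str,
                 capacity: int = CARD_MEMORY_BYTES) -> bool:
    """True if the weights alone fit in the given capacity (no overhead counted)."""
    return weight_bytes(num_params, precision) <= capacity


if __name__ == "__main__":
    # Illustrative model sizes, not benchmarks of any specific model.
    models = [
        ("hypothetical 86M-parameter vision transformer", 86_000_000),
        ("hypothetical 7B-parameter LLM", 7_000_000_000),
    ]
    for name, params in models:
        for precision in BYTES_PER_PARAM:
            gigabytes = weight_bytes(params, precision) / 10**9
            verdict = "fits" if fits_on_card(params, precision) else "does not fit"
            print(f"{name} @ {precision}: {gigabytes:.2f} GB of weights, {verdict}")
```

Under these assumptions, a 7-billion-parameter model fits at 8-bit or 16-bit precision but not at 32-bit, which is one reason quantization matters on edge hardware.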

Through collaboration with NXP, the GW16168 is backed by the mature Ara240 SDK ecosystem, offering a full compiler toolchain, support for TensorFlow, PyTorch and ONNX, and integrated model-conversion utilities that simplify the transition from existing AI models to edge deployment. "The ARA SDK & Compiler could be looked at as an abstraction layer", said the CTO at Gateworks. He continued, "It acts as the middleware between high-level AI frameworks like PyTorch or TensorFlow and the proprietary NXP Ara hardware. The SDK handles model conversion, quantization, graph optimization, etc., to simplify software development". This is the key to the modularity of the GW16168. Even with a working full stack, however, there are other reasons to consider the GW16168.

"Gateworks' GW16168 illustrates exactly why decoupled AI architectures are the future of edge computing. By combining NXP's Ara240 DNPU with Gateworks' industrial-grade design, customers can scale AI performance without redesigning their entire hardware platform. This brings flexibility, longevity and cost efficiency to real-world AI deployments," said Ravi Annavajjhala, Vice President and General Manager, Neural Processing Units, NXP Semiconductors.

Thanks to the GW16168, you might just lose your best fans.

One of the biggest challenges in AI deployment is thermal management. High-performance AI systems can draw significant power, with demand often spiking during complex tensor operations. As a result, thermals frequently become the limiting factor, especially in space-constrained industrial designs where advanced cooling solutions can quickly become costly and impractical. Gateworks has designed its M.2 card with this in mind, using a passively cooled Ara240 DNPU together with carefully engineered power circuitry to achieve a typical power consumption of 6.6 W. This lower power envelope reduces heat build-up, enabling reliable operation in sealed, fanless environments while maintaining thermal characteristics aligned with industrial-grade AI hardware. Gateworks also reports a decade-long lifespan for the GW16168 modules, with advanced thermal management reducing wear on the modules.

The Performance.

The GW16168 is designed to enhance overall capability rather than simply rebalance it. Delivering up to 40 eTOPS, the module reaches what can reasonably be described as "GPU-class" AI performance within a far smaller power envelope. Rather than accepting the limitations of current or legacy edge accelerators, the design focuses on sustained throughput, supported by ruggedized power delivery that maintains stability even during peak inference loads approaching 40 TOPS.
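For a sense of scale, the two headline figures quoted in this article (up to 40 eTOPS and a typical power consumption of 6.6 W) can be combined into a rough efficiency estimate. This is only a sketch: peak throughput and typical power are not necessarily measured under the same workload, so the ratio is an optimistic indication and not a vendor specification. The function names are illustrative.

```python
# Rough efficiency arithmetic from the two figures quoted in the article.
# Assumption: peak eTOPS and typical power are treated as if they coincide,
# which may not hold for a real workload.
PEAK_ETOPS = 40.0        # "up to 40 eTOPS"
TYPICAL_POWER_W = 6.6    # "typical power consumption of 6.6 W"


def tops_per_watt(tops: float, watts: float) -> float:
    """Throughput per watt; rejects non-positive power values."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


def energy_wh(watts: float, hours: float) -> float:
    """Energy in watt-hours for a constant power draw over the given hours."""
    return watts * hours


if __name__ == "__main__":
    efficiency = tops_per_watt(PEAK_ETOPS, TYPICAL_POWER_W)
    daily = energy_wh(TYPICAL_POWER_W, 24.0)
    print(f"Estimated efficiency: {efficiency:.1f} eTOPS per watt")
    print(f"Energy over 24 hours at typical power: {daily:.1f} Wh")
```

Under these assumptions the card works out to roughly 6 eTOPS per watt.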

The GW16168 and the associated development kit will be available for purchase through DigiKey, Braemac, RoundSolutions and Avnet, with shipping in late May.

Photo - https://mma.prnewswire.com/media/2929407/EW_2026_Gateworks_NXP.jpg