---
title: "NVIDIA's RTX Spark and the case for local inference"
date: 2026-06-26
url: https://remiam.co.uk/notes/rtx-spark-and-local-inference
tags: [Hardware, AI, Silicon]
read_time_minutes: 5
description: "Remiam's read on NVIDIA's RTX Spark superchip from Computex 2026, and what unified memory on the PC means for running AI models locally rather than in the cloud."
---

# NVIDIA's RTX Spark and the case for local inference

*Published 2026-06-26 · 5 min read · by Liam (Remiam)*

NVIDIA put 128GB of unified memory in a laptop and called it an AI PC. The interesting part is not the petaflop; it is where inference starts to run.

At Computex in Taipei in early June, NVIDIA showed the RTX Spark, its first proper move into the consumer PC silicon that runs the operating system, not just the graphics. Built with MediaTek under the codename N1X, it pairs a twenty-core Arm CPU with a Blackwell GPU on one package, rated at a petaflop of FP4 AI performance. The number that matters is quieter: up to 128GB of unified memory at 300GB/s, shared between the CPU and GPU on a single chip. RTX Spark laptops are due in the autumn, with HP and ASUS already showing machines.

A petaflop in a 14mm chassis is a good headline. Unified memory is the part that changes how you would build software for it.

## What unified memory actually removes

On a PC with a discrete graphics card, the GPU has its own memory, separate from system RAM. To run a model you copy the weights across the bus into VRAM, and the size of that VRAM is a hard ceiling on how big a model you can hold. It is why a capable consumer card with 16GB cannot load a model that a data-centre part with 80GB runs without thinking. Unified memory puts the CPU and GPU on the same pool, so there is no copy across the bus and no second ceiling. With 128GB on tap, a laptop can keep a model in the tens of billions of parameters resident and answer from it directly; how many tens depends on the precision the weights are stored at.
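The ceiling is easy to put rough numbers on. The sketch below assumes the weights dominate memory use and holds back a quarter of the pool for the operating system, KV cache and activations. That fraction is our placeholder, not a measured figure, and the function names are ours.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: int, memory_gb: float,
         reserve_fraction: float = 0.25) -> bool:
    """True if the weights fit after holding back a share of memory.

    reserve_fraction is an assumed allowance for the OS, KV cache and
    activations. It is a placeholder, not a measured figure.
    """
    usable_gb = memory_gb * (1 - reserve_fraction)
    return weight_footprint_gb(params_billions, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # Compare a 16GB card with a 128GB unified pool across a few sizes.
    for params in (8, 30, 70):
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            print(f"{params}B at {bits}-bit: {size:.0f} GB | "
                  f"16GB: {fits(params, bits, 16)} | "
                  f"128GB: {fits(params, bits, 128)}")
```

On those assumptions a 70-billion-parameter model needs about 140GB at 16-bit, 70GB at 8-bit and 35GB at 4-bit, so it is the quantised versions that fit in 128GB.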

That is a different proposition from the AI PC marketing of the last two years, which was mostly a small neural unit doing background tasks. This is enough headroom to run a serious model on the device in front of you.
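Capacity is only half of it. When a dense model generates one token at a time, each token has to read roughly all of the weights, so memory bandwidth puts a ceiling on speed. The sketch below is a back-of-envelope estimate using the quoted 300GB/s. It is an upper bound under that assumption, not a benchmark. It ignores compute, KV cache traffic and any gap between quoted and real-world bandwidth, and the function name is ours.

```python
def decode_ceiling_tokens_per_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed for a dense model.

    Assumes every generated token reads all weights once and that memory
    bandwidth is the only limit. Real throughput will be lower.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    for weights_gb in (35, 70):
        ceiling = decode_ceiling_tokens_per_s(weights_gb, bandwidth_gb_s=300)
        print(f"{weights_gb} GB of weights: at most {ceiling:.1f} tokens/s")
```

At 300GB/s, 35GB of weights gives a ceiling of about 8.6 tokens per second and 70GB about 4.3. That is usable for a single user, and a reason to size the local model with bandwidth in mind as well as capacity.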

## The studio read

For most of the systems we build, the question of where a model runs has had one answer: a cloud endpoint, billed per token, with the round trip and the data-handling questions that come with it. Hardware like this makes local inference a real option for the first time, and it is worth being precise about when that option is the better one.

- Latency: an on-device model answers without a network hop, which matters for anything interactive: a kiosk, an installation, a tool that responds as you type.
- Data handling: if the prompt never leaves the machine, a whole class of privacy and compliance work gets simpler. For healthcare and property clients that is not a nice-to-have.
- Cost shape: local inference trades a per-token bill for a fixed hardware cost. For steady, high-volume workloads the maths can favour the device; for spiky or occasional use the cloud still wins.
- Operational load: a cloud endpoint is somebody else updating the model and keeping it up. On-device means you own the update path, the version drift and the support call when a machine falls behind.
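The cost-shape point comes down to a break-even volume. The sketch below amortises the hardware over an assumed lifetime and compares it with a per-token price. Every figure passed in is hypothetical, none of them are quoted prices, and the function names are ours.

```python
def cloud_cost(tokens: float, price_per_million: float) -> float:
    """Cloud bill for a given token volume at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def break_even_tokens_per_month(hardware_cost: float, lifetime_months: int,
                                monthly_running_cost: float,
                                price_per_million: float) -> float:
    """Monthly token volume at which local and cloud cost the same.

    Above this volume the device is cheaper. Below it the cloud is.
    Support and update effort are left out and would raise the figure.
    """
    monthly_local = hardware_cost / lifetime_months + monthly_running_cost
    return monthly_local / price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 3000 for the machine over 36 months, 20 a month
    # to run it, and a cloud price of 5 per million tokens.
    volume = break_even_tokens_per_month(3000, 36, 20, 5)
    print(f"Break-even: {volume / 1_000_000:.1f}M tokens per month")
```

With those made-up inputs the break-even sits at roughly 20.7 million tokens a month, which is why steady, high-volume workloads are where the device starts to pay for itself.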

So this is not a case for moving everything on-device. It is a case for no longer treating the cloud as the default. The right architecture for a lot of what we ship is a split: a small, fast model resident on the machine for the common path, and a larger cloud model held in reserve for the hard questions. Unified-memory hardware finally makes the local half of that split worth designing for, rather than a compromise you tolerate.
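A minimal sketch of that split, assuming both models sit behind the same small interface. The backends here are stubs, the difficulty test is a placeholder (prompt length), and every name is hypothetical rather than taken from any particular SDK.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Backend(Protocol):
    """Anything that can answer a prompt, wherever it runs."""
    name: str

    def generate(self, prompt: str) -> str: ...


@dataclass
class StubBackend:
    """Stand-in for a real model client. Returns a canned string."""
    name: str

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


@dataclass
class Router:
    local: Backend
    cloud: Backend
    is_hard: Callable[[str], bool]
    allow_cloud: bool = True  # set False when prompts must stay on the device

    def route(self, prompt: str) -> Backend:
        """Local model by default, cloud only for hard prompts when allowed."""
        if self.allow_cloud and self.is_hard(prompt):
            return self.cloud
        return self.local

    def generate(self, prompt: str) -> str:
        return self.route(prompt).generate(prompt)


def long_prompt(prompt: str, limit: int = 200) -> bool:
    """Placeholder difficulty test: treat long prompts as hard."""
    return len(prompt) > limit


if __name__ == "__main__":
    router = Router(StubBackend("local"), StubBackend("cloud"), long_prompt)
    print(router.generate("What time do you open?"))
    print(router.route("x" * 500).name)
```

The calling code only ever sees `Router.generate`, so where a request runs stays a routing decision. The `allow_cloud` flag is one way to pin sensitive workloads to the device without touching anything else.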

We would not buy a fleet of these on the strength of a keynote. The first machines do not ship until autumn, real-world memory bandwidth and thermals under sustained load are still to be tested, and Arm-on-Windows still has rough edges in its drivers. But the direction is clear enough to build towards. If you are specifying a system in 2026 that leans on a model, design it so the inference can move. The place it runs should be a decision you get to make, not one the architecture made for you.

## References

1. [Tom's Hardware, NVIDIA unveils RTX Spark Superchip at Computex 2026 (2026)](https://www.tomshardware.com/laptops/nvidia-unveils-rtx-spark-superchip-at-computex-2026-new-platform-promises-to-turn-windows-into-an-agentic-ai-os-with-arm-cpu-blackwell-gpu-and-128gb-unified-memory)
2. [NVIDIA Newsroom, NVIDIA and Microsoft Reinvent Windows PCs for the Age of Personal AI (2026)](https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-pcs-agents-rtx-spark)
3. [CNBC, Nvidia's new PC chips are CEO Huang's bid to own every part of the AI stack (2026)](https://www.cnbc.com/2026/06/02/nvidias-new-pc-chips-are-ceos-bid-to-own-every-part-of-ai-stack.html)
