From a PC manufacturer’s perspective, the important part about including an NPU is that you can slap an “AI” sticker on your marketing, regardless of whether it has any actual use cases.
There’s definitely a large slice of “AI” features shipping now just to excite shareholders, serving no actual function for the user.
I have to wonder if NPUs are just going to eventually become a normal part of the instruction set.
When SIMD was first becoming a thing, it was advertised as accelerating “multimedia,” as that was the hot buzzword of the 1990s. Now, SIMD instructions are used everywhere, any place there is a benefit from processing an array of values in parallel.
I could see NPUs becoming the same. Developers start using NPU instructions, and the compiler can “NPU-ify” scalar code when it thinks it’s appropriate.
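As a rough illustration of what I mean (the function name and compiler flags here are just examples I made up), this is the kind of plain scalar loop that compilers already auto-vectorize into SIMD today; the hope would be that NPU-friendly patterns eventually get the same treatment:

```c
#include <stddef.h>

/* Plain scalar code: add two arrays element by element.
 * Built with something like `gcc -O3 -march=native`, a modern compiler
 * will typically auto-vectorize this loop into SIMD instructions
 * (SSE/AVX on x86, NEON on ARM) without the programmer asking. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```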
NPUs are advertised for “AI,” but they’re really just a specialized math coprocessor. I don’t really see this as a bad thing to have. Surely there are plenty of other uses.
The problem that (local) AI has at the moment is that it’s not just a single type of compute, and because of that, it fragments what you can actually do with it.
On the surface, “AI” is a mixture of what are essentially FP16, FP8, and INT8 accelerators, and different implementations have been using different ones. NPUs are basically INT8-only, while the GPU-heavy implementations are FP-based, making them not inherently cross-compatible.
It forces devs to target the NPUs themselves only for small things (e.g. background blur on the camera), as there isn’t any consumer-level chip with a massive INT8 coprocessor except the PS5 Pro (300 TOPS of INT8, versus roughly 50 TOPS on laptop CPUs, so a completely different league; the PS5 Pro uses it for upscaling).
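To make the INT8-vs-FP split concrete, here’s a minimal sketch (purely illustrative, not any vendor’s actual API; the helper name is made up) of the symmetric quantization step a model has to go through before an INT8-only NPU can run it, which is part of why the FP GPU path and the INT8 NPU path aren’t drop-in compatible:

```c
#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Symmetric per-tensor quantization: map FP32 weights onto int8.
 * An INT8-only NPU can only run a model after a step like this. */
float quantize_int8(const float *w, int8_t *q, size_t n)
{
    /* Find the largest magnitude so the full int8 range gets used. */
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++)
        if (fabsf(w[i]) > max_abs)
            max_abs = fabsf(w[i]);

    float scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  /* real value ~ q * scale */
    for (size_t i = 0; i < n; i++) {
        float v = roundf(w[i] / scale);
        if (v > 127.0f)  v = 127.0f;
        if (v < -128.0f) v = -128.0f;
        q[i] = (int8_t)v;
    }
    return scale;  /* keep this around to dequantize the outputs */
}
```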
Microsoft, Google, and Apple are all quietly integrating NPUs into their devices, and implementing the software infrastructure in their operating systems to do on-device classification of content: Windows Recall, Google SafetyCore, and Apple Intelligence. These services are ostensibly marketed as being for your benefit, while all are privacy and surveillance nightmares. When the security-breaking features of these systems are mentioned, each company touts convoluted workarounds to justify the tech.
Why would these companies risk forcing these unwanted, unpopular, insecure, expensive, and unnecessary features onto their collective user bases? The real reason is to capture everything you do and store on your device, use the tensor hardware you may not even know you purchased to analyze the data locally, then export and sell that “anonymized” information to advertisers and the government. All while cryptographically tying the data to your device, and the device to you, for “security”. This enables mass surveillance, digital rights management, and targeted advertising on a scale and depth previously unseen. Who needs a backdoor or a quantum computer to break consumer-grade encryption when you can just locally record everything everyone does and analyze it automatically at the hardware level?
Each of these providers is already desperate to scan, analyze, and classify your content:
Microsoft has been caught using your stored passwords to decrypt archives uploaded to OneDrive.
Apple developed forced client-side scanning for CSAM before backlash shut it down. They already locally scan your photos with a machine-learning classification algorithm whether you like it or not. You can’t turn it off.
Google recently implemented local content scanning with SafetyCore to “protect you from unwanted content like spam”. Then why is it scanning your photo library?
I would rather saw off my own nuts with a rusty spork than willfully purchase a device with an integrated NPU. I fear that in the next 5-10 years, you won’t be able to avoid them. We are paying for the edge hardware being used for our own unwilling surveillance. Then our tax dollars are paid to these tech companies to purchase the data!
Do you trust the rising fascist regimes and their tech lackeys in America and the UK to use this power morally and responsibly?
Do you really believe that these features that you didn’t ask for, that you cannot disable, and are baked directly into the hardware, are for your benefit?
They’re not selling chips. They’re selling stock.
Microsoft, Google, and Apple are all quietly integrating NPUs into their devices, and implementing the software infrastructure in their operating systems to do on-device classification of content:
Too late. Apple has had an NPU in their phones since 2017. It’s been standard on flagship Android devices since 2020, with various AMD and Intel processors starting around the same time.
I’m not here to defend LLMs or AI features, but a comment that starts with such a misinformed assessment of the state of reality reminds me of someone spouting off about chemtrails and weather machines.
At least NPUs actually exist.
If someone wants to avoid this stuff, they are going to need to pick an open-source platform that does not use these processor features.
Bring on the games that require AI
I’m pretty sure most video games require AI already. I struggle to name ones that don’t use AI. Some that I can think of are snake, two-player pong, and two-player chess.
Neural nets, on the other hand - I find it hard to imagine running an NN locally without impacting the game’s performance.
I was excited when I learned that a new business laptop had a removable battery, a decent graphics card, and 1 TB of storage standard. I planned to buy it used for a fraction of its current price (2k USD new) once some doofus got bored of underusing their machine and decided to trade up. Then I saw the AI chip and my desire wavered. You think there will ever be workarounds to make use of this garbage? I really want that removable battery.
That NPU is a math coprocessor. It can be very useful. It’s like a CUDA core.
A CUDA core is just a vector processor like every GPU since the late ’90s has been made of, but with a different name so it sounds special. It doesn’t just run CUDA, it runs everything else a GPU has traditionally been for, too, and that was stuff people were doing before CUDA was introduced. There are lots of tasks that require the same sequence of operations to be applied to groups of 32 numbers.
An NPU is a much more specialised piece of hardware, and it’s only really neural network training and inference that it can help with. There aren’t many tasks that require one operation to be applied over and over to groups of hundreds of numbers. Most people aren’t finding that they’re spending lots of time waiting for neural network inference or draining their batteries doing neural network inference, so making it go faster and use less power isn’t a good use of their money compared to making their computer better at the things they do actually do.
Yes, I agree, and if it must run a neural network it could do it on the GPU; an NPU is not necessary.
Someone with the expertise should correct me if I am wrong; it’s been 4-5 years since I learnt about NPUs during my internship so I am very rusty:
You don’t even need a GPU if all you want to do is to run - i.e. perform inference with - a neural network (abbreviating it to NN). Just a CPU would do if the NN is sufficiently lightweight. The GPU is only needed to speed up the training of NNs.
The thing is, the CPU is a general-purpose processor, so it won’t be able to run the NN optimally / as efficiently as possible. Imagine you want to do something that requires the NN and, as a result, you can’t do anything else on your phone / laptop (it won’t be a problem for desktops with GPUs though).
Where an NPU really shines is when there are performance constraints on the model: when it has to be fast (to be specific: run in real time), lightweight, and memory-efficient. Use cases include mobile computing and IoT.
In fact, there’s news about live translation on Apple AirPods. I think this may be the perfect scenario for using NPUs - ideally housed within the earphones directly, but if not, within the phone.
Disclaimer: I am only familiar with NPUs in the context of “old-school” convolutional neural networks (boy, tech moves so quickly). I am not familiar with NPUs for transformers - and LLMs by extension - but I won’t be surprised if NPUs have been adapted to work with them.
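To tie the earlier point together: inference for one layer of such a network boils down to a big pile of multiply-accumulates, roughly like the sketch below (illustrative only; the function name and int8 layout are assumptions, not any real framework’s API). An NPU is essentially dedicated hardware for exactly this loop, which is why it can hit real-time speeds at low power where a general-purpose CPU core looping like this struggles.

```c
#include <stdint.h>
#include <stddef.h>

/* One fully connected layer with int8 weights/activations and an int32
 * accumulator. This multiply-accumulate pattern is essentially the whole
 * job of an NPU, just done massively in parallel at much lower power
 * than a general-purpose core iterating one element at a time. */
void dense_int8(const int8_t *x, const int8_t *w, const int32_t *bias,
                int32_t *out, size_t in_dim, size_t out_dim)
{
    for (size_t o = 0; o < out_dim; o++) {
        int32_t acc = bias[o];
        for (size_t i = 0; i < in_dim; i++)
            acc += (int32_t)x[i] * (int32_t)w[o * in_dim + i];
        out[o] = acc;  /* a real layer would rescale this back to int8 */
    }
}
```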
Yes, that’s right, an NN can be run on either the CPU or the GPU, but a GPU isn’t required. My point is that an NPU isn’t very useful for the average laptop user right now, while a GPU (which also accelerates neural network execution) is more versatile. In my opinion, it’s better to increase the GPU’s power and run NNs on it when necessary than to add an NPU that’s useless 95% of the time.
I expect we’re eventually going to start seeing AI in more sensible places and these NPUs will prove useful. Hopefully soon the bubble will burst and we’ll stop seeing it crammed in everywhere, then it’ll start being used where it actually improves a product rather than wherever an LLM will fit.
i’m upvoting this comment from my internet enabled toaster.
I mean, even if the NPU’s space can’t be replaced by more useful components easily or cheaply, just removing it is sure to save a small amount of power, which equates to a possibly not-so-small amount of heat that needs to be dissipated, which takes a not insignificant amount of power and/or requires slowing the system down. Additionally, the remaining pathways could likely be placed to create less interference with each other and less direct heat transfer, which likely means more stability overall.
Of course without a comparable processor without the NPU to compare to, these are really difficult things to quantify, but are true of nearly all compact chips on power sensitive platforms.
No, it will not save any power at all. Power is only consumed during switching, so when a module is properly clock-gated, it does not consume any power. There are many parts of the chip that are dark; for example, a full CPU core could be disabled for various reasons, and it does not affect power consumption while it’s dark. Maybe you know the Steam Deck: it is a battery-operated device with the best power efficiency in its class. But what people don’t know is that it has more than 20% of its chip area disabled, as it relates to stereoscopic displays, because the same exact chip is also used by some AR or VR goggles, I forget the name.
Also, in general, modern chips are limited more by thermals than by area. So realistically, even if you remove the NPU, you won’t be able to place anything high-power there anyway; maybe you can put a couple hundred KB of SRAM for cache in its place, but it won’t matter much in the end.
I’m curious whether NPUs can be used with Ollama or local LLMs; if they can’t, then they’re completely useless, and also useless if you don’t use AI at all.