Google has unveiled a new algorithm that sharply reduces the memory large language models (LLMs) consume, triggering a broad selloff in memory-related stocks.
According to Google, its new TurboQuant algorithm can reduce LLM memory requirements by up to 6×, while boosting computation speed by 8×.
However, LLMs rely on two types of memory during operation:
Weights (the model's learned parameters, which take a fixed amount of memory)
KV Cache (the attention keys and values stored during inference, which grows with context length)
TurboQuant compresses only the KV Cache, not the model weights; the rough sizing sketch below shows why that distinction still matters.
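A minimal sketch of the split, assuming a 7B-parameter, Llama-style model shape (my own illustration, not Google's code): weight memory is a fixed cost, while the KV Cache grows with every token of context, so at long contexts the cache dominates and a 6× cut there is substantial.

```python
# Rough sizing sketch: fixed weight memory vs. context-dependent KV cache.
# The model shape is an assumption (a 7B-parameter, Llama-style transformer),
# not a description of Google's setup.

def weight_bytes(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Weights are a fixed cost: parameter count x precision (FP16 = 2 bytes)."""
    return n_params * bytes_per_param

def kv_cache_bytes(n_tokens: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 32,
                   head_dim: int = 128,
                   bytes_per_elem: float = 2.0) -> float:
    """The KV cache stores one key and one value vector per layer, per token,
    so it grows linearly with context length."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_tokens

GIB = 1024 ** 3
print(f"weights (7B, FP16):      {weight_bytes(7e9) / GIB:5.1f} GiB")   # fixed cost
print(f"KV cache @ 10k tokens:   {kv_cache_bytes(10_000) / GIB:5.1f} GiB")
print(f"KV cache @ 100k tokens:  {kv_cache_bytes(100_000) / GIB:5.1f} GiB")
# A ~6x KV-only compression shrinks exactly the part that scales with context:
print(f"6x-compressed KV @ 100k: {kv_cache_bytes(100_000, bytes_per_elem=2.0 / 6) / GIB:5.1f} GiB")
```

Under these assumed numbers, the weights sit near 13 GiB no matter what, while an uncompressed KV Cache at 100k tokens approaches 49 GiB: long-context memory lives in the cache, which is the part TurboQuant targets.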
More importantly, lower memory usage could actually accelerate demand growth, not reduce it.
For example, if 128GB of memory previously held a 10,000-token context, the same budget can now hold roughly 60,000 tokens, and process them faster on top of that. This enables companies to build more complex and longer-context AI applications.
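A quick sanity check on that arithmetic: with a fixed memory budget, token capacity scales inversely with KV-cache bytes per token, so a 6× compression yields roughly 6× the context. The per-token figure below is simply what the "128GB ≈ 10,000 tokens" example implies, not a measured number.

```python
# Back-of-the-envelope check: tokens_in_budget = budget / bytes_per_token,
# so shrinking bytes_per_token 6x multiplies context capacity by 6.
GIB = 1024 ** 3
budget = 128 * GIB
bytes_per_token = budget / 10_000   # implied by the example (~13.1 MiB/token)
compression = 6.0                   # the reported "up to 6x" reduction

tokens_before = budget / bytes_per_token
tokens_after = budget / (bytes_per_token / compression)
print(f"{tokens_before:,.0f} tokens -> {tokens_after:,.0f} tokens")  # 10,000 -> 60,000
```

Note that this covers only memory capacity; the 8× figure Google cites is a separate claim about computation speed.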
As a result, instead of cutting purchases, enterprises are likely to buy more chips to support these advanced workloads.
Additionally, models that once required expensive servers may now run on smartphones and PCs, potentially triggering a new wave of memory upgrades rather than reducing demand.
@TigerPM @TigerStars @TigerObserver @Daily_Discussion @Tiger_comments