TurboQuant compresses the "context" KV cache by 4x, nowhere near 6x. It doesn't reduce the memory needed to load the LLM itself. A usable (not just playable) local LLM needs 128GB or more of RAM, and cloud models are typically above 1TB. Of that, the memory required for context ranges from 16KB to 1024KB depending on the configuration. The compression is supposedly linear, so it cuts context memory to 1/4 of current levels.

Anybody who works in technology will tell you this: the amount of data grows to fill the capacity available. Better compression just means more context, not less memory.

The other thing to note is that Micron isn't selling DDR5 RAM, whose spot price is under pressure from new Chinese manufacturers. Micron sells HBM4 to Nvidia, AMD, and various hyperscalers. It doesn't matter whose accelerator you use (tensor cores, CUDA, ROCm, Arc), they all still need memory to compete with the other brands.
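To see why 4x compression tends to mean "more context at the same budget" rather than "less memory," here is a rough back-of-the-envelope sketch. The model dimensions below (80 layers, 8 KV heads of dim 128, fp16, 128K tokens) are hypothetical, Llama-70B-class numbers chosen for illustration only, not figures from the post:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Estimate KV cache size: K and V tensors (hence the factor 2),
    one per layer, each of shape batch x seq_len x kv_heads x head_dim."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch

# Baseline fp16 cache for a hypothetical 70B-class model at 128K context.
full = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=128_000)
print(f"fp16 cache:        {full / 2**30:.1f} GiB")        # ~39.1 GiB

# 4x compression: either a quarter of the memory at the same context...
print(f"4x compressed:     {full / 4 / 2**30:.1f} GiB")    # ~9.8 GiB

# ...or, in practice, 4x the context length in the same memory budget.
same_budget = kv_cache_bytes(80, 8, 128, seq_len=4 * 128_000) / 4
print(f"4x context, 4x compressed: {same_budget / 2**30:.1f} GiB")
```

The last line is the point of the post: operators tend to spend the savings on longer context windows, so total memory demand stays flat or grows.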
Modified 04-01 00:09
Mag 7 Forced Down Again?! Start of Tech Winter?
The market's stark new reality: as oil surges, tech valuations sink. Investors are now fiercely debating whether this volatility is just a temporary adjustment or the beginning of a longer-term downturn.
How do you view the Mag 7's trend?
Microsoft is down over 20% YTD.
When and where will the entry zone be?
Which stocks are oversold, and which are still overvalued?
Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation on acquiring or disposing of any financial products, any associated discussions, comments, or posts by author or other users should not be considered as such either. It is solely for general information purpose only, which does not consider your own investment objectives, financial situations or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information, investors should do their own research and may seek professional advice before investing.