# LLM-Quantization

**Repository Path**: cy1227/llm-quantization

## Basic Information

- **Project Name**: LLM-Quantization
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-01-27
- **Last Updated**: 2026-01-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Notes and summaries from quantizing LLMs.

Companion articles:

- [Layer-by-layer inference techniques](https://mp.weixin.qq.com/s/dHLg_aX7hvuna-QI6Rhh-w)
- [Introductory examples of quantization and inference](https://mp.weixin.qq.com/s/fkO1CkFkS3o0JuzVJ3G_pg)
- [The Deepseek-R1 inference overflow problem](https://mp.weixin.qq.com/s/wdBLvvwFs0NCHP_0DCNBXA)
- [An in-depth look at quantizing Qwen-2.5-VL-7B-Instruct](https://mp.weixin.qq.com/s/72lsiteZnVUvbeZJbkfr_Q)
- [A walkthrough of the QQQ paper](https://mp.weixin.qq.com/s/-LX4amLWzsrlloDsLcy0Qg)
- [Using rotation matrices in quantization](https://mp.weixin.qq.com/s/H-Ytyy5nxiJvsEgUaESq_Q)
- [Quantizing qwen3 with quarot and running online inference](https://mp.weixin.qq.com/s?__biz=MzkyMDE3MDEwMw==&mid=2247484899&idx=1&sn=681793b6540a85f2d237914b93248bdf&chksm=c0ddb5dae3128d77505b074d5e4d96811550e2eaaa66c45ac981aa5ab973094fc238d9a63874&scene=126&sessionid=1752561818#rd)
- [Adapting ResQ (ResQuant) to quantize Qwen3 models](http://mp.weixin.qq.com/s?__biz=MzkyMDE3MDEwMw==&mid=2247484899&idx=2&sn=96321932038a4086b318680e89e372b4&chksm=c06b8ccd7cfe1c912237f169c1f4c19c942046a381d00ca6c85f1dcfd2da888b910f996aabf6&scene=126&sessionid=1752561818#rd)
- [Running inference on w8a8-quantized models with transformers](http://mp.weixin.qq.com/s?__biz=MzkyMDE3MDEwMw==&mid=2247484899&idx=3&sn=c0838def7d4c6d42844ff01715790ab6&chksm=c0f0450d91055ccde6725bc597084c35c9e25e01a63140dda3e650c2284a78dfa59f2c4f7e32&scene=126&sessionid=1752561818#rd)
- [Using the qwen2 model to run qwen3 inference](http://mp.weixin.qq.com/s?__biz=MzkyMDE3MDEwMw==&mid=2247484866&idx=1&sn=8e49eb8d17c06ac7f417ae86c40a9cb0&chksm=c0627bd7fb14220c2b48dc096825f38775fb0ff3372b459edb0fa1009543dc05b292000c0cea&scene=126&sessionid=1752561818#rd)
- [Best practices for quarot rotation](http://mp.weixin.qq.com/s?__biz=MzkyMDE3MDEwMw==&mid=2247484866&idx=2&sn=49f815806891a2f1c6ff6d7f52aa3d64&chksm=c042936e5cd54d8ebf9809ca5422703520b9fe4ace6b542ac76ddfa06ade7dda495818bdd5ad&scene=126&sessionid=1752561818#rd)
- [What is the mxfp4 format used by sageattention3 and gpt-oss?](https://mp.weixin.qq.com/s/WVwxCDy8uQ6RarA3olveVQ)
- [Using sageattention in LLMs](https://mp.weixin.qq.com/s/V7dtPpOubFN1ZTvoRQRk2g)
- [Applying rotation transforms to the LLM in Qwen3-VL](https://zhuanlan.zhihu.com/p/1974880598964339199)
- [Applying the quarot rotation transform to the gpt_oss model](https://mp.weixin.qq.com/s/jN1QNP0By9zz4df_96ZmRg)
- [Why can't nvfp4 quantization use the hadamard transform?](https://mp.weixin.qq.com/s/JUbpK8dtq0PQM9vul1DSHg)
- [Why does the hadamard transform work for mxfp4 but not for nvfp4?](https://mp.weixin.qq.com/s/Th5xHwGXsqgXz3C2xQXDNA)
- [How should nvfp4 and mxfp4 be quantized?](https://mp.weixin.qq.com/s/DztFrth3Mrleg2-fvBrelA)