# ai-infra-learning
**Repository Path**: cnbubblefish/ai-infra-learning
## Basic Information
- **Project Name**: ai-infra-learning
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-16
- **Last Updated**: 2025-11-16
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
## AI Infra Learning Sessions
| Topic | Date | Pre-reading | Recording | Notes | Feedback & Follow-up Questions |
| --- | --- | --- | --- | --- | --- |
| vLLM Quickstart | 2025-05-11 | [Doc: vLLM](https://docs.vllm.ai/en/latest/index.html) | [AI INFRA Learning 01 - LLM Landscape Overview / vLLM Quickstart](https://www.bilibili.com/video/BV1T2EGzLEHi) | [01-vllm-quickstart](https://github.com/cr7258/ai-infra-learning/blob/main/lesson/01-vllm-quickstart.md) | |
| PagedAttention | 2025-05-25 | [Blog: vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html)<br>[Paper: Efficient Memory Management for Large Language Model Serving with PagedAttention](https://arxiv.org/pdf/2309.06180)<br>[Video: Fast LLM Serving with vLLM and PagedAttention](https://www.bilibili.com/video/BV1WUYieQEyL) | [AI INFRA Learning 02 - vLLM PagedAttention Paper Deep Dive](https://www.bilibili.com/video/BV1GWjjzfE1b) | [02-pagedattention](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/02-pagedattention) | [02-PagedAttention Feedback](https://github.com/cr7258/ai-infra-learning/issues/2) |
| Prefix Caching | 2025-06-08 | [Doc: Automatic Prefix Caching](https://docs.vllm.ai/en/stable/features/automatic_prefix_caching.html)<br>[Design Doc: Automatic Prefix Caching](https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html)<br>[Paper: SGLang: Efficient Execution of Structured Language Model Programs](https://arxiv.org/abs/2312.07104) | [AI INFRA Learning 03 - Prefix Caching Principles Explained](https://www.bilibili.com/video/BV1jgTRzSEjS) | [03-prefix-caching](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/03-prefix-caching) | |
| Speculative Decoding | 2025-06-22 | [Doc: Speculative Decoding](https://docs.vllm.ai/en/stable/features/spec_decode.html)<br>[Blog: How Speculative Decoding Boosts vLLM Performance by up to 2.8x](https://blog.vllm.ai/2024/10/17/spec-decode.html)<br>[Video: Hacker's Guide to Speculative Decoding in VLLM](https://www.youtube.com/watch?v=9wNAgpX6z_4)<br>[Video: Speculative Decoding in vLLM](https://www.youtube.com/watch?v=eVJBFajJRIU)<br>[Paper: Accelerating Large Language Model Decoding with Speculative Sampling](https://arxiv.org/abs/2302.01318)<br>[Paper: Fast Inference from Transformers via Speculative Decoding](https://arxiv.org/abs/2211.17192) | [AI INFRA Learning 04 - Speculative Decoding Implementation Approaches](https://www.bilibili.com/video/BV1Q5KWzQEhn) | [04-speculative-decoding](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/04-speculative-decoding) | |
| Chunked-Prefills | 2025-07-13 | [Doc: vLLM Chunked Prefill](https://docs.vllm.ai/en/stable/configuration/optimization.html#chunked-prefill_1)<br>[Paper: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills](https://arxiv.org/abs/2308.16369)<br>[Paper: DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference](https://arxiv.org/abs/2401.08671)<br>[Paper: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve](https://arxiv.org/abs/2403.02310) | [AI INFRA Learning 05 - Chunked-Prefills (Chunked Prefilling)](https://www.bilibili.com/video/BV1f2uczGEqt) | [05-chunked-prefills](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/05-chunked-prefills) | [05-Chunked-Prefills Feedback & Follow-up Questions](https://github.com/cr7258/ai-infra-learning/issues/1) |
| Disaggregating Prefill and Decoding | 2025-09-21 | [Doc: Disaggregated Prefilling](https://docs.vllm.ai/en/stable/features/disagg_prefill.html#disaggregated-prefilling-experimental)<br>[Doc: vLLM Production Stack Disaggregated Prefill](https://docs.vllm.ai/projects/production-stack/en/latest/tutorials/disagg.html)<br>[Paper: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving](https://arxiv.org/abs/2401.09670)<br>[Paper: Splitwise: Efficient generative LLM inference using phase splitting](https://arxiv.org/abs/2311.18677)<br>[Video: vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM](https://www.youtube.com/watch?v=FPr37jCOvrA) | [AI INFRA Learning 06 - PD Disaggregated Inference Architecture Explained](https://www.bilibili.com/video/BV1ZTWAzmEEc) | [06-disaggregating-prefill-and-decoding](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/06-disaggregating-prefill-and-decoding) | [06-PD Disaggregation Feedback](https://github.com/cr7258/ai-infra-learning/issues/3) |
| LoRA Adapters | | [Doc: LoRA Adapters](https://docs.vllm.ai/en/stable/features/lora.html)<br>[Paper: LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) | | | |
| Quantization | | | | | |
| Distributed Inference and Serving | | [Doc: Distributed Inference and Serving](https://docs.vllm.ai/en/stable/serving/distributed_serving.html) | | | |
## Discussion Group (please state your purpose when requesting to join)
## WeChat Official Account
