# ai-infra-learning

**Repository Path**: cnbubblefish/ai-infra-learning

## Basic Information

- **Project Name**: ai-infra-learning
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-11-16
- **Last Updated**: 2025-11-16

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## AI Infra Study Sessions

| Topic | Date | Pre-reading | Recording | Notes | Feedback & Follow-up Questions |
| --- | --- | --- | --- | --- | --- |
| vLLM Quickstart | 2025-05-11 | [Doc: vLLM](https://docs.vllm.ai/en/latest/index.html) | [AI INFRA Study 01 - LLM Landscape Overview / vLLM Quickstart](https://www.bilibili.com/video/BV1T2EGzLEHi) | [01-vllm-quickstart](https://github.com/cr7258/ai-infra-learning/blob/main/lesson/01-vllm-quickstart.md) | |
| PagedAttention | 2025-05-25 | [Blog: vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html)<br><br>[Paper: Efficient Memory Management for Large Language Model Serving with PagedAttention](https://arxiv.org/pdf/2309.06180)<br><br>[Video: Fast LLM Serving with vLLM and PagedAttention](https://www.bilibili.com/video/BV1WUYieQEyL) | [AI INFRA Study 02 - vLLM PagedAttention Paper Deep Dive](https://www.bilibili.com/video/BV1GWjjzfE1b) | [02-pagedattention](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/02-pagedattention) | [02-PagedAttention Feedback](https://github.com/cr7258/ai-infra-learning/issues/2) |
| Prefix Caching | 2025-06-08 | [Doc: Automatic Prefix Caching](https://docs.vllm.ai/en/stable/features/automatic_prefix_caching.html)<br><br>[Design Doc: Automatic Prefix Caching](https://docs.vllm.ai/en/stable/design/v1/prefix_caching.html)<br><br>[Paper: SGLang: Efficient Execution of Structured Language Model Programs](https://arxiv.org/abs/2312.07104) | [AI INFRA Study 03 - Prefix Caching Principles Explained](https://www.bilibili.com/video/BV1jgTRzSEjS) | [03-prefix-caching](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/03-prefix-caching) | |
| Speculative Decoding | 2025-06-22 | [Doc: Speculative Decoding](https://docs.vllm.ai/en/stable/features/spec_decode.html)<br><br>[Blog: How Speculative Decoding Boosts vLLM Performance by up to 2.8x](https://blog.vllm.ai/2024/10/17/spec-decode.html)<br><br>[Video: Hacker's Guide to Speculative Decoding in VLLM](https://www.youtube.com/watch?v=9wNAgpX6z_4)<br><br>[Video: Speculative Decoding in vLLM](https://www.youtube.com/watch?v=eVJBFajJRIU)<br><br>[Paper: Accelerating Large Language Model Decoding with Speculative Sampling](https://arxiv.org/abs/2302.01318)<br><br>[Paper: Fast Inference from Transformers via Speculative Decoding](https://arxiv.org/abs/2211.17192) | [AI INFRA Study 04 - Speculative Decoding Implementation Approaches](https://www.bilibili.com/video/BV1Q5KWzQEhn) | [04-speculative-decoding](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/04-speculative-decoding) | |
| Chunked-Prefills | 2025-07-13 | [Doc: vLLM Chunked Prefill](https://docs.vllm.ai/en/stable/configuration/optimization.html#chunked-prefill_1)<br><br>[Paper: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills](https://arxiv.org/abs/2308.16369)<br><br>[Paper: DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference](https://arxiv.org/abs/2401.08671)<br><br>[Paper: Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve](https://arxiv.org/abs/2403.02310) | [AI INFRA Study 05 - Chunked-Prefills](https://www.bilibili.com/video/BV1f2uczGEqt) | [05-chunked-prefills](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/05-chunked-prefills) | [05-Chunked-Prefills Feedback & Follow-up Questions](https://github.com/cr7258/ai-infra-learning/issues/1) |
| Disaggregating Prefill and Decoding | 2025-09-21 | [Doc: Disaggregated Prefilling](https://docs.vllm.ai/en/stable/features/disagg_prefill.html#disaggregated-prefilling-experimental)<br><br>[Doc: vLLM Production Stack Disaggregated Prefill](https://docs.vllm.ai/projects/production-stack/en/latest/tutorials/disagg.html)<br><br>[Paper: DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving](https://arxiv.org/abs/2401.09670)<br><br>[Paper: Splitwise: Efficient generative LLM inference using phase splitting](https://arxiv.org/abs/2311.18677)<br><br>[Video: vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM](https://www.youtube.com/watch?v=FPr37jCOvrA) | [AI INFRA Study 06 - PD Disaggregation Inference Architecture Explained](https://www.bilibili.com/video/BV1ZTWAzmEEc) | [06-disaggregating-prefill-and-decoding](https://github.com/cr7258/ai-infra-learning/tree/main/lesson/06-disaggregating-prefill-and-decoding) | [06-PD Disaggregation Feedback](https://github.com/cr7258/ai-infra-learning/issues/3) |
| LoRA Adapters | | [Doc: LoRA Adapters](https://docs.vllm.ai/en/stable/features/lora.html)<br><br>[Paper: LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) | | | |
| Quantization      |      |       |                      | | |
| Distributed Inference and Serving | | [Doc: Distributed Inference and Serving](https://docs.vllm.ai/en/stable/serving/distributed_serving.html)|  || |

## Discussion Group (please state your purpose when requesting to join)

<img src=https://github.com/user-attachments/assets/b0451ab2-b16e-4079-8b0a-b5893097572a width=60% />

## WeChat Official Account

<img src=https://github.com/user-attachments/assets/d2362785-c05a-4b5b-aaa7-49e939ccfc02 width=50% />

![Search box promotional style, white version](https://github.com/user-attachments/assets/bf4c1c47-4e85-407b-8143-68a59b474186)