# AIDB

**Repository Path**: lbjuechen/AIDB

## Basic Information

- **Project Name**: AIDB
- **Description**: AI4DB
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2023-07-31
- **Last Updated**: 2024-03-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


## Researches and Practices in Autonomous Databases

<p align="center">
    【English | <a href="README-Chinese.md">中文</a>】
</p>


We continuously update this list of *autonomous database works*, building on our past tutorials.

Kindly let us know if we have missed any great papers. Thank you!

Conference deadlines: https://ccfddl.github.io/

(Note: *conference postponements* may not be reflected promptly, so treat these deadlines as a reference only.)

Table of Contents
=================

* [0. Survey and Tutorial (16)](#0-survey-and-tutorial)
* [1. Database Configuration](#1-database-configuration)
    * [1.1 Knob Tuner (20)](#knob-tuner)
    * [1.2 View Advisor (5)](#view-advisor)
    * [1.3 Index Advisor (23)](#index-advisor)
    * [1.4 Partition Advisor (11)](#partition-advisor)
    * [1.5 Hybrid Advisor (2)](#hybrid-advisor)
* [2. Query Optimization](#2-query-optimization)
    * [2.1 Query Rewriter (12)](#query-rewriter)
    * [2.2 Cardinality Estimation (33)](#cardinality-estimation)
    * [2.3 Cost Estimation (3)](#cost-estimation)
    * [2.4 Plan Optimization (22)](#plan-optimization)
* [3. Workload Scheduling (2)](#3-workload-scheduling)
* [4. Database Design](#4-database-design)
    * [4.1 Learned Index (30)](#index)
    * [4.2 Learned Layout (7)](#layout)
    * [4.3 Query Execution (2)](#query-execution)
* [5. Database Monitoring (9)](#5-database-monitoring)
* [6. Database Diagnosis](#6-database-diagnosis)
    * [6.1 System Diagnosis (7)](#system-and-kernel-causes)
    * [6.2 Query Diagnosis (1)](#bottleneck-queries)
* [7. General Techniques](#7-general-techniques)
    * [7.1 Feature Engineering for DB (6)](#feature-engineering-for-db)
    * [7.2 Feature Engineering for AI (6)](#feature-engineering-for-ai)
    * [7.3 Model Transfer (1)](#model-transfer) 
    * [7.4 Query And Data Generation (5)](#query-and-data-generation)
* [8. Database Frameworks (16)](#8-database-frameworks)
* [9. Demonstrations (13)](#9-demonstrations)
* [S1. Large Language Models x Database (10)](#s1-large-language-models-meet-database)
* [S2. AI Resources (4)](#s2-ai-paper-and-code-list)
* [S3. Data And SQLs (3)](#s3-open-datasets-and-sqls)


---
*Great talks you should not miss >>*

***Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation.*** Andy Pavlo, Matthew Butrovich, Lin Ma, et al. [[link](https://vldb.org/pvldb/vol14/p3211-pavlo.pdf)]


***Towards instance-optimized data systems.*** Tim Kraska. [[link](http://vldb.org/pvldb/vol14/p3222-kraska.pdf)]


***AI-Native Database.*** Guoliang Li. [[link](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/aidb-slides.pdf)]

***From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management.*** Immanuel Trummer. [[link](https://drive.google.com/file/d/1U-2j8oi5au3nuYwPIlhnno7c6UNDfifl/view)]


***Retrieval-based Language Models and Applications.*** Akari Asai, Sewon Min, Zexuan Zhong, Danqi Chen. [[link](https://acl2023-retrieval-lm.github.io/)]

---

## 0. Survey and Tutorial

### Survey

**Database meets deep learning: Challenges and opportunities.** ![](https://img.shields.io/badge/-ai4db-Informational)

*Wei Wang, Meihui Zhang, Gang Chen, et al.  SIGMOD Record, 2016.* [[paper](https://doi.org/10.1145/3003665.3003669)]


**Database Meets Artificial Intelligence: A Survey.** ![](https://img.shields.io/badge/-ai4db-Informational) ![](https://img.shields.io/badge/-db4ai-informational)

*Xuanhe Zhou, Chengliang Chai, Guoliang Li, et al. TKDE, 2020.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/aidb.pdf)]

**A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration.** ![](https://img.shields.io/badge/-learned_optimizer-orange)

*Hai Lan, Zhifeng Bao, Yuwei Peng. Data Science and Engineering, 2021.* [[paper](https://link.springer.com/article/10.1007/s41019-020-00149-7)]

**A Survey on Deep Reinforcement Learning for Data Processing and Analytics.** ![](https://img.shields.io/badge/-rl4db-Informational)

*Qingpeng Cai, Can Cui, Yiyuan Xiong, et al. TKDE, 2022.* [[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9723570)]


**Self-Driving Database Papers (CMU Spring Course)**. 2022. ![](https://img.shields.io/badge/-ai4db-Informational)

https://15799.courses.cs.cmu.edu/spring2022/schedule.html


**Automatic Database Knob Tuning: A Survey.** ![](https://img.shields.io/badge/-learned_knob_tuning-brown)

*Xinyang Zhao, Xuanhe Zhou, Guoliang Li. TKDE, 2023.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/tuning-survey.pdf)] [[code](https://github.com/evolveDB/tuning-survey)]


### Tutorial

**From auto-tuning one size fits all to self-designed and learned data-intensive systems.** ![](https://img.shields.io/badge/-ai4db-Informational)

*Stratos Idreos, Tim Kraska. SIGMOD, 2019.*  [[paper](https://doi.org/10.1145/3299869.3314034)]


**Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems.** ![](https://img.shields.io/badge/-learned_tuning-brown)

*Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, Shivnath Babu. VLDB, 2019.*  [[paper](http://www.vldb.org/pvldb/vol12/p1970-lu.pdf)] [[slides](https://pdfs.semanticscholar.org/a784/25f87ec066c51043380f93502950e044cca3.pdf)]


**Tutorial: Adaptive Replication and Partitioning in Data Systems.** ![](https://img.shields.io/badge/-auto_db_cluster-Informational)

*Brad Glasbergen, Michael Abebe, Khuzaima Daudjee. Middleware, 2018.*  [[paper](https://cs.uwaterloo.ca/~kdaudjee/AdaptiveTutorial.pdf)]


**A Tutorial on Learned Multi-dimensional Indexes.** ![](https://img.shields.io/badge/-learned_index-black)

*Abdullah Al-Mamun, Hao Wu, Walid G. Aref. SIGSPATIAL, 2020.*  [[paper](https://dl.acm.org/doi/10.1145/3397536.3426358)]


**AI Meets Database: AI4DB and DB4AI.** ![](https://img.shields.io/badge/-ai4db-Informational) ![](https://img.shields.io/badge/-db4ai-informational)

*Guoliang Li, Xuanhe Zhou, Lei Cao. SIGMOD, 2021.*  [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/sigmod21-tutorial-paper.pdf)] [[slides](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/sigmod21-tutorial-slides.pdf)]


**Machine Learning for Databases.** ![](https://img.shields.io/badge/-ai4db-Informational)

*Guoliang Li, Xuanhe Zhou, Lei Cao. VLDB, 2021.*  [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb21-tutorial-paper.pdf)] [[slides](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb21-tutorial-slides.pdf)]


**Machine Learning for Cloud Data Systems: the Promise, the Progress, and the Path Forward.** ![](https://img.shields.io/badge/-learned_cloud_db-Informational)

*Alekh Jindal, Matteo Interlandi. VLDB, 2021.*  [[paper](http://vldb.org/pvldb/vol14/p3202-jindal.pdf)]


**Workload-Aware Performance Tuning for Autonomous DBMSs.** ![](https://img.shields.io/badge/-learned_tuning-brown)

*Zhengtong Yan, Jiaheng Lu, Naresh Chainani, et al. ICDE, 2021.*  [[paper](https://www2.helsinki.fi/sites/default/files/atoms/files/icde_2021_tutorial_latest.pdf)]


**Learned Query Optimizer: At the Forefront of AI-Driven Databases.** ![](https://img.shields.io/badge/-learned_optimizer-orange)

*Rong Zhu, Ziniu Wu, Chengliang Chai, et al. EDBT, 2022.*  [[paper](https://openproceedings.org/2022/conf/edbt/tutorial-1.pdf)]


**From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management.** ![](https://img.shields.io/badge/-llm4db-Informational)

*Immanuel Trummer. VLDB, 2022.*  [[paper](https://dl.acm.org/doi/pdf/10.14778/3554821.3554896)]


## 1. Database Configuration

### Knob Tuner

#### Heuristic 

PGTune: https://pgtune.leopard.in.ua. ![](https://img.shields.io/badge/-rule-green) 

**OpenTuner: An Extensible Framework for Program Autotuning** ![](https://img.shields.io/badge/-search-yellowgreen)   

*Ansel J, Kamil S, Veeramachaneni K, et al. PACT, 2014.* [[paper](https://dl.acm.org/doi/pdf/10.1145/2628071.2628092)] 

**BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning** ![](https://img.shields.io/badge/-search-yellowgreen)  

*Zhu Y, Liu J, Guo M, et al. SoCC, 2017.* [[paper](https://dl.acm.org/doi/abs/10.1145/3127479.3128605)]

---

#### BO-based

**Tuning Database Configuration Parameters with iTuned** ![](https://img.shields.io/badge/-gaussian_process-orange)

*Duan, S., Thummala, V., & Babu, S. VLDB, 2009.* [[paper](https://users.cs.duke.edu/~shivnath/ituned/paper.pdf)]

**Automatic database management system tuning through large-scale machine learning** ![](https://img.shields.io/badge/-gaussian_process-orange)  

*Van Aken D, Pavlo A, Gordon G J, et al. SIGMOD, 2017.* [[paper](https://dl.acm.org/doi/pdf/10.1145/3035918.3064029)]

**Black or White? How to Develop an AutoTuner for Memory-based Analytics** ![](https://img.shields.io/badge/-gaussian_process-orange) ![](https://img.shields.io/badge/-Featurization-9cf)   

*Kunjir M, Babu S. SIGMOD, 2020.* [[paper](https://dl.acm.org/doi/pdf/10.1145/3318464.3380591)]

**ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases** ![](https://img.shields.io/badge/-gaussian_process-orange) ![](https://img.shields.io/badge/-Model_Transferring-8cfff3)

*Zhang X, Wu H, Chang Z, et al. SIGMOD, 2021.* [[paper](https://15799.courses.cs.cmu.edu/spring2022/papers/08-knobs3/zhang-sigmod2021.pdf)]

**CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions** ![](https://img.shields.io/badge/-contextual_gaussian_process-orange)   

*Cereda S, Valladares S, Cremonesi P, et al. VLDB, 2021.* [[paper](https://www.cl.cam.ac.uk/~ey204/teaching/ACS/R244_2021_2022/papers/CGPTUNER_VLDB_2021.pdf)]

**Towards Dynamic and Safe Configuration Tuning for Cloud Databases** ![](https://img.shields.io/badge/-bounded_gaussian_process-orange)  

*Zhang X, Wu H, Li Y, et al. SIGMOD, 2022.* [[paper](https://arxiv.org/pdf/2203.14473)]

**LlamaTune: Sample-Efficient DBMS Configuration Tuning** ![](https://img.shields.io/badge/-gaussian_process-orange)  

*Kanellis K, Ding C, Kroth B, et al. VLDB, 2022.* [[paper](https://arxiv.org/pdf/2203.05128)]

---

#### DL-based

**iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases**

*Jian Tan, Tieying Zhang, Feifei Li, et al. VLDB, 2019.* [[paper](http://www.vldb.org/pvldb/vol12/p1221-tan.pdf)]

---

#### RL-based

**An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning** 

*Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, et al. SIGMOD, 2019.* [[paper](https://dl.acm.org/doi/abs/10.1145/3299869.3300085)]

**QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning** ![](https://img.shields.io/badge/-query_encoding-blue)  

*Li G, Zhou X, Li S, et al. VLDB, 2019.* [[paper](https://15799.courses.cs.cmu.edu/spring2022/papers/08-knobs3/p2118-li.pdf)]

**WATuning: A Workload-Aware Tuning System with Attention-Based Deep Reinforcement Learning** ![](https://img.shields.io/badge/-pre_trained-grey)

*Ge J K, Chai Y F, Chai Y P.  JCST, 2021.* [[paper](https://link.springer.com/article/10.1007/s11390-021-1350-8)]

**The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that "Read the Manual"** ![](https://img.shields.io/badge/-llm-f5f5dc)  

*Trummer I. VLDB, 2021.* [[paper](http://vldb.org/pvldb/vol14/p1159-trummer.pdf)]

**DB-BERT: a Database Tuning Tool that “Reads the Manual”** ![](https://img.shields.io/badge/-llm-f5f5dc)  

*Trummer I. SIGMOD, 2022.* [[paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3517843)]

**HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements**

*Cai B, Liu Y, Zhang C, et al. SIGMOD, 2022.* [[paper](https://scholar.archive.org/work/mhkvbi2uwfdvfb2zhj73brzb6a/access/wayback/https://dl.acm.org/doi/pdf/10.1145/3514221.3517882)]

---

#### Knob Selection

**SARD: A statistical approach for ranking database tuning parameters**

*Debnath B K, Lilja D J, Mokbel M F. ICDE Workshops 2008.*  [[paper](https://www-users.cse.umn.edu/~mokbel/papers/SARD.pdf)]

**Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs**

*Kanellis K, Alagappan R, Venkataraman S. HotStorage 2020.* [[paper](https://www.usenix.org/system/files/hotstorage20_paper_kanellis.pdf)]

---

#### Benefit Estimation

**IWEK: An Interpretable What-If Estimator for Database Knobs** ![](https://img.shields.io/badge/-random_forest-orange)   ![](https://img.shields.io/badge/-transfer-8cfff3)  

*Yu Yan, Hongzhi Wang, Jian Geng, et al. arXiv 2023.* [[paper](https://arxiv.org/pdf/2307.16115.pdf)] 

---

#### Experiments

**An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems**

*Van Aken D, Yang D, Brillard S, et al. VLDB, 2021.* [[paper](https://www.cs.cmu.edu/~./pavlo/papers/p1241-aken.pdf)]

**Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation**

*Zhang X, Chang Z, Li Y, et al. VLDB, 2022.* [[paper](https://15799.courses.cs.cmu.edu/spring2022/papers/09-knobs4/zhang-techreport2021.pdf)]

<br>


### View Advisor

**Selecting subexpressions to materialize at datacenter scale**

*A. Jindal, K. Karanasos, S. Rao, and H. Patel. PVLDB, 11(7):800–812, 2018.* [[paper](http://www.vldb.org/pvldb/vol11/p800-jindal.pdf)]

**Automated generation of materialized views in Oracle**

*Ahmed, R., Bello, R., Witkowski, A., & Kumar, P. VLDB, 2020.* [[paper](https://doi.org/10.14778/3415478.3415533)]


**Computation reuse in analytics job service at Microsoft**

*Jindal, A., Qiao, S., Patel, H., Yin, Z., Di, J., Bag, M., Friedman, M., Lin, Y., Karanasos, K. and Rao, S., SIGMOD, 2018 (pp. 191-203).* [[paper](https://dl.acm.org/doi/abs/10.1145/3183713.3190656)]

**Automatic View Generation for Equivalent Subqueries with Deep Learning and Reinforcement Learning**

*Yuan, H., Sun, J., & Li, G. ICDE, 2020.* [[paper](https://doi.org/10.1109/ICDE48307.2020.00133)]

**An Autonomous Materialized View Management System with Deep Reinforcement Learning**

*Han, Y., Li, G., Yuan, H., & Sun, J. ICDE, 2021.* [[paper](https://doi.org/10.1109/ICDE51399.2021.00217)]

**AutoView: An Autonomous Materialized View Management System with Encoder-Reducer**

*Han, Y., Li, G., Yuan, H., & Sun, J. TKDE, 2022.* [[paper](https://ieeexplore.ieee.org/abstract/document/9744426/)]

**Dynamic Materialized View Management using Graph Neural Network**

*Yue Han, Chengliang Chai, Jiabin Liu, Guoliang Li, Chuangxian Wei, Chaoqun Zhan. ICDE 2023.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/dynamic-view-icde23.pdf)]

**A novel coral reefs optimization algorithm for materialized view selection in data warehouse environments**

*Azgomi, H. and Sohrabi, M.K., Applied Intelligence, 2019, 49, pp.3965-3989.* [[paper](https://link.springer.com/article/10.1007/s10489-019-01481-w)]

<br>

### Index Advisor

**[EA & B]** Jan Kossmann, Stefan Halfpap, Marcel Jankrift, Rainer Schlosser: *Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms.* Proc. VLDB Endow. 13(11): 2382-2395 (2020) [[paper](http://www.vldb.org/pvldb/vol13/p2382-kossmann.pdf)]

**[Industry, Microsoft Azure]** Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek R. Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, Surajit Chaudhuri: *Automatically Indexing Millions of Databases in Microsoft Azure SQL Database.* SIGMOD Conference 2019: 666-679 [[paper](https://www.microsoft.com/en-us/research/uploads/prod/2019/02/autoindexing_azuredb.pdf)]

**[Industry, Meta]** Ritwik Yadav, Satyanarayana R. Valluri, Mohamed Zait: *AIM: A practical approach to automated index management for SQL databases*. ICDE 2023 [[paper](https://research.facebook.com/micro_site/url/?click_from_context_menu=true&country=CN&destination=https%3A%2F%2Fresearch.facebook.com%2Ffile%2F215595724407039%2FAIM_SRT_Update.pdf&event_type=click&last_nav_impression_id=0CS9zArYOjQEk5cnm&max_percent_page_viewed=36&max_viewport_height_px=961&max_viewport_width_px=1912&orig_http_referrer=https%3A%2F%2Fwww.google.com%2F&orig_request_uri=https%3A%2F%2Fresearch.facebook.com%2Fpublications%2Faim-a-practical-approach-to-automated-index-management-for-sql-databases%2F&region=apac&scrolled=false&session_id=0e6eXwoLiqjYPPnEb&site=mc_research)]

**[Heuristic-based, AutoAdmin]** Surajit Chaudhuri, Vivek R. Narasayya: *An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server.* VLDB 1997: 146-155 [[paper](https://www.vldb.org/conf/1997/P146.PDF)]

**[Heuristic-based, DB2Advis]** Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, Alan Skelley: *DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes.* ICDE 2000: 101-110 [[paper](http://www.cs.toronto.edu/~alan/papers/icde00.pdf)]

**[Heuristic-based, Relaxation]** Nicolas Bruno, Surajit Chaudhuri: *Automatic Physical Database Tuning: A Relaxation-based Approach.* SIGMOD Conference 2005: 227-238 [[paper](https://dl.acm.org/doi/10.1145/1066157.1066184)]

**[Heuristic-based, COLT]** Karl Schnaitter, Serge Abiteboul, Tova Milo, Neoklis Polyzotis: *On-Line Index Selection for Shifting Workloads.* ICDE Workshops 2007: 459-468 [[paper](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.130.3168&rep=rep1&type=pdf)]

**[Heuristic-based, Extend]** Rainer Schlosser, Jan Kossmann, Martin Boissier: *Efficient Scalable Multi-attribute Index Selection Using Recursive Strategies.* ICDE 2019: 1238-1249 [[paper](http://ieeexplore.ieee.org/document/8731387)]

**[Learning-based, DQN]** Hai Lan, Zhifeng Bao, Yuwei Peng: *An Index Advisor Using Deep Reinforcement Learning.* CIKM 2020: 2105-2108 [[paper](https://doi.org/10.1145/3340531.3412106)]

**[Learning-based, DQN]** Zahra Sadri, Le Gruenwald, Eleazar Leal: *Online Index Selection Using Deep Reinforcement Learning for a Cluster Database.* ICDE Workshops 2020: 158-161 [[paper](https://doi.org/10.1109/ICDEW49219.2020.00035)]

**[Learning-based, DQN]** Gabriel Paludo Licks, Júlia Mara Colleoni Couto, Priscilla de Fátima Miehe, Renata De Paris, Duncan Dubugras A. Ruiz, Felipe Meneguzzi: *SmartIX: A Database Indexing Agent based on Reinforcement Learning.* Appl. Intell. 50(8): 2575-2588 (2020) [[paper](https://link.springer.com/article/10.1007/s10489-020-01674-8)]

**[Learning-based, DQN]** Vishal Sharma, Curtis E. Dyreson, Nicholas Flann: *MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning.* IDEAS 2021: 56-64 [[paper](https://dl.acm.org/doi/abs/10.1145/3472163.3472176)]

**[Learning-based, DQN]** Yu Yan, Shun Yao, Hongzhi Wang, Meng Gao: *Index selection for NoSQL database with deep reinforcement learning.* Inf. Sci. 561: 20-30 (2021) [[paper](https://www.sciencedirect.com/science/article/pii/S0020025521000049)]

**[Learning-based, DQN]** Vishal Sharma, Curtis E. Dyreson: *Indexer++: Workload-aware Online Index Tuning with Transformers and Reinforcement Learning.* SAC 2022: 372-380 [[paper](https://dl.acm.org/doi/10.1145/3477314.3507691)]

**[Learning-based, MAB]** R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic: *DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees.* ICDE 2021: 600-611 [[paper](https://arxiv.org/pdf/2010.09208.pdf)]

**[Learning-based, MAB]** R. Malinga Perera, Bastian Oetomo, Benjamin I. P. Rubinstein, Renata Borovica-Gajic:
*HMAB: Self-Driving Hierarchy of Bandits for Integrated Physical Database Design Tuning.* Proc. VLDB Endow. 16(2): 216-229 (2022) [[paper](https://dl.acm.org/doi/abs/10.14778/3565816.3565824)]

**[Learning-based, MCTS]** Xuanhe Zhou, Luyang Liu, Wenbo Li, Lianyuan Jin, Shifu Li, Tianqing Wang, Jianhua Feng: *AutoIndex: An Incremental Index Management System for Dynamic Workloads.* ICDE 2022: 2196-2208 [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/icde2022-autoindex.pdf)]

**[Learning-based, MCTS]** Wentao Wu, Chi Wang, Tarique Siddiqui, Junxiong Wang, Vivek R. Narasayya, Surajit Chaudhuri, Philip A. Bernstein: *Budget-aware Index Tuning with Reinforcement Learning.* SIGMOD Conference 2022: 1528-1541 [[paper](https://www.microsoft.com/en-us/research/uploads/prod/2022/06/mcts-full.pdf)]

**[Optimization, Learned Cost]** Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, Vivek R. Narasayya: *AI Meets AI: Leveraging Query Executions to Improve Index Recommendations.* SIGMOD Conference 2019: 1241-1258 [[paper](https://doi.org/10.1145/3299869.3324957)]

**[Optimization, Learned Cost]** Jiachen Shi, Gao Cong, Xiaoli Li: *Learned Index Benefits: Machine Learning Based Index Performance Estimation.* Proc. VLDB Endow. 15(13): 3950-3962 (2022) [[paper](https://www.vldb.org/pvldb/vol15/p3950-shi.pdf)]

**[Optimization, Learned Cost]** Jianling Gao, Nan Zhao, Ning Wang, Shuang Hao:
*SmartIndex: An Index Advisor with Learned Cost Estimator*. CIKM 2022: 4853-4856 [[paper](https://dl.acm.org/doi/abs/10.1145/3511808.3557163)]

**[Optimization, Workload Summarization]** Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek R. Narasayya, Surajit Chaudhuri:
*ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning.* SIGMOD Conference 2022: 660-673 [[paper](https://dl.acm.org/doi/10.1145/3514221.3526152)]

**[Optimization, What-if Call]** Tarique Siddiqui, Wentao Wu, Vivek R. Narasayya, Surajit Chaudhuri:
*DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning.* Proc. VLDB Endow. 15(10): 2019-2031 (2022) [[paper](https://dl.acm.org/doi/abs/10.14778/3547305.3547309)]

<br>

### Partition Advisor

**Automating physical database design in a parallel database.** ![](https://img.shields.io/badge/-horizontal-brown) 

*Jun Rao, Chun Zhang, Nimrod Megiddo, Guy M. Lohman. SIGMOD, 2002.* [[paper](https://www.csd.uoc.gr/~hy460/pdf/p558-rao.pdf)]


**Schism: a Workload-Driven Approach to Database Replication and Partitioning.** ![](https://img.shields.io/badge/-horizontal-brown)

*Carlo Curino, Yang Zhang, Evan P. C. Jones, Samuel Madden. PVLDB, 2010.* [[paper](https://doi.org/10.14778/1920841.1920853)]


**Locality-aware partitioning in parallel database systems.**

*Erfan Zamanian, Carsten Binnig, Abdallah Salama. SIGMOD, 2015.* [[paper](https://doi.org/10.1145/2723372.2723718)]


**Query centric partitioning and allocation for partially replicated database systems.**

*Tilmann Rabl, Hans-Arno Jacobsen. SIGMOD, 2017.* [[paper](https://doi.org/10.1145/3035918.3064052)]

**Workload-driven horizontal partitioning and pruning for large HTAP systems.** ![](https://img.shields.io/badge/-horizontal-brown)  ![](https://img.shields.io/badge/-data_skip-yellow) 

*Martin Boissier, Daniel Kurzynski. ICDE Workshop, 2018.* [[paper](https://doi.org/10.1109/ICDEW.2018.00026)]


**Towards learning a partitioning advisor with deep reinforcement learning.**  ![](https://img.shields.io/badge/-horizontal-brown)  ![](https://img.shields.io/badge/-RL-yellow) 

*Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. aiDM@SIGMOD, 2019.* [[paper](https://doi.org/10.1145/3329859.3329876)]


**Automated vertical partitioning with deep reinforcement learning.** ![](https://img.shields.io/badge/-vertical-green)

*Campero Durand G, Piriyev R, Pinnecke M, et al. ADBIS, 2019.* [[paper](https://doi.org/10.1007/978-3-030-30278-8_16)]


**Fast and effective distribution-key recommendation for amazon redshift.**  ![](https://img.shields.io/badge/-horizontal-brown)  ![](https://img.shields.io/badge/-algorithm_set-yellow) 

*Panos Parchas, Yonatan Naamad, Peter Van Bouwel, et al. PVLDB, 2020.* [[paper](https://doi.org/10.14778/3407790.3407834)]


**Adaptive partitioning and indexing for in situ query processing.** ![](https://img.shields.io/badge/-situ-blue)

*Olma, M., Karpathiotakis, M., Alagiannis, I., Athanassoulis, M., et al. VLDB Journal.* [[paper](https://doi.org/10.1007/s00778-019-00580-x)]


**Learning a Partitioning Advisor for Cloud Databases.**  ![](https://img.shields.io/badge/-horizontal-brown)  ![](https://img.shields.io/badge/-RL-yellow) 

*Benjamin Hilprecht, Carsten Binnig, Uwe Röhm. SIGMOD, 2020.* [[paper](https://15799.courses.cs.cmu.edu/spring2022/papers/10-partitioning/hilprecht-sigmod2020.pdf)]


**Grep: A Graph Learning Based Database Partitioning System.**  ![](https://img.shields.io/badge/-horizontal-brown)  ![](https://img.shields.io/badge/-GNN-orange) 

*Xuanhe Zhou, Guoliang Li, Jianhua Feng, et al. SIGMOD, 2023.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/grep.pdf)] [[demo](https://github.com/TsinghuaDatabaseGroup/AI4DBCode/tree/master/DatabasePartition)]

<br>

### Hybrid Advisor

**Universal Database Optimization using Reinforcement Learning**

*Wang J, Trummer I, Basu D. VLDB, 2021.* [[paper](http://www.vldb.org/pvldb/vol14/p3402-wang.pdf)]


**A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning**

*Xinyi Zhang, Zhuo Chang, Hong Wu, et al. SIGMOD, 2023.* [[paper](https://arxiv.org/abs/2303.05710)]


## 2. Query Optimization

### Query Rewriter 

(Note: other interesting problems, such as [text2SQL](https://github.com/yechens/NL2SQL), are outside the scope of this list.)

#### Traditional

**[Rewrite Rules]** Béatrice Finance, Georges Gardarin. *A Rule-Based Query Rewriter in an Extensible DBMS*. ICDE 1991. [[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=131472)]

**[Rewrite Rules]** Hamid Pirahesh, Joseph M. Hellerstein, Waqar Hasan. *Extensible/Rule Based Query Rewrite Optimization in Starburst*. SIGMOD Conference 1992. [[paper](https://sigmodrecord.org/publications/sigmodRecord/9206/pdfs/141484.130294.pdf)]

**[Cost/Heuristic Rewrite]** Rafi Ahmed, Allison W. Lee, Andrew Witkowski, et al. *Cost-Based Query Transformation in Oracle*. VLDB 2006: 1026-1036. [[paper](https://www.researchgate.net/publication/221311318_Cost-Based_Query_Transformation_in_Oracle/link/572bbc5e08aef7c7e2c6b829/download)]

**[Heuristic Rewrite]** De Araújo, A. H. M., Monteiro, J. M., Antônio, J., De Macêdo, F., Tavares, J. A., Brayner, A., & Lifschitz, S. *ARe-SQL: An Online, Automatic and Non-Intrusive Approach for Rewriting SQL Queries*. JIDM, 2014. [[paper](https://www.researchgate.net/publication/264081912_ARE-SQL_AN_ONLINE_AUTOMATIC_AND_NON-INTRUSIVE_APPROACH_FOR_REWRITING_SQL_QUERIES)]

**[Semantic Equivalence]** Shumo Chu, Konstantin Weitz, Alvin Cheung, Dan Suciu. *HoTTSQL: proving query rewrites with univalent SQL semantics*. PLDI 2017: 510-524. [[paper](https://doi.org/10.1145/3062341.3062348)]

**[Optimization Engine]** Begoli, E., Camacho-Rodríguez, J., Hyde, J., Mior, M. J., & Lemire, D. *Apache Calcite: A foundational framework for optimized query processing over heterogeneous data sources*. SIGMOD, 2018. [[paper](https://doi.org/10.1145/3183713.3190662)]

**[Map-Reduce Rewrite]** Partho Sarthi, Kaushik Rajan, Akash Lal, Abhishek Modi, et al. *Generalized Sub-Query Fusion for Eliminating Redundant I/O from Big-Data Queries*. OSDI 2020: 209-224. [[paper](https://www.usenix.org/system/files/osdi20-sarthi_0.pdf)]

**[Streaming]** Wentao Wu, Philip A. Bernstein, Alex Raizman, Christina Pavlopoulou. *Cost-based Query Rewriting Techniques for Optimizing Aggregates Over Correlated Windows*. CoRR abs/2008.12379 (2020)  [[paper](https://www.researchgate.net/profile/Wentao-Wu-2/publication/343986286_Cost-based_Query_Rewriting_Techniques_for_Optimizing_Aggregates_Over_Correlated_Windows/links/5f52ad2e299bf13a31a07101/Cost-based-Query-Rewriting-Techniques-for-Optimizing-Aggregates-Over-Correlated-Windows.pdf)]

**[Rewrite Rules]** Zhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, Jinyang Li. *WeTune: Automatic Discovery and Verification of Query Rewrite Rules*. SIGMOD Conference 2022: 94-107. [[paper](https://ipads.se.sjtu.edu.cn/_media/publications/wetune_final.pdf)]

**[Rewrite Rules]** Qiushi Bai, Sadeem Alsudais, Chen Li. *QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting*. VLDB, 2023. [[paper](https://arxiv.org/pdf/2305.08272.pdf)]


#### Learning-based

**[Predicate Rewrite]** Qi Zhou, Joy Arulraj, Shamkant B. Navathe, William Harris, Jinpeng Wu. *Sia: Optimizing Queries using Learned Predicates*. SIGMOD, 2021. [[paper](https://doi.org/10.1145/3448016.3457262)]

**[Rewrite Strategy]** Xuanhe Zhou, Guoliang Li, Chengliang Chai, Jianhua Feng. *A Learned Query Rewrite System using Monte Carlo Tree Search*. VLDB, 2022. [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb22-query-rewrite.pdf)]


### Cardinality Estimation

**[Card, Query-based]** Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., & Chaudhuri, S. Selectivity estimation for range predicates using lightweight models. Proceedings of the VLDB Endowment, 12(9), 1044–1057, 2019. [[paper](https://doi.org/10.14778/3329772.3329780)]

**[Card, Query-based]** Kipf A, Kipf T, Radke B, et al. Learned cardinalities: Estimating correlated joins with deep learning. CIDR, 2019. [[paper](https://arxiv.org/pdf/1809.00677)]

**[Card, Query-based]** Woltmann L, Hartmann C, Thiele M, et al. Cardinality estimation with local deep learning models. aiDM, 2019. [[paper](https://doi.org/10.1145/3329859.3329875)]

**[Card, Query-based]** Hayek, R., & Shmueli, O. *NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT*. arXiv, 2020. [[paper](http://arxiv.org/abs/2004.07009)]

**[Card, Query-based]** Tzoumas K, Deshpande A, Jensen C S. Lightweight graphical models for selectivity estimation without independence assumptions. Proceedings of the VLDB Endowment, 4(11): 852-863, 2011. [[paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.228.675&rep=rep1&type=pdf)]

**[Card, Query-based]** Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj K. Agarwal, Debmalya Panigrahi, Sudeepa Roy, Jun Yang. *Selectivity Functions of Range Queries are Learnable*. SIGMOD, 2022. [[paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3517896)]

**[Card, Query-based, Adaptability]** Beibin Li, Yao Lu, Srikanth Kandula: Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts. SIGMOD Conference 2022: 1920-1933 [[paper](https://dl.acm.org/doi/10.1145/3514221.3526179)]

**[Card, Query-based, Robust Encoding & Training]** Parimarjan Negi, Ziniu Wu, Andreas Kipf, Nesime Tatbul, Ryan Marcus, Sam Madden, Tim Kraska, Mohammad Alizadeh: Robust Query Driven Cardinality Estimation under Changing Workloads. VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p1520-negi.pdf)]

**[Card, Data-based]** Leis, V., Radke, B., Gubichev, A., Kemper, A., & Neumann, T. Cardinality estimation done right: Index-based join sampling. CIDR, 2017. [[paper](http://cidrdb.org/cidr2017/papers/p9-leis-cidr17.pdf)]

**[Card, Data-based]** Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., … Stoica, I. Deep Unsupervised Cardinality Estimation. VLDB, 2019. [[paper](https://doi.org/10.14778/3368289.3368294)]

**[Card, Data-based]** Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., & Stoica, I. NeuroCard: One cardinality estimator for all tables. *Proceedings of the VLDB Endowment*, *14*(1), 61–73, 2020. [[paper](https://doi.org/10.14778/3421424.3421432)]

**[Card, Data-based]** Zhu R, Wu Z, Han Y, et al. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. arXiv preprint arXiv:2011.09022, 2020. [[paper](https://arxiv.org/pdf/2011.09022)]

**[Card, Data-based]** Wu Z, Shaikhha A, Zhu R, et al. BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation. arXiv preprint arXiv: 2012.14743, 2020. [[paper](https://arxiv.org/pdf/2012.14743)]

**[Card, Data-based]** Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., & Binnig, C. (2020). DeepDB: Learn from data, not from queries! VLDB, *13*(7), 992–1005, 2020. [[paper](https://doi.org/10.14778/3384345.3384349)]

**[Card, Data-based]** Yongjoo Park, Shucheng Zhong, and Barzan Mozafari. QuickSel: Quick selectivity learning with mixture models. SIGMOD 2020. [[paper](https://arxiv.org/pdf/1812.10568.pdf)]

**[Card, Data-based]** Lu Y, Kandula S, König A C, et al. Pre-training summarization models of structured datasets for cardinality estimation. Proceedings of the VLDB Endowment, 2021. [[paper](https://dl.acm.org/doi/pdf/10.14778/3494124.3494127)]

**[Card, Data-based]** Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., … Cui, B. (2020). FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. VLDB, 2021. [[paper](http://arxiv.org/abs/2011.09022)]

**[Card, Data-based]** Jiayi Wang, Chengliang Chai, Jiabin Liu, Guoliang Li. FACE: A Normalizing Flow based Cardinality Estimator. VLDB 2022. [[paper](https://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb22-flow-card.pdf)]

**[Card, Data-based]** Yao Lu, Srikanth Kandula, Arnd Christian König, Surajit Chaudhuri. Pre-training summarization models of structured datasets for cardinality estimation. VLDB 2022. [[paper](https://www.vldb.org/pvldb/vol15/p414-lu.pdf)]

**[Card, Query&Data-based]** Wu P, Cong G. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. SIGMOD. 2021: 2009-2022. [[paper](https://arxiv.org/pdf/2107.12295)]

**[Card]** Parimarjan Negi, Ryan C. Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh. Flow-Loss: Learning Cardinality Estimates That Matter. Proc. VLDB Endow. 14(11): 2019-2032, 2021. [[paper](http://www.vldb.org/pvldb/vol14/p2019-negi.pdf)]

**[Card, Model Selection]** Jintao Zhang, Chao Zhang, Guoliang Li, Chengliang Chai. *AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation*. ICDE, 2023.  [[paper](https://github.com/jt-zhang/jt-zhang.github.io/raw/master/files/AutoCE_camera_ready_icde23.pdf)]

**[Card]** Xiaoye Miao, Yangyang Wu, Jiazhen Peng, et al. Efficient and Effective Cardinality Estimation for Skyline Family. SIGMOD, 2023. [[paper](https://dl.acm.org/doi/10.1145/3588958)]

**[Card]** Ziniu Wu, Parimarjan Negi, Mohammad Alizadeh, Tim Kraska, Samuel Madden. FactorJoin: A New Cardinality Estimation Framework for Join Queries. SIGMOD, 2023. [[paper](https://dl.acm.org/doi/10.1145/3588721)]

**[Card, Query-based]** Fang Wang, Xiao Yan, Man Lung Yiu, Shuai Li, Zunyao Mao, and Bo Tang. Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation. SIGMOD, 2023. [[paper](https://dl.acm.org/doi/pdf/10.1145/3588708)]

**[ EA&B ]** Wang, X., Qu, C., Wu, W., Wang, J., & Zhou, Q. (2021). Are We Ready For Learned Cardinality Estimation?  Proc. VLDB Endow. 14(9): 1640-1654 (2021). [[paper](http://www.vldb.org/pvldb/vol14/p1640-wang.pdf)]

**[ EA&B ]** Sun, J., Zhang, J., Sun, Z., Li, G., & Tang, N. *Learned Cardinality Estimation: A Design Space Exploration and a Comparative Evaluation*. VLDB, 2022. [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb22-card-exp.pdf)]

**[ EA&B ]** Yuxing Han, Ziniu Wu, Peizhi Wu, et al. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. VLDB, 2022. [[paper](https://arxiv.org/abs/2109.05877)]

**[ EA&B ]** Kyoungmin Kim, Jisung Jung, In Seo, Wook-Shin Han, Kangwoo Choi, Jaehyok Chong: Learned Cardinality Estimation: An In-depth Study. SIGMOD Conference 2022: 1214-1227 [[paper](https://dl.acm.org/doi/abs/10.1145/3514221.3526154)]

**[ EA&B ]** Harmouch, H., & Naumann, F. (2017). Cardinality Estimation: An Experimental Survey. *PVLDB*, *11*(4), 499–512. [[paper](https://doi.org/10.1145/3164135.3164145)]

### Cost Estimation

**[Cost]** Marcus, R., & Papaemmanouil, O. (2019). *Plan-Structured Deep Neural Network Models for Query Performance Prediction*. Proceedings of the VLDB Endowment, 12(11), 1733–1746. [[paper](http://arxiv.org/abs/1902.00132)]

**[Cost]** Sun, J., & Li, G. *An End-to-End Learning-based Cost Estimator*. VLDB, 2020. [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb2020-learnedcost.pdf)]

**[Cost]** Benjamin Hilprecht, Carsten Binnig. *Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p2361-hilprecht.pdf)]

### Plan Optimization

**Eddies: Continuously Adaptive Query Processing**

*Ron Avnur, Joseph M. Hellerstein. SIGMOD, 2000.* [[paper](https://dl.acm.org/doi/pdf/10.1145/342009.335420)]

**How Good Are Query Optimizers, Really?** ![](https://img.shields.io/badge/benchmark-blue) 

*Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., & Neumann, T. Proceedings of the VLDB Endowment (2016), 9(3), 204–215.* [[paper](https://doi.org/10.14778/2850583.2850594)]

**Neo: A Learned Query Optimizer** ![](https://img.shields.io/badge/RL-blue) 

*Marcus, R., Negi, P., Mao, H., Zhang, C., Alizadeh, M., Kraska, T., … Tatbul, N. Proceedings of the VLDB Endowment, 12(11), 1705–1718, 2019.* [[paper](https://doi.org/10.14778/3342263.3342644)]

**Deep reinforcement learning for join order enumeration**

*Marcus, R., & Papaemmanouil, O. Proceedings of the 1st International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, AiDM 2018.* [[paper](https://doi.org/10.1145/3211954.3211957)]

**SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning** ![](https://img.shields.io/badge/MCTS-blue) 

*Trummer, I., Wang, J., Maram, D., Moseley, S., Jo, S., & Antonakakis, J. SIGMOD, 2019.* [[paper](https://arxiv.org/abs/1901.05152)]

**Progressive Join Algorithms Considering User Preference**

*Ding, M., Chen, S., & Manegold, S. (2021). CIDR, 2021.* [[paper](http://cidrdb.org/cidr2021/papers/cidr2021_paper02.pdf)]

**Reinforcement Learning with Tree-LSTM for Join Order Selection**

*Yu, X., Li, G., Tang, N. ICDE, 2020.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/icde2020-learnedjoinorder.pdf)]

**Towards a Learning Optimizer for Shared Clouds**

*Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. Proc. VLDB Endow. 12(3): 210-222, 2018.* [[paper](http://www.vldb.org/pvldb/vol12/p210-wu.pdf)]

**SQL Plan Observability through Hints in Oracle Autonomous Database**

*Pasupuleti, K., Park, M., & Valluri, S. (n.d.).*

**Bao: Making Learned Query Optimization Practical** ![](https://img.shields.io/badge/optimizer_knobs-blue) 

*Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., & Kraska, T. SIGMOD, 2021.* [[paper](https://doi.org/10.1145/3448016.3452838)]

**Steering Query Optimizers: A Practical Take on Big Data Workloads**

*Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal. SIGMOD, 2021.* [[paper](https://doi.org/10.1145/3448016.3457568)]

**SkinnerMT: Parallelizing for Efficiency and Robustness in Adaptive Query Processing on Multicore Platforms**

*Ziyun Wei, Immanuel Trummer. PVLDB, 2022.* [[paper](https://www.vldb.org/pvldb/vol16/p905-wei.pdf)]

**Balsa: Learning a Query Optimizer Without Expert Demonstrations**

*Zongheng Yang, Wei-Lin Chiang, Sifei Luan, Gautam Mittal, Michael Luo, Ion Stoica. SIGMOD, 2022.* [[paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3517885)]

**Workload-driven, Lazy Discovery of Data Dependencies for Query Optimization**

*Jan Kossmann. CIDR, 2022* [[paper](https://www.cidrdb.org/cidr2022/papers/p70-kossmann.pdf)]

**LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans** ![](https://img.shields.io/badge/gnn-blue) 

*Tianyi Chen, Jun Gao, Hedui Chen, and Yaofeng Tu. PVLDB, 2023.* [[paper](https://www.vldb.org/pvldb/vol16/p1777-gao.pdf)]

**BASE: Bridging the Gap between Cost and Latency for Query Optimization** ![](https://img.shields.io/badge/hybrid_cost_latency-blue) 

*Chen, Xu, Zhen Wang, Shuncheng Liu, et al. VLDB, 2023.* [[paper](https://zheng-kai.com/paper/vldb_2023_chen.pdf)]

**COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations** ![](https://img.shields.io/badge/relative_cost-blue) 

*Xu, Xianghong, Zhibing Zhao, Tieying Zhang, et al.* [[paper](https://arxiv.org/pdf/2304.04407.pdf)]

**Lero: A Learning-to-Rank Query Optimizer** ![](https://img.shields.io/badge/rank-blue) 

*Rong Zhu, Wei Chen, Bolin Ding, Xingguang Chen, Andreas Pfadler, Ziniu Wu, Jingren Zhou. VLDB 2023.* [[paper](https://www.vldb.org/pvldb/vol16/p1466-zhu.pdf)]

**Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection** ![](https://img.shields.io/badge/uncertainty-blue) 

*Xiang Yu, Chengliang Chai, Guoliang Li, Jiabin Liu. VLDB 2023.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/optimizer-vldb23.pdf)]

**Leveraging Query Logs and Machine Learning for Parametric Query Optimization** 

*Kapil Vaidya, Anshuman Dutt, Vivek Narasayya, Surajit Chaudhuri. VLDB 2022.* [[paper](https://dl.acm.org/doi/pdf/10.14778/3494124.3494126)]

**Kepler: Robust Learning for Parametric Query Optimization** ![](https://img.shields.io/badge/perturbation-blue) 

*Lyric Doshi, Vincent Zhuang, Gaurav Jain, Ryan C Marcus, Haoyu Huang, Deniz Altınbüken, Eugene Brevdo, Campbell Fraser. SIGMOD 2023.* [[paper](https://dl.acm.org/doi/10.1145/3588963)]

**LEON: A New Framework for ML-Aided Query Optimization** ![](https://img.shields.io/badge/ML--aided-blue)

*Xu Chen, Haitian Chen, Zibo Liang, Shuncheng Liu, Jinghong Wang, Kai Zeng, Han Su, and Kai Zheng. VLDB, 2023.* [[paper](https://www.vldb.org/pvldb/vol16/p2261-chen.pdf)]

## 3. Workload Scheduling

Ibrahim Sabek, Tenzin Samten Ukyab, Tim Kraska. *LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems*. SIGMOD, 2022. [[paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3526158)] 

Chi Zhang, Ryan Marcus, et al. Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. In VLDB, 2020. [[paper](https://arxiv.org/pdf/2007.10568.pdf)]


## 4. Database Design

### Index
#### One-dimensional Index

**[1-D, Immutable]** Kraska, T., Beutel, A., Chi, E. H., Dean, J., & Polyzotis, N. (2018). *The case for learned index structures*. SIGMOD, 2018. [[paper](https://dl.acm.org/doi/10.1145/3183713.3196909)] [[code](https://github.com/learnedsystems/RMI/tree/5fdff45d0929beaccf6bc56f8f4c0d82baf10304)]

**[1-D, Mutable]** Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., & Kraska, T. (2019). *FITing-Tree: A data-aware index structure*. SIGMOD, 2019. [[paper](https://dl.acm.org/doi/abs/10.1145/3299869.3319860)]

**[1-D, Mutable, Secondary]** Wu, Y., Yu, J., Tian, Y., Sidle, R., Barber, R. (2019). *Designing succinct secondary indexing mechanism by exploiting column correlations*. SIGMOD 2019. [[paper](https://dl.acm.org/doi/pdf/10.1145/3299869.3319861)]

**[1-D, Mutable]** Ferragina, P., & Vinciguerra, G. (2020). *The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds*. VLDB, 2020. [[paper](https://dl.acm.org/doi/abs/10.14778/3389133.3389135)]

**[1-D, Mutable]** Ding, J., Minhas, U. F., Yu, J., Wang, C., Do, J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D., & Kraska, T. (2020). *ALEX: An Updatable Adaptive Learned Index*. SIGMOD, 2020. [[paper](https://dl.acm.org/doi/10.1145/3318464.3389711)] [[code](https://github.com/microsoft/ALEX)]

**[1-D, Mutable, Persistent]** Lu, B., Ding, J., Lo, E., Minhas, U. F., & Wang, T. (2021). *APEX: A High-Performance Learned Index on Persistent Memory*. VLDB, 2021. [[paper](https://doi.org/10.14778/3494124.3494141)]

**[1-D, Immutable, Auto-generated]** Dittrich, J., Nix, J., & Schön, C. (2021). *The next 50 Years in Database Indexing or: The Case for Automatically Generated Index Structures*. VLDB, 2021. [[paper](https://doi.org/10.14778/3494124.3494136)] [[code](https://github.com/BigDataAnalyticsGroup/GENE)]

**[1-D, Mutable, Concurrency]** Li, P., Hua, Y., Jia, J., Zuo, P. (2021). *FINEdex: A Fine-grained Learned Index Scheme for Scalable and Concurrent Memory Systems*. VLDB, 2021. [[paper](https://www.vldb.org/pvldb/vol15/p321-hua.pdf)]

**[1-D, Mutable]** Wu, J., Zhang, Y., Chen, S., Wang, J., Chen, Y., Xing, C. (2021). *Updatable learned index with precise positions*. VLDB, 2021. [[paper](http://vldb.org/pvldb/vol14/p1276-wu.pdf)]

**[1-D, Mutable]** Ma, C., Yu, X., Li, Y., Meng, X., & Maoliniyazi, A. (2022). *FILM: A Fully Learned Index for Larger-Than-Memory Databases*. VLDB, 2022. [[paper](https://dl.acm.org/doi/pdf/10.14778/3570690.3570704)]

**[1-D, Mutable, Concurrency]** Wang, Z., Chen, H., Wang, Y., & Tang, C. (2022). *The Concurrent Learned Indexes for Multicore Data Storage*. ACM Transactions on Storage, 18(1), 1-35. [[paper](https://dl.acm.org/doi/pdf/10.1145/3478289)] [[code](https://ipads.se.sjtu.edu.cn:1312/opensource/xindex.git)]

**[1-D, Mutable]** Jiaoyi Zhang, Yihan Gao. (2022). *CARMI: A Cache-Aware Learned Index with a Cost-based Construction Algorithm*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p2679-gao.pdf)]

**[1-D, Mutable]** Shangyu Wu. (2022). *NFL: Robust Learned Index via Distribution Transformation*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p2188-wu.pdf)]

**[1-D, Mutable, Persistent]** Zhang, Z., Chu, Z., Jin, P., Luo, Y., Xie, X., Wan, S., Luo, Y., Wu, X., Zou, P., Zheng, C., Wu, G., Rudoff, A. (2022). *PLIN: A Persistent Learned Index for Non-Volatile Memory with High Performance and Instant Recovery*. VLDB, 2022. [[paper](https://doi.org/10.14778/3565816.3565826)]

**[1-D, Mutable]** Li, Pengfei, Hua Lu, Rong Zhu, Bolin Ding, et al. (2023). *DILI: A Distribution-Driven Learned Index*. VLDB, 2023. [[paper](https://arxiv.org/pdf/2304.08817.pdf)]

**[1-D, Mutable, Persistent]** Yulai Tong, Jiazhen Liu, Hua Wang, Ke Zhou, Rongfeng He, Qin Zhang, and Cheng Wang. (2023). *Sieve: A Learned Data-Skipping Index for Data Analytics*. VLDB, 2023. [[paper](https://dl.acm.org/doi/pdf/10.14778/3611479.3611520)]

#### Multi-dimensional Index

**[Multi-D, Immutable]** Nathan, V., Ding, J., Alizadeh, M., & Kraska, T. (2020). *Learning multi-dimensional indexes*. SIGMOD, 2020. [[paper](https://dl.acm.org/doi/10.1145/3318464.3380579)]

**[Multi-D, Mutable, Persistent]** Li, P., Lu, H., Zheng, Q., Yang, L., & Pan, G. (2020). *LISA: A Learned Index Structure for Spatial Data*. SIGMOD, 2020. [[paper](https://doi.org/10.1145/3318464.3389703)]

**[Multi-D, Mutable, Persistent]** Qi, J., Liu, G., Jensen, C.S., Kulik, L. (2020). *Effectively learning spatial indices*. VLDB, 2020. [[paper](http://www.vldb.org/pvldb/vol13/p2341-qi.pdf)]

**[Multi-D, Immutable]** Ding, J., Nathan, V., Alizadeh, M., & Kraska, T. (2020). *Tsunami: A learned multi-dimensional index for correlated data and skewed workloads*. VLDB, 2020. [[paper](https://dl.acm.org/doi/abs/10.14778/3425879.3425880)]

**[Multi-D, Mutable]** Dong, H., Chai, C., Luo, Y., Liu, J., Feng, J., Zhan, C. (2022). *RW-Tree: A Learned Workload-aware Framework for R-tree Construction*. ICDE, 2022. [[paper](https://doi.org/10.1109/ICDE53745.2022.00201)]

**[Multi-D, Immutable]** Gao, J., Cao, X., Yao, X., Zhang, G., & Wang, W. (2023). *LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves*. VLDB, 2023. [[paper](https://arxiv.org/pdf/2304.12635.pdf)]

#### Experiment and Analysis

**[1-D, Immutable, Analysis]** Ferragina, P., Lillo, F., & Vinciguerra, G. (2020). *Why are learned indexes so effective?*. ICML, 2020. [[paper](http://proceedings.mlr.press/v119/ferragina20a/ferragina20a.pdf)]

**[1-D, Immutable, Experiment]** Marcus, R., Stoian, M., Kipf, A., Misra, S., van Renen, A., Kemper, A., Neumann, T., & Kraska, T. (2020). *Benchmarking learned indexes*. VLDB, 2020. [[paper](https://dl.acm.org/doi/10.14778/3421424.3421425)] [[code](https://github.com/learnedsystems/SOSD)]

**[1-D, Poisoning Attack]** Evgenios M. Kornaropoulos, Silei Ren, Roberto Tamassia. (2022). *The Price of Tailoring the Index to Your Data: Poisoning Attacks on Learned Index Structures*. SIGMOD, 2022. [[paper](https://dl.acm.org/doi/abs/10.1145/3514221.3517867)]

**[1-D, Mutable, Experiment]** Wongkham, C., Lu, B., Liu, C., Zhong, Z., Lo, E., Wang, T. (2022). *Are Updatable Learned Indexes Ready?*. VLDB, 2022. [[paper](https://doi.org/10.14778/3551793.3551848)]

**[1-D, Immutable, Experiment]** Maltry, M., Dittrich, J. (2022). *A critical analysis of recursive model indexes*. VLDB, 2022. [[paper](https://doi.org/10.14778/3510397.3510405)]

**[1-D, Hash Index, Experiment]** Sabek, I., Vaidya, K., Horn, D., Kipf, A., Mitzenmacher, M., & Kraska, T. (2022). *Can Learned Models Replace Hash Functions?* VLDB, 2022. [[paper](https://doi.org/10.14778/3570690.3570702)]

**[1-D, Mutable, Experiment]** Sun, Z., Zhou, X., Li, G. (2023). *Learned Index: A Comprehensive Experimental Evaluation*. VLDB, 2023. [[paper](https://dl.acm.org/doi/pdf/10.14778/3594512.3594528)] [[code](https://github.com/curtis-sun/TLI)]

**[1-D, Immutable, Experiment]** Sabek, I., & Kraska, T. (2023). *The Case for Learned In-Memory Joins*. VLDB, 2023. [[paper](https://arxiv.org/pdf/2111.08824.pdf)]

### Layout

**[Learned Layout]** Liwen Sun, Michael J. Franklin, Sanjay Krishnan, et al. *Fine-grained partitioning for aggressive data skipping*. SIGMOD, 2014. [[paper](https://doi.org/10.1145/2588555.2610515)]

**[Learned Layout]** Yang, Z., Chandramouli, B., Wang, C., Gehrke, J., Li, Y., Minhas, U. F., … Acharya, R. *Qd-tree: Learning Data Layouts for Big Data Analytics*. SIGMOD, 2020. [[paper](https://doi.org/10.1145/3318464.3389770)]

**[Learned Layout]** Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, et al. *Instance-Optimized Data Layouts for Cloud Analytics Workloads*. SIGMOD, 2021. [[paper](https://doi.org/10.1145/3448016.3457270)]

**[Learned Layout]** Bandle, M., Giceva, J., & Neumann, T. (2021). *To Partition, or Not to Partition, That is the Join Question in a Real System*. SIGMOD, 2021. [[paper](https://doi.org/10.1145/3448016.3452831)]

**[Data Container]** Madden S, Ding J, Kraska T, Sudhir S, Cohen D, Mattson T, Tatbul N. *Self-Organizing Data Containers*. CIDR, 2022. [[paper](https://www.cidrdb.org/cidr2022/papers/p44-madden.pdf)]

**[Learned Layout]** Teng Zhang, Jian Tan, Xin Cai, Jianying Wang, Feifei Li, Jianling Sun. *SA-LSM: Optimize Data Layout for LSM-tree Based Storage using Survival Analysis*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p2161-zhang.pdf)]

**[Learned Layout]** Michael Abebe. *Tiresias: Enabling Predictive Autonomous Storage and Indexing*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p3126-abebe.pdf)]

### Query Execution

**[CodeGen]** Immanuel Trummer. *CodexDB: Synthesizing Code for Query Processing from Natural Language Instructions using GPT-3 Codex*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p2921-trummer.pdf)]

Zhang, C., Marcus, R., Kleiman, A., & Papaemmanouil, O. (2020). *Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning*. AIDB@VLDB, 2020. [[paper](https://arxiv.org/abs/2007.10568)]


## 5. Database Monitoring

**[Trend Prediction]** L. Ma, D. V. Aken, A. Hefny, G. Mezerhane, A. Pavlo, and G. J. Gordon, “Query-based Workload Forecasting for Self-driving Database Management Systems,” in SIGMOD, 2018. [[paper](https://www.pdl.cmu.edu/PDL-FTP/Database/sigmod18-ma.pdf)]

**[Performance Prediction]** Dorn, J., Apel, S., & Siegmund, N. *Mastering Uncertainty in Performance Estimations of Configurable Software Systems*.

**[Performance Prediction]** Marcus, R., & Papaemmanouil, O. (2019). Plan-structured deep neural network models for query performance prediction. *Proceedings of the VLDB Endowment*, *12*(11), 1733–1746. [[paper](https://doi.org/10.14778/3342263.3342646)]

**[Performance Prediction]** Wu, W., Chi, Y., Hacigümüş, H., & Naughton, J. F. (2013). Towards predicting query execution time for concurrent and dynamic database workloads. *Proceedings of the VLDB Endowment*, *6*(10), 925–936. [[paper](https://doi.org/10.14778/2536206.2536219)]

**[Performance Prediction]** Duggan, J., Papaemmanouil, O., Cetintemel, U., & Upfal, E. (2014). Contender: A resource modeling approach for concurrent query performance prediction. *Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings*, 109–120. [[paper](https://doi.org/10.5441/002/edbt.2014.11)]

**[Performance Prediction]** Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüş, H., & Naughton, J. F. (2013). Predicting query execution time: Are optimizer cost models really unusable? *Proceedings - International Conference on Data Engineering*, 1081–1092. [[paper](https://doi.org/10.1109/ICDE.2013.6544899)]

**[Performance Prediction]** Higginson, A. S., Dediu, M., Arsene, O., Paton, N. W., & Embury, S. M. (2020). Database Workload Capacity Planning using Time Series Analysis and Machine Learning. *Proceedings of the ACM SIGMOD International Conference on Management of Data*, 769–783. [[paper](https://doi.org/10.1145/3318464.3386140)]

**[Performance Prediction]** Unterbrunner, P., Giannikis, G., Alonso, G., Fauser, D., & Kossmann, D. (2009). Predictable performance for unpredictable workloads. *Proceedings of the VLDB Endowment*, *2*(1), 706–717. [[paper](https://doi.org/10.14778/1687627.1687707)]

**[Performance Prediction]** Xuanhe Zhou, Ji Sun, Guoliang Li, Jianhua Feng. Query Performance Prediction for Concurrent Queries using Graph Embedding. VLDB, 2020. [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb2020-concurrent.pdf)]


## 6. Database Diagnosis

### System and Kernel Causes

**Automatic Performance Diagnosis and Tuning in Oracle**

*Karl Dias, Mark Ramacher, Uri Shaft, et al. CIDR, 2005.* [[paper](https://www.cidrdb.org/cidr2005/papers/P07.pdf)]


**DBSherlock: A Performance Diagnostic Tool for Transactional Databases.**

*Yoon, D. Y., Niu, N., & Mozafari, B. SIGMOD, 2016.*  [[paper](https://web.eecs.umich.edu/~mozafari/php/data/uploads/sigmod_2016.pdf)]


**iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks.**

*Kalmegh, P., Babu, S., & Roy, S. SIGMOD, 2019.*  [[paper](https://users.cs.duke.edu/~sudeepa/papers/SIGMOD2019-iqcar.pdf)]


**FluxInfer: Automatic Diagnosis of Performance Anomaly for Online Database System**

*Ping Liu, Shenglin Zhang, Yongqian Sun, et al. IPCCC, 2020.* [[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9391550)]


**Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases.**

*Minghua Ma, Zheng Yin, Shenglin Zhang, et al. VLDB, 2020.*  [[paper](http://www.vldb.org/pvldb/vol13/p1176-ma.pdf)]


**Generic and Robust Performance Diagnosis via Causal Inference for OLTP Database Systems.**

*Xianglin Lu, Zhe Xie, Zeyan Li, et al. CCGrid, 2022.*  [[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9826016)]


**DBPA: A Benchmark for Transactional Database Performance Anomalies.**

*Shiyue Huang, Ziwei Wang, Xinyi Zhang, et al. SIGMOD, 2023.*  [[paper](https://dl.acm.org/doi/abs/10.1145/3588926)]

### Bottleneck Queries

**PinSQL: Pinpoint Root Cause SQLs to Resolve Performance Issues in Cloud Databases.** 

*Xiaoze Liu, Zheng Yin, Chao Zhao, et al. ICDE, 2022.*  [[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9835371)]

## 7. General Techniques

### Feature Engineering for DB

**[PlanEncoding]** Yue Zhao, Gao Cong, Jiachen Shi, Chunyan Miao. *QueryFormer: A Tree Transformer Model for Query Plan Representation*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p1658-zhao.pdf)]

**[Plan2Feature]** Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar. *Database Workload Characterization with Query Plan Encoders*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p923-paul.pdf)]

**[Pretrained Representation]** Xiu Tang, Sai Wu, Mingli Song, Shanshan Ying, Feifei Li, Gang Chen: *PreQR: Pre-training Representation for SQL Understanding*. SIGMOD Conference 2022: 204-216 [[paper](https://dl.acm.org/doi/abs/10.1145/3514221.3517878)]

**[WorkloadAsGraph]** Sanjay Agrawal, Eric Chu, Vivek R. Narasayya. Automatic physical design tuning: workload as a sequence. SIGMOD, 2006. [[paper](https://doi.org/10.1145/1142473.1142549)]

**[DataSummary]** Brit Youngmann et al. *Guided Exploration of Data Summaries*. VLDB, 2022. [[paper](https://www.vldb.org/pvldb/vol15/p1798-youngmann.pdf)]

Jiang H, Liu C, Paparrizos J, et al. Good to the Last Bit: Data-Driven Encoding with CodecDB. SIGMOD 2021. [[paper](https://dl.acm.org/doi/pdf/10.1145/3448016.3457283)]


### Feature Engineering for AI

**Apache Flink: Stream and batch processing in a single engine.** ![](https://img.shields.io/badge/-feature_extraction-green)

*Carbone P, Katsifodimos A, Ewen S, et al.  The Bulletin of the Technical Committee on Data Engineering, 2015, 38(4).* [[paper](https://asterios.katsifodimos.com/assets/publications/flink-deb.pdf)]

**Feature selection in machine learning: A new perspective.** ![](https://img.shields.io/badge/-feature_selection-yellow)

*CAI J, LUO J, WANG S et al.  Neurocomputing (Amsterdam), 2018, 300: 70-79. DOI:10.1016/j.neucom.2017.11.077.* [[paper](https://www.sciencedirect.com/science/article/abs/pii/S0925231218302911)]

**Optimizing in-memory database engine for AI-powered on-line decision augmentation using persistent memory.** ![](https://img.shields.io/badge/-feature_extraction-green)

*Cheng Chen, Jun Yang, Mian Lu, Taize Wang, Zhao Zheng, Yuqiang Chen, Wenyuan Dai, Bingsheng He, Weng-Fai Wong, Guoan Wu, Yuping Zhao, and Andy Rudoff. 2021.  Proc. VLDB Endow. 14, 5 (January 2021), 799–812.* [[paper](https://doi.org/10.14778/3446095.3446102)]

**Managing ML pipelines: feature stores and the coming wave of embedding ecosystems.** ![](https://img.shields.io/badge/-fe_for_embedding-red)

*Orr L, Sanyal A, Ling X, et al.  arXiv preprint arXiv:2108.05053, 2021.* [[paper](https://arxiv.org/pdf/2108.05053.pdf)]

**A System for Time Series Feature Extraction in Federated Learning.** ![](https://img.shields.io/badge/-fe_for_federated_learning-blue)

*Siqi Wang, Jiashu Li, Mian Lu, Zhao Zheng, Yuqiang Chen, and Bingsheng He. 2022.  In Proceedings of the 31st ACM International Conference on Information & Knowledge Management (CIKM '22). Association for Computing Machinery, New York, NY, USA, 5024–5028.* [[paper](https://doi.org/10.1145/3511808.3557176)]

**FEBench: A Benchmark for Real-Time Relational Data Feature Extraction.** ![](https://img.shields.io/badge/-benchmark-purple)

*Xuanhe Zhou, Cheng Chen, Kunyi Li, Bingsheng He, Mian Lu, Qiaosheng Liu, Wei Huang, Guoliang Li, Zhao Zheng, Yuqiang Chen. 2023. Proc. VLDB Endow.* [[paper](https://github.com/decis-bench/febench/blob/main/report/febench.pdf)]


### Model Transfer

Meghdad Kurmanji, Peter Triantafillou. Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data. SIGMOD, 2023. [[paper](https://arxiv.org/pdf/2210.05508.pdf)]

### Query And Data Generation

#### Query Generation

L. Zhang, C. Chai, X. Zhou, and G. Li. LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning. In SIGMOD, 2022. [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/sigmod2022-sqlgen.pdf)]

Liu X, Kong X, Liu L, et al. TreeGAN: syntax-aware sequence generation with generative adversarial networks. In ICDM, 2018. [[paper](http://cn.liuleics.com/uploads/1/4/1/2/14126273/1808.07582.pdf)]

#### Data Generation

**[DeepAR]** Jingyi Yang, Peizhi Wu, Gao Cong, Tieying Zhang, Xiao He. *SAM: Database Generation from Query Workloads with Supervised Autoregressive Models*. SIGMOD, 2022. [[paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3526168)]

Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl. Expand your training limits! Generating training data for ML-based data management. SIGMOD, 2021 [[paper](https://dl.acm.org/doi/pdf/10.1145/3448016.3457286)]

Ju Fan, Tongyu Liu, Guoliang Li, Yuwei Shen, Xiaoyong Du. Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration. VLDB 2020. [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb2020-datagan.pdf)]


## 8. Database Frameworks

**Self-Driving Database Management Systems.** ![](https://img.shields.io/badge/-model_assembly-orange) 

*Andrew Pavlo, Gustavo Angulo, Joy Arulraj, et al. CIDR, 2017.* [[paper](https://www.pdl.cmu.edu/PDL-FTP/Database/p42-pavlo-cidr17.pdf)]


**Cloud native database systems at Alibaba: Opportunities and challenges.** ![](https://img.shields.io/badge/-learned_tuning-orange)  

*Feifei Li. VLDB, 2019.* [[paper](http://www.vldb.org/pvldb/vol12/p2263-li.pdf)]


**SageDB: A learned database system.** ![](https://img.shields.io/badge/-learned_CDFs-orange) 

*Tim Kraska, Mohammad Alizadeh, Alex Beutel, et al. CIDR, 2019.* [[paper](http://www.alexbeutel.com/papers/CIDR2019_SageDB.pdf)]


**MonetDBLite: An embedded analytical database.** 

*Mark Raasveldt. SIGMOD, 2018.* [[paper](https://mytherin.github.io/papers/2018-monetdblitecikm.pdf)]


**XuanYuan: An AI-Native Database.** ![](https://img.shields.io/badge/-model_assembly-blue)  

*Guoliang Li, Xuanhe Zhou, Sihao Li. IEEE Data Eng. Bull., 2019.* [[paper](http://sites.computer.org/debull/A19june/p70.pdf)]


**DBMS Fitting: Why should we learn what we already know?** ![](https://img.shields.io/badge/-near_white_box_cost_model-blue)   

*Benjamin Hilprecht, Tiemo Bang, Muhammad El-Hindi, et al. CIDR, 2020.* [[paper](http://cidrdb.org/cidr2020/papers/p34-hilprecht-cidr20.pdf)]


**MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems.** ![](https://img.shields.io/badge/-forecast_model_driven-orange)

*Lin Ma, William Zhang, Jie Jiao, et al. SIGMOD, 2021.* [[paper](https://dl.acm.org/doi/10.1145/3448016.3457276)]


**openGauss: An Autonomous Database System.** ![](https://img.shields.io/badge/-learned_models-orange)

*Guoliang Li, Xuanhe Zhou, Ji Sun, et al. VLDB, 2021.* [[paper](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/vldb21-opengauss.pdf)]


**From Natural Language Processing to Neural Databases.** ![](https://img.shields.io/badge/-nlp_interface-orange) 

*James Thorne, Majid Yazdani, Marzieh Saeidi, et al. VLDB, 2021.* [[paper](http://www.vldb.org/pvldb/vol14/p1033-thorne.pdf)]


**One Model to Rule them All: Towards Zero-Shot Learning for Databases.** ![](https://img.shields.io/badge/-model_transfer-blue)

*Benjamin Hilprecht, Carsten Binnig. CIDR, 2022.* [[paper](https://www.cidrdb.org/cidr2022/papers/p16-hilprecht.pdf)]


**A Unified Transferable Model for ML-Enhanced DBMS.** ![](https://img.shields.io/badge/-model_transfer-blue)

*Ziniu Wu, et al. CIDR, 2022.* [[paper](https://www.cidrdb.org/cidr2022/papers/p6-wu.pdf)]


**PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!** ![](https://img.shields.io/badge/-model_validation-purple) 

*Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, et al. VLDB, 2022.* [[paper](https://vldb.org/pvldb/vol14/p3362-hossain.pdf)]


**Database Gyms.** ![](https://img.shields.io/badge/-model_training-purple)

*Lim, Wan Shen, Matthew Butrovich, William Zhang, et al. CIDR, 2023.* [[paper](https://www.cidrdb.org/cidr2023/papers/p27-lim.pdf)]


**mutable: A Modern DBMS for Research and Fast Prototyping.** ![](https://img.shields.io/badge/-module_separation-green)

*Immanuel L Haffner, Jens Dittrich. CIDR, 2023.* [[paper](https://www.cidrdb.org/cidr2023/papers/p41-haffner.pdf)]


**SageDB: An Instance-Optimized Data Analytics System.** ![](https://img.shields.io/badge/-partial_MVs-orange)

*Jialin Ding, Ryan Marcus, Andreas Kipf, et al. VLDB, 2022.* [[paper](https://www.vldb.org/pvldb/vol15/p4062-ding.pdf)]


**Towards Building Autonomous Data Services on Azure.** ![](https://img.shields.io/badge/-auto_cloud_services-purple)

*Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, et al. SIGMOD, 2023.* [[paper](https://dl.acm.org/doi/pdf/10.1145/3555041.3589674)]

## 9. Demonstrations

**[DB Tuning]** Bohan Zhang, Dana Van Aken, Justin Wang, Tao Dai, Shuli Jiang, Jacky Lao, Siyuan Sheng, Andrew Pavlo, Geoffrey J. Gordon. *A Demonstration of the OtterTune Automatic Database Management System Tuning Service*. VLDB, 2018. [[paper](http://www.vldb.org/pvldb/vol11/p1910-zhang.pdf)]

**[O&M Platform]** Xuanhe Zhou, Lianyuan Jin, Ji Sun, Xinyang Zhao, Xiang Yu, Shifu Li, Tianqing Wang, Kun Li, Luyang Liu. *DBMind: A Self-Driving Platform in openGauss*. VLDB, 2021. [[paper](http://vldb.org/pvldb/vol14/p2743-zhou.pdf)] [[website](https://www.dbmind.cn/)]

**[DB Tuning]** Junxiong Wang, Immanuel Trummer, Debabrota Basu. *Demonstrating UDO: A Unified Approach for Optimizing Transaction Code, Physical Design, and System Parameters via Reinforcement Learning*. SIGMOD, 2021. [[paper](https://dl.acm.org/doi/abs/10.1145/3448016.3452754)]

**[DB Tuning]** Immanuel Trummer. *Demonstrating DB-BERT: A Database Tuning Tool that "Reads" the Manual*. SIGMOD, 2022. [[paper](https://arxiv.org/pdf/2112.10925.pdf)]

**[DB Tuning]** Luming Sun, Tao Ji, Cuiping Li, Hong Chen. *DeepO: A Learned Query Optimizer*. SIGMOD, 2022. [[paper](https://dl.acm.org/doi/pdf/10.1145/3514221.3520167)]

**[DB Tuning]** Xuanhe Zhou, Guoliang Li, Jianming Wu, Jiesi Liu, Zhaoyan Sun, Xinning Zhang. *A Learned Query Rewrite System*. VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p4110-li.pdf)] [[website](http://rewrite_demo.dbmind.cn/)]

**[DB Tuning]** Wei Zhou, Chen Lin, Xuanhe Zhou, Guoliang Li, Tianqing Wang. *Demonstration of ViTA: Visualizing, Testing and Analyzing Index Advisors.* CIKM, 2023. [[video](https://youtu.be/RfV4ylOxpcc)]

**[DB Tuning]** Qiushi Bai, Sadeem Alsudais, Chen Li. *Demo of QueryBooster: Supporting Middleware-based SQL Query Rewriting as a Service.* VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p4038-bai.pdf)]

**[DB Tuning]** Christoph Anneser, Mario Petruccelli, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus, and Alfons Kemper. *QO-Insight: Inspecting Steered Query Optimizers.* VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p3922-anneser.pdf)]

**[DB Tuning]** Zilong Wang, Qixiong Zeng, Ning Wang, Haowen Lu, Yue Zhang. *CEDA: Learned Cardinality Estimation with Domain Adaptation.* VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p3934-wang.pdf)]

**[DB Tuning]** Junxiong Wang, Immanuel Trummer, Ahmet Kara, Dan Olteanu. *ADOPT: Adaptively Optimizing Attribute Orders for Worst-Case Optimal Join Algorithms via Reinforcement Learning.* VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p2805-wang.pdf)]

**[DB Tuning]** Immanuel Trummer. *Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4.* VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p4098-trummer.pdf)]

**[DB Diagnosis]** Xiu Tang, Sai Wu, Dongxiang Zhang, Ziyue Wang, Gongsheng Yuan, and Gang Chen. *A Demonstration of DLBD: Database Logic Bug Detection System.* VLDB, 2023. [[paper](https://www.vldb.org/pvldb/vol16/p3914-wu.pdf)]


# *📧 Special Issues*

## S1 Large Language Models Meet Database

**Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs.**  ![](https://img.shields.io/badge/text2sql-orange) 

   *Jinyang Li, Binyuan Hui, Ge Qu, et al. arXiv, 2023.* [[pdf](https://arxiv.org/pdf/2305.03111.pdf)].  

**Can Foundation Models Wrangle Your Data?** ![](https://img.shields.io/badge/data_cleaning-green)

  *Avanika Narayan, Ines Chami, Laurel Orr, and Christopher Ré. VLDB, 2023.* [[pdf](https://www.vldb.org/pvldb/vol16/p738-narayan.pdf)].

**Querying Large Language Models with SQL [Vision].**  ![](https://img.shields.io/badge/sql2res-red) 

   *Mohammed Saeed, Nicola De Cao, Paolo Papotti. arXiv 2023.* [[pdf](https://arxiv.org/pdf/2304.00472.pdf)].  

**CAESURA: Language Models as Multi-Modal Query Planners.** ![](https://img.shields.io/badge/multi_mode-blue)

*Matthias Urban, Carsten Binnig. arXiv 2023.* [[pdf](https://arxiv.org/pdf/2308.03424.pdf)].  

**Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables.**   ![](https://img.shields.io/badge/multi_mode-blue) 

   *Matthias Urban, Carsten Binnig. arXiv 2023.* [[pdf](https://arxiv.org/pdf/2304.13559.pdf)].  

**Multimodal Neural Databases.**   ![](https://img.shields.io/badge/multi_mode-blue) 

   *Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Alon Halevy, Fabrizio Silvestri. arXiv 2023.* [[pdf](https://arxiv.org/pdf/2305.01447.pdf)].  


**ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory.** ![](https://img.shields.io/badge/split_task_to_sqls-orange)

   *Chenxu Hu, Jie Fu, Chenzhuang Du, Simian Luo, Junbo Zhao, Hang Zhao. arXiv 2023.* [[pdf](https://arxiv.org/pdf/2306.03901.pdf)].

**Chat2DB**  ![](https://img.shields.io/badge/text2sql-orange)  ![](https://img.shields.io/badge/query_optimization-purple) 

*https://github.com/chat2db/Chat2DB*

**LLM As DBA.** ![](https://img.shields.io/badge/db_maintenance-purple) *arXiv 2023.* [[pdf](https://arxiv.org/abs/2308.05481)].

*https://github.com/TsinghuaDatabaseGroup/DB-GPT*

**DB-GPT: Large Language Model Meets Database.** ![](https://img.shields.io/badge/query_optimization-purple) 

   *Xuanhe Zhou, Zhaoyan Sun, Guoliang Li. Data Science and Engineering 2023.* [[pdf](http://dbgroup.cs.tsinghua.edu.cn/ligl/papers/dbgpt-dse.pdf)].

## S2 AI Paper And Code List

**Partially Filtered NLP Papers** ![](https://img.shields.io/badge/paper_list-orange)  

*https://qinyuenlp.com/read/*


**Prompt Engineering for LLMs** ![](https://img.shields.io/badge/paper_list-orange)  

*https://www.promptingguide.ai/papers*


**Deployed AI Algorithms** ![](https://img.shields.io/badge/program_list-orange)  

*https://github.com/labmlai/annotated_deep_learning_paper_implementations*


**Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models**  ![](https://img.shields.io/badge/benchmark-orange)

*Yuxi Ma, Chi Zhang, Song-Chun Zhu. arXiv 2023.* [[pdf](https://arxiv.org/pdf/2307.03762.pdf)].

## S3 Open Datasets And SQLs

[https://github.com/cmu-db/benchbase](https://github.com/cmu-db/benchbase)

[https://github.com/cwida/public_bi_benchmark/tree/dev/master](https://github.com/cwida/public_bi_benchmark/tree/dev/master)

[https://github.com/TsinghuaDatabaseGroup/datasets](https://github.com/TsinghuaDatabaseGroup/datasets)