# LLM-Generated-Code

**Repository Path**: stonelost/LLM-Generated-Code

## Basic Information

- **Project Name**: LLM-Generated-Code
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-08
- **Last Updated**: 2025-12-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models

[![SAC 2026 Submission](https://img.shields.io/badge/Submitted%20to-SAC%202026-blue)](https://www.sigapp.org/sac/sac2026/) [![arXiv Status](https://img.shields.io/badge/arXiv-2511.18966-B31B1B.svg)](https://arxiv.org/abs/2511.18966) [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-green?style=flat-square)](https://creativecommons.org/licenses/by/4.0/)

## 📖 Overview

This repository contains the supporting data for the paper:

> **"LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models"**

The security of code generated by large language models (LLMs) is a significant concern: studies indicate that such code often contains vulnerabilities and lacks essential defensive programming constructs. This work examines and evaluates the security of LLM-generated code, particularly in the context of C/C++. We categorized known vulnerabilities using the Common Weakness Enumeration (CWE) and, to study their criticality, mapped them to CVEs. We used ten different LLMs for code generation and analyzed the outputs through static analysis. The number of CWE instances present in AI-generated code is concerning. Our findings highlight the need for developers to be cautious when using LLM-generated code. This study provides valuable insights to advance automated code generation and encourage further research in this domain.
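The static-analysis findings in this repository are distributed as SARIF files, which are plain JSON documents following the SARIF 2.1.0 schema (findings nested as `runs[] -> results[]`, each carrying a `ruleId`). As an illustrative sketch of how such reports can be consumed, the snippet below tallies findings per rule; it is not part of SarifMiner, and the function name and any rule IDs shown are hypothetical.

```python
import json
from collections import Counter

def count_rule_hits(sarif_path: str) -> Counter:
    """Tally how many findings each static-analysis rule produced in a SARIF file."""
    with open(sarif_path, encoding="utf-8") as f:
        report = json.load(f)
    hits = Counter()
    # SARIF 2.1.0 nests findings as runs[] -> results[]; each result
    # normally names the rule that fired via its "ruleId" field.
    for run in report.get("runs", []):
        for result in run.get("results", []):
            hits[result.get("ruleId", "<no-rule-id>")] += 1
    return hits
```

A per-rule tally like this is a first step toward mapping tool-specific rules onto CWE categories, which individual tools encode differently (e.g., in rule metadata or tags).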
## 📂 Repository Structure

| File/Folder | Description |
| :--- | :--- |
| `prompts/` | Contains the prompt dataset that was created and used in the study. |
| `generated_codes/` | Contains the actual code generated by each model. |
| `SarifMiner_reports/` | Reports produced by SarifMiner, a custom-built tool for extracting meaningful information from the SARIF files; the tool is available at <https://github.com/Codesbyusman/SarifMiner>. |
| `sarifReports_codeql_snyk/` | Original SARIF reports generated by CodeQL and Snyk Code after analysis. |
| `codeShield/` | Contains reports generated by CodeShield. |

## 📚 Paper and Citation

The full paper is available on arXiv. Please cite it if you use this code or data in your own research.

* **arXiv Link:** <https://arxiv.org/abs/2511.18966>

### BibTeX Citation

```bibtex
@misc{shahid2025llmcsecempiricalevaluationsecurity,
  title={LLM-CSEC: Empirical Evaluation of Security in C/C++ Code Generated by Large Language Models},
  author={Muhammad Usman Shahid and Chuadhry Mujeeb Ahmed and Rajiv Ranjan},
  year={2025},
  eprint={2511.18966},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2511.18966},
}
```