# lrcode_gen

**Repository Path**: brick-pid/lrcode_gen

## Basic Information

- **Project Name**: lrcode_gen
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-07-31
- **Last Updated**: 2024-08-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

## DataProcess
1. Split the programming language documentation into chunks:

    ```bash
    bash run_split_doc.sh
    ```

    The resulting chunks are saved to `./data_process/doc_chunk.txt`.
2. Feed the chunks to a large language model, which converts them into a knowledge-level database used for later retrieval.
    
    ```bash
    bash run_convert_knowledge.sh
    ```

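    If a parsing error occurs while running convert_knowledge (for example, the LLM did not produce a correctly formatted JSON string), the output that failed to parse is saved to `data_process/knowledges_parse_error.txt` so it can be handled manually later.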
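
The parse-error fallback described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `split_responses` and `save_failures` are hypothetical, and it assumes each LLM response is a single string expected to contain one JSON value.

```python
import json
from pathlib import Path


def split_responses(responses):
    """Separate LLM responses into parsed JSON values and raw failures.

    Hypothetical helper: the real convert_knowledge script may differ.
    """
    parsed, failed = [], []
    for raw in responses:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            # Keep the raw text so a human can repair it later.
            failed.append(raw)
    return parsed, failed


def save_failures(failed, path="data_process/knowledges_parse_error.txt"):
    """Append unparseable responses to the error file, one per line."""
    if not failed:
        return
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("a", encoding="utf-8") as f:
        for raw in failed:
            # Flatten newlines so each failure occupies a single line.
            f.write(raw.replace("\n", " ").strip() + "\n")
```

Keeping the failures in a separate file means one malformed response does not abort the whole conversion run.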

## Evaluation
We use MultiPL-E to evaluate model performance. The evaluation consists of three steps:

1. Generate model completions

    ```bash
    python3 ./MultiPL-E/automodel.py --name bigcode/starcoderbase-1b --root-dataset humaneval --lang jl --temperature 0.8 --batch-size 20 --output-dir-prefix tutorial
    ```

    The vLLM version speeds up inference. To run inference on multiple GPUs, set the `--num-gpus` command-line argument.
    
    ```bash
    python3 ./MultiPL-E/automodel_vllm.py --name bigcode/starcoderbase-1b --root-dataset humaneval --lang jl --temperature 0.8 --batch-size 40 --output-dir-prefix tutorial
    ```

    If you only need pass@1, generating 20 samples per problem is enough; also set the temperature to 0.2.

    ```bash
    python3 ./MultiPL-E/automodel_vllm.py --name bigcode/starcoderbase-1b --root-dataset humaneval --lang jl --temperature 0.2 --completion-limit 20 --batch-size 40 --output-dir-prefix tutorial
    ```

    Chain-of-thought (CoT) inference:

    ```bash
    python3 ./inference/automodel_cot.py --name bigcode/starcoderbase-1b --root-dataset humaneval --lang jl --temperature 0.2 --completion-limit 20 --batch-size 40 --output-dir-prefix tutorial/cot
    ```

2. Run the test suites

    ```bash
    python3 ./MultiPL-E/evaluation/src/main.py --dir ./tutorial --output-dir ./tutorial --recursive
    ```

3. Compute the pass@k metric from the execution results

    ```bash
    python3 ./MultiPL-E/pass_k.py ./tutorial/*
    ```
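
For reference, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass the tests. The sketch below assumes that estimator; the README does not show how `pass_k.py` computes the metric, and the function names here are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sampling budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With 20 samples per problem, pass@1 reduces to the fraction of samples that pass, which is why a small sample count is sufficient when only pass@1 is needed.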