# multipl_e

**Repository Path**: brick-pid/multipl_e

## Basic Information

- **Project Name**: multipl_e
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: dafny

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-09-11
- **Last Updated**: 2025-02-14

## README

# Multi-Programming Language Evaluation of Large Language Models of Code (MultiPL-E)

MultiPL-E is a system for translating unit-test-driven neural code generation
benchmarks to new languages. We have used MultiPL-E to translate two popular
Python benchmarks (HumanEval and MBPP) to 18 other programming languages.
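
To give a rough feel for what translating a unit test involves, here is a toy sketch that renders a Python test case as a Lua assertion. It is not MultiPL-E's translator: the names `to_lua`, `translate_test`, and the Lua-side helper `assert_equals` (assumed to perform a deep comparison) are hypothetical and exist only for this illustration, and only a small subset of Python literals is handled.

```python
import json


def to_lua(value):
    """Render a simple Python literal as a Lua expression (toy subset)."""
    if value is None:
        return "nil"
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Double-quoted string with basic escapes; adequate for simple text.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(to_lua(v) for v in value) + "}"
    if isinstance(value, dict):
        items = ", ".join(f"[{to_lua(k)}] = {to_lua(v)}" for k, v in value.items())
        return "{" + items + "}"
    raise TypeError(f"unsupported literal: {value!r}")


def translate_test(func_name, args, expected):
    """Build one Lua assertion from a Python call and its expected result.

    `assert_equals` is a hypothetical Lua helper assumed to compare deeply.
    """
    call = f"{func_name}({', '.join(to_lua(a) for a in args)})"
    return f"assert_equals({call}, {to_lua(expected)})"


print(translate_test("candidate", [[1, 2, 3], True], "ok"))
# prints: assert_equals(candidate({1, 2, 3}, true), "ok")
```

A real translator also has to handle the function signature, type annotations for statically typed targets, and values with no direct counterpart in the target language.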

For more information:

- MultiPL-E is part of the [BigCode Code Generation LM Harness]. This
  is the easiest way to use MultiPL-E.
- The [Multilingual Code Models Evaluation] by BigCode evaluates Code LLMs
  using several benchmarks, including MultiPL-E.
- We have a [tutorial] on how to use MultiPL-E directly.
- Read our paper [MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation].
- The [MultiPL-E dataset] of translated prompts is available on the Hugging Face
  Hub.

## Versions

- Version 0.4.0: Work in progress.

  - New languages: OCaml, MATLAB
  - Using `.jsonl` instead of `.json` for prompts
  - Several bug fixes to prompts

- Version 0.3.0: used to evaluate [StarCoder]

  - This version fixes several bugs in prompts and test cases that had lowered
    pass@k rates for some of the statically typed languages. The most significant difference
    is that the pass@k for Java increases by about 2% on HumanEval.

- Version 0.2.0: used to evaluate [SantaCoder]
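
The pass@k rates mentioned above are commonly computed with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021): given `n` sampled completions for a problem, of which `c` pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below assumes that definition; the function name `pass_at_k` is chosen for illustration and is not taken from this repository.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem.

    n: number of sampled completions
    c: number of completions that pass all tests
    k: number of attempts allowed
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 20 samples, 5 of which pass.
print(round(pass_at_k(20, 5, 1), 4))  # prints 0.25
```

A benchmark-level score is then the mean of this per-problem value over all problems.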

[tutorial]: https://nuprl.github.io/MultiPL-E/
[BigCode Code Generation LM Harness]: https://github.com/bigcode-project/bigcode-evaluation-harness
[MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation]: https://ieeexplore.ieee.org/abstract/document/10103177
[SantaCoder]: https://arxiv.org/abs/2301.03988
[MultiPL-E dataset]: https://huggingface.co/datasets/nuprl/MultiPL-E
[StarCoder]: https://arxiv.org/abs/2305.06161
[Multilingual Code Models Evaluation]: https://huggingface.co/spaces/bigcode/multilingual-code-evals