🏆 Code Lingua 🏆
This leaderboard evaluates LLMs in Programming Language Translation
While other leaderboards assess the ability of LLMs to understand Natural Language (NL) for code synthesis, the ultimate way to assess whether LLMs understand code syntax and semantics is code translation. Code Lingua serves as such a leaderboard: it compares the ability of LLMs to understand what code implements in a source language and to translate the same semantics into a target language. The dataset used in this leaderboard can be accessed on HuggingFace 🤗.
🙏 Please cite our paper if you are using this leaderboard in your work 🙏
@inproceedings{pan2024lost,
title = {Lost in translation: A study of bugs introduced by large language models while translating code},
author = {Pan, Rangeet and Ibrahimzada, Ali Reza and Krishna, Rahul and Sankar, Divya and Wassi, Lambert Pouguem and Merler, Michele and Sobolev, Boris and Pavuluri, Raju and Sinha, Saurabh and Jabbarvand, Reyhaneh},
booktitle = {2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE)},
pages = {866--866},
year = {2024},
organization = {IEEE Computer Society}
}
✉️ Reach out to Ali (alirezai@illinois.edu) or Rangeet (rangeet.pan@ibm.com) for questions about the leaderboard ✉️
📝 Notes
- We use Pass@1* (greedy decoding with temperature=0), Pass@1, and Pass@5 to evaluate LLMs in our leaderboard. For Pass@1 and Pass@5, we report the maximum value across temperatures 0.2 and 0.8.
- For "All Dataset", the scores are averaged over all source-target language pairs.
- It is the model providers' responsibility to avoid data contamination as much as possible; we cannot guarantee whether the evaluated models are contaminated.
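For readers unfamiliar with the Pass@k metrics above, here is a minimal sketch of the standard unbiased pass@k estimator commonly used for such evaluations (the function name and the exact sampling setup are illustrative, not taken from this leaderboard's code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total translation samples generated per problem
    c: number of those samples that pass all tests
    k: evaluation budget (e.g., 1 or 5)
    Returns the probability that at least one of k randomly
    drawn samples (out of n) is correct.
    """
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 5 pass, `pass_at_k(10, 5, 1)` gives 0.5, while `pass_at_k(10, 5, 5)` is much higher, reflecting the larger budget.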
🤗 More Leaderboards
In addition to the Code Lingua leaderboard, we recommend building a comprehensive understanding of LLM coding ability through a diverse set of benchmarks and leaderboards.
We would like to thank the authors of EvalPlus for their artifacts and leaderboard template 🙏