§ feed · storyline
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
OpenAI introduces MLE-bench, a benchmark designed to measure how well AI agents perform at machine learning engineering tasks.
We introduce MLE-bench, a benchmark for measuring how well AI agents perform at machine learning engineering.
§ sources · 1 publication · timeline below