| Model | Commonsense (Micro) | Commonsense (Macro) | Hard (Micro) | Hard (Macro) | Final Pass Rate |
|---|---|---|---|---|---|
| Direct Prompting | | | | | |
| Llama3.1-8B | 60.1 | 0.0 | 7.9 | 2.8 | 0.0 |
| Qwen2-7B | 49.9 | 1.1 | 2.1 | 0.0 | 0.0 |
| Fine-tuning | | | | | |
| Llama3.1-8B | 78.3 | 17.8 | 19.3 | 6.1 | 3.8 |
| Qwen2-7B | 59.0 | 0.6 | 0.2 | 0.0 | 0.0 |

If you find our resources useful in your research, please cite our paper:

@article{xie2024revealing,
title={Revealing the Barriers of Language Agents in Planning},
author={Xie, Jian and Zhang, Kexun and Chen, Jiangjie and Yuan, Siyu and Zhang, Kai and Zhang, Yikai and Li, Lei and Xiao, Yanghua},
journal={arXiv preprint arXiv:2410.12409},
year={2024}
}