---
license: other
license_name: nvidia-open-model-license
license_link: >-
https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
---
# Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit
## Model Overview
Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit is a retrained variant of [Nvidia Mistral-NeMo-Minitron-8B-Base](https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base), finetuned specifically to solve [ARC-AGI](https://arcprize.org/) tasks. To save GPU memory, the vocabulary, and with it the embedding layer, has been reduced to only 77 tokens. The model achieved a score of 53.5 on the ARC-AGI private evaluation set during the [Kaggle ARC Prize 2024 Competition](https://www.kaggle.com/competitions/arc-prize-2024/leaderboard). Note that the ARC-AGI public evaluation set was used as training data for this model. Please refer to our [paper](https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf) for more details. For more models tuned for ARC-AGI, check out our [model collection](https://huggingface.co/collections/da-fr/arc-agi-models-674f0d88c8b2fa1edecffadb).
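A quick way to confirm the reduced vocabulary is to inspect the model configuration. A minimal sketch follows; the repository id is an assumption based on this card's title and collection, so adjust it to the actual model path.

```python
# Minimal sketch: checking the reduced vocabulary size.
# The repository id below is an assumption; adjust to the actual model path.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "da-fr/Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit"
)
print(config.vocab_size)  # expected: 77, per this model card
```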
## Finetuning Datasets
This model was finetuned on the following datasets:
* the [ReArc dataset](https://github.com/michaelhodel/re-arc) by Michael Hodel
* the official [ARC Prize](https://arcprize.org/) evaluation set
* the [ConceptARC dataset](https://github.com/victorvikram/ConceptARC)
## License
This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).
## Usage
This model can be used with the `transformers` or `unsloth` packages. For more information on preprocessing the ARC Prize tasks to generate prompts for the model, please refer to our [paper](https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf) and our [GitHub repository](https://github.com/da-fr/arc-prize-2024).
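Below is a minimal loading sketch with `transformers`. The repository id is assumed from this card's title, and the prompt is a placeholder: real inputs are ARC tasks serialized as described in the paper and repository. Since the checkpoint is stored in bitsandbytes 4-bit format, the `bitsandbytes` package must be installed.

```python
# Minimal sketch, assuming the repository id below; the bnb-4bit
# quantization config is read from the saved checkpoint
# (requires the bitsandbytes package and a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "da-fr/Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Placeholder prompt: real inputs are ARC tasks serialized with the
# reduced 77-token vocabulary (see the paper and GitHub repository).
prompt = "..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```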
## References
* [The LLM ARChitect: Solving ARC-AGI is a Matter of Perspective](https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf)
* [Minitron: Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)