---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
---

# Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit

## Model Overview

Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit is a retrained variant of [NVIDIA's Mistral-NeMo-Minitron-8B-Base](https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base), finetuned specifically to solve [ARC-AGI](https://arcprize.org/) tasks. To save GPU memory, the embedding layer and vocabulary have been reduced to only 77 tokens. The model achieved a score of 53.5 on the ARC-AGI private evaluation set in the [Kaggle ARC Prize 2024 Competition](https://www.kaggle.com/competitions/arc-prize-2024/leaderboard). Note that the ARC-AGI public evaluation set was used as training data for this model. Please refer to our [paper](https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf) for more details. For more models tuned for ARC-AGI, see our [model collection](https://huggingface.co/collections/da-fr/arc-agi-models-674f0d88c8b2fa1edecffadb).
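
As a quick sanity check of the reduced vocabulary (a minimal sketch; the repository id below is an assumption based on this model card's title), the tokenizer can be inspected directly:

```python
# Minimal sketch: verify the reduced 77-token vocabulary.
# The repository id is an assumption based on this model card's title.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "da-fr/Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit"
)
print(len(tokenizer))  # expected to print 77
```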

## Finetuning Datasets

This model was finetuned on the following datasets:

* the [ReArc dataset](https://github.com/michaelhodel/re-arc) by Michael Hodel
* the official [ARC Prize](https://arcprize.org/) evaluation set
* the [ConceptARC dataset](https://github.com/victorvikram/ConceptARC)

## License

This model is released under the [NVIDIA Open Model License Agreement](https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf).

## Usage

This model can be used with the `transformers` or `unsloth` packages. For more information on preprocessing ARC Prize tasks to generate prompts for the model, please refer to our [paper](https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf) and our [GitHub repository](https://github.com/da-fr/arc-prize-2024).
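
A minimal loading sketch with `transformers` is shown below. The repository id, dtype, and generation settings are assumptions, and the prompt is a placeholder: real inputs must be ARC tasks serialized into the model's 77-token vocabulary as described in the paper.

```python
# Minimal sketch for loading the 4-bit quantized model with transformers.
# The repository id is an assumption; the prompt below is a placeholder,
# since real inputs must be preprocessed ARC tasks (see the paper).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "da-fr/Mistral-NeMo-Minitron-8B-ARChitects-Full-bnb-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",           # place the quantized weights on available GPUs
    torch_dtype=torch.bfloat16,  # compute dtype for the non-quantized layers
)

prompt = "..."  # placeholder: a serialized ARC task in the 77-token vocabulary
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the checkpoint is stored in bitsandbytes 4-bit format, the `bitsandbytes` package must be installed; `unsloth` can load the same repository via `FastLanguageModel.from_pretrained`.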

## References
* [The LLM ARChitect: Solving ARC-AGI is a Matter of Perspective](https://github.com/da-fr/arc-prize-2024/blob/main/the_architects.pdf)
* [Minitron: Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)