Update README.md
README.md (CHANGED)
@@ -11,14 +11,14 @@ base_model:
 # Model Card for oopere/pruned20-llama-1b
 
 <!-- Provide a quick summary of what the model is/does. -->
-This model is a pruned version of the Llama-3.2
+This model is a pruned version of the Llama-3.2-1B model, with a parameter reduction of 20% in the MLP layers.
 The pruning process aims to enhance computational efficiency while maintaining acceptable performance across specific tasks.
 This model is not intended to be used directly, but rather to be fine-tuned for specific tasks where it can achieve equal or superior performance compared to fine-tuning the base model for the same task.
 
 
 ## Model Details
 
-- **Model Type:** Pruned version of LLaMA-
+- **Model Type:** Pruned version of LLaMA-3.2 using structured pruning
 - **Original Model:** meta-llama/Llama-3.2-1B
 - **Pruning Method:** Structured pruning of MLP layers using importance scores based on absolute maximum weights
 - **Size Reduction:** 13.7% (from 1.24B to 1.07B parameters)
@@ -61,5 +61,4 @@ This model is not intended to be used directly, but rather to be fine-tuned for
 - Can run on hardware with ~20% less memory than original
 
 ## Acknowledgments
-- Thanks to [Mariusz Kurman](https://huggingface.co/mkurman) for creating [llama-pruning](https://github.com/MedITSolutionsKurman/llama-pruning), a library that extends and improve this pruning methodology.
-
+- Thanks to [Mariusz Kurman](https://huggingface.co/mkurman) for creating [llama-pruning](https://github.com/MedITSolutionsKurman/llama-pruning), a library that extends and improves this pruning methodology.
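The pruning method named in the card (structured pruning of MLP layers, with importance scores based on absolute maximum weights) can be illustrated with a short sketch. This is a hedged reconstruction, not the script used to produce the model: the helper `prune_mlp_layers` is hypothetical, and scoring each intermediate neuron by the abs-max of its `gate_proj`/`up_proj` rows is an assumption about how the scores were computed; the actual work may have relied on the llama-pruning library credited in the Acknowledgments.

```python
# Minimal sketch: structured pruning of Llama MLP blocks by abs-max importance.
# Assumptions (not confirmed by the model card): neuron score = max absolute
# weight across its gate_proj/up_proj rows; 20% of intermediate neurons removed.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

def prune_mlp_layers(model, ratio: float = 0.20):  # hypothetical helper
    keep = int(model.config.intermediate_size * (1.0 - ratio))
    for layer in model.model.layers:
        mlp = layer.mlp

        # Importance of each intermediate neuron: absolute maximum weight
        # among the gate_proj and up_proj rows that produce it.
        scores = torch.maximum(
            mlp.gate_proj.weight.abs().max(dim=1).values,
            mlp.up_proj.weight.abs().max(dim=1).values,
        )
        idx = torch.topk(scores, keep).indices.sort().values

        # gate_proj/up_proj lose output rows; down_proj loses the matching
        # input columns, so the block's shapes stay consistent.
        for name in ("gate_proj", "up_proj"):
            old = getattr(mlp, name)
            new = nn.Linear(old.in_features, keep, bias=old.bias is not None)
            new.weight.data = old.weight.data[idx].clone()
            setattr(mlp, name, new)
        old = mlp.down_proj
        new = nn.Linear(keep, old.out_features, bias=old.bias is not None)
        new.weight.data = old.weight.data[:, idx].clone()
        mlp.down_proj = new

    model.config.intermediate_size = keep
    return model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = prune_mlp_layers(model, ratio=0.20)
```

As a sanity check on the card's numbers: with hidden size 2048, intermediate size 8192, and 16 layers, removing 20% of the intermediate neurons takes out roughly 3 × 2048 × 8192 × 16 × 0.20 ≈ 0.16B parameters, consistent with the stated 1.24B → 1.07B (13.7%) reduction.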