Update README.md
README.md (CHANGED)
```diff
@@ -15,27 +15,13 @@ For a deeper dive into the methods and results, check out our [blog post](https:
 
 ## Model Details
 
-### Model Description
 
-
-
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-
-### Model Sources [optional]
+### Model Sources
 
 <!-- Provide the basic links for the model. -->
 
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
+- **Repository:** [Model](https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens)
+- **Paper:** [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764)
 
 ## Uses
 
```
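The hunk context above ("Use the code below to get started with the model") refers to a usage snippet elsewhere in the card. For reference, a minimal sketch of loading the linked checkpoint with 🤗 transformers might look like the following; the prompt and generation settings are illustrative, and it assumes a transformers build with support for this checkpoint's 1.58-bit (BitNet-style) weights:

```python
# Minimal sketch: load the 1.58-bit Llama3-8B checkpoint and generate text.
# Assumes a transformers version that can deserialize this repo's quantized weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place weights on available GPU(s)/CPU
    torch_dtype=torch.bfloat16, # dtype for the non-quantized tensors
)

# Illustrative prompt; any text works.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```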
```diff
@@ -81,20 +67,11 @@ Use the code below to get started with the model.
 
 ### Training Data
 
-
+The model was trained on a subset of [FineWeb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
 
 [More Information Needed]
 
-### Training Procedure
-
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
-#### Preprocessing [optional]
-
-[More Information Needed]
-
-#### Training Hyperparameters
+### Training Hyperparameters
 
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
```
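The paper linked in the Model Sources hunk describes BitNet b1.58, where every weight is constrained to {-1, 0, +1} (log2 3 ≈ 1.58 bits per weight, hence the name) via an absmean quantizer. A toy sketch of that quantization step, written from the paper's description rather than from this repository's code:

```python
# Toy sketch of absmean ternary ("1.58-bit") weight quantization as described
# in arXiv:2402.17764; illustrative only, not the repo's implementation.
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Map a float weight tensor to {-1, 0, +1} plus a per-tensor scale."""
    scale = w.abs().mean()                            # gamma: mean absolute value
    w_q = (w / (scale + eps)).round().clamp_(-1, 1)   # RoundClip to ternary values
    return w_q, scale

w = torch.randn(4, 4)
w_q, scale = absmean_quantize(w)
print(w_q)          # entries are in {-1.0, 0.0, 1.0}
print(w_q * scale)  # dequantized approximation of w
```

With ternary weights, matrix multiplication reduces to additions and subtractions of activations, which is the source of the memory and latency savings the paper reports.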