Update README.md
Add technical report link
README.md
CHANGED
@@ -113,8 +113,8 @@ by [TensorOpera AI](https://tensoropera.ai/). The model was trained with a 3-sta
 tokens of text and code data in 8K sequence length. Fox-1 uses Grouped Query Attention (GQA) with 4 key-value heads and
 16 attention heads for faster inference.

-For the full details of this model please read
-
+For the full details of this model please read [Fox-1 technical report](https://arxiv.org/abs/2411.05281)
+and [release blog post](https://blog.tensoropera.ai/tensoropera-unveils-fox-foundation-model-a-pioneering-open-source-slm-leading-the-way-against-tech-giants).

 ## Benchmarks

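As context for the GQA detail in the diff above, here is a minimal sketch (not part of this commit) of how the stated head layout could be verified with Hugging Face `transformers`. It assumes the model is published on the Hub as `tensoropera/Fox-1-1.6B` and exposes a Llama-style config with `num_attention_heads` / `num_key_value_heads` fields; both the repo id and the config fields are assumptions, not facts from the diff.

```python
# Sketch: inspect the Grouped Query Attention (GQA) layout described
# in the README. The repo id "tensoropera/Fox-1-1.6B" and the
# Llama-style config fields below are assumptions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tensoropera/Fox-1-1.6B")

print(config.num_attention_heads)  # expected 16 query heads (per the README)
print(config.num_key_value_heads)  # expected 4 shared key-value heads (GQA)

# With GQA, groups of query heads share one KV head, shrinking the KV
# cache and speeding up inference versus full multi-head attention.
print(config.num_attention_heads // config.num_key_value_heads)  # 4 query heads per KV head
```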