Update README.md
Add technical report link
README.md
CHANGED
@@ -113,8 +113,8 @@ by [TensorOpera AI](https://tensoropera.ai/). The model was trained with a 3-sta
 tokens of text and code data in 8K sequence length. Fox-1 uses Grouped Query Attention (GQA) with 4 key-value heads and
 16 attention heads for faster inference.

-For the full details of this model please read
-
+For the full details of this model please read [Fox-1 technical report](https://arxiv.org/abs/2411.05281)
+and [release blog post](https://blog.tensoropera.ai/tensoropera-unveils-fox-foundation-model-a-pioneering-open-source-slm-leading-the-way-against-tech-giants).

 ## Benchmarks

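As context for the GQA detail in the diff above, here is a minimal sketch (not part of this commit) of how the stated head layout could be verified with Hugging Face `transformers`. It assumes the model is published on the Hub as `tensoropera/Fox-1-1.6B` and exposes a Llama-style config with `num_attention_heads` / `num_key_value_heads` fields; both the repo id and the config fields are assumptions, not facts from the diff.

```python
# Sketch: inspect the Grouped Query Attention (GQA) layout described
# in the README. The repo id "tensoropera/Fox-1-1.6B" and the
# Llama-style config fields below are assumptions.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tensoropera/Fox-1-1.6B")

print(config.num_attention_heads)  # expected 16 query heads (per the README)
print(config.num_key_value_heads)  # expected 4 shared key-value heads (GQA)

# With GQA, groups of query heads share one KV head, shrinking the KV
# cache and speeding up inference versus full multi-head attention.
print(config.num_attention_heads // config.num_key_value_heads)  # 4 query heads per KV head
```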