Triangle104 committed on
Commit effc964 · verified · 1 Parent(s): fd32520

Upload README.md with huggingface_hub

  1. README.md +3 -154
README.md CHANGED
@@ -8,7 +8,9 @@ tags:
8
  - trl
9
  - llama-cpp
10
  - gguf-my-repo
11
- license: apache-2.0
12
  language:
13
  - en
14
  ---
@@ -17,159 +19,6 @@ language:
17
  This model was converted to GGUF format from [`Spestly/Athena-1-3B`](https://huggingface.co/Spestly/Athena-1-3B) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
18
  Refer to the [original model card](https://huggingface.co/Spestly/Athena-1-3B) for more details on the model.
19
 
20
- ---
21
- Model details:
22
- -
23
- Athena-1 3B is a fine-tuned, instruction-following large language model derived from Qwen/Qwen2.5-3B-Instruct.
24
- It is designed to provide efficient, high-quality text generation while
25
- maintaining a compact size. Athena 3B is optimized for lightweight
26
- applications, conversational AI, and structured data tasks, making it
27
- ideal for real-world use cases where performance and resource efficiency
28
- are critical.
29
-
30
-
31
-
32
-
33
-
34
-
35
-
36
-
37
- Key Features
38
-
39
-
40
-
41
-
42
-
43
-
44
-
45
-
46
-
47
- ⚡ Lightweight and Efficient
48
-
49
-
50
-
51
-
52
- Compact Size: At just 3.09 billion parameters, Athena-1 3B offers excellent performance with reduced computational requirements.
53
- Instruction Following: Fine-tuned for precise and reliable adherence to user prompts.
54
- Coding and Mathematics: Proficient in solving coding challenges and handling mathematical tasks.
55
-
56
-
57
-
58
-
59
-
60
-
61
-
62
- 📖 Long-Context Understanding
63
-
64
-
65
-
66
-
67
- Context Length: Supports up to 32,768 tokens, enabling the processing of moderately lengthy documents or conversations.
68
- Token Generation: Can generate up to 8K tokens of output.
69
-
70
-
71
-
72
-
73
-
74
-
75
-
76
- 🌍 Multilingual Support
77
-
78
-
79
-
80
-
81
- Supports 29+ languages, including:
82
- English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
83
- Japanese, Korean, Vietnamese, Thai, Arabic, and more.
84
-
85
-
86
-
87
-
88
-
89
-
90
-
91
-
92
-
93
- 📊 Structured Data & Outputs
94
-
95
-
96
-
97
-
98
- Structured Data Interpretation: Processes structured formats like tables and JSON.
99
- Structured Output Generation: Generates well-formatted outputs, including JSON and other structured formats.
100
-
101
-
102
-
103
-
104
-
105
-
106
-
107
-
108
- Model Details
109
-
110
-
111
-
112
-
113
- Base Model: Qwen/Qwen2.5-3B-Instruct
114
- Architecture: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
115
- Parameters: 3.09B total (2.77B non-embedding).
116
- Layers: 36
117
- Attention Heads: 16 for Q, 2 for KV.
118
- Context Length: Up to 32,768 tokens.
119
-
120
-
121
-
122
-
123
-
124
-
125
-
126
-
127
- Applications
128
-
129
-
130
-
131
-
132
- Athena 3B is designed for a variety of real-world applications:
133
-
134
-
135
- Conversational AI: Build fast, responsive, and lightweight chatbots.
136
- Code Generation: Generate, debug, or explain code snippets.
137
- Mathematical Problem Solving: Assist with calculations and reasoning.
138
- Document Processing: Summarize and analyze moderately large documents.
139
- Multilingual Applications: Support for global use cases with diverse language requirements.
140
- Structured Data: Process and generate structured data, such as tables and JSON.
141
-
142
-
143
-
144
-
145
-
146
-
147
-
148
-
149
- Quickstart
150
-
151
-
152
-
153
-
154
- Here’s how you can use Athena 3B for quick text generation:
155
-
156
-
157
- # Use a pipeline as a high-level helper
158
- from transformers import pipeline
159
-
160
- messages = [
161
- {"role": "user", "content": "Who are you?"},
162
- ]
163
- pipe = pipeline("text-generation", model="Spestly/Athena-1-3B")
164
- pipe(messages)
165
-
166
- # Load model directly
167
- from transformers import AutoTokenizer, AutoModelForCausalLM
168
-
169
- tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-3B")
170
- model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-3B")
171
-
172
- ---
173
  ## Use with llama.cpp
174
  Install llama.cpp through brew (works on Mac and Linux)
175
 
 
8
  - trl
9
  - llama-cpp
10
  - gguf-my-repo
11
+ license: other
12
+ license_name: qwen-research
13
+ license_link: https://huggingface.co/Spestly/Athena-1-3B/blob/main/LICENSE
14
  language:
15
  - en
16
  ---
 
19
  This model was converted to GGUF format from [`Spestly/Athena-1-3B`](https://huggingface.co/Spestly/Athena-1-3B) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
20
  Refer to the [original model card](https://huggingface.co/Spestly/Athena-1-3B) for more details on the model.
21
 
22
  ## Use with llama.cpp
23
  Install llama.cpp through brew (works on Mac and Linux)
24