---
license: apache-2.0
library_name: transformers
inference: false
base_model: AIDC-AI/Marco-o1
tags:
- llama-cpp
- gguf-my-repo
---

# Triangle104/Marco-o1-Q4_K_M-GGUF
This model was converted to GGUF format from [`AIDC-AI/Marco-o1`](https://huggingface.co/AIDC-AI/Marco-o1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/AIDC-AI/Marco-o1) for more details on the model.

---
## Model details
Marco-o1 not only focuses on disciplines with standard answers, such as mathematics, physics, and coding, which are well-suited for reinforcement learning (RL), but also places greater emphasis on open-ended resolutions. We aim to address the question: "Can the o1 model effectively generalize to broader domains where clear standards are absent and rewards are challenging to quantify?"

Currently, the Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and _innovative reasoning strategies_, optimized for complex real-world problem-solving tasks.


⚠️ Limitations: We would like to emphasize that this research work is inspired by OpenAI's o1 (from which the name is also derived). This work aims to explore potential approaches to shed light on the currently unclear technical roadmap for large reasoning models. In addition, our focus is on open-ended questions, and we have observed interesting phenomena in multilingual applications. However, we must acknowledge that the current model primarily exhibits o1-like reasoning characteristics, and its performance still falls short of a fully realized "o1" model. This is not a one-time effort, and we remain committed to continuous optimization and ongoing improvement.






	
		
	

## 🚀 Highlights

Currently, our work is distinguished by the following highlights:


🍀 Fine-Tuning with CoT Data: We develop Marco-o1-CoT by performing full-parameter fine-tuning on the base model using an open-source CoT dataset combined with our self-developed synthetic data.

🍀 Solution Space Expansion via MCTS: We integrate LLMs with MCTS (Marco-o1-MCTS), using the model's output confidence to guide the search and expand the solution space.

🍀 Reasoning Action Strategy: We implement novel reasoning action strategies and a reflection mechanism (Marco-o1-MCTS Mini-Step), including exploring different action granularities within the MCTS framework and prompting the model to self-reflect, thereby significantly enhancing the model's ability to solve complex problems.

🍀 Application in Translation Tasks: We are the first to apply Large Reasoning Models (LRM) to machine translation tasks, exploring inference-time scaling laws in the multilingual and translation domain.


OpenAI recently introduced the groundbreaking o1 model, renowned for its exceptional reasoning capabilities. This model has demonstrated outstanding performance on platforms such as AIME and CodeForces, surpassing other leading models. Inspired by this success, we aimed to push the boundaries of LLMs even further, enhancing their reasoning abilities to tackle complex, real-world challenges.


🌍 Marco-o1 leverages advanced techniques like CoT fine-tuning, MCTS, and Reasoning Action Strategies to enhance its reasoning power. As shown in Figure 2, by fine-tuning Qwen2-7B-Instruct with a combination of the filtered Open-O1 CoT dataset, the Marco-o1 CoT dataset, and the Marco-o1 Instruction dataset, Marco-o1 improved its handling of complex tasks. MCTS allows exploration of multiple reasoning paths using confidence scores derived from softmax-applied log probabilities of the top-k alternative tokens, guiding the model to optimal solutions. Moreover, our reasoning action strategy involves varying the granularity of actions within steps and mini-steps to optimize search efficiency and accuracy.
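
To make the confidence-score idea concrete, here is a minimal PyTorch sketch of one way to score a reasoning step as described above: renormalize the log probabilities of the top-k candidate tokens with a softmax and average the probability assigned to each actually generated token. The function name and tensor shapes are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn.functional as F

def step_confidence(logits: torch.Tensor, chosen_ids: torch.Tensor, top_k: int = 5) -> float:
    """Score one reasoning step: for each generated token, softmax the log
    probabilities of its top-k alternatives, take the mass assigned to the
    chosen token, and average over the step.

    logits: (seq_len, vocab_size) raw logits for the generated tokens.
    chosen_ids: (seq_len,) ids of the tokens actually generated.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    top_log_probs, top_ids = log_probs.topk(top_k, dim=-1)  # (seq_len, top_k)
    # Renormalize over the top-k candidates only.
    top_probs = F.softmax(top_log_probs, dim=-1)
    # Mass on the chosen token (zero if it fell outside the top-k).
    mask = top_ids.eq(chosen_ids.unsqueeze(-1))
    return (top_probs * mask).sum(dim=-1).mean().item()
```

A score of this kind can then serve as the node value that steers MCTS toward higher-confidence reasoning paths.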



  
  
Figure 2: The overview of Marco-o1.





🌏 As shown in Figure 3, Marco-o1 achieved accuracy improvements of +6.17% on the MGSM (English) dataset and +5.60% on the MGSM (Chinese) dataset, showcasing enhanced reasoning capabilities.



  
  
Figure 3: The main results of Marco-o1.





🌎 Additionally, in translation tasks, we show that Marco-o1 excels at translating slang expressions, such as rendering "这个鞋拥有踩屎感" (literal translation: "This shoe offers a stepping-on-poop sensation.") as "This shoe has a comfortable sole," demonstrating its superior grasp of colloquial nuances.



  
  
Figure 4: A demonstration of a translation task using Marco-o1.





For more information, please visit our GitHub.



	
		
	

## Usage

Load the Marco-o1-CoT model:


```python
# Load the tokenizer and model directly from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AIDC-AI/Marco-o1")
model = AutoModelForCausalLM.from_pretrained("AIDC-AI/Marco-o1")
```
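
With the model loaded, a chat-style generation call looks like the following. This is a minimal sketch; the prompt and sampling settings are illustrative assumptions rather than the official inference setup:

```python
# Minimal generation sketch; sampling settings are illustrative assumptions.
messages = [{"role": "user", "content": "How many 'r's are in 'strawberry'?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```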






Inference:

Execute the inference script (you can provide any customized inputs inside):

```bash
# Plain Transformers inference
./src/talk_with_model.py

# Or use vLLM
./src/talk_with_model_vllm.py
```




	
		
	

## 👨🏻‍💻 Acknowledgement




	
		
	

### Main Contributors



From the MarcoPolo Team, AI Business, Alibaba International Digital Commerce:

- Yu Zhao
- Huifeng Yin
- Hao Wang
- Longyue Wang



	
		
	

## Citation



If you find Marco-o1 useful for your research and applications, please cite:


```bibtex
@misc{zhao2024marcoo1openreasoningmodels,
      title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
      author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
      year={2024},
      eprint={2411.14405},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.14405},
}
```




	
		
	

## LICENSE



This project is licensed under the Apache License, Version 2.0 (SPDX-License-Identifier: Apache-2.0).



	
		
	

## DISCLAIMER



We used compliance-checking algorithms during the training process to ensure the compliance of the trained model and dataset to the best of our ability. Due to the complexity of the data and the diversity of language model usage scenarios, we cannot guarantee that the model is completely free of copyright issues or improper content. If you believe anything infringes on your rights or generates improper content, please contact us, and we will promptly address the matter.

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)

```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/Marco-o1-Q4_K_M-GGUF --hf-file marco-o1-q4_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/Marco-o1-Q4_K_M-GGUF --hf-file marco-o1-q4_k_m.gguf -c 2048
```
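
Once the server is running, you can query it over HTTP; recent llama.cpp builds expose an OpenAI-compatible chat endpoint on port 8080 by default. A minimal sketch (the prompt is just an example):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Briefly explain Monte Carlo Tree Search."}], "temperature": 0.7}'
```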

Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```
cd llama.cpp && LLAMA_CURL=1 make
```
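
For instance, a CUDA-enabled build on Linux (assuming the CUDA toolkit is installed) could combine the flags like so:
```
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make
```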

Step 3: Run inference through the main binary.
```
./llama-cli --hf-repo Triangle104/Marco-o1-Q4_K_M-GGUF --hf-file marco-o1-q4_k_m.gguf -p "The meaning to life and the universe is"
```
or 
```
./llama-server --hf-repo Triangle104/Marco-o1-Q4_K_M-GGUF --hf-file marco-o1-q4_k_m.gguf -c 2048
```