File size: 4,588 Bytes
a9649b9
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
# VietCoMath Model Usage

## Overview
This repository contains code for running the VietCoMath language model for mathematical problem-solving and text generation tasks.

## Installation

### Prerequisites
- Python 3.8+
- PyTorch
- Transformers library

### Required Dependencies
```bash
pip install transformers torch
```

## Usage

### Basic Text Generation
```python
import transformers
import torch

# Load the model
model_id = "VietnamAIHub/VietCoMath-o1-8B"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Prepare messages
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Who are you?"},
]

# Define terminators
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

# Generate text
outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Print generated text
print(outputs[0]["generated_text"][-1])
```

### Advanced Usage with Response Parsing

#### Helper Functions
```python
import re

def check_patterns(response):
    """
    Check if the response contains all required XML patterns.
    
    Args:
        response (str): The model's generated response
    
    Returns:
        str: Parsed response or 'Missing' if patterns are incomplete
    """
    patterns = {
        'answer': r'<answer>(.*?)</answer>',
        'reflection': r'<reflection>(.*?)</reflection>',
        'steps': r'<step>(.*?)</step>',
        'count': r'<count>(.*?)</count>'
    }
    
    matches = {
        'answer': re.search(patterns['answer'], response, re.DOTALL),
        'reflection': re.search(patterns['reflection'], response, re.DOTALL),
        'steps': re.findall(patterns['steps'], response, re.DOTALL),
        'count': re.findall(patterns['count'], response, re.DOTALL)
    }
    
    return "Missing" if not all([matches['answer'], matches['reflection'], matches['steps'], matches['count']]) else response

def parse_response(response):
    """
    Parse the model's response and extract key components.
    
    Args:
        response (str): The model's generated response
    
    Returns:
        tuple: Parsed answer, reflection, steps, and clarification
    """
    response_check = check_patterns(response)
    
    if response_check == "Missing":
        clarification_match = re.search(r'<clarification>(.*?)</clarification>', response, re.DOTALL)
        clarification = clarification_match.group(1).strip() if clarification_match else response
        return "", "", [], clarification
    else:
        answer_match = re.search(r'<answer>(.*?)</answer>', response, re.DOTALL)
        reflection_match = re.search(r'<reflection>(.*?)</reflection>', response, re.DOTALL)
        
        answer = answer_match.group(1).strip() if answer_match else ""
        reflection = reflection_match.group(1).strip() if reflection_match else ""
        steps = re.findall(r'<step>(.*?)</step>', response, re.DOTALL)
        
        return answer, reflection, steps, ""
```

## Example Problem Solving
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://13.65.249.11:8887/v1",
    api_key="token-abc123",
)

# Example mathematical word problem
problem = "Có 100 sinh viên đỗ đại học. Trong số đó, có 55 sinh viên chọn âm nhạc, 44 sinh viên chọn thể thao, và 20 sinh viên chọn cả 2. Hỏi có bao nhiêu sinh viên không chọn âm nhạc, cũng không chọn thể thao?"

completion = client.chat.completions.create(
    model="/LLM_32T/pretrained_weights/preview_model/VietCoMath_8B_instruct_2024_11",
    messages=[
        {"role": "system", "content": ""},
        {"role": "user", "content": problem}
    ],
    temperature=0.6,
    top_p=0.9,
    max_tokens=4096,
)

generated_text = completion.choices[0].message.content
answer, reflection, steps, clarification = parse_response(generated_text)
```

## Notes
- Ensure you have a stable internet connection
- The model requires significant computational resources
- Performance may vary based on input complexity

## Limitations
- The model is trained primarily on Vietnamese mathematical problems
- May require fine-tuning for specific use cases

## License
[Insert Appropriate License Information]

## Citation
[If applicable, add citation information for the model]
```

## Contribution
Feel free to open issues or submit pull requests to improve the code and documentation.