---
language:
- ko
- en
license: apache-2.0
tags:
- text-generation
- qwen2.5
- korean
- instruct
- mlx
- 4bit
pipeline_tag: text-generation
---

## Qwen2.5-7B-Instruct-kowiki-qa-4bit (MLX conversion)
- This is a 4-bit MLX conversion of the original model, [beomi/Qwen2.5-7B-Instruct-kowiki-qa](https://huggingface.co/beomi/Qwen2.5-7B-Instruct-kowiki-qa).
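
As a rough sketch of what "4-bit" means for memory: MLX's affine quantization stores each group of weights as low-bit integers plus one scale and one bias shared by the group. This card does not state the settings used for the conversion, so the sketch assumes the `mlx_lm.convert` defaults of 4 bits and a group size of 64, with 16-bit scales and biases, which works out to about 4 + 32/64 = 4.5 bits per weight. `quantized_weight_bytes` is a hypothetical helper and ignores any layers kept in higher precision, so treat its result as an estimate only.

```python
def quantized_weight_bytes(num_params: int, bits: int = 4, group_size: int = 64,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Estimate storage for affine group-quantized weights (hypothetical helper).

    Each group of `group_size` weights stores `bits` per weight plus one
    scale and one bias shared by the whole group.
    """
    if num_params < 0 or bits <= 0 or group_size <= 0:
        raise ValueError("num_params must be >= 0; bits and group_size must be > 0")
    # Per-weight cost: the quantized value plus its share of the group's scale and bias.
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return num_params * bits_per_weight / 8  # convert bits to bytes


if __name__ == "__main__":
    # Assumption: roughly 7.6B parameters, 4-bit weights, group size 64.
    estimate = quantized_weight_bytes(7_600_000_000)
    print(f"~{estimate / 1e9:.1f} GB of quantized weights")
```

With roughly 7.6 billion parameters (the size Qwen reports for Qwen2.5-7B, assumed here rather than stated on this card), the script prints `~4.3 GB of quantized weights`. The actual download size will differ somewhat.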


## Requirements
- `pip install mlx-lm`

## Usage
- [Generate with CLI](https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md#command-line)
    ```bash
    mlx_lm.generate --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit --prompt "하늘이 파란 이유가 뭐야?"
    ```

- [In Python](https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md#python-api)
    ```python
    from mlx_lm import load, generate
    
    model, tokenizer = load(
        "mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit",
        tokenizer_config={"trust_remote_code": True},
    )

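    prompt = "하늘이 파란 이유가 뭐야?"  # "Why is the sky blue?"
    
    messages = [
        {"role": "system", "content": "당신은 친절한 챗봇입니다."},  # "You are a friendly chatbot."
        {"role": "user", "content": prompt},
    ]
    # Render the messages into a single prompt string with the model's chat template
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    
    text = generate(
        model,
        tokenizer,
        prompt=prompt,
        # verbose=True,  # stream tokens and generation stats to stdout
        # max_tokens=8192,  # upper bound on the number of generated tokens
        # Sampling options such as temperature are passed differently across
        # mlx-lm versions; check the mlx-lm docs for your installed version.
    )
    print(text)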
    ```

- [OpenAI-Compatible HTTP Server](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/SERVER.md)
    ```bash
    mlx_lm.server --model mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit --host 0.0.0.0
    ```

    ```python
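    import openai

    # The openai client raises an error when no key is set (via `api_key` or
    # OPENAI_API_KEY); the local server is not expected to check it, so a
    # placeholder value is enough.
    client = openai.OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",
    )

    prompt = "하늘이 파란 이유가 뭐야?"  # "Why is the sky blue?"

    messages = [
        {"role": "system", "content": "당신은 친절한 챗봇입니다."},  # "You are a friendly chatbot."
        {"role": "user", "content": prompt},
    ]
    res = client.chat.completions.create(
        model="mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit",
        messages=messages,
        temperature=0.2,
    )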
    
    print(res.choices[0].message.content)
    ```
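
The Python example turns `messages` into one prompt string with `tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)`, and the server is expected to do the same for chat requests. Qwen2.5 instruct models use a ChatML-style template built from `<|im_start|>` and `<|im_end|>` markers. The sketch below approximates that output for plain text messages, assuming this fine-tune keeps the Qwen2.5-Instruct template. `render_chatml` is a hypothetical helper: the real template is a Jinja template shipped with the tokenizer, which also handles tool calls and may insert a default system prompt when none is given.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate a ChatML prompt (hypothetical helper, not the real Jinja template)."""
    parts = []
    for message in messages:
        # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "당신은 친절한 챗봇입니다."},  # "You are a friendly chatbot."
    {"role": "user", "content": "하늘이 파란 이유가 뭐야?"},  # "Why is the sky blue?"
]
print(render_chatml(messages))
```

Comparing this string with the real `apply_chat_template` output from the Python example is a quick way to check whether the fine-tune changed the template.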
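
The `openai` package is optional: `mlx_lm.server` exposes an OpenAI-style `POST /v1/chat/completions` endpoint, so a plain HTTP request with a JSON body should work as well. This sketch uses only the Python standard library. It assumes the server started above is listening on `localhost:8080` and that replies follow the OpenAI chat-completion shape (`choices[0].message.content`) that the `openai` example relies on; `build_chat_request`, `extract_reply`, and `send_chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:8080/v1"  # assumed: server from the command above
MODEL = "mlx-community/Qwen2.5-7B-Instruct-kowiki-qa-4bit"


def build_chat_request(messages: List[Dict[str, str]], temperature: float = 0.2,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to /chat/completions."""
    body = {"model": MODEL, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        # ensure_ascii=False keeps Korean text readable; the body is UTF-8 encoded.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: Dict[str, Any]) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response_json["choices"][0]["message"]["content"]


def send_chat(messages: List[Dict[str, str]], temperature: float = 0.2,
              timeout: float = 300.0) -> str:
    """Send the request and return the reply; needs a running mlx_lm.server."""
    request = build_chat_request(messages, temperature)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With the server running, `send_chat(messages)` returns the reply text; without it, `urlopen` raises `urllib.error.URLError`.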