tinybiggames committed
Upload README.md with huggingface_hub

README.md CHANGED
````diff
@@ -1,18 +1,18 @@
 ---
+base_model: microsoft/Phi-3-mini-4k-instruct
 language:
 - en
 license: mit
+license_link: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/resolve/main/LICENSE
+pipeline_tag: text-generation
 tags:
 - nlp
 - code
 - llama-cpp
 - gguf-my-repo
-- LMEngine
-license_link: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/resolve/main/LICENSE
-pipeline_tag: text-generation
 inference:
   parameters:
-    temperature: 0
+    temperature: 0.0
 widget:
 - messages:
   - role: user
@@ -22,59 +22,43 @@
 # tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF
 This model was converted to GGUF format from [`microsoft/Phi-3-mini-4k-instruct`](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) for more details on the model.
-## Use with tinyBigGAMES's [Inference](https://github.com/tinyBigGAMES) Libraries.
-
-```Delphi
-InitConfig(
-  'C:/LLM/gguf', // path to model files
-  -1             // number of GPU layer, -1 to use all available layers
-);
-```
-
-  '<|{role}|>{content}<|end|>',
-  '<|assistant|>');
-```
-
-  'What is AI?' // content
-);
-```
-
-begin
-  GetInferenceStats(nil, @LTokenOutputSpeed, @LInputTokens, @LOutputTokens,
-    @LTotalTokens);
-  PrintLn('', FG_WHITE);
-  PrintLn('Tokens :: Input: %d, Output: %d, Total: %d, Speed: %3.1f t/s',
-    FG_BRIGHTYELLOW, LInputTokens, LOutputTokens, LTotalTokens, LTokenOutputSpeed);
-end
-else
-begin
-  PrintLn('', FG_WHITE);
-  PrintLn('Error: %s', FG_RED, GetError());
-end;
-```
````
In place of the removed section, the new revision adds the standard llama.cpp usage instructions:

# tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF
This model was converted to GGUF format from [`microsoft/Phi-3-mini-4k-instruct`](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) for more details on the model.
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```
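As a quick sanity check, both installed binaries accept `--version`; a minimal sketch, assuming brew placed them on your `PATH`:

```bash
# Confirm the llama.cpp binaries are installed and print their build info
llama-cli --version
llama-server --version
```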
Invoke the llama.cpp server or the CLI.

### CLI:

```bash
llama-cli --hf-repo tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-4k-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
```
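Generation length and sampling are controlled with the usual llama-cli flags; a sketch of the same call with an explicit token cap and temperature (flag defaults vary across llama.cpp builds):

```bash
# One-shot completion, capped at 128 generated tokens (-n),
# with a mildly creative sampling temperature (--temp)
llama-cli --hf-repo tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF \
  --hf-file phi-3-mini-4k-instruct-q4_k_m.gguf \
  -p "The meaning to life and the universe is" \
  -n 128 --temp 0.7
```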
### Server:

```bash
llama-server --hf-repo tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-4k-instruct-q4_k_m.gguf -c 2048
```
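Once running, llama-server exposes an OpenAI-compatible HTTP API; a minimal chat request, assuming the default bind address of 127.0.0.1:8080:

```bash
# Ask the locally served model a question via the
# OpenAI-compatible chat endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [{"role": "user", "content": "What is AI?"}],
        "max_tokens": 128
      }'
```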
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

```bash
git clone https://github.com/ggerganov/llama.cpp
```
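If you only need to build and run (rather than develop), a shallow clone is an optional shortcut that downloads far less history:

```bash
# Fetch only the latest commit instead of the full history
git clone --depth 1 https://github.com/ggerganov/llama.cpp
```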
Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g. `LLAMA_CUDA=1` for NVIDIA GPUs on Linux), as shown below.

```bash
cd llama.cpp && LLAMA_CURL=1 make
```
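For example, combining it with the CUDA flag mentioned above (a sketch; the exact flag set depends on your hardware and llama.cpp checkout):

```bash
# Build with curl support (needed for --hf-repo downloads)
# plus NVIDIA GPU offload on Linux
cd llama.cpp && LLAMA_CURL=1 LLAMA_CUDA=1 make
```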
Step 3: Run inference through the main binary.

```bash
./llama-cli --hf-repo tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-4k-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
```

or

```bash
./llama-server --hf-repo tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF --hf-file phi-3-mini-4k-instruct-q4_k_m.gguf -c 2048
```
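The `--hf-repo` flag fetches the GGUF file on demand; if you would rather download it once and point at a local path, a sketch using `huggingface-cli` from the same `huggingface_hub` package that made this commit:

```bash
# Download the quantized weights into the current directory...
huggingface-cli download tinybiggames/Phi-3-mini-4k-instruct-Q4_K_M-GGUF \
  phi-3-mini-4k-instruct-q4_k_m.gguf --local-dir .

# ...then run against the local file with -m
./llama-cli -m phi-3-mini-4k-instruct-q4_k_m.gguf -p "The meaning to life and the universe is"
```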