Update README.md
README.md CHANGED
@@ -3,19 +3,17 @@ license: apache-2.0
inference: false
---

-#

<!-- Provide a quick summary of what the model is/does. -->

-DRAGON models are fine-tuned with high-quality custom instruct datasets, designed for production use in RAG scenarios.

### Benchmark Tests

Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)

--**Accuracy Score**: **100.0** correct out of 100
--Not Found Classification: 95.0%
@@ -32,7 +30,7 @@ For test run results (and good indicator of target use cases), please see the fi

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** llmware
-- **Model type:**
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** Microsoft Phi-3
@@ -63,55 +61,28 @@ without the need for a lot of complex instruction verbiage - provide a text pass

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.


## How to Get Started with the Model

-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("bling-phi-2-v0", trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained("bling-phi-2-v0", trust_remote_code=True)

-Please refer to the generation_test.py files in the Files repository, which include 200 samples and a script to test the model. The **generation_test_llmware_script.py** includes built-in llmware capabilities for fact-checking, as well as easy integration with document parsing and actual retrieval, to swap out the test set for a RAG workflow consisting of business documents.

-The DRAGON model was fine-tuned with a simple "\<human>" and "\<bot>" wrapper, so to get the best results, wrap inference entries as:

-full_prompt = "<human>: " + my_prompt + "\n" + "<bot>:"

-The BLING model was fine-tuned with closed-context samples, which generally assume that the prompt consists of two sub-parts:

-1. Text Passage Context, and
-2. Specific question or instruction based on the text passage

-To get the best results, package "my_prompt" as follows:

-my_prompt = {{text_passage}} + "\n" + {{question/instruction}}

-If you are using a HuggingFace generation script:

-# prepare prompt packaging used in fine-tuning process
-new_prompt = "<human>: " + entries["context"] + "\n" + entries["query"] + "\n" + "<bot>:"

-inputs = tokenizer(new_prompt, return_tensors="pt")
-start_of_output = len(inputs.input_ids[0])

-outputs = model.generate(
-    inputs.input_ids,
-    eos_token_id=tokenizer.eos_token_id,
-    pad_token_id=tokenizer.eos_token_id,
-    do_sample=True,
-    temperature=0.3,
-    max_new_tokens=100,
-    )

-output_only = tokenizer.decode(outputs[0][start_of_output:], skip_special_tokens=True)


## Model Card Contact

inference: false
---

+# bling-phi-3-gguf

<!-- Provide a quick summary of what the model is/does. -->

+bling-phi-3-gguf is part of the BLING ("Best Little Instruct No-GPU") model series, RAG-instruct trained for fact-based question-answering use cases on top of a Microsoft Phi-3 base model.

### Benchmark Tests

Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://www.huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)
+1 Test Run (with temperature = 0.0 and sample = False), scored as: 1 point for a correct answer, 0.5 points for a partially correct or blank / "not found" answer, 0.0 points for an incorrect answer, and -1 point for a hallucination.

--**Accuracy Score**: **100.0** correct out of 100
--Not Found Classification: 95.0%
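In other words, the reported score reduces to a weighted count over the 100 test questions. A minimal sketch of that arithmetic (the function and label names below are illustrative, not part of the benchmark harness):

```python
# illustrative scoring of the rubric described above - label names are assumptions
POINTS = {"correct": 1.0, "partial_or_not_found": 0.5, "incorrect": 0.0, "hallucination": -1.0}

def score_run(labels):
    # labels: one rubric label per benchmark question (100 in total)
    return sum(POINTS[label] for label in labels)

# e.g., 99 correct answers and 1 partial / not-found answer -> 99.5 out of 100
print(score_run(["correct"] * 99 + ["partial_or_not_found"]))
```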

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** llmware
+- **Model type:** bling-rag-instruct
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** Microsoft Phi-3

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

+BLING models are designed to operate with grounded sources, e.g., the inclusion of a context passage in the prompt, and will not yield consistent or positive results with open-context prompting, in which you are looking for the model to draw upon its potential background knowledge of the world - in fact, it is likely that BLING will respond with a simple "Not Found." to an open-context query.

Any model can provide inaccurate or incomplete information, and should be used in conjunction with appropriate safeguards and fact-checking mechanisms.


## How to Get Started with the Model

+To pull the model via API:

+from huggingface_hub import snapshot_download
+snapshot_download("llmware/bling-phi-3-gguf", local_dir="/path/on/your/machine/", local_dir_use_symlinks=False)
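Note: the call above copies the full repository contents, including the GGUF model file, into local_dir; with local_dir_use_symlinks=False the files are stored as real copies rather than as symlinks into the local Hugging Face cache.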

+Load in your favorite GGUF inference engine, or try with llmware as follows:

+from llmware.models import ModelCatalog

+# to load the model and make a basic inference
+model = ModelCatalog().load_model("llmware/bling-phi-3-gguf", temperature=0.0, sample=False)
+response = model.inference(query, add_context=text_sample)
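In the snippet above, query and text_sample stand for your question and the grounding passage. A filled-in sketch, with an invented passage and question for illustration:

```python
# illustrative values - substitute your own document text and question
text_sample = ("Services were delivered between January and March, and the total "
               "amount of the invoice is $22,500, due within 30 days of receipt.")
query = "What is the total amount of the invoice?"

response = model.inference(query, add_context=text_sample)
print(response)
```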

+Details on the prompt wrapper and other configurations are in the config.json file in the files repository.
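If you run the GGUF file in another engine, the prompt wrapper has to be applied by hand. A minimal sketch using llama-cpp-python as one such engine, assuming the \<human>: / \<bot>: wrapper used across llmware's BLING models (confirm against the config.json noted above) and a placeholder model filename (check the actual .gguf filename in the files repository):

```python
from llama_cpp import Llama

# placeholder path - point at the .gguf file fetched by snapshot_download above
llm = Llama(model_path="/path/on/your/machine/bling-phi-3.gguf", n_ctx=2048, verbose=False)

text_passage = "..."   # grounding context passage
question = "..."       # question about the passage

# package the prompt in the <human> / <bot> wrapper described in this card
prompt = "<human>: " + text_passage + "\n" + question + "\n" + "<bot>:"

output = llm(prompt, max_tokens=150, temperature=0.0)
print(output["choices"][0]["text"])
```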
## Model Card Contact