--- license: apache-2.0 library_name: transformers tags: - mistral - alpaca datasets: - tatsu-lab/alpaca pipeline_tag: text-generation base_model: mistralai/Mistral-7B-v0.1 model-index: - name: Mistral-7B-Alpaca-52k-v0.1 results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 60.92 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 82.13 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 63.41 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 41.5 source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 77.35 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1 name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 37.45 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1 name: Open LLM Leaderboard --- # Description `mistralai/Mistral-7B-v0.1` model fine-tuned over 52k alpaca dataset # How to use it ```python # pip install transformers==4.35.2 import torch from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer from transformers import pipeline model_id="MaziyarPanahi/Mistral-7B-Alpaca-52k-v0.1" tokenizer = AutoTokenizer.from_pretrained(model_id) streamer = TextStreamer(tokenizer) model = AutoModelForCausalLM.from_pretrained( model_id, torch_dtype=torch.float16, device_map="auto", ) pipe = pipeline( "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1024, temperature=0.1, do_sample=True, top_p=0.95, repetition_penalty=1.15, return_full_text=False, streamer=streamer ) prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: describe about pros and cons of docker system. Answer in bullet point ### Response: """ res = pipe(prompt)[0]['generated_text'] ``` Results: ``` Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: describe about pros and cons of docker system. Answer in bullet point ### Response: Pros of Docker System: - Improved portability - Docker containers can be easily moved between different environments, making it easier to deploy applications across multiple platforms. - Increased security - Containers are isolated from each other, which helps prevent malicious code from spreading throughout the system. - Better resource utilization - Containers allow for better resource management by allowing users to run multiple applications on a single host without having to worry about conflicts or performance issues. Cons of Docker System: - Learning curve - It takes time to learn how to use Docker effectively, as there are many commands and concepts involved. - Limited customization options - While Docker provides some basic configuration options, more advanced features such as network routing require additional tools. - Performance overhead - Running multiple containers on a single host may result in slower performance due to increased memory usage. ``` ## Eval ```python { "all": { "acc": 0.6309850839451187, "acc_stderr": 0.032333688535613636, "acc_norm": 0.6368691004374645, "acc_norm_stderr": 0.03298401757997533, "mc1": 0.29008567931456547, "mc1_stderr": 0.01588623687420952, "mc2": 0.41501661742948026, "mc2_stderr": 0.014285902986671931 }, "harness|arc:challenge|25": { "acc": 0.5750853242320819, "acc_stderr": 0.014445698968520767, "acc_norm": 0.6092150170648464, "acc_norm_stderr": 0.01425856388051378 }, "harness|hellaswag|10": { "acc": 0.6221868153754232, "acc_stderr": 0.0048384969668239025, "acc_norm": 0.8212507468631747, "acc_norm_stderr": 0.0038235918141330347 }, "harness|hendrycksTest-abstract_algebra|5": { "acc": 0.32, "acc_stderr": 0.046882617226215034, "acc_norm": 0.32, "acc_norm_stderr": 0.046882617226215034 }, "harness|hendrycksTest-anatomy|5": { "acc": 0.6, "acc_stderr": 0.04232073695151589, "acc_norm": 0.6, "acc_norm_stderr": 0.04232073695151589 }, "harness|hendrycksTest-astronomy|5": { "acc": 0.6447368421052632, "acc_stderr": 0.038947344870133176, "acc_norm": 0.6447368421052632, "acc_norm_stderr": 0.038947344870133176 }, "harness|hendrycksTest-business_ethics|5": { "acc": 0.57, "acc_stderr": 0.04975698519562428, "acc_norm": 0.57, "acc_norm_stderr": 0.04975698519562428 }, "harness|hendrycksTest-clinical_knowledge|5": { "acc": 0.6792452830188679, "acc_stderr": 0.02872750295788027, "acc_norm": 0.6792452830188679, "acc_norm_stderr": 0.02872750295788027 }, "harness|hendrycksTest-college_biology|5": { "acc": 0.7430555555555556, "acc_stderr": 0.03653946969442099, "acc_norm": 0.7430555555555556, "acc_norm_stderr": 0.03653946969442099 }, "harness|hendrycksTest-college_chemistry|5": { "acc": 0.49, "acc_stderr": 0.05024183937956912, "acc_norm": 0.49, "acc_norm_stderr": 0.05024183937956912 }, "harness|hendrycksTest-college_computer_science|5": { "acc": 0.56, "acc_stderr": 0.04988876515698589, "acc_norm": 0.56, "acc_norm_stderr": 0.04988876515698589 }, "harness|hendrycksTest-college_mathematics|5": { "acc": 0.36, "acc_stderr": 0.048241815132442176, "acc_norm": 0.36, "acc_norm_stderr": 0.048241815132442176 }, "harness|hendrycksTest-college_medicine|5": { "acc": 0.653179190751445, "acc_stderr": 0.036291466701596636, "acc_norm": 0.653179190751445, "acc_norm_stderr": 0.036291466701596636 }, "harness|hendrycksTest-college_physics|5": { "acc": 0.4019607843137255, "acc_stderr": 0.048786087144669955, "acc_norm": 0.4019607843137255, "acc_norm_stderr": 0.048786087144669955 }, "harness|hendrycksTest-computer_security|5": { "acc": 0.79, "acc_stderr": 0.04093601807403326, "acc_norm": 0.79, "acc_norm_stderr": 0.04093601807403326 }, "harness|hendrycksTest-conceptual_physics|5": { "acc": 0.5702127659574469, "acc_stderr": 0.03236214467715564, "acc_norm": 0.5702127659574469, "acc_norm_stderr": 0.03236214467715564 }, "harness|hendrycksTest-econometrics|5": { "acc": 0.49122807017543857, "acc_stderr": 0.047028804320496165, "acc_norm": 0.49122807017543857, "acc_norm_stderr": 0.047028804320496165 }, "harness|hendrycksTest-electrical_engineering|5": { "acc": 0.5862068965517241, "acc_stderr": 0.04104269211806232, "acc_norm": 0.5862068965517241, "acc_norm_stderr": 0.04104269211806232 }, "harness|hendrycksTest-elementary_mathematics|5": { "acc": 0.3915343915343915, "acc_stderr": 0.025138091388851116, "acc_norm": 0.3915343915343915, "acc_norm_stderr": 0.025138091388851116 }, "harness|hendrycksTest-formal_logic|5": { "acc": 0.4444444444444444, "acc_stderr": 0.04444444444444449, "acc_norm": 0.4444444444444444, "acc_norm_stderr": 0.04444444444444449 }, "harness|hendrycksTest-global_facts|5": { "acc": 0.32, "acc_stderr": 0.04688261722621504, "acc_norm": 0.32, "acc_norm_stderr": 0.04688261722621504 }, "harness|hendrycksTest-high_school_biology|5": { "acc": 0.7419354838709677, "acc_stderr": 0.02489246917246283, "acc_norm": 0.7419354838709677, "acc_norm_stderr": 0.02489246917246283 }, "harness|hendrycksTest-high_school_chemistry|5": { "acc": 0.5024630541871922, "acc_stderr": 0.035179450386910616, "acc_norm": 0.5024630541871922, "acc_norm_stderr": 0.035179450386910616 }, "harness|hendrycksTest-high_school_computer_science|5": { "acc": 0.67, "acc_stderr": 0.047258156262526066, "acc_norm": 0.67, "acc_norm_stderr": 0.047258156262526066 }, "harness|hendrycksTest-high_school_european_history|5": { "acc": 0.7575757575757576, "acc_stderr": 0.03346409881055953, "acc_norm": 0.7575757575757576, "acc_norm_stderr": 0.03346409881055953 }, "harness|hendrycksTest-high_school_geography|5": { "acc": 0.7929292929292929, "acc_stderr": 0.028869778460267042, "acc_norm": 0.7929292929292929, "acc_norm_stderr": 0.028869778460267042 }, "harness|hendrycksTest-high_school_government_and_politics|5": { "acc": 0.8601036269430051, "acc_stderr": 0.025033870583015184, "acc_norm": 0.8601036269430051, "acc_norm_stderr": 0.025033870583015184 }, "harness|hendrycksTest-high_school_macroeconomics|5": { "acc": 0.6358974358974359, "acc_stderr": 0.024396672985094764, "acc_norm": 0.6358974358974359, "acc_norm_stderr": 0.024396672985094764 }, "harness|hendrycksTest-high_school_mathematics|5": { "acc": 0.362962962962963, "acc_stderr": 0.029318203645206865, "acc_norm": 0.362962962962963, "acc_norm_stderr": 0.029318203645206865 }, "harness|hendrycksTest-high_school_microeconomics|5": { "acc": 0.6218487394957983, "acc_stderr": 0.03149930577784906, "acc_norm": 0.6218487394957983, "acc_norm_stderr": 0.03149930577784906 }, "harness|hendrycksTest-high_school_physics|5": { "acc": 0.32450331125827814, "acc_stderr": 0.038227469376587525, "acc_norm": 0.32450331125827814, "acc_norm_stderr": 0.038227469376587525 }, "harness|hendrycksTest-high_school_psychology|5": { "acc": 0.8146788990825689, "acc_stderr": 0.016659279700295838, "acc_norm": 0.8146788990825689, "acc_norm_stderr": 0.016659279700295838 }, "harness|hendrycksTest-high_school_statistics|5": { "acc": 0.49537037037037035, "acc_stderr": 0.03409825519163572, "acc_norm": 0.49537037037037035, "acc_norm_stderr": 0.03409825519163572 }, "harness|hendrycksTest-high_school_us_history|5": { "acc": 0.7892156862745098, "acc_stderr": 0.028626547912437406, "acc_norm": 0.7892156862745098, "acc_norm_stderr": 0.028626547912437406 }, "harness|hendrycksTest-high_school_world_history|5": { "acc": 0.7552742616033755, "acc_stderr": 0.027985699387036423, "acc_norm": 0.7552742616033755, "acc_norm_stderr": 0.027985699387036423 }, "harness|hendrycksTest-human_aging|5": { "acc": 0.6636771300448431, "acc_stderr": 0.031708824268455, "acc_norm": 0.6636771300448431, "acc_norm_stderr": 0.031708824268455 }, "harness|hendrycksTest-human_sexuality|5": { "acc": 0.7862595419847328, "acc_stderr": 0.0359546161177469, "acc_norm": 0.7862595419847328, "acc_norm_stderr": 0.0359546161177469 }, "harness|hendrycksTest-international_law|5": { "acc": 0.7933884297520661, "acc_stderr": 0.03695980128098824, "acc_norm": 0.7933884297520661, "acc_norm_stderr": 0.03695980128098824 }, "harness|hendrycksTest-jurisprudence|5": { "acc": 0.7592592592592593, "acc_stderr": 0.04133119440243838, "acc_norm": 0.7592592592592593, "acc_norm_stderr": 0.04133119440243838 }, "harness|hendrycksTest-logical_fallacies|5": { "acc": 0.803680981595092, "acc_stderr": 0.031207970394709218, "acc_norm": 0.803680981595092, "acc_norm_stderr": 0.031207970394709218 }, "harness|hendrycksTest-machine_learning|5": { "acc": 0.5178571428571429, "acc_stderr": 0.047427623612430116, "acc_norm": 0.5178571428571429, "acc_norm_stderr": 0.047427623612430116 }, "harness|hendrycksTest-management|5": { "acc": 0.8252427184466019, "acc_stderr": 0.03760178006026621, "acc_norm": 0.8252427184466019, "acc_norm_stderr": 0.03760178006026621 }, "harness|hendrycksTest-marketing|5": { "acc": 0.8632478632478633, "acc_stderr": 0.022509033937077816, "acc_norm": 0.8632478632478633, "acc_norm_stderr": 0.022509033937077816 }, "harness|hendrycksTest-medical_genetics|5": { "acc": 0.74, "acc_stderr": 0.04408440022768078, "acc_norm": 0.74, "acc_norm_stderr": 0.04408440022768078 }, "harness|hendrycksTest-miscellaneous|5": { "acc": 0.8173690932311622, "acc_stderr": 0.013816335389973136, "acc_norm": 0.8173690932311622, "acc_norm_stderr": 0.013816335389973136 }, "harness|hendrycksTest-moral_disputes|5": { "acc": 0.7023121387283237, "acc_stderr": 0.024617055388677, "acc_norm": 0.7023121387283237, "acc_norm_stderr": 0.024617055388677 }, "harness|hendrycksTest-moral_scenarios|5": { "acc": 0.2335195530726257, "acc_stderr": 0.014149575348976269, "acc_norm": 0.2335195530726257, "acc_norm_stderr": 0.014149575348976269 }, "harness|hendrycksTest-nutrition|5": { "acc": 0.7450980392156863, "acc_stderr": 0.024954184324879905, "acc_norm": 0.7450980392156863, "acc_norm_stderr": 0.024954184324879905 }, "harness|hendrycksTest-philosophy|5": { "acc": 0.7106109324758842, "acc_stderr": 0.025755865922632945, "acc_norm": 0.7106109324758842, "acc_norm_stderr": 0.025755865922632945 }, "harness|hendrycksTest-prehistory|5": { "acc": 0.7191358024691358, "acc_stderr": 0.025006469755799215, "acc_norm": 0.7191358024691358, "acc_norm_stderr": 0.025006469755799215 }, "harness|hendrycksTest-professional_accounting|5": { "acc": 0.4716312056737589, "acc_stderr": 0.029779450957303062, "acc_norm": 0.4716312056737589, "acc_norm_stderr": 0.029779450957303062 }, "harness|hendrycksTest-professional_law|5": { "acc": 0.4498044328552803, "acc_stderr": 0.012705721498565107, "acc_norm": 0.4498044328552803, "acc_norm_stderr": 0.012705721498565107 }, "harness|hendrycksTest-professional_medicine|5": { "acc": 0.6580882352941176, "acc_stderr": 0.02881472242225418, "acc_norm": 0.6580882352941176, "acc_norm_stderr": 0.02881472242225418 }, "harness|hendrycksTest-professional_psychology|5": { "acc": 0.6519607843137255, "acc_stderr": 0.019270998708223974, "acc_norm": 0.6519607843137255, "acc_norm_stderr": 0.019270998708223974 }, "harness|hendrycksTest-public_relations|5": { "acc": 0.6636363636363637, "acc_stderr": 0.04525393596302506, "acc_norm": 0.6636363636363637, "acc_norm_stderr": 0.04525393596302506 }, "harness|hendrycksTest-security_studies|5": { "acc": 0.7224489795918367, "acc_stderr": 0.028666857790274645, "acc_norm": 0.7224489795918367, "acc_norm_stderr": 0.028666857790274645 }, "harness|hendrycksTest-sociology|5": { "acc": 0.8557213930348259, "acc_stderr": 0.02484575321230604, "acc_norm": 0.8557213930348259, "acc_norm_stderr": 0.02484575321230604 }, "harness|hendrycksTest-us_foreign_policy|5": { "acc": 0.86, "acc_stderr": 0.03487350880197771, "acc_norm": 0.86, "acc_norm_stderr": 0.03487350880197771 }, "harness|hendrycksTest-virology|5": { "acc": 0.5481927710843374, "acc_stderr": 0.03874371556587953, "acc_norm": 0.5481927710843374, "acc_norm_stderr": 0.03874371556587953 }, "harness|hendrycksTest-world_religions|5": { "acc": 0.8421052631578947, "acc_stderr": 0.027966785859160896, "acc_norm": 0.8421052631578947, "acc_norm_stderr": 0.027966785859160896 }, "harness|truthfulqa:mc|0": { "mc1": 0.29008567931456547, "mc1_stderr": 0.01588623687420952, "mc2": 0.41501661742948026, "mc2_stderr": 0.014285902986671931 }, "harness|winogrande|5": { "acc": 0.7734806629834254, "acc_stderr": 0.011764149054698332 }, "harness|gsm8k|5": { "acc": 0.37452615617892343, "acc_stderr": 0.013331774158491393 } } ``` # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Mistral-7B-Alpaca-52k-v0.1) | Metric |Value| |---------------------------------|----:| |Avg. |60.46| |AI2 Reasoning Challenge (25-Shot)|60.92| |HellaSwag (10-Shot) |82.13| |MMLU (5-Shot) |63.41| |TruthfulQA (0-shot) |41.50| |Winogrande (5-shot) |77.35| |GSM8k (5-shot) |37.45|