This is an MoE of Llama-3-8b with 4 experts. It does not use semantic routing, as it builds on the deepseek-moe architecture: there is no router and no gate, and all experts are active on every token.
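To make "no routing, no gate" concrete, here is a minimal sketch of a dense MoE feed-forward block in the spirit described above. It is illustrative only, not the actual deepseek-moe code: the class name, dimension arguments, SiLU activation, and the plain averaging of expert outputs are all assumptions made for exposition.

```python
import torch
import torch.nn as nn


class DenseMoEFFN(nn.Module):
    """Gateless MoE feed-forward block: every expert sees every token.
    Illustrative sketch only; not the deepseek-moe implementation."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(hidden_size, intermediate_size),
                    nn.SiLU(),
                    nn.Linear(intermediate_size, hidden_size),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # No router and no gate: run every expert on the full input,
        # then combine the outputs (a plain average, as an assumption here).
        expert_outputs = torch.stack(
            [expert(hidden_states) for expert in self.experts], dim=0
        )
        return expert_outputs.mean(dim=0)
```

The snippet below shows how to load and run the model itself: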
```python
import torch
from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM

model_path = "./content"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# The prompt follows the Alpaca instruction template; edit the
# "### Instruction:" section to suit your task.
prompt = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Sam is faster than Joe. Joe is faster than Jane. Is Sam faster than Jane? Explain your reasoning step by step.

### Input:

### Response:
"""

tokens = tokenizer(
    prompt,
    return_tensors="pt",
).input_ids.cuda()

generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=512,
)
```
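Note: `attn_implementation="flash_attention_2"` requires the `flash-attn` package and a supported CUDA GPU; if that is not available, omit the argument to fall back to the default attention implementation.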