Base
Base model please
Instruct is not useful, I can't tune this
@ehartford I was actually going to make bases for all 3 sizes (11b, 13b, and 16b) and start fine-tuning them.
Are you going to fine-tune as well? (once I finish my other fine-tunes I'll make the base ones)
I will not tune them directly
My intent is to use it to initialize the expert weights of a MoE,
Then pretrain and fine-tune on top of that to produce a dolphin MoE
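Roughly, the idea is to copy the dense FFN weights into each expert of a MoE layer as a starting point before continued pretraining. A minimal sketch, assuming a simple top-1-routed MoE layer in PyTorch (the layer structure and names here are hypothetical, not the actual dolphin MoE code):

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy MoE layer: a router plus n identical-shaped FFN experts."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.GELU(),
                nn.Linear(d_ff, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Top-1 routing for simplicity: each token goes to its best expert.
        weights = torch.softmax(self.router(x), dim=-1)
        idx = weights.argmax(dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask]) * weights[mask, e].unsqueeze(-1)
        return out


def init_experts_from_dense(moe_layer: MoELayer, dense_ffn_state: dict) -> None:
    """Seed every expert with the FFN weights of one dense (merged) model."""
    for expert in moe_layer.experts:
        expert.load_state_dict(dense_ffn_state)


# Usage: pretend this Sequential is the FFN block of a merged dense checkpoint.
dense_ffn = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
moe = MoELayer(d_model=16, d_ff=64, n_experts=4)
init_experts_from_dense(moe, dense_ffn.state_dict())
```

After this initialization all experts start identical; the subsequent pretraining and fine-tuning is what lets them specialize.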
I think proving your process by doing it with an Instruct model first is a great strategy to show that the output is coherent and the method is sound
Thank you!
@ehartford sounds really interesting! I'd love to see how they work as experts. Here are the merges based on the base Llama-3-8B: