Context size
How big is the context size of the model?
32768
Hmm, it seems it is only 255 tokens:
from transformers import AutoTokenizer
model_id = "TomGrc/FusionNet_7Bx2_MoE_14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# model_max_length is a tokenizer setting taken from the tokenizer configuration
print(f"model max token input {tokenizer.model_max_length}")
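The two numbers above may come from different settings. In Hugging Face repos, the model's positional limit is usually stored as `max_position_embeddings` in `config.json`, while `tokenizer.model_max_length` comes from the tokenizer configuration and is used for truncation and length warnings, so the two can disagree. A minimal sketch for putting them side by side, assuming you have downloaded both JSON files locally; `load_json` and `context_limits` are hypothetical helpers, and the values in the demo are illustrative only, not read from this model's repo.

```python
import json
from typing import Any, Dict, Optional


def load_json(path: str) -> Dict[str, Any]:
    """Read a JSON file such as config.json or tokenizer_config.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def context_limits(config: Dict[str, Any],
                   tokenizer_config: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Put the model-side and tokenizer-side length fields next to each other."""
    return {
        # Model side: number of positions the architecture is configured for.
        "max_position_embeddings": config.get("max_position_embeddings"),
        # Tokenizer side: used for truncation and length warnings; may be absent.
        "model_max_length": tokenizer_config.get("model_max_length"),
    }


if __name__ == "__main__":
    # Hypothetical values for illustration; load the real files with load_json(...).
    print(context_limits({"max_position_embeddings": 32768},
                         {"model_max_length": 255}))
```

A missing field comes back as `None`, which is itself informative: a tokenizer config without `model_max_length` tells you nothing about the model's real limit.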
I tested it fairly rudimentarily: I tokenized texts of different lengths (4k, 6k and 8k tokens) and checked at which point the model stopped following my instructions and began to hallucinate. It works flawlessly up to about 4,000 tokens; beyond that it hallucinates heavily and no longer follows instructions.
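A length test like the one described above could be automated as a simple recall probe. This is not the exact test used here; it is a minimal sketch, assuming a code word placed at the start of the prompt, filler text padded to a target token count, and a check of whether the answer still contains the code word. `build_probe`, `find_breaking_point`, the toy whitespace tokenizer and `fake_generate` are all hypothetical; `fake_generate` only mimics a model that sees its last 4000 words, so the harness runs without a GPU.

```python
from typing import Callable, Dict, Iterable


def build_probe(secret: str, target_tokens: int, filler: str,
                encode: Callable[[str], list], decode: Callable[[list], str]) -> str:
    """Return a prompt of roughly `target_tokens` tokens with `secret` at the very start."""
    header = f"Remember this code word: {secret}.\n"
    question = "\nWhat was the code word? Answer with the word only."
    # Token budget left for filler after the instruction and the question.
    budget = max(target_tokens - len(encode(header)) - len(encode(question)), 0)
    filler_tokens = encode(filler)
    # Repeat the filler until it covers the budget, then cut it to the exact size.
    reps = budget // max(len(filler_tokens), 1) + 1
    body = decode((filler_tokens * reps)[:budget])
    return header + body + question


def find_breaking_point(generate: Callable[[str], str], secret: str,
                        lengths: Iterable[int], filler: str,
                        encode: Callable[[str], list],
                        decode: Callable[[list], str]) -> Dict[int, bool]:
    """Map each prompt length to whether the model's answer still contains the secret."""
    results = {}
    for n in lengths:
        answer = generate(build_probe(secret, n, filler, encode, decode))
        results[n] = secret.lower() in answer.lower()
    return results


# Toy demo: whitespace "tokenizer" and a fake model that only sees its last 4000 words.
def toy_encode(text: str) -> list:
    return text.split()


def toy_decode(tokens: list) -> str:
    return " ".join(tokens)


def fake_generate(prompt: str) -> str:
    visible = " ".join(prompt.split()[-4000:])
    return "PELICAN" if "PELICAN" in visible else "I don't know."


if __name__ == "__main__":
    print(find_breaking_point(fake_generate, "PELICAN", [2000, 4000, 6000, 8000],
                              "the quick brown fox jumps over the lazy dog",
                              toy_encode, toy_decode))
```

To try it on a real model, one could pass `encode=lambda s: tokenizer.encode(s, add_special_tokens=False)`, `decode=tokenizer.decode`, and a `generate` function that wraps the model's own generation call. With a real tokenizer the total count is only approximate, because tokens can merge at the boundaries between the pieces. Recalling a single code word is also a narrower test than following full instructions, so it may break down at a different length.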
This model hasn't been trained on super-long text yet.
@TomGrc, could you please train this model, or make a new one, on super-long input texts? None of the models on Hugging Face so far can handle super-long input texts properly, regardless of their context size. Even so-called 200k models can't handle a 2,500-word article correctly. Could you please do that?
I will consider that.
@TomGrc The new method from the BGE team, called Activation Beacon, could suit long contexts very well once it's released, which should be soon.
@Bearsaerker, could you please give us more information about this Activation Beacon method? Do you know of any model that has been trained with this method so far?
@HR1777 I don't know much myself; I just follow some of BAAI's work.
Paper: https://huggingface.co/papers/2401.03462
GitHub: https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon
The only model so far, which I couldn't test because I don't have enough VRAM lol: https://huggingface.co/namespace-Pt/activation-beacon-llama2-7b-chat
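To get a rough idea of how much VRAM a 7B model needs, here is a back-of-the-envelope sketch, assuming a Llama-2-7B-like shape (about 7e9 parameters, 32 layers, hidden size 4096, no grouped-query attention). It is a plain full-attention estimate: it does not model how Activation Beacon changes memory use, and it ignores activations and framework overhead. `weight_memory_gib` and `kv_cache_gib` are hypothetical helper names.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB needed to hold the weights alone (no KV cache, activations or overhead)."""
    return num_params * bytes_per_param / 1024**3


def kv_cache_gib(num_layers: int, hidden_size: int, num_tokens: int,
                 bytes_per_value: float = 2) -> float:
    """GiB for a full-attention KV cache: one key and one value vector per layer per token."""
    return 2 * num_layers * hidden_size * num_tokens * bytes_per_value / 1024**3


if __name__ == "__main__":
    # Assumed Llama-2-7B-like shape: ~7e9 parameters, 32 layers, hidden size 4096.
    print(f"fp16 weights:  {weight_memory_gib(7e9, 2):.1f} GiB")
    print(f"4-bit weights: {weight_memory_gib(7e9, 0.5):.1f} GiB")
    for tokens in (4096, 32768):
        print(f"fp16 KV cache @ {tokens} tokens: {kv_cache_gib(32, 4096, tokens):.1f} GiB")
```

Under these assumptions the fp16 weights alone take about 13 GiB, and the fp16 KV cache grows linearly from about 2 GiB at 4,096 tokens to about 16 GiB at 32,768 tokens. The 0.5 bytes-per-parameter figure for 4-bit is a lower bound; real quantized files also store scales and are somewhat larger.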
It seems interesting. Is it possible to run the model you mentioned in GGUF format? I mean, does llama.cpp support it yet?
(https://huggingface.co/namespace-Pt/activation-beacon-llama2-7b-chat)
Activation beacon is out now :) https://github.com/FlagOpen/FlagEmbedding/blob/master/Long_LLM/activation_beacon/docs/training.md