alielfilali01 
posted an update Oct 9, 2024
Why is nobody talking about the new training corpus released by MBZUAI today?

TxT360 is a 15+ trillion-token corpus that outperforms FineWeb on several metrics. Ablation studies were run on up to 1T tokens.

Read the blog here: LLM360/TxT360
Dataset: LLM360/TxT360