I like to train large deep neural nets too 🧠🤖🔥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model: Karpathy
nanoBLT: A simplified, lightweight implementation of a character-level Byte Latent Transformer (under 500 lines of code). The model is 2-4-2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder) and is trained on ~1M bytes of tiny Shakespeare with a patch size of 4.
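For a sense of how the encoder → latent → decoder flow over byte patches works, here is a minimal PyTorch sketch. It is illustrative only: the class name, mean-pooling of patches, and hyperparameters are my assumptions, not nanoBLT's actual code; only the 2-4-2 depths and the patch size of 4 come from the description above, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

def stack(depth, d_model=64):
    # small transformer stack; d_model/nhead are assumed, not nanoBLT's values
    layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class TinyBLT(nn.Module):
    def __init__(self, d_model=64, patch_size=4):
        super().__init__()
        self.patch_size = patch_size
        self.byte_emb = nn.Embedding(256, d_model)   # one embedding per byte value
        self.encoder = stack(2, d_model)             # local byte-level encoder (2 layers)
        self.latent = stack(4, d_model)              # latent transformer over patches (4 layers)
        self.decoder = stack(2, d_model)             # local byte-level decoder (2 layers)
        self.head = nn.Linear(d_model, 256)          # next-byte logits

    def forward(self, byte_ids):                     # (B, T), T divisible by patch_size
        B, T = byte_ids.shape
        x = self.encoder(self.byte_emb(byte_ids))    # (B, T, D)
        # pool every patch_size byte states into one patch embedding
        p = x.view(B, T // self.patch_size, self.patch_size, -1).mean(dim=2)
        p = self.latent(p)                           # (B, T/patch_size, D)
        # broadcast patch context back to byte positions, then decode
        x = self.decoder(x + p.repeat_interleave(self.patch_size, dim=1))
        return self.head(x)

logits = TinyBLT()(torch.randint(0, 256, (2, 32)))   # -> (2, 32, 256)
```

The point of the patching is that the expensive latent transformer runs over T/4 positions instead of T, while the cheap local encoder/decoder handle byte-level detail.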
Implements a discrete flow matching model for code generation from first principles: trains a small 2D DFM model on two variations of binary search code, with impressive results. Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb
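For intuition, here is a minimal training-step sketch of mask-based discrete flow matching, a common formulation: sample a time t, corrupt the clean token sequence toward an all-mask state, and train the model to predict the original tokens at masked positions. This is a generic illustration under assumed names (MASK_ID, dfm_loss, the stand-in denoiser); the notebook's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 128          # hypothetical token vocabulary size
MASK_ID = VOCAB      # extra "mask" token appended to the vocabulary
model = nn.Sequential(  # stand-in denoiser: embed tokens, contextualize, predict logits
    nn.Embedding(VOCAB + 1, 64),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(64, nhead=4, batch_first=True), num_layers=2),
    nn.Linear(64, VOCAB),
)

def dfm_loss(x0):                         # x0: (B, T) clean token ids
    B, T = x0.shape
    t = torch.rand(B, 1)                  # per-sample time in [0, 1]
    # interpolate between all-mask (t=0) and clean data (t=1):
    keep = torch.rand(B, T) < t           # keep each token with probability t
    xt = torch.where(keep, x0, torch.full_like(x0, MASK_ID))
    logits = model(xt)                    # predict the clean tokens from the corrupted ones
    # supervise only the masked positions
    return F.cross_entropy(logits[~keep], x0[~keep])

x0 = torch.randint(0, VOCAB, (8, 32))
print(dfm_loss(x0).item())
```

At sampling time the process runs in reverse: start from an all-mask sequence at t=0 and iteratively unmask positions using the model's predictions until t=1, which is what lets the model generate whole code snippets rather than decoding left to right.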