I like to train large deep neural nets too 🧠🤖🔥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model: Karpathy
nanoBLT: A simplified, lightweight implementation of a character-level Byte Latent Transformer (under 500 lines of code). The model is 2-4-2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder) and is trained on ~1M bytes of tiny Shakespeare with a patch size of 4.
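For a sense of how the encoder → latent → decoder flow over byte patches works, here is a minimal PyTorch sketch. It is illustrative only: the class name, mean-pooling of patches, and hyperparameters are my assumptions, not nanoBLT's actual code; only the 2-4-2 depths and the patch size of 4 come from the description above, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

def stack(depth, d_model=64):
    # small transformer stack; d_model/nhead are assumed, not nanoBLT's values
    layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=depth)

class TinyBLT(nn.Module):
    def __init__(self, d_model=64, patch_size=4):
        super().__init__()
        self.patch_size = patch_size
        self.byte_emb = nn.Embedding(256, d_model)   # one embedding per byte value
        self.encoder = stack(2, d_model)             # local byte-level encoder (2 layers)
        self.latent = stack(4, d_model)              # latent transformer over patches (4 layers)
        self.decoder = stack(2, d_model)             # local byte-level decoder (2 layers)
        self.head = nn.Linear(d_model, 256)          # next-byte logits

    def forward(self, byte_ids):                     # (B, T), T divisible by patch_size
        B, T = byte_ids.shape
        x = self.encoder(self.byte_emb(byte_ids))    # (B, T, D)
        # pool every patch_size byte states into one patch embedding
        p = x.view(B, T // self.patch_size, self.patch_size, -1).mean(dim=2)
        p = self.latent(p)                           # (B, T/patch_size, D)
        # broadcast patch context back to byte positions, then decode
        x = self.decoder(x + p.repeat_interleave(self.patch_size, dim=1))
        return self.head(x)

logits = TinyBLT()(torch.randint(0, 256, (2, 32)))   # -> (2, 32, 256)
```

The point of the patching is that the expensive latent transformer runs over T/4 positions instead of T, while the cheap local encoder/decoder handle byte-level detail.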
Implements a discrete flow matching model for code generation from first principles: trains a small 2D DFM model on two variations of binary search code, with impressive results. Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb
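For intuition, here is a minimal training-step sketch of mask-based discrete flow matching, a common formulation: sample a time t, corrupt the clean token sequence toward an all-mask state, and train the model to predict the original tokens at masked positions. This is a generic illustration under assumed names (MASK_ID, dfm_loss, the stand-in denoiser); the notebook's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 128          # hypothetical token vocabulary size
MASK_ID = VOCAB      # extra "mask" token appended to the vocabulary
model = nn.Sequential(  # stand-in denoiser: embed tokens, contextualize, predict logits
    nn.Embedding(VOCAB + 1, 64),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(64, nhead=4, batch_first=True), num_layers=2),
    nn.Linear(64, VOCAB),
)

def dfm_loss(x0):                         # x0: (B, T) clean token ids
    B, T = x0.shape
    t = torch.rand(B, 1)                  # per-sample time in [0, 1]
    # interpolate between all-mask (t=0) and clean data (t=1):
    keep = torch.rand(B, T) < t           # keep each token with probability t
    xt = torch.where(keep, x0, torch.full_like(x0, MASK_ID))
    logits = model(xt)                    # predict the clean tokens from the corrupted ones
    # supervise only the masked positions
    return F.cross_entropy(logits[~keep], x0[~keep])

x0 = torch.randint(0, VOCAB, (8, 32))
print(dfm_loss(x0).item())
```

At sampling time the process runs in reverse: start from an all-mask sequence at t=0 and iteratively unmask positions using the model's predictions until t=1, which is what lets the model generate whole code snippets rather than decoding left to right.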