TNQ

LHPKAI

trannhatquy

AI & ML interests

NLP,CV,MLOps

LHPKAI's activity

upvoted an article 3 days ago

🐺🐦‍⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark

Article • 4 days ago • 30

upvoted a paper 6 days ago

1.58-bit FLUX

Paper • 2412.18653 • Published 13 days ago • 66

upvoted a paper 12 days ago

Adding Conditional Control to Text-to-Image Diffusion Models

Paper • 2302.05543 • Published Feb 10, 2023 • 44

upvoted a paper 13 days ago

Training Large Language Models to Reason in a Continuous Latent Space

Paper • 2412.06769 • Published 28 days ago • 66

upvoted a paper 26 days ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published about 1 month ago • 123

upvoted a paper about 1 month ago

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 41

upvoted an article about 2 months ago

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Article • Jul 5, 2024 • 183

upvoted a paper 3 months ago

Contextual Document Embeddings

Paper • 2410.02525 • Published Oct 3, 2024 • 18

upvoted a paper 7 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 65

updated a collection 7 months ago

NLP paper

Collection

37 items • Updated Jun 11, 2024

upvoted an article 7 months ago

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

Article • May 24, 2024 • 25

upvoted 2 articles 8 months ago

Hugging Face x LangChain: A new partner package in LangChain

Article • May 14, 2024 • 115

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Article • May 14, 2024 • 231

upvoted a paper 8 months ago

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Paper • 2405.04434 • Published May 7, 2024 • 14

updated a collection 9 months ago

NLP paper

Collection

37 items • Updated Jun 11, 2024

reacted to Jaward's post with 👍 9 months ago

Post

5312

All You Need To Know About Phi-3 (Technical Report Walkthrough)

Summary of Summaries:
Phi-3-mini:
- Architecture specs: decoder-only transformer, 3.8 billion parameters, LongRope [ 128K context length ], vocab size [ 32064 ], trained on 3.3 trillion tokens in bfloat16.
- Rivals the performance of larger models like Mixtral 8x7B and GPT-3.5, and is capable of running locally on a smartphone.
- Uses a high-quality training dataset built from heavily filtered web data and LLM-generated synthetic data.
- Can be quantized to 4 bits, occupying ≈ 1.8GB of memory.
- Ran natively on an iPhone 14 with the A16 Bionic chip at an inference speed of more than 12 tokens per second.
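
The 4-bit memory figure can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate only: it counts weight storage and ignores activations, the KV cache, and any quantization metadata (scales, zero points), so real footprints will differ somewhat. The helper name `weight_memory_bytes` is hypothetical and not part of any library.

```python
def weight_memory_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone.

    Assumption: every parameter uses exactly `bits_per_weight` bits;
    quantization scales, activations, and KV cache are not counted.
    """
    return num_params * bits_per_weight / 8


# Parameter count quoted in the post for Phi-3-mini.
PHI3_MINI_PARAMS = 3.8e9

bf16_bytes = weight_memory_bytes(PHI3_MINI_PARAMS, 16)  # training precision
int4_bytes = weight_memory_bytes(PHI3_MINI_PARAMS, 4)   # 4-bit quantized

print(f"bfloat16: {bf16_bytes / 1e9:.1f} GB")
print(f"4-bit:    {int4_bytes / 1e9:.1f} GB ({int4_bytes / 2**30:.2f} GiB)")
```

Under these assumptions the 4-bit weights come to roughly 1.9 GB (about 1.77 GiB), which is in line with the ≈ 1.8GB quoted above, while the same weights in bfloat16 would need about 7.6 GB.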

Phi-3-small:
- Architecture specs: also decoder-only, 7B parameters, vocab size [ 100352 ], default context length [ 8K ], hidden dimension: 4096, number of heads and layers: follows the 7B-class structure.
- Uses the tiktoken tokenizer (for better multilingual tokenization).

Phi-3-medium:
- Architecture specs: also decoder-only, hidden dimension: 5120, number of heads: 40, number of layers: 40, tokenization: consistent with the other models, trained on 4.8 trillion tokens.
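
To make the Phi-3-medium numbers more concrete, here is a minimal sketch that derives the per-head dimension and a rough transformer-block parameter count from the three values listed above. The `DecoderShape` class is hypothetical, and the `12 * layers * hidden_dim**2` rule is the textbook estimate for a decoder with a 4x MLP expansion; it ignores embeddings, norms, and the model's actual MLP width, so treat the result as an order-of-magnitude figure rather than the real parameter count.

```python
from dataclasses import dataclass


@dataclass
class DecoderShape:
    """Hypothetical container for the shape values quoted in the post."""
    hidden_dim: int
    num_heads: int
    num_layers: int

    @property
    def head_dim(self) -> int:
        # Each attention head gets an equal slice of the hidden dimension.
        if self.hidden_dim % self.num_heads != 0:
            raise ValueError("hidden_dim must be divisible by num_heads")
        return self.hidden_dim // self.num_heads

    def approx_block_params(self) -> int:
        # Assumption: ~4*d^2 for attention projections plus ~8*d^2 for an
        # MLP with 4x expansion, per layer. Embeddings are not counted.
        return 12 * self.num_layers * self.hidden_dim ** 2


phi3_medium = DecoderShape(hidden_dim=5120, num_heads=40, num_layers=40)

print(f"head dimension: {phi3_medium.head_dim}")
print(f"approx. block parameters: {phi3_medium.approx_block_params() / 1e9:.1f}B")
```

With the values from the post, each head works on a 128-dimensional slice, and the rough estimate lands at about 12.6B parameters in the transformer blocks.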

Training Methodology:
- Focuses on high-quality training data rather than following standard scaling laws.
- The models undergo two-phase pre-training using a mix of web sources and synthetic data for general knowledge and logical reasoning skills.

Performance:
- Phi-3-mini achieves competitive scores on standard benchmarks like MMLU and MT-Bench, indicating strong reasoning capabilities.
- Larger variants show even better performance, suggesting effective scaling with increased model size.

Limitations:
- phi-3-mini: limited by its smaller size in tasks requiring extensive factual knowledge; primarily supports English.
- phi-3-small: limited multilingual support.

Hosting LLMs locally is a big win for OSS: private, secure inference on the go 😎