Gary Hutson

StatsGary

https://hutsons-hacks.info/

AI & ML interests

- Deep Learning
- Transformers for MLM
- Transformers for seq2seq classification
- NLP
- Computer Vision

Recent Activity

liked a model about 1 month ago

mistralai/Mistral-7B-Instruct-v0.3

liked a Space 7 months ago

EmilyWitko/Hugging_Face_Values

liked a Space 7 months ago

andito/Florence-2-DocVQA

Organizations

None yet

StatsGary's activity

liked a model about 1 month ago

mistralai/Mistral-7B-Instruct-v0.3

Text Generation • Updated Aug 21, 2024 • 2.56M • 1.26k

liked 2 Spaces 7 months ago

Hugging Face Values

Running on Zero

Florence 2

liked a Space 8 months ago

Running on CPU Upgrade

MTEB Leaderboard

upvoted an article 8 months ago

Article

Getting Started With Embeddings

Jun 23, 2022

• 44

updated a model 8 months ago

StatsGary/Mistral-7B-v0.1-SFT-ultrachat-DPO

Updated May 31, 2024

liked a dataset 8 months ago

kaitchup/UltraFeedback-prompt-chosen-rejected

Viewer • Updated Oct 23, 2023 • 18k • 33 • 6

updated a model 8 months ago

StatsGary/Maixtchup-4x7b

Text Generation • Updated May 29, 2024 • 17

liked a model 8 months ago

CompVis/stable-diffusion-v1-4

Text-to-Image • Updated Aug 23, 2023 • 1.11M • 6.65k

updated a collection 8 months ago

Papers

1 item • Updated May 22, 2024

updated a dataset 9 months ago

StatsGary/idefics2

Updated May 8, 2024 • 20

liked a Space 9 months ago

Datasets Tagging

updated a model 9 months ago

StatsGary/idefics2

Updated May 8, 2024

New activity in mistralai/Mixtral-8x7B-Instruct-v0.1 9 months ago

Enable inference API

#49 opened about 1 year ago by

liked a model 9 months ago

microsoft/Phi-3-mini-128k-instruct

Text Generation • Updated Aug 20, 2024 • 356k • 1.63k

liked a Space 9 months ago

Running on CPU Upgrade

Open VLM Leaderboard

VLMEvalKit Evaluation Results Collection

reacted to Jaward's post with 🚀 9 months ago

Post

5347

All You Need to Know About Phi-3 (Technical Report Walkthrough)

Summary of Summaries:
Phi-3-mini
- Architecture specs: decoder-only transformer, model size: 3.8 billion
parameters, LongRope [ 128K context length ], vocab size [ 32064 ],
trained on 3.3 trillion tokens in bfloat16.
- Rivals the performance of larger models like Mixtral 8x7B and GPT-3.5,
while being capable of running locally on a smartphone.
- Uses a high-quality training dataset of heavily filtered web data and
LLM-generated synthetic data.
- Can be quantized to 4 bits, occupying ≈ 1.8GB of memory.
- Ran natively on an iPhone 14 with the A16 Bionic chip at an inference speed
of up to 12 tokens per second.
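
The ≈ 1.8GB figure for the 4-bit model can be sanity-checked with a back-of-the-envelope calculation over the weights alone. A minimal sketch, assuming the footprint is simply parameter count times bits per parameter; the helper name `weight_memory_gb` is hypothetical, and real quantized files add some overhead for scales, zero points, and any layers kept at higher precision:

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: counts only the weights, ignoring quantization metadata,
    activations, and the KV cache.
    """
    return n_params * bits_per_param / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # parameter count quoted in the post

int4_gb = weight_memory_gb(PHI3_MINI_PARAMS, 4)   # 4-bit quantized weights
bf16_gb = weight_memory_gb(PHI3_MINI_PARAMS, 16)  # bfloat16 weights

print(f"4-bit: {int4_gb:.2f} GB, bfloat16: {bf16_gb:.2f} GB")
```

Under these assumptions the 4-bit weights come to about 1.9 GB (roughly 1.77 GiB), in line with the ≈ 1.8GB quoted above, versus about 7.6 GB in bfloat16.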

Phi-3-small
- Architecture specs: also decoder-only, 7B parameters, vocab size [ 100352 ], default context length [ 8K ], hidden dimension: 4096, number of heads and layers: follows the 7B class structure.
- Uses the tiktoken tokenizer (for better multilingual tokenization).

Phi-3-medium
- Architecture specs: also decoder-only, hidden dimension: 5120, number of heads: 40, number of layers: 40, tokenization: consistent with the other models, trained on 4.8 trillion tokens.

Training Methodology:
- Focuses on high-quality training data rather than following standard scaling laws.
- The models undergo two-phase pre-training on a mix of web sources and synthetic data, targeting general knowledge and logical reasoning skills.
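
One way to see the departure from standard scaling practice is the ratio of training tokens to parameters. A rough sketch using only the Phi-3-mini figures from the post; the reference point of about 20 tokens per parameter is the commonly cited compute-optimal ("Chinchilla") rule of thumb, brought in here as an outside assumption rather than something the post states, and the function name is hypothetical:

```python
# Assumption: ~20 training tokens per parameter as a compute-optimal
# rule of thumb. This number does not come from the post.
RULE_OF_THUMB_TOKENS_PER_PARAM = 20


def tokens_per_param(n_tokens: float, n_params: float) -> float:
    """Training tokens seen per model parameter."""
    return n_tokens / n_params


# Phi-3-mini figures quoted in the post: 3.3T tokens, 3.8B parameters.
mini_ratio = tokens_per_param(3.3e12, 3.8e9)
multiple = mini_ratio / RULE_OF_THUMB_TOKENS_PER_PARAM

print(f"Phi-3-mini: {mini_ratio:.0f} tokens/param, "
      f"{multiple:.0f}x the rule-of-thumb ratio")
```

With these inputs Phi-3-mini sees roughly 870 tokens per parameter, around 43 times the rule-of-thumb ratio, which is consistent with the report's emphasis on data quality and volume over a compute-optimal recipe.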

Performance:
- Phi-3-mini achieves competitive scores on standard benchmarks like MMLU and MT-Bench, indicating strong reasoning capabilities.
- The larger variants perform even better, suggesting effective scaling with increased model size.

Limitations:
- phi-3-mini: limited by its smaller size on tasks requiring extensive factual knowledge; primarily supports English.
- phi-3-small: limited multilingual support.

Hosting LLMs locally is a big win for OSS: private, secure inference on the go 😎

4 replies

liked a model 9 months ago

meta-llama/Meta-Llama-3-8B

Text Generation • Updated Sep 27, 2024 • 514k • 5.97k

upvoted an article 9 months ago

Article

How to Finetune phi-3 on MacBook Pro

By

•

Apr 24, 2024

• 65