Hugging Face
Clement Perrot
clementperrot
perrotcl
clemfeelsgood
AI & ML interests
None yet
Recent Activity
liked
a model
about 1 month ago
nvidia/Hymba-1.5B-Base
new
activity
3 months ago
aws-neuron/optimum-neuron-cache:
[Cache Request] NousResearch/Hermes-3-Llama-3.1-8B
reacted to clem's post
about 1 year ago
Is synthetic data the future of AI? @HugoLaurencon @Leyo & @VictorSanh are introducing https://huggingface.co/datasets/HuggingFaceM4/WebSight, a multimodal dataset featuring 823,000 pairs of synthetically generated HTML/CSS codes along with screenshots of the corresponding rendered websites to train GPT4-V-like models. While crafting their upcoming foundation vision language model, they faced the challenge of converting website screenshots into usable HTML/CSS codes. Most VLMs suck at this and there was no public dataset available for this specific task, so they decided to create their own. They prompted existing LLMs to generate 823k HTML/CSS codes of very simple websites. Through supervised fine-tuning of a vision language model on WebSight, they were able to generate the code to reproduce a website component, given a screenshot. You can explore the dataset here: https://huggingface.co/datasets/HuggingFaceM4/WebSight What do you think?
Organizations
None yet
models
1
clementperrot/test
Updated
Jan 12, 2024
datasets
None public yet