Mairaj

hikmalab

AI & ML interests

Applying AI and ML to the vast corpus of Islamic thought texts

hikmalab's activity

liked a dataset about 5 hours ago

free-law/Caselaw_Access_Project

Viewer • Updated Mar 16, 2024 • 4.28M • 8.99k • 64

liked a dataset 23 days ago

PleIAs/YouTube-Commons

Updated Jun 26, 2024 • 832 • 336

reacted to Salama1429's post with ❤️ 23 days ago

📺 Introducing the YouTube-Commons Dataset 📺

🌐 Overview: The YouTube Commons Dataset is a comprehensive collection of 30 billion words from 15,112,121 original and automatically translated transcripts, drawn from 2,063,066 videos on YouTube.

🔗 License: All videos are shared under the CC-BY license, with the majority (71%) in English.

🤖 Applications: This dataset is ideal for training powerful AI models for converting speech to text (ASR) and translation models.

📊 Utilization: The text can be used for model training and is republishable for reproducibility purposes.

🤝 Collaboration: This dataset is the result of a collaboration between state start-up LANGU:IA, the French Ministry of Culture, and DINUM. It will be expanded in the coming months.

🔗 Explore the dataset here: https://lnkd.in/d_paWKFE

#YouTubeCommons #AIResearch #MachineLearning #OpenData #ArtificialIntelligence #NLP #Dataset #TechCollaboration #Innovation #DigitalTransformation

liked a dataset 23 days ago

MohamedRashad/Quran-Tafseer

Viewer • Updated Sep 13, 2024 • 219k • 175 • 37