Supplementary Reading and Resources 🤗
We hope that you found the unit on multimodal models exciting. If you'd like to explore multimodal learning and models in more detail, here is a list of resources for your reference:
- Hugging Face Tasks offers an overview of various tasks across domains like Computer Vision, Audio, NLP, Multimodal Learning, and Reinforcement Learning. Each task page includes demos, use cases, models, datasets, and more.
- 11-777 MMML, a course on multimodal machine learning by CMU. You can find the video lectures here.
- Blog on Multimodality and LLMs by Chip Huyen, which provides a comprehensive overview of multimodality, large multimodal models, and systems like CLIP and BLIP (see the short example after this list).
- Awesome Multimodal ML, a GitHub repository containing papers, courses, architectures, workshops, and tutorials.
- Awesome Multimodal Large Language Models, a GitHub repository containing papers and datasets related to multimodal LLMs.
- EE/CS 148, a Caltech course on Large Language and Vision Models.
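
If you'd like to get hands-on before diving into the resources above, here is a minimal sketch of running a multimodal model locally: zero-shot image classification with CLIP through the 🤗 `transformers` pipeline. The checkpoint name and image URL below are just illustrative choices; any CLIP checkpoint and image should work.

```python
from transformers import pipeline

# Build a zero-shot image classifier backed by a CLIP checkpoint.
# (openai/clip-vit-base-patch32 is one commonly used option.)
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

# CLIP embeds the image and each candidate label into a shared space
# and scores the image against every label. The URL is a placeholder:
# swap in any image you like, local path or URL.
results = classifier(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
    candidate_labels=["a photo of a cat", "a photo of a dog", "a photo of a car"],
)

# Print the labels ranked by similarity score.
for result in results:
    print(f"{result['label']}: {result['score']:.3f}")
```

Because CLIP matches images against arbitrary text, you can change the candidate labels freely without retraining, which is what makes it "zero-shot".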
In the next unit, we will take a look at another kind of neural network that has been revolutionized by multimodality in recent years: generative neural networks. Get your paintbrush ready and join us on another exciting adventure in the realm of Computer Vision 🤠