Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Paper β’ 2412.18176 β’ Published Dec 24, 2024 β’ 15
view post Post 5590 I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:- vidore/colpali for retrieval π it doesn't need indexing with image-text pairs but just images!- Qwen/Qwen2-VL-2B-Instruct for generation π¬ directly feed images as is to a vision language model with no processing to text! I used ColPali implementation of the new π Byaldi library by @bclavie π€https://github.com/answerdotai/byaldiLink to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb π₯ 23 23 π 10 10 β€οΈ 4 4 + Reply
Awesome Document AI Collection A collection of open-source document AI π π π β’ 27 items β’ Updated Mar 11, 2024 β’ 77
sentence-transformers/bert-base-nli-mean-tokens Sentence Similarity β’ Updated Nov 5, 2024 β’ 1.16M β’ 36
sentence-transformers/all-MiniLM-L12-v2 Sentence Similarity β’ Updated Nov 5, 2024 β’ 3.98M β’ 222
sentence-transformers/paraphrase-MiniLM-L6-v2 Sentence Similarity β’ Updated Nov 5, 2024 β’ 8.85M β’ 115
sentence-transformers/multi-qa-mpnet-base-dot-v1 Sentence Similarity β’ Updated Nov 5, 2024 β’ 1.28M β’ 162
sentence-transformers/all-mpnet-base-v2 Sentence Similarity β’ Updated Nov 5, 2024 β’ 28.2M β’ β’ 964
sentence-transformers/all-MiniLM-L6-v2 Sentence Similarity β’ Updated Nov 1, 2024 β’ 78.5M β’ β’ 2.91k