Discrete Infomax Codes for Supervised Representation Learning Paper • 1905.11656 • Published May 28, 2019
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment Paper • 2310.08204 • Published Oct 12, 2023
Language-only Efficient Training of Zero-shot Composed Image Retrieval Paper • 2312.01998 • Published Dec 4, 2023
An Extendable, Efficient and Effective Transformer-based Object Detector Paper • 2204.07962 • Published Apr 17, 2022
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts Paper • 2404.17507 • Published Apr 26, 2024
Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning Paper • 2212.03760 • Published Dec 7, 2022
Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning Paper • 1905.11666 • Published May 28, 2019
Computational Approaches for App-to-App Retrieval and Design Consistency Check Paper • 2309.10328 • Published Sep 19, 2023
Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval Paper • 2406.09188 • Published Jun 13, 2024
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision Paper • 2102.03334 • Published Feb 5, 2021
SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage Paper • 2303.11114 • Published Mar 20, 2023
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion Paper • 2303.11916 • Published Mar 21, 2023
ViDT: An Efficient and Effective Fully Transformer-based Object Detector Paper • 2110.03921 • Published Oct 8, 2021