BERTopic_Multimodal

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

This model was trained on the 8,091 images of the Flickr 8k dataset, without their captions. It demonstrates how BERTopic can perform topic modeling with images as the only input.

A few examples of generated topics:

[Image: multimodal.png]

Usage

To use this model, please install BERTopic:

pip install -U bertopic[vision]
pip install -U safetensors

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("MaartenGr/BERTopic_Multimodal")

topic_model.get_topic_info()

You can view all information about a topic as follows:

topic_model.get_topic(topic_id, full=True)

Topic overview

  • Number of topics: 29
  • Number of training documents: 8091
| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | while - air - the - in - jumping | 34 | -1_while_air_the_in |
| 0 | bench - sitting - people - woman - street | 1132 | 0_bench_sitting_people_woman |
| 1 | grass - running - dog - grassy - field | 1693 | 1_grass_running_dog_grassy |
| 2 | boy - girl - little - young - holding | 1290 | 2_boy_girl_little_young |
| 3 | dog - frisbee - running - water - mouth | 1224 | 3_dog_frisbee_running_water |
| 4 | skateboard - ramp - doing - trick - cement | 415 | 4_skateboard_ramp_doing_trick |
| 5 | snow - dog - covered - running - through | 309 | 5_snow_dog_covered_running |
| 6 | mountain - range - slope - standing - person | 205 | 6_mountain_range_slope_standing |
| 7 | pool - blue - boy - toy - water | 189 | 7_pool_blue_boy_toy |
| 8 | trail - bike - down - riding - person | 166 | 8_trail_bike_down_riding |
| 9 | snowboarder - mid - jump - air - after | 126 | 9_snowboarder_mid_jump_air |
| 10 | rock - climbing - up - wall - tree | 124 | 10_rock_climbing_up_wall |
| 11 | wave - surfboard - top - riding - of | 112 | 11_wave_surfboard_top_riding |
| 12 | beach - surfboard - people - with - walking | 102 | 12_beach_surfboard_people_with |
| 13 | jumping - track - horse - racquet - dog | 98 | 13_jumping_track_horse_racquet |
| 14 | snowboard - snow - girl - hill - slope | 95 | 14_snowboard_snow_girl_hill |
| 15 | game - being - football - played - professional | 91 | 15_game_being_football_played |
| 16 | soccer - kicking - team - ball - player | 80 | 16_soccer_kicking_team_ball |
| 17 | dirt - bike - person - rider - going | 75 | 17_dirt_bike_person_rider |
| 18 | soccer - boys - field - ball - kicking | 69 | 18_soccer_boys_field_ball |
| 19 | baseball - player - bat - swinging - into | 63 | 19_baseball_player_bat_swinging |
| 20 | basketball - up - and - playing - jumping | 59 | 20_basketball_up_and_playing |
| 21 | bird - body - flying - over - long | 55 | 21_bird_body_flying_over |
| 22 | motorcycle - track - race - racer - racing | 55 | 22_motorcycle_track_race_racer |
| 23 | boat - sitting - water - lake - hose | 53 | 23_boat_sitting_water_lake |
| 24 | street - riding - down - bike - woman | 52 | 24_street_riding_down_bike |
| 25 | paddle - suit - paddling - water - in | 49 | 25_paddle_suit_paddling_water |
| 26 | pair - scissors - stage - white - shirt | 42 | 26_pair_scissors_stage_white |
| 27 | tennis - court - racket - racquet - swinging | 34 | 27_tennis_court_racket_racquet |
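
As a quick consistency check on the topic overview: each label appears to be the topic ID followed by the first four keywords joined with underscores, and the 29 topic frequencies (including the outlier topic -1) add up to the 8091 training documents. The sketch below verifies both using values copied from the table. The helper name `make_label` is hypothetical and only mimics the pattern seen in the table; it is not part of BERTopic's API.

```python
def make_label(topic_id, keywords, n_words=4):
    """Rebuild a label as '<id>_<word1>_..._<wordN>' (pattern observed in the table)."""
    return "_".join([str(topic_id)] + keywords[:n_words])


# Three rows copied from the table: (topic ID, keywords, label)
rows = [
    (-1, ["while", "air", "the", "in", "jumping"], "-1_while_air_the_in"),
    (4, ["skateboard", "ramp", "doing", "trick", "cement"], "4_skateboard_ramp_doing_trick"),
    (27, ["tennis", "court", "racket", "racquet", "swinging"], "27_tennis_court_racket_racquet"),
]
labels_match = all(make_label(tid, words) == label for tid, words, label in rows)

# Topic frequencies for topics -1 through 27, in table order
frequencies = [
    34, 1132, 1693, 1290, 1224, 415, 309, 205, 189, 166,
    126, 124, 112, 102, 98, 95, 91, 80, 75, 69,
    63, 59, 55, 55, 53, 52, 49, 42, 34,
]

print("labels match:", labels_match)
print("topics:", len(frequencies), "documents:", sum(frequencies))
```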

Training Procedure

The data was retrieved as follows:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# Flickr 8k images and captions (the captions are downloaded but not used for training)
img_folder = 'photos/'
caps_folder = 'captions/'
archives = [(img_folder, 'Flickr8k_Dataset.zip'), (caps_folder, 'Flickr8k_text.zip')]

if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    for folder, file in archives:
        # Download each archive only if it is not already present
        if not os.path.exists(file):
            util.http_get('https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/' + file, file)

        # Extract the archive into its target folder
        with zipfile.ZipFile(file, 'r') as zf:
            for member in tqdm(zf.infolist(), desc='Extracting'):
                zf.extract(member, folder)

# The subfolder is spelled 'Flicker8k_Dataset', so the path keeps that spelling
images = list(glob.glob(os.path.join(img_folder, 'Flicker8k_Dataset', '*.jpg')))

Then, to perform topic modeling on multimodal data with BERTopic:

from bertopic import BERTopic
from bertopic.backend import MultiModalBackend
from bertopic.representation import VisualRepresentation, KeyBERTInspired

# Image embedding model
embedding_model = MultiModalBackend('clip-ViT-B-32', batch_size=32)

# Image to text representation model
representation_model = {
    "Visual_Aspect": VisualRepresentation(image_to_text_model="nlpconnect/vit-gpt2-image-captioning", image_squares=True),
    "KeyBERT": KeyBERTInspired()
}

# Train our model with images only
topic_model = BERTopic(representation_model=representation_model, verbose=True, embedding_model=embedding_model, min_topic_size=30)
topics, probs = topic_model.fit_transform(documents=None, images=images)

As the code above shows, the only input is images. The images are clustered, and a small subset of representative images is extracted from each cluster. These representative images are captioned with "nlpconnect/vit-gpt2-image-captioning", which yields a small text dataset on which c-TF-IDF and the additional KeyBERTInspired representation model can be run.
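
To make the final step more concrete, here is a minimal sketch of class-based TF-IDF applied to a few captions grouped by cluster. The captions, the stop-word set, and the helper names `c_tf_idf` and `top_words` are all hypothetical. The weighting `tf * log(1 + A / f_t)` follows the formula given in the BERTopic documentation, where `A` is the average number of words per class and `f_t` is the frequency of a word across all classes. BERTopic's own implementation works on sparse count matrices and differs in detail, so treat this as an illustration only.

```python
import math
from collections import Counter


def c_tf_idf(captions_per_topic, stop_words=frozenset()):
    """Score words per topic: normalised term frequency * log(1 + A / f_t)."""
    # Join all captions of a topic into one "class document" and count its words
    counts = {}
    for topic, captions in captions_per_topic.items():
        tokens = " ".join(captions).lower().split()
        counts[topic] = Counter(t for t in tokens if t not in stop_words)

    # f_t: frequency of each word across all classes
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)

    # A: average number of words per class
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for topic, class_counts in counts.items():
        n_words = sum(class_counts.values())
        scores[topic] = {
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in class_counts.items()
        }
    return scores


def top_words(scores, topic, n=2):
    """Return the n highest-scoring words of a topic (ties broken alphabetically)."""
    ranked = sorted(scores[topic].items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical captions for the representative images of two clusters
captions = {
    0: ["a dog running through the grass", "a dog running in a grassy field"],
    1: ["a man riding a skateboard on a ramp", "a man doing a trick on a skateboard"],
}
scores = c_tf_idf(captions, stop_words={"a", "the", "in", "on"})
for topic in captions:
    print(topic, top_words(scores, topic))
```

Words that recur within one cluster's captions but are rare elsewhere receive the highest scores, which is how a handful of captions can produce keyword lists like those in the topic overview.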

Training hyperparameters

  • calculate_probabilities: False
  • language: None
  • low_memory: False
  • min_topic_size: 30
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True

Framework versions

  • Numpy: 1.23.5
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.29.2
  • Numba: 0.56.4
  • Plotly: 5.14.1
  • Python: 3.10.10
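
To compare a local environment with the versions listed above, the following sketch uses only the standard library's `importlib.metadata`. The PyPI distribution names in `EXPECTED` (for example `umap-learn` for UMAP) are an assumed mapping that the model card does not state, and the helper name `compare_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed PyPI distribution name
EXPECTED = {
    "numpy": "1.23.5",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.29.2",
    "numba": "0.56.4",
    "plotly": "5.14.1",
}


def compare_versions(expected):
    """Map each package to (expected version, installed version or None, match flag)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, same) in compare_versions(EXPECTED).items():
    status = "same" if same else "differs"
    print(f"{name}: trained with {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the local environment departs from the one used for training.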