Fairly new to ML and very new to transformers. I want to make sure I’m doing the right thing: I’m trying to do text classification with a small dataset, and I thought this would be a good option (is it?)
Here’s the basics of my code:
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from transformers import BertTokenizer, TFBertForSequenceClassification

# bert-base-uncased is just the checkpoint I'm experimenting with
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["random text string...", ...]
labels = [1, 0, ...]

tokenized_sents = []
attention_masks = []
for sentence in texts:
    tokenized_sents.append(tokenizer.encode(sentence, add_special_tokens=True, ...))

input_ids = pad_sequences(tokenized_sents, padding="post")  # pad at the end (default is "pre")

# mask out the zero padding so the model ignores it
for sentence in input_ids:
    att_mask = [int(token_id > 0) for token_id in sentence]
    attention_masks.append(att_mask)

# from_tensor_slices takes one (features, labels) structure, and fit expects a batched dataset
dataset = tf.data.Dataset.from_tensor_slices(({"input_ids": input_ids, "attention_mask": attention_masks}, labels)).batch(32)

# copied from https://huggingface.co/transformers/training.html
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
model.fit(dataset, epochs=2, steps_per_epoch=115)
I was pretty confident this all worked, but then I tried the following test:
sent = ["I like to watch movies"]
encoded = tokenizer.encode(sent[0], add_special_tokens=True, ...)
att_mask = [int(token_id > 0) for token_id in encoded]
ds = tf.data.Dataset.from_tensor_slices((encoded, att_mask))
model.predict(ds)
I got a super long array back. But the labels can only be 0 or 1 and there’s only one sample, so I was expecting a 1-by-2 array. Any idea why this doesn’t work?
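For reference, this is roughly what I expected to get back (just a sketch of my expectation; I’m assuming predict returns the raw logits as a numpy array):

import numpy as np

preds = model.predict(ds)  # expected shape (1, 2): one row of logits for the one sample
label = int(np.argmax(preds, axis=1)[0])  # argmax over the two logits would give 0 or 1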
Also, what’s the best way to save this model and use it for predictions later?
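Is something like this the right approach? (Just a sketch of what I had in mind, using the save_pretrained / from_pretrained pair; the directory name is made up.)

# save the fine-tuned weights and the tokenizer side by side
model.save_pretrained("./my_text_clf")
tokenizer.save_pretrained("./my_text_clf")

# later, in a fresh session, load both back for inference
from transformers import BertTokenizer, TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained("./my_text_clf")
tokenizer = BertTokenizer.from_pretrained("./my_text_clf")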
Thank you.