# Hong Lou Meng Fine-tuned Model for Word Alignment
This repository contains a fine-tuned version of the multilingual BERT model (`bert-base-multilingual-cased`) trained on the Hong Lou Meng dataset for word alignment tasks. The model was fine-tuned with the awesome-align framework and is designed for Chinese-Vietnamese (Zh-Vn) alignment.
## Model Details
- Base Model: `bert-base-multilingual-cased`
- Fine-tuned Dataset: Excerpts from the classic novel "Hong Lou Meng" (Dream of the Red Chamber), annotated as parallel Chinese-Vietnamese sentence pairs.
- Alignment Task: Fine-tuned to align word pairs in parallel texts for translation and linguistic analysis.
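Word alignments are conventionally written in Pharaoh format: space-separated `i-j` pairs of 0-based source and target word indices, which is also the output format used by awesome-align. A minimal parser, with hypothetical whitespace-tokenized sentences for illustration:

```python
def parse_pharaoh(alignment: str):
    """Parse Pharaoh-format 'i-j' pairs, e.g. '0-0 1-2', into index tuples."""
    return [tuple(map(int, pair.split("-"))) for pair in alignment.split()]

# Toy pre-tokenized sentence pair (illustrative, not model output)
src = "甄 士隱 夢幻".split()
tgt = "Chân Sĩ_Ẩn mộng_ảo".split()

for i, j in parse_pharaoh("0-0 1-1 2-2"):
    print(src[i], "<->", tgt[j])
```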
## Example Usage
Below is an example of how to use this model for word alignment with the `transformers` library:
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load the fine-tuned model and tokenizer
model_name = "username/zh-vn-hongloumeng-align"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

# Input sentences (Chinese and Vietnamese)
source_sentence = "第一回 甄士隱夢幻識通靈 賈雨村風塵懷閨秀"
target_sentence = "Hồi thứ nhất: Chân Sĩ Ẩn mộng ảo ngộ đá thiêng, Giả Vũ Thôn phong trần nhớ giai nhân."

# Tokenize the sentence pair
inputs = tokenizer(source_sentence, target_sentence, return_tensors="pt", padding=True, truncation=True)

# Forward pass (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)

# Further processing for alignment extraction or visualization would follow
print("Model outputs:", outputs)
```
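The raw model outputs contain one hidden-state vector per subword; awesome-align derives alignments by taking an intermediate layer's vectors for the two sentences, computing their similarity matrix, applying a softmax in each direction, and keeping the pairs that exceed a threshold in both. A minimal sketch of that extraction step, using random tensors as stand-ins for the hidden states (the 0.001 threshold follows awesome-align's softmax default):

```python
import torch

def extract_alignments(src_vecs, tgt_vecs, threshold=1e-3):
    """Bidirectional-softmax alignment extraction, awesome-align style.

    src_vecs: (m, d) source subword embeddings
    tgt_vecs: (n, d) target subword embeddings
    Returns a set of (src_index, tgt_index) pairs.
    """
    sim = src_vecs @ tgt_vecs.T                   # (m, n) similarity matrix
    fwd = torch.softmax(sim, dim=-1)              # source -> target probabilities
    bwd = torch.softmax(sim, dim=0)               # target -> source probabilities
    keep = (fwd > threshold) & (bwd > threshold)  # intersect both directions
    return {(int(i), int(j)) for i, j in torch.nonzero(keep)}

# Demo with random stand-ins for the per-subword hidden states
torch.manual_seed(0)
src = torch.randn(5, 768)   # 5 source subwords
tgt = torch.randn(7, 768)   # 7 target subwords
print(extract_alignments(src, tgt))
```

In practice the vectors would come from `outputs.hidden_states` (with `output_hidden_states=True`), and subword-level pairs would then be mapped back to word-level indices.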