---
license: mit
---

# Bengali Word2Vec Model

This is a pre-trained Word2Vec model for the Bengali language.

This model is built for the [bengalinlp](https://github.com/banglawiki/bengalinlp) package.

## Datasets

- [Wikipedia dump datasets](https://dumps.wikimedia.org/bnwiki/latest/)

## Training details

- Word2Vec parameters: embedding dimension = 100, `min_count=5`, `window=5`, `epochs=10`

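The parameters listed under "Training details" map naturally onto gensim's `Word2Vec` constructor. The sketch below shows how a comparable model could be trained. It is an assumption, not the author's actual training script: the use of gensim, the helper names `tokenize` and `train_word2vec`, and the whitespace tokenization are all illustrative.

```python
# Hypothetical training sketch: the use of gensim is an assumption,
# not something this model card confirms.

# Hyperparameters from "Training details", using gensim 4.x argument names.
W2V_PARAMS = {
    "vector_size": 100,  # embedding dimension
    "min_count": 5,      # ignore words seen fewer than 5 times
    "window": 5,         # max distance between the target word and a context word
    "epochs": 10,        # passes over the corpus
}


def tokenize(line):
    """Naive whitespace tokenizer (illustrative; a real pipeline would use a Bengali tokenizer)."""
    return line.split()


def train_word2vec(lines):
    """Train a Word2Vec model on an iterable of raw text lines."""
    # Imported lazily so this module can be loaded without gensim installed.
    from gensim.models import Word2Vec

    sentences = [tokenize(line) for line in lines]
    return Word2Vec(sentences=sentences, **W2V_PARAMS)
```

With `min_count=5`, any word that appears fewer than five times in the corpus is left out of the vocabulary, so a lookup for a rare word may fail even though the word is valid Bengali.
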
## Usage

- `pip install -U bengalinlp_toolkit`

- Generate a word vector using the pre-trained model

```py
from bengalinlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
model_path = "bengali_word2vec.model"  # path to the pre-trained model file
word = 'গ্রাম'  # "village"
vector = bwv.generate_word_vector(model_path, word)
print(vector.shape)
print(vector)
```

- Find the most similar words using the pre-trained model

```py
from bengalinlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
model_path = "bengali_word2vec.model"  # path to the pre-trained model file
word = 'গ্রাম'  # "village"
similar = bwv.most_similar(model_path, word, topn=10)  # ask for the 10 closest words
print(similar)
```
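
Word2Vec similarity queries are conventionally ranked by cosine similarity between word vectors. The sketch below illustrates that ranking on toy data, assuming `most_similar` follows the same convention. The helper names, the three-word vocabulary, and the vector values are made up for the example and are not taken from the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors, topn=10):
    """Return up to `topn` (word, score) pairs, most similar to `query` first."""
    scores = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query  # skip the query word itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Toy 3-dimensional vectors (hypothetical values, for illustration only).
toy_vectors = {
    "গ্রাম": [1.0, 0.0, 0.0],  # "village"
    "শহর": [0.8, 0.6, 0.0],   # "city"
    "নদী": [0.0, 1.0, 0.0],   # "river"
}

print(rank_by_similarity("গ্রাম", toy_vectors, topn=2))
```

Here "শহর" ranks above "নদী" because its vector points in a direction closer to that of "গ্রাম". Cosine similarity compares vector directions only and ignores their lengths.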