---
tags:
- text-classification
library_name: fasttext
widget:
- text: "apple"
  example_title: "apple"
- text: "cat"
  example_title: "cat"
- text: "sunny"
  example_title: "sunny"
- text: "water"
  example_title: "water"
---
# debate2vec
Word vectors created from a large corpus of competitive debate evidence, along with data extraction and processing scripts.
# Usage
```
import fasttext

# Load the downloaded binary model, then look up the vector for a word
ft = fasttext.load_model('debate2vec.bin')
ft.get_word_vector('dialectics')
```
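
A common next step is to compare two words by the cosine similarity of their vectors. The following is a minimal sketch, not code from this repository: `cosine_similarity` is a hypothetical helper written in plain Python, and the toy vectors stand in for the 300-dimensional arrays that `ft.get_word_vector` returns.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero vector has no direction; report 0.0 instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only. With the model loaded you would pass
# ft.get_word_vector('dialectics') and ft.get_word_vector('water') instead.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

For ranked lookups over the whole vocabulary, the fastText Python bindings also provide `get_nearest_neighbors` on the loaded model, which may be more convenient than comparing pairs by hand.
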
# Download Link
GitHub won't let me store large files in a repo, so the vectors are hosted on Google Drive.
* [FastText Vectors Here](https://drive.google.com/file/d/1m-CwPcaIUun4qvg69Hx2gom9dMScuQwS/view?usp=sharing) (~260 MB)
# About
Created from all publicly available Cross Examination competitive debate evidence posted by the community on [Open Evidence](https://openev.debatecoaches.org/) from 2013 to 2020.

Search through the original evidence at [debate.cards](http://debate.cards/).

Stats about this corpus:
* 222485 unique documents larger than 200 words (DebateSum plus some additional debate docs that weren't well-formed enough for inclusion into DebateSum)
* 107555 unique words (showing up more than 10 times in the corpus)
* 101 million total words
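
The counts above depend on two thresholds: documents must be larger than 200 words, and a word must show up more than 10 times to count as part of the vocabulary. The sketch below shows one way such statistics could be computed. It is an illustration only: `corpus_stats` is a hypothetical function, and the lowercasing plus whitespace tokenization is an assumption, so the actual processing scripts may count differently.

```python
from collections import Counter
from typing import Dict, Iterable


def corpus_stats(documents: Iterable[str],
                 min_doc_words: int = 200,
                 min_word_count: int = 10) -> Dict[str, int]:
    """Count kept documents, frequent words, and total tokens (illustrative only)."""
    seen = set()
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()  # assumed tokenization
        # Keep only documents larger than the word threshold, and only once each.
        if len(tokens) <= min_doc_words or text in seen:
            continue
        seen.add(text)
        counts.update(tokens)
    return {
        "unique_documents": len(seen),
        # Words that show up more than `min_word_count` times.
        "unique_words": sum(1 for c in counts.values() if c > min_word_count),
        "total_words": sum(counts.values()),
    }


# Tiny thresholds so the toy corpus exercises the duplicate and too-short cases.
toy_docs = ["the plan solves the harm", "the plan solves the harm", "too short"]
stats = corpus_stats(toy_docs, min_doc_words=3, min_word_count=1)
```
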
Stats about debate2vec vectors:
* 300 dimensions, minimum word count of 10, trained with FastText for 100 epochs at a learning rate of 0.10
* Lowercased (a cased version will be released)
* No subword information
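
The settings listed above map fairly directly onto keyword arguments of `fasttext.train_unsupervised`. The sketch below is a guess at an equivalent configuration, not the command that was actually run. The model type (`skipgram`) and the `corpus_path` argument are assumptions, since the source states neither, and setting `minn` and `maxn` to 0 is fastText's usual way to disable character n-grams.

```python
# Hyperparameters stated in the list above, expressed as keyword arguments
# for fasttext.train_unsupervised.
DEBATE2VEC_PARAMS = {
    "dim": 300,      # 300 dimensions
    "minCount": 10,  # minimum number of appearances of a word
    "epoch": 100,    # trained for 100 epochs
    "lr": 0.10,      # learning rate
    "minn": 0,       # no subword information: disable character n-grams
    "maxn": 0,
}


def train_vectors(corpus_path: str, model: str = "skipgram"):
    """Train word vectors with the settings above (sketch).

    `corpus_path` is a hypothetical lowercased plain-text file, and
    `model="skipgram"` is an assumption: the source does not say whether
    skipgram or cbow was used.
    """
    import fasttext  # imported lazily so the settings can be inspected without it

    return fasttext.train_unsupervised(corpus_path, model=model, **DEBATE2VEC_PARAMS)
```

Calling `train_vectors('corpus.txt').save_model('debate2vec.bin')` would then write a binary that `fasttext.load_model` can read, where `corpus.txt` is a placeholder file name.
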

The corpus includes the following topics:
* 2013-2014 Cuba/Mexico/Venezuela Economic Engagement
* 2014-2015 Oceans
* 2015-2016 Domestic Surveillance
* 2016-2017 China
* 2017-2018 Education
* 2018-2019 Immigration
* 2019-2020 Reducing Arms Sales

Other topics that this word vector model will handle extremely well:
* Philosophy (Especially Left-Wing / Post-modernist)
* Law
* Government
* Politics

The initial release consists of FastText vectors without subword information. Future releases will include fine-tuned GPT-2 and other high-end models as my GPU compute allows.
# Screenshots
![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec.jpg)
![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec2.jpg)
![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec3.jpg)