File size: 2,255 Bytes
8868a37
 
dd2272b
 
 
 
 
 
 
 
 
 
 
8868a37
 
156189f
 
 
2e2ee82
 
 
 
 
 
156189f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
---
tags:
- text-classification
library_name: fasttext
widget:
- text: "apple"
  example_title: "apple"
- text: "cat"
  example_title: "cat"
- text: "sunny"
  example_title: "sunny"
- text: "water"
  example_title: "water"
---

# debate2vec
Word-vectors created from a large corpus of competitive debate evidence, and data extraction / processing scripts

#usage
```
import fasttext.util
ft = fasttext.load_model('debate2vec.bin')
ft.get_word_vector('dialectics')
```
# Download Link
Github won't let me store large files in their repos. 
* [FastText Vectors Here](https://drive.google.com/file/d/1m-CwPcaIUun4qvg69Hx2gom9dMScuQwS/view?usp=sharing) (~260mb)


# About 

Created from all publically available Cross Examination Competitive debate evidence posted by the community on [Open Evidence](https://openev.debatecoaches.org/) (From 2013-2020)

Search through the original evidence by going to [debate.cards](http://debate.cards/)

Stats about this corpus: 
* 222485 unique documents larger than 200 words (DebateSum plus some additional debate docs that weren't well-formed enough for inclusion into DebateSum)
* 107555 unique words (showing up more than 10 times in the corpus)
* 101 million total words

Stats about debate2vec vectors: 
* 300 dimensions, minimum number of appearances of a word was 10, trained for 100 epochs with lr set to 0.10 using FastText
* lowercased (will release cased)
* No subword information

The corpus includes the following topics 

* 2013-2014 Cuba/Mexico/Venezuela Economic Engagement
* 2014-2015 Oceans
* 2015-2016 Domestic Surveillance
* 2016-2017 China
* 2017-2018 Education
* 2018-2019 Immigration
* 2019-2020 Reducing Arms Sales

Other topics that this word vector model will handle extremely well

* Philosophy (Especially Left-Wing / Post-modernist)
* Law
* Government 
* Politics


Initial release is of fasttext vectors without subword information. Future releases will include fine-tuned GPT-2 and other high end models as my GPU compute allows. 

# Screenshots
![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec.jpg)
![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec2.jpg)
![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec3.jpg)