</NP> 834587
<NP> 834587
NN 655960
IN 559326
DT 458456
</VP> 450160
<VP> 450160
NNP 405496
</S> 300422
<S> 300422
JJ 292882
NNS 272432
. 251036
XX 250846
</PP> 243084
<PP> 243084
, 230006
PRP 225438
RB 224084
VB 181970
VBD 169356
CC 154986
</TOP> 141871
<TOP> 141871
VBZ 124814
VBP 107908
VBN 104034
CD 101496
</SBAR> 85487
<SBAR> 85487
VBG 85382
TO 76424
</ADVP> 73664
<ADVP> 73664
MD 63004
PRP$ 57414
</ADJP> 45325
<ADJP> 45325
HYPH 35952
</NML> 34655
<NML> 34655
UH 34140
POS 33028
</WHNP> 25806
<WHNP> 25806
WP 25254
WDT 24226
'' 23860
`` 22524
: 21704
RP 20390
WRB 20208
</INTJ> 16172
<INTJ> 16172
" 15072
JJR 13996
NNPS 13962
</QP> 11774
<QP> 11774
VERB 10126
</PRT> 9713
<PRT> 9713
</WHADVP> 9662
<WHADVP> 9662
$ 9022
EX 8722
JJS 7716
RBR 7040
) 6862
</EDITED> 6841
<EDITED> 6841
( 6782
</SQ> 6255
<SQ> 6255
</FRAG> 5225
<FRAG> 5225
PDT 5088
</PRN> 3999
<PRN> 3999
</SINV> 3317
<SINV> 3317
</SBARQ> 3305
<SBARQ> 3305
RBS 2882
FW 1956
</UCP> 1534
<UCP> 1534
NFP 1348
</CONJP> 1103
<CONJP> 1103
SYM 1102
</WHPP> 1002
<WHPP> 1002
</X> 816
<X> 816
' 782
WP$ 750
</EMBED> 718
<EMBED> 718
LS 690
</WHADJP> 516
<WHADJP> 516
ADD 460
</LST> 356
<LST> 356
</META> 288
<META> 288
</RRC> 232
<RRC> 232
</NAC> 218
<NAC> 218
</NX> 74
<NX> 74
AFX 62
</XX> 35
<XX> 35