gremid committed on
Commit 7db61d5 · verified · 1 Parent(s): a53cc16

Upload folder using huggingface_hub

Files changed (15)
  1. BUILT +1 -1
  2. GIT_REV +1 -1
  3. GIT_REV_LEX +1 -1
  4. README.md +23 -11
  5. finite.a +2 -2
  6. finite.ca +0 -0
  7. index.a +0 -0
  8. index.ca +0 -0
  9. index.csv.lzma +0 -0
  10. lemma.a +2 -2
  11. lemma.ca +0 -0
  12. morph.a +2 -2
  13. morph.ca +0 -0
  14. root.a +2 -2
  15. root.ca +2 -2
BUILT CHANGED
@@ -1 +1 @@
- 2024-11-28T13:37:11.407789
+ 2025-01-17T15:15:52.449967
GIT_REV CHANGED
@@ -1 +1 @@
- 76701cc
+ 1ff61eb
GIT_REV_LEX CHANGED
@@ -1 +1 @@
- 76701cc
+ 1ff61eb
README.md CHANGED
@@ -20,7 +20,7 @@ model-index:
  split: train
  metrics:
  - type: coverage
- value: 0.8415293963067323
+ value: 0.8415324536382167
  name: Coverage
  - type: coverage
  value: 1.0
@@ -32,7 +32,7 @@ model-index:
  value: 0.9999580703997988
  name: Coverage ($.)
  - type: coverage
- value: 0.774030155216797
+ value: 0.7740509710590406
  name: Coverage (ADJA)
  - type: coverage
  value: 0.7548407611333322
@@ -80,7 +80,7 @@ model-index:
  value: 0.0618080812117821
  name: Coverage (NE)
  - type: coverage
- value: 0.7440482047389456
+ value: 0.7440593189565299
  name: Coverage (NN)
  - type: coverage
  value: 0.9799275737196068
@@ -183,21 +183,34 @@ model-index:
  name: Coverage (XY)
  ---

- # DWDSmor
+ # DWDSmor – German morphology

- _SFST/SMOR/DWDS-based German morphology_




+ DWDSmor implements the **lemmatisation and morphological analysis** of
+ word forms as well as the **generation of paradigms of lexical words**
+ in **written German**. Finite state transducers (automata) map word
+ forms to specifications of corresponding lexical words and tagging
+ which represents morphological properties. By traversing such
+ transducers

- DWDSmor implements the lemmatisation and morphological analysis of
- word forms as well as the generation of paradigms of lexical words in
- written German.
+ 1. a given word form can be analysed and lemmatised, or
+ 1. a lexical word together with a set of morphological tagging will
+ generate corresponding inflected word forms.
+
+ The automata are compiled and traversed via
+ [SFST](https://www.cis.uni-muenchen.de/~schmid/tools/SFST/), a C++
+ library and toolbox for finite-state transducers (FSTs). Their
+ coverage of the German language depends on
+
+ 1. the DWDSmor grammar, defining the rules by which word formation happens, and
+ 1. a lexicon, assigning inflection classes to lexical words.

  ## Usage

- DWDSmor is available via PyPI:
+ DWDSmor as a Python library is available via the package index PyPI:

  ``` plaintext
  pip install dwdsmor
@@ -224,8 +237,7 @@ generation:
  scripts for morphological analysis and for paradigm generation by
  means of DWDSmor transducers.
  * `share/` contains XSLT stylesheets for extracting lexical entries in SMORLemma
- format form XML sources of DWDS articles. Sample inputs and outputs can be
- found in `samples/`.
+ format from XML sources of DWDS articles.
  * `lexicon/dwds/` contains scripts for building DWDSmor lexica by means of the
  XSLT stylesheets in `share/` and DWDS sources in `lexicon/dwds/wb/`, which are
  not part of this repository.
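
Note on the README change above: the new introduction says that traversing a transducer either analyses a word form or generates inflected forms from a lexical word plus tags. The following is a toy sketch of that idea only. It does not use SFST or the `dwdsmor` package, and the arcs, the single noun `Hund`, and the tags `<Sg>` and `<Pl>` are made up for illustration rather than taken from the DWDSmor grammar or lexicon.

```python
from typing import List, Tuple

EPS = ""  # epsilon: consumes or emits nothing

# Arc = (source state, analysis-side symbol, surface-side symbol, target state).
# Hypothetical toy automaton for one noun; not part of DWDSmor.
Arc = Tuple[int, str, str, int]
ARCS: List[Arc] = [
    (0, "H", "H", 1),
    (1, "u", "u", 2),
    (2, "n", "n", 3),
    (3, "d", "d", 4),
    (4, "<Sg>", EPS, 5),  # singular: tag on the analysis side, no surface suffix
    (4, "<Pl>", "e", 5),  # plural: tag on the analysis side, suffix "e" on the surface
]
START = 0
FINAL = {5}


def traverse(symbols: List[str], match_analysis_side: bool) -> List[str]:
    """Walk the transducer, matching one side of each arc and emitting the other."""
    results = []
    stack = [(START, 0, [])]  # (state, input position, emitted symbols)
    while stack:
        state, pos, out = stack.pop()
        if state in FINAL and pos == len(symbols):
            results.append("".join(out))
        for src, analysis_sym, surface_sym, dst in ARCS:
            if src != state:
                continue
            if match_analysis_side:
                inp, emit = analysis_sym, surface_sym
            else:
                inp, emit = surface_sym, analysis_sym
            if inp == EPS:
                stack.append((dst, pos, out + [emit]))
            elif pos < len(symbols) and symbols[pos] == inp:
                stack.append((dst, pos + 1, out + [emit]))
    return sorted(results)


def analyse(word_form: str) -> List[str]:
    """Surface form -> analyses (lemma plus tags)."""
    return traverse(list(word_form), match_analysis_side=False)


def generate(analysis: List[str]) -> List[str]:
    """Lemma characters plus tags -> surface forms."""
    return traverse(analysis, match_analysis_side=True)


print(analyse("Hunde"))                    # ['Hund<Pl>']
print(generate(list("Hund") + ["<Pl>"]))   # ['Hunde']
```

The same arc table serves both directions, which is the point the README makes: analysis and generation are two traversals of one automaton.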
finite.a CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a90dbeb10d36b610bb58a58c8720d1d0793508459796f38e85877a9b65b78316
- size 1134309
+ oid sha256:95bfdcf91767d315c623b4cc48a43f715578575249b99357523d0289536554ee
+ size 1135931
finite.ca CHANGED
Binary files a/finite.ca and b/finite.ca differ
 
index.a CHANGED
Binary files a/index.a and b/index.a differ
 
index.ca CHANGED
Binary files a/index.ca and b/index.ca differ
 
index.csv.lzma CHANGED
Binary files a/index.csv.lzma and b/index.csv.lzma differ
 
lemma.a CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4b41ad7a8e276d80c356cb29e2a2d4a71e0ab3947a407274174b24c9420ce86f
- size 1233648
+ oid sha256:dd4241272ed62e7ad712d3fee625978580819d0022fb4daed528aea02884f327
+ size 1235294
lemma.ca CHANGED
Binary files a/lemma.ca and b/lemma.ca differ
 
morph.a CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:11153b2ad849789b455ba27f7c801604007b38e6e4eba223184d855268fa039c
- size 1241182
+ oid sha256:36591d8397ca7d2bfcb6bbc2fc8ef265081fe2cda411c9007fac7cc7a46dd75e
+ size 1242812
morph.ca CHANGED
Binary files a/morph.ca and b/morph.ca differ
 
root.a CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:de16583efbd2441ca98d843cb2bd519bf4c925ac405d65396b6572a4bd3c9bdf
- size 6980498
+ oid sha256:ad73caec7518a7b8908a44df97a5a56c25dc37c19352c0ca3417d7e8a7907396
+ size 6985697
root.ca CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c73e66ea1433a035e929e0d06d7081a6b03a2c5d60b74f5950d99f98871a55c7
- size 3632222
+ oid sha256:57d7e5e6aca069ea2bd4add3321fab28cae874bb5aee19dbba10ddab9b788f94
+ size 3635046
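
Note on the pointer diffs above: the `.a` and `root.ca` entries are Git LFS pointer files, each recording a spec version, a SHA-256 object ID, and a size in bytes. A downloaded automaton can be checked against its pointer with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any tool referenced in this commit.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if the bytes have the size and digest recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algorithm"], data).hexdigest() == pointer["digest"]


# New pointer for finite.a, copied from the diff above.
FINITE_A_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:95bfdcf91767d315c623b4cc48a43f715578575249b99357523d0289536554ee
size 1135931
"""

pointer = parse_lfs_pointer(FINITE_A_POINTER)
print(pointer["algorithm"], pointer["size"])  # sha256 1135931
```

To check a local copy, read it as bytes (for example `pathlib.Path("finite.a").read_bytes()`, assuming the file sits in the working directory) and pass the result to `matches_pointer` together with the parsed pointer.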