1 parent 7efa7e5 commit 7b61a0b
1 file changed
Readme.md
@@ -22,7 +22,11 @@ the german reference corpus.
-Speed comparison of different tokenizers and sentence splitters for German.
+Chart showing speed comparison of different tokenizers and sentence splitters
+for German. `Effi` refers to tokenizing and/or sentence splitting of one
+issue of [Effi Briest](https://www.gutenberg.org/cache/epub/5323/pg5323.html).
+Datok is optimized for large batch sizes, while other tools may
+perform better in other scenarios.
For further benchmarks, especially regarding the quality of tokenization,
see Diewald/Kupietz/Lüngen (2022).