- rewrite test functions from `bool foo(x) ...` to `bool is_foo(x) ...`
- all about emoji flag sequences
- `bool is_mirrored(char32_t) noexcept` (such as parentheses, curly braces, brackets, ...)
- also ability to get the mirroring codepoint
- map codepoint to block (enum) - see Blocks.txt
- map codepoint to plane (enum)
- map block to codepoint range
- map plane to codepoint range
- provide C API binding for basic functionality
- `script_segmenter`: add support for commonPreferredScript tracking with regards to brackets () [] {}.
- `script_segmenter`: test "foo(λ);" -> {Latin, Greek, Latin}
- `orientation_segmenter` (and integrate it into `run_segmenter` as well as its tests)
- mktables: `to_string` builder
- mktables: `to_type` builder
- mktables: pylint into CI
- clang-tidy into CI
- META: cmake install target (header files and .a file, executable)
- META: pkg-config file
- word segmentation (UTS algorithm)
- generic text segmentation (top level segmentation API suitable for text shaping implementations)
- CLI tool: unicode-inspect for inspecting input files by code point, grapheme cluster, word, script, ...
- unit tests for most parts (wcwidth / segmentation)
- README: list all TRs that are being implemented
- API for accessing UCD properties
- UTF8 <-> UTF32 conversion
- grapheme segmentation (UTS algorithm)
- symbol/emoji segmentation (UTS algorithm)
- wcwidth equivalent (`unicode::width(char32_t)`)
- script segmentation
- `out<T>` helper to force explicit `ref(val)` for more readability.
- `operator<<(ostream&, T)` for all UCD properties - in its own header file (ucd_ostream.h)
- `emoji_segmenter`: test "x 😀 y" -> {Text, Emoji, Text}
- make `run_segmenter` more templated / customizable
- mktables: `enum class` builder
- integrate into contour
- see if this makes sense: make use of this library in klex lexical scanner, to allow unicode input
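
For the two plane-related items above (map codepoint to plane, map plane to codepoint range), a minimal sketch of one possible shape follows. The names `plane`, `plane_of`, and `range_of` are hypothetical and not taken from the library; the only fact relied upon is that each Unicode plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical sketch; these names are illustrative, not the library's API.
enum class plane : std::uint8_t
{
    bmp = 0,    // Basic Multilingual Plane
    smp = 1,    // Supplementary Multilingual Plane
    sip = 2,    // Supplementary Ideographic Plane
    tip = 3,    // Tertiary Ideographic Plane
    ssp = 14,   // Supplementary Special-purpose Plane
    pua_a = 15, // Supplementary Private Use Area-A
    pua_b = 16, // Supplementary Private Use Area-B
    // Planes 4 to 13 are unassigned; they stay representable via static_cast.
};

// Maps a code point to its plane, or std::nullopt if it lies beyond U+10FFFF.
constexpr std::optional<plane> plane_of(char32_t codepoint) noexcept
{
    if (codepoint > 0x10FFFF)
        return std::nullopt;
    return static_cast<plane>(codepoint >> 16);
}

// Maps a plane to its inclusive code point range {first, last}.
constexpr std::pair<char32_t, char32_t> range_of(plane p) noexcept
{
    auto const first = static_cast<char32_t>(static_cast<std::uint32_t>(p) << 16);
    auto const last = static_cast<char32_t>(first + 0xFFFF);
    return std::pair<char32_t, char32_t> { first, last };
}
```

The block mapping cannot be computed this way, since block boundaries are irregular; it would presumably need a table generated from Blocks.txt.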