A simple Unicode alphabetic tokenizer.
----------Usage----------
Unicode_alphabetic_tokenizer(x)
----------Arguments----------
Argument: x
a character vector.
----------Details----------
Tokenization first replaces the elements of x by their Unicode character sequences. Then, the non-alphabetic characters (i.e., the ones which do not have the Alphabetic property) are replaced by blanks, and the corresponding strings are split according to the blanks.
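The steps described above can be approximated in base R. The sketch below is not the function's actual implementation: `alpha_tokens_sketch` is a hypothetical helper name, and it uses the regex class `\p{L}` (Unicode letters) as a stand-in for the Alphabetic property, which is somewhat broader. It only illustrates the replace-then-split idea.

```r
# Hypothetical helper: an approximation of the described tokenization.
# Assumption: \p{L} (letters) stands in for the Unicode Alphabetic property.
alpha_tokens_sketch <- function(x) {
  x <- enc2utf8(as.character(x))
  # Replace each run of non-letter characters with a single blank.
  x <- gsub("[^\\p{L}]+", " ", x, perl = TRUE)
  # Split on blanks and flatten into one character vector.
  tokens <- as.character(unlist(strsplit(x, " +"), use.names = FALSE))
  # Drop empty strings left by leading separators.
  tokens[nzchar(tokens)]
}

alpha_tokens_sketch(c("Hello, world!", "R2-D2 rocks"))
# [1] "Hello" "world" "R"     "D"     "rocks"
```

Note that digits and punctuation act as separators, so "R2-D2" yields the two tokens "R" and "D".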
----------Value----------
A character vector with the tokenized strings.