R language, Rlibstree package: help documentation for the getLongestSubstring() function

loveR · posted 2012-2-22 21:12:15

getLongestSubstring(Rlibstree)
getLongestSubstring() belongs to the R package: Rlibstree

Compute longest repeated or common substring in a SuffixTree

Translator: LoveR, the translation robot of 生物统计家园网 (biostatistic.net)

----------Description----------

This function works with a suffix tree, which is either passed to it directly or built from a character vector or a StringSet. It can be used to find the longest common substring shared by two or more words, or alternatively to find the longest substring that is repeated, i.e. occurs at least twice, within a word or across two or more words.

When finding the common substring, the string must be present in each of the words. When finding the repeated substring, the substring can be found across two or more words or within a single word.
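To make the "present in each of the words" requirement concrete, here is a minimal brute-force sketch in base R. It does not use Rlibstree or a suffix tree, and the function name longestCommonSubstringNaive is made up for this illustration. It only shows what a common substring is and is far slower than a suffix tree on long inputs.

```r
# Illustrative only: brute-force longest common substring in base R.
# longestCommonSubstringNaive is a hypothetical name, not part of Rlibstree.
longestCommonSubstringNaive <- function(words) {
  stopifnot(is.character(words), length(words) >= 1)
  # Any common substring must occur in the shortest word,
  # so take candidates from it, longest first.
  ref <- words[which.min(nchar(words))]
  n <- nchar(ref)
  if (n == 0) return(character(0))
  for (len in n:1) {
    starts <- seq_len(n - len + 1)
    candidates <- unique(substring(ref, starts, starts + len - 1))
    # Keep only the candidates that occur in every word.
    inAll <- vapply(candidates,
                    function(s) all(grepl(s, words, fixed = TRUE)),
                    logical(1))
    if (any(inAll)) return(unname(candidates[inAll]))
  }
  character(0)
}

longestCommonSubstringNaive(c("Monday", "Tuesday", "Wednesday"))  # "day"
longestCommonSubstringNaive(c("aaabbbaaabbb", "aaa", "aabb"))     # "aa"
```

Both results agree with what the Examples section reports for getLongestCommonSubstring() on the same inputs.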

If you are going to perform multiple operations on the same collection of strings, it is sensible to first build the SuffixTree (using SuffixTree) and then pass this object in each of the calls.

This function is a relatively straightforward interface to the libstree routines lst_alg_longest_repeated_substring and lst_alg_longest_common_substring. More information can therefore be found in their documentation.

----------Usage----------

getLongestRepeatedSubstring(words, range = c(1, 0), asCharacter = TRUE)
getLongestCommonSubstring(words, range = c(1, 0), asCharacter = TRUE)
getLongestSubstring(stree, repeated = TRUE, range = c(1, 0), asCharacter = TRUE)

----------Arguments----------

Argument: stree, words
the collection of strings to be searched for the longest substring. This can be a character vector, a StringSet or a SuffixTree.

Argument: repeated
a logical value. If this is TRUE, we look for repeated substrings. If it is FALSE, we look for common substrings. See the documentation for libstree.

Argument: range
a pair of integers giving the minimum and maximum length of the substrings over which to search. If the second value is 0, substrings of all possible lengths are considered, i.e. up to the length of the longest string in the set. If the caller supplies just a single integer, the trailing 0 is assumed.
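The single-integer convention for range can be pictured with a small helper. This is only a sketch of the rule described above. normalizeRange is a hypothetical name, and how Rlibstree handles the argument internally is not shown in this document.

```r
# Hypothetical helper, not part of Rlibstree: pads a single-integer
# range with the trailing 0 ("no upper limit") described above.
normalizeRange <- function(range) {
  range <- as.integer(range)
  stopifnot(length(range) %in% 1:2, !anyNA(range), all(range >= 0L))
  if (length(range) == 1L) range <- c(range, 0L)
  range
}

normalizeRange(3)        # 3 0: minimum length 3, no maximum
normalizeRange(c(1, 0))  # 1 0: the default shown in Usage
```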

Argument: asCharacter
a logical value indicating whether the result should be converted to a character vector in R (TRUE) or left as a StringSet object (FALSE).

----------Details----------

This uses the libstree routines lst_alg_longest_repeated_substring and lst_alg_longest_common_substring.

----------Value----------

If asCharacter is TRUE, the default, the result is a character vector. Otherwise, it is an object of class StringSet.
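The character vector can hold more than one element when several repeated substrings tie for the longest length, as some of the examples in this document show. The base R sketch below illustrates that behaviour by brute force. It does not call Rlibstree or libstree, the name longestRepeatedSubstringsNaive is invented here, and the order of tied results may differ from what the package returns.

```r
# Illustrative only: brute-force longest repeated substring(s) in base R.
# Occurrences may overlap and may come from different words.
longestRepeatedSubstringsNaive <- function(words) {
  stopifnot(is.character(words))
  maxLen <- max(c(nchar(words), 0L))
  for (len in rev(seq_len(maxLen))) {
    # Every substring of length `len` from every word, overlaps included.
    subs <- unlist(lapply(words, function(w) {
      n <- nchar(w)
      if (n < len) return(character(0))
      starts <- seq_len(n - len + 1)
      substring(w, starts, starts + len - 1)
    }))
    counts <- table(subs)
    repeated <- names(counts)[counts >= 2]
    # All substrings of this length seen at least twice tie for longest.
    if (length(repeated) > 0) return(repeated)
  }
  character(0)
}

longestRepeatedSubstringsNaive("aaa")          # "aa" (overlapping occurrences)
longestRepeatedSubstringsNaive("mississippi")  # "issi"
# Two substrings of length 6 tie here: "first " and " again"
longestRepeatedSubstringsNaive("first then first again and again")
```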

----------Note----------

The libstree distribution has some bugs. If possible, test any anomalies with the executables in libstree's test directory to determine whether they are due to the code in this package or to libstree itself.

----------Author(s)----------

Duncan Temple Lang <duncan@wald.ucdavis.edu>

----------References----------

http://www.omegahat.org/Rlibstree

----------See Also----------

SuffixTree, StringSet, getCommonPrefix

----------Examples----------

els = c("aaabbbaaabbb", "aaa", "aabb")
# "aaabbb"
getLongestRepeatedSubstring(els)

# "aa"
getLongestCommonSubstring(els)
# Same call but with the general getLongestSubstring() function.
getLongestSubstring(els, repeated = FALSE)

words = c("stemming", "boing", "springs")
tree = SuffixTree(words)

# The longest common or repeated substring for these is the same - "ing"
# Longest repeated substring
getLongestRepeatedSubstring(tree)

# Longest common substring.
getLongestCommonSubstring(tree)

# Find the repeated substring.
# Note it finds aaaa twice in the second string: aaaax and xaaaa,
# where x is an arbitrary character, admittedly also a.
getLongestRepeatedSubstring(c("aaa sdsd", "aaaaa", "xyz"))

# This returns "aa", which is repeated as subsequences 1:2 and 2:3,
# i.e. repeating the use of the middle "a"
getLongestRepeatedSubstring("aaa")

# Get the return value as a StringSet
set = getLongestSubstring(tree, asCharacter = FALSE)
length(set)

# Take the word mississippi and the same word backward, and we can find the
# longest palindrome.  Taken from the Perl module Tree::Suffix by Gray

# First, a function to reverse the order of the characters in each word
reverseWord = function(word)
  sapply(strsplit(word, ""), function(x) paste(rev(x), collapse = ""))

# Just check it does it correctly: round-trip the word
"mississippi" == reverseWord(reverseWord("mississippi"))

# We get "ississi"
getLongestSubstring(c("mississippi", reverseWord("mississippi")), TRUE, c(0, 0))

# Just the word itself.
# "issi"
getLongestSubstring("mississippi", TRUE, c(0, 0))

# Longest repeated substring is esday
getLongestSubstring(c("Monday", "Tuesday", "Wednesday"), TRUE)

# Longest common substring is day
getLongestSubstring(c("Monday", "Tuesday", "Wednesday"), FALSE)

# We get the common prefix as the longest substring
# [1] "ABCDEF_"
getLongestSubstring(paste("ABCDEF_", c("Monday", "Tuesday", "Wednesday"), sep = ""), TRUE, c(0, 0))

# The names of enumerated constants in Microsoft Word's
# scripting interface.  We want to find the common prefix.

enumNames = c('wdSummaryModeHighlight',
            'wdSummaryModeHideAllButSummary',
            'wdSummaryModeInsert',
            'wdSummaryModeCreateNew')

# common substring
x = getLongestCommonSubstring(enumNames)

x == "wdSummaryMode"

# longest repeated substring
# This is "wdSummaryModeHi" shared by the first two elements.

x = getLongestSubstring(enumNames)

x == "wdSummaryModeHi"

# A series of examples of repeated substrings within a single string

# "first a"
getLongestSubstring("first and first again and again")

# [1] "first " " again"
getLongestSubstring("first then first again and again")

# [1] "first " " again"
getLongestSubstring(c("first then first again and again", "first"))

# This finds " again and again"
getLongestSubstring(c("first then first again and again", "Or again and again"))

# We take this very long place name in New Zealand and find the
# repeated substrings.
# "ata" "aka" "ang" "mat" "tan" "nga"
nzPlaceName = "Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu"
getLongestRepeatedSubstring(nzPlaceName)

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).