
R language annotate package: chrCats() function help documentation

Posted on 2012-2-25 11:48:23
chrCats(annotate)
chrCats() belongs to R package: annotate

Returns a list of chromosome locations from a MAP environment

Translator: 生物统计家园网 robot LoveR

----------Description----------

The chrCats function takes a data package that contains a MAP environment and returns a list containing the locations for each gene, from the chromosome number down to more specific locations when they are available. For example, the hgu95av2MAP environment gives the location 14q22-q23 for the Affymetrix identifier 1114_at. The function returns a list with one named element for 1114_at whose values are 14, 14q, 14q2, 14q22, and 14q23, since the Affy id is located at each of those chromosome locations.
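The expansion described above can be illustrated with a minimal sketch. The helper name expand_location is hypothetical and is not part of annotate; it only shows how a single band such as 14q22 could be turned into its successively more specific prefixes, assuming the location has the form chromosome, arm (p or q), band.

```r
# Hypothetical helper (not the annotate implementation): expand one
# cytoband string such as "14q22" into chromosome, arm, and band prefixes.
expand_location <- function(loc) {
  # Capture chromosome, arm, and band, e.g. "14", "q", "22"
  m <- regmatches(loc, regexec("^([0-9]+|X|Y)([pq])([0-9.]*)$", loc))[[1]]
  if (length(m) == 0) return(loc)  # unrecognised format: return unchanged
  chr <- m[2]; arm <- m[3]; band <- m[4]
  out <- c(chr, paste0(chr, arm))
  if (nzchar(band)) {
    # Cumulative prefixes of the band; drop those ending in "."
    pre <- substring(band, 1, seq_len(nchar(band)))
    pre <- pre[!grepl("\\.$", pre)]
    out <- c(out, paste0(chr, arm, pre))
  }
  out
}

# Values for the two endpoints of 14q22-q23, combined without duplicates
vals <- unique(c(expand_location("14q22"), expand_location("14q23")))
print(vals)
```

Combining the two endpoints reproduces the five values listed for 1114_at; how the real function arrives at them may differ.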


----------Usage----------


chrCats(data)
createMAPIncMat(data)
createLLChrCats(data)



----------Arguments----------

Argument: data
the data package (a character string)


----------Details----------

This function does a lot of string manipulation and has a few known errors, so I want to discuss them here in case someone else would like to improve on this function.

The first thing chrCats does is allow only one location for each Affymetrix identifier. If the MAP environment has more than one location for an Affy id, the first location is taken. Currently, only 9 of the 12625 Affy ids in the hgu95av2MAP environment and only 16 of the 22283 Affy ids in the hgu133aMAP environment have more than one location, so this does not affect many identifiers.
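As a rough illustration of this first step, the sketch below keeps only the first location per identifier. The list map_list and the probe names are made up for the example; they stand in for a MAP environment that has already been converted to a list.

```r
# Toy stand-in for a MAP environment converted to a list (hypothetical ids)
map_list <- list(
  probe_a = "14q22-q23",
  probe_b = c("2q11-q14", "5q34")  # two locations: only the first is kept
)

# How many identifiers have more than one location
n_multi <- sum(lengths(map_list) > 1)

# Keep the first location for every identifier
first_loc <- vapply(map_list, function(x) x[1], character(1))
print(first_loc)
```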

Next, any spaces are removed from each location, since several locations have leading spaces.
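A one-line sketch of this cleanup, using made-up input strings, could look like this:

```r
# Example locations with stray spaces (made-up values)
locs <- c(" 14q22-q23", "5q33 | 5q31.1")

# Remove every space; fixed = TRUE treats " " as a literal character
clean <- gsub(" ", "", locs, fixed = TRUE)
print(clean)
```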

Then a for loop (which is not efficient!) looks at each location individually and builds the list that will be returned. A few particular strings are looked for in each location, including "|" and "-".
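Looking for these marker strings can be sketched as follows. This is only an illustration with made-up locations, not the code used inside chrCats. Note that "|" is a regular-expression metacharacter, so the sketch matches it literally with fixed = TRUE.

```r
# Made-up locations: one with "|", one with "-", one plain
locs <- c("5q33|5q31.1", "2q11-q14", "14q22")

# Literal matching, so "|" is not interpreted as regex alternation
has_or    <- grepl("|", locs, fixed = TRUE)
has_range <- grepl("-", locs, fixed = TRUE)

print(data.frame(locs, has_or, has_range))
```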

Locations that include "|" in the string are split on the "|" as though it represents OR. For example, for Affy id 32273_at in hgu95av2MAP the location is given as 5q33|5q31.1, and this function assumes this means 5q33 or 5q31.1, so it will return the values 5, 5q, 5q3, 5q33, 5q31, and 5q31.1 for this Affy id.
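The OR handling can be sketched as below. Both helper names (prefixes and expand_or) are hypothetical; the sketch assumes each alternative starts with its chromosome number, as in the 32273_at example.

```r
# Hypothetical helper: all prefixes of one location, e.g. "5q33" -> 5, 5q, 5q3, 5q33
prefixes <- function(loc) {
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", loc)
  rest <- substring(loc, nchar(chr) + 1)
  if (!nzchar(rest)) return(chr)
  pre <- substring(rest, 1, seq_len(nchar(rest)))
  c(chr, paste0(chr, pre[!grepl("\\.$", pre)]))  # skip prefixes ending in "."
}

# Split on a literal "|" and pool the prefixes of every alternative
expand_or <- function(loc) {
  parts <- strsplit(loc, "|", fixed = TRUE)[[1]]
  unique(unlist(lapply(parts, prefixes)))
}

res <- expand_or("5q33|5q31.1")
print(res)
```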

The "-" character is assumed to mean BETWEEN. For example, for Affy id 1138_at in hgu95av2MAP the location is given as 2q11-q14, and this function assumes this means the location is somewhere between 2q11 and 2q14, so it will return the values 2, 2q, 2q1, 2q11, 2q12, 2q13, and 2q14 for this Affy id.
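For the simple case, where both endpoints have the same length and differ only in their last digit, the BETWEEN interpretation might be sketched like this. The helper names are hypothetical and the sketch does not attempt any of the harder cases.

```r
# Hypothetical helper: all prefixes of one location
prefixes <- function(loc) {
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", loc)
  rest <- substring(loc, nchar(chr) + 1)
  if (!nzchar(rest)) return(chr)
  pre <- substring(rest, 1, seq_len(nchar(rest)))
  c(chr, paste0(chr, pre[!grepl("\\.$", pre)]))
}

# Assumes a range like "2q11-q14": endpoints differ only in the last digit
expand_between <- function(loc) {
  ends <- strsplit(loc, "-", fixed = TRUE)[[1]]            # "2q11", "q14"
  stem <- sub("[0-9]$", "", ends[1])                        # "2q1"
  from <- as.integer(sub("^.*([0-9])$", "\\1", ends[1]))    # 1
  to   <- as.integer(sub("^.*([0-9])$", "\\1", ends[2]))    # 4
  bands <- paste0(stem, seq(from, to))                      # 2q11 ... 2q14
  unique(unlist(lapply(bands, prefixes)))
}

res <- expand_between("2q11-q14")
print(res)
```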

Now here is the first problem with this function. I do not know how to handle the "-" when the two strings are not of equal length. For example, for Affy id 36779_at in hgu95av2MAP the location is given as 5q33.3-q34, but I do not know how to treat this as BETWEEN because I do not know how many sub-bands there are between 5q33.3 and 5q34. Is there a 5q33.4 or 5q33.5, etc.? I'm not sure. So I treat this "-" as an "|". This function will return the values 5, 5q, 5q3, 5q33, 5q33.3, and 5q34 for this Affy id, and most likely that is incorrect.
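One way to decide whether a range is simple enough for the BETWEEN interpretation is sketched below. The predicate is_simple_range is hypothetical; it only encodes the two conditions discussed in this section (equal length, and agreement on everything except the last digit).

```r
# Hypothetical predicate: TRUE only when both endpoints have the same
# length and agree on every character except a final digit.
is_simple_range <- function(loc) {
  ends <- strsplit(loc, "-", fixed = TRUE)[[1]]
  if (length(ends) != 2) return(FALSE)
  chr <- sub("^([0-9]+|X|Y).*$", "\\1", ends[1])
  a <- substring(ends[1], nchar(chr) + 1)  # e.g. "q33.3"
  b <- ends[2]                             # e.g. "q34"
  n <- nchar(a)
  n == nchar(b) && n >= 2 &&
    grepl("[0-9]$", a) && grepl("[0-9]$", b) &&
    substring(a, 1, n - 1) == substring(b, 1, n - 1)
}

checks <- vapply(c("2q11-q14", "5q33.3-q34", "11q14-q21", "19cen-q13.1"),
                 is_simple_range, logical(1))
print(checks)
```

Only 2q11-q14 passes; the other three are the kinds of ranges that this section treats as OR instead.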

Another problem I have with the "-" occurs when the characters up until the last character do not all match. For example, for Affy id 38927_i_at in hgu95av2MAP the location is given as 11q14-q21, but again I'm not sure how to treat this as BETWEEN because I don't know the number of sub-bands between 11q14 and 11q21. Does 11q15 exist, etc.? So I again treat this "-" as an "|". This function will return the values 11, 11q, 11q1, 11q14, 11q2, and 11q21 for this Affy id, and this is probably incorrect.
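The fallback of treating "-" as "|" can be sketched as follows. The helper names are hypothetical; the sketch assumes the second endpoint omits the chromosome number, as in the two examples above.

```r
# Hypothetical helper: all prefixes of one location
prefixes <- function(loc) {
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", loc)
  rest <- substring(loc, nchar(chr) + 1)
  if (!nzchar(rest)) return(chr)
  pre <- substring(rest, 1, seq_len(nchar(rest)))
  c(chr, paste0(chr, pre[!grepl("\\.$", pre)]))  # skip prefixes ending in "."
}

# Treat "-" as OR: expand both endpoints and pool the results
expand_dash_as_or <- function(loc) {
  ends <- strsplit(loc, "-", fixed = TRUE)[[1]]
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", ends[1])
  ends[2] <- paste0(chr, ends[2])  # assumes endpoint 2 lacks the chromosome
  unique(unlist(lapply(ends, prefixes)))
}

res1 <- expand_dash_as_or("5q33.3-q34")
res2 <- expand_dash_as_or("11q14-q21")
print(res1)
print(res2)
```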

The problem with "-" also occurs when the location is something like 19cen-q13.1, as for Affy id 34670_at in hgu95av2MAP. Again I don't know the number of sub-bands between 19cen and 19q13.1, so I treat this BETWEEN as an OR.

Another problem I have with "cen" in the location is that sometimes the location looks like 19p13.2-cen and very rarely it looks like 5p13.1-5cen. In the second case, the chromosome number is included after the "-" and before the "cen". This only occurs with the location 5p13.1-5cen in both hgu95av2MAP and hgu133aMAP; all other locations do not include the chromosome number after the "-". Currently this function returns the wrong information for that one location. It will return the values 5, 5p, 5p1, 5p13, 5p13.1, 5p5, and 5p5cen, but it should return 5, 5p, 5p1, 5p13, 5p13.1, and 5cen, so this one location is an error. All other locations that include "cen" are correct. For example, this function returns the values 19, 19p, 19p1, 19p13, 19p13.2, and 19cen for the location 19p13.2-cen.
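A possible way to get the intended values for both forms is sketched below. It is not the annotate code and the helper names are hypothetical. The idea is to prepend the chromosome number to the second endpoint only when that endpoint starts with p, q, or cen, and to treat "cen" as a single token.

```r
# Hypothetical helper: prefixes of one location, with "cen" kept whole
prefixes <- function(loc) {
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", loc)
  rest <- substring(loc, nchar(chr) + 1)
  if (!nzchar(rest)) return(chr)
  if (rest == "cen") return(c(chr, loc))  # no "c" / "ce" prefixes
  pre <- substring(rest, 1, seq_len(nchar(rest)))
  c(chr, paste0(chr, pre[!grepl("\\.$", pre)]))
}

expand_cen_range <- function(loc) {
  ends <- strsplit(loc, "-", fixed = TRUE)[[1]]
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", ends[1])
  # Add the chromosome only if the endpoint does not already carry it
  if (grepl("^([pq]|cen)", ends[2])) ends[2] <- paste0(chr, ends[2])
  unique(unlist(lapply(ends, prefixes)))
}

res_a <- expand_cen_range("19p13.2-cen")
res_b <- expand_cen_range("5p13.1-5cen")
print(res_a)
print(res_b)
```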

This function is very slow because it contains for loops, so it would be useful to make it more efficient. Also, it would be nice at some point for someone with more knowledge of chromosome locations to figure out how to fix some of my string manipulation errors.
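One possible speed-up, offered only as a sketch, is to expand each distinct location string once and then map the results back to the identifiers, since many probes share the same location. The probe names and the prefixes helper are hypothetical.

```r
# Hypothetical helper: all prefixes of one location
prefixes <- function(loc) {
  chr  <- sub("^([0-9]+|X|Y).*$", "\\1", loc)
  rest <- substring(loc, nchar(chr) + 1)
  if (!nzchar(rest)) return(chr)
  pre <- substring(rest, 1, seq_len(nchar(rest)))
  c(chr, paste0(chr, pre[!grepl("\\.$", pre)]))
}

# Made-up probes; two of them share a location
locs <- c(probe_a = "2q11", probe_b = "5q33", probe_c = "2q11")

# Expand each distinct location once ...
u <- unique(locs)
expanded <- setNames(lapply(u, prefixes), u)

# ... then look the results up by location and restore the probe names
result <- setNames(expanded[locs], names(locs))
str(result)
```

Whether this helps in practice depends on how many locations are repeated in the MAP environment.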

createLLChrCats is a wrapper that converts probe IDs to Entrez Gene IDs.

createMAPIncMat is a wrapper that calls createLLChrCats and then returns an incidence matrix whose rows are the categories and whose columns are the Entrez Gene IDs.
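The shape of such an incidence matrix can be illustrated with a small sketch. The gene IDs and categories below are made up, and the code is not the createMAPIncMat implementation; it only builds a 0/1 matrix with categories as rows and IDs as columns from a named list.

```r
# Made-up list: names are gene IDs, values are chromosome categories
cats <- list("1000" = c("14", "14q"), "2000" = c("14", "14p"))

# All distinct categories become the rows
all_cats <- sort(unique(unlist(cats)))

# One column per gene ID: 1 if the gene belongs to the category, else 0
inc <- vapply(cats, function(x) as.integer(all_cats %in% x),
              integer(length(all_cats)))
rownames(inc) <- all_cats
print(inc)
```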


----------Value----------

A named list with an element for each Affy id. The name is the Affy id and the values are the locations for that Affy id. If the Affy id has a location of NA in the MAP environment, no list element is returned for that Affy id.
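Dropping identifiers whose location is NA can be sketched as follows, again with made-up probe names standing in for a MAP environment converted to a list.

```r
# Toy input: one probe has no known location
map_list <- list(probe_a = "14q22", probe_b = NA_character_)

# Keep only the elements that are not entirely NA
is_missing <- vapply(map_list, function(x) all(is.na(x)), logical(1))
kept <- map_list[!is_missing]
print(names(kept))
```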


----------Author(s)----------


Elizabeth Whalen



----------Examples----------


  library("hgu95av2.db")
  mapValues <- chrCats("hgu95av2")

When reposting, please credit the source: 生物统计家园网 (http://www.biostatistic.net).


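Notes:
Note 1: For the convenience of learners, the Chinese version of this document was machine-translated by the 生物统计家园网 robot LoveR. It is intended only as a personal reference for learning R, and 生物统计家园 retains the copyright.
Note 2: Because the translation was produced automatically, some inaccuracies are inevitable. Comparing the Chinese and English text carefully can help when learning R.
Note 3: If you find an inaccuracy, please reply to this thread and it will be revised over time.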