This post was last edited by genechip on 2011-12-18 18:23
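I have recently been studying data analysis methods for metabolomics, so I looked online for material on the subject. Most of the literature and resources I found were fairly shallow (perhaps my own ability is limited and I simply did not find what I was looking for). Most of them also cover only one part or one branch of metabolomics data processing.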
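A newcomer who first encounters metabolomics or chemometrics data tends to feel at a loss about where to start. Starting from zero, you need to go through the overall data-processing workflow from beginning to end and understand what each step is for. Based on my recent study, I am sharing some of the better materials here so that everyone can learn from them and discuss.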
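The book "Multivariate statistical analysis in chemometrics" describes in some detail the various methods for processing metabolomics and chemometrics data, and some of the methods come with corresponding R code, so readers can practice with the examples. Working through both the theory and the hands-on exercises gives a deeper understanding of how these statistical methods apply to metabolomics data.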
Below is the table of contents of "Multivariate statistical analysis in chemometrics":
International Standard Book Number-13: 978-1-4200-5947-2 (Hardcover)
Contents
Preface
Acknowledgments
Authors
Chapter 1 Introduction
1.1 Chemoinformatics–Chemometrics–Statistics
1.2 This Book
1.3 Historical Remarks about Chemometrics
1.4 Bibliography
1.5 Starting Examples
1.5.1 Univariate versus Bivariate Classification
1.5.2 Nitrogen Content of Cereals Computed from NIR Data
1.5.3 Elemental Composition of Archaeological Glasses
1.6 Univariate Statistics – A Reminder
1.6.1 Empirical Distributions
1.6.2 Theoretical Distributions
1.6.3 Central Value
1.6.4 Spread
1.6.5 Statistical Tests
References
Chapter 2 Multivariate Data
2.1 Definitions
2.2 Basic Preprocessing
2.2.1 Data Transformation
2.2.2 Centering and Scaling
2.2.3 Normalization
2.2.4 Transformations for Compositional Data
2.3 Covariance and Correlation
2.3.1 Overview
2.3.2 Estimating Covariance and Correlation
2.4 Distances and Similarities
2.5 Multivariate Outlier Identification
2.6 Linear Latent Variables
2.6.1 Overview
2.6.2 Projection and Mapping
2.6.3 Example
2.7 Summary
References
Chapter 3 Principal Component Analysis
3.1 Concepts
3.2 Number of PCA Components
3.3 Centering and Scaling
3.4 Outliers and Data Distribution
3.5 Robust PCA
3.6 Algorithms for PCA
3.6.1 Mathematics of PCA
3.6.2 Jacobi Rotation
3.6.3 Singular Value Decomposition
3.6.4 NIPALS
3.7 Evaluation and Diagnostics
3.7.1 Cross Validation for Determination of the Number of Principal Components
3.7.2 Explained Variance for Each Variable
3.7.3 Diagnostic Plots
3.8 Complementary Methods for Exploratory Data Analysis
3.8.1 Factor Analysis
3.8.2 Cluster Analysis and Dendrogram
3.8.3 Kohonen Mapping
3.8.4 Sammon's Nonlinear Mapping
3.8.5 Multiway PCA
3.9 Examples
3.9.1 Tissue Samples from Human Mummies and Fatty Acid Concentrations
3.9.2 Polycyclic Aromatic Hydrocarbons in Aerosol
3.10 Summary
References
Chapter 4 Calibration
4.1 Concepts
4.2 Performance of Regression Models
4.2.1 Overview
4.2.2 Overfitting and Underfitting
4.2.3 Performance Criteria
4.2.4 Criteria for Models with Different Numbers of Variables
4.2.5 Cross Validation
4.2.6 Bootstrap
4.3 Ordinary Least-Squares Regression
4.3.1 Simple OLS
4.3.2 Multiple OLS
4.3.2.1 Confidence Intervals and Statistical Tests in OLS
4.3.2.2 Hat Matrix and Full Cross Validation in OLS
4.3.3 Multivariate OLS
4.4 Robust Regression
4.4.1 Overview
4.4.2 Regression Diagnostics
4.4.3 Practical Hints
4.5 Variable Selection
4.5.1 Overview
4.5.2 Univariate and Bivariate Selection Methods
4.5.3 Stepwise Selection Methods
4.5.4 Best-Subset Regression
4.5.5 Variable Selection Based on PCA or PLS Models
4.5.6 Genetic Algorithms
4.5.7 Cluster Analysis of Variables
4.5.8 Example
4.6 Principal Component Regression
4.6.1 Overview
4.6.2 Number of PCA Components
4.7 Partial Least-Squares Regression
4.7.1 Overview
4.7.2 Mathematical Aspects
4.7.3 Kernel Algorithm for PLS
4.7.4 NIPALS Algorithm for PLS
4.7.5 SIMPLS Algorithm for PLS
4.7.6 Other Algorithms for PLS
4.7.7 Robust PLS
4.8 Related Methods
4.8.1 Canonical Correlation Analysis
4.8.2 Ridge and Lasso Regression
4.8.3 Nonlinear Regression
4.8.3.1 Basis Expansions
4.8.3.2 Kernel Methods
4.8.3.3 Regression Trees
4.8.3.4 Artificial Neural Networks
4.9 Examples
4.9.1 GC Retention Indices of Polycyclic Aromatic Compounds
4.9.1.1 Principal Component Regression
4.9.1.2 Partial Least-Squares Regression
4.9.1.3 Robust PLS
4.9.1.4 Ridge Regression
4.9.1.5 Lasso Regression
4.9.1.6 Stepwise Regression
4.9.1.7 Summary
4.9.2 Cereal Data
4.10 Summary
References
Chapter 5 Classification
5.1 Concepts
5.2 Linear Classification Methods
5.2.1 Linear Discriminant Analysis
5.2.1.1 Bayes Discriminant Analysis
5.2.1.2 Fisher Discriminant Analysis
5.2.1.3 Example
5.2.2 Linear Regression for Discriminant Analysis
5.2.2.1 Binary Classification
5.2.2.2 Multicategory Classification with OLS
5.2.2.3 Multicategory Classification with PLS
5.2.3 Logistic Regression
5.3 Kernel and Prototype Methods
5.3.1 SIMCA
5.3.2 Gaussian Mixture Models
5.3.3 k-NN Classification
5.4 Classification Trees
5.5 Artificial Neural Networks
5.6 Support Vector Machine
5.7 Evaluation
5.7.1 Principles and Misclassification Error
5.7.2 Predictive Ability
5.7.3 Confidence in Classification Answers
5.8 Examples
5.8.1 Origin of Glass Samples
5.8.1.1 Linear Discriminant Analysis
5.8.1.2 Logistic Regression
5.8.1.3 Gaussian Mixture Models
5.8.1.4 k-NN Methods
5.8.1.5 Classification Trees
5.8.1.6 Artificial Neural Networks
5.8.1.7 Support Vector Machines
5.8.1.8 Overall Comparison
5.8.2 Recognition of Chemical Substructures from Mass Spectra
5.9 Summary
References
Chapter 6 Cluster Analysis
6.1 Concepts
6.2 Distance and Similarity Measures
6.3 Partitioning Methods
6.4 Hierarchical Clustering Methods
6.5 Fuzzy Clustering
6.6 Model-Based Clustering
6.7 Cluster Validity and Clustering Tendency Measures
6.8 Examples
6.8.1 Chemotaxonomy of Plants
6.8.2 Glass Samples
6.9 Summary
References
Chapter 7 Preprocessing
7.1 Concepts
7.2 Smoothing and Differentiation
7.3 Multiplicative Signal Correction
7.4 Mass Spectral Features
7.4.1 Logarithmic Intensity Ratios
7.4.2 Averaged Intensities of Mass Intervals
7.4.3 Intensities Normalized to Local Intensity Sum
7.4.4 Modulo-14 Summation
7.4.5 Autocorrelation
7.4.6 Spectra Type
7.4.7 Example
References
Appendix 1 Symbols and Abbreviations
Appendix 2 Matrix Algebra
A.2.1 Definitions
A.2.2 Addition and Subtraction of Matrices
A.2.3 Multiplication of Vectors
A.2.4 Multiplication of Matrices
A.2.5 Matrix Inversion
A.2.6 Eigenvectors
A.2.7 Singular Value Decomposition
References
Appendix 3 Introduction to R
A.3.1 General Information on R
A.3.2 Installing R
A.3.3 Starting R
A.3.4 Working Directory
A.3.5 Loading and Saving Data
A.3.6 Important R Functions
A.3.7 Operators and Basic Functions
Mathematical and Logical Operators, Comparison
Special Elements
Mathematical Functions
Matrix Manipulation
Statistical Functions
A.3.8 Data Types
Missing Values
A.3.9 Data Structures
A.3.10 Selection and Extraction from Data Objects
Examples for Creating Vectors
Examples for Selecting Elements from a Vector or Factor
Examples for Selecting Elements from a Matrix, Array, or Data Frame
Examples for Selecting Elements from a List
A.3.11 Generating and Saving Graphics
Functions Relevant for Graphics
Relevant Plot Parameters
Statistical Graphics
Saving Graphic Output
References
59475_c001.pdf (538.38 KB, downloads: 97)
59475_c002.pdf (695.91 KB, downloads: 95)
59475_c003.pdf (1011.43 KB, downloads: 40, price: 5 forum credits)
59475_c004.pdf (2.82 MB, downloads: 33, price: 5 forum credits)
59475_c005.pdf (1.54 MB, downloads: 35, price: 1 forum credit)
59475_c006.pdf (937.23 KB, downloads: 30, price: 1 forum credit)
59475_c007.pdf (203.8 KB, downloads: 35, price: 1 forum credit)