The Ultimate Guide to Whole Genome Resequencing: Basic and Advanced Facts (V)
|28.4.2020||Posted by tactical33 under Advertising & Marketing|
2. Advanced data analysis
2.1 Analysis of population genetic diversity
Main indicators: population genetic diversity indices, calculated from the variant data.
Common analysis software: Arlequin, VCFtools, etc.
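As a rough illustration of what a diversity index measures, the sketch below computes two common ones on toy data: nucleotide diversity (pi, the average number of pairwise differences per site) and Nei's unbiased expected heterozygosity at one locus. The source does not say which indices are meant, so both choices are assumptions. The function names and the toy haplotypes are hypothetical, and real analyses would normally rely on tools such as Arlequin or VCFtools.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site (pi).

    `haplotypes` is a list of equal-length aligned sequences (strings).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    length = len(haplotypes[0])
    if length == 0 or any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be non-empty and of equal length")
    # Count mismatching sites over every unordered pair of haplotypes.
    total_diff = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diff / (n_pairs * length)


def expected_heterozygosity(allele_counts):
    """Nei's unbiased gene diversity at one locus from allele counts."""
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    homozygosity = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1 - homozygosity)


# Toy data (made up for illustration only).
toy_haplotypes = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
pi = nucleotide_diversity(toy_haplotypes)
he = expected_heterozygosity([6, 2])  # 6 copies of one allele, 2 of the other
print(f"pi = {pi:.4f}, He = {he:.4f}")
```

Higher values of either index indicate more variation within the sampled population.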
2.2 Population evolution research
- Principal component analysis (PCA)
- Phylogenetic analysis
- Genetic structure (STRUCTURE)
Whole-genome population evolution analysis resequences the whole genomes of different subpopulations, or of geographically distinct varieties, of the same species. Aligning the reads to the reference genome yields a large number of high-confidence SNPs, InDels, and other variants. These support population genetic analyses such as population structure, principal component analysis, linkage disequilibrium, and selective sweep detection, which in turn address questions such as the evolutionary mechanisms of the species, its environmental adaptability, and its population history at the molecular level.
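Of the analyses listed above, linkage disequilibrium is the simplest to show in a few lines. The sketch below computes the r² statistic between two biallelic SNPs from phased haplotypes coded 0/1. The function name and the toy data are hypothetical, and the use of phased haplotypes is an assumption; with unphased genotypes, dedicated tools estimate haplotype frequencies first.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs.

    `hap_a` and `hap_b` hold the allele (0 or 1) carried by each phased
    haplotype at SNP A and SNP B, in the same haplotype order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                # freq of allele 1 at A
    p_b = sum(hap_b) / n                                # freq of allele 1 at B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # freq of haplotype 1-1
    d = p_ab - p_a * p_b                                # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Toy haplotypes (made up for illustration only).
snp1 = [0, 0, 1, 1, 0, 1, 0, 1]
snp2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(ld_r2(snp1, snp1))  # a SNP is in complete LD with itself
print(ld_r2(snp1, snp2))  # weaker association between the two SNPs
```

Values near 1 mean the two SNPs are almost always inherited together; values near 0 mean they are close to independent.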
- Genetic Map Construction
2.3 Population genetic structure analysis
2.4 QTL mapping
QTL mapping generally requires detailed phenotypic records and a purpose-built mapping population. Natural populations can also be used, but the genetic background then has a greater influence and ideal results may not be obtained.
2.5 Whole genome association analysis (GWAS)
With the development of second-generation sequencing technology and the continuing fall in sequencing costs, genotyping from whole-genome variation data has become steadily easier, so the sample sizes and marker counts used in association analysis keep increasing. The time needed to solve the original MLM model can be expressed as mpn³, where m is the number of markers, p is the number of iterations of the solution process, and n is the number of samples. As the sample size increases, the computation time of each iteration grows with the cube of n, which makes the total computation time very long.
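To make the scaling concrete, the short sketch below treats the cost stated above as proportional to m times p times n cubed and compares two sample sizes. The function name and the example numbers are hypothetical, and constant factors are ignored, so only the ratio is meaningful.

```python
def mlm_relative_cost(m, p, n):
    """Relative cost of the original MLM solution: m * p * n^3.

    m: number of markers, p: iterations of the solver, n: number of samples.
    Constant factors are omitted, so the result has no physical unit.
    """
    return m * p * n ** 3


# Hypothetical study sizes, chosen only to show the effect of doubling n.
base = mlm_relative_cost(m=100_000, p=10, n=500)
doubled = mlm_relative_cost(m=100_000, p=10, n=1000)
ratio = doubled / base
print(f"doubling n multiplies the cost by {ratio:.0f}")
```

Doubling the number of markers only doubles the cost, whereas doubling the number of samples multiplies it by eight, which is why sample size dominates the running time.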
In GWAS, population structure and genetic background are the main sources of false positives. The central problem in developing analysis algorithms is how, with false positives under control, to make fuller use of the genetic markers while improving both computational efficiency and detection power. PLINK was one of the earlier GWAS packages to be released. It offers high throughput and speed, and it implements various non-parametric tests based on allele frequency, the general linear model (GLM), and logistic regression. It is widely used in case-control studies of complex human diseases and has greatly advanced GWAS.
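As an illustration of an allele-frequency-based test in a case-control design, the sketch below runs a basic 1-degree-of-freedom allelic chi-square test on a 2x2 table of allele counts for a single SNP. This is a minimal sketch and not PLINK's implementation: the function name and the counts are hypothetical, no continuity correction is applied, and population structure is not accounted for.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic chi-square test (1 df) for one biallelic SNP.

    Each argument is a pair (reference_allele_count, alternate_allele_count).
    Returns (chi2, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Shortcut formula for the Pearson chi-square of a 2x2 table.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 100 cases and 100 controls (200 alleles each).
chi2, p_value = allelic_chi_square((120, 80), (150, 50))
print(f"chi2 = {chi2:.3f}, p = {p_value:.4g}")
```

In a genome-wide scan this test would be repeated for every marker, and the resulting p-values would need a multiple-testing correction.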
Genomic data can be used to locate genes and functional mutations that affect phenotypic traits.
However, the cost is currently still relatively high, so when designing the experiment, collect as much phenotypic information as possible to make full use of the data.
Common analysis software and algorithms: PLINK, TASSEL 5.0, GAPIT, GenABEL (R library), EMMAX, SNPassoc (R package), GRAMMAR-Gamma, FaST-LMM, FaST-LMM-Select and BOLT-LMM.
2.6 Selective sweep analysis (selection pressure analysis)
Selective sweep analysis uses genomic DNA sequencing of a species to detect the genomic features, shaped by natural and artificial selection, that are related to the specific traits of that species.
Natural selection analysis: selection signal detection
Identifying positive selection: analyze positive selection trends in regions containing SNPs and SNVs, and interpret the function of those variants at the level of evolution and population genetics. For case and control samples, different statistical algorithms are used to calculate the SNPs and CNVs in each sample and then to find the SVs that show features of positive selection.
Autosomal signal detection and analysis
Current mainstream analyses generally consider only autosomal selection signals, exploring functional regions and mutations related to important economic traits, domestication, and adaptation.
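The text does not name a specific statistic for detecting selection signals. One widely used option when two populations are compared is the fixation index Fst computed in genomic windows, where windows with unusually high Fst are candidate regions. The sketch below assumes that approach, using a simple Wright-style Fst with equal population weights and a ratio of sums within non-overlapping windows. The function names, allele frequencies, and window size are all hypothetical.

```python
def fst_site_components(p1, p2):
    """Return (numerator, denominator) of Fst for one biallelic site.

    p1 and p2 are the alternate-allele frequencies in two populations,
    which are given equal weight (an assumption of this sketch).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-pop
    return h_total - h_within, h_total


def windowed_fst(positions, freqs_pop1, freqs_pop2, window_size):
    """Fst per non-overlapping window, keyed by window start coordinate."""
    sums = {}
    for pos, p1, p2 in zip(positions, freqs_pop1, freqs_pop2):
        start = (pos // window_size) * window_size
        num, den = fst_site_components(p1, p2)
        acc = sums.setdefault(start, [0.0, 0.0])
        acc[0] += num
        acc[1] += den
    # Ratio of sums; windows with no polymorphic site are skipped.
    return {s: num / den for s, (num, den) in sorted(sums.items()) if den > 0}


# Toy data: the second window holds strongly differentiated sites.
positions = [100, 400, 1200, 1700]
pop1 = [0.5, 0.4, 0.9, 1.0]
pop2 = [0.5, 0.4, 0.1, 0.0]
fst = windowed_fst(positions, pop1, pop2, window_size=1000)
for start, value in fst.items():
    print(f"window starting at {start}: Fst = {value:.2f}")
```

In practice the candidate windows are usually taken from the upper tail of the genome-wide distribution, often in combination with other statistics such as nucleotide diversity.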
Analysis of sex chromosome selection signals
One study found that 19% to 26% of the reduction in genome polymorphism was caused by selection on autosomes, and 12% to 40% was attributed to selection on sex chromosomes (McVicker et al. 2009). It is therefore necessary to detect and analyze selection signals on the X chromosomes of different species in order to reveal genetic mechanisms and their correlation with important traits. Studies on adaptation, economic traits, and sexual antagonism have been conducted on horses, pigs, sheep, and humans (Heyer & Segurel 2010; Ma Yunlong et al. 2012; Zhu et al. 2015; Liu Xuexue et al. 2015; Lucotte et al. 2016; Liu et al. 2018).
Where the reference genome has a reasonably complete sex chromosome assembly, analyzing the sex chromosomes makes fuller use of the information contained in the genome data. It is a worthwhile research topic in its own right and can form the basis of a research paper.
2.7 Prediction of mutation function
Once candidate genes related to particular traits or phenotypes have been obtained through selective sweep analysis, GWAS, QTL-seq, and other methods, the following software can be used to predict the changes in gene function caused by mutations and so provide data support for subsequent functional validation (Zhang Liang & Su Zhixi 2016).
PolyPhen-2: predicts the magnitude of a mutation's functional effect
To be continued in Part VI…