- Featured analyses
- Genome analysis
- Transcriptome analysis
- Genome mapping
- AI breeding model construction
- Recent publications
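Featured analyses – Aimed mainly at research teams in hospitals and universities. Services cover the whole research process or any single stage of it: experimental design, sample collection, standard sequencing analysis, customized analysis, and interpretation of results.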
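Methylation sequencing analysis
Methylation sites are located at single-base resolution across the whole genome. Analyses include differential methylation analysis, GO/KEGG analysis of the associated genes, and PCA of methylation variation patterns across multiple samples.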
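16S rDNA sequencing analysis
High-throughput sequencing of the 16S rDNA variable regions of a microbial community. Analyses include species annotation, species composition analysis, differential species analysis, association analysis with environmental factors, evolutionary analysis, and functional prediction of the microbiota.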
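HLA typing
Capture sequencing of the HLA region. Analyses include de novo assembly of HLA genes, HLA type analysis and annotation, and significance analysis based on statistical tests.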
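Pedigree linkage analysis
Disease-associated variants are screened according to the characteristics of dominant and recessive inheritance patterns. Variants associated with familial genetic diseases are screened using the gene transmission relationships among individuals of different generations.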
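The inheritance-pattern screen described above can be sketched as follows. This is a minimal illustration, not the service's actual pipeline: it assumes full penetrance, genotypes coded as alternate-allele counts, and a hypothetical trio with made-up variant names.

```python
from typing import Dict, List

# Genotypes are coded as the number of alternate alleles: 0, 1, or 2.
Genotypes = Dict[str, int]


def fits_recessive(gt: Genotypes, affected: List[str], unaffected: List[str]) -> bool:
    """Affected individuals are homozygous for the allele; unaffected ones are not."""
    return all(gt[s] == 2 for s in affected) and all(gt[s] < 2 for s in unaffected)


def fits_dominant(gt: Genotypes, affected: List[str], unaffected: List[str]) -> bool:
    """Affected individuals carry the allele; unaffected ones do not."""
    return all(gt[s] >= 1 for s in affected) and all(gt[s] == 0 for s in unaffected)


# Hypothetical trio: both parents unaffected, child affected.
variants = {
    "chr1:1000A>G": {"father": 1, "mother": 1, "child": 2},
    "chr2:2000C>T": {"father": 0, "mother": 1, "child": 1},
    "chr3:3000G>A": {"father": 0, "mother": 0, "child": 1},
}
affected, unaffected = ["child"], ["father", "mother"]

recessive_hits = [v for v, gt in variants.items() if fits_recessive(gt, affected, unaffected)]
dominant_hits = [v for v, gt in variants.items() if fits_dominant(gt, affected, unaffected)]
print("recessive candidates:", recessive_hits)
print("dominant candidates:", dominant_hits)
```

A real analysis would also need to handle incomplete penetrance, missing genotypes, and compound heterozygosity, which this sketch ignores.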
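Genome analysis – Sequencing results are aligned to a reference genome to detect single-nucleotide polymorphisms, insertions and deletions, copy number variations, and structural variations in the sample genome. For species without a reference genome, a reference genome is constructed.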
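Whole-genome resequencing analysis
Whole-genome sequencing of different individuals of a species that has a reference sequence, using the latest reference genome. Reads are aligned to the reference sequence, sequencing depth and coverage are calculated, and SNPs, InDels, CNVs, and SVs are detected and annotated.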
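To make the depth and coverage statistics concrete, here is a small sketch. It assumes per-base depths have already been extracted from the alignment (for example with a tool such as `samtools depth`); the function name and the toy values are illustrative only.

```python
from typing import Sequence, Tuple


def depth_and_coverage(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions covered at >= min_depth)."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


# Toy per-base depths for a five-base region.
toy_depths = [0, 10, 20, 30, 0]
print(depth_and_coverage(toy_depths))                # coverage at >= 1x
print(depth_and_coverage(toy_depths, min_depth=20))  # coverage at >= 20x
```

Reporting coverage at several thresholds (1x, 10x, 20x) is a common way to show how much of the target is usable for variant calling.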
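Whole-exome sequencing analysis
Exonic DNA from across the genome is enriched by sequence capture or targeted techniques and then sequenced at high throughput. Reads are aligned to the reference sequence, sequencing depth and coverage are calculated, and SNPs, InDels, CNVs, and SVs are detected and annotated.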
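Reduced-representation genome sequencing and QTL mapping analysis
Genomic DNA is cut with restriction endonucleases, and specific fragments are sequenced at high throughput to obtain a large number of genetic polymorphism tag sequences that represent the whole genome of the target species. Reads are aligned to the reference sequence, sequencing depth and coverage are calculated, SNPs, InDels, CNVs, and SVs are detected and annotated, and QTL mapping is performed.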
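Whole-genome survey
K-mer analysis of low-depth sequencing data from small-fragment libraries is used to estimate genome size, GC content, heterozygosity, and repeat content. This gives an overall picture of the genomic characteristics of a species and informs the strategy for subsequent whole-genome de novo sequencing and assembly.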
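The genome-size part of a k-mer survey rests on a simple relation: genome size is roughly the total number of k-mers divided by the depth of the main peak in the k-mer histogram. The sketch below illustrates that relation only. The histogram, the `error_cutoff` threshold, and the function name are assumptions for illustration; dedicated tools fit a statistical model and also estimate heterozygosity and repeat content.

```python
from typing import Dict, Tuple


def estimate_genome_size(hist: Dict[int, int], error_cutoff: int = 5) -> Tuple[int, float]:
    """Estimate genome size from a k-mer histogram.

    hist maps k-mer depth -> number of distinct k-mers seen at that depth.
    K-mers below error_cutoff are treated as sequencing errors and ignored.
    """
    kept = {depth: count for depth, count in hist.items() if depth >= error_cutoff}
    if not kept:
        raise ValueError("no k-mers left after removing the error peak")
    peak_depth = max(kept, key=kept.get)                 # depth of the main peak
    total_kmers = sum(depth * count for depth, count in kept.items())
    return peak_depth, total_kmers / peak_depth


# Toy histogram: a low-depth error peak plus a main peak around 20x.
toy_hist = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
peak, size = estimate_genome_size(toy_hist)
print(f"peak depth {peak}, estimated genome size {size:.0f} bp")
```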
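De novo assembly
For a species whose genome sequence is unknown and for which no genome information from a closely related species is available, genomic DNA fragments of different lengths and their libraries are sequenced, then assembled and annotated to obtain a complete genome sequence map of the species.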
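Genome-wide association study (GWAS)
High-throughput sequencing is used to find variant sites on the chromosomes and to study their association with disease or other traits. In a GWAS, every individual in a genetically diverse population is resequenced at the whole-genome level. Combined with phenotype data for the target trait, statistical association testing then quickly identifies the chromosomal regions or loci that influence phenotypic variation in that trait. The large number of markers provided by NGS data supports a comprehensive analysis of the relationship between traits and genes and precise mapping of the target trait.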
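At its simplest, the association step tests each marker against the phenotype one at a time. The sketch below regresses a quantitative trait on genotype dosage for each SNP and ranks markers by the variance explained. It is an illustration under simplifying assumptions, not the statistical method used by the service: real studies typically use mixed models with covariates and population-structure correction, and apply multiple-testing thresholds. All SNP names and values are made up.

```python
from typing import List, Sequence, Tuple


def marker_effect(genotypes: Sequence[float], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of phenotype on allele dosage.

    Returns (slope, r_squared); a monomorphic marker returns (0.0, 0.0).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


# Toy data: six individuals, three hypothetical SNPs coded as 0/1/2 dosages.
phenotypes = [10.0, 12.0, 14.0, 10.0, 12.0, 14.0]
snps = {
    "snp_a": [0, 1, 2, 0, 1, 2],  # tracks the phenotype exactly
    "snp_b": [2, 1, 0, 0, 1, 2],  # unrelated to the phenotype
    "snp_c": [1, 1, 1, 1, 1, 1],  # monomorphic, carries no information
}

results: List[Tuple[str, float, float]] = [
    (name, *marker_effect(dosages, phenotypes)) for name, dosages in snps.items()
]
for name, slope, r2 in sorted(results, key=lambda item: item[2], reverse=True):
    print(f"{name}: slope={slope:.2f}, r2={r2:.2f}")
```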
Transcriptome analysis – Sequencing results are aligned to the reference genome to quantify expression at the transcript and gene level and to predict novel transcripts and gene fusions.
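Transcriptome sequencing analysis
Sequencing of all mRNA transcribed by a specific tissue or cell type in a specific state. Analyses include differential gene expression analysis and GO and KEGG pathway enrichment, which identify the functions or pathways in which the differentially expressed genes participate.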
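GO and KEGG enrichment is commonly assessed with a hypergeometric (one-sided Fisher) test: given a list of differentially expressed genes, how surprising is the number that fall in a given term? The sketch below shows that calculation with the standard library. The function and parameter names are illustrative, and a real analysis would also correct for testing many terms.

```python
from math import comb


def enrichment_p_value(hits: int, list_size: int, term_size: int, background: int) -> float:
    """P(X >= hits) when list_size genes are drawn from a background of
    `background` genes, of which term_size are annotated to the term."""
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 in the term, 5 differentially expressed,
# and all 5 differentially expressed genes belong to the term.
print(enrichment_p_value(hits=5, list_size=5, term_size=5, background=10))
```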
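lncRNA sequencing analysis
lncRNAs are sequenced with high-throughput technology and analyzed for their relationship to specific biological processes. Analyses include lncRNA assembly, lncRNA locus screening, coding-potential analysis, target gene prediction, conservation analysis, and differential expression analysis.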
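Small RNA sequencing analysis
High-throughput sequencing of the miRNA, siRNA, piRNA, and other small RNAs in a sample. Analyses include small RNA classification and annotation, base editing analysis, differential expression analysis, target gene analysis, and GO and KEGG enrichment of the target genes of differentially expressed sRNAs.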
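Whole-transcriptome sequencing analysis
Covers the full set of transcripts produced by a specific tissue or cell type in a specific state, including mRNA and all non-coding RNAs. A small RNA library and an rRNA-depleted, strand-specific library are constructed to analyze the expression of mRNAs and non-coding RNAs and the regulatory relationships between them.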
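Exosome sequencing analysis
Exosomal RNA is sequenced to characterize the RNA content of exosomes. Analyses include differential expression analysis, PCA across sample groups, clustering of differential expression patterns, and target gene prediction.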
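Genome mapping – Reference sequence maps are constructed for rare and valuable animals and plants, adding to the data available in reference sequence databases for large genomes.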
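Genome survey
Second-generation sequencing results are used to estimate the genome size, heterozygosity, and proportion of repetitive sequence of the sample.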
Contig construction
Third-generation sequencing reads are first assembled into contigs, which are then corrected with second-generation sequencing data.
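Contig assemblies are usually summarized by their N50: the length such that contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the calculation, with toy contig lengths:

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


# Toy assembly: five contigs totalling 20 bases.
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 after correction, scaffolding, or Hi-C anchoring indicates a more contiguous assembly, although it says nothing about correctness.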
Scaffold construction
Contigs are assembled into scaffolds using Bionano results.
Hi-C-assisted assembly
Scaffolds are assembled into a more complete genome using Hi-C sequencing results.
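Genome annotation
The function of each region of the sample genome is annotated using the genomes of closely related species together with transcriptome sequencing data from the sample.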
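AI breeding model construction
Machine learning prediction models are built at the whole-genome level to support intelligent, efficient, and targeted breeding of new varieties.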
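Case study
Genomic selection based on deep convolutional neural networks
Together with a university agricultural team, 明領基因 analyzed three traits in a company's Large White pigs: age at 100 kg, backfat thickness at 100 kg, and sow teat number. Using genotype data from a porcine 50K SNP chip and starting from an additive model, the team built an ensemble model combining deep learning and BLUP, and developed a semi-supervised, self-learning deep convolutional neural network algorithm. The resulting genotype-phenotype prediction model and genotype-phenotype model are used in a molecular breeding production system for breeding pigs. The approach reduced breeding time and cost and improved prediction accuracy.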
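The case study's own algorithm is not described in enough detail to reproduce. As a rough illustration of the additive, BLUP-style component only, the sketch below fits marker effects by ridge regression, which is the form SNP-BLUP takes when the penalty equals the ratio of residual variance to marker-effect variance. It is not the convolutional network or the ensemble: the genotypes, phenotypes, and penalty value `lam` are made up, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_marker_effects(z: Matrix, y: List[float], lam: float) -> Tuple[List[float], float]:
    """Fit additive marker effects: solve (Z'Z + lam * I) b = Z'y on centred data."""
    n, p = len(z), len(z[0])
    col_means = [sum(row[j] for row in z) / n for j in range(p)]
    mean_y = sum(y) / n
    zc = [[row[j] - col_means[j] for j in range(p)] for row in z]
    yc = [value - mean_y for value in y]
    lhs = [
        [sum(r[i] * r[j] for r in zc) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(r[i] * value for r, value in zip(zc, yc)) for i in range(p)]
    effects = solve_linear(lhs, rhs)
    intercept = mean_y - sum(e * mu for e, mu in zip(effects, col_means))
    return effects, intercept


def predict(genotype: List[float], effects: List[float], intercept: float) -> float:
    """Predicted phenotype for one animal from its allele dosages."""
    return intercept + sum(e * g for e, g in zip(effects, genotype))


# Toy data: four animals, two SNPs (0/1/2 dosages); phenotype = 100 + 2*snp1 - 1*snp2.
genotypes = [[0, 0], [1, 0], [0, 1], [2, 1]]
phenotypes = [100.0, 102.0, 99.0, 103.0]

effects, intercept = ridge_marker_effects(genotypes, phenotypes, lam=0.0)
print("unpenalized effects:", effects)
shrunk, _ = ridge_marker_effects(genotypes, phenotypes, lam=10.0)
print("shrunken effects (lam=10):", shrunk)
print("prediction for a new animal:", predict([2, 0], effects, intercept))
```

With 50K markers and far fewer animals, the penalty is what makes the system solvable, and production tools use dedicated linear algebra libraries rather than hand-written elimination.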
Recent publications
Chromosome-level genome assembly of goose provides insight into the adaptation and growth of local goose breeds
Background: Anatidae contains numerous waterfowl species with great economic value, but the genetic diversity basis remains insufficiently investigated. Here, we report a chromosome-level genome assembly of Lion-head goose (Anser cygnoides), a native breed in South China, through the combination of PacBio, Bionano, and Hi-C technologies.
Findings: The assembly had a total genome size of 1.19 Gb, consisting of 1,859 contigs with an N50 length of 20.59 Mb, generating 40 pseudochromosomes, representing 97.27% of the assembled genome, and identifying 21,208 protein-coding genes. Comparative genomic analysis revealed that geese and ducks diverged approximately 28.42 million years ago, and geese have undergone massive gene family expansion and contraction. To identify genetic markers associated with body weight in different geese breeds, including Wuzong goose, Huangzong goose, Magang goose, and Lion-head goose, a genome-wide association study was performed, yielding an average of 1,520.6 Mb of raw data that detected 44,858 single-nucleotide polymorphisms (SNPs). Genome-wide association study showed that 6 SNPs were significantly associated with body weight and 25 were potentially associated. The significantly associated SNPs were annotated as LDLRAD4, GPR180, and OR, enriching in growth factor receptor regulation pathways.
Conclusions: We present the first chromosome-level assembly of the Lion-head goose genome, which will expand the genomic resources of the Anatidae family, providing a basis for adaptation and evolution. Candidate genes significantly associated with different goose breeds may serve to understand the underlying mechanisms of weight differences.
Molecular Phylogenesis and Spatiotemporal Spread of SARS-CoV-2 in Southeast Asia
Background: The ongoing coronavirus disease 2019 (COVID-19) pandemic has posed an unprecedented challenge to public health in Southeast Asia, a tropical region with limited resources. This study aimed to investigate the evolutionary dynamics and spatiotemporal patterns of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the region.
Materials and Methods: A total of 1491 complete SARS-CoV-2 genome sequences from 10 Southeast Asian countries were downloaded from the Global Initiative on Sharing Avian Influenza Data (GISAID) database on November 17, 2020. The evolutionary relationships were assessed using maximum likelihood (ML) and time-scaled Bayesian phylogenetic analyses, and the phylogenetic clustering was tested using principal component analysis (PCA). The spatial patterns of SARS-CoV-2 spread within Southeast Asia were inferred using the Bayesian stochastic search variable selection (BSSVS) model. The effective population size (Ne) trajectory was inferred using the Bayesian Skygrid model.
Results: Four major clades (including one potentially endemic) were identified based on the maximum clade credibility (MCC) tree. Similar clustering was yielded by PCA; the first three PCs explained 46.9% of the total genomic variations among the samples. The time to the most recent common ancestor (tMRCA) and the evolutionary rate of SARS-CoV-2 circulating in Southeast Asia were estimated to be November 28, 2019 (September 7, 2019 to January 4, 2020) and 1.446 × 10⁻³ (1.292 × 10⁻³ to 1.613 × 10⁻³) substitutions per site per year, respectively. Singapore and Thailand were the two most probable root positions, with posterior probabilities of 0.549 and 0.413, respectively. There were high-support transmission links (Bayes factors exceeding 1,000) in Singapore, Malaysia, and Indonesia; Malaysia involved the highest number (7) of inferred transmission links within the region. A twice-accelerated viral population expansion, followed by a temporary setback, was inferred during the early stages of the pandemic in Southeast Asia.
Conclusions: With available genomic data, we illustrate the phylogeography and phylodynamics of SARS-CoV-2 circulating in Southeast Asia. Continuous genomic surveillance and enhanced strategic collaboration should be listed as priorities to curb the pandemic, especially for regional communities dominated by developing countries.
1H NMR and UHPLC/Q-Orbitrap-MS-Based Metabolomics Combined with 16S rRNA Gut Microbiota Analysis Revealed the Potential Regulation Mechanism of Nuciferine in Hyperuricemia Rats
Hyperuricemia seriously jeopardizes human health by increasing the risk of several diseases, such as gout and stroke. Nuciferine is able to alleviate hyperuricemia significantly. However, the underlying metabolic regulation mechanism remains unknown. To understand the metabolic effects of nuciferine on hyperuricemia by establishing a rat model of rapid hyperuricemia, 1H NMR and liquid chromatography-mass spectrometry were used to conduct nontargeted metabolomics studies. A total of 21 metabolites were authenticated in plasma and urine to be closely related with hyperuricemia, which were mainly correlated to the six metabolic pathways. Moreover, 16S rRNA analysis indicated that diversified intestinal microorganisms are closely related to changes in differential metabolites, especially bacteria from Firmicutes and Bacteroidetes. We propose that indoxyl sulfate and N-acetylglutamate in urine may be the potential biomarkers besides uric acid for early diagnosis and prevention of hyperuricemia. Gut microbiological analysis found that changes in the gut microbiota are closely related to these metabolites.
A database for risk assessment and comparative genomic analysis of foodborne Vibrio parahaemolyticus in China
Vibrio parahaemolyticus is a major foodborne pathogen worldwide. The increasing number of cases of V. parahaemolyticus infections in China indicates an urgent need to evaluate the prevalence and genetic diversity of this pathogenic bacterium. In this paper, we introduce the Foodborne Vibrio parahaemolyticus genome database (FVPGD), the first scientific database of foodborne V. parahaemolyticus distribution and genomic data in China, based on our previous investigations of V. parahaemolyticus contamination in different kinds of food samples across China from 2011 to 2016. The dataset includes records of 2,499 food samples and 643 V. parahaemolyticus strains from supermarkets and marketplaces distributed over 39 cities in China; 268 whole-genome sequences have been deposited in this database. A spatial view on the risk situations of V. parahaemolyticus contamination in different food types is provided. Additionally, the database provides a functional interface of sequence BLAST, core genome multilocus sequence typing, and phylogenetic analysis. The database will become a powerful tool for risk assessment and outbreak investigations of foodborne pathogens in China.
Different Exosomal microRNA Profile in Aquaporin-4 Antibody Positive Neuromyelitis Optica Spectrum Disorders
Neuromyelitis optica spectrum disorders (NMOSD) and multiple sclerosis (MS) are inflammatory demyelinating diseases of the central nervous system. Exosomal microRNAs (miRNAs) are emerging biomarkers for demyelinating diseases. In this study, 52 aquaporin-4 antibody serum-positive NMOSD patients, 18 relapsing-remitting multiple sclerosis (RRMS) patients and 17 healthy controls (HCs) were included for the next-generation sequencing (NGS). To validate the NGS results, the valuable miRNAs were selected for validation by real-time quantitative polymerase chain reaction in another cohort of patients, comprising 31 NMOSD patients and 14 HCs. In addition, these miRNAs were also validated in a longitudinal study. NGS data revealed the exosomal miRNAs profile in NMOSD patients was different from HCs. Among those potential exosomal miRNAs which can distinguish NMOSD status, hsa-miR-122-3p and hsa-miR-200a-5p were the most abundant miRNAs. In addition, hsa-miR-122-3p and hsa-miR-200a-5p were significantly upregulated in the serum exosome of relapsing NMOSD compared with that in remitting NMOSD. Hsa-miR-122-3p and hsa-miR-200a-5p had positive correlations with disease severity in NMOSD patients. Kyoto Encyclopedia of Genes and Genomes pathway analysis revealed that the MAPK, Wnt and Ras signaling pathways were enriched. Further biological function analysis demonstrated that these two miRNAs might be involved in the immunoregulation of NMOSD pathogenesis. Our results indicated that miRNAs delivered by exosomes could be applied as potential biomarkers for NMOSD.
Myelin oligodendrocyte glycoprotein-associated disorders are associated with HLA subtypes in a Chinese paediatric-onset cohort
Objective: Myelin oligodendrocyte glycoprotein-associated disorders (MOGADs) are a rare new neurological autoimmune disease with unclear pathogenesis. Since a linkage of the disease to the human leucocyte antigen (HLA) has not been shown, we here investigated whether MOGAD is associated with the HLA locus.
Methods: HLA genotypes of 95 patients with MOGADs, assessed between 2016 and 2018 from three academic centres, were compared with 481 healthy Chinese Han individuals. Patients with MOGADs included 51 paediatric-onset and 44 adult-onset cases. All patients were seropositive for IgG targeting the myelin oligodendrocyte glycoprotein (MOG).
Results: Paediatric-onset MOGAD was associated with the DQB1*05:02–DRB1*16:02 alleles (OR=2.43; OR=3.28) or haplotype (OR=2.84) of HLA class II genes. The prevalence of these genotypes in patients with paediatric-onset MOGAD was significantly higher than healthy controls (padj=0.0154; padj=0.0221; padj=0.0331). By contrast, adult-onset MOGAD was not associated with any HLA genotype. Clinically, patients with the DQB1*05:02–DRB1*16:02 haplotype exhibited significantly higher expanded disability status scale scores at onset (p=0.004) and were more likely to undergo a disease relapse (p=0.030). HLA–peptide binding prediction algorithms and computational docking analysis provided supporting evidence for the close relationship between the MOG peptide subunit and DQB1*05:02 allele. In vitro results indicated that site-specific mutations of the predicted target sequence reduced the antigen–antibody binding, especially in the paediatric-onset group with DQB1*05:02 allele.
Conclusions: This study demonstrates a possible association between specific HLA class II alleles and paediatric-onset MOGAD, providing evidence for the conjecture that different aetiology and pathogenesis likely underlie paediatric-onset and adult-onset cases of MOGAD.
Whole-exome sequencing reveals the major genetic factors contributing to neuromyelitis optica spectrum disorder in Chinese patients with aquaporin 4-IgG seropositivity
Background and objective: Neuromyelitis optica spectrum disorder (NMOSD) is an autoimmune disease. Although genetic factors are involved in its pathogenesis, limited evidence is available in this area. The aim of the present study was to identify the major genetic factors contributing to NMOSD in Chinese patients with aquaporin 4 (AQP4)-IgG seropositivity.
Methods: Whole-exome sequencing (WES) was performed on 228 Chinese NMOSD patients seropositive for AQP4-IgG and 1400 healthy controls in Guangzhou, South China. Human leukocyte antigen (HLA) sequencing was also utilized. Genotype model and haplotype, gene burden, and enrichment analyses were conducted.
Results: A significant region of the HLA composition is on chromosome 6, and great variation was observed in DQB1, DQA2 and DQA1. HLA sequencing confirmed that the most significant allele was HLA-DQB1*05:02 (p < 0.01, odds ratio [OR] 3.73). The genotype model analysis revealed that HLA-DQB1*05:02 was significantly associated with NMOSD in the additive effect model and dominant effect model (p < 0.05). The proportion of haplotype “HLA-DQB1*05:02-DRB1*15:01” was significantly greater in the NMOSD patients than the controls, at 8.42% and 1.23%, respectively (p < 0.001, OR 7.39). The gene burden analysis demonstrated that loss-of-function mutations in NOP16 were more common in the NMOSD patients (11.84%) than the controls (5.71%; p < 0.001, OR 2.22). The IgG1-G390R variant was significantly more common in NMOSD, and the rate of the T allele was 0.605 in patients and 0.345 in the controls (p < 0.01, OR 2.92). The enrichment analysis indicated that most of the genetic factors were mainly correlated with nervous and immune processes.
Conclusions: Human leukocyte antigen is highly correlated with NMOSD. NOP16 and IgG1-G390R play important roles in disease susceptibility.
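The odds ratios quoted in the Results above can be approximately recovered from the reported proportions, since an odds ratio is the odds in cases divided by the odds in controls. The sketch below uses only the percentages given in the abstract. The published values were presumably computed from the underlying counts (and possibly with adjustment), so small differences from this back-of-the-envelope calculation are expected.

```python
def odds_ratio(p_case: float, p_control: float) -> float:
    """Odds ratio from the proportion carrying a factor in cases and in controls."""
    if not (0 < p_case < 1 and 0 < p_control < 1):
        raise ValueError("proportions must be strictly between 0 and 1")
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


# Proportions as reported in the abstract (patients, controls).
reported = {
    "DQB1*05:02-DRB1*15:01 haplotype": (0.0842, 0.0123),
    "NOP16 loss-of-function": (0.1184, 0.0571),
    "IgG1-G390R T allele": (0.605, 0.345),
}
for label, (p_case, p_control) in reported.items():
    print(f"{label}: OR ~ {odds_ratio(p_case, p_control):.2f}")
```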