- Featured analyses
- Genome analysis
- Transcriptome analysis
- Genome mapping
- AI breeding model construction
- Recent publications
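Featured analyses – Aimed mainly at research teams in hospitals and universities. Services cover the whole research process or any single stage of it, from experimental design, sample collection, standard sequencing analysis, and customized analysis through to interpretation of results.
Methylation sequencing analysis
Locates methylation sites at single-base resolution across the whole genome. Analyses include differential methylation analysis, GO/KEGG analysis of the associated genes, and PCA of methylation patterns across multiple samples.
16S rDNA sequencing analysis
High-throughput sequencing of the variable regions of 16S rDNA from microbial communities. Analyses include taxonomic annotation, species composition analysis, differential abundance analysis, association analysis with environmental factors, phylogenetic analysis, and functional prediction of the microbiota.
HLA typing
Capture sequencing of the HLA region. Analyses include de novo assembly of HLA genes, HLA type calling and annotation, and significance analysis based on statistical tests.
Pedigree linkage analysis
Screens for disease-associated variants according to the characteristics of dominant and recessive inheritance patterns. Variants associated with familial hereditary diseases are identified from the gene transmission relationships between individuals of different generations.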
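Genome analysis – Sequencing reads are aligned to a reference genome to detect single-nucleotide polymorphisms, insertions and deletions, copy number variants, and structural variants in the sample genome. For species without a reference genome, a reference genome is constructed.
Whole-genome resequencing analysis
Whole-genome sequencing of different individuals of a species that has a reference sequence, using the latest reference genome. Reads are aligned to the reference, sequencing depth and coverage are summarized, and SNPs, InDels, CNVs, and SVs are detected and annotated.
Exome sequencing analysis
Exonic DNA from across the genome is enriched by sequence capture or targeted techniques and then sequenced at high throughput. Reads are aligned to the reference, sequencing depth and coverage are summarized, and SNPs, InDels, CNVs, and SVs are detected and annotated.
Reduced-representation genome sequencing and QTL mapping analysis
Genomic DNA is digested with restriction endonucleases and selected fragments are sequenced at high throughput, yielding a large number of polymorphic tag sequences that represent the whole genome of the target species. Reads are aligned to the reference, sequencing depth and coverage are summarized, and SNPs, InDels, CNVs, and SVs are detected and annotated. QTL mapping is then performed.
Whole-genome survey
Using low-depth sequencing data from short-insert libraries, K-mer analysis estimates genome size, GC content, heterozygosity, and repeat content. This gives an overview of the genomic characteristics of a species and a basis for planning the subsequent whole-genome de novo sequencing and assembly strategy.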
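The K-mer step of a genome survey can be sketched as follows. This is a minimal illustration, not the pipeline used for the service: it assumes the common approximation that genome size is roughly the total number of k-mers divided by the depth at the main peak of the k-mer histogram. The function names and the `min_depth` error cutoff are hypothetical choices made for this example; real surveys run dedicated k-mer counters on much larger data.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    k-mers seen fewer than min_depth times are treated as likely
    sequencing errors and ignored; this cutoff is an assumption.
    Returns (estimated_size, peak_depth).
    """
    # Histogram: depth -> number of distinct k-mers observed at that depth
    histogram = Counter(c for c in kmer_counts.values() if c >= min_depth)
    if not histogram:
        raise ValueError("no k-mers at or above min_depth")
    # The peak is the depth shared by the most distinct k-mers
    peak_depth = max(histogram, key=lambda depth: (histogram[depth], depth))
    total_kmers = sum(depth * n for depth, n in histogram.items())
    return total_kmers / peak_depth, peak_depth
```

On a toy input where the same 18-base sequence is read five times with k = 5, every k-mer occurs five times, so the peak depth is 5 and the estimate equals the number of distinct k-mers.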
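De novo assembly
For a species whose genome sequence is unknown, or for which no genome of a closely related species is available, libraries of genomic DNA fragments of different lengths are sequenced, then assembled and annotated to obtain a complete genome sequence map of the species.
Genome-wide association study (GWAS)
Variant sites on the chromosomes are identified by high-throughput sequencing and tested for association with disease or other traits. In a GWAS, every individual in a genetically diverse population is resequenced genome-wide. Combining these genotypes with phenotype data for the target trait and applying statistical association tests identifies the chromosomal regions or loci that influence phenotypic variation in the trait. The large number of markers provided by NGS data allows the relationship between traits and genes to be examined comprehensively and the loci underlying the target trait to be mapped at fine resolution.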
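The association step can be illustrated with a per-SNP test. The sketch below assumes the simplest possible model, an ordinary least-squares regression of a quantitative phenotype on allele dosage (0, 1, or 2 copies of the alternate allele), with no covariates and no correction for population structure. The text above does not say which statistical method is used, so this model, the function names, and the SNP identifiers are all illustrative assumptions.

```python
import math


def snp_association(dosages, phenotypes):
    """Regress phenotype on allele dosage for one SNP.

    Returns (slope, t_statistic) from simple least squares.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching lists with at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotypes)
    )
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the statistic is unbounded unless the slope is zero
        t_stat = math.copysign(math.inf, slope) if slope != 0 else 0.0
    else:
        t_stat = slope / std_err
    return slope, t_stat


def scan(genotypes, phenotypes):
    """Test every SNP and rank by absolute t statistic, strongest first.

    genotypes maps a SNP identifier to its list of dosages.
    """
    results = {snp: snp_association(d, phenotypes) for snp, d in genotypes.items()}
    return sorted(results.items(), key=lambda item: abs(item[1][1]), reverse=True)
```

A real analysis would convert each statistic to a p-value, apply a genome-wide significance threshold, and account for relatedness among individuals, none of which is shown here.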
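Transcriptome analysis – Sequencing reads are aligned to a reference genome, expression is quantified at the transcript and gene level, and novel transcripts and gene fusions are predicted.
Transcriptome sequencing analysis
Sequences all mRNA transcribed by a specific tissue or cell type in a given state. Analyses include differential gene expression analysis, GO analysis, and KEGG pathway enrichment, to identify the functions and pathways in which the differentially expressed genes take part.
lncRNA sequencing analysis
lncRNAs are sequenced by high-throughput sequencing and analyzed for their relationship to specific biological processes. Analyses include lncRNA assembly, lncRNA locus screening, coding potential analysis, target gene prediction, conservation analysis, and differential expression analysis.
Small RNA sequencing analysis
High-throughput sequencing of miRNA, siRNA, piRNA, and other small RNAs in a sample. Analyses include small RNA classification and annotation, base editing analysis, differential expression analysis, target gene analysis, and GO and KEGG enrichment of the target genes of differentially expressed sRNAs.
Whole-transcriptome sequencing analysis
Covers the full set of transcripts produced by a specific tissue or cell type in a given state, including mRNA and all non-coding RNA. A small RNA library and an rRNA-depleted strand-specific library are constructed to analyze the expression of mRNA and non-coding RNA and the regulatory relationships between them.
Exosome sequencing analysis
Exosomal RNA is sequenced to profile its content. Analyses include differential expression analysis, PCA across sample groups, clustering of differential expression patterns, and target gene prediction.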
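Genome mapping – Reference sequence maps are constructed for rare and valuable animals and plants, adding to the data in reference sequence databases for large genomes.
Genome survey
Second-generation sequencing data are used to estimate the size, heterozygosity, and repeat content of the sample genome.
Contig construction
Third-generation sequencing reads are first assembled into contigs, which are then corrected using second-generation sequencing data.
Scaffold construction
Contigs are assembled into scaffolds using Bionano data.
Hi-C-assisted assembly
Scaffolds are assembled into a more complete genome using Hi-C sequencing data.
Genome annotation
The function of each region of the sample genome is annotated using the genomes of closely related species together with transcriptome sequencing data from the sample.
AI breeding model construction
Machine-learning prediction models are built at the whole-genome level to support intelligent, efficient, and targeted breeding of new varieties.
Case study
Genomic selection based on deep convolutional neural networks
Working with a university agricultural research team, 明領基因 analyzed three traits in a company's Large White pigs: age at 100 kg, backfat thickness at 100 kg, and sow teat number. Using genotypes from a porcine 50K SNP chip and starting from an additive model, the team built an ensemble model that combines deep learning with BLUP and developed a semi-supervised, self-learning deep convolutional neural network algorithm. The resulting genotype-to-phenotype prediction models were trained for use in a molecular breeding production system for breeding pigs. The approach reduced breeding time and cost and improved prediction accuracy.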
Chromosome-level genome assembly of goose provides insight into the adaptation and growth of local goose breeds
Background: Anatidae contains numerous waterfowl species with great economic value, but the genetic diversity basis remains insufficiently investigated. Here, we report a chromosome-level genome assembly of Lion-head goose (Anser cygnoides), a native breed in South China, through the combination of PacBio, Bionano, and Hi-C technologies.
Findings: The assembly had a total genome size of 1.19 Gb, consisting of 1,859 contigs with an N50 length of 20.59 Mb, generating 40 pseudochromosomes, representing 97.27% of the assembled genome, and identifying 21,208 protein-coding genes. Comparative genomic analysis revealed that geese and ducks diverged approximately 28.42 million years ago, and geese have undergone massive gene family expansion and contraction. To identify genetic markers associated with body weight in different geese breeds, including Wuzong goose, Huangzong goose, Magang goose, and Lion-head goose, a genome-wide association study was performed, yielding an average of 1,520.6 Mb of raw data that detected 44,858 single-nucleotide polymorphisms (SNPs). Genome-wide association study showed that 6 SNPs were significantly associated with body weight and 25 were potentially associated. The significantly associated SNPs were annotated as LDLRAD4, GPR180, and OR, enriching in growth factor receptor regulation pathways.
Conclusions: We present the first chromosome-level assembly of the Lion-head goose genome, which will expand the genomic resources of the Anatidae family, providing a basis for adaptation and evolution. Candidate genes significantly associated with different goose breeds may serve to understand the underlying mechanisms of weight differences.
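The contig N50 quoted in the Findings is a standard contiguity statistic: sort the contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. A minimal sketch of that calculation follows; the function name and the toy lengths in the usage note are illustrative and are not taken from the goose assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
```

For contigs of lengths 80, 70, 50, 40, 30, and 20 (total 290), the two longest contigs together cover 150, which is at least half of the total, so the N50 is 70.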
Molecular Phylogenesis and Spatiotemporal Spread of SARS-CoV-2 in Southeast Asia
Background: The ongoing coronavirus disease 2019 (COVID-19) pandemic has posed an unprecedented challenge to public health in Southeast Asia, a tropical region with limited resources. This study aimed to investigate the evolutionary dynamics and spatiotemporal patterns of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the region.
Materials and Methods: A total of 1491 complete SARS-CoV-2 genome sequences from 10 Southeast Asian countries were downloaded from the Global Initiative on Sharing Avian Influenza Data (GISAID) database on November 17, 2020. The evolutionary relationships were assessed using maximum likelihood (ML) and time-scaled Bayesian phylogenetic analyses, and the phylogenetic clustering was tested using principal component analysis (PCA). The spatial patterns of SARS-CoV-2 spread within Southeast Asia were inferred using the Bayesian stochastic search variable selection (BSSVS) model. The effective population size (Ne) trajectory was inferred using the Bayesian Skygrid model.
Results: Four major clades (including one potentially endemic) were identified based on the maximum clade credibility (MCC) tree. Similar clustering was yielded by PCA; the first three PCs explained 46.9% of the total genomic variations among the samples. The time to the most recent common ancestor (tMRCA) and the evolutionary rate of SARS-CoV-2 circulating in Southeast Asia were estimated to be November 28, 2019 (September 7, 2019 to January 4, 2020) and 1.446 × 10⁻³ (1.292 × 10⁻³ to 1.613 × 10⁻³) substitutions per site per year, respectively. Singapore and Thailand were the two most probable root positions, with posterior probabilities of 0.549 and 0.413, respectively. There were high-support transmission links (Bayes factors exceeding 1,000) in Singapore, Malaysia, and Indonesia; Malaysia involved the highest number (7) of inferred transmission links within the region. A twice-accelerated viral population expansion, followed by a temporary setback, was inferred during the early stages of the pandemic in Southeast Asia.
Conclusions: With available genomic data, we illustrate the phylogeography and phylodynamics of SARS-CoV-2 circulating in Southeast Asia. Continuous genomic surveillance and enhanced strategic collaboration should be listed as priorities to curb the pandemic, especially for regional communities dominated by developing countries.
1H NMR and UHPLC/Q-Orbitrap-MS-Based Metabolomics Combined with 16S rRNA Gut Microbiota Analysis Revealed the Potential Regulation Mechanism of Nuciferine in Hyperuricemia Rats
Hyperuricemia seriously jeopardizes human health by increasing the risk of several diseases, such as gout and stroke. Nuciferine is able to alleviate hyperuricemia significantly. However, the underlying metabolic regulation mechanism remains unknown. To understand the metabolic effects of nuciferine on hyperuricemia by establishing a rat model of rapid hyperuricemia, 1H NMR and liquid chromatography-mass spectrometry were used to conduct nontargeted metabolomics studies. A total of 21 metabolites were authenticated in plasma and urine to be closely related with hyperuricemia, which were mainly correlated to the six metabolic pathways. Moreover, 16S rRNA analysis indicated that diversified intestinal microorganisms are closely related to changes in differential metabolites, especially bacteria from Firmicutes and Bacteroidetes. We propose that indoxyl sulfate and N-acetylglutamate in urine may be the potential biomarkers besides uric acid for early diagnosis and prevention of hyperuricemia. Gut microbiological analysis found that changes in the gut microbiota are closely related to these metabolites.
A database for risk assessment and comparative genomic analysis of foodborne Vibrio parahaemolyticus in China
Vibrio parahaemolyticus is a major foodborne pathogen worldwide. The increasing number of cases of V. parahaemolyticus infections in China indicates an urgent need to evaluate the prevalence and genetic diversity of this pathogenic bacterium. In this paper, we introduce the Foodborne Vibrio parahaemolyticus genome database (FVPGD), the first scientific database of foodborne V. parahaemolyticus distribution and genomic data in China, based on our previous investigations of V. parahaemolyticus contamination in different kinds of food samples across China from 2011 to 2016. The dataset includes records of 2,499 food samples and 643 V. parahaemolyticus strains from supermarkets and marketplaces distributed over 39 cities in China; 268 whole-genome sequences have been deposited in this database. A spatial view on the risk situations of V. parahaemolyticus contamination in different food types is provided. Additionally, the database provides a functional interface of sequence BLAST, core genome multilocus sequence typing, and phylogenetic analysis. The database will become a powerful tool for risk assessment and outbreak investigations of foodborne pathogens in China.
Different Exosomal microRNA Profile in Aquaporin-4 Antibody Positive Neuromyelitis Optica Spectrum Disorders
Neuromyelitis optica spectrum disorders (NMOSD) and multiple sclerosis (MS) are inflammatory demyelinating diseases of the central nervous system. Exosomal microRNAs (miRNAs) are emerging biomarkers for demyelinating diseases. In this study, 52 aquaporin-4 antibody serum-positive NMOSD patients, 18 relapsing-remitting multiple sclerosis (RRMS) patients and 17 healthy controls (HCs) were included for the next-generation sequencing (NGS). To validate the NGS results, the valuable miRNAs were selected for validation by real-time quantitative polymerase chain reaction in another cohort of patients, comprising 31 NMOSD patients and 14 HCs. In addition, these miRNAs were also validated in a longitudinal study. NGS data revealed the exosomal miRNAs profile in NMOSD patients was different from HCs. Among those potential exosomal miRNAs which can distinguish NMOSD status, hsa-miR-122-3p and hsa-miR-200a-5p were the most abundant miRNAs. In addition, hsa-miR-122-3p and hsa-miR-200a-5p were significantly upregulated in the serum exosome of relapsing NMOSD compared with that in remitting NMOSD. Hsa-miR-122-3p and hsa-miR-200a-5p had positive correlations with disease severity in NMOSD patients. Kyoto Encyclopedia of Genes and Genomes pathway analysis revealed that the MAPK, Wnt and Ras signaling pathways were enriched. Further biological function analysis demonstrated that these two miRNAs might be involved in the immunoregulation of NMOSD pathogenesis. Our results indicated that miRNAs delivered by exosomes could be applied as potential biomarkers for NMOSD.
Myelin oligodendrocyte glycoprotein-associated disorders are associated with HLA subtypes in a Chinese paediatric-onset cohort
Objective: Myelin oligodendrocyte glycoprotein-associated disorders (MOGADs) are a rare new neurological autoimmune disease with unclear pathogenesis. Since a linkage of the disease to the human leucocyte antigen (HLA) has not been shown, we here investigated whether MOGAD is associated with the HLA locus.
Methods: HLA genotypes of 95 patients with MOGADs, assessed between 2016 and 2018 from three academic centres, were compared with 481 healthy Chinese Han individuals. Patients with MOGADs included 51 paediatric-onset and 44 adult-onset cases. All patients were seropositive for IgG targeting the myelin oligodendrocyte glycoprotein (MOG).
Results: Paediatric-onset MOGAD was associated with the DQB1*05:02–DRB1*16:02 alleles (OR=2.43; OR=3.28) or haplotype (OR=2.84) of HLA class II genes. The prevalence of these genotypes in patients with paediatric-onset MOGAD was significantly higher than healthy controls (padj=0.0154; padj=0.0221; padj=0.0331). By contrast, adult-onset MOGAD was not associated with any HLA genotype. Clinically, patients with the DQB1*05:02–DRB1*16:02 haplotype exhibited significantly higher expanded disability status scale scores at onset (p=0.004) and were more likely to undergo a disease relapse (p=0.030). HLA–peptide binding prediction algorithms and computational docking analysis provided supporting evidence for the close relationship between the MOG peptide subunit and DQB1*05:02 allele. In vitro results indicated that site-specific mutations of the predicted target sequence reduced the antigen–antibody binding, especially in the paediatric-onset group with DQB1*05:02 allele.
Conclusions: This study demonstrates a possible association between specific HLA class II alleles and paediatric-onset MOGAD, providing evidence for the conjecture that different aetiology and pathogenesis likely underlie paediatric-onset and adult-onset cases of MOGAD.
Whole-exome sequencing reveals the major genetic factors contributing to neuromyelitis optica spectrum disorder in Chinese patients with aquaporin 4-IgG seropositivity
Background and objective: Neuromyelitis optica spectrum disorder (NMOSD) is an autoimmune disease. Although genetic factors are involved in its pathogenesis, limited evidence is available in this area. The aim of the present study was to identify the major genetic factors contributing to NMOSD in Chinese patients with aquaporin 4 (AQP4)-IgG seropositivity.
Methods: Whole-exome sequencing (WES) was performed on 228 Chinese NMOSD patients seropositive for AQP4-IgG and 1400 healthy controls in Guangzhou, South China. Human leukocyte antigen (HLA) sequencing was also utilized. Genotype model and haplotype, gene burden, and enrichment analyses were conducted.
Results: A significant region of the HLA composition is on chromosome 6, and great variation was observed in DQB1, DQA2 and DQA1. HLA sequencing confirmed that the most significant allele was HLA-DQB1*05:02 (p < 0.01, odds ratio [OR] 3.73). The genotype model analysis revealed that HLA-DQB1*05:02 was significantly associated with NMOSD in the additive effect model and dominant effect model (p < 0.05). The proportion of haplotype “HLA-DQB1*05:02-DRB1*15:01” was significantly greater in the NMOSD patients than the controls, at 8.42% and 1.23%, respectively (p < 0.001, OR 7.39). The gene burden analysis demonstrated that loss-of-function mutations in NOP16 were more common in the NMOSD patients (11.84%) than the controls (5.71%; p < 0.001, OR 2.22). The IgG1-G390R variant was significantly more common in NMOSD, and the rate of the T allele was 0.605 in patients and 0.345 in the controls (p < 0.01, OR 2.92). The enrichment analysis indicated that most of the genetic factors were mainly correlated with nervous and immune processes.
Conclusions: Human leukocyte antigen is highly correlated with NMOSD. NOP16 and IgG1-G390R play important roles in disease susceptibility.
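The odds ratios in the Results of this whole-exome study can be checked against the proportions reported alongside them. The sketch below assumes the plain unadjusted definition, the odds of carrying a factor among patients divided by the odds among controls. The abstract does not state how its ORs were computed, so this is a consistency check under that assumption, and the function name is illustrative.

```python
def odds_ratio(p_cases, p_controls):
    """Unadjusted odds ratio from the carrier proportion in cases and controls."""
    for p in (p_cases, p_controls):
        if not 0 < p < 1:
            raise ValueError("proportions must lie strictly between 0 and 1")
    return (p_cases / (1 - p_cases)) / (p_controls / (1 - p_controls))


# Proportions as quoted in the Results paragraph (patients, controls)
reported = {
    "HLA-DQB1*05:02-DRB1*15:01 haplotype": (0.0842, 0.0123),
    "NOP16 loss-of-function mutations": (0.1184, 0.0571),
    "IgG1-G390R T allele": (0.605, 0.345),
}

for label, (cases, controls) in reported.items():
    print(f"{label}: OR = {odds_ratio(cases, controls):.2f}")
```

Under this definition the three values come out at roughly 7.4, 2.2, and 2.9, in line with the reported 7.39, 2.22, and 2.92. Small differences are expected because the inputs are rounded percentages rather than raw counts.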