Abstract
Yellowhorn (Xanthoceras sorbifolium Bunge) is an important woody oil tree species with high ornamental and medicinal value. Nevertheless, genomic information for yellowhorn is currently unavailable. Here, for the first time, we conducted a genome survey of two yellowhorn cultivars, Zhongshi 4 and Zhongshi 9, which differ distinctly in phenotype and drought resistance, using next-generation sequencing (NGS). In parallel, genome size was estimated by flow cytometry. The whole-genome survey of Zhongshi 4 and Zhongshi 9 generated 34.40 and 39.55 GB of sequence data, respectively. The estimated genome sizes of Zhongshi 4 and Zhongshi 9 were about 536.58 Mb and 569.52 Mb, which were close to the flow cytometry results. The heterozygosity rates were calculated to be 0.75% and 0.89%, and the repeat rates were 60.08% and 62.00%. The reads were assembled into 1,024,373 and 885,404 contigs with N50 lengths of 1005 bp and 1219 bp, respectively, which were further assembled into 714,369 and 686,128 scaffolds with scaffold N50 lengths of ~1963 bp and ~1938 bp and total lengths of 386,915 Kb and 391,904 Kb. These results indicated little difference in genome size and complexity between the two cultivars. In addition, 63,137 and 65,271 high-quality genomic simple sequence repeat (SSR) markers were generated for Zhongshi 4 and Zhongshi 9, respectively. We suggest that a combination of Illumina and PacBio sequencing, assisted by Hi-C and matching assembly software, be used to sequence the genome of one of the two yellowhorn cultivars. These results will help in designing whole-genome sequencing strategies for yellowhorn and provide a large amount of gene resources for its further excavation and utilization.
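The contig and scaffold N50 values reported above follow the usual definition: the length L such that sequences of length L or longer account for at least half of the total assembly length. The following is a minimal Python sketch of that calculation, not the assembler's own code. The function name `n50` and the toy contig lengths are illustrative assumptions, not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    # Walk from the longest sequence down, accumulating length
    # until at least half of the total assembly length is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_contigs = [2000, 1500, 1000, 500, 500, 300, 200]
print(n50(toy_contigs))  # prints 1500
```

In the toy example the total length is 6000 bp, and the two longest contigs (2000 + 1500 bp) already cover more than half of it, so the N50 is 1500 bp. A short N50 relative to the estimated genome size, as in a survey assembly built from short reads alone, indicates a highly fragmented assembly.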
Data availability
The genome sequence reads obtained by Illumina HiSeq are available at NCBI-SRA. The BioProject accession number is PRJNA483857 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA483857), and the BioSample accession numbers are SAMN10980100 (Zhongshi 9) (https://www.ncbi.nlm.nih.gov/biosample/SAMN10980100) and SAMN09748200 (Zhongshi 4) (https://www.ncbi.nlm.nih.gov/biosample/SAMN10980200). The Experiment number is SRX 5401612, and the Run numbers are SRR8601604 (Zhongshi 9) and SRR7768199 (Zhongshi 4).
Funding
This work was financially supported by the Central Public-Interest Scientific Institution Basal Research Fund (CAFYBB2019QD001, CAFYBB2017QB001) and the National Natural Science Foundation of China (31800571, 31870594).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Bi, Q., Zhao, Y., Cui, Y. et al. Genome survey sequencing and genetic background characterization of yellow horn based on next-generation sequencing. Mol Biol Rep 46, 4303–4312 (2019). https://doi.org/10.1007/s11033-019-04884-7
DOI: https://doi.org/10.1007/s11033-019-04884-7