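Identification of nucleosome positioning using a support vector machine method based on comprehensive DNA sequence features

Ying CUI, Zelong XU, Jianzhong LI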

Citation: Ying CUI, Zelong XU, Jianzhong LI. Identification of nucleosome positioning using support vector machine method based on comprehensive DNA sequence feature[J]. Rhhz Test, 2020, 37(3): 496-501. doi: 10.7507/1001-5515.201911064

doi: 10.7507/1001-5515.201911064
Funding: National Natural Science Foundation of China (61832003)
Corresponding author: Jianzhong LI, Email: lijzh@hit.edu.cn

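  • Abstract: Based on Z-curve theory and the position weight matrix (PWM), this paper proposes a model of nucleosome DNA sequences. The model converts a set of nucleosome DNA sequences into three-dimensional coordinates, computes the position weight matrix of the same set to obtain similarity weight scores, and integrates the two into a comprehensive sequence feature model (CSeqFM). The Euclidean distances from candidate nucleosome sequences and from linker sequences to the CSeqFM model are then computed as the feature set, which is used to train and test a support vector machine (SVM); performance is evaluated by ten-fold cross-validation. For nucleosome positioning in yeast, the sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) were 97.1%, 96.9%, 94.2% and 0.89, respectively, and the area under the receiver operating characteristic (ROC) curve (AUC) reached 0.9801. Compared with other related Z-curve methods, CSeqFM was better on every evaluation metric. When the method was extended to nucleosome positioning in nematode, human and fruit fly, the AUC was above 0.90 in all three; in comparison with iNuc-STNC and iNuc-PseKNC, CSeqFM also showed good stability and effectiveness, further indicating that the method is reliable and identifies nucleosomes well.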
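
    The feature steps named in the abstract (Z-curve coordinates, a position weight matrix score, and a Euclidean distance to a reference model) can be illustrated with a minimal sketch. All function names, the uniform background, the pseudocount, and the use of a position-wise mean Z-curve profile as the "model" are assumptions made for illustration only; the abstract does not say how CSeqFM combines the two components, so this is not the authors' implementation. Sequences are assumed to be of equal length and to contain only A, C, G and T.

```python
import math
from collections import Counter

BASES = "ACGT"

def z_curve(seq):
    """Cumulative Z-curve coordinates (x_n, y_n, z_n) for each prefix of seq."""
    counts = Counter()
    points = []
    for base in seq.upper():
        counts[base] += 1
        a, c, g, t = (counts[b] for b in BASES)
        points.append((
            (a + g) - (c + t),  # purines vs pyrimidines
            (a + c) - (g + t),  # amino vs keto bases
            (a + t) - (c + g),  # weak vs strong hydrogen bonding
        ))
    return points

def position_weight_matrix(seqs, pseudocount=1.0):
    """Log-odds PWM against a uniform background (an assumed choice)."""
    total = len(seqs) + 4 * pseudocount
    pwm = []
    for i in range(len(seqs[0])):
        column = Counter(s[i] for s in seqs)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / 0.25)
            for b in BASES
        })
    return pwm

def pwm_score(seq, pwm):
    """Sum of per-position log-odds weights for seq."""
    return sum(column[base] for base, column in zip(seq, pwm))

def mean_profile(seqs):
    """Position-wise mean Z-curve coordinates, flattened to one vector."""
    curves = [z_curve(s) for s in seqs]
    n = len(curves)
    return [
        sum(curve[i][k] for curve in curves) / n
        for i in range(len(curves[0]))
        for k in range(3)
    ]

def distance_to_model(seq, profile):
    """Euclidean distance from the flattened Z-curve of seq to a profile."""
    flat = [v for point in z_curve(seq) for v in point]
    return math.dist(flat, profile)
```

    In a setup like this, the distance and the PWM score of each candidate sequence would form the feature vector passed to a classifier such as an SVM.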
  • Figure 1.  Four performance metrics, AUC distribution and ROC curves for the S. cerevisiae dataset S1

    Figure 2.  Experimental results for C. elegans, H. sapiens and D. melanogaster

    Table 1.  Nucleosome positioning identification results on two S. cerevisiae datasets

    Dataset   Model        Sn      Sp      Acc     MCC
    S1        CSeqFM       97.1%   96.9%   94.2%   0.89
    S1        Wu's model   88.2%   88.2%   88.3%   0.77
    S2        CSeqFM       92.4%   93.9%   93.1%   0.86
    S2        Wu's model   88.7%   89.1%   88.9%   0.77
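
    The four columns Sn, Sp, Acc and MCC follow the standard confusion-matrix definitions. The sketch below shows how they are computed from true/false positive and negative counts; the function name and the example counts are hypothetical and are not taken from the paper's data.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn)                    # fraction of positives recovered
    sp = tn / (tn + fp)                    # fraction of negatives recovered
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty; return 0.0
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}

# Hypothetical counts for illustration only
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
```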

    Table 2.  Comparison of experimental results between CSeqFM and other methods

    Species           Method        Sn      Sp      Acc     MCC    AUC
    C. elegans        iNuc-STNC     91.6%   86.7%   88.6%   0.77   -
    C. elegans        iNuc-PseKNC   90.3%   83.6%   86.9%   0.74   0.9350
    C. elegans        CSeqFM        81.4%   86.8%   83.9%   0.68   0.9052
    H. sapiens        iNuc-STNC     89.3%   85.9%   87.6%   0.75   -
    H. sapiens        iNuc-PseKNC   87.9%   84.7%   86.3%   0.73   0.9250
    H. sapiens        CSeqFM        90.1%   80.5%   84.6%   0.70   0.9087
    D. melanogaster   iNuc-STNC     79.8%   83.6%   81.7%   0.63   -
    D. melanogaster   iNuc-PseKNC   78.3%   81.7%   80.0%   0.60   0.8740
    D. melanogaster   CSeqFM        79.9%   92.3%   84.8%   0.71   0.9019
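
    The AUC column is the area under the ROC curve, which equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A minimal pairwise sketch of that definition follows; the function name and the toy labels and scores are hypothetical, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A tie between a positive and a negative score counts as half a win
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```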
Publication history
  • Received:  2019-11-23
  • Revised:  2020-02-22
  • Published:  2020-03-17
