Allele-Specific DNA Methylation Database

1. What is the Allele-Specific DNA Methylation Database?

DNA methylation plays a crucial role in most organisms. Bisulfite sequencing (BS-Seq) yields quantitative cytosine methylation levels at genome-wide scale and single-base resolution. In addition, the two parental alleles in diploids might exhibit different methylation patterns, which can lead to different phenotypes and even to different therapeutic and drug responses. Prior to this study, no comprehensive database of high-throughput BS-Seq data and allele-specific DNA methylation (ASM) results from multiple species was available.

Here, we constructed the Allele-Specific DNA Methylation Database (ASMdb), which aims to provide a comprehensive resource and a web tool for showing DNA methylation levels and differential DNA methylation in 47 species, including Homo sapiens, Mus musculus, Arabidopsis thaliana, and Oryza sativa, with 5998 GEO samples (4400 BS-Seq datasets and 1598 RNA-Seq datasets).

2. The pipeline of database construction

3. Processing of methylation data

  1. The trimming of low-quality reads and artificial sequences was performed with fastp.
  2. The clean reads were mapped to reference genomes using BatMeth2-align, and the SAM files were converted to the BAM format with SAMtools.
  3. DNA methylation calling was performed with calmeth. After reads with a mapping quality score lower than 20 were filtered out, cytosine sites with a coverage greater than 5 were considered effective methylation sites.
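
The two thresholds in step 3 can be illustrated with a short Python sketch. This is not calmeth's implementation: the `Read` record and the `methylation_level` function are hypothetical names introduced only for illustration, and the sketch assumes that "coverage" counts the reads that remain after the mapping quality filter.

```python
from dataclasses import dataclass

MIN_MAPQ = 20      # reads with a lower mapping quality are discarded
MIN_COVERAGE = 5   # a site needs coverage strictly greater than this


@dataclass
class Read:
    """Hypothetical per-read observation at a single cytosine."""
    mapq: int
    methylated: bool  # True if the read supports a methylated cytosine


def methylation_level(reads):
    """Return (coverage, level) for one cytosine, or None if the site
    does not pass the coverage threshold after MAPQ filtering."""
    kept = [r for r in reads if r.mapq >= MIN_MAPQ]
    coverage = len(kept)
    if coverage <= MIN_COVERAGE:
        return None
    methylated = sum(r.methylated for r in kept)
    return coverage, methylated / coverage
```

A site covered by eight well-mapped reads, six of them methylated, would be reported with a methylation level of 0.75, whereas a site with only five well-mapped reads would be dropped.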

4. SNP calling with BS-Seq data

  1. The alignment file for each sample was obtained using the BatMeth2 software package, and the alignments were then sorted using SAMtools.
  2. SNP calling was performed with BSsnpcall with the default parameters.

5. Identification of ASM

  1. DNA methylation alignment and calculation of the DNA methylation level were performed using the BatMeth2 software package.
  2. ASM was identified by MethHaplo with the default parameters.
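
To give an intuition for what an allele-specific methylation call involves, the sketch below compares methylated and unmethylated read counts between the two alleles at one site with a two-sided Fisher's exact test. This is a generic illustration and is not claimed to be the procedure MethHaplo uses; the function name and the example counts are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table
        allele 1: a methylated reads, b unmethylated reads
        allele 2: c methylated reads, d unmethylated reads
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads on allele 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: allele 1 is mostly methylated, allele 2 mostly unmethylated
p_value = fisher_exact_two_sided(9, 1, 1, 9)
```

A small p-value indicates that the methylation state depends on the allele. In practice, p-values from many sites would also need multiple-testing correction.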

6. Identification of allele-specific expression genes (ASEGs)

  1. The raw reads from the RNA-Seq data were first trimmed as paired-end reads using fastp with the default parameters to remove the adaptors and low-quality reads.
  2. The clean reads were mapped to the reference genomes using Hisat2, and SAMtools was then used to sort the BAM file.
  3. The SNP information was derived from the BS-Seq analysis results corresponding to the RNA-Seq data. ASEGs were detected by ASEQ.
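
The idea behind allele-specific expression at a heterozygous SNP can be sketched as a test of whether RNA-Seq reads split evenly between the reference and alternative alleles. The example below uses an exact two-sided binomial test with an expected ratio of 0.5. It is a simplified illustration and is not claimed to reproduce ASEQ's model; the function name and read counts are assumptions.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads):
    """Exact two-sided binomial test against an expected 50:50 allelic ratio."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # With p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the lower tail, capped at 1.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical SNP covered by 10 RNA-Seq reads, all carrying the reference allele
p_value = allelic_imbalance_pvalue(10, 0)
```

A gene would then be a candidate ASEG when one or more of its heterozygous SNPs show significant imbalance, subject to whatever thresholds the chosen tool applies.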

7. Identification of PMD/HMD/LMR/UMR

  1. DNA methylation BED files containing the coverage and methylation levels were analyzed using BatMeth2.
  2. Each DNA methylation sample was divided into partially methylated domains (PMDs), lowly methylated regions (LMRs) and unmethylated regions (UMRs) according to the methylation level using MethylSeekR.
  3. Furthermore, we removed the 'N' gap regions in the genome, and the remaining regions are called highly methylated domains (HMDs).
  4. The details of the script are shown below:
    # Segment one methylome into PMDs, UMRs and LMRs with MethylSeekR.
    # NOTE: the object names meth, snps, pmds and bed.file are placeholders.
    library(MethylSeekR)
    library(rtracklayer)  # provides import() and export()
    library("BSgenome.Osativa.MSU.MSU7")
    build <- get("BSgenome.Osativa.MSU.MSU7")
    # A BSgenome package can also be built by ourselves, e.g.
    # build <- get("BSgenome.Astyanax.mexicanus")
    Args <- commandArgs()
    faifile <- Args[6]    # FASTA index (.fai) of the reference genome
    pref <- Args[7]       # prefix of the input and output files
    cpgisland <- Args[8]  # BED file of CpG islands
    trainchr <- Args[9]   # chromosome used to train the PMD model
    num.cores <- 1        # assumption: num.cores is not set in the listing
    fai <- read.delim(faifile, header=FALSE)
    chromosome_lengths <- fai$V2
    names(chromosome_lengths) <- fai$V1
    d <- readMethylome(paste(pref, "_loci.CG.txt.methylseekr", sep=""), chromosome_lengths)
    snps <- readSNPTable(paste(pref, ".snp.methylseekr", sep=""), chromosome_lengths)
    ## remove SNPs
    meth <- removeSNPs(d, snps)
    sample <- pref
    pdf(paste0(sample, ".pdf"), width=14)
    pmds <- segmentPMDs(meth, chr.sel=trainchr, seqLengths=chromosome_lengths, num.cores=num.cores)
    CpGisland <- import(cpgisland, format="bed")
    write.csv(calculateFDRs(meth, CpGisland, pmds, num.cores=num.cores)$FDRs, paste0(sample, ".PMD.stats"))
    write.csv(calculateFDRs(meth, CpGisland, num.cores=num.cores)$FDRs, paste0(sample, ".stats"))
    ### UMR LMR (PMDs masked)
    PMD.UMRLMR <- segmentUMRsLMRs(meth, meth.cutoff=0.5, nCpG.cutoff=5, PMDs=pmds, num.cores=num.cores, myGenomeSeq=build, seqLengths=chromosome_lengths)
    # Repeat without filtering PMDs
    UMRLMR <- segmentUMRsLMRs(meth, meth.cutoff=0.5, nCpG.cutoff=5, num.cores=num.cores, myGenomeSeq=build, seqLengths=chromosome_lengths)
    dev.off()
    # Export to bed/bigbed
    # Further entries (e.g. the UMR/LMR segmentations) can be added to this list.
    tmp <- list("PMD"=pmds[pmds$type=="PMD"])
    chrom.sizes <- data.frame(seqlevels(build), seqlengths(build))
    write.table(chrom.sizes, paste0(sample, ".chrom.sizes"), quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
    for (j in names(tmp)) {
        bed.file <- paste(sample, j, "bed", sep=".")
        export(tmp[[j]], bed.file, format="bed")
        system(paste0("cut -f1,2,3 ", bed.file, " > tmp; /public/home/qwzhou/software/EMBOSS-6.6.0/bin/bedToBigBed tmp ", sample, ".chrom.sizes ", gsub("\\.bed$", ".bb", bed.file), "; rm tmp"))
    }
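
The HMD definition in step 3 amounts to an interval complement: whatever remains of a chromosome after the PMDs, LMRs, UMRs and 'N' gaps are taken out. The Python sketch below illustrates this reading under stated assumptions. Intervals are half-open `(start, end)` pairs on one chromosome, and the function names and coordinates are hypothetical rather than taken from the ASMdb scripts.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def complement(chrom_length, excluded):
    """Return the parts of [0, chrom_length) not covered by `excluded`."""
    remaining, cursor = [], 0
    for start, end in merge(excluded):
        if start > cursor:
            remaining.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        remaining.append((cursor, chrom_length))
    return remaining


# Hypothetical segments on a 100 bp chromosome
pmds = [(10, 20)]
lmrs = [(30, 35)]
umrs = [(33, 40)]
n_gaps = [(0, 5), (90, 100)]
hmds = complement(100, pmds + lmrs + umrs + n_gaps)
```

With these example coordinates, the HMDs are the three stretches between the excluded segments: 5 to 10, 20 to 30, and 40 to 90.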

8. Genome List

Species                                                  Genome
Homo sapiens                                             GRCh38.p12
Mus musculus                                             GRCm38.p6
Arabidopsis thaliana                                     TAIR10
Astyanax mexicanus                                       Astyanax mexicanus 2.0
Bos indicus x Bos taurus                                 UMD3.1
Bos taurus                                               UMD3.1
Camellia sinensis var. assamica                          Tea_Tree_2017
Camponotus floridanus                                    Cflo_v7.5
Canis lupus familiaris                                   CanFam3.1
Ceratina calcarata                                       C. calcarata_v1.1
Citrus sinensis                                          Citrus sinensis (version2)
Oryza sativa                                             MSU7.0
Oryza sativa Japonica Group                              MSU7.0
Oryza sativa Japonica Group x Oryza sativa Indica Group  MSU7.0
Oryza sativa Indica Group                                MSU7.0
Oryza sativa Indica Group x Oryza sativa Japonica Group  MSU7.0
Cordyceps militaris                                      CM01
Cynoglossus semilaevis                                   Cse_v1.0
Danio rerio                                              GRCz11
Daphnia magna                                            dmagna-v2.4-20100422
Dorcoceras hygrometricum                                 Boea_hygrometrica.v1
Zea mays                                                 B73_v4
Trichodesmium                                            IMS101
Trichodesmium erythraeum 21-75                           21-57
Trichodesmium erythraeum IMS101                          IMS101
Trichodesmium thiebautii VI-1                            IMS101
Tribolium castaneum                                      Tcas_v5.2
Sus scrofa                                               Sscrofa11.1
Sorghum bicolor                                          Sbi1.4
Rattus norvegicus                                        Rnor6.0
Procambarus virginalis                                   Pvir0.4
Procambarus fallax                                       Pvir0.4
Physcomitrella patens                                    PpatensV3.3
Pan troglodytes                                          PanTro5/pan_tro3.0
Neurospora crassa                                        NC12
Nasonia vitripennis                                      Nvit2.1
Nasonia vitripennis x Nasonia giraulti                   Nvit2.1
Erythranthe guttata                                      Mguttata2.0
Erythranthe lutea x Erythranthe guttata                  Mguttata2.0
Erythranthe lutea                                        Mimulus luteus v1.0
Glycine max                                              Glyma2.1
Gorilla gorilla                                          gorGor4
Harpegnathos saltator                                    Hsal_v8.5
Macaca fuscata                                           Mmul_10
Macaca mulatta                                           Mmul_10
Mimulus peregrinus                                       M. guttatus concatenate M. luteus
Nasonia giraulti                                         Nvit2.1
Nomascus leucogenys                                      Nleu_3.0
Xenopus laevis                                           Xenopus_laevis_v2
Xenopus tropicalis x Xenopus laevis                      Xenopus_laevis_v2
Plectus sambesii                                         P_sam
Trichuris muris                                          T_muris
Trichinella spiralis                                     T_spir
Romanomermis culicivorax                                 R_culici
Drosophila melanogaster                                  BDGP6.22
Solanum lycopersicum                                     SL3.0
Apis mellifera                                           HAv3.1
Pyrus x bretschneideri                                   Pbr_v1.0
Chlamydomonas reinhardtii                                Crein_v3.0
Marchantia polymorpha                                    Mpoly_v1

9. JBrowse Usage

  • The methylome browser is built on JBrowse, a fast, embeddable genome browser built completely with JavaScript and HTML5. See the JBrowse documentation for information on how to use it.