Quick Start¶
- m6Anet requires eventalign.txt from nanopolish::
- nanopolish eventalign –reads reads.fastq –bam reads.sorted.bam –genome transcript.fa –scale-events –signal-index –summary /path/to/summary.txt –threads 50 > /path/to/eventalign.txt
We have also provided a demo dataset in the repository under /path/to/m6anet/demo/eventalign.txt.
Firstly, we need to preprocess the segmented raw signal file in the form of nanopolish eventalign file using ‘m6anet-dataprep’:
m6anet-dataprep --eventalign m6anet/demo/eventalign.txt \
--out_dir /path/to/output --n_processes 4
The output files are stored in /path/to/output
:
data.index
: Indexing of data.json to allow faster access to the filedata.json
: json file containing the features to feed into m6Anet model for predictiondata.log
: Log file containing all the transcripts that have been successfully preprocesseddata.readcount
: File containing the number of reads for each DRACH positions in eventalign.txteventalign.index
: Index file created during dataprep to allow faster access of Nanopolish eventalign.txt during dataprep
Now we can run m6anet over our data using m6anet-run_inference:
m6anet-run_inference --input_dir demo_data --out_dir demo_data --infer_mod-rate --n_processes 4
The output files demo_data/data.result.csv.gz contains the probability of modification at each individual position for each transcript. The output file will have 4 columns
transcript_id
: The transcript id of the predicted positiontranscript_position
: The transcript position of the predicted positionn_reads
: The number of reads for that particular positionprobability_modified
: The probability that a given site is modifiedkmer
: The 5-mer motif of a given sitemod_ratio
: The estimated percentage of reads in a given site that is modified
The total run time should not exceed 10 minutes on a normal laptop. We also recommend a threshold of 0.9 for selecting m6A sites
based on the probability_modified
column, which can be relaxed at the expense of having lower model precision.
m6Anet also supports pooling over multiple replicates. To do this, simply input multiple folders containing m6anet-dataprep outputs:
m6anet-run_inference --input_dir demo_data_1 demo_data_2 ... --out_dir demo_data --infer_mod-rate --n_processes 4