Command line arguments

We provide two main scripts for running m6A prediction, described below.

m6anet dataprep

  • Input

Output files from nanopolish eventalign. Please refer to the Quickstart page for more details on running nanopolish.

| Argument name | Required | Default value | Description |
|---|---|---|---|
| --eventalign=FILE | Yes | NA | Eventalign filepath, the output from nanopolish. |
| --out_dir=DIR | Yes | NA | Output directory. |
| --n_processes=NUM | No | 1 | Number of processes to run. |
| --chunk_size=NUM | No | 1000000 | chunksize argument passed to the pandas read_csv function when reading the eventalign input. |
| --readcount_max=NUM | No | 1000 | Maximum read counts per gene. |
| --readcount_min=NUM | No | 1 | Minimum read counts per gene. |
| --skip_index | No | False | Skip indexing the nanopolish eventalign output; can only be used if the index has already been created. |
| --n_neighbors=NUM | No | 1 | Number of flanking positions to process. |
| --min_segment_count=NUM | No | 1 | Minimum read counts over each candidate m6A segment. |
| --compress | No | False | Round down output features to 3 decimal places. |

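
As an illustration, the arguments above can be assembled into a command line from Python. This is a minimal sketch, not part of m6anet: the helper name build_dataprep_command and the file paths are hypothetical, and it assumes that the m6anet executable is on the PATH and that flags are written with two hyphens in the --name=value form shown in the table.

```python
import shlex


def build_dataprep_command(eventalign, out_dir, n_processes=1,
                           readcount_min=1, readcount_max=1000,
                           skip_index=False, compress=False):
    """Assemble an `m6anet dataprep` argument list (hypothetical helper)."""
    cmd = [
        "m6anet", "dataprep",
        f"--eventalign={eventalign}",      # required
        f"--out_dir={out_dir}",            # required
        f"--n_processes={n_processes}",
        f"--readcount_min={readcount_min}",
        f"--readcount_max={readcount_max}",
    ]
    # Boolean switches take no value; include them only when requested.
    if skip_index:
        cmd.append("--skip_index")
    if compress:
        cmd.append("--compress")
    return cmd


# Placeholder paths, for illustration only.
cmd = build_dataprep_command("eventalign.txt", "dataprep_out", n_processes=4)
print(shlex.join(cmd))
# To actually run the command: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.
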
  • Output

| File name | File type | Description |
|---|---|---|
| eventalign.index | csv | File index indicating the position in the eventalign.txt file (the output of nanopolish eventalign) where the segmentation information of each read index is stored, allowing random access. |
| data.json | json | Intensity level mean for each position. |
| data.info | csv | File containing read counts per transcript and an index indicating the position in the data.json file where the intensity level means across positions of each gene are stored, allowing random access. |
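
The index files exist to allow random access: rather than scanning a large file from the start, a reader looks up a byte offset and seeks straight to the record it needs. The sketch below illustrates that idea on a toy file built in memory. The record layout, the (start, end) offsets, and the identifiers are made up for the example and are not the actual layout of data.info or data.json.

```python
import io
import json

# Toy records standing in for per-transcript entries (hypothetical content).
records = {
    "tx1": {"100": [0.5, 0.7]},
    "tx2": {"250": [0.1, 0.2, 0.3]},
}

# Write the records back to back and remember where each one starts and ends.
data = io.BytesIO()
index = {}
for name, record in records.items():
    start = data.tell()
    data.write(json.dumps(record).encode("utf-8") + b"\n")
    index[name] = (start, data.tell())


def read_record(handle, index, name):
    """Seek directly to one record using its (start, end) byte offsets."""
    start, end = index[name]
    handle.seek(start)
    return json.loads(handle.read(end - start))


# Fetch the second record without reading the first one.
print(read_record(data, index, "tx2"))
```
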

m6anet inference

  • Input

Output files from m6anet dataprep.

| Argument name | Required | Default value | Description |
|---|---|---|---|
| --input_dir=DIR | Yes | NA | Input directory that contains data.json and data.info from m6anet dataprep. |
| --out_dir=DIR | Yes | NA | Output directory for the inference results from m6anet. |
| --pretrained_model=STR | No | Hct116_RNA002 | Name of the pre-trained model: Hct116_RNA002, arabidopsis, or HEK293T_RNA004. |
| --model_config=FILE | No | prod_pooling.toml | Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml |
| --model_state_dict=FILE | No | prod_pooling_pr_auc.pt | Model weights to be used for inference. Please see examples in m6anet/model/model_states/ |
| --batch_size=NUM | No | 64 | Number of sites to be loaded each time for inference. |
| --n_processes=NUM | No | 1 | Number of processes to run. |
| --num_iterations=NUM | No | 5 | Number of times m6anet iterates through each potential m6A site. |
| --read_proba_threshold=NUM | No | 0.033379376 | Threshold for each individual read to be considered modified during stoichiometry calculation. |

  • Output

| File name | File type | Description |
|---|---|---|
| data.site_proba.csv | csv | Result table for each candidate m6A site |
| data.indiv_proba.csv | csv | Result table for each candidate m6A read |
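
To make the role of the --read_proba_threshold option concrete, the sketch below treats the stoichiometry of a site as the fraction of its reads whose individual modification probability reaches the threshold. This is an assumed reading of the option's description, not m6anet's actual code: the function name, the use of >= rather than >, and the example probabilities are all hypothetical.

```python
def modification_ratio(read_probabilities, threshold=0.033379376):
    """Fraction of reads counted as modified at one site (illustrative only)."""
    if not read_probabilities:
        raise ValueError("need at least one read probability")
    # A read counts as modified when its probability reaches the threshold.
    modified = sum(1 for p in read_probabilities if p >= threshold)
    return modified / len(read_probabilities)


# Made-up per-read probabilities for a single candidate site.
example_reads = [0.01, 0.02, 0.50, 0.90]
print(modification_ratio(example_reads))
```

Under this reading, raising the threshold can only lower or keep the estimated ratio, since fewer reads qualify as modified.
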

m6anet train

| Argument name | Required | Default value | Description |
|---|---|---|---|
| --model_config=FILE | Yes | NA | Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml |
| --train_config=FILE | Yes | NA | Config file for training the model. Please see examples in m6anet/model/configs/training_configs/oversampled.toml |
| --save_dir=DIR | Yes | NA | Directory in which to save the training results. |
| --device=STR | No | cpu | Device to use for training the model. Set to cuda:cuda_id if using a GPU. |
| --lr=NUM | No | 4e-4 | Learning rate for the Adam optimizer. |
| --seed=NUM | No | 25 | Random seed for model training. |
| --epochs=NUM | No | 50 | Number of epochs to train the model. |
| --num_workers=NUM | No | 1 | Number of processes to run. |
| --save_per_epoch=NUM | No | 10 | Interval, in epochs, at which the model is saved. |
| --weight_decay=NUM | No | 0 | Weight decay parameter for the Adam optimizer. |
| --num_iterations=NUM | No | 5 | Number of times m6anet iterates through each potential m6A site. |