Command line arguments

We provide two main scripts for running m6A prediction, described below.

`m6anet dataprep`

Input: output files from nanopolish eventalign. Please refer to the Quickstart page for more details on running nanopolish.

Argument name	Required	Default value	Description
--eventalign=FILE	Yes	NA	Path to the eventalign file, the output from nanopolish.
--out_dir=DIR	Yes	NA	Output directory.
--n_processes=NUM	No	1	Number of processes to run.
--chunk_size=NUM	No	1000000	chunksize argument for the pandas read_csv function when reading the eventalign input.
--readcount_max=NUM	No	1000	Maximum read counts per gene.
--readcount_min=NUM	No	1	Minimum read counts per gene.
--skip_index	No	False	Skip indexing the nanopolish eventalign output. Can only be used if the index has already been created.
--n_neighbors=NUM	No	1	Number of flanking positions to process.
--min_segment_count=NUM	No	1	Minimum read counts over each candidate m6A segment.
--compress	No	False	Round down output features to 3 decimal places.
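
As a usage illustration, the arguments in the table above can be assembled into a command line from Python. The sketch below only builds and prints the command; it does not run it. The helper name `build_dataprep_command` and the file paths are hypothetical placeholders, and the sketch assumes the `m6anet` executable is on your `PATH` when you eventually run the result (for example with `subprocess.run(cmd, check=True)`).

```python
import shlex


def build_dataprep_command(eventalign, out_dir, n_processes=1,
                           readcount_min=1, readcount_max=1000,
                           skip_index=False, compress=False):
    """Assemble an `m6anet dataprep` argument list (hypothetical helper).

    Only flags listed in the table above are used; defaults mirror the table.
    """
    cmd = [
        "m6anet", "dataprep",
        f"--eventalign={eventalign}",
        f"--out_dir={out_dir}",
        f"--n_processes={n_processes}",
        f"--readcount_min={readcount_min}",
        f"--readcount_max={readcount_max}",
    ]
    # Boolean switches take no value; add them only when requested.
    if skip_index:
        cmd.append("--skip_index")
    if compress:
        cmd.append("--compress")
    return cmd


# Placeholder paths, not real files.
cmd = build_dataprep_command("eventalign.txt", "dataprep_out", n_processes=4)
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems when paths contain spaces; `shlex.join` is used here only to show a copy-pasteable string.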

Output:

File name	File type	Description
eventalign.index	csv	Index giving the position in the eventalign.txt file (the output of nanopolish eventalign) where the segmentation information for each read index is stored, allowing random access.
data.json	json	Intensity level mean for each position.
data.info	csv	Read counts per transcript, plus an index giving the position in the data.json file where the intensity level means across positions of each gene are stored, allowing random access.
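
The random access mentioned above relies on a general technique: record the byte offset of each entry in a small index, then `seek` straight to it instead of scanning the whole file. The sketch below illustrates that idea on a toy file. The record contents, the index layout (start and end byte offsets keyed by an ID), and all names are assumptions made for illustration; they are not the actual layout of data.json, data.info, or eventalign.index.

```python
import json
import os
import tempfile

# Toy records keyed by a made-up transcript ID (illustrative values only).
records = {
    "tx1": {"position": 10, "mean": [101.5, 99.2]},
    "tx2": {"position": 42, "mean": [88.0]},
    "tx3": {"position": 7, "mean": [120.3, 118.9, 119.4]},
}

path = os.path.join(tempfile.mkdtemp(), "toy_data.json")

# Write one JSON record per line and remember where each one starts and ends.
index = {}
with open(path, "wb") as f:
    for key, rec in records.items():
        start = f.tell()
        f.write((json.dumps({key: rec}) + "\n").encode("utf-8"))
        index[key] = (start, f.tell())


def read_record(data_path, offsets, key):
    """Jump straight to one record using its byte offsets."""
    start, end = offsets[key]
    with open(data_path, "rb") as f:
        f.seek(start)
        return json.loads(f.read(end - start))


print(read_record(path, index, "tx2"))
```

With an index like this, the cost of fetching one entry depends on the size of that entry rather than on the size of the whole file, which is what makes it practical for large eventalign outputs.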

`m6anet inference`

Input: output files from m6anet dataprep.

Argument name	Required	Default value	Description
--input_dir=DIR	Yes	NA	Input directory that contains data.json and data.info from m6anet dataprep
--out_dir=DIR	Yes	NA	Output directory for the inference results from m6anet.
--pretrained_model=STR	No	Hct116_RNA002	Name of the pre-trained model: Hct116_RNA002, arabidopsis, or HEK293T_RNA004.
--model_config=FILE	No	prod_pooling.toml	Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml
--model_state_dict=FILE	No	prod_pooling_pr_auc.pt	Model weights to be used for inference. Please see examples in m6anet/model/model_states/
--batch_size=NUM	No	64	Number of sites to be loaded each time for inference.
--n_processes=NUM	No	1	Number of processes to run.
--num_iterations=NUM	No	5	Number of times m6anet iterates through each potential m6A site.
--read_proba_threshold=NUM	No	0.033379376	Threshold for each individual read to be considered modified during stoichiometry calculation.

Output:

File name	File type	Description
data.site_proba.csv	csv	Result table for each candidate m6A site
data.indiv_proba.csv	csv	Result table for each candidate m6A read
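
To make the role of `--read_proba_threshold` concrete, here is a minimal sketch of one way a per-site modification ratio could be derived from per-read probabilities: count the reads at or above the threshold and divide by the total number of reads. The function name, the toy probabilities, and the choice of `>=` rather than `>` are assumptions for illustration; this is not taken from the m6anet source.

```python
DEFAULT_THRESHOLD = 0.033379376  # default value from the argument table above


def modification_ratio(read_probabilities, threshold=DEFAULT_THRESHOLD):
    """Fraction of reads whose probability reaches the threshold."""
    if not read_probabilities:
        raise ValueError("need at least one read")
    modified = sum(1 for p in read_probabilities if p >= threshold)
    return modified / len(read_probabilities)


# Toy per-read probabilities for one candidate site (made-up numbers).
site_reads = [0.01, 0.02, 0.05, 0.40, 0.90]
print(modification_ratio(site_reads))       # 3 of 5 reads pass the default
print(modification_ratio(site_reads, 0.5))  # 1 of 5 reads pass at 0.5
```

Raising the threshold can only lower or keep the ratio, so the value chosen directly affects the reported stoichiometry.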

`m6anet train`

Argument name	Required	Default value	Description
--model_config=FILE	Yes	NA	Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml
--train_config=FILE	Yes	NA	Config file for training the model. Please see examples in m6anet/model/configs/training_configs/oversampled.toml
--save_dir=DIR	Yes	NA	Directory in which to save the training results.
--device=STR	No	cpu	Device to use for training the model. Set to cuda:cuda_id if using GPU.
--lr=NUM	No	4e-4	Learning rate for the Adam optimizer.
--seed=NUM	No	25	Random seed for model training.
--epochs=NUM	No	50	Number of epochs to train the model.
--num_workers=NUM	No	1	Number of processes to run.
--save_per_epoch=NUM	No	10	Interval, in epochs, at which the model is saved.
--weight_decay=NUM	No	0	Weight decay parameter for the Adam optimizer.
--num_iterations=NUM	No	5	Number of times m6anet iterates through each potential m6A site.