Usage¶
Preparing the configuration¶
To provide configurations for the training process and path information, we can use either dictionaries as input to functions or configuration files.
Using the configuration dictionary¶
You can use the network configuration dictionary network_info and the path configuration dictionary path_info for configuration.
See Configuration.
Using the configuration file (.cfg)¶
You can also use configuration files in the .cfg format (for example, path_info.cfg and network_info.cfg).
See Configuration.
In that case, you can load the dictionaries from the configuration files as follows:
# for python 3.x
# `log` is a logging instance (see the Logging section below)
from deepbiome import configuration
config_data = configuration.Configurator('./path_info.cfg', log)
config_data.set_config_map(config_data.get_section_map())
config_data.print_config_map()
path_info = config_data.get_config_map()
config_network = configuration.Configurator('./network_info.cfg', log)
config_network.set_config_map(config_network.get_section_map())
config_network.print_config_map()
network_info = config_network.get_config_map()
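To make the relationship between a .cfg file and a configuration dictionary concrete, here is a minimal sketch using only the standard-library configparser module. It is an illustration of the general idea, not deepbiome's Configurator; the section and option names are hypothetical placeholders, and the real keys are listed in Configuration.

```python
import configparser

# Hypothetical file contents; the section and option names are placeholders,
# not deepbiome's actual configuration keys.
EXAMPLE_CFG = """
[section_a]
option_one = 1
option_two = some/path

[section_b]
flag = True
"""


def cfg_to_dict(text):
    """Parse .cfg-style text into a nested {section: {option: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}


config_map = cfg_to_dict(EXAMPLE_CFG)
print(config_map)
```

Note that configparser returns every value as a string, so numeric or boolean options would still need to be converted by whatever code consumes the dictionary.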
Logging¶
For logging, we can use the logging instance. For example:
# for python 3.x
import logging
logging.basicConfig(format='[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s',
                    level=logging.DEBUG)
log = logging.getLogger()
We can use the logging instance log as an input to the main training function below.
For more information about the logging module, please check the Python logging documentation.
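If DEBUG output is too verbose, the level can be raised on the logger itself. The sketch below uses only the standard logging module with the same format string as above; the logger name `example` and the in-memory stream are assumptions made so the example is self-contained.

```python
import io
import logging

# Same format string as above, attached to a dedicated handler.
fmt = '[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s'
stream = io.StringIO()  # stand-in for a file or console stream
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(fmt))

log = logging.getLogger('example')  # hypothetical logger name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # avoid duplicate output through the root logger

log.debug('hidden: below the INFO threshold')
log.info('shown: at the INFO threshold')
print(stream.getvalue())
```

Only the INFO message reaches the handler; the DEBUG message is filtered out by the logger level.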
Training¶
To use DeepBiome:
# for python 3.x
from deepbiome import deepbiome
test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info)
If you want to train the network with k-fold cross-validation, set number_of_fold to k. For example, to run 5-fold cross-validation:
# for python 3.x
from deepbiome import deepbiome
test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=5)
If you use one input file, it will run the 5-fold cross-validation using all data in that file. If you use a list of input files, it will run the training on the first k files.
By default, number_of_fold=None, and the code runs leave-one-out cross-validation (LOOCV).
If you use one input file, it will run the LOOCV using all data in that file. If you use a list of input files, it will repeat the training for every file.
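The difference between k-fold cross-validation and LOOCV for a single input file can be sketched in plain Python. This is only an illustration of how samples are partitioned; the helper `kfold_indices` is hypothetical, and deepbiome's own splitting (for example, any shuffling) may differ.

```python
def kfold_indices(n_samples, number_of_fold=None):
    """Return a list of (train_idx, test_idx) pairs.

    number_of_fold=None gives leave-one-out: one fold per sample.
    """
    k = n_samples if number_of_fold is None else number_of_fold
    indices = list(range(n_samples))
    folds = []
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th sample, starting at `fold`
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        folds.append((train_idx, test_idx))
    return folds


five_fold = kfold_indices(10, number_of_fold=5)  # 5 folds, 2 test samples each
loocv = kfold_indices(10)                        # 10 folds, 1 test sample each
print(len(five_fold), len(loocv))
```

With 10 samples, 5-fold cross-validation trains five models, while LOOCV trains ten, each tested on a single held-out sample.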
Testing¶
If you want to test the pre-trained model, you can use the deepbiome.deepbiome_test function.
# for python 3.x
from deepbiome import deepbiome
evaluation = deepbiome.deepbiome_test(log, network_info, path_info, number_of_fold=None)
If you use the index file, this function provides the evaluation using the test index (the index set not included in the index file) for each fold. If not, this function provides the evaluation using all samples. If number_of_fold is set to k, the function will test the model with only the first k folds.
This function provides the evaluation result as a numpy array with a shape of (number of folds, number of evaluation measures).
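Because the result has one row per fold and one column per evaluation measure, a per-measure summary is a reduction over the first axis. The sketch below uses a made-up array as a stand-in for the returned value; the numbers and the choice of two measures are assumptions.

```python
import numpy as np

# Stand-in for the array returned by deepbiome_test:
# 3 folds (rows) x 2 evaluation measures (columns). Values are made up.
evaluation = np.array([
    [0.50, 0.80],
    [0.40, 0.90],
    [0.60, 0.70],
])

mean_per_measure = evaluation.mean(axis=0)  # average over folds
std_per_measure = evaluation.std(axis=0)    # spread over folds
print(mean_per_measure, std_per_measure)
```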
Prediction¶
If you want to predict the output using the pre-trained model, you can use the deepbiome.deepbiome_prediction function.
# for python 3.x
from deepbiome import deepbiome
prediction = deepbiome.deepbiome_prediction(log, network_info, path_info, num_classes=1, number_of_fold=None, change_weight_for_each_fold=False)
If number_of_fold is set as k, the function will predict the output of the first k folds’ samples.
If change_weight_for_each_fold is set to False, the function will predict the output of every repetition using the same weight from the given path. If change_weight_for_each_fold is set to True, the function will predict the output using each fold's own weight.
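The API reference for change_weight_for_each_fold describes the per-fold naming with the example that weight.h5 becomes weight_0.h5 for the first fold. A minimal sketch of that naming convention, assuming the fold index is simply inserted before the file extension, could look like this; the helper `fold_weight_path` is hypothetical and not part of deepbiome.

```python
import os


def fold_weight_path(weight_path, fold):
    """Insert the fold index before the file extension: weight.h5 -> weight_0.h5."""
    root, ext = os.path.splitext(weight_path)
    return '{}_{}{}'.format(root, fold, ext)


paths = [fold_weight_path('weight.h5', fold) for fold in range(3)]
print(paths)
```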
If get_y=True, the function will provide a list of tuples (prediction, true output) as a numpy array with the shape of (n_samples, 2, n_classes). If get_y=False, the function will provide a numpy array of predictions only, with the shape of (n_samples, n_classes).
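With get_y=True, the second axis separates predictions from true outputs. The sketch below builds a made-up array of that shape and splits it; the values, and the use of a 0.5 threshold for accuracy, are assumptions that only make sense for binary classification.

```python
import numpy as np

n_samples, n_classes = 4, 1
# Stand-in for the get_y=True output, shape (n_samples, 2, n_classes).
output = np.zeros((n_samples, 2, n_classes))
output[:, 0, :] = [[0.9], [0.2], [0.7], [0.4]]  # predictions
output[:, 1, :] = [[1.0], [0.0], [1.0], [1.0]]  # true outputs

prediction = output[:, 0, :]  # shape (n_samples, n_classes)
y_true = output[:, 1, :]      # shape (n_samples, n_classes)

# Example use: accuracy at a 0.5 threshold, which assumes binary classification.
accuracy = float(np.mean((prediction > 0.5) == (y_true > 0.5)))
print(prediction.shape, y_true.shape, accuracy)
```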
Cheatsheet for running the project on console¶
- Preprocess the data (convert the raw data to a format readable by Python): (See Data preprocessing.)
Example: TODO (Julia)
- Set up the configuration files for the training hyperparameters and path information:
- Set the training hyperparameters (network_info.cfg): (See Configuration.)
- Set the path information (path_info.cfg): (See Configuration.)
- Write the Python script for running the function deepbiome.deepbiome_train. For example:
Example: https://github.com/Young-won/deepbiome/tree/master/examples/main.py
Example of the python script:
import sys

from deepbiome import configuration
from deepbiome import logging_daily
from deepbiome import deepbiome
from deepbiome.utils import argv_parse

# Argument ##########################################################
argdict = argv_parse(sys.argv)
try:
    gpu_memory_fraction = float(argdict['gpu_memory_fraction'][0])
except:
    gpu_memory_fraction = None
try:
    max_queue_size = int(argdict['max_queue_size'][0])
except:
    max_queue_size = 10
try:
    workers = int(argdict['workers'][0])
except:
    workers = 1
try:
    use_multiprocessing = argdict['use_multiprocessing'][0] == 'True'
except:
    use_multiprocessing = False

# Logger ###########################################################
logger = logging_daily.logging_daily(argdict['log_info'][0])
logger.reset_logging()
log = logger.get_logging()
log.setLevel(logging_daily.logging.INFO)

log.info('Argument input')
for argname, arg in argdict.items():
    log.info('    {}:{}'.format(argname, arg))

# Configuration ####################################################
config_data = configuration.Configurator(argdict['path_info'][0], log)
config_data.set_config_map(config_data.get_section_map())
config_data.print_config_map()

config_network = configuration.Configurator(argdict['network_info'][0], log)
config_network.set_config_map(config_network.get_section_map())
config_network.print_config_map()

path_info = config_data.get_config_map()
network_info = config_network.get_config_map()

test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info)
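The repeated try/except blocks in the script above all follow one pattern: read the first value of an optional argument, convert it, and fall back to a default. A small helper can express that pattern once. This is a sketch, not part of deepbiome: `get_arg` is a hypothetical name, and the sample `argdict` below only assumes the dict-of-lists shape that the script's indexing (`argdict[name][0]`) implies.

```python
def get_arg(argdict, name, cast=str, default=None):
    """Return cast(argdict[name][0]), or `default` if missing or not convertible."""
    try:
        return cast(argdict[name][0])
    except (KeyError, IndexError, TypeError, ValueError):
        return default


# Hypothetical parsed arguments, shaped like the argdict used in the script
# (each option maps to a list of string values).
argdict = {'workers': ['10'], 'use_multiprocessing': ['False'], 'max_queue_size': ['many']}

workers = get_arg(argdict, 'workers', int, 1)
max_queue_size = get_arg(argdict, 'max_queue_size', int, 10)  # falls back: 'many' is not an int
gpu_memory_fraction = get_arg(argdict, 'gpu_memory_fraction', float, None)  # missing key
use_multiprocessing = get_arg(argdict, 'use_multiprocessing', lambda s: s == 'True', False)
print(workers, max_queue_size, gpu_memory_fraction, use_multiprocessing)
```

Catching specific exception types, rather than using a bare except, keeps unrelated errors such as a keyboard interrupt from being silently swallowed.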
Check the available GPU (if you have no GPU, it will run on CPU):
nvidia-smi
- Select the number of GPUs and CPU cores from the bash file:
Example of the bash file (.sh):
export CUDA_VISIBLE_DEVICES=2
echo $CUDA_VISIBLE_DEVICES
model=${PWD##*/}
echo $model
python3 ../../main.py --log_info=config/log_info.yaml --path_info=config/path_info.cfg --network_info=config/network_info.cfg --max_queue_size=50 --workers=10 --use_multiprocessing=False
Run the bash file!
./run.sh
Summary¶
To use deepbiome in a project:
from deepbiome import deepbiome
- deepbiome.deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)
Function for training the deep neural network with phylogenetic tree weight regularizer.
It uses microbiome abundance data as input and uses the phylogenetic taxonomy to guide the decision of the optimal number of layers and neurons in the deep learning architecture.
- Parameters
- log (logging instance) :
python logging instance for logging
- network_info (dictionary) :
python dictionary with network_information
- path_info (dictionary):
python dictionary with path_information
- number_of_fold (int):
number of folds for cross-validation. If None, leave-one-out cross-validation (LOOCV) is run. default=None
- tree_level_list (list):
name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']
- max_queue_size (int):
default=10
- workers (int):
default=1
- use_multiprocessing (boolean):
default=False
- verbose (boolean):
show the log if True default=True
- Returns
- test_evaluation (numpy array):
numpy array of the evaluation on the test set from all folds
- train_evaluation (numpy array):
numpy array of the evaluation on the training set from all folds
- network (deepbiome network instance):
deepbiome class instance
Examples
Training the deep neural network with phylogenetic tree weight regularizer.
test_evaluation, train_evaluation, network = deepbiome_train(log, network_info, path_info)
- deepbiome.deepbiome.deepbiome_test(log, network_info, path_info, number_of_fold=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)
Function for testing the pretrained deep neural network with phylogenetic tree weight regularizer.
If you use the index file, this function provides the evaluation using the test index (the index set not included in the index file) for each fold. If not, this function provides the evaluation using all samples.
- Parameters
- log (logging instance) :
python logging instance for logging
- network_info (dictionary) :
python dictionary with network_information
- path_info (dictionary):
python dictionary with path_information
- number_of_fold (int):
If number_of_fold is set to k, the function will test the model with only the first k folds. default=None
- tree_level_list (list):
name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']
- max_queue_size (int):
default=10
- workers (int):
default=1
- use_multiprocessing (boolean):
default=False
- verbose (boolean):
show the log if True default=True
- Returns
- evaluation (numpy array):
evaluation result on the test set as a numpy array with a shape of (number of folds, number of evaluation measures)
Examples
Test the pre-trained deep neural network with phylogenetic tree weight regularizer.
evaluation = deepbiome_test(log, network_info, path_info)
- deepbiome.deepbiome.deepbiome_prediction(log, network_info, path_info, num_classes, number_of_fold=None, change_weight_for_each_fold=False, get_y=False, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)
Function for prediction by the pretrained deep neural network with phylogenetic tree weight regularizer.
- Parameters
- log (logging instance) :
python logging instance for logging
- network_info (dictionary) :
python dictionary with network_information
- path_info (dictionary):
python dictionary with path_information
- num_classes (int):
number of classes for the network. 0 for regression, 1 for binary classification.
- number_of_fold (int):
For a list of input files (repetitions), the function will predict the output of the first number_of_fold repetitions. If number_of_fold is None, the function will predict the output of all repetitions.
For a single input file (cross-validation), the function will predict the output of the k-fold cross-validation. If number_of_fold is None, the function will predict the output of the LOOCV.
default=None
- change_weight_for_each_fold (boolean):
If True, the weight will be changed for each fold (repetition). For example, if the given weight's name is weight.h5, then weight_0.h5 will be loaded for the first fold (repetition). If False, the weight path in path_info will be used for the whole prediction. For example, if the given weight's name is weight_0.h5, then weight_0.h5 will be used for every fold (repetition). default=False
- get_y (boolean):
If True, the function will provide a list of tuples (prediction, true output) as output. default=False
- tree_level_list (list):
name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']
- max_queue_size (int):
default=10
- workers (int):
default=1
- use_multiprocessing (boolean):
default=False
- verbose (boolean):
show the log if True default=True
- Returns
- prediction (numpy array):
prediction using whole dataset in the data path
Examples
Prediction by the pre-trained deep neural network with phylogenetic tree weight regularizer.
prediction = deepbiome_prediction(log, network_info, path_info, num_classes)
For LOOCV prediction, we can use these options:
prediction = deepbiome_prediction(log, network_info, path_info, num_classes, number_of_fold=None, change_weight_for_each_fold=True)
- deepbiome.deepbiome.deepbiome_get_trained_weight(log, network_info, path_info, num_classes, weight_path, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], verbose=True)
Function for getting the trained weights of the deep neural network with phylogenetic tree weight regularizer.
- Parameters
- log (logging instance) :
python logging instance for logging
- network_info (dictionary) :
python dictionary with network_information
- path_info (dictionary):
python dictionary with path_information
- num_classes (int):
number of classes for the network. 0 for regression, 1 for binary classification.
- weight_path (string):
path of the model weight
- tree_level_list (list):
name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']
- verbose (boolean):
show the log if True default=True
- Returns
- list of pandas dataframe:
the trained model’s weight
Examples
Trained weight of the deep neural network with phylogenetic tree weight regularizer.
tree_weight_list = deepbiome_get_trained_weight(log, network_info, path_info, num_classes, weight_path)
- deepbiome.deepbiome.deepbiome_taxa_selection_performance(log, network_info, path_info, num_classes, true_tree_weight_list, trained_weight_path_list, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], lvl_category_dict=None, verbose=True)
Function for evaluating the taxa selection performance of the trained deep neural network with phylogenetic tree weight regularizer.
- Parameters
- log (logging instance) :
python logging instance for logging
- network_info (dictionary) :
python dictionary with network_information
- path_info (dictionary):
python dictionary with path_information
- num_classes (int):
number of classes for the network. 0 for regression, 1 for binary classification.
- true_tree_weight_list (ndarray):
list of the true weight information with the shape of (k folds, number of weights). true_tree_weight_list[0][0] is the true weight information between the first and second layers for the first fold. It is a numpy array with the shape of (number of nodes for the first layer, number of nodes for the second layer).
- trained_weight_path_list (list):
list of the paths of the trained weights for each fold.
- tree_level_list (list):
name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']
- verbose (boolean):
show the log if True default=True
- Returns
- summary (numpy array):
summary of the taxa selection performance
Examples
The taxa selection performance of the trained deep neural network with phylogenetic tree weight regularizer.
summary = deepbiome_taxa_selection_performance(log, network_info, path_info, num_classes, true_tree_weight_list, trained_weight_path_list)
- deepbiome.deepbiome.deepbiome_draw_phylogenetic_tree(log, network_info, path_info, num_classes, file_name='%%inline', img_w=500, branch_vertical_margin=20, arc_start=0, arc_span=360, node_name_on=True, name_fsize=10, tree_weight_on=True, tree_weight=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], weight_opacity=0.4, weight_max_radios=10, phylum_background_color_on=True, phylum_color=[], phylum_color_legend=False, show_covariates=True, verbose=True)
Draw phylogenetic tree
- Parameters
- log (logging instance) :
python logging instance for logging
- network_info (dictionary) :
python dictionary with network_information
- path_info (dictionary):
python dictionary with path_information
- num_classes (int):
number of classes for the network. 0 for regression, 1 for binary classification.
- file_name (str):
name of the file for saving the figure, ending in ".png" or ".jpg", or "%%inline" for notebook inline output. default="%%inline"
- img_w (int):
image width (pt) default=500
- branch_vertical_margin (int):
vertical margin for branch default=20
- arc_start (int):
angle at which the arc starts default=0
- arc_span (int):
total amount of angle for the arc span default=360
- node_name_on (boolean):
show the name of the last leaf node if True default=True
- name_fsize (int):
font size for the name of the last leaf node default=10
- tree_weight_on (boolean):
show the amount and the direction of the weight for each edge in the tree by circle size and color. default=True
- tree_weight (ndarray):
reference tree weights default=None
- tree_level_list (list):
name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']
- weight_opacity (float):
opacity for weight circle default=0.4
- weight_max_radios (int):
maximum radius for weight circle default=10
- phylum_background_color_on (boolean):
show the background color for each phylum based on phylum_color. default=True
- phylum_color (list):
specify the list of background colors for phylum level. If phylum_color is empty, colors are assigned to each phylum arbitrarily. default=[]
- phylum_color_legend (boolean):
show the legend for the background colors for phylum level default= False
- show_covariates (boolean):
show the effect of the covariates default= True
- verbose (boolean):
show the log if True default=True
Examples
Draw phylogenetic tree
deepbiome_draw_phylogenetic_tree(log, network_info, path_info, num_classes, file_name="%%inline")