Usage

Preparing the dataset

We assume that you have already prepared the dataset (see Data preprocessing).

Preparing the configuration

To configure the training process and the path information, you can either pass dictionaries directly to the functions or load those dictionaries from configuration files.

Using the configuration dictionary

You can use the network configuration dictionary network_info and the path configuration dictionary path_info for configuration.

See Configuration.

Using the configuration file (.cfg)

You can also provide the configuration as .cfg files (for example, path_info.cfg and network_info.cfg).

See Configuration.

You can then load the dictionaries from the configuration files as follows:

# for python 3.x
import logging

from deepbiome import configuration

log = logging.getLogger()  # any logging instance works; see Logging below

# Load the path information (path_info.cfg) into a dictionary
config_data = configuration.Configurator('./path_info.cfg', log)
config_data.set_config_map(config_data.get_section_map())
config_data.print_config_map()
path_info = config_data.get_config_map()

# Load the training hyperparameters (network_info.cfg) into a dictionary
config_network = configuration.Configurator('./network_info.cfg', log)
config_network.set_config_map(config_network.get_section_map())
config_network.print_config_map()
network_info = config_network.get_config_map()
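Both dictionaries are nested: each section name maps to a dictionary of option names and values. If the .cfg files follow the standard INI layout (an assumption; the Configurator class may do additional processing), Python's built-in configparser module produces the same kind of nested dictionary. The sketch below parses in-memory text, and its section and key names (data_info, model_info, data_path, and so on) are placeholders; the real names are listed on the Configuration page.

```python
import configparser

# Hypothetical contents of a path_info.cfg file in INI format (placeholder keys).
CFG_TEXT = """
[data_info]
data_path = ./data/
tree_info_path = ./data/tree.csv

[model_info]
model_dir = ./result/
weight = weight.h5
"""

def cfg_to_dict(text):
    """Parse INI-style text into {section: {option: value}}; values stay strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}

path_info = cfg_to_dict(CFG_TEXT)
print(path_info["model_info"]["weight"])  # weight.h5
```

For a file on disk, `parser.read('./path_info.cfg')` takes the place of `read_string`. Note that configparser lowercases option names by default.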

Logging

For logging, you can use a standard Python logging instance. For example:

# for python 3.x
import logging

logging.basicConfig(format='[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s',
                    level=logging.DEBUG)

log = logging.getLogger()

The logging instance log is then passed as the first argument of the main training function below.

For more information about the logging module, see the Python logging documentation.

Training

To use DeepBiome:

# for python 3.x
from deepbiome import deepbiome

test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info)

To train the network with k-fold cross-validation, set number_of_fold to k. For example, to run 5-fold cross-validation:

# for python 3.x
from deepbiome import deepbiome

test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=5)

With a single input file, this runs 5-fold cross-validation on all data in that file. With a list of input files, it runs the training on each of the first k files.

By default, number_of_fold=None, and the code runs leave-one-out cross-validation (LOOCV).

With a single input file, LOOCV uses all data in that file. With a list of input files, the training is repeated for every file.
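The rules above for number_of_fold can be summarized in a small sketch. The helpers plan_runs and kfold_test_indices are hypothetical and only describe which runs happen; how deepbiome actually splits samples (shuffling, index files, ordering) is not specified here, so the contiguous split below is an assumption.

```python
def plan_runs(inputs, n_samples=None, number_of_fold=None):
    """Hypothetical helper: list the runs implied by number_of_fold.

    inputs: one file name (str) or a list of file names.
    n_samples: number of samples in the single file (needed for LOOCV).
    """
    if isinstance(inputs, (list, tuple)):
        # A list of input files: one training run (repetition) per file.
        files = list(inputs) if number_of_fold is None else list(inputs)[:number_of_fold]
        return [("file", name) for name in files]
    # A single input file: cross-validation over its samples.
    if number_of_fold is None:
        if n_samples is None:
            raise ValueError("LOOCV needs the number of samples")
        number_of_fold = n_samples  # LOOCV: one fold per sample
    return [("fold", i) for i in range(number_of_fold)]

def kfold_test_indices(n_samples, k):
    """Assumed split: indices 0..n_samples-1 in k contiguous, near-equal test folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

print(plan_runs(["s0.csv", "s1.csv", "s2.csv"], number_of_fold=2))
print(kfold_test_indices(7, 3))
```

With number_of_fold equal to the number of samples, kfold_test_indices reduces to one sample per fold, which is exactly LOOCV.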

Testing

If you want to test the pre-trained model, you can use the deepbiome.deepbiome_test function.

# for python 3.x
from deepbiome import deepbiome

evaluation = deepbiome.deepbiome_test(log, network_info, path_info, number_of_fold=None)

If you use an index file, this function evaluates each fold on its test indices (the samples not included in the index file). Otherwise, it evaluates the model on all samples. If number_of_fold is set to k, the function tests the model only on the first k folds.

This function provides the evaluation result as a numpy array with a shape of (number of folds, number of evaluation measures).
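Because the result has one row per fold and one column per evaluation measure, summaries across folds are column-wise reductions. The array below holds dummy values standing in for real output; which measure each column holds depends on your network_info and is not assumed here.

```python
import numpy as np

# Dummy values shaped like the output of deepbiome_test:
# (number of folds, number of evaluation measures).
evaluation = np.array([
    [0.80, 0.70],
    [0.90, 0.60],
    [0.85, 0.65],
])

n_folds, n_measures = evaluation.shape
mean_per_measure = evaluation.mean(axis=0)  # average over folds
std_per_measure = evaluation.std(axis=0)    # spread across folds

for j in range(n_measures):
    print(f"measure {j}: mean={mean_per_measure[j]:.3f}, std={std_per_measure[j]:.3f}")
```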

Prediction

If you want to predict the output using the pre-trained model, you can use the deepbiome.deepbiome_prediction function.

# for python 3.x
from deepbiome import deepbiome

prediction = deepbiome.deepbiome_prediction(log, network_info, path_info, num_classes=1, number_of_fold=None, change_weight_for_each_fold=False)

If number_of_fold is set to k, the function predicts the outputs for the samples of the first k folds.

If change_weight_for_each_fold is False, the function predicts the output of every repetition with the same weights, loaded from the given path. If change_weight_for_each_fold is True, it uses the weights of each fold.

If get_y=True, the function returns (prediction, true output) pairs as a numpy array with the shape (n_samples, 2, n_classes). If get_y=False (the default), it returns only the predictions, as a numpy array with the shape (n_samples, n_classes).
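With get_y=True, the predictions and true outputs can be separated by indexing the second axis. The sketch assumes index 0 holds the prediction and index 1 the true output (following the stated pair order), and, for the binary case with num_classes=1, that predictions are probabilities cut at a hypothetical 0.5 threshold. The array holds dummy values.

```python
import numpy as np

# Dummy output shaped like deepbiome_prediction(..., get_y=True) for binary
# classification: (n_samples, 2, n_classes) with n_classes = 1.
output = np.array([
    [[0.9], [1.0]],
    [[0.2], [0.0]],
    [[0.6], [0.0]],
    [[0.4], [1.0]],
])

predictions = output[:, 0, :]  # shape (n_samples, n_classes)
true_y = output[:, 1, :]       # shape (n_samples, n_classes)

# Assumed: probabilities, thresholded at 0.5 to obtain class labels.
predicted_labels = (predictions >= 0.5).astype(float)
accuracy = float((predicted_labels == true_y).mean())
print(predictions.shape, true_y.shape, accuracy)
```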

Cheatsheet for running the project from the console

Preprocess the data (convert the raw data into a format readable by Python). See Data preprocessing.
Example: TODO (Julia)
Set the configuration files for the training hyperparameters and the path information:
1. Set the training hyperparameters (network_info.cfg): (See Configuration.)
  Example: https://github.com/Young-won/deepbiome/tree/master/examples/simulation_s0/simulation_s0_deepbiome//config/network_info.cfg
2. Set the path information (path_info.cfg): (See Configuration.)
  Example: https://github.com/Young-won/deepbiome/tree/master/examples/simulation_s0/simulation_s0_deepbiome/config/path_info.cfg

Write a Python script that runs the function deepbiome.deepbiome_train.

Example: https://github.com/Young-won/deepbiome/tree/master/examples/main.py

Example of the python script:

import sys

from deepbiome import configuration
from deepbiome import logging_daily
from deepbiome import deepbiome
from deepbiome.utils import argv_parse

# Argument ##########################################################
# Optional arguments fall back to defaults when missing or malformed
argdict = argv_parse(sys.argv)
try: gpu_memory_fraction = float(argdict['gpu_memory_fraction'][0])
except (KeyError, IndexError, ValueError): gpu_memory_fraction = None  # not used by deepbiome_train
try: max_queue_size = int(argdict['max_queue_size'][0])
except (KeyError, IndexError, ValueError): max_queue_size = 10
try: workers = int(argdict['workers'][0])
except (KeyError, IndexError, ValueError): workers = 1
try: use_multiprocessing = argdict['use_multiprocessing'][0] == 'True'
except (KeyError, IndexError): use_multiprocessing = False

# Logger ###########################################################
logger = logging_daily.logging_daily(argdict['log_info'][0])
logger.reset_logging()
log = logger.get_logging()
log.setLevel(logging_daily.logging.INFO)

log.info('Argument input')
for argname, arg in argdict.items():
    log.info('    {}:{}'.format(argname, arg))

# Configuration ####################################################
config_data = configuration.Configurator(argdict['path_info'][0], log)
config_data.set_config_map(config_data.get_section_map())
config_data.print_config_map()

config_network = configuration.Configurator(argdict['network_info'][0], log)
config_network.set_config_map(config_network.get_section_map())
config_network.print_config_map()

path_info = config_data.get_config_map()
network_info = config_network.get_config_map()

# Training #########################################################
# Pass the parsed queue/worker settings on to the training function
test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(
    log, network_info, path_info,
    max_queue_size=max_queue_size, workers=workers,
    use_multiprocessing=use_multiprocessing)
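The script treats deepbiome.utils.argv_parse as a black box. Judging from the `argdict['...'][0]` lookups, it appears to map each `--name=value` argument to a list of values. The hypothetical parse_argv below is a stand-in with that assumed behavior, handy for checking what a command line such as the one in run.sh yields; it is not the library's implementation.

```python
def parse_argv(argv):
    """Hypothetical stand-in for argv_parse: '--name=value' -> {'name': ['value']}."""
    argdict = {}
    for arg in argv[1:]:                      # skip the script name
        if not arg.startswith("--") or "=" not in arg:
            continue                          # ignore anything not in --name=value form
        name, value = arg[2:].split("=", 1)
        argdict.setdefault(name, []).append(value)  # repeated flags accumulate
    return argdict

argv = ["main.py", "--log_info=config/log_info.yaml",
        "--workers=10", "--use_multiprocessing=False"]
argdict = parse_argv(argv)
workers = int(argdict["workers"][0])
use_multiprocessing = argdict["use_multiprocessing"][0] == "True"
print(argdict, workers, use_multiprocessing)
```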

Check the available GPUs (if no GPU is available, the code runs on the CPU):
nvidia-smi

Select the GPUs and the number of CPU workers in the bash file:

Example: https://github.com/Young-won/deepbiome/tree/master/examples/simulation_s0/simulation_s0_deepbiome/run.sh

Example of the bash file (.sh):

export CUDA_VISIBLE_DEVICES=2
echo $CUDA_VISIBLE_DEVICES

model=${PWD##*/}
echo $model

python3 ../../main.py --log_info=config/log_info.yaml --path_info=config/path_info.cfg --network_info=config/network_info.cfg  --max_queue_size=50 --workers=10 --use_multiprocessing=False

Run the bash file!
./run.sh

Summary

To use deepbiome in a project:

from deepbiome import deepbiome

deepbiome.deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)

Function for training the deep neural network with phylogenetic tree weight regularizer.

It uses microbiome abundance data as input and uses the phylogenetic taxonomy to guide the decision of the optimal number of layers and neurons in the deep learning architecture.
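To make the idea concrete, here is a simplified sketch (not deepbiome's code) of how a taxonomy can fix the architecture: each level in tree_level_list becomes a layer whose width is the number of distinct taxa at that level, and a binary mask records which child taxon belongs to which parent. The toy taxonomy is made up, and whether deepbiome applies such a mask as a hard constraint or through its phylogenetic tree weight regularizer is not shown here.

```python
import numpy as np

TREE_LEVELS = ["Genus", "Family", "Order", "Class", "Phylum"]

# Toy taxonomy: one row per genus, listing its ancestors at each level.
taxonomy = [
    {"Genus": "g1", "Family": "f1", "Order": "o1", "Class": "c1", "Phylum": "p1"},
    {"Genus": "g2", "Family": "f1", "Order": "o1", "Class": "c1", "Phylum": "p1"},
    {"Genus": "g3", "Family": "f2", "Order": "o2", "Class": "c1", "Phylum": "p1"},
    {"Genus": "g4", "Family": "f3", "Order": "o3", "Class": "c2", "Phylum": "p2"},
]

def layer_nodes(taxonomy, levels):
    """Distinct taxa per level, in order of first appearance (one neuron each)."""
    return {lvl: list(dict.fromkeys(row[lvl] for row in taxonomy)) for lvl in levels}

def connection_masks(taxonomy, levels):
    """mask[i, j] = 1 if child taxon i belongs to parent taxon j at the next level."""
    nodes = layer_nodes(taxonomy, levels)
    masks = []
    for child, parent in zip(levels[:-1], levels[1:]):
        mask = np.zeros((len(nodes[child]), len(nodes[parent])))
        for row in taxonomy:
            mask[nodes[child].index(row[child]), nodes[parent].index(row[parent])] = 1.0
        masks.append(mask)
    return masks

nodes = layer_nodes(taxonomy, TREE_LEVELS)
masks = connection_masks(taxonomy, TREE_LEVELS)
print([len(nodes[lvl]) for lvl in TREE_LEVELS])  # [4, 3, 3, 2, 2]
print([m.shape for m in masks])  # [(4, 3), (3, 3), (3, 2), (2, 2)]
```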

Parameters

log (logging instance) :: python logging instance for logging
network_info (dictionary) :: python dictionary with network_information
path_info (dictionary):: python dictionary with path_information
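number_of_fold (int):: number of cross-validation folds k. If None, LOOCV is run for a single input file, and the training is repeated for every file for a list of input files. default=None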
tree_level_list (list):: name of each level of the given reference tree weights default=[‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’]
max_queue_size (int):: default=10
workers (int):: default=1
use_multiprocessing (boolean):: default=False
verbose (boolean):: show the log if True default=True

Returns

test_evaluation (numpy array):: numpy array of the evaluation on the test set for all folds
train_evaluation (numpy array):: numpy array of the evaluation on the training set for all folds
network (deepbiome network instance):: deepbiome class instance

Examples

Training the deep neural network with phylogenetic tree weight regularizer.

test_evaluation, train_evaluation, network = deepbiome_train(log, network_info, path_info)

deepbiome.deepbiome.deepbiome_test(log, network_info, path_info, number_of_fold=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)

Function for testing the pretrained deep neural network with phylogenetic tree weight regularizer.

If you use an index file, this function evaluates each fold on its test indices (the samples not included in the index file). Otherwise, it evaluates the model on all samples.

Parameters

log (logging instance) :: python logging instance for logging
network_info (dictionary) :: python dictionary with network_information
path_info (dictionary):: python dictionary with path_information
number_of_fold (int):: If number_of_fold is set to k, the function tests the model only on the first k folds. default=None
tree_level_list (list):: name of each level of the given reference tree weights default=[‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’]
max_queue_size (int):: default=10
workers (int):: default=1
use_multiprocessing (boolean):: default=False
verbose (boolean):: show the log if True default=True

Returns

evaluation (numpy array):: evaluation result on the test set, as a numpy array with the shape (number of folds, number of evaluation measures)

Examples

Test the pre-trained deep neural network with phylogenetic tree weight regularizer.

evaluation = deepbiome_test(log, network_info, path_info)

deepbiome.deepbiome.deepbiome_prediction(log, network_info, path_info, num_classes, number_of_fold=None, change_weight_for_each_fold=False, get_y=False, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)

Function for prediction by the pretrained deep neural network with phylogenetic tree weight regularizer.

Parameters

log (logging instance) :: python logging instance for logging
network_info (dictionary) :: python dictionary with network_information
path_info (dictionary):: python dictionary with path_information
num_classes (int):: number of classes for the network. 0 for regression, 1 for binary classification.
number_of_fold (int):: For a list of input files (repetitions), the function predicts the output of the first number_of_fold repetitions; if number_of_fold is None, it predicts the output of all repetitions. For a single input file (cross-validation), the function predicts the output of the k-fold cross-validation; if number_of_fold is None, it predicts the output of the LOOCV. default=None
change_weight_for_each_fold (boolean):: If True, the weights are changed for each fold (repetition). For example, if the given weight file is weight.h5, then weight_0.h5 is loaded for the first fold (repetition). If False, the weight path in path_info is used for the whole prediction. For example, if the given weight file is weight_0.h5, then weight_0.h5 is used for every fold (repetition). default=False
get_y (boolean):: If True, the function returns (prediction, true output) pairs as the output. default=False
tree_level_list (list):: name of each level of the given reference tree weights default=[‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’]
max_queue_size (int):: default=10
workers (int):: default=1
use_multiprocessing (boolean):: default=False
verbose (boolean):: show the log if True default=True

Returns

prediction (numpy array):: predictions for the whole dataset in the data path

Examples

Prediction by the pre-trained deep neural network with phylogenetic tree weight regularizer.

prediction = deepbiome_prediction(log, network_info, path_info, num_classes)

For LOOCV prediction with the weights of each fold, use these options:

prediction = deepbiome_prediction(log, network_info, path_info, num_classes, number_of_fold=None, change_weight_for_each_fold=True)
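The per-fold weight-file naming described for change_weight_for_each_fold (weight.h5 becomes weight_0.h5 for the first fold) can be reproduced with a small helper, for instance to check that all expected files exist before calling deepbiome_prediction. The helper name is hypothetical; the rule comes from the parameter description above.

```python
import os

def fold_weight_path(weight_path, fold):
    """Hypothetical helper: ('weight.h5', 0) -> 'weight_0.h5'; keeps the directory."""
    root, ext = os.path.splitext(weight_path)
    return f"{root}_{fold}{ext}"

# change_weight_for_each_fold=True: one weight file per fold (repetition).
expected = [fold_weight_path("./result/weight.h5", k) for k in range(3)]
print(expected)  # ['./result/weight_0.h5', './result/weight_1.h5', './result/weight_2.h5']
missing = [path for path in expected if not os.path.exists(path)]
```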

deepbiome.deepbiome.deepbiome_get_trained_weight(log, network_info, path_info, num_classes, weight_path, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], verbose=True)

Function for getting the trained weights of the pretrained deep neural network with phylogenetic tree weight regularizer.

Parameters

log (logging instance) :: python logging instance for logging
network_info (dictionary) :: python dictionary with network_information
path_info (dictionary):: python dictionary with path_information
num_classes (int):: number of classes for the network. 0 for regression, 1 for binary classification.
weight_path (string):: path of the model weight
tree_level_list (list):: name of each level of the given reference tree weights default=[‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’]
verbose (boolean):: show the log if True default=True

Returns

list of pandas DataFrames:: the trained model’s weights

Examples

Trained weight of the deep neural network with phylogenetic tree weight regularizer.

tree_weight_list = deepbiome_get_trained_weight(log, network_info, path_info, num_classes, weight_path)
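Because the result is a list of pandas DataFrames, ordinary pandas operations apply. The sketch below assumes (not confirmed here) that each DataFrame is the weight matrix between two consecutive tree levels, with child taxa as rows and parent taxa as columns, and lists the entries with the largest magnitude. The toy DataFrame stands in for one element of tree_weight_list.

```python
import pandas as pd

def top_edges(weight_df, k=3):
    """Return the k entries with the largest absolute weight, keeping their sign."""
    stacked = weight_df.stack()              # Series indexed by (row, column)
    top = stacked.abs().nlargest(k).index    # positions of the largest |w|
    return stacked.reindex(top)

# Toy stand-in for one element of tree_weight_list (Genus -> Family).
genus_to_family = pd.DataFrame(
    [[0.8, 0.0], [-1.2, 0.1], [0.0, 0.3]],
    index=["g1", "g2", "g3"],
    columns=["f1", "f2"],
)
result = top_edges(genus_to_family, k=2)
print(result)
```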

deepbiome.deepbiome.deepbiome_taxa_selection_performance(log, network_info, path_info, num_classes, true_tree_weight_list, trained_weight_path_list, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], lvl_category_dict=None, verbose=True)

Function for evaluating the taxa selection performance of the pretrained deep neural network with phylogenetic tree weight regularizer.

Parameters

log (logging instance) :: python logging instance for logging
network_info (dictionary) :: python dictionary with network_information
path_info (dictionary):: python dictionary with path_information
num_classes (int):: number of classes for the network. 0 for regression, 1 for binary classification.
true_tree_weight_list (ndarray):: lists of the true weight information with the shape (k folds, number of weights). For example, true_tree_weight_list[0][0] is the true weight information between the first and second layers for the first fold: a numpy array with the shape (number of nodes in the first layer, number of nodes in the second layer).
trained_weight_path_list (list):: list of the paths of the trained weights for each fold.
tree_level_list (list):: name of each level of the given reference tree weights default=[‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’]
verbose (boolean):: show the log if True default=True

Returns

summary (numpy array):: summary of the taxa selection performance

Examples

The taxa selection performance of the trained deep neural network with phylogenetic tree weight regularizer.

summary = deepbiome_taxa_selection_performance(log, network_info, path_info, num_classes, true_tree_weight_list, trained_weight_path_list)
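The measures inside summary are not spelled out here. As an illustration of what taxa selection performance can mean (an assumption, not deepbiome's formula), the sketch treats an edge as truly relevant where the true weight is non-zero and as selected where the trained weight's magnitude exceeds a hypothetical threshold, then computes sensitivity and specificity for one layer. The arrays follow the (nodes in one layer, nodes in the next layer) shape described for true_tree_weight_list[0][0].

```python
import numpy as np

def selection_metrics(true_weight, trained_weight, threshold=1e-3):
    """Sensitivity and specificity of edge selection for one layer."""
    truth = np.abs(true_weight) > 0
    selected = np.abs(trained_weight) > threshold   # hypothetical cut-off
    tp = np.sum(truth & selected)                   # relevant and selected
    tn = np.sum(~truth & ~selected)                 # irrelevant and not selected
    sensitivity = tp / truth.sum() if truth.any() else float("nan")
    specificity = tn / (~truth).sum() if (~truth).any() else float("nan")
    return float(sensitivity), float(specificity)

true_w = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
trained_w = np.array([[0.5, 0.0], [0.2, 0.0], [0.0, 0.0]])
print(selection_metrics(true_w, trained_w))  # (0.5, 0.75)
```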

deepbiome.deepbiome.deepbiome_draw_phylogenetic_tree(log, network_info, path_info, num_classes, file_name='%%inline', img_w=500, branch_vertical_margin=20, arc_start=0, arc_span=360, node_name_on=True, name_fsize=10, tree_weight_on=True, tree_weight=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], weight_opacity=0.4, weight_max_radios=10, phylum_background_color_on=True, phylum_color=[], phylum_color_legend=False, show_covariates=True, verbose=True)

Draw phylogenetic tree

Parameters

log (logging instance) :: python logging instance for logging
network_info (dictionary) :: python dictionary with network_information
path_info (dictionary):: python dictionary with path_information
num_classes (int):: number of classes for the network. 0 for regression, 1 for binary classification.
file_name (str):: name of the file in which to save the figure (e.g. “.png” or “.jpg”), or “%%inline” for inline output in a notebook. default=“%%inline”
img_w (int):: image width (pt) default=500
branch_vertical_margin (int):: vertical margin for branch default=20
arc_start (int):: angle at which the arc starts default=0
arc_span (int):: total angle spanned by the arc default=360
node_name_on (boolean):: show the name of the last leaf node if True default=True
name_fsize (int):: font size for the name of the last leaf node default=10
tree_weight_on (boolean):: show the amount and the direction of the weight for each edge in the tree by circle size and color. default=True
tree_weight (ndarray):: reference tree weights default=None
tree_level_list (list):: name of each level of the given reference tree weights default=[‘Genus’, ‘Family’, ‘Order’, ‘Class’, ‘Phylum’]
weight_opacity (float):: opacity for weight circle default= 0.4
weight_max_radios (int):: maximum radius for weight circle default= 10
phylum_background_color_on (boolean):: show the background color for each phylum based on phylum_color. default= True
phylum_color (list):: specify the list of background colors for phylum level. If phylum_color is empty, a color is assigned arbitrarily to each phylum. default= []
phylum_color_legend (boolean):: show the legend for the background colors for phylum level default= False
show_covariates (boolean):: show the effect of the covariates default= True
verbose (boolean):: show the log if True default=True

Returns

Examples

Draw phylogenetic tree

deepbiome_draw_phylogenetic_tree(log, network_info, path_info, num_classes, file_name='%%inline')