Usage

Preparing the dataset

We assume that you have already prepared the dataset.

See Data preprocessing.

Preparing the configuration

To provide configurations for the training process and path information, we can use either dictionaries as input to functions or configuration files.

Using the configuration dictionary

You can use the network configuration dictionary network_info and the path configuration dictionary path_info for configuration.

See Configuration.

Using the configuration file (.cfg)

You can use configuration files in the .cfg format (for example, path_info.cfg and network_info.cfg).

See Configuration.

You can then load the dictionaries from the configuration files as follows:

# for python 3.x
# `log` is a logging instance; see the Logging section below.
from deepbiome import configuration

config_data = configuration.Configurator('./path_info.cfg', log)
config_data.set_config_map(config_data.get_section_map())
config_data.print_config_map()
path_info = config_data.get_config_map()

config_network = configuration.Configurator('./network_info.cfg', log)
config_network.set_config_map(config_network.get_section_map())
config_network.print_config_map()
network_info = config_network.get_config_map()
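
As a rough illustration of the idea only (this is not how Configurator is implemented), the standard-library configparser module can turn .cfg text into a nested dictionary of strings. The section and key names below are assumptions made for the sketch; the real ones are listed in Configuration.

```python
import configparser

# Hypothetical file contents; the real section and key names are in Configuration.
EXAMPLE_CFG = """
[model_info]
optimizer = adam
lr = 0.01

[training_info]
epochs = 10
batch_size = 50
"""


def cfg_to_dict(text):
    """Parse .cfg text into a {section: {key: value}} dictionary of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


network_info_sketch = cfg_to_dict(EXAMPLE_CFG)
print(network_info_sketch["training_info"]["epochs"])  # values stay strings
```

Note that every value is read as a string, so numeric settings have to be converted by whatever code consumes the dictionary.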

Logging

For logging, we use an instance from the Python logging module. For example:

# for python 3.x
import logging

logging.basicConfig(format = '[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s',
                level=logging.DEBUG)

log = logging.getLogger()

We can pass the logging instance log as an input to the main training function below.

For more information about the logging module, see the Python logging documentation.
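
To see what the format string above produces, the following sketch sends one record to an in-memory stream instead of the console. The logger name demo and the message are made up for the illustration.

```python
import io
import logging

# Capture the formatted record in memory so it can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    '[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s'))

demo_log = logging.getLogger('demo')  # hypothetical logger name
demo_log.setLevel(logging.DEBUG)
demo_log.addHandler(handler)
demo_log.propagate = False  # keep the demo record away from the root logger

demo_log.info('training started')
formatted = stream.getvalue()
print(formatted)
```

The logger name is left-aligned and padded to eight characters, followed by the level, the source file and line number, and the message.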

Training

To train the network, use the deepbiome.deepbiome_train function:

# for python 3.x
from deepbiome import deepbiome

test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info)

To train the network with k-fold cross-validation, set number_of_fold to k. For example, to run 5-fold cross-validation:

# for python 3.x
from deepbiome import deepbiome

test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=5)

If you use one input file, the function runs 5-fold cross-validation using all the data in that file. If you use a list of input files, it runs the training on the first k files only.

By default, number_of_fold=None, and the function runs leave-one-out cross-validation (LOOCV).

In that case, if you use one input file, the function runs LOOCV using all the data in that file. If you use a list of input files, it repeats the training for every file.
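
The difference between k-fold cross-validation and LOOCV on a single input file can be sketched with plain numpy. This is only an illustration of the two splitting schemes, not necessarily how deepbiome builds its folds; the helper fold_indices is hypothetical, and it does not shuffle the samples.

```python
import numpy as np


def fold_indices(n_samples, number_of_fold=None):
    """Return a list of (train_idx, test_idx) pairs.

    number_of_fold=None gives leave-one-out (one fold per sample).
    """
    k = n_samples if number_of_fold is None else number_of_fold
    indices = np.arange(n_samples)
    test_parts = np.array_split(indices, k)  # k nearly equal test sets
    return [(np.setdiff1d(indices, test_idx), test_idx) for test_idx in test_parts]


loocv = fold_indices(6)                         # 6 folds, 1 test sample each
five_fold = fold_indices(10, number_of_fold=5)  # 5 folds, 2 test samples each
print(len(loocv), len(five_fold))  # 6 5
```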

Testing

If you want to test the pre-trained model, you can use the deepbiome.deepbiome_test function.

# for python 3.x
from deepbiome import deepbiome

evaluation = deepbiome.deepbiome_test(log, network_info, path_info, number_of_fold=None)

If you use an index file, this function evaluates each fold on its test index (the indices not included in the index file). Otherwise, it evaluates on all samples. If number_of_fold is set to k, the function tests the model on the first k folds only.

The function returns the evaluation result as a numpy array with shape (number of folds, number of evaluation measures).
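
Because each row is a fold and each column is an evaluation measure, the result can be summarized across folds with ordinary numpy reductions. The sketch below uses a placeholder array in place of real output; its values are not actual results.

```python
import numpy as np

# Placeholder standing in for the array returned by deepbiome_test:
# 5 folds (rows) x 3 evaluation measures (columns). Not real results.
evaluation = np.arange(15, dtype=float).reshape(5, 3) / 15.0

mean_per_measure = evaluation.mean(axis=0)  # average over folds
std_per_measure = evaluation.std(axis=0)    # spread over folds
print(mean_per_measure.shape, std_per_measure.shape)  # (3,) (3,)
```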

Prediction

If you want to predict the output using the pre-trained model, you can use the deepbiome.deepbiome_prediction function.

# for python 3.x
from deepbiome import deepbiome

prediction = deepbiome.deepbiome_prediction(log, network_info, path_info, num_classes=1, number_of_fold=None, change_weight_for_each_fold=False)

If number_of_fold is set to k, the function predicts the output for the samples of the first k folds.

If change_weight_for_each_fold is False, the function predicts the output of every repetition with the same weight file from the given path. If change_weight_for_each_fold is True, the function predicts the output of each fold with that fold's own weight file.

If get_y=True, the function returns the (prediction, true output) pairs as a numpy array with shape (n_samples, 2, n_classes). If get_y=False, it returns the predictions only, as a numpy array with shape (n_samples, n_classes).
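
The get_y=True layout can be unpacked by slicing the middle axis. The sketch below uses a placeholder array, and it assumes that index 0 on that axis holds the prediction and index 1 the true output, following the (prediction, true output) order described above.

```python
import numpy as np

n_samples, n_classes = 4, 1

# Placeholder with the get_y=True layout: (n_samples, 2, n_classes).
# Assumption: axis-1 index 0 is the prediction, index 1 the true output.
with_y = np.zeros((n_samples, 2, n_classes))
with_y[:, 1, :] = 1.0

y_pred = with_y[:, 0, :]
y_true = with_y[:, 1, :]
print(y_pred.shape, y_true.shape)  # (4, 1) (4, 1)
```

Each slice has the same shape as the array returned with get_y=False.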

Cheatsheet for running the project on the console

  1. Preprocess the data (convert the raw data to a format readable by Python). (See Data preprocessing.)

    Example: TODO (Julia)

  2. Set the configuration files for the training hyperparameters and the path information:
    1. Set the training hyperparameters (network_info.cfg): (See Configuration.)

      Example: https://github.com/Young-won/deepbiome/tree/master/examples/simulation_s0/simulation_s0_deepbiome//config/network_info.cfg

    2. Set the path information (path_info.cfg): (See Configuration.)

      Example: https://github.com/Young-won/deepbiome/tree/master/examples/simulation_s0/simulation_s0_deepbiome/config/path_info.cfg

  3. Write a Python script that runs the function deepbiome.deepbiome_train:

    Example: https://github.com/Young-won/deepbiome/tree/master/examples/main.py

    Example of the python script:

    import sys
    
    from deepbiome import configuration
    from deepbiome import logging_daily
    from deepbiome import deepbiome
    from deepbiome.utils import argv_parse
    
    # Argument ##########################################################
    argdict = argv_parse(sys.argv)
    try: gpu_memory_fraction = float(argdict['gpu_memory_fraction'][0])
    except: gpu_memory_fraction = None
    try: max_queue_size=int(argdict['max_queue_size'][0])
    except: max_queue_size=10
    try: workers=int(argdict['workers'][0])
    except: workers=1
    try: use_multiprocessing=argdict['use_multiprocessing'][0]=='True'
    except: use_multiprocessing=False
    
    # Logger ###########################################################
    logger = logging_daily.logging_daily(argdict['log_info'][0])
    logger.reset_logging()
    log = logger.get_logging()
    log.setLevel(logging_daily.logging.INFO)
    
    log.info('Argument input')
    for argname, arg in argdict.items():
        log.info('    {}:{}'.format(argname,arg))
    
    # Configuration ####################################################
    config_data = configuration.Configurator(argdict['path_info'][0], log)
    config_data.set_config_map(config_data.get_section_map())
    config_data.print_config_map()
    
    config_network = configuration.Configurator(argdict['network_info'][0], log)
    config_network.set_config_map(config_network.get_section_map())
    config_network.print_config_map()
    
    path_info = config_data.get_config_map()
    network_info = config_network.get_config_map()
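    # Training: pass the parsed queue and worker settings on to deepbiome_train
    test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info, max_queue_size=max_queue_size, workers=workers, use_multiprocessing=use_multiprocessing)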
    
  4. Check the available GPUs (without a GPU, the code runs on the CPU):

    nvidia-smi
    
  5. Select the GPU(s) and the number of CPU workers in the bash file:

    Example: https://github.com/Young-won/deepbiome/tree/master/examples/simulation_s0/simulation_s0_deepbiome/run.sh

    Example of the bash file (.sh):

    export CUDA_VISIBLE_DEVICES=2
    echo $CUDA_VISIBLE_DEVICES
    
    model=${PWD##*/}
    echo $model
    
    python3 ../../main.py --log_info=config/log_info.yaml --path_info=config/path_info.cfg --network_info=config/network_info.cfg  --max_queue_size=50 --workers=10 --use_multiprocessing=False
    
  6. Run the bash file:

    ./run.sh
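
The example script reads every command-line option as argdict['name'][0], which suggests that each option maps to a list of string values. The sketch below is a hypothetical stand-in that parses --key=value arguments in that shape; the real deepbiome.utils.argv_parse may behave differently.

```python
def parse_cli_args(argv):
    """Hypothetical stand-in for argv_parse: '--key=value' -> {'key': ['value']}."""
    parsed = {}
    for token in argv[1:]:  # skip the script name
        if not token.startswith('--') or '=' not in token:
            continue  # ignore anything that is not of the form --key=value
        key, value = token[2:].split('=', 1)
        parsed.setdefault(key, []).append(value)
    return parsed


argv = ['main.py', '--path_info=config/path_info.cfg', '--workers=10',
        '--use_multiprocessing=False']
argdict = parse_cli_args(argv)

# Fall back to a default when an option is missing, then convert from string.
workers = int(argdict.get('workers', ['1'])[0])
use_multiprocessing = argdict.get('use_multiprocessing', ['False'])[0] == 'True'
print(workers, use_multiprocessing)  # 10 False
```

Using dict.get with a default avoids the bare try/except blocks while keeping the same fallback values.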
    

Summary

To use deepbiome in a project:

from deepbiome import deepbiome
deepbiome.deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)

Function for training the deep neural network with phylogenetic tree weight regularizer.

It uses microbiome abundance data as input and uses the phylogenetic taxonomy to guide the decision of the optimal number of layers and neurons in the deep learning architecture.

Parameters
log (logging instance) :

python logging instance for logging

network_info (dictionary) :

python dictionary with network_information

path_info (dictionary):

python dictionary with path_information

number_of_fold (int):

number of folds for cross-validation. If None, leave-one-out cross-validation (LOOCV) is run. default=None

tree_level_list (list):

name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']

max_queue_size (int):

default=10

workers (int):

default=1

use_multiprocessing (boolean):

default=False

verbose (boolean):

show the log if True. default=True

Returns
test_evaluation (numpy array):

numpy array of the evaluation on the test set for all folds

train_evaluation (numpy array):

numpy array of the evaluation on the training set for all folds

network (deepbiome network instance):

deepbiome class instance

Examples

Training the deep neural network with phylogenetic tree weight regularizer.

test_evaluation, train_evaluation, network = deepbiome_train(log, network_info, path_info)

deepbiome.deepbiome.deepbiome_test(log, network_info, path_info, number_of_fold=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)

Function for testing the pretrained deep neural network with phylogenetic tree weight regularizer.

If you use an index file, this function evaluates each fold on its test index (the indices not included in the index file). Otherwise, it evaluates on all samples.

Parameters
log (logging instance) :

python logging instance for logging

network_info (dictionary) :

python dictionary with network_information

path_info (dictionary):

python dictionary with path_information

number_of_fold (int):

If number_of_fold is set to k, the function tests the model on the first k folds only. default=None

tree_level_list (list):

name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']

max_queue_size (int):

default=10

workers (int):

default=1

use_multiprocessing (boolean):

default=False

verbose (boolean):

show the log if True. default=True

Returns
evaluation (numpy array):

evaluation result on the test set as a numpy array with shape (number of folds, number of evaluation measures)

Examples

Test the pre-trained deep neural network with phylogenetic tree weight regularizer.

evaluation = deepbiome_test(log, network_info, path_info)

deepbiome.deepbiome.deepbiome_prediction(log, network_info, path_info, num_classes, number_of_fold=None, change_weight_for_each_fold=False, get_y=False, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], max_queue_size=10, workers=1, use_multiprocessing=False, verbose=True)

Function for prediction by the pretrained deep neural network with phylogenetic tree weight regularizer.

Parameters
log (logging instance) :

python logging instance for logging

network_info (dictionary) :

python dictionary with network_information

path_info (dictionary):

python dictionary with path_information

num_classes (int):

number of classes for the network. 0 for regression, 1 for binary classification.

number_of_fold (int):
  1. For a list of input files (repetitions), the function predicts the output of the first number_of_fold repetitions. If number_of_fold is None, the function predicts the output of all repetitions.

  2. For a single input file (cross-validation), the function predicts the output of the k-fold cross-validation with k = number_of_fold. If number_of_fold is None, the function predicts the output of the LOOCV.

default=None

change_weight_for_each_fold (boolean):

If True, the weight file is changed for each fold (repetition). For example, if the given weight file is named weight.h5, then weight_0.h5 is loaded for the first fold (repetition). If False, the weight path in path_info is used for every prediction. For example, if the given weight file is named weight_0.h5, then weight_0.h5 is used for every fold (repetition). default=False

get_y (boolean):

If True, the function returns the (prediction, true output) pairs as output. default=False

tree_level_list (list):

name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']

max_queue_size (int):

default=10

workers (int):

default=1

use_multiprocessing (boolean):

default=False

verbose (boolean):

show the log if True. default=True

Returns
prediction (numpy array):

predictions for the whole dataset in the data path

Examples

Prediction by the pre-trained deep neural network with phylogenetic tree weight regularizer.

prediction = deepbiome_prediction(log, network_info, path_info, num_classes)

For LOOCV prediction, we can use these options:

prediction = deepbiome_prediction(log, network_info, path_info, num_classes, number_of_fold=None, change_weight_for_each_fold=True)
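
The per-fold weight naming described for change_weight_for_each_fold (weight.h5 becomes weight_0.h5 for the first fold) can be sketched as follows. The helper fold_weight_path is hypothetical and only illustrates the naming pattern; it is not part of deepbiome.

```python
import os


def fold_weight_path(weight_path, fold):
    """Hypothetical helper: insert the fold index before the file extension."""
    root, ext = os.path.splitext(weight_path)
    return '{}_{}{}'.format(root, fold, ext)


paths = [fold_weight_path('weight.h5', fold) for fold in range(3)]
print(paths)  # ['weight_0.h5', 'weight_1.h5', 'weight_2.h5']
```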

deepbiome.deepbiome.deepbiome_get_trained_weight(log, network_info, path_info, num_classes, weight_path, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], verbose=True)

Function for getting the trained weights of the pretrained deep neural network with phylogenetic tree weight regularizer.

Parameters
log (logging instance) :

python logging instance for logging

network_info (dictionary) :

python dictionary with network_information

path_info (dictionary):

python dictionary with path_information

num_classes (int):

number of classes for the network. 0 for regression, 1 for binary classification.

weight_path (string):

path of the model weight

tree_level_list (list):

name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']

verbose (boolean):

show the log if True. default=True

Returns
list of pandas DataFrames:

the trained model's weights

Examples

Trained weight of the deep neural network with phylogenetic tree weight regularizer.

tree_weight_list = deepbiome_get_trained_weight(log, network_info, path_info, num_classes, weight_path)

deepbiome.deepbiome.deepbiome_taxa_selection_performance(log, network_info, path_info, num_classes, true_tree_weight_list, trained_weight_path_list, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], lvl_category_dict=None, verbose=True)

Function for evaluating the taxa selection performance of the pretrained deep neural network with phylogenetic tree weight regularizer.

Parameters
log (logging instance) :

python logging instance for logging

network_info (dictionary) :

python dictionary with network_information

path_info (dictionary):

python dictionary with path_information

num_classes (int):

number of classes for the network. 0 for regression, 1 for binary classification.

true_tree_weight_list (ndarray):

true weight information with the shape of (k folds, number of weights). For example, true_tree_weight_list[0][0] is the true weight information between the first and second layers for the first fold. It is a numpy array with the shape of (number of nodes in the first layer, number of nodes in the second layer).

trained_weight_path_list (list):

list of the paths of the trained weights for each fold.

tree_level_list (list):

name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']

verbose (boolean):

show the log if True. default=True

Returns
summary (numpy array):

summary of the taxa selection performance

Examples

The taxa selection performance of the trained deep neural network with phylogenetic tree weight regularizer.

summary = deepbiome_taxa_selection_performance(log, network_info, path_info, num_classes, true_tree_weight_list, trained_weight_path_list)
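
The nested layout expected for true_tree_weight_list can be sketched with a numpy object array, since the weight matrices of different layers have different shapes. The fold count, the layer sizes, and the all-zero matrices below are placeholders chosen for the illustration, not values from a real tree.

```python
import numpy as np

k_folds = 2
layer_sizes = [6, 4, 2]  # hypothetical node counts for three consecutive levels
n_weights = len(layer_sizes) - 1

# Object array of shape (k folds, number of weights); each cell holds one matrix.
true_tree_weight_list = np.empty((k_folds, n_weights), dtype=object)
for fold in range(k_folds):
    for i in range(n_weights):
        # Placeholder all-zero matrix between layer i and layer i + 1.
        true_tree_weight_list[fold, i] = np.zeros((layer_sizes[i], layer_sizes[i + 1]))

print(true_tree_weight_list.shape)        # (2, 2)
print(true_tree_weight_list[0][0].shape)  # (6, 4)
```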

deepbiome.deepbiome.deepbiome_draw_phylogenetic_tree(log, network_info, path_info, num_classes, file_name='%%inline', img_w=500, branch_vertical_margin=20, arc_start=0, arc_span=360, node_name_on=True, name_fsize=10, tree_weight_on=True, tree_weight=None, tree_level_list=['Genus', 'Family', 'Order', 'Class', 'Phylum'], weight_opacity=0.4, weight_max_radios=10, phylum_background_color_on=True, phylum_color=[], phylum_color_legend=False, show_covariates=True, verbose=True)

Draw phylogenetic tree

Parameters
log (logging instance) :

python logging instance for logging

network_info (dictionary) :

python dictionary with network_information

path_info (dictionary):

python dictionary with path_information

num_classes (int):

number of classes for the network. 0 for regression, 1 for binary classification.

file_name (str):

name of the file in which to save the figure. Use a name ending in ".png" or ".jpg" to save an image file, or "%%inline" for notebook inline output. default="%%inline"

img_w (int):

image width (pt). default=500

branch_vertical_margin (int):

vertical margin for each branch. default=20

arc_start (int):

angle at which the arc starts. default=0

arc_span (int):

total angle spanned by the arc. default=360

node_name_on (boolean):

show the name of the last leaf node if True. default=True

name_fsize (int):

font size for the name of the last leaf node. default=10

tree_weight_on (boolean):

show the amount and the direction of the weight for each edge in the tree by circle size and color. default=True

tree_weight (ndarray):

reference tree weights. default=None

tree_level_list (list):

name of each level of the given reference tree weights. default=['Genus', 'Family', 'Order', 'Class', 'Phylum']

weight_opacity (float):

opacity of the weight circles. default=0.4

weight_max_radios (int):

maximum radius of the weight circles. default=10

phylum_background_color_on (boolean):

show the background color for each phylum based on phylum_color. default=True

phylum_color (list):

list of background colors for the phylum level. If phylum_color is empty, a color is assigned arbitrarily to each phylum. default=[]

phylum_color_legend (boolean):

show the legend for the background colors of the phylum level. default=False

show_covariates (boolean):

show the effect of the covariates. default=True

verbose (boolean):

show the log if True. default=True

Returns

Examples

Draw phylogenetic tree

deepbiome_draw_phylogenetic_tree(log, network_info, path_info, num_classes, file_name="%%inline")