Getting started with the classification problem

Let’s start with a simple first example of a classification problem. Below is a basic configuration for a binary classification problem, using the example data included in the package. For the required data types and more detailed configuration options, please see the description of each option in the documentation and the detailed examples.

[1]:
from pkg_resources import resource_filename
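`pkg_resources` is deprecated in recent versions of setuptools in favor of `importlib.resources`. If it is not available in your environment, the following is a minimal sketch of an equivalent lookup using the standard-library `importlib.resources.files` (Python 3.9+). `resource_path` is a hypothetical helper name, and the sketch assumes the package is installed as regular files on disk (not inside a zip archive), so the returned string is a real filesystem path.

```python
from importlib.resources import files


def resource_path(package: str, relative: str) -> str:
    """Return a filesystem path to `relative` inside an installed package.

    Hypothetical stand-in for pkg_resources.resource_filename; assumes the
    package is installed as plain files, so str() yields a usable path.
    """
    target = files(package)
    # Join one path component at a time so nested paths like 'tests/data' work.
    for part in relative.split('/'):
        target = target / part
    return str(target)
```

With such a helper, `resource_filename('deepbiome', 'tests/data')` could be written as `resource_path('deepbiome', 'tests/data')`.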
[2]:
network_info = {
    'architecture_info': {
        'batch_normalization': 'False',
        'drop_out': '0',
        'weight_initial': 'glorot_uniform',
        'weight_l1_penalty':'0.01',
        'weight_decay': 'phylogenetic_tree',
    },
    'model_info': {
        'decay': '0.001',
        'loss': 'binary_crossentropy',
        'lr': '0.01',
        'metrics': 'binary_accuracy, sensitivity, specificity, gmeasure, auc',
        'network_class': 'DeepBiomeNetwork',
        'normalizer': 'normalize_minmax',
        'optimizer': 'adam',
        'reader_class': 'MicroBiomeClassificationReader',
        'taxa_selection_metrics': 'sensitivity, specificity, gmeasure, accuracy',
    },
    'training_info': {
        'batch_size': '200',
        'epochs': '10',
        'callbacks': 'ModelCheckpoint',
        'monitor': 'val_loss',
        'mode' : 'min',
        'min_delta': '1e-7',
    },
    'validation_info': {
        'batch_size': 'None',
        'validation_size': '0.2'
    },
    'test_info': {
        'batch_size': 'None',
    },
}

path_info = {
    'data_info': {
        'data_path': resource_filename('deepbiome', 'tests/data'),
        'idx_path': resource_filename('deepbiome', 'tests/data/onefile_idx.csv'),
        'tree_info_path': resource_filename('deepbiome', 'tests/data/genus48_dic.csv'),
        'x_path': 'onefile_x.csv',
        'y_path': 'classification_y.csv'
    },
    'model_info': {
        'evaluation': 'eval.npy',
        'history': 'hist.json',
        'model_dir': './',
        'weight': 'weight.h5'
    }
}
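The `metrics` option requests binary_accuracy, sensitivity, specificity, gmeasure and auc. The sketch below shows how the threshold-based metrics are commonly defined from a confusion matrix; it is an illustration, not DeepBiome's own implementation. It assumes a 0.5 decision threshold, treats class 1 as positive, and assumes gmeasure is the geometric mean of sensitivity and specificity (a common definition that is consistent with the values in the training log). AUC is threshold-free and is left out.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], p_hat: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary classifier (illustrative only)."""
    y_pred = [1 if p >= threshold else 0 for p in p_hat]
    tp = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 1)
    tn = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 0)
    fp = sum(1 for t, y in zip(y_true, y_pred) if t == 0 and y == 1)
    fn = sum(1 for t, y in zip(y_true, y_pred) if t == 1 and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return {
        'binary_accuracy': (tp + tn) / len(y_true),
        'sensitivity': sensitivity,
        'specificity': specificity,
        # Assumed definition: geometric mean of sensitivity and specificity.
        'gmeasure': math.sqrt(sensitivity * specificity),
    }


# A model that predicts class 1 for every sample:
m = binary_metrics([1, 1, 1, 0, 0], [0.7] * 5)
print(m)
```

When every sample is predicted positive, sensitivity is 1, specificity is 0 and gmeasure collapses to 0, while accuracy equals the fraction of positive samples. The short 10-epoch run in this example shows this same pattern.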

For logging, we use Python’s standard logging library.

[3]:
import logging

logging.basicConfig(format='[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s',
                    level=logging.DEBUG)
log = logging.getLogger()
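This format string is what produces log lines such as `[root    |INFO|deepbiome.py:115]` in the training output: `%(name)-8s` left-aligns the logger name in an 8-character field, followed by the level, the source file and line number. Because the level is DEBUG, DEBUG-level messages such as "Save weight at ./weight_0.h5" are shown too. A small self-contained sketch that renders the same format into an in-memory buffer (the `demo` logger and the message are only for illustration):

```python
import io
import logging

FORMAT = '[%(name)-8s|%(levelname)s|%(filename)s:%(lineno)s] %(message)s'

# Send records to an in-memory buffer instead of stderr so the output can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(FORMAT))

demo_log = logging.getLogger('demo')
demo_log.setLevel(logging.DEBUG)
demo_log.addHandler(handler)
demo_log.propagate = False  # keep the demo record away from the root logger's handlers

demo_log.info('Construct Dataset')
line = buffer.getvalue().strip()
print(line)
```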

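We train the network with the deepbiome.deepbiome_train function. Here number_of_fold=2, so training and evaluation are repeated for two folds (labelled "simulations" in the log):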

[4]:
from deepbiome import deepbiome

test_evaluation, train_evaluation, network = deepbiome.deepbiome_train(log, network_info, path_info, number_of_fold=2)
Using TensorFlow backend.
[root    |INFO|deepbiome.py:115] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:153] -------1 simulation start!----------------------------------
[root    |INFO|readers.py:58] -----------------------------------------------------------------------
[root    |INFO|readers.py:59] Construct Dataset
[root    |INFO|readers.py:60] -----------------------------------------------------------------------
[root    |INFO|readers.py:61] Load data
[root    |INFO|deepbiome.py:164] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:165] Build network for 1 simulation
[root    |INFO|build_network.py:521] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:522] Read phylogenetic tree information from /DATA/home/muha/github_repos/deepbiome/deepbiome/tests/data/genus48_dic.csv
[root    |INFO|build_network.py:528] Phylogenetic tree level list: ['Genus', 'Family', 'Order', 'Class', 'Phylum']
[root    |INFO|build_network.py:529] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:537]      Genus: 48
[root    |INFO|build_network.py:537]     Family: 40
[root    |INFO|build_network.py:537]      Order: 23
[root    |INFO|build_network.py:537]      Class: 17
[root    |INFO|build_network.py:537]     Phylum: 9
[root    |INFO|build_network.py:546] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:547] Phylogenetic_tree_dict info: ['Class', 'Number', 'Order', 'Family', 'Phylum', 'Genus']
[root    |INFO|build_network.py:548] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:558] Build edge weights between [ Genus, Family]
[root    |INFO|build_network.py:558] Build edge weights between [Family,  Order]
[root    |INFO|build_network.py:558] Build edge weights between [ Order,  Class]
[root    |INFO|build_network.py:558] Build edge weights between [ Class, Phylum]
[root    |INFO|build_network.py:571] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:586] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:587] Build network based on phylogenetic tree information
[root    |INFO|build_network.py:588] ------------------------------------------------------------------------------------------
[tensorflow|WARNING|deprecation.py:328] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/resource_variable_ops.py:432: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[root    |INFO|build_network.py:670] ------------------------------------------------------------------------------------------
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 48)                0
_________________________________________________________________
l1_dense (Dense_with_tree)   (None, 40)                1960
_________________________________________________________________
l1_activation (Activation)   (None, 40)                0
_________________________________________________________________
l2_dense (Dense_with_tree)   (None, 23)                943
_________________________________________________________________
l2_activation (Activation)   (None, 23)                0
_________________________________________________________________
l3_dense (Dense_with_tree)   (None, 17)                408
_________________________________________________________________
l3_activation (Activation)   (None, 17)                0
_________________________________________________________________
l4_dense (Dense_with_tree)   (None, 9)                 162
_________________________________________________________________
l4_activation (Activation)   (None, 9)                 0
_________________________________________________________________
last_dense_h (Dense)         (None, 1)                 10
_________________________________________________________________
p_hat (Activation)           (None, 1)                 0
=================================================================
Total params: 3,483
Trainable params: 3,483
Non-trainable params: 0
_________________________________________________________________
[root    |INFO|build_network.py:61] Build Network
[root    |INFO|build_network.py:62] Optimizer = adam
[root    |INFO|build_network.py:63] Loss = binary_crossentropy
[root    |INFO|build_network.py:64] Metrics = binary_accuracy, sensitivity, specificity, gmeasure, auc
[root    |INFO|deepbiome.py:176] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:177] 1 fold computing start!----------------------------------
[root    |INFO|build_network.py:137] Training start!
[tensorflow|WARNING|deprecation.py:328] From /usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py:2862: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Train on 640 samples, validate on 160 samples
Epoch 1/10
640/640 [==============================] - 1s 1ms/step - loss: 0.6898 - binary_accuracy: 0.6047 - sensitivity: 0.8222 - specificity: 0.1731 - gmeasure: 0.1118 - auc: 0.4496 - val_loss: 0.6786 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.4840
Epoch 2/10
640/640 [==============================] - 0s 205us/step - loss: 0.6752 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5014 - val_loss: 0.6618 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5005
Epoch 3/10
640/640 [==============================] - 0s 90us/step - loss: 0.6596 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5386 - val_loss: 0.6450 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5134
Epoch 4/10
640/640 [==============================] - 0s 63us/step - loss: 0.6434 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.4943 - val_loss: 0.6301 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5203
Epoch 5/10
640/640 [==============================] - 0s 78us/step - loss: 0.6323 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5018 - val_loss: 0.6195 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5243
Epoch 6/10
640/640 [==============================] - 0s 68us/step - loss: 0.6243 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5288 - val_loss: 0.6160 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5334
Epoch 7/10
640/640 [==============================] - 0s 62us/step - loss: 0.6238 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5577 - val_loss: 0.6169 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5526
Epoch 8/10
640/640 [==============================] - 0s 81us/step - loss: 0.6257 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5954 - val_loss: 0.6169 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5736
Epoch 9/10
640/640 [==============================] - 0s 78us/step - loss: 0.6254 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.6563 - val_loss: 0.6161 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.6002
Epoch 10/10
640/640 [==============================] - 0s 91us/step - loss: 0.6238 - binary_accuracy: 0.6844 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.6318 - val_loss: 0.6161 - val_binary_accuracy: 0.6938 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.6208
[root    |INFO|build_network.py:87] Load trained model weight at ./weight_0.h5
[root    |INFO|build_network.py:147] Training end with time 3.954432725906372!
[root    |INFO|build_network.py:83] Saved trained model weight at ./weight_0.h5
[root    |DEBUG|deepbiome.py:185] Save weight at ./weight_0.h5
[root    |DEBUG|deepbiome.py:188] Save history at ./hist_0.json
[root    |INFO|build_network.py:173] Evaluation start!
800/800 [==============================] - 0s 7us/step
[root    |INFO|build_network.py:178] Evaluation end with time 0.015036821365356445!
[root    |INFO|build_network.py:179] Evaluation: [0.6221581101417542, 0.6862499713897705, 1.0, 0.0, 0.0, 0.5438174605369568]
[root    |INFO|build_network.py:173] Evaluation start!
200/200 [==============================] - 0s 22us/step
[root    |INFO|build_network.py:178] Evaluation end with time 0.013112068176269531!
[root    |INFO|build_network.py:179] Evaluation: [0.6190831661224365, 0.6899999976158142, 1.0, 0.0, 0.0, 0.6127863526344299]
[root    |INFO|deepbiome.py:199] Compute time : 5.095764636993408
[root    |INFO|deepbiome.py:200] 1 fold computing end!---------------------------------------------
[root    |INFO|deepbiome.py:153] -------2 simulation start!----------------------------------
[root    |INFO|readers.py:58] -----------------------------------------------------------------------
[root    |INFO|readers.py:59] Construct Dataset
[root    |INFO|readers.py:60] -----------------------------------------------------------------------
[root    |INFO|readers.py:61] Load data
[root    |INFO|deepbiome.py:164] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:165] Build network for 2 simulation
[root    |INFO|build_network.py:521] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:522] Read phylogenetic tree information from /DATA/home/muha/github_repos/deepbiome/deepbiome/tests/data/genus48_dic.csv
[root    |INFO|build_network.py:528] Phylogenetic tree level list: ['Genus', 'Family', 'Order', 'Class', 'Phylum']
[root    |INFO|build_network.py:529] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:537]      Genus: 48
[root    |INFO|build_network.py:537]     Family: 40
[root    |INFO|build_network.py:537]      Order: 23
[root    |INFO|build_network.py:537]      Class: 17
[root    |INFO|build_network.py:537]     Phylum: 9
[root    |INFO|build_network.py:546] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:547] Phylogenetic_tree_dict info: ['Class', 'Number', 'Order', 'Family', 'Phylum', 'Genus']
[root    |INFO|build_network.py:548] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:558] Build edge weights between [ Genus, Family]
[root    |INFO|build_network.py:558] Build edge weights between [Family,  Order]
[root    |INFO|build_network.py:558] Build edge weights between [ Order,  Class]
[root    |INFO|build_network.py:558] Build edge weights between [ Class, Phylum]
[root    |INFO|build_network.py:571] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:586] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:587] Build network based on phylogenetic tree information
[root    |INFO|build_network.py:588] ------------------------------------------------------------------------------------------
[root    |INFO|build_network.py:670] ------------------------------------------------------------------------------------------
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input (InputLayer)           (None, 48)                0
_________________________________________________________________
l1_dense (Dense_with_tree)   (None, 40)                1960
_________________________________________________________________
l1_activation (Activation)   (None, 40)                0
_________________________________________________________________
l2_dense (Dense_with_tree)   (None, 23)                943
_________________________________________________________________
l2_activation (Activation)   (None, 23)                0
_________________________________________________________________
l3_dense (Dense_with_tree)   (None, 17)                408
_________________________________________________________________
l3_activation (Activation)   (None, 17)                0
_________________________________________________________________
l4_dense (Dense_with_tree)   (None, 9)                 162
_________________________________________________________________
l4_activation (Activation)   (None, 9)                 0
_________________________________________________________________
last_dense_h (Dense)         (None, 1)                 10
_________________________________________________________________
p_hat (Activation)           (None, 1)                 0
=================================================================
Total params: 3,483
Trainable params: 3,483
Non-trainable params: 0
_________________________________________________________________
[root    |INFO|build_network.py:61] Build Network
[root    |INFO|build_network.py:62] Optimizer = adam
[root    |INFO|build_network.py:63] Loss = binary_crossentropy
[root    |INFO|build_network.py:64] Metrics = binary_accuracy, sensitivity, specificity, gmeasure, auc
[root    |INFO|deepbiome.py:176] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:177] 2 fold computing start!----------------------------------
[root    |INFO|build_network.py:137] Training start!
Train on 640 samples, validate on 160 samples
Epoch 1/10
640/640 [==============================] - 1s 1ms/step - loss: 0.6910 - binary_accuracy: 0.6062 - sensitivity: 0.7500 - specificity: 0.2500 - gmeasure: 0.0000e+00 - auc: 0.5193 - val_loss: 0.6810 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5068
Epoch 2/10
640/640 [==============================] - 0s 62us/step - loss: 0.6791 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.4727 - val_loss: 0.6677 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.4705
Epoch 3/10
640/640 [==============================] - 0s 61us/step - loss: 0.6666 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5033 - val_loss: 0.6550 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.4520
Epoch 4/10
640/640 [==============================] - 0s 74us/step - loss: 0.6557 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5027 - val_loss: 0.6428 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.4691
Epoch 5/10
640/640 [==============================] - 0s 74us/step - loss: 0.6456 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5123 - val_loss: 0.6316 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.4974
Epoch 6/10
640/640 [==============================] - 0s 61us/step - loss: 0.6358 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5090 - val_loss: 0.6218 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5056
Epoch 7/10
640/640 [==============================] - 0s 91us/step - loss: 0.6285 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5027 - val_loss: 0.6129 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5174
Epoch 8/10
640/640 [==============================] - 0s 60us/step - loss: 0.6203 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.4762 - val_loss: 0.6051 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5225
Epoch 9/10
640/640 [==============================] - 0s 59us/step - loss: 0.6145 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.4875 - val_loss: 0.5979 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5271
Epoch 10/10
640/640 [==============================] - 0s 77us/step - loss: 0.6093 - binary_accuracy: 0.7156 - sensitivity: 1.0000 - specificity: 0.0000e+00 - gmeasure: 0.0000e+00 - auc: 0.5059 - val_loss: 0.5916 - val_binary_accuracy: 0.7375 - val_sensitivity: 1.0000 - val_specificity: 0.0000e+00 - val_gmeasure: 0.0000e+00 - val_auc: 0.5252
[root    |INFO|build_network.py:87] Load trained model weight at ./weight_1.h5
[root    |INFO|build_network.py:147] Training end with time 2.9402072429656982!
[root    |INFO|build_network.py:83] Saved trained model weight at ./weight_1.h5
[root    |DEBUG|deepbiome.py:185] Save weight at ./weight_1.h5
[root    |DEBUG|deepbiome.py:188] Save history at ./hist_1.json
[root    |INFO|build_network.py:173] Evaluation start!
800/800 [==============================] - 0s 8us/step
[root    |INFO|build_network.py:178] Evaluation end with time 0.013414382934570312!
[root    |INFO|build_network.py:179] Evaluation: [0.6027529239654541, 0.7200000286102295, 1.0, 0.0, 0.0, 0.4951443076133728]
[root    |INFO|build_network.py:173] Evaluation start!
200/200 [==============================] - 0s 28us/step
[root    |INFO|build_network.py:178] Evaluation end with time 0.011273860931396484!
[root    |INFO|build_network.py:179] Evaluation: [0.6027520895004272, 0.7200000286102295, 1.0, 0.0, 0.0, 0.4978918731212616]
[root    |INFO|deepbiome.py:199] Compute time : 3.5541372299194336
[root    |INFO|deepbiome.py:200] 2 fold computing end!---------------------------------------------
[root    |INFO|deepbiome.py:211] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:212] Train Evaluation : ['loss' 'binary_accuracy' 'sensitivity' 'specificity' 'gmeasure' 'auc']
[root    |INFO|deepbiome.py:213]       mean : [0.612 0.703 1.000 0.000 0.000 0.519]
[root    |INFO|deepbiome.py:214]        std : [0.010 0.017 0.000 0.000 0.000 0.024]
[root    |INFO|deepbiome.py:215] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:216] Test Evaluation : ['loss' 'binary_accuracy' 'sensitivity' 'specificity' 'gmeasure' 'auc']
[root    |INFO|deepbiome.py:217]       mean : [0.611 0.705 1.000 0.000 0.000 0.555]
[root    |INFO|deepbiome.py:218]        std : [0.008 0.015 0.000 0.000 0.000 0.057]
[root    |INFO|deepbiome.py:219] -----------------------------------------------------------------
[root    |INFO|deepbiome.py:230] Total Computing Ended
[root    |INFO|deepbiome.py:231] -----------------------------------------------------------------
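A few numbers in the log above can be reproduced by hand. The sketch below, using plain Python and NumPy with values copied from the log, checks three of them. First, the parameter counts in the model summary equal those of fully connected layers (weights plus one bias per unit) between consecutive phylogenetic levels. Second, validation_size 0.2 on each fold's 800 training samples gives the 640/160 split. Third, the two-fold mean/std summary matches NumPy's default population standard deviation (ddof=0). The sketch assumes that the first "Evaluation:" line of each fold (800 samples) is the train evaluation and the second (200 samples) is the test evaluation. How DeepBiome computes these values internally is not shown here. The sketch only checks that the arithmetic agrees.

```python
import numpy as np

# 1) Parameter counts in the model summary. Units per level, taken from the log:
#    Genus 48 -> Family 40 -> Order 23 -> Class 17 -> Phylum 9 -> output 1.
units = [48, 40, 23, 17, 9, 1]
params = [n_in * n_out + n_out for n_in, n_out in zip(units[:-1], units[1:])]
print(params, sum(params))  # [1960, 943, 408, 162, 10] 3483

# 2) Train/validation split inside each fold: 800 training samples, validation_size 0.2.
n_train, validation_size = 800, 0.2
n_val = round(n_train * validation_size)
print(n_train - n_val, n_val)  # 640 160

# 3) Mean and std over the two folds, copied from the "Evaluation:" lines.
#    Column order: loss, binary_accuracy, sensitivity, specificity, gmeasure, auc.
train_folds = np.array([
    [0.6221581101417542, 0.6862499713897705, 1.0, 0.0, 0.0, 0.5438174605369568],
    [0.6027529239654541, 0.7200000286102295, 1.0, 0.0, 0.0, 0.4951443076133728],
])
test_folds = np.array([
    [0.6190831661224365, 0.6899999976158142, 1.0, 0.0, 0.0, 0.6127863526344299],
    [0.6027520895004272, 0.7200000286102295, 1.0, 0.0, 0.0, 0.4978918731212616],
])
for name, folds in [('Train', train_folds), ('Test', test_folds)]:
    mean = np.array2string(folds.mean(axis=0), precision=3, floatmode='fixed')
    std = np.array2string(folds.std(axis=0), precision=3, floatmode='fixed')  # ddof=0
    print(f'{name} mean : {mean}')
    print(f'{name}  std : {std}')
```

The printed summaries agree with the Train Evaluation and Test Evaluation lines in the log. With ddof=1, the std values for two folds would be larger by a factor of √2 and would no longer match.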