class: center, middle

# Lab: implementing Decision trees

*abulhasan@dcs.bbk.ac.uk*

DS[TA]

---

## Decision-trees: keywords

Discretization, iterated binary segmentation, *misclassification* ...
---

## Decision-trees, cont'd

Purity, Entropy, Information Gain, root-to-leaf ...
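---

## Reminder: impurity measures

For a group of datapoints with class proportions $p_k$, two standard impurity measures are

$$G = 1 - \sum_k p_k^2 \qquad\qquad H = -\sum_k p_k \log_2 p_k$$

Gini impurity $G$ is the measure this lab asks you to implement; $G=0$ means the group is pure (all datapoints share one class).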
---

# A UCI dataset on banknote authentication
Four numerical values from the analysis of a Wavelet transformation:

* variance, skewness, curtosis and
* entropy of image.

One (integer) classification value: class

---

# Lab 2-c: implementing Decision trees

The Banknotes dataset, baseline code and model solutions are all available from the class [Repl.it repository](https://repl.it/team/DSTA)

Alternatively, the code will also be available from the [class repository](https://www.dcs.bbk.ac.uk/~ale/dsta/)

---

# Objective a: write your own Gini function

1. Inspect the Decision-tree baseline code on Repl.it.
2. Lay out a function that segments the data according to the best Gini value available (a sketch of the `test_split` helper it relies on appears in the final slides):

```python
def get_split(dataset):
    b_index, b_value, b_score, b_groups = 999, 999, 999, None
    # TODO: find the best possible place to split the dataset.
    #
    # TODO: assign datapoints to 'left' and 'right' segments
    #       using the test_split(index, value, dataset) function.
    #
    # TODO: define a gini_index(groups, classes) function
    #       to score each candidate split.
    return {'index': b_index, 'value': b_value, 'groups': b_groups}
```

Remember: Gini=0 is the best scenario.

---

# Objective a, cont'd

Compute the Gini index for a split dataset:

```python
def gini_index(groups, classes):
    total_gini = 0.0
    # TODO: for each group, compute its Gini impurity and add it
    #       to the total, weighted by the group's relative size.
    return total_gini
```

A model solution for this exercise is available on Repl.it; consult it only after you have worked out your own solution. If you are uncertain about how to write this type of function, you may go directly to the model solution.

---

# Objective b: write your own DT generator!

Can you write Python functions that iteratively segment the data until you have a decision tree? One possible recipe follows the hint slide below.
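---

# Hint: splitting and scoring

One possible shape for the `test_split` helper referenced in Objective a, together with a complete `gini_index`. This is a sketch, not necessarily the Repl.it model solution; it assumes the dataset is a list of rows, with the class label as the last element of each row.

```python
def test_split(index, value, dataset):
    # Partition rows on attribute `index`:
    # rows below `value` go left, the rest go right.
    left, right = [], []
    for row in dataset:
        if row[index] < value:
            left.append(row)
        else:
            right.append(row)
    return left, right


def gini_index(groups, classes):
    # Weighted Gini impurity of a candidate split.
    n_instances = float(sum(len(group) for group in groups))
    total_gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:  # skip empty groups: avoids division by zero
            continue
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p
        # weight the group's impurity by its relative size
        total_gini += (1.0 - score) * (size / n_instances)
    return total_gini


# A perfectly-separating split on a toy dataset scores 0.0:
rows = [[2.7, 0], [1.3, 0], [7.5, 1], [9.0, 1]]
left, right = test_split(0, 5.0, rows)
print(gini_index([left, right], [0, 1]))  # -> 0.0
```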
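---

# Objective b: one possible recipe

The sketch below recurses on the output of your `get_split` from Objective a; `max_depth` and `min_size` are assumed stopping parameters, not part of the baseline code. Treat it as scaffolding to adapt, not as the model solution.

```python
def to_terminal(group):
    # A leaf predicts the most common class value in its group.
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)


def split(node, max_depth, min_size, depth):
    # Recursively grow the tree from a node produced by get_split().
    left, right = node['groups']
    del node['groups']
    if not left or not right:
        # One side is empty: no useful split, so close with a single leaf.
        node['left'] = node['right'] = to_terminal(left + right)
        return
    if depth >= max_depth:
        # Depth cap reached: close both branches with leaves.
        node['left'], node['right'] = to_terminal(left), to_terminal(right)
        return
    for side, group in (('left', left), ('right', right)):
        if len(group) <= min_size:
            node[side] = to_terminal(group)
        else:
            node[side] = get_split(group)
            split(node[side], max_depth, min_size, depth + 1)


def build_tree(train, max_depth, min_size):
    # Root split, then grow until the stopping conditions trigger.
    root = get_split(train)
    split(root, max_depth, min_size, 1)
    return root
```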