Gini Impurity Measure
When a decision tree is defined with a target variable and the Best Split algorithm is applied, the algorithm partitions the data so that the records at each new node have the lowest possible impurity. A node with high impurity contains substantial numbers of records for several different values of the target variable, which indicates that the parent split has not segmented the data effectively.
Minimizing impurity means bringing the observations in each node as close as possible to sharing a single value of the target variable. The homogeneity, or purity, of a node increases with the proportion of observations that share the same target value. The Best Split algorithm in Xpress Insight uses Gini impurity, which measures the heterogeneity of a node. When the Gini impurity value is 0.0 (the minimum), the node is homogeneous, or pure. When the Gini impurity value is at its maximum, the node is fully heterogeneous, or impure, which occurs when the target values are evenly distributed across the node. The maximum Gini impurity value depends on the number of distinct target values, so it differs between binary and multinomial target variables.
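The following minimal sketch illustrates the behavior described above. It assumes the standard textbook definition of Gini impurity, 1 minus the sum of squared class proportions, which may differ in scaling or detail from the calculation Xpress Insight performs internally. The function names gini_impurity and weighted_gini are hypothetical and are not part of any Xpress Insight API.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum(p_i ** 2) over class proportions p_i.

    Assumes the standard textbook definition; an empty node is treated as pure.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_gini(partitions):
    """Size-weighted average impurity of the child nodes produced by a split.

    A split that yields a lower value separates the target values better.
    """
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * gini_impurity(part) for part in partitions)


print(gini_impurity(["yes", "yes", "yes", "yes"]))  # pure node: 0.0
print(gini_impurity(["yes", "yes", "no", "no"]))    # evenly mixed binary node: 0.5
```

Under this definition, a target with k evenly distributed values gives an impurity of 1 - 1/k, which is 0.5 for a binary target and approaches 1.0 as the number of values grows. That is one way to see why the maximum differs between binary and multinomial targets.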