"EBM is based on Generalized Additive Models with Pairwise Interactions (GA2M) by Lou et al. ([Accurate Intelligible Models with Pairwise Interactions](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf))\n",
"EBM is based on Generalized Additive Models with Pairwise Interactions (GA2M) by Lou et al. ([Accurate Intelligible Models with Pairwise Interactions](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf))\n",
"\n",
"\n",
"\n",
"\n",
"We will use the [adult](https://archive.ics.uci.edu/ml/datasets/adult) that focuses on a (binary) classification task whether or not a person makes more than 50k USD per year. The data are taken from a 1994 census and were first discussed in the paper [Ron Kohavi, \"Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid\", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996](https://www.academia.edu/download/40088603/Scaling_Up_the_Accuracy_of_Naive-Bayes_C20151116-5477-1fw84ob.pdf)\n",
"We will use the [adult dataset](https://archive.ics.uci.edu/ml/datasets/adult) that focuses on a (binary) classification task whether or not a person makes more than 50k USD per year. The data are taken from a 1994 census and were first discussed in the paper [Ron Kohavi, \"Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid\", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996](https://www.academia.edu/download/40088603/Scaling_Up_the_Accuracy_of_Naive-Bayes_C20151116-5477-1fw84ob.pdf)\n",
"\n",
"\n",
"The data have a number of categorial and numerical features.\n",
"The data have a number of categorial and numerical features.\n",
"We can access the data directly from the archive (or use a local copy).\n",
"We can access the data directly from the archive (or use a local copy).\n",
...
@@ -348,7 +348,7 @@
...
@@ -348,7 +348,7 @@
"\n",
"\n",
"As this is a \"white box\" model, we can look at the details of individual predictions.\n",
"As this is a \"white box\" model, we can look at the details of individual predictions.\n",
"\n",
"\n",
"In the interactive widget, we can select individual cases and then analyse how the features contribute to the classification result\n"
"In the interactive widget, we can select individual cases and then analyse how the features contribute to the classification result.\n"
]
]
},
},
{
{
...
...
%% Cell type:markdown id: tags:
%% Cell type:markdown id: tags:
# Explainable AI: Explainable Boosting Machine
# Explainable AI: Explainable Boosting Machine
In this example we use the white-box model "EBM - Explainable Boosting Machine" from the package [InterpretML](https://github.com/interpretml/interpret) by Microsoft. The package supports a range of explainable AI tools, from white-box models to explainers for black-box models such as Shapley values, LIME, partial dependency plots and others.
In this example we use the white-box model "EBM - Explainable Boosting Machine" from the package [InterpretML](https://github.com/interpretml/interpret) by Microsoft. The package supports a range of explainable AI tools, from white-box models to explainers for black-box models such as Shapley values, LIME, partial dependency plots and others.
EBM is based on Generalized Additive Models with Pairwise Interactions (GA2M) by Lou et al. ([Accurate Intelligible Models with Pairwise Interactions](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf))
EBM is based on Generalized Additive Models with Pairwise Interactions (GA2M) by Lou et al. ([Accurate Intelligible Models with Pairwise Interactions](https://www.cs.cornell.edu/~yinlou/papers/lou-kdd13.pdf))
We will use the [adult](https://archive.ics.uci.edu/ml/datasets/adult) that focuses on a (binary) classification task whether or not a person makes more than 50k USD per year. The data are taken from a 1994 census and were first discussed in the paper [Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996](https://www.academia.edu/download/40088603/Scaling_Up_the_Accuracy_of_Naive-Bayes_C20151116-5477-1fw84ob.pdf)
We will use the [adult dataset](https://archive.ics.uci.edu/ml/datasets/adult) that focuses on a (binary) classification task whether or not a person makes more than 50k USD per year. The data are taken from a 1994 census and were first discussed in the paper [Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996](https://www.academia.edu/download/40088603/Scaling_Up_the_Accuracy_of_Naive-Bayes_C20151116-5477-1fw84ob.pdf)
The data have a number of categorial and numerical features.
The data have a number of categorial and numerical features.
We can access the data directly from the archive (or use a local copy).
We can access the data directly from the archive (or use a local copy).
Since we already discussed this example when we looked at classifications, we won't do any exploratory data analysis here.
Since we already discussed this example when we looked at classifications, we won't do any exploratory data analysis here.
Now we create an instance of the model and train it.
Now we create an instance of the model and train it.
Note that the explainable boosting classifier can directly work on the categorial variables as text, i.e. we do not need to transform them to a numerical representation.
Note that the explainable boosting classifier can directly work on the categorial variables as text, i.e. we do not need to transform them to a numerical representation.