Demonstration Of What-If Tool For Machine Learning Model Investigation – Analytics India Magazine

Machine learning has reached a stage where developing models and making predictions is no longer enough; models also need to be interpretable. To get good results from the data, it is important to investigate and probe both the dataset and the models. A good model investigation digs into how the model behaves in order to find insights and inconsistencies. This task usually involves writing a lot of custom functions. Tools like What-If make this probing much easier and save programmers time and effort.

In this article we will learn what the What-If Tool is, which features it offers, and how to use it on a sample dataset.

The What-If Tool (WIT) is a visualization tool designed for interactively probing machine learning models. WIT helps users understand classification models, regression models and deep neural networks by providing ways to evaluate, analyse and compare them. It is user friendly and can be used not only by developers but also by researchers and non-programmers.

WIT was developed by Google under the People + AI Research (PAIR) program, which brings together researchers across Google to study and redesign the ways people interact with AI systems. The tool is open-source.

The tool provides several features for investigating a model, which are explored one by one in the walkthrough further down.

WIT can be used in a Google Colab notebook or a Jupyter notebook. It can also be used inside TensorBoard.

Let us take a sample dataset to understand the different features of WIT. I will use the forest fire dataset, which is available for download on Kaggle. The goal is to predict the area affected by a forest fire given features such as the temperature, the month and the amount of rain.

I will run this tool on Google Colaboratory (Colab). Before we load the dataset and process it, we first install WIT. To install the tool, run:

!pip install witwidget

Once we have split the data, we can convert the month and day columns to numeric categorical values using a label encoder.
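The loading, splitting and encoding steps could look roughly like the sketch below. It is a minimal illustration rather than the exact code behind this walkthrough: the small DataFrame is a made-up stand-in for the downloaded CSV, only a handful of the dataset's columns are included, and the order of the steps is one possible choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Stand-in rows (made-up values) using a subset of the forest fire columns.
# With the real data, df would come from pd.read_csv on the downloaded file.
df = pd.DataFrame({
    "month": ["mar", "oct", "aug", "sep", "aug", "mar", "sep", "oct"],
    "day":   ["fri", "tue", "sun", "mon", "sat", "fri", "sun", "wed"],
    "FFMC":  [86.2, 90.6, 92.3, 91.0, 94.3, 89.3, 92.5, 90.6],
    "temp":  [8.2, 18.0, 22.2, 21.1, 25.1, 11.4, 19.7, 14.6],
    "rain":  [0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
    "area":  [0.0, 0.0, 1.9, 0.0, 10.7, 0.0, 3.1, 0.5],
})

# Separate the features from the target column.
X = df.drop(columns="area")
y = df["area"]

# Replace each text category with an integer code. One encoder per column
# is kept so the codes can be mapped back to the original labels later.
encoders = {}
for col in ["month", "day"]:
    encoders[col] = LabelEncoder()
    X[col] = encoders[col].fit_transform(X[col])

# Hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
```

Encoding before the train/test split means both parts share the same integer codes, which avoids unseen-label errors when the test rows are transformed.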

Now we can build our model. I will use the gradient boosting regressor from the scikit-learn ensemble module.
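A minimal sketch of fitting such a model is shown here. The training data is synthetic so that the snippet runs on its own, and the hyperparameters are illustrative values, not the settings used for the screenshots in this article.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the encoded training data (an assumption):
# three numeric features and a target driven mostly by the first one.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(200, 3))
y_train = 5.0 * X_train[:, 0] + rng.normal(0, 0.1, size=200)

# Illustrative hyperparameters; tune them for the real dataset.
model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, random_state=0
)
model.fit(X_train, y_train)

# R^2 on the training rows, as a quick sanity check that the fit worked.
train_r2 = model.score(X_train, y_train)
```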

Now that the model is trained, we will write a prediction function that takes a batch of data points and returns the model's predictions, since the widget needs such a function to run inference.
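One way to write such a prediction function is sketched below. It assumes the examples reach the function as plain lists of values with the target in the last position, and that a regression model should return one number per example; the feature names and the toy model are hypothetical placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical column layout: three features followed by the target.
feature_names = ["FFMC", "temp", "rain", "area"]

# Toy model trained on synthetic data so the snippet is self-contained.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 3))
y = 2.0 * X[:, 1] + rng.normal(0, 0.05, size=100)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def custom_predict(examples):
    """Return one predicted value per example.

    `examples` is a list of rows laid out like `feature_names`; the
    trailing target column is dropped before calling the model.
    """
    rows = np.asarray(examples, dtype=float)
    features = rows[:, :-1]
    return model.predict(features).tolist()

# Rows in the layout assumed above: feature values plus the true target.
test_examples = np.hstack([X[:5], y[:5].reshape(-1, 1)]).tolist()
predictions = custom_predict(test_examples)
```

Keeping the target inside each example lets the tool compare predictions with the true values, which is why the function has to strip that column itself.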

Next, we will write the code to call the widget.
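A sketch of the widget call for a regression model follows. The helper name `launch_wit` and its parameters are hypothetical; `WitConfigBuilder` and `WitWidget` come from the `witwidget` package installed earlier. The import sits inside the function only so that the definition can be loaded outside a notebook.

```python
def launch_wit(test_examples, feature_names, predict_fn, target="area"):
    """Build the What-If Tool widget for a regression model.

    Call this from a Colab or Jupyter cell. `test_examples` is a list of
    rows and `feature_names` names their columns, including the target.
    """
    # witwidget only renders inside a notebook, so it is imported lazily.
    from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

    config_builder = (
        WitConfigBuilder(test_examples, feature_names)
        .set_custom_predict_fn(predict_fn)   # function returning predictions
        .set_model_type("regression")        # the default is classification
        .set_target_feature(target)          # column holding the true value
    )
    return WitWidget(config_builder, height=800)
```

In a notebook, making `launch_wit(test_examples, feature_names, custom_predict)` the last expression of a cell should display the widget, where `custom_predict` stands for whatever prediction function was defined for the model.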

This opens an interactive widget with two panels.

The left panel offers a choice of techniques to apply to the data, and the right panel displays the data points.

As you can see on the right panel we have options to select features in the dataset along X-axis and Y-axis. I will set these values and check the graphs.

Here I have set FFMC along the X-axis and area as the target. Keep in mind that these points are displayed after the regression is performed.

Let us now explore each of the options provided to us.

In the Datapoint editor, you can select any data point, which is then highlighted. You can also edit the feature values of the selected data point and watch the prediction update immediately.

As you can see, changing the values changes the predicted outcomes. You can change multiple values and experiment with the model behaviour.

Another way to understand the behaviour of a model is to use counterfactuals. A counterfactual is a data point that differs only slightly from the selected one but causes the model to change its decision.

By clicking on the toggle shown below, we can identify the nearest counterfactual, which gets highlighted in green.
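The idea behind a nearest-counterfactual search can be sketched in a few lines. This is a simplified illustration, not the tool's own implementation: it assumes an L1 distance on range-normalised features and, because the model here is a regressor, treats a prediction as "different" when it is further away than a chosen threshold. The function name and the toy data are made up.

```python
import numpy as np

def nearest_counterfactual(X, preds, index, threshold):
    """Return the index of the row closest to X[index] whose prediction
    differs from preds[index] by more than `threshold`, or None."""
    X = np.asarray(X, dtype=float)
    preds = np.asarray(preds, dtype=float)

    # Scale each feature by its range so no single feature dominates.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    dist = np.abs((X - X[index]) / span).sum(axis=1)

    # Only rows with a sufficiently different prediction are candidates.
    candidates = np.abs(preds - preds[index]) > threshold
    candidates[index] = False
    if not candidates.any():
        return None
    dist[~candidates] = np.inf
    return int(np.argmin(dist))

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.5, 0.5], [1.0, 1.0]])
preds = np.array([1.0, 1.2, 4.0, 9.0])
match = nearest_counterfactual(X, preds, index=0, threshold=2.0)
```

Here row 1 is the closest to row 0 but its prediction is almost the same, so the search skips it and returns row 2.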

Partial dependence plots show the effect that each feature has on the predictions of the trained machine learning model.

As shown below, we can see how the inferred target value responds to each of the features.
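The effect of a single feature on the predictions can be computed by hand: force the feature to each value on a grid for every row, run the model, and average the results. The sketch below does this with a toy prediction function standing in for a trained model; all names are illustrative.

```python
import numpy as np

def partial_dependence(predict_fn, X, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_index] = value   # overwrite the feature in every row
        averages.append(float(np.mean(predict_fn(X_mod))))
    return averages

# Toy stand-in for a trained model: prediction = 3 * x0 + x1.
def toy_predict(X):
    return 3.0 * X[:, 0] + X[:, 1]

X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
grid = [0.0, 1.0, 2.0]
pd_curve = partial_dependence(toy_predict, X, feature_index=0, grid=grid)
```

For the toy model the curve rises by 3 for every unit of the first feature, which is exactly the coefficient of that feature. Passing a single row as `X` gives the curve for one selected data point instead of the average over the dataset.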

The Performance tab lets us look at the overall model performance. You can evaluate the performance with respect to a single feature or several features at once, and there are multiple options for analysing it.

I have selected two features FFMC and temp against the area to understand performance using mean error.
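Slicing the error by a feature can also be reproduced outside the widget. The sketch below buckets one feature and reports the error inside each bucket; it uses mean absolute error as the metric, which is an assumption, and the numbers are made up for illustration.

```python
import numpy as np

def mean_abs_error_by_bucket(feature, actual, predicted, edges):
    """Mean absolute error per bucket of `feature`.

    Bucket b covers edges[b-1] <= value < edges[b]; values outside the
    outermost edges are ignored, and empty buckets are left out.
    """
    feature = np.asarray(feature, dtype=float)
    abs_err = np.abs(np.asarray(predicted, float) - np.asarray(actual, float))
    bucket = np.digitize(feature, edges)
    result = {}
    for b in range(1, len(edges)):
        mask = bucket == b
        if mask.any():
            result[(edges[b - 1], edges[b])] = float(abs_err[mask].mean())
    return result

temp = [5.0, 8.0, 12.0, 18.0, 22.0, 27.0]
actual = [0.0, 1.0, 0.0, 2.0, 10.0, 4.0]
predicted = [1.0, 1.0, 2.0, 1.0, 6.0, 6.0]
errors = mean_abs_error_by_bucket(temp, actual, predicted, edges=[0, 10, 20, 30])
```

In this toy data the error grows with temperature, the kind of pattern that slicing by a feature is meant to reveal.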

If more than one trained model is loaded, their performance can be compared here.

The Features tab shows summary statistics for each feature in the dataset. It displays the data in the form of histograms or quantile charts.

The tab also enables us to look into the distribution of values for each feature in the dataset.

It also highlights the features that are most non-uniform in comparison to the other features in the dataset.

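Identifying non-uniform features is a useful first step towards reducing bias in the model, because a heavily skewed feature leaves parts of its range under-represented in the training data.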
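A rough way to rank features by non-uniformity outside the tool is to compare the entropy of each feature's histogram with that of a flat histogram. The score below is one simple proxy chosen for illustration; it is not claimed to be the measure the tool itself uses, and the function name is made up.

```python
import numpy as np

def non_uniformity(values, bins=10):
    """Score between 0 and 1: 0 for a perfectly flat histogram,
    1 when every value falls into a single bin."""
    counts, _ = np.histogram(np.asarray(values, dtype=float), bins=bins)
    probs = counts[counts > 0] / counts.sum()
    entropy = -np.sum(probs * np.log(probs))
    return float(1.0 - entropy / np.log(bins))

flat = np.arange(100)                       # spreads evenly over the bins
skewed = np.array([0.0] * 95 + [1.0] * 5)   # almost all mass in one bin

flat_score = non_uniformity(flat)
skewed_score = non_uniformity(skewed)
```

Sorting the features of a dataset by this score in descending order puts the most skewed ones first.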

WIT is a very useful tool for analysing model performance. The ability to inspect models in a simple no-code environment is a great help, especially from a business perspective.

It also gives insight into factors beyond training the model, such as why and how the model was created and how well the dataset fits the model.
