Visualize Strengths And Weaknesses Of A Sample From Pre-trained Model
Solution 1:
Good news: there is.
A package called SHAP (SHapley Additive exPlanations) was recently released for exactly this purpose. Here's a link to the GitHub repository: https://github.com/slundberg/shap
It supports visualization of complicated models (which are hard to explain intuitively), like boosted trees, and XGBoost in particular!
It can show you "real" feature importance, which is better than the "gain", "weight", and "cover" importances xgboost supplies, since those are not consistent with one another.
You can read all about why SHAP is better for feature evaluation in the SHAP paper, "A Unified Approach to Interpreting Model Predictions" (Lundberg & Lee, 2017).
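If you want to see the inconsistency for yourself, here is a minimal sketch (the toy data and parameters are my own, just to have a trained booster to inspect) that pulls all three built-in importance metrics from xgboost and prints them side by side:

import numpy as np
import xgboost as xgb

# Toy data, purely illustrative: the label depends mostly on the first two features
rng = np.random.RandomState(0)
X = rng.randn(500, 4)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

bst = xgb.train({"objective": "binary:logistic"}, xgb.DMatrix(X, label=y), num_boost_round=50)

# The three built-in metrics can rank the same features differently
for imp_type in ("weight", "gain", "cover"):
    print(imp_type, bst.get_score(importance_type=imp_type))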
It is hard to give you code that will work out of the box, but the documentation is good and you should be able to write a version that suits your data. Here are the guidelines for building your first graph:
import shap
import xgboost as xgb

# Assume X_train and y_train are the features and labels of your data samples
# (feature_names and weights_trn are your own feature names and sample weights)
dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=feature_names, weight=weights_trn)

# Train your xgboost model (params0 and watchlist are your own parameters and eval list)
bst = xgb.train(params0, dtrain, num_boost_round=2500, evals=watchlist, early_stopping_rounds=200)

# Build a SHAP "explainer" object for the trained tree model
explainer = shap.TreeExplainer(bst)

# Values to explain -- I took mine from my training set, but you can "explain" whatever you want here
shap_values = explainer.shap_values(X_test)

# Summary plots: a per-sample beeswarm plot and a bar plot of mean |SHAP value| per feature
shap.summary_plot(shap_values, X_test)
shap.summary_plot(shap_values, X_test, plot_type="bar")
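As a sanity check (a sketch, assuming a binary objective and the variables from the snippet above), SHAP values for tree models are additive: the explainer's base value plus the per-feature contributions should reconstruct the model's raw margin (log-odds) output for each sample:

import numpy as np

# Additivity check: base value + per-feature contributions == raw margin output
dtest = xgb.DMatrix(X_test, feature_names=feature_names)
raw_margin = bst.predict(dtest, output_margin=True)
print(np.allclose(raw_margin, explainer.expected_value + shap_values.sum(axis=1), atol=1e-4))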
To plot why a certain sample got its score, you can either use the built-in SHAP function for it, shap.force_plot (it only works in a Jupyter notebook; there is a perfect example in the SHAP documentation), or write your own plotting function with matplotlib, which is what I did. That takes some effort; a sketch of such a function appears after the example plot below.
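Here is a minimal sketch of the built-in route (assuming X_test is a pandas DataFrame and i is the row index of the sample you want to explain; both are my own placeholders):

import shap

shap.initjs()  # load the JS visualization library, needed once per notebook

i = 0  # hypothetical: row index of the sample to explain
shap.force_plot(explainer.expected_value, shap_values[i, :], X_test.iloc[i, :])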
Here is an example of a plot I've made using the SHAP values (the features are confidential, so their names are all erased). You can see a 97% prediction for label=1, and how much each feature added to or subtracted from the log-odds score for that specific sample.
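My own function is tied to my data, but here is a minimal sketch of the idea (the function name, colors, and top_n cutoff are my own choices): a signed horizontal bar per feature for a single sample, sorted by absolute contribution.

import numpy as np
import matplotlib.pyplot as plt

def plot_sample_explanation(shap_row, feature_names, top_n=15):
    """Bar chart of the top_n SHAP contributions for one sample."""
    order = np.argsort(np.abs(shap_row))[::-1][:top_n]
    values = shap_row[order]
    names = [feature_names[j] for j in order]
    colors = ["tab:red" if v > 0 else "tab:blue" for v in values]
    pos = np.arange(len(values))
    plt.barh(pos, values, color=colors)
    plt.yticks(pos, names)
    plt.gca().invert_yaxis()  # largest |contribution| on top
    plt.xlabel("SHAP value (impact on the model's log-odds output)")
    plt.tight_layout()
    plt.show()

# Usage (hypothetical): explain row i of X_test
# plot_sample_explanation(shap_values[i, :], list(X_test.columns))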