Validating Credit Scoring Models: Comparing Alternative Methods
In lending, there are a number of reasons to validate the ability of a forecasting model to differentiate between creditworthy (Good) and non-creditworthy (Bad) customers. Important reasons for validating credit scoring models include:
  • Compliance with regulatory requirements
  • Assurance that the model is performing satisfactorily at the time the model is developed as well as over time
The most common statistical measures used for validation of scoring models are the Kolmogorov-Smirnov (K-S) statistic and Divergence.
The K-S statistic is a value ranging from 0 to 100. A value of 0 indicates that there is no difference between two distributions, while a value of 100 indicates that there is no overlap (perfect separation) between the two distributions. A Divergence of 0 indicates no difference between the means of two distributions, with larger positive values indicating a greater difference between the means.
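As a concrete sketch of the two measures, the snippet below computes both on small, made-up samples of Good and Bad scores (the function names and data are illustrative, not from the source; Divergence is computed in the form commonly used in credit scoring, the squared mean difference divided by the average of the two variances):

```python
from statistics import mean, pvariance

def ks_statistic(good, bad):
    """Two-sample K-S: the largest gap between the empirical CDFs
    of the Good and Bad score samples, scaled to the 0-100 convention."""
    max_gap = 0.0
    for s in sorted(set(good) | set(bad)):
        f_good = sum(1 for g in good if g <= s) / len(good)
        f_bad = sum(1 for b in bad if b <= s) / len(bad)
        max_gap = max(max_gap, abs(f_good - f_bad))
    return 100 * max_gap

def divergence(good, bad):
    """Squared difference of the means, divided by the average variance."""
    pooled_var = (pvariance(good) + pvariance(bad)) / 2
    return (mean(good) - mean(bad)) ** 2 / pooled_var

# Hypothetical, fully separated samples:
good_scores = [720, 700, 690, 710, 680, 705, 695]
bad_scores = [640, 655, 630, 660, 650, 645, 635]
print(ks_statistic(good_scores, bad_scores))   # 100.0 (no overlap)
print(round(divergence(good_scores, bad_scores), 2))
```

Because the two samples do not overlap at all, K-S reaches its maximum of 100, while Divergence is simply some positive number whose magnitude depends on the spread of the scores.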
The differences between the two tests are detailed in the accompanying white paper; the key distinctions are summarized below.
In comparing K-S with Divergence, it should first be noted that the two statistics for validating credit scoring models are not designed to measure the same differences between two populations. Divergence measures the difference in location between two distributions (that is, the difference between their means). K-S, on the other hand, is sensitive to differences in the location, spread, and overall shape of two population distributions.
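This distinction can be made concrete with a small sketch (Python, made-up data): two score distributions with identical means but very different spread produce a Divergence of zero, while K-S still registers a clear difference.

```python
from statistics import mean, pvariance

# Hypothetical samples: identical means (650), very different spread.
tight = [600] * 50 + [700] * 50
wide = [500] * 50 + [800] * 50

# Divergence (squared mean difference over average variance) sees nothing:
div = (mean(tight) - mean(wide)) ** 2 / ((pvariance(tight) + pvariance(wide)) / 2)

# ...but the empirical CDFs differ, so K-S is clearly positive:
def ecdf(xs, s):
    return sum(1 for x in xs if x <= s) / len(xs)

ks = 100 * max(abs(ecdf(tight, s) - ecdf(wide, s))
               for s in set(tight) | set(wide))
print(div, ks)   # 0.0 50.0
```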
In addition to what it measures, the usefulness of a statistical measure depends upon the conditions that must be satisfied for the measure to be theoretically valid.
From a business or decision-making point of view, it is important to know to what extent a given scoring model really separates Good customers from Bad customers. This knowledge is particularly important when comparing the effectiveness of one model against another, when determining how many models are required to predict the credit risk for a given portfolio, or when monitoring the change in predictive strength over time.
From a legal point of view, there may be a Regulation B concern. Regulation B states in part, that: “…To qualify as an empirically derived, demonstrably and statistically sound credit scoring system, the system must be – (iii) developed and validated using accepted statistical principles and methodology, and (iv) periodically revalidated by the use of appropriate statistical principles and methodology and adjusted as necessary to maintain predictive ability.”
The K-S Statistic is an accepted statistical method for validating credit scoring models as long as proper sampling techniques are employed. Divergence may not be an accepted method unless other necessary conditions are met.
Testing for Significance
Within the effects test concept, the courts have defined statistical significance to mean that there is no more than a 1-in-20 chance that a result could have occurred due to random chance. Thus, when we say that a scoring model separates Good and Bad accounts at a statistically significant level, we must be able to demonstrate that the observed separation had no more than a 1-in-20 chance of occurring randomly.
The properties of the K-S statistic for validating credit scoring models are well known and have been thoroughly explored by statisticians in a large body of readily available literature. It is possible to establish a minimum K-S value that must be obtained in order to claim statistical significance for a given sample. A survey of available literature by the authors has not revealed a corresponding method for testing Divergence for statistical significance.
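As a sketch of how such a minimum value can be obtained, a common large-sample approximation for the two-sample K-S critical value at the 1-in-20 (5%) level is c(α)·√((n+m)/(nm)), with c(0.05) ≈ 1.358 (the exact constant and the large-sample assumption are standard but are not stated in the source):

```python
import math

def ks_critical_value(n_good, n_bad, c_alpha=1.358):
    """Large-sample two-sample K-S critical value at the 5% level
    (c_alpha = 1.358), scaled to the 0-100 convention used for scores.
    An observed K-S above this value is statistically significant."""
    return 100 * c_alpha * math.sqrt((n_good + n_bad) / (n_good * n_bad))

# e.g. a validation sample of 1,000 Goods and 200 Bads:
print(round(ks_critical_value(1000, 200), 1))   # 10.5
```

Note how the threshold shrinks as the samples grow: with more accounts, a smaller observed separation suffices to rule out random chance.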
Use as a Decision Tool
Putting legal issues aside and returning to the more practical, it is important for a decision maker to be able to measure the useful predictive strength of a scoring model. Creditors usually employ a scoring model by selecting a specific cut-off score and approving applicants who score above (or below) that cut-off. But the cut-off may be quite different from the score where the maximum separation between the two distributions occurs.