In our second event for the year, held on 11.02.2014, the presenter was Vladimir Labov, who is coincidentally also the writer of these pieces. Writing about yourself in the third person is something reserved for megalomaniacs, so I will humbly switch to a first-person narrative from this point on. My path to the lecture room of the Technical University was a bumpy one, full of twists and turns. After obtaining my double degree in Finance and Accounting from the University of Economics Varna, I was set firmly on the path to becoming a typical financial professional. But during my studies for an MSc in Financial Economics at Erasmus University Rotterdam, I was captivated by the world of Risk Management. Out of nowhere came a job offer for a Quantitative Analyst position at First Investment Bank, a stint that taught me a lot about risk modeling, among other things. The nostalgia for Western Europe still had a firm grip on me, so I first decided to do a PhD in Finance. Lacking the time and the will to come up with a proposal for a top UK university, I decided to give an inexpensive Master’s program in Computing Science at Imperial College London a go, but declined the offer after being invited to work for a debt collection company, Cabot Financial. The job turned out to be a mundane one. Six months, a second of distraction, one car hitting me and one broken leg after I started it, I went back to my roots at FiBank, covering pretty much all areas of financial risk management, to which my FRM qualification stands as a testament.
But enough about yours truly, let’s get back to business. In the Wednesday presentation, I revealed solutions, not found in any popular textbook, that I have applied to problems I faced while developing credit risk models. Credit risk models predict whether a borrower will pay their loan back, and people like me in effect decide whether you will be approved for a credit card or a mortgage loan. Statistical classification algorithms work best for such problems with a binary outcome. In particular, logistic regression is preferred in practice because it quantifies the probability of default as a number between 0 and 1.
As in any data analysis, the inevitable data-related problems rear their ugly heads. The most prevalent of those problems is missing values. Instead of dropping the affected observations, I have found that you can assign them to the group with the closest default rate, or to the most logical group. If all else fails, you can still keep the observations by transforming the raw values to weights of evidence (WoE). This approach is a solution for several other common problems.
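To make the weights-of-evidence idea concrete, here is a minimal Python sketch. It assumes the common definition WoE = ln(share of non-defaulters / share of defaulters) per group, and it treats missing values (None) as a group of their own; the function name, the group labels and the sample data are all made up for illustration and are not taken from the presentation.

```python
import math
from collections import Counter

def weights_of_evidence(groups, defaults):
    """Return {group: WoE}, with WoE = ln(share of goods / share of bads).

    groups   -- group label per observation (None marks a missing value)
    defaults -- 1 if the borrower defaulted, 0 otherwise
    """
    goods = Counter(g for g, d in zip(groups, defaults) if not d)
    bads = Counter(g for g, d in zip(groups, defaults) if d)
    total_good = sum(goods.values())
    total_bad = sum(bads.values())
    woe = {}
    for g in set(goods) | set(bads):
        if goods[g] == 0 or bads[g] == 0:
            # The logarithm is undefined here; such a group should be merged.
            raise ValueError(f"group {g!r} needs both goods and bads")
        woe[g] = math.log((goods[g] / total_good) / (bads[g] / total_bad))
    return woe

# Hypothetical sample: (group, default flag); None stands for a missing value.
records = (
    [("low", 0)] * 2 + [("low", 1)] * 3
    + [("high", 0)] * 8 + [("high", 1)] * 1
    + [(None, 0)] * 2 + [(None, 1)] * 2
)
groups = [g for g, _ in records]
defaults = [d for _, d in records]

woe = weights_of_evidence(groups, defaults)
for group, value in woe.items():
    print(group, round(value, 3))
```

The missing group simply gets its own weight, so no observation has to be thrown away. If that weight turns out close to the weight of another group, merging the two is the "closest default rate" option mentioned above.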
Outliers have long plagued the work of statisticians. In a credit risk environment, you can avoid trimming these observations by applying weights of evidence: all extreme values fall into the marginal groups, where they carry the same weight as all the other not-so-extreme values in those groups.
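A small Python sketch of why outliers stop being a problem once values are mapped to bins. The cut-offs and the per-bin weights below are hypothetical numbers chosen only for illustration; the point is that the first and last bins are open-ended, so an extreme value gets exactly the same weight as any other value in the marginal bin.

```python
from bisect import bisect_right

def assign_bin(value, cutoffs):
    """Return the index of the bin that value falls into.

    cutoffs are the sorted boundaries between bins, so there are
    len(cutoffs) + 1 bins; the first and the last bin are open-ended.
    A value equal to a cut-off goes to the upper bin.
    """
    return bisect_right(cutoffs, value)

income_cutoffs = [500, 1000, 2000]   # hypothetical cut-offs
bin_woe = [-0.8, -0.2, 0.3, 0.9]     # hypothetical WoE per bin

typical = bin_woe[assign_bin(2500, income_cutoffs)]
extreme = bin_woe[assign_bin(1_000_000, income_cutoffs)]
print(typical, extreme)  # both values land in the last bin
```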
Working with the weights of evidence transformation has other side benefits as well. First, it deals elegantly with categorical variables. Otherwise, you are left wondering what to do with 3 significant dummy variables and 2 insignificant ones, created from a variable with 5 categories. Second, weights of evidence take care of wrong regression signs for you: if everything is in line with economic logic, then all coefficient signs should be negative for WoE-transformed variables. This holds when a higher weight of evidence means a larger share of good borrowers and the model predicts default.
If you are thinking all this is too good to be true, using the WoE approach does indeed come at a cost: you need to split the numerical variables into bins. My personal solution is to split every numerical variable into deciles, check whether the default rate changes monotonically and/or in a logical way from bin to bin, combine bins whose default rates do not differ much, and play with the cut-offs of bins that spoil the perfect picture of monotonically changing default rates.
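The binning routine described above can be sketched in Python roughly as follows. This is only an illustration under my own assumptions: bins are formed by rank (so tied values may straddle a boundary), "don't differ much" is expressed as a hypothetical tolerance parameter, and the sample data is synthetic. Adjusting the cut-offs by hand is left out, since that part is judgment rather than code.

```python
def decile_bins(values, defaults, n_bins=10):
    """Sort by value and split into n_bins groups of (almost) equal size.

    Returns a list of (min_value, max_value, count, default_rate) tuples.
    """
    pairs = sorted(zip(values, defaults))
    n = len(pairs)
    bins = []
    for i in range(n_bins):
        chunk = pairs[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue  # fewer observations than bins
        bad = sum(d for _, d in chunk)
        bins.append((chunk[0][0], chunk[-1][0], len(chunk), bad / len(chunk)))
    return bins

def is_monotonic(rates):
    """True if the rates never increase or never decrease."""
    steps = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in steps) or all(a >= b for a, b in steps)

def merge_similar(bins, tolerance):
    """Merge adjacent bins whose default rates differ by less than tolerance."""
    merged = [bins[0]]
    for lo, hi, count, rate in bins[1:]:
        p_lo, _, p_count, p_rate = merged[-1]
        if abs(rate - p_rate) < tolerance:
            total = p_count + count
            pooled = (p_rate * p_count + rate * count) / total
            merged[-1] = (p_lo, hi, total, pooled)
        else:
            merged.append((lo, hi, count, rate))
    return merged

# Synthetic data: 100 applicants, default rate falling as the value rises.
values = list(range(100))
bad_counts = [6, 6, 5, 4, 4, 3, 2, 2, 1, 1]  # defaults per decile, made up
defaults = [1 if v % 10 < bad_counts[v // 10] else 0 for v in values]

bins = decile_bins(values, defaults)
rates = [b[3] for b in bins]
merged = merge_similar(bins, tolerance=0.05)
print(is_monotonic(rates), len(bins), len(merged))
```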
Apart from the weight-of-evidence all-around solution, I also revealed how to tackle the persistent problem of multicollinearity. Instead of dropping the weaker variables, I combine the correlated variables into a new one. A solution similar in spirit accounts for both the income and the indebtedness of a loan applicant: you subtract the debt payments from the income and end up working with the disposable income.
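As a rough Python illustration of the disposable income idea: income and debt payments tend to move together, so instead of feeding both into the model, one derived variable replaces them. The figures below are invented, and the Pearson correlation helper is only there to show the two raw variables are strongly correlated in this toy sample.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical monthly figures for five applicants.
income = [1000, 1500, 2000, 3000, 4000]
debt_payments = [300, 500, 700, 900, 1500]

correlation = pearson(income, debt_payments)

# One combined variable replaces the two correlated ones.
disposable_income = [i - d for i, d in zip(income, debt_payments)]
print(round(correlation, 2), disposable_income)
```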
Next in my lineup for the evening were problems typical for Bulgarian scorecard data. For applicants whose actual salary is higher than the officially declared one, and therefore unverifiable, you can request a declaration from the employer. And a clever way to detect the current residence of applicants is to take the branch where they submitted the application.
Finally, I shared a secret discovery with my audience. Take a look at the presentation to find out what it is!
Written by: Vladimir Labov