I have a dataset for credit risk analysis. The dependent variable is var0, and all of the remaining variables are independent variables.
I built the model in two ways: with the Python library scikit-learn and with Spark MLlib. The procedures are as follows.
Scikit-learn way:
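A minimal sketch of what the scikit-learn side might look like, assuming a logistic regression classifier (the model type is not stated above) and synthetic data standing in for the real credit dataset. The three feature columns, the way var0 is generated, and the 70/30 split are all hypothetical choices made for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit risk data (NOT the real dataset).
rng = np.random.default_rng(0)
n_rows = 500
X = rng.normal(size=(n_rows, 3))  # three hypothetical independent variables

# var0 is the binary dependent variable; this generating rule is made up.
noise = rng.normal(scale=0.5, size=n_rows)
var0 = (X[:, 0] - 0.5 * X[:, 1] + noise > 0).astype(int)

# Hold out 30% of the rows for evaluation (an assumed split, not the author's).
X_train, X_test, y_train, y_test = train_test_split(
    X, var0, test_size=0.3, random_state=0
)

# Assumption: logistic regression with scikit-learn's default settings,
# which include L2 regularization (C=1.0).
model = LogisticRegression()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"test accuracy: {accuracy:.3f}")
```

When comparing against Spark MLlib, note that scikit-learn's `LogisticRegression` applies L2 regularization by default, so the two libraries can give different coefficients unless the regularization settings are matched.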
Spark MLlib way: an IPython notebook on Databricks