MLlib Pipelines API

Model, Binary Classification

Binary Classification: Logistic regression practice

I have a dataset for credit risk analysis. The dependent variable is var0, and all the remaining variables are independent variables.

[Image: Screen Shot 2018-01-25 at 10.29.25 AM.png]

I built the model in two ways: with the Python library scikit-learn and with Spark MLlib. The procedures follow.

Scikit-learn way:
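
As a rough illustration of the scikit-learn route, here is a minimal sketch rather than the original notebook. The credit risk data is not reproduced in this post, so the sketch assumes a synthetic stand-in generated with `make_classification`: the array `var0` plays the role of the dependent variable and the `features` matrix plays the role of the independent variables. The split ratio, `max_iter`, and the choice of AUC as the metric are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in for the credit risk data.
# `var0` is the binary dependent variable, `features` the independent variables.
features, var0 = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=42
)

# Hold out 30% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    features, var0, test_size=0.3, random_state=42, stratify=var0
)

# Fit a plain logistic regression (L2-regularized by default in scikit-learn).
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score the held-out rows with the predicted probability of class 1.
test_scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_scores)
print(f"Test AUC: {auc:.3f}")
```

With the real data, the two synthetic arrays would be replaced by the var0 column and the remaining columns of the dataset.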

Spark MLlib way: the IPython notebook on Databricks