python

Model, Binary Classification

Binary Classification: Logistic regression practice

I have a dataset for credit risk analysis. The dependent variable is var0, and the variables that follow it are all independent variables.

[Image: Screen Shot 2018-01-25 at 10.29.25 AM.png]

I built the model in two ways: with the Python library scikit-learn and with Spark MLlib. The procedures follow.

Scikit-learn way:
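
A minimal sketch of what a scikit-learn logistic regression workflow can look like. The credit risk data is not reproduced here, so this assumes a synthetic stand-in from `make_classification`, where `y` plays the role of var0. The feature count, split ratio, scaling step, and ROC AUC metric are illustrative choices, not the settings of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit risk data (assumption: the real
# dataset is not available here). y plays the role of var0.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)

# Hold out 25% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predicted probability of the positive class, scored with ROC AUC.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
print(f"Test ROC AUC: {auc:.3f}")
```

With real data, `X` and `y` would come from the cleaned dataset instead, for example by loading it into a DataFrame and separating var0 from the other columns.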

Spark MLlib way: the IPython notebook on Databricks

data cleaning

Geocoding

I am working on a property sale dataset. Geographic coordinates are needed to make maps in Tableau. To convert the addresses in the original dataset into geographic coordinates, I use a Google Maps Geocoding API key and a Python script. The free limit is 2,500 requests per day. For small datasets (fewer than 1,000 observations), a Google Sheets geocoding add-on can do the job as well. Geocoding is pretty cool!

My Python code reference is here: https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/
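
To make the address-to-coordinates step concrete, here is a small sketch using only the Python standard library. It is not the script from the reference above. The helper names (`build_geocode_url`, `parse_location`, `geocode`) are hypothetical, and the code assumes the Geocoding API's JSON endpoint and its `status` / `results[0].geometry.location` response layout.

```python
import json
import urllib.parse
import urllib.request

# JSON endpoint of the Google Maps Geocoding API.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one address (hypothetical helper)."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_location(payload):
    """Return (lat, lng) from a decoded response, or None if no match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key):
    """Send one request and return (lat, lng) or None. Needs network access."""
    url = build_geocode_url(address, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_location(json.load(response))
```

A batch run would loop over the address column, call `geocode(address, api_key)` with your own key, and write the latitude and longitude back as two new columns for Tableau, pausing or stopping when the daily quota is reached.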