Model, Binary Classification

Binary Classification: Logistic regression practice

I have a dataset for credit risk analysis. The dependent variable is var0, and the variables that follow it are all independent variables.

Screen Shot 2018-01-25 at 10.29.25 AM.png

I built the model in two ways: with the Python library scikit-learn and with Spark MLlib. The procedures are as follows.

Scikit-learn way:
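
A minimal sketch of what the scikit-learn route could look like. The real credit dataset is not available here, so the synthetic features, the coefficients used to generate them, and the 70/30 split are all assumptions made only for illustration; only the target name var0 comes from the dataset described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit data (assumption: 5 numeric predictors).
rng = np.random.default_rng(0)
n_samples, n_features = 1000, 5
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([1.5, -2.0, 0.5, 0.0, 1.0])  # hypothetical effect sizes
prob = 1.0 / (1.0 + np.exp(-(X @ true_coef)))
var0 = rng.binomial(1, prob)  # binary target, named after the dependent variable

# Hold out 30% of the rows, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, var0, test_size=0.3, random_state=0, stratify=var0
)

# Scale the predictors, then fit a logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predicted probability of class 1, scored with ROC AUC on the held-out rows.
default_prob = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, default_prob)
print(f"Test ROC AUC: {auc:.3f}")
```

With real data, X and var0 would come from the dataset columns instead of the random generator, and categorical predictors would need encoding first.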

Spark MLlib way: the IPython notebook on Databricks

data analysis

R packages in Tableau calculation function

The reason I needed geocoding in the last post is that I wanted to map the sale data in Tableau. Before I tried a Python script with a Google geocoding API key, I used the geocode() function from the ggmap library in R and put it in a Tableau calculation function. However, it didn't work.

Luckily, another R package works! I use the 'mvoutlier' package to analyze the outliers in the Tableau graph. R gives Tableau more analysis tools. It is amazing to use them together!

  

IF SCRIPT_REAL("
library(mvoutlier);
sign2(cbind(.arg1))$wfinal01
",
SUM([Sale Price])
Screen Shot 2018-01-17 at 4.32.26 PM.png

data cleaning

Geocoding

I am working on a property sale dataset. Geographic coordinates are needed to make maps in Tableau. To convert the addresses in the original dataset into geographic coordinates, I use a Google Maps geocoding API key and a Python script. The free limit is 2,500 requests per day. For small datasets (<1,000 observations), we can also use Google Sheets with a geocoding add-on. Geocoding is pretty cool!

My Python code reference is here: https://www.shanelynn.ie/batch-geocoding-in-python-with-google-geocoding-api/
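
For a sense of what a single geocoding request involves, here is a minimal sketch written separately from the referenced script. It uses only the Python standard library against the Google Geocoding API JSON endpoint; the function names and the sample address and coordinates are hypothetical, and a real call needs your own API key.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build the request URL for one address (hypothetical helper)."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


def parse_location(payload):
    """Return (lat, lng) from a decoded response, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key, timeout=10):
    """Send one request over the network and return (lat, lng) or None."""
    url = build_geocode_url(address, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_location(json.load(response))


# Offline check with a hand-written payload in the API's response shape.
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 40.7128, "lng": -74.006}}}],
}
print(parse_location(sample))
```

To geocode a whole address column, call geocode() once per row and pause between requests so the daily free limit is not used up too quickly.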