Graphic Designer Desk
Working from Home
Woman with Headphones
Clicking on a Tablet
Girl with Tablet
In the Woods

Predicting House Prices

King County House Sales Dataset

Objective: To clean, explore, and model this dataset with a multivariate linear regression to predict the sale price of houses as accurately as possible.


Data Exploration and Scrubbing

Explore Dataset

Our first step is to explore and understand the dataset we are working with. We need to make sure we know what all of the columns represent and how the computer is interpreting them. This step allows us to force the computer to understand the data the way we want it to.

Deal with Missing Data

We need to deal with rows that are missing data. The dataset is incomplete, which causes problems when running statistical analysis, so we have to determine how best to deal with missing data on a case by case basis.

Check Multicollinearity

We make sure that all of the predictors we are using describe price and not each other. If two of our predictors are highly correlated, it’s hard to determine which one is affecting the price.

Check Model Assumptions

We want to be sure that our data fulfills all the assumptions that are necessary to create a model. If our data does not satisfy all the assumptions, we need to transform our data appropriately so that we can build a statistically significant model.

In the Woods

Test Our Model

We want to ensure that our model is actually predicting results, so we run several tests to protect us from creating a model that works on the data we currently have.

Playing on Tablet

Remove Inconsequential Predictors

We figure out which predictors don’t actually influence our model and remove them from the equation to keep everything as simple as possible.

Augmented Reality Glasses

Final Results

Our final Model explains 98.9% of the variations in our dataset.