Northwind Hypothesis Tests
Microsoft's Northwind Dataset
For this project, I worked with the Northwind database, a free, open-source sample dataset created by Microsoft that contains data for a fictional company. The goal was to extract information from a realistic relational database and apply statistical analysis and hypothesis testing to generate insights that could be of value to the company. I tested four separate hypotheses and analyzed the results.
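The four hypotheses are not spelled out in this section, so the following is only a generic sketch of one common test. It compares the means of two groups with Welch's t-test (via `scipy.stats.ttest_ind` with `equal_var=False`) and reports Cohen's d as an effect size. The groups are synthetic data standing in for a hypothetical question such as "do discounted order lines have a different mean quantity than full-price ones?"; the function name `welch_t_test` and the numbers are illustrative assumptions, not this project's results.

```python
import numpy as np
from scipy import stats

def welch_t_test(group_a, group_b, alpha=0.05):
    """Two-sided Welch's t-test (does not assume equal variances) plus Cohen's d."""
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    # Cohen's d using the pooled standard deviation as a rough effect size
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = np.sqrt(((n_a - 1) * np.var(group_a, ddof=1) +
                         (n_b - 1) * np.var(group_b, ddof=1)) / (n_a + n_b - 2))
    d = (np.mean(group_a) - np.mean(group_b)) / pooled_sd
    return {"t": t_stat, "p": p_value, "d": d, "reject_null": p_value < alpha}

# Synthetic stand-in data (hypothetical, not drawn from Northwind)
rng = np.random.default_rng(42)
discounted = rng.normal(loc=27, scale=20, size=800)
full_price = rng.normal(loc=22, scale=18, size=1300)

result = welch_t_test(discounted, full_price)
print(result)
```

Reporting the effect size next to the p-value matters with samples this large, since even a small difference in means can come out as statistically significant.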
Data Exploration and Scrubbing
Our first step is to explore and understand the dataset we are working with. We need to know what each column represents and which data type it was read in as. Where the inferred type is wrong (for example, numbers or dates stored as text), we convert the column so that the data is interpreted the way we intend.
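A minimal pandas sketch of this step, assuming a small made-up slice of an order-details table in which every column arrives as text. The column names are loosely modeled on Northwind's order tables but are assumptions, as is the helper name `coerce_types`.

```python
import pandas as pd

# Hypothetical rows; every value is a string, as can happen when data is
# exported or read without explicit types.
raw = pd.DataFrame({
    "OrderId": ["10248", "10248", "10249"],
    "UnitPrice": ["14.00", "9.80", "18.60"],
    "Quantity": ["12", "10", "9"],
    "Discount": ["0", "0", "0.15"],
    "OrderDate": ["2012-07-04", "2012-07-04", "2012-07-05"],
})

print(raw.dtypes)  # every column is a text dtype, not numeric or datetime

def coerce_types(df):
    """Return a copy with each column converted to the type we intend."""
    out = df.copy()
    for col in ["OrderId", "UnitPrice", "Quantity", "Discount"]:
        out[col] = pd.to_numeric(out[col])  # ints stay int, decimals become float
    out["OrderDate"] = pd.to_datetime(out["OrderDate"])
    return out

orders = coerce_types(raw)
print(orders.dtypes)
print(orders.describe())  # summary statistics only make sense once types are right
```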
Deal with Missing Data
Next, we deal with rows that are missing data. The dataset is incomplete, and missing values cause problems when running statistical analyses, so we determine the best way to handle them on a case-by-case basis.
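One way to make the case-by-case decisions explicit is to first count missing values per column and then apply a different rule to each. The sketch below uses a hypothetical table; the columns and the rule chosen for each (drop a mostly empty column, label unknown categories, fill a numeric gap with the median, drop rows missing a key field) are illustrative assumptions, not the treatment used in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in several columns
df = pd.DataFrame({
    "Region": ["WA", None, "OR", None, "BC"],
    "Fax": [None, None, "555-0100", None, None],
    "Freight": [32.38, np.nan, 65.83, 41.34, 51.30],
    "ShipCountry": ["USA", "USA", "USA", "Canada", None],
})

def missing_report(frame):
    """Count and share of missing values per column, worst first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({"missing": counts, "share": counts / len(frame)})
    return report.sort_values("missing", ascending=False)

print(missing_report(df))

def handle_missing(frame):
    out = frame.copy()
    # Mostly empty and not needed for the analysis: drop the column
    out = out.drop(columns=["Fax"])
    # Categorical gaps: keep the rows and make "Unknown" an explicit category
    out["Region"] = out["Region"].fillna("Unknown")
    # Numeric gap: fill with the median, which is robust to outliers
    out["Freight"] = out["Freight"].fillna(out["Freight"].median())
    # Key field we cannot sensibly impute: drop the affected rows
    return out.dropna(subset=["ShipCountry"])

clean = handle_missing(df)
print(clean)
```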
We also check for multicollinearity, making sure that each predictor we use describes price rather than duplicating another predictor. If two of our predictors are highly correlated, it is hard to determine which one is actually affecting the price.
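A simple screen for this is to list every pair of predictors whose absolute Pearson correlation exceeds a cutoff. The sketch below assumes a threshold of 0.75 (a common rule of thumb, not a value taken from this project) and uses synthetic predictors with hypothetical names, one of which is deliberately a near copy of another.

```python
import numpy as np
import pandas as pd

def correlated_pairs(frame, threshold=0.75):
    """List predictor pairs whose absolute Pearson correlation exceeds threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is skipped
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Synthetic predictors (hypothetical names)
rng = np.random.default_rng(0)
a = rng.normal(2000, 500, 200)
X = pd.DataFrame({
    "predictor_a": a,
    "predictor_b": 0.9 * a + rng.normal(0, 100, 200),  # nearly redundant with a
    "predictor_c": rng.integers(1, 6, 200),            # unrelated
})

pairs = correlated_pairs(X)
print(pairs)
```

When a pair is flagged, one of the two predictors is usually dropped; variance inflation factors are a common alternative that also catches correlation involving more than two predictors at once.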
Check Model Assumptions
We want to be sure that our data satisfies all the assumptions the model requires. If it does not, we transform the data appropriately so that the model, and the significance tests built on it, are valid.
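As an illustration, two assumptions shared by many parametric tests are normality within each group and equal variances across groups. This sketch checks them with `scipy.stats.shapiro` and `scipy.stats.levene`, then repeats the checks after a log transform. The right-skewed synthetic data, the 0.05 cutoff, and the helper name `check_assumptions` are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

def check_assumptions(*groups, alpha=0.05):
    """Shapiro-Wilk normality per group and Levene's test for equal variances."""
    normal = [bool(stats.shapiro(g).pvalue > alpha) for g in groups]
    equal_var = bool(stats.levene(*groups).pvalue > alpha)
    return {"normal": normal, "equal_variance": equal_var}

# Right-skewed synthetic data, typical of prices or quantities
rng = np.random.default_rng(1)
skewed_a = rng.lognormal(mean=3.0, sigma=0.8, size=300)
skewed_b = rng.lognormal(mean=3.2, sigma=0.8, size=300)

before = check_assumptions(skewed_a, skewed_b)
after = check_assumptions(np.log(skewed_a), np.log(skewed_b))  # log transform
print("before:", before)
print("after: ", after)
```

A log transform suits strictly positive, right-skewed data; if the assumptions still fail after transforming, a non-parametric test is the usual fallback.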
Test Our Model
We want to ensure that our model actually predicts results on new data, so we run several tests to protect us from creating a model that only works on the data we currently have.
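One such test is a holdout check: fit the model on a random training split and compare its R² on the training rows with its R² on rows it never saw. The sketch below uses plain NumPy least squares on synthetic data; the 25% test size, the seed, and the helper names are assumptions. A large gap between the two scores would suggest overfitting, and k-fold cross-validation extends the same idea by repeating the split.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    """Share of the variance in y explained by the fitted coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def holdout_check(X, y, test_size=0.25, seed=0):
    """Fit on a random training split; return (train R^2, held-out R^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    coef = fit_ols(X[train], y[train])
    return r_squared(X[train], y[train], coef), r_squared(X[test], y[test], coef)

# Synthetic data with a known linear relationship plus noise
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
y = 2.0 + X @ np.array([1.5, -0.8, 0.0]) + rng.normal(scale=0.5, size=400)

train_r2, test_r2 = holdout_check(X, y)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```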
Remove Inconsequential Predictors
We identify the predictors that do not meaningfully influence the model and remove them from the equation to keep it as simple as possible.
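The section does not say how inconsequential predictors were identified. One common approach, sketched here as an assumption, is backward elimination by p-value: fit ordinary least squares, drop the predictor with the largest p-value above a cutoff (0.05 here), and refit until every remaining predictor is significant. The p-values are computed directly from the OLS standard errors with NumPy and `scipy.stats.t`; the data, names, and helpers are hypothetical.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided p-values for OLS coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t = coef / se
    return 2 * stats.t.sf(np.abs(t), dof)

def backward_eliminate(X, y, names, alpha=0.05):
    """Repeatedly drop the predictor with the largest p-value above alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        p = ols_pvalues(X[:, keep], y)[1:]  # skip the intercept
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data: only "a" and "c" actually affect y
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=500)

selected = backward_eliminate(X, y, ["a", "b", "c", "d"])
print(selected)
```

Removing one predictor at a time and refitting matters because p-values shift once a correlated predictor leaves the model.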
Our final model explains 98.9% of the variation in our dataset.
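"Percent of variation explained" is usually the coefficient of determination, R². Assuming that is the figure reported, it is defined from the residual and total sums of squares, and the adjusted version penalizes each of the p predictors, which is relevant after removing predictors from the model:

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

Under that reading, R² = 0.989 means the residual sum of squares is 1.1% of the total sum of squares around the mean.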