Photo by Luca Micheli on Unsplash

In this project, I develop a predictive model for property sale prices in King County, Washington.

View this project on GitHub.


Overview

In this project, I develop a predictive model for property values in King County, Washington. I use statistical testing and analysis to build a linear regression model, which is then tested on a holdout dataset. The training dataset includes multiple features for 17,290 property sales in King County. The holdout dataset contains several thousand other property sales, also in King County, and has all the features found in the training dataset except the target variable, property sale price. With the final model, I exported a CSV file of sale price predictions for the properties in the holdout dataset. Model performance was evaluated using Root Mean Square Error (RMSE).
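For readers unfamiliar with the evaluation metric, here is a minimal sketch of how Root Mean Square Error can be computed. The function name `rmse` and the sample prices are illustrative assumptions, not values from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Average the squared differences, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sale prices (assumption: made up for illustration only).
actual = [300_000, 450_000, 520_000]
predicted = [310_000, 430_000, 520_000]
print(round(rmse(actual, predicted), 2))  # 12909.94
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is expressed in the same unit as the sale price.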

Methodology

My methodology follows the CRISP-DM process model for exploratory data analysis, cleaning, modeling, and evaluation. I use descriptive and inferential statistics to evaluate a test dataset, including hypothesis testing with t-tests and analysis of variance (ANOVA). From these inferences, I developed predictive models using polynomial features, categorical dummy variables, and several features engineered from those available in the dataset. Finally, I deployed and evaluated linear regression models using two feature selection approaches: a filter method that uses F-tests to select the k best features, and a wrapper method, recursive feature elimination with cross-validation. Models were evaluated primarily on the Root Mean Square Error of the training and testing data from a train-test split.

Tools used include Python, NumPy, pandas, SciPy, statsmodels, and scikit-learn. Visualizations were created with Matplotlib and seaborn.
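The two feature selection approaches named in the methodology can be sketched with scikit-learn roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the data is synthetic (generated with `make_regression`), and the choices of `k=5`, five cross-validation folds, and the helper name `rmse` are placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing features (assumption: not the real dataset).
X, y = make_regression(
    n_samples=500, n_features=20, n_informative=5, noise=10.0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def rmse(model, X_part, y_part):
    """RMSE of a fitted model on one part of the split."""
    return float(np.sqrt(mean_squared_error(y_part, model.predict(X_part))))


# Filter method: keep the k features with the highest univariate F-statistic.
selector = SelectKBest(score_func=f_regression, k=5).fit(X_train, y_train)
filter_model = LinearRegression().fit(selector.transform(X_train), y_train)
filter_train_rmse = rmse(filter_model, selector.transform(X_train), y_train)
filter_test_rmse = rmse(filter_model, selector.transform(X_test), y_test)

# Wrapper method: recursively drop the weakest feature, choosing the final
# number of features by cross-validated RMSE.
rfecv = RFECV(
    estimator=LinearRegression(),
    step=1,
    cv=5,
    scoring="neg_root_mean_squared_error",
).fit(X_train, y_train)
wrapper_model = LinearRegression().fit(rfecv.transform(X_train), y_train)
wrapper_train_rmse = rmse(wrapper_model, rfecv.transform(X_train), y_train)
wrapper_test_rmse = rmse(wrapper_model, rfecv.transform(X_test), y_test)

print(f"SelectKBest: train RMSE {filter_train_rmse:.2f}, test RMSE {filter_test_rmse:.2f}")
print(f"RFECV kept {rfecv.n_features_} features: "
      f"train RMSE {wrapper_train_rmse:.2f}, test RMSE {wrapper_test_rmse:.2f}")
```

Comparing training and testing RMSE side by side, as the methodology describes, is a quick check for overfitting: a test error far above the training error suggests the selected features do not generalize.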
