
Tri-Variable Visualizations - Ames Iowa

Demonstration of what we can see using hue as a 3rd variable

While doing my exploratory data analysis (EDA) of the Ames, Iowa Housing dataset, I created some visualizations that gave me insights I did not expect going in. As a beginning data science student at the time, I came to realize how valuable EDA is, not just for understanding what the data is telling you, but also for changing the beliefs you hold based on your assumptions. Even when a visualization aligns with your assumptions, seeing how the data is distributed gives you a deeper appreciation of why you believe what you believe. Attached are just a few plots from my EDA that made me go "Huh? What? Oh, that's interesting."

Plot #1: This plot uses hue to show the Overall Condition of a home against the year it was built (x-axis) and its sale price (y-axis). Notice that the darker dots, which represent better overall condition, are found mostly among homes built before 1980. I would have thought that the newer homes would be in better overall condition.

Plot #2: This plot is similar to the last one, except that Overall Condition is replaced by Overall Quality. Notice that the homes with the highest overall quality (the darker dots) were built mostly after 1980. This makes sense, as we would expect newer homes to be built with better-quality materials and designs. My question is: why are the older homes in better condition? Perhaps the age-old saying "they sure don't make 'em like they used to" answers the question. That is, the newer homes look better on the outside, but perhaps in building homes that look more attractive, the builders used cheaper materials to cut costs.

Plot #3: Even though this plot shows what we would expect, I find it fascinating to see the distributions in these swarm plots. We can see that 1- and 2-story homes make up the majority of the dataset, and we can also see the strata in sale price produced by just a single variable, Garage Size.
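For anyone who wants to try the hue idea from Plots #1 and #2, here is a minimal sketch of how such a scatter plot could be drawn with seaborn. It is not the exact code behind my plots. The column names (`Year Built`, `SalePrice`, `Overall Cond`, `Overall Qual`) are assumptions modeled on the Ames data dictionary and may differ in your copy of the data, and the rows come from a hypothetical helper, `make_toy_homes`, that generates random numbers rather than real Ames records, so the sketch runs on its own but will not reproduce the patterns discussed above.

```python
import random

# Assumed column names, modeled on the Ames data dictionary.
# Adjust them to match your copy of the dataset.
X_COL, Y_COL = "Year Built", "SalePrice"


def make_toy_homes(n=200, seed=0):
    """Generate synthetic rows (NOT real Ames data) so the sketch runs standalone."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        year = rng.randint(1900, 2010)
        rows.append({
            X_COL: year,
            Y_COL: 50_000 + 1_500 * (year - 1900) + rng.randint(0, 80_000),
            "Overall Cond": rng.randint(1, 10),
            "Overall Qual": rng.randint(1, 10),
        })
    return rows


def hue_scatter(rows, hue_col, ax=None):
    """Scatter year built vs. sale price, coloring each dot by hue_col."""
    # Imported here so the data helper above works without plotting libraries.
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(rows)
    # A sequential palette draws higher ratings as darker dots.
    return sns.scatterplot(data=df, x=X_COL, y=Y_COL, hue=hue_col,
                           palette="Blues", ax=ax)


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    homes = make_toy_homes()
    fig, axes = plt.subplots(1, 2, figsize=(12, 5), sharey=True)
    hue_scatter(homes, "Overall Cond", ax=axes[0])  # analogue of Plot #1
    hue_scatter(homes, "Overall Qual", ax=axes[1])  # analogue of Plot #2
    plt.show()
```

To use it on the real data, load the Ames CSV into a DataFrame and pass `df.to_dict("records")` in place of the toy rows. Putting the condition and quality plots side by side with a shared y-axis makes it easier to compare where the darker dots fall.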

Project screenshot 1
Project screenshot 2
Project screenshot 3
Project screenshot 4

