Part 3: As a landlord, I want to know the estimated daily price my property can earn.
For this part, I wanted to use a machine learning model from the scikit-learn library to estimate a listing's price based on certain explanatory variables. At first I used a linear model, which produced terrible mean R-squared values (close to 0%).
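A "mean R-squared" here is the average R-squared over the folds of k-fold cross-validation. The sketch below shows how such a baseline could be computed with scikit-learn's `cross_val_score`. It runs on small synthetic stand-in data (hypothetical, not the London listings), and the helper `mean_cv_r2` is an illustrative name, not something from the original analysis.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def mean_cv_r2(model, X, y, folds=5):
    """Return the mean R-squared across k cross-validation folds (hypothetical helper)."""
    scores = cross_val_score(model, X, y, cv=folds, scoring="r2")
    return scores.mean()


# Synthetic, hypothetical stand-in data (not the real listings)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "accommodates": rng.integers(1, 9, size=500),
    "bathrooms": rng.integers(1, 4, size=500),
})
# Right-skewed prices plus a few extreme outliers
price = np.exp(3.5 + 0.25 * X["accommodates"] + rng.normal(0, 0.5, size=500))
price.iloc[:5] = 80_000

baseline = mean_cv_r2(LinearRegression(), X, price)
print(f"Linear model mean CV R-squared: {baseline:.3f}")
```

Extreme values in the raw target dominate both the squared-error fit and the R-squared denominator, which is one plausible reason a plain linear model scores so poorly here.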
Looking at the response variable "price", I found some large outliers. The average price for listings in London was £197 per night, but the largest price was £80,000 per night, a clear outlier.
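The post doesn't say how the outliers were identified. One common rule, swapped in here purely as an illustration, is Tukey's fences: flag any value outside Q1 - k·IQR or Q3 + k·IQR, where k = 1.5 by convention. The prices below are made-up examples, not the real data.

```python
import pandas as pd


def flag_price_outliers(prices: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag prices outside the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = prices.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (prices < q1 - k * iqr) | (prices > q3 + k * iqr)


# Hypothetical nightly prices in GBP (illustrative only)
prices = pd.Series([65, 80, 95, 120, 150, 180, 210, 80_000])
print(prices.describe())                     # mean is pulled far above the median
print(prices[flag_price_outliers(prices)])   # only the 80,000 listing is flagged
```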
Moreover, the distribution of prices was heavily skewed towards the lower end of the price range, so a linear fit on raw prices was a poor choice and a non-linear model was needed to improve accuracy. I looked at two models: Random Forest and Gradient Boosting.
The cross-validated R-squared for both models was around 72%, meaning that roughly 72% of the variation in the logarithm of price could be explained by each model. This was good news!
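The following sketch compares the two tree-based regressors on a log-transformed target using shuffled 5-fold cross-validation. It assumes `np.log1p`, which also tolerates zero prices, because the post does not say exactly which log transform was used. The data is synthetic and the hyperparameters are placeholders, so the printed scores say nothing about the real 72% result.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic, right-skewed stand-in data (hypothetical, not the London listings)
rng = np.random.default_rng(42)
X = rng.integers(1, 9, size=(400, 3)).astype(float)
price = np.exp(3.0 + 0.3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.3, size=400))

# Model log(1 + price) instead of price to compress the long right tail
y_log = np.log1p(price)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    # R-squared is measured on the log scale, not in pounds
    scores = cross_val_score(model, X, y_log, cv=cv, scoring="r2")
    results[name] = scores.mean()
    print(f"{name}: mean CV R-squared = {results[name]:.3f}")
```

Predictions from such a model are in log space and can be mapped back to pounds with `np.expm1`.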
#Random Forest Feature Importance
feature                 importance
room_type_Private room  0.431843
bathrooms               0.124451
length_of_description   0.068693
reviews_per_month       0.068419
accommodates            0.06739
#Gradient Boosting Feature Importance
feature                    importance
room_type_Private room     0.498895
accommodates               0.180230
bathrooms                  0.087165
bedrooms                   0.083031
neighbourhood_Westminster  0.053714
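Tables like the two above can be produced from a fitted tree ensemble's `feature_importances_` attribute. Column names such as `room_type_Private room` are what `pd.get_dummies` generates when it one-hot encodes a `room_type` column. The sketch below is a minimal illustration on a hypothetical mini-dataset; it is not the author's pipeline. Impurity-based importances show how much a feature helps the model reduce error, not whether it raises or lowers the price.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical mini-dataset using Airbnb-style column names
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "room_type": rng.choice(["Entire home/apt", "Private room"], size=n),
    "accommodates": rng.integers(1, 9, size=n),
    "bathrooms": rng.integers(1, 4, size=n),
})
df["log_price"] = (
    4.5
    - 0.8 * (df["room_type"] == "Private room")
    + 0.1 * df["accommodates"]
    + rng.normal(0, 0.2, size=n)
)

# One-hot encode room_type; with drop_first=True only "room_type_Private room" remains
X = pd.get_dummies(df.drop(columns="log_price"), columns=["room_type"], drop_first=True)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, df["log_price"])

# Pair each column with its importance and sort, largest first
importance = (
    pd.DataFrame({"feature": X.columns, "importance": model.feature_importances_})
    .sort_values("importance", ascending=False)
    .reset_index(drop=True)
)
print(importance)
```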
Finally, I wanted to highlight the feature importances that could be useful when assessing potential properties to list on Airbnb. Both models put heavy weight on whether the listing is a private room, and the number of bathrooms and how many people the property accommodates also rank highly in both models.
However, Random Forest ranks the length of the description as the third most important feature. I engineered this column to capture description length, as I believed a longer, more detailed description shows more value to the potential customer and justifies higher prices for higher quality.
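The post doesn't show how `length_of_description` was built. A simple version, assumed here, counts characters in a `description` column and treats missing descriptions as length 0. The helper name `add_description_length` and the sample text are hypothetical, and a word count would be an equally reasonable choice.

```python
import pandas as pd


def add_description_length(df: pd.DataFrame, text_col: str = "description") -> pd.DataFrame:
    """Return a copy of df with a length_of_description column (character count)."""
    out = df.copy()
    # Missing descriptions count as length 0 rather than NaN
    out["length_of_description"] = out[text_col].fillna("").str.len()
    return out


# Hypothetical listings (illustrative text only)
listings = pd.DataFrame({
    "description": ["Cosy room near the Tube.", None, "Spacious two-bed flat with garden."]
})
print(add_description_length(listings))
```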