How have Airbnb prices changed due to COVID-19?

Airbnb Data Analysis in San Francisco

Victory Sharaf
5 min read · Oct 12, 2020
Photo by Timothy Buck on Unsplash

Introduction

San Francisco is a great city for tourism. There are beautiful historic buildings downtown, wonderful walks in the Marina and Pacific Heights, the huge Golden Gate Park, and tons of entertainment. For many tourists, Airbnb is the best way to book accommodation at a reasonable price.

This was the case before COVID.

What about now? How have prices and bookings on Airbnb changed between the beginning of COVID and October? Is it worth visiting San Francisco or starting an Airbnb business here?

Data Understanding

All the data we are interested in can be found on the official Airbnb website. For my research, I used the latest listings dataset, from early October, and compiled a single calendar from the calendar files of the past year (September to September).

Here are the questions I want to answer in this analysis:

  1. What correlates best with the price?
  2. How have price and busyness changed over the course of COVID-19?
  3. Can we predict the price based on its features?

Question 1: What correlates best with the price?

To begin with, consider the correlation between prices and amenities.

Correlation between prices and amenities

As you can see, air conditioning, a gym, and building staff are highly correlated with price. There is also a negative correlation between price and a laptop-friendly workspace, a kitchen, or a lock on the bedroom door. So amenities do correlate with price, although the correlation is not strong in every case.
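To make the idea concrete, here is a minimal sketch of how such amenity correlations could be computed with pandas. The column names (`price`, `amenities`), the `|` separator, and all values are assumptions made up for illustration; they are not the article's actual data or code.

```python
import pandas as pd

# Toy stand-in for the listings table; every value here is invented.
listings = pd.DataFrame({
    "price": ["$250.00", "$90.00", "$180.00", "$75.00", "$1,200.00", "$110.00"],
    "amenities": [
        "Air conditioning|Gym|Kitchen",
        "Kitchen|Lock on bedroom door",
        "Gym|Kitchen",
        "Kitchen|Lock on bedroom door",
        "Air conditioning|Gym",
        "Kitchen",
    ],
})

# Turn strings such as "$1,200.00" into the float 1200.0.
price = listings["price"].str.replace(r"[$,]", "", regex=True).astype(float)

# Build one 0/1 indicator column per amenity.
amenity_flags = listings["amenities"].str.get_dummies(sep="|")

# Pearson correlation of each indicator with price, most positive first.
amenity_corr = amenity_flags.corrwith(price).sort_values(ascending=False)
print(amenity_corr)
```

With real data, the same pattern would yield one correlation value per amenity, which can then be plotted as a bar chart.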

What about housing characteristics, such as the number of bedrooms, beds, etc.?

Correlations of Housing Characteristics and price

There is an obvious correlation: the more people a listing can accommodate, the more expensive it is to rent. The same holds for bedrooms and beds, but the number of bathrooms does not have a strong impact.

How are review scores connected with the price?

Correlations of Review Scores and price

In this case, the correlation is not as strong as in the previous ones. The check-in rating, in particular, shows essentially no relationship with price.

The last thing we will look at is the neighborhoods. As you probably know, the location has a big impact on the value of real estate.

San Francisco

As you can see from the map, high prices are concentrated in particular locations. The most expensive areas are Golden Gate Park and the Financial District. If you look at my previous research, you will see that Golden Gate Park is quite safe, unlike the Financial District, which has a high crime rate.

Summarizing the above, several features show some correlation with the price, and we can use them later to predict the price.
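A per-neighborhood price summary like the one behind such a map can be sketched with a simple group-by. The column names and the placeholder neighborhoods and prices below are assumptions for illustration only, not figures from the dataset.

```python
import pandas as pd

# Placeholder neighborhoods and invented prices, for illustration only.
listings = pd.DataFrame({
    "neighbourhood": ["Area A", "Area B", "Area C", "Area C", "Area A", "Area B"],
    "price": [300.0, 280.0, 150.0, 170.0, 340.0, 260.0],
})

# Median nightly price and number of listings per neighborhood,
# most expensive neighborhood first.
price_by_area = (
    listings.groupby("neighbourhood")["price"]
    .agg(["median", "count"])
    .sort_values("median", ascending=False)
)
print(price_by_area)
```

The median is used here rather than the mean so that a few very expensive listings do not dominate a neighborhood's figure; that is a design choice of this sketch, not necessarily what the map shows.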

Let’s answer the second question.

Question 2: How have price and busyness changed over the course of COVID-19?

Let’s compare two metrics: the average price and the average availability for each month.

During the COVID period, the average price per night rose by $33. At the same time, the number of tourists has noticeably decreased. September of last year was quite busy, and then the decline began. By May, half of the housing was vacant.

As expected, the coronavirus has hurt the business: prices have gone up and there are fewer customers. The indicators have not yet returned to their previous values.
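The two monthly metrics can be derived from a calendar table roughly as follows. This is a sketch under assumptions: the column names (`date`, `available`, `price`), the `"t"`/`"f"` availability encoding, and the four sample rows are all hypothetical.

```python
import pandas as pd

# Tiny invented calendar: one row per listing per night.
calendar = pd.DataFrame({
    "date": ["2019-09-01", "2019-09-02", "2020-05-01", "2020-05-02"],
    "available": ["f", "t", "t", "t"],
    "price": ["$150.00", "$170.00", "$190.00", "$200.00"],
})

# Clean the columns: month label, numeric price, 0/1 availability flag.
calendar["month"] = pd.to_datetime(calendar["date"]).dt.strftime("%Y-%m")
calendar["price"] = (
    calendar["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)
calendar["is_available"] = calendar["available"].eq("t").astype(int)

# Average price and share of available (unbooked) nights per month.
monthly = calendar.groupby("month").agg(
    avg_price=("price", "mean"),
    availability=("is_available", "mean"),
)
print(monthly)
```

A higher `availability` value means more nights were left unbooked, so rising availability corresponds to falling busyness.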

Question 3: Can we predict the price based on its features?

To answer the last question, we have to prepare the data for modeling. Only the features needed for the predictive models are kept. The data preparation and the list of features can be found in my GitHub repo.

Result of Modeling

To predict the price, I used several models and compared them with each other.

To assess the quality of the models, I used the following metrics:

  1. Mean Squared Error is a risk function corresponding to the expected value of the squared error loss, that is, the average squared difference between predicted and actual prices. The closer this estimate is to zero, the better.
  2. R-squared measures the proportion of the variance in the target variable that the model explains. In general, the closer R-squared is to one, the better the model fits our data.

Metrics table
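For readers who want to see the two metrics spelled out, here is a small worked example in plain Python. The function names and the four price values are made up for illustration and are not taken from the models in this article.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented nightly prices and predictions, each off by exactly $10.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

mse = mean_squared_error(actual, predicted)  # (4 * 10**2) / 4 = 100.0
r2 = r_squared(actual, predicted)            # 1 - 400 / 12500 = 0.968
print(mse, r2)
```

Note that R-squared can be negative when a model predicts worse than simply guessing the mean price every time.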

The AdaBoost regressor showed a poor R-squared score; its predictions do not resemble the real values. Gradient Boosting and Extreme Gradient Boosting showed similar results, with Gradient Boosting slightly better. Finally, I trained a neural network, which performs worse than Gradient Boosting and shows signs of overfitting.
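A comparison of this kind can be sketched with scikit-learn as shown below. This is not the article's actual pipeline: the data is synthetic, the hyperparameters are library defaults, and XGBoost and the neural network are left out to keep the example free of extra dependencies.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic "listings": three numeric features and a noisy price (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 100 + 80 * X[:, 0] + 40 * X[:, 1] + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and record the same metrics on held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    scores[name] = {
        "mse": mean_squared_error(y_test, test_pred),
        "test_r2": r2_score(y_test, test_pred),
        # A train score far above the test score hints at overfitting.
        "train_r2": r2_score(y_train, model.predict(X_train)),
    }

for name, result in scores.items():
    print(name, result)
```

Comparing `train_r2` with `test_r2` for each model is one simple way to spot the kind of overfitting mentioned above.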

Conclusion

In this article, we reviewed data from Airbnb. We found that some amenities are more closely related to price than others, while review scores have almost no effect on the price. Neighborhoods have the largest correlation with price.
Airbnb’s business has suffered from COVID: two fifths of all listings are vacant, and the average price per night has risen by $33. We were also able to predict prices.
