My aim was to create a website that can answer these questions about Northeastern's gyms:
- At what times is the gym usually crowded?
- How many people are currently in the gym?
- Is the gym going to get more crowded in the next hour or two?
- On average, how crowded is the gym at 5 PM on Sunday?
- Is 5 PM usually a good time to go to the gym?
- Is 5 PM (on Monday) usually a good time to go to the gym?
The project is divided into two parts: a web scraper and a website. There is also a database containing records of the number of people in a gym at specific times.
The database has three tables:
Record. Each gym at Northeastern has one or more sections. Each section has multiple records. A record has the following data:
- Number of people in this section at this time
I used PlanetScale for the database. It provides free databases with generous usage limits. The only issue was that it does not support foreign key constraints, but this was easy to work around.
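One common way to live without foreign key constraints is to keep the ID columns as plain indexed integers and check the relationship in application code. The sketch below illustrates that idea only; it is not necessarily the fix used here. All table and column names (`gym`, `section`, `record`, `people_count`, and so on) are hypothetical, and SQLite stands in for PlanetScale's MySQL-compatible database so the snippet runs locally.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the project.
SCHEMA = """
CREATE TABLE gym     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE section (id INTEGER PRIMARY KEY, gym_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE record  (
    id INTEGER PRIMARY KEY,
    section_id INTEGER NOT NULL,   -- plain column, no FOREIGN KEY constraint
    recorded_at TEXT NOT NULL,     -- ISO 8601 timestamp
    people_count INTEGER NOT NULL
);
CREATE INDEX record_section_idx ON record (section_id, recorded_at);
"""


def add_record(conn, section_id, recorded_at, people_count):
    """Insert a record after checking in application code that the section exists."""
    row = conn.execute("SELECT 1 FROM section WHERE id = ?", (section_id,)).fetchone()
    if row is None:
        raise ValueError(f"unknown section_id {section_id}")
    conn.execute(
        "INSERT INTO record (section_id, recorded_at, people_count) VALUES (?, ?, ?)",
        (section_id, recorded_at, people_count),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO gym (id, name) VALUES (1, 'Example Gym')")
conn.execute("INSERT INTO section (id, gym_id, name) VALUES (1, 1, 'Weight Room')")
add_record(conn, 1, "2023-01-01T17:00:00", 42)
```

The index on `(section_id, recorded_at)` does the lookup work a foreign key index would otherwise provide; the existence check in `add_record` replaces the constraint itself.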
The web scraper was fairly simple. I had no issues scraping the website that displays the live counts.
I used Python and BeautifulSoup for this part. A GitHub Actions workflow runs every 30 minutes. Each run scrapes the data and adds the resulting records to the database.
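A minimal sketch of the parsing step might look like the following. The markup of the live-count page is not shown in this post, so the HTML structure here (`div.section` containing `span.name` and `span.count`) is an assumption, as are the names `CountParser` and `parse_counts`. The sketch uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages.

```python
from html.parser import HTMLParser

# Assumed markup; the real page's structure may differ.
SAMPLE_HTML = """
<div class="section"><span class="name">Weight Room</span><span class="count">42</span></div>
<div class="section"><span class="name">Cardio Floor</span><span class="count">17</span></div>
"""


class CountParser(HTMLParser):
    """Collect {section name: people count} from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self._field = None  # "name" or "count" while inside a matching span
        self._name = None   # most recent section name seen

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "count"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._field == "count" and self._name is not None:
            self.counts[self._name] = int(text)
            self._name = None


def parse_counts(html):
    parser = CountParser()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = parse_counts(SAMPLE_HTML)
```

Each entry in `counts` would then become one record, stamped with the time of the scrape.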
I will use Next.js and a graph/visualization library to display the data. While D3 exists, there are many simpler libraries, such as Nivo, Chart.js, and Recharts, that should make it easier to display complex charts.
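Most of the questions at the top of this post reduce to one aggregation: the average count grouped by weekday and hour. Here is a rough sketch of that calculation, written in Python for consistency with the scraper even though the website itself would likely do this in a query or in JavaScript. The record shape (an ISO timestamp plus a count) and the function name `average_by_weekday_hour` are assumptions, and the sample numbers are made up.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def average_by_weekday_hour(records):
    """Map (weekday name, hour) to the mean people count over all matching records."""
    buckets = defaultdict(list)
    for recorded_at, people_count in records:
        ts = datetime.fromisoformat(recorded_at)
        buckets[(WEEKDAYS[ts.weekday()], ts.hour)].append(people_count)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up sample data: three Sunday readings in the 5 PM hour and one on Monday.
records = [
    ("2023-01-01T17:00:00", 40),
    ("2023-01-01T17:30:00", 50),
    ("2023-01-08T17:00:00", 30),
    ("2023-01-02T17:00:00", 80),
]
averages = average_by_weekday_hour(records)
sunday_5pm = averages[("Sunday", 17)]
```

The resulting dictionary has the shape a heatmap or bar chart would need: one value per weekday and hour.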