Experiment 20
This post describes a command line tool I created that scrapes a website and generates reports.
Introduction
It is greatly rewarding to build a tool that solves a real problem. The scale of the problem dictates the number of potential users your tool can help. This project solves a small problem - improving the registration process for my son’s summer camps. The problem is mostly created by a poor website design that doesn’t allow parents to easily find the correct camps.
I built this tool to scrape the camp data on the Fairfax County Park Authority (FCPA) website and generate useful camp options. I also added new functionality, such as calculating the estimated commute time to each camp.
Goals
The goals of this project are to:
- Improve user experience registering for camps hosted by FCPA
- Demonstrate how to develop a command line tool
- Demonstrate the use of a 3rd party API (Google Maps API)
- Demonstrate report creation
FCPA site limitations
Why is this tool needed? Because FCPA offers hundreds of potential camps to register for and their filter options are rather limited. Here is a screenshot of the FCPA camp search site:
At first glance the fields look reasonable. We should be able to narrow down the list of available camps and register for one. Let’s see what happens if I try to find a camp for my 6-year-old son for the two days after New Year’s Day (Jan 2-3).
Due to the broad age range filter (6-12 yrs old), only one of the first six camps listed in the results is appropriate for a 6-year-old. Unfortunately that camp is for the wrong date (January 20). This happens because the date filter only allows you to specify the date “starting on or after”. So the best I can do is limit results to those between a starting date and the end of that month. This is especially frustrating when trying to book camps over the summer break. Typically I want to book one camp per week for the entire summer. To do this on the FCPA website I can change the “starting on or after” date, but for the first week in a month I get all camps in that month (which could be more than 100 sessions!).
Desired capabilities
I would like to build a tool that can output a tailored report to facilitate selecting and registering for camps. Ideally I would be able to quickly update the data based on the current openings, as classes can fill up quickly. So I need a way to:
- Automatically scrape the camp data from the FCPA website
- Select camps by various criteria
- Generate a custom report
Installation
- Clone the repo (git clone https://github.com/dlfelps/camp-little-scrape.git)
- Install dependencies (pip install -r requirements.txt)
- Install Chrome (https://www.google.com/chrome/)
- Sign up for a Google Maps API key (https://developers.google.com/maps)
Usage
Process overview
The overall process is described in the flowchart below. I will demonstrate each step below in detail, but at a high level, the tool completes a series of steps to:
- Scrape camp links
- Scrape camp details
- (optional) Calculate commute times to each camp
- Filter camps and generate report
Step 1 - Scrape camp links
The first step in the process uses Selenium to semi-automatically scrape the camp session links from the FCPA website.
- Launch JupyterLab
- Open step1.ipynb
- Follow instructions in step1.ipynb to open remote Chrome browser
- Go to FCPA website and search for ALL camps
- Follow instructions in step1.ipynb to scrape each page
This step produces a links.txt file containing all the camp session links.
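As a rough illustration of what this step has to accomplish, the sketch below pulls session links out of one page of results and appends them to links.txt without duplicates. It uses the standard library's html.parser as a stand-in for the Selenium-driven notebook, so it is not the code in step1.ipynb; the session_marker substring and the function names are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags that look like session links."""

    def __init__(self, session_marker):
        super().__init__()
        # Hypothetical substring that identifies a camp session URL.
        self.session_marker = session_marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.session_marker in href:
            self.links.append(href)


def extract_links(page_html, session_marker="session"):
    """Return the session links found in one page of search results."""
    parser = LinkCollector(session_marker)
    parser.feed(page_html)
    return parser.links


def append_links(new_links, path="links.txt"):
    """Append links to the file, skipping any that are already present."""
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    seen = set(existing)
    merged = list(existing)
    for link in new_links:
        if link not in seen:
            seen.add(link)
            merged.append(link)
    path.write_text("".join(link + "\n" for link in merged))
    return merged
```

Calling append_links once per results page keeps the file safe to rebuild if a page is scraped twice by accident.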
Step 2 - Scrape camp details
Steps 2-4 all use the command line interface provided by the camp.py file. It is built using Typer. View the list of possible commands by calling python camp.py --help
Step 2 is initiated by calling python camp.py details. This step scrapes the camp details (e.g. location, dates, times) for each camp found in links.txt.
Step 3 - (optional) Calculate commutes
The third step uses the Google Maps API to calculate the typical daily commute to each camp. This calculation is the sum of the travel times for the following trips between home (user specified) and the camp location:
- morning dropoff (arriving by 9 AM)
- morning return (departing at 9 AM)
- afternoon pickup (arriving by 4 PM)
- afternoon return (departing at 4 PM)
Commute time is one of the most important factors when selecting camps. Since full day camps are only 7 hours, it would not be time efficient to spend 2 hours in the car, which is shockingly easy to do even staying within Fairfax County (where all FCPA camps are located). This step does require a Google Maps API key, which can be created at https://developers.google.com/maps. The code retrieves your individual key from the environment variable MAP_API_KEY. Here is a guide if you need help.
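The daily commute calculation can be sketched as a sum over the four legs listed above. This is a minimal sketch, not the code in camp.py: travel_minutes is a hypothetical callable that would wrap the actual Google Maps request, and the leg labels "arrive_by" and "depart_at" are illustrative names rather than API parameters. Only the MAP_API_KEY variable name comes from the project.

```python
import os


def daily_commute_minutes(home, camp, travel_minutes):
    """Sum the four legs of a typical camp day.

    travel_minutes(origin, destination, kind, clock_time) is a hypothetical
    callable that returns the estimated travel time in minutes for one leg.
    """
    legs = [
        (home, camp, "arrive_by", "09:00"),  # morning dropoff
        (camp, home, "depart_at", "09:00"),  # morning return
        (home, camp, "arrive_by", "16:00"),  # afternoon pickup
        (camp, home, "depart_at", "16:00"),  # afternoon return
    ]
    return sum(travel_minutes(o, d, kind, when) for o, d, kind, when in legs)


def load_api_key():
    """Read the key from the environment so it never lands in the repo."""
    key = os.environ.get("MAP_API_KEY")
    if not key:
        raise RuntimeError("Set the MAP_API_KEY environment variable first")
    return key
```

Passing the lookup in as a function keeps the arithmetic testable without spending API quota.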
Step 4 - Filter camps and generate report
The final step filters the available camps to generate a report. We can see the full list of available options by looking at the report help command python camp.py report --help.
The filtering options for this step are as follows:
Parameter | Required? | Default value | Description | Example |
---|---|---|---|---|
first_day | Y | N/A | The camp must start on this date | "01/02/2025" |
last_day | Y | N/A | The camp must end on this date | "01/03/2025" |
min_age | N | 5 (years) | The camp must accept kids of this age | --min-age=5 |
max_commute | N | 120 (minutes) | The camp must not have an estimated daily commute greater than this many minutes | --max-commute=120 |
remove_half_day | N | True | Half day camps (from 9-noon) are not shown | --remove-half-day |
remove_schools | N | True | Camps located at schools are not shown | --remove-schools |
show_full | N | False | Camps that are already fully registered are not shown | --no-show-full |
If your desired parameters are different from the defaults, then change them when you call the report command, e.g. python camp.py report "01/02/2025" "01/03/2025" --min-age=7.
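To make the filter semantics concrete, here is a minimal sketch of how the parameters in the table might be applied. The Camp record and its field names are hypothetical (the real tool's data layout may differ), and the MM/DD/YYYY date format is assumed from the examples above.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Camp:
    # Hypothetical record layout; the real tool's fields may differ.
    name: str
    first_day: date
    last_day: date
    min_age: int
    max_age: int
    commute_minutes: float
    half_day: bool = False
    at_school: bool = False
    full: bool = False


def parse_day(text):
    """Parse a date written as MM/DD/YYYY (assumed format)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def filter_camps(camps, first_day, last_day, min_age=5, max_commute=120,
                 remove_half_day=True, remove_schools=True, show_full=False):
    """Keep only the camps that satisfy every filter in the table."""
    start, end = parse_day(first_day), parse_day(last_day)
    matches = []
    for camp in camps:
        if camp.first_day != start or camp.last_day != end:
            continue  # must start and end on the requested dates
        if not camp.min_age <= min_age <= camp.max_age:
            continue  # must accept a child of this age
        if camp.commute_minutes > max_commute:
            continue
        if remove_half_day and camp.half_day:
            continue
        if remove_schools and camp.at_school:
            continue
        if camp.full and not show_full:
            continue
        matches.append(camp)
    return matches
```

Matching on exact start and end dates is what avoids the "starting on or after" flood of results described earlier.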
This will produce an HTML file in the reports directory with all matching camps:
The report is generated using the yattag library to procedurally generate the HTML report from python code. A few highlights of the report:
- The camps are sorted by shortest commute first
- The report highlights which camps include a swimming break
- Each camp includes a session link for easy registration
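The three highlights above can be sketched in a few lines. Note that this sketch builds the HTML with plain string formatting from the standard library rather than yattag, so it illustrates the report's structure, not the project's actual rendering code; the dictionary keys are hypothetical.

```python
from html import escape


def render_report(camps):
    """Return an HTML list of camps, shortest commute first."""
    items = []
    for camp in sorted(camps, key=lambda c: c["commute_minutes"]):
        # Flag camps that include a swimming break.
        swim = " <strong>(swimming break)</strong>" if camp.get("swimming") else ""
        items.append(
            '<li><a href="{url}">{name}</a>{swim} - {minutes} min daily commute</li>'.format(
                url=escape(camp["url"]),    # session link for registration
                name=escape(camp["name"]),
                swim=swim,
                minutes=camp["commute_minutes"],
            )
        )
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```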
Conclusion
Overall I found this side project to be quite rewarding. I learned how to:
- scrape data from websites (selenium and httpx)
- parse structured data from websites (beautifulsoup)
- keep API keys secret using environment variables
- use 3rd party APIs
- generate HTML from Python
Finally, the tool actually made finding and registering my son for camp much easier!
Possible improvements
I was not able to find a way for Typer to hide commands that were not yet applicable. Ideally, only the details command would be initially available (taking in the links.txt that was produced during the semi-manual scraping step). Then after creating camps.pkl and places.txt the commute command would be available. Finally the report command could be used.
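Short of hiding commands, one possible workaround is for each command to check for its input files and exit with a helpful message. The sketch below is framework-agnostic and only an assumption about how this could look. The file names for details and commute come from the text above, while the prerequisite listed for report is a guess.

```python
from pathlib import Path

# Files each command needs before it can run.
PREREQUISITES = {
    "details": ["links.txt"],
    "commute": ["camps.pkl", "places.txt"],
    "report": ["camps.pkl"],  # assumption: the report needs at least the scraped details
}


def available_commands(workdir="."):
    """List the commands whose input files already exist."""
    root = Path(workdir)
    return [
        command
        for command, files in PREREQUISITES.items()
        if all((root / name).exists() for name in files)
    ]


def require(command, workdir="."):
    """Exit with a readable message if a command is run too early."""
    root = Path(workdir)
    missing = [name for name in PREREQUISITES[command] if not (root / name).exists()]
    if missing:
        raise SystemExit(f"'{command}' needs {', '.join(missing)} first")
```

Calling require("commute") at the top of the commute command would then fail fast instead of crashing partway through.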