The Olympic Games 2024 🥇 are in Paris and, in a weird twist of fate, so am I. Sadly, my chances of participating in any athletic event seem small. So, aside from watching the torch pass by and being intimidated by machine-gun-wielding police everywhere, can we get in the Olympic spirit with some open data?
Waiting for the Olympic Torch to pass…
Finding Data: National Access Points
One of the great things about Paris is its public transport network. It is fast, easy to use and very dense. Transport is also one of the domains that open data tends to cover well.
A reason for this is that the EU actually requires member states to establish National Access Points (NAP) for transport data. We’ve previously worked with the German NAP, the Mobilithek.
France has a NAP for transport data as well: transport.data.gouv.fr. Interesting side note: public transport providers in France are actually required to publish open data! That is probably one of the reasons why France takes the gold in the EU's most recent open data maturity report (2023). Allez les Bleus 🇫🇷!
The French NAP also helpfully created a topic for data sets related to the Olympic Games. Maybe they have read my previous rant on how to make sure no one cares about your open data ;).
Here, we can find a data set of GTFS data for the Paris region during the Olympic Games: Paris 2024 - Réseau urbain et interurbain.
Data Format: GTFS
The data comes in the form of GTFS (General Transit Feed Specification) which is an open standard, originally developed in cooperation with Google to ingest data into Google Maps. Accordingly, it was originally called “Google Transit Feed Specification” and then sneakily renamed to fit a more general purpose.
The format is very simple, which is probably one of the reasons it is used so much: a GTFS feed is just a ZIP file bundling .txt files that contain CSV data. (One day I will understand why these files have a .txt file extension. Today is not that day.)
Data about public transport stops as CSV
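To make the "ZIP of CSV files" structure concrete, here is a minimal Python sketch that lists the .txt members of a GTFS archive together with their CSV header columns. The function name `describe_gtfs` and the tiny in-memory feed are made up for illustration; the real Paris feed contains more files and columns.

```python
import csv
import io
import zipfile


def describe_gtfs(zip_bytes: bytes) -> dict[str, list[str]]:
    """Map each .txt member of a GTFS archive to its CSV header columns."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig drops a byte-order mark if the feed has one.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = next(csv.reader(text), [])
    return tables


# Build a tiny in-memory feed with made-up rows so the sketch runs offline.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon,zone_id\n"
        "S1,Example Stop,48.85,2.35,1\n",
    )
    archive.writestr(
        "routes.txt",
        "route_id,route_short_name,route_type\nR1,1,1\n",
    )

tables = describe_gtfs(buffer.getvalue())
print(tables["stops.txt"])
```

For a downloaded feed, the same function could be called with the bytes of the ZIP file read from disk.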
Accessing the Data: Jayvee
As part of our research project JValue that aims to make open data easy and fun to use, we’ve developed a domain-specific language to build data pipelines, called Jayvee. Jayvee aligns with the mental model of data pipelines as a series of steps (called blocks), connected by pipes. Because we provide a domain-specific extension for open transport data that covers GTFS, a simple pipeline to download the GTFS data set, extract the information about public transport stops and save it locally into a SQLite database looks like this:
pipeline OlympicGamesTransport {
  ScheduleExtractor
    -> StopsInterpreter
    -> StopsLoader;

  block ScheduleExtractor oftype GTFSExtractor {
    url: "https://data.iledefrance-mobilites.fr/api/explore/v2.1/catalog/datasets/offre-horaires-tc-gtfs_jop_2024-idfm/files/a8f5a65028f7ce75bf3fe368bd3372c4";
  }

  block StopsInterpreter oftype GTFSStopsInterpreter { }

  block StopsLoader oftype SQLiteLoader {
    table: "stops";
    file: "./gtfs.sqlite";
  }
}
Here, a GTFSExtractor block downloads the data from the source, and the domain-specific GTFSStopsInterpreter block parses the resulting file according to the GTFS format and extracts the information about public transport stops as a relational table with fitting value types. Finally, the SQLiteLoader block saves the data in a SQLite database.
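For readers who have not seen Jayvee before, the three steps can be approximated in plain Python. This is only a rough sketch of the mental model, not how Jayvee works internally: the function names, the chosen columns and the column types are assumptions made for illustration.

```python
import csv
import io
import sqlite3
import urllib.request
import zipfile


def extract(url: str) -> bytes:
    """Download the GTFS archive (the role GTFSExtractor plays above)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def interpret_stops(zip_bytes: bytes) -> list[dict]:
    """Parse stops.txt into rows (a rough stand-in for GTFSStopsInterpreter)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        text = archive.read("stops.txt").decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


def to_float(value):
    """Convert a coordinate to float; empty or missing values become None."""
    return float(value) if value else None


def load_stops(rows: list[dict], con: sqlite3.Connection) -> None:
    """Write a few stop columns to a 'stops' table (the role of SQLiteLoader)."""
    con.execute("DROP TABLE IF EXISTS stops")
    con.execute(
        "CREATE TABLE stops (stop_id TEXT, stop_name TEXT, "
        "stop_lat REAL, stop_lon REAL, zone_id TEXT)"
    )
    con.executemany(
        "INSERT INTO stops VALUES (?, ?, ?, ?, ?)",
        [
            (
                row["stop_id"],
                row["stop_name"],
                to_float(row.get("stop_lat")),
                to_float(row.get("stop_lon")),
                row.get("zone_id") or "",
            )
            for row in rows
        ],
    )
    con.commit()


def run(url: str, db_path: str) -> None:
    """Chain the three steps, like the pipes between the blocks."""
    con = sqlite3.connect(db_path)
    try:
        load_stops(interpret_stops(extract(url)), con)
    finally:
        con.close()
```

Calling `run(url, "./gtfs.sqlite")` with the URL from the pipeline would perform the download. The Jayvee version stays shorter because the GTFS-specific parsing and typing live in the reusable blocks.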
The pipeline can be executed with the Jayvee interpreter, version 0.6.0:
➜ olympics-transport git:(main) jv -V
0.6.0
➜ olympics-transport git:(main) jv olympic-games-transport.jv
Found 1 pipelines to execute: OlympicGamesTransport
[OlympicGamesTransport] Overview:
Blocks (3 blocks with 1 pipes):
-> ScheduleExtractor (GTFSExtractor)
-> StopsInterpreter (GTFSStopsInterpreter)
-> StopsLoader (SQLiteLoader)
After the run, a gtfs.sqlite file is created in the directory, containing all public transport stops of the GTFS file.
After processing with Jayvee, public transport stops in SQLite
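A quick sanity check of the result does not need a database viewer. The sketch below counts the rows and lists the columns of a `stops` table with Python's built-in sqlite3 module. The helper name `summarize_stops` and the demo rows are made up; the demo runs on an in-memory stand-in so that it works without the real file.

```python
import sqlite3


def summarize_stops(con: sqlite3.Connection) -> tuple[int, list[str]]:
    """Return the row count and the column names of the 'stops' table."""
    count = con.execute("SELECT COUNT(*) FROM stops").fetchone()[0]
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in con.execute("PRAGMA table_info(stops)")]
    return count, columns


# In-memory stand-in with made-up rows. For the real data, pass
# sqlite3.connect("gtfs.sqlite") instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE stops (stop_id TEXT, stop_name TEXT, zone_id TEXT)")
demo.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [("S1", "Example A", "1"), ("S2", "Example B", "")],
)

count, columns = summarize_stops(demo)
print(count, columns)
```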
If you’re interested, Jayvee is open-source. You can try it or give it a star to follow along on GitHub: https://github.com/jvalue/jayvee.
Playing with the Data: Python
Now that we have the data about public transport stops in the Paris region in a local SQLite database, it is easy to play around with it in Python. Most of the stops have an assigned zone_id, which is highly relevant because it determines what ticket you need to buy and how expensive travel will be.
To load the data and only keep stops with an associated zone ID, we can use Pandas:
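import sqlite3
import pandas as pd

# Load the stops table written by the Jayvee pipeline.
con = sqlite3.connect("gtfs.sqlite")
df = pd.read_sql_query("SELECT * FROM stops", con)
con.close()
# Keep only stops that have an assigned zone ID.
df = df[df["zone_id"] != ""]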
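Before plotting, it can help to see how the stops are spread across zones. As a minimal sketch, the same filter can be expressed directly in SQL and combined with a GROUP BY. The helper name `stops_per_zone` and the demo rows are made up, and the demo uses an in-memory stand-in for gtfs.sqlite.

```python
import sqlite3


def stops_per_zone(con: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count stops per zone, skipping stops without a zone ID."""
    query = """
        SELECT zone_id, COUNT(*) AS n_stops
        FROM stops
        WHERE zone_id != ''   -- also drops rows whose zone_id is NULL
        GROUP BY zone_id
        ORDER BY zone_id
    """
    return con.execute(query).fetchall()


# In-memory stand-in with made-up rows. For the real data, pass
# sqlite3.connect("gtfs.sqlite") instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE stops (stop_id TEXT, stop_name TEXT, zone_id TEXT)")
demo.executemany(
    "INSERT INTO stops VALUES (?, ?, ?)",
    [
        ("S1", "Example A", "1"),
        ("S2", "Example B", "1"),
        ("S3", "Example C", "2"),
        ("S4", "Example D", ""),
        ("S5", "Example E", None),
    ],
)

result = stops_per_zone(demo)
print(result)
```

With the DataFrame from above, `df["zone_id"].value_counts()` gives a similar overview in Pandas.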
Then, using Plotly, we can visualize the data on an interactive map of Paris, provided by OpenStreetMap:
import plotly.express as px

fig = px.scatter_mapbox(df,
                        lat="stop_lat",
                        lon="stop_lon",
                        hover_name="stop_name",
                        hover_data=["stop_id", "stop_name", "zone_id"],
                        color="zone_id",
                        zoom=9,
                        height=800,
                        width=800)
fig.update_traces(marker=dict(size=3))
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
The Result: An Interactive Map of Public Transport Stops With Their Zone in Paris
And with that, we've accomplished our goal of participating in the Olympic Games 2024 in Paris, all without leaving the chair or breaking a sweat. I am off to the Stade de France in Saint-Denis, knowing that I will need to buy a ticket for zone 2.