Intro to SQL with SQLite: Exploring International Rugby Matches
Introduction
In the following activity, you will use data compiled by Lyle Begbie from Kaggle, which contains detailed results from international rugby union matches played between 1871 and 2022. The dataset includes match dates, home and away teams, scores, competitions, venues, countries, and flags for World Cup and neutral-site matches.
Each observation represents a single rugby match, providing a rich source of information to explore team performances, scoring patterns, and trends over time.
Using this data, you will practice fundamental SQL skills within Python, while also learning how to perform descriptive data analysis and aggregation on real sports data.
Learning Goals
In this module, students will explore how to use SQL to analyze international rugby match data. By writing, executing, and interpreting SQL queries on this dataset, students will practice core data science skills such as:
- Filtering data using
WHEREclauses to focus on subsets of interest - Computing summary statistics with
GROUP BYand aggregate functions, such asAVG - Sorting and ranking results to identify top teams, competitions, or scoring patterns
- Extracting insights from dates and categorical variables
- Developing familiarity with relational database tools like SQLite for data exploration
Data
The dataset contains over 2600 international rugby union results.
The notebook loads the data directly from the project’s GitHub source, so the local climb_df.csv file shown in the folder is not needed for this module.
Variable Descriptions
| Variable | Description |
|---|---|
| date | Date when the match was played (format: YYYY-MM-DD) |
| home_team | Name of the team playing at home |
| away_team | Name of the team playing away |
| home_score | Number of points scored by the home team |
| away_score | Number of points scored by the away team |
| competition | Name of the competition or series, often including the year |
| stadium | Name of the stadium where the match was played |
| city | City where the match took place |
| country | Country where the match was hosted |
| neutral | Boolean indicating if the match was on neutral ground |
| world_cup | Boolean indicating if the match was part of the Rugby World Cup |
Module Files
The full module materials are linked below.
Module
This shell page provides access to the module materials. Download the student-facing Jupyter notebook above to complete the activity.
To run the notebook, users will need a Python environment with packages such as pandas and access to Python’s built-in sqlite3 library.
Summary
In the provided material, students explored how to use SQL to analyze international rugby match data. By writing, running, and interpreting SQL queries on a real sports dataset, students practiced core data science skills such as:
- Filtering data using
WHEREclauses to focus on specific subsets - Computing summary statistics and aggregates with
GROUP BYand functions likeAVG - Sorting and ranking data using
ORDER BYto identify patterns and top results - Using date filters to explore trends over time
- Combining Python and SQL to perform efficient data wrangling inside a notebook