---
title: "Ultimate Worksheet Answers"
format: html
---

Ultimate, also known as Ultimate Frisbee is a sport played with 2 teams of 7 players where players pass a Frisbee to each other and try to score in the opposing team's end zone. Each point starts with a pull where the defensive team throws the disc from their end zone towards the offensive team's end zone. The offensive team starts the point with possession and tries to score without dropping the Frisbee. If they drop the Frisbee, the defensive team can pick it up and try to score. The point ends when either team scores and the team that scored becomes the defensive team and pulls to the other team to start the next point. Games are played to 15 points, or to a time cutoff where the new target score is 1 point above the leading team.

The College Ultimate Championships consist of 2 weekends of competition, one for Division 3 and one for Division 1. 16 teams competed in both the Men's and Women's tournaments to be Division 3 Champion, while 20 teams competed for the two Division 1 titles. Statistics were measured from each game at both events and can be found on the [USA Ultimate](https://play.usaultimate.org/) Website.

## Import Packages

```{r}
#| warning: false
library(tidyverse)
library(readr)
```

## Data Import

Load in both datasets with data from the 2024 College Ultimate Championships.
```{r}
#| warning: false
ultimate_statistics <- read_csv('ultimate_total.csv', show_col_types = FALSE)
ultimate_games <- read_csv('ultimate_games.csv', show_col_types = FALSE)
```


## Tidy and Clean Data

The ultimate statistics dataset has 5 rows with statistics for each player, combine these into a single row for each player using a pivot function.

```{r}
ultimate_statistics <- ultimate_statistics |>
  pivot_wider(names_from = statistic, values_from = value)
```

Then use a join to combine the 2 datasets by player.

```{r}
ultimate_data <- left_join(ultimate_statistics, ultimate_games, 
                           by = join_by(player))
```

If columns are named inappropriately, use a rename function to rename them.

```{r}
ultimate_data <- ultimate_data |> select(player, 
                                         level.x, 
                                         gender.x,
                                         division.x,
                                         team_name.x, 
                                         Turns, 
                                         Ds, 
                                         Assists, 
                                         Points, 
                                         `Plus Minus`, 
                                         team_games)
```

Check if the type of each column is appropriate for the data. If not change the type.

```{r}
ultimate_data <- ultimate_data |> rename(level = level.x, 
                        gender = gender.x, 
                        division = division.x,
                        team_name = team_name.x,
                        plus_minus = `Plus Minus`)
## Could also rename variables to be upper/lowercase
```

Check if the type of each column is appropriate for the data, if not change the type.

```{r}
ultimate_data |> summary()
## Data should be in an appropriate form
```


Now that the current data is in a tidy form, create new variables that have statistics in a per_game format using a mutate function and the team games column.

```{r}
ultimate_full <- ultimate_data |> mutate(turns_per_game = Turns/team_games, 
                                         ds_per_game = Ds/team_games,
                                         ast_per_game = Assists/team_games,
                                         pts_per_game = Points/team_games,
                                         pls_mns_per_game = plus_minus/team_games)
```

## Data Visualization

1. The data set has 1,665 total players that participated in the College Ultimate Championships, figure out how many actually contributed in any statistical category. (Don't include team games as a category)

```{r}
ultimate_full |> filter(Points > 0 | 
                        Assists > 0 | 
                        Turns > 0 | 
                        Ds > 0 | 
                        plus_minus != 0) |> 
  summarise(total_player = n())

## 1455 players had at least 1 Point, Assist, Turn, D, or a non-zero plus_minus
```


2. Make a table that lists the top 15 players in assist totals from any division. Only include, player name, assists, and assists per game in your final table.

```{r}
ultimate_full |> arrange(desc(Assists)) |> 
                 slice(1:15) |> 
                 select(player, Assists, ast_per_game)
```

3. Which player had the most points and assists combined in each of the 4 divisions?

```{r}
ultimate_full |> mutate(pts_contributed = Points + Assists) |> 
  group_by(division) |> arrange(desc(pts_contributed)) |> 
  select(player, Points, Assists, pts_contributed) |> slice(1)

## Leo Gordon, Jolie Krebs, Jacob Felton, Julianna Galian
```


4. Make a plot comparing assists by division and comment on any interesting features that are visible

```{r}
ggplot(data = ultimate_full, aes(y = Assists, x = division)) +
  geom_jitter(aes(color = division))


ggplot(data = ultimate_full, aes(y = Assists, x = division)) +
  geom_boxplot(aes(color = division))

ggplot(data = ultimate_full, aes(y = Assists, x = division)) +
  geom_violin(aes(color = division))


## Could do boxplot or strip plot, violin plot would also 
## work but doesn't display the data very well
## All divisions are pretty similar when it comes to how assists are distibuted
```


5. Assists per game and Turns per game

a. Make a plot that compares turns per game and assists per game. Then label the names of players in the top 10 for assists using geom_text().

```{r}
top_10_assists <- ultimate_full |> arrange(desc(Assists)) |> slice(1:10)


ggplot(data = ultimate_full, aes(x = ast_per_game, y = turns_per_game)) +
  geom_point(color = "skyblue4") +
  geom_text(data = top_10_assists, aes(label = player))

## Could also color = division

ggplot(data = ultimate_full, aes(x = ast_per_game, y = turns_per_game)) +
  geom_point(aes(color = division)) +
  geom_text(data = top_10_assists, aes(label = player))
```


b. Based on this plot, should any of the players in the top 10 for assists throw the disc less because of their turnover frequency?

Claire Lee, Jolie Krebs, and Juliana Galian all have more Turns per game than assists per game, so perhaps they should throw the disc less. Jacob Felton has about an equal number of both.



6. Comparing Turns by Division

a. Make a plot comparing turns by division. Are there any differences between the divisions?

```{r}
ggplot(data = ultimate_full, aes(x = division, y = Turns)) +
  geom_boxplot(aes(color = division))
```

D3 Women tend to have more turns than other divisions, along with a larger spread of outliers, D1 Women have more turns than the Mens divisions but less than D3 Women. The Mens divisions are pretty similar in distribution of turns.

b. Label the names of players that are top 10 in assists. What does a high assist total indicate about your turnover rate?

```{r}
ggplot(data = ultimate_full, aes(x = division, y = Turns)) +
  geom_boxplot(aes(color = division)) +
  geom_text(data = top_10_assists, aes(label = player))
```

Players with a high assist count all tend to be outliers in the Turnover category. They likely are handlers who throw the disc significantly more than most other players. More throws mean more opportunites to assist, but more opportunities to create a turnover.


7. Plot turns vs. plus minus and describe the relationship.

```{r}
ggplot(data = ultimate_full, aes(x = plus_minus, y = Turns)) +
  geom_point(color = "skyblue4")

## This is a moderate negative linear relationship. 
## Not a very strong relationship, but there is definitely
## a correlation between large amounts of turnovers and a lower plus/minus.
```


8. Points and Ds across divisions

a. Make a bar plot with the top 15 point scorers in order using reorder if necessary. Group by division and note how the players are distributed across divisions.

```{r}
top_15_points <- ultimate_full |> arrange(desc(Points)) |> 
  slice(1:15) |> mutate(player = reorder(player, Points))

ggplot(data = top_15_points, aes(y = player, x = Points)) +
  geom_col(aes(fill = division))

## The distribution of divisions for points is fairly balanced with every division included
```

b. Make the same bar plot with the top 15 players in Ds and compare the distribution of divisions.

```{r}
top_15_ds <- ultimate_full |> arrange(desc(Ds)) |> 
  slice(1:15) |> mutate(player = reorder(player, Ds))

ggplot(data = top_15_ds, aes(y = player, x = Ds)) +
  geom_col(aes(fill = division))


## Points is fairly balanced, with every division included while Ds only has the women's divisions, with D3 taking 11 of the top 15 spots.
```


9. Offensive Dominance

a. Make a table with the sum of plus_minus for each division. 
```{r}
ultimate_full |> group_by(division) |> 
  summarise(tot_plus_minus = sum(plus_minus))
```


b. Plus Minus is only counted for points where a player starts on offense. If the offensive team scores they get +1, if the defensive team scores, the offensive players get -1. Based on how they are calculated, what do the total plus minus numbers say about the dominance of offense in each division?

In Division 1 Men, the offense is very dominant and scores much more often than the defense. Division 1 Women and Division 3 Men are similar where the offense is still very efficient and score much more than the defense. In Division 3 Women, the defense has a slight edge in plus_minus, meaning that teams on offense scores on slightly less than half of all points in this division. We can see this in both the Ds statistics and the Turns statistic, where Division 3 Women have the lead in both categories. This makes for very back and forth games when two teams are defensively strong and slightly weaker at offense.




