---
title: "Visualizing Points Scored in the PWHL"
output: html_document
---

## Introduction

The Professional Women’s Hockey League (PWHL) began its inaugural season in 2023-24. The league has players from 11 different countries. The league looks to expand its exposure and gain new fans for the future of the sport.

We will be investigating the player statistics from the league’s inaugural season. Our focus will be on all 147 players, including goalies. We want to discover which age groups and positions are associated with the most points scored per game played.

Each position has a different role in contributing to the team. In general, the forwards aim to stay in three different lanes across the ice, moving the puck between them to make the goalie move and open up scoring opportunities. The defense players complement the forwards by positioning themselves along the boundary of the offensive zone, which prevents the opposing team from moving the puck out of the zone and provides more opportunities for the forwards to score. The goalie’s focus is to guard their team’s goal by positioning themselves in front of it to prevent the opposing team from scoring.

The different age groups represent a blend of experience and athleticism at a given point in a player’s career. More experience should help a player score more because they have more knowledge of the game and more ideas on how to score. However, more experience also comes with age, and players with more experience may be past their years of peak athleticism. That is why our goal is to find whether there is a perfect blend of the two (i.e., an ideal age and position group).

## Data Description

| Column Name | Description                                                  |
|-------------|--------------------------------------------------------------|
| PWHL_Final  | Name of the data set                                         |
| P_Per_GP    | Number of points scored by the player per game played       |
| Pos         | Position of the player (D = Defense, F = Forward, G = Goalie)|
| Age         | Age of the player in years                                   |


## Exercises

```{r}
# Load in the necessary packages and data
library(tidyverse)
library(here)
PWHL_Final <- read.csv(here("Your-Directory/PWHL_Final.csv"))
```

The density plot below displays the distribution of points per game played for each position. Use it to answer the following two questions.

```{r}
ggplot(data = PWHL_Final) + 
  geom_density(aes(P_Per_GP, color = Pos, fill = Pos), alpha = 0.25) +
  theme_minimal()
```

1. What would you need to add to the code below to add a title and change the x-axis label to "Points Per Game Played"?

```{r}
ggplot(data = PWHL_Final) + 
  geom_density(aes(P_Per_GP, color = Pos, fill = Pos), alpha = 0.25) +
  theme_minimal() # your code here
```

2. Describe the distribution of the Forward and Defense positions in the density plot above. Make sure to mention shape and skew. Give a possible reason why the Goalie’s curve is concentrated around zero in this visual. 





3. Fill in the code below to create different age groups and filter out goalies from the dataset. Make 3 different age groups: Call ages 22-25 "youngest", ages 26-30 "middle", and ages 31-36 "oldest".

```{r}
# Making the different age groups and filtering out goalies
PWHL_Graph <- PWHL_Final %>%
  mutate(Age_Group = case_when(

    
    
  )) %>%
  filter() # filter out goalies
PWHL_Graph <- PWHL_Graph %>%
  mutate(Age_Group = factor(Age_Group, levels = c("youngest", "middle", "oldest")))
```
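
If the `case_when()` and `filter()` pattern used in question 3 is unfamiliar, the following is a minimal sketch on a made-up data frame. The data, the column names (`games`, `role`), and the cutoffs are all hypothetical and are not part of the PWHL data set; the sketch only illustrates how conditions are checked in order and how one category can be dropped.

```r
library(dplyr)

# Hypothetical toy data (NOT the PWHL data set)
toy <- tibble(
  player = c("A", "B", "C", "D"),
  games  = c(3, 12, 24, 18),
  role   = c("starter", "reserve", "starter", "coach")
)

toy_grouped <- toy %>%
  mutate(games_group = case_when(
    games <= 5  ~ "few",    # conditions are checked top to bottom
    games <= 15 ~ "some",   # only reached when games > 5
    TRUE        ~ "many"    # catch-all for every remaining row
  )) %>%
  filter(role != "coach")   # keep every row except the "coach" category

toy_grouped
```

Each row receives the label of the first condition it satisfies, so the order of the conditions matters.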

Use the boxplots below to answer the following question.

```{r}
ggplot(data = PWHL_Graph, mapping = aes(x = Age_Group, y = P_Per_GP)) +
  geom_boxplot() +
  labs(x = "Age Group", y = "Points Per Game Played", title = "Points per Game Played by Age Group") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
```

```{r}
ggplot(data = PWHL_Graph, mapping = aes(x = Pos, y = P_Per_GP)) +
  geom_boxplot() + 
  labs(x = "Position", y = "Points Per Game Played", title = "Points per Game Played by Position") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) 
```

4. Brainstorm some ideas on which combinations of age and positions would be ideal for a player to maximize the number of points per game played. Explain.








5. Let’s put this all together on one graph to see the trends between the age groups and positions. Create side-by-side boxplots that display the points per game played for each group. Only include Forward and Defense positions (since we already showed how few points goalies score). Make sure to add a theme and change the x- and y-axis labels. Refer to the Data Description section above. Once completed, get into small groups and decide whose graph displays the data the best and why. Try switching the variables between x, y, and color to see if it improves your visualization.

```{r}
# your code here

```


6. What trends, if any, do you see in your graph? Does it confirm your original thoughts in question 4? What seems to be the ideal position and age group to maximize points per game played?






7. Give one reason why you think older forwards have the most points per game compared to the other age groups.







8. Reread the introduction at the top of the file. What might be a limitation of this dataset?





