Animated ‘Top 10’ NHL Scoring Charts With R

Maybe you’ve seen some of those nifty ‘top 10’ style animated bar charts that show how rankings change over time. I thought it would be cool to try and create some of my own with NHL hockey scoring data. This post shares my approach and how I implemented it in R.

The first thing we need is some data, specifically the season-by-season point totals of every player in a team’s history. We could scrape it directly from a sports or hockey stats website using the rvest package in R. However, in many cases that would violate the website’s terms of service, so I would recommend checking first if you go that route. There’s a better option: the freely accessible NHL API. There’s no official documentation, so I recommend the excellent work of Drew Hynes, who has documented many of the NHL API endpoints. The NHL makes an astounding amount of data available this way, including play-by-play and full shift data for the past 10+ seasons, as well as complete team and scoring information going all the way back to 1917, when the league was founded. In a future post I will share the code I used to fetch data from the API, but for now we will just use the final output file. Note that I have posted all code and data to my NHL GitHub repository.

First let’s load the required packages (install any that are missing with install.packages() beforehand). Tidyverse is the standard workhorse for manipulating, shaping, and visualizing our data, and gganimate is the extension that turns our ggplots into animated ones. I added viridis, which has a nice colour palette for our charts. Finally, we need a couple of packages (gifski and png) to render the final animated image file.

library(tidyverse) # data manipulation and plotting
library(gganimate) # chart animation
library(viridis)   # colour palettes
library(gifski)    # image rendering
library(png)       # image rendering
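
If you’re starting from a fresh R installation, a small check like the one below can tell you which of these packages still need installing. missing_packages() is a hypothetical helper, not part of any of the packages above; it relies only on base R’s requireNamespace(), which loads (but does not attach) each package that is installed and quietly returns FALSE for any that are not.

```r
# Packages used in this post
required_pkgs <- c("tidyverse", "gganimate", "viridis", "gifski", "png")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
# requireNamespace() returns TRUE/FALSE instead of throwing an error.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(required_pkgs)
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```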

Next, let’s load and prepare the data. The ‘games.rds’ file was built by fetching the game-by-game stats (e.g. goals, assists, time on ice) of every player who has played in the NHL. It’s a large file with over 2 million rows. This post shows how to create a chart for my favorite team, the Vancouver Canucks, but you could easily adapt it to your favorite team just by changing the team_name variable. First we load the data and, for readability, reformat the season field to show, for example, ‘2018-2019’ instead of ‘20182019’. I also add a middle initial to distinguish between the two Greg Adams who played for the Canucks.

team_name <- 'Vancouver Canucks'

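# Load previously fetched NHL API data, disambiguate the two Greg Adams by
# player id, and reformat season codes such as '20182019' as '2018-2019'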
games <- readRDS('../Data/games.rds') %>%
         mutate(fullName = case_when(id == 8444894 ~ 'Greg D Adams',
                                     id == 8444898 ~ 'Greg C Adams',
                                     TRUE ~ fullName),
                season = paste0(substr(season,1,4),"-",substr(season,5,8)))
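
The name fix matters because the later steps group by fullName, so two different players with the same name would otherwise be merged into one career. Here is a base-R sketch of both transformations on a few hypothetical rows (the ids, names and seasons are made up, and it assumes, as the substr() calls above imply, that raw season codes are 8-character strings like ‘20182019’).

```r
# Hypothetical rows mimicking the games.rds columns used above (all values made up)
toy <- data.frame(
  id       = c(101, 102, 103),
  fullName = c("Pat Smith", "Pat Smith", "Lee Jones"),
  season   = c("19851986", "19891990", "20182019"),
  stringsAsFactors = FALSE
)

# Give the two same-named players distinct display names, keyed on player id
toy$fullName[toy$id == 101] <- "Pat A Smith"
toy$fullName[toy$id == 102] <- "Pat B Smith"

# Turn an 8-character season code like "20182019" into "2018-2019"
toy$season <- paste0(substr(toy$season, 1, 4), "-", substr(toy$season, 5, 8))
toy
```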

Now let’s prepare the data frame we’ll use for plotting: the top 10 players in career points after each Canucks season. First we filter the data to Canucks players only and compute each player’s season totals and cumulative career points. Then we use a spread-and-gather technique to fill in zeros for the seasons in which a player didn’t play. After gathering the data back into a tidy format, we carry each player’s career points forward to every season after their last one with the team. Finally, we rank the players by career points after each season and keep only the top 10 for our chart.

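# Data Prep
plot_df <- filter(games, team.name == team_name) %>%
           group_by(season, fullName) %>%
           summarise(Pts = sum(stat.points)) %>%
           select(Player = fullName, Pts, season) %>%
           group_by(Player) %>%
           arrange(season, .by_group = TRUE) %>%
           mutate(Total_Pts = cumsum(Pts)) %>%    # running career points with the team
           ungroup() %>%
           select(-Pts) %>%
           spread(key = season, value = Total_Pts, fill = 0)   # 0 for seasons with no games

# Gather the season columns back into long format, then carry each player's latest
# cumulative total forward through seasons with no games for the team (a gap in the
# middle of a career as well as every season after the last one). cumsum() output
# never decreases, so a running maximum does exactly that.
plot_df <- gather(plot_df, season, Total_Pts, -Player) %>%
           group_by(Player) %>%
           arrange(season, .by_group = TRUE) %>%
           mutate(Total_Pts = cummax(Total_Pts)) %>%
           ungroup() %>%
           select(Player, season, Total_Pts)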
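
To see what the zero fill and carry-forward step does, here is a small base-R sketch on two hypothetical players (names, seasons and point totals are made up, and it uses tapply() and cummax() in place of spread() and gather()). Note that the zero fill applies not only to seasons before a player’s debut but also to any season in the middle of his tenure in which he had no games for the team, for example after a trade away and a later return. A running maximum of the cumulative total (cummax()) carries the last value forward through such gaps as well as after his final season.

```r
# Hypothetical season totals: player X has no games in 2003-2004
season_pts <- data.frame(
  Player = c("X", "X", "X", "Y", "Y", "Y"),
  season = c("2001-2002", "2002-2003", "2004-2005",
             "2001-2002", "2002-2003", "2003-2004"),
  Pts    = c(10, 20, 5, 7, 3, 4),
  stringsAsFactors = FALSE
)

# Running career total within each player, in season order
season_pts <- season_pts[order(season_pts$Player, season_pts$season), ]
season_pts$Total_Pts <- ave(season_pts$Pts, season_pts$Player, FUN = cumsum)

# Wide layout: one row per player, one column per season, 0 where a player had no games
wide <- tapply(season_pts$Total_Pts, list(season_pts$Player, season_pts$season), sum)
wide[is.na(wide)] <- 0
wide     # X: 10, 30, 0, 35   Y: 7, 10, 14, 0

# Carry the latest cumulative total forward across gaps and after the last season
carried <- t(apply(wide, 1, cummax))
carried  # X: 10, 30, 30, 35  Y: 7, 10, 14, 14
```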

# Rank players within each season and keep only the top 10
plot_df <- group_by(plot_df, season) %>%
           mutate(rank = rank(-Total_Pts, ties.method = "first"),  # unique integer ranks
                  Value_lbl = paste0(" ",Total_Pts)) %>%
           filter(rank <= 10) %>%
           ungroup()
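
One detail worth checking with rank(): by default it averages tied values, which produces fractional ranks. In a chart like this, two players tied on career points would share a rank and their bars would be drawn on top of each other, and a tie straddling 10th place (both players at 10.5) would fail rank <= 10 and drop both of them. Passing ties.method = "first" breaks ties by row order, which is arbitrary but gives every bar its own slot. The made-up totals below show the difference, using a top-2 cut to keep it small.

```r
# Hypothetical cumulative totals with a tie for second place
pts <- c(A = 120, B = 95, C = 95, D = 60)

avg_rank   <- rank(-pts)                         # default ties.method = "average"
first_rank <- rank(-pts, ties.method = "first")  # ties broken by order of appearance

avg_rank     # A = 1, B = 2.5, C = 2.5, D = 4
first_rank   # A = 1, B = 2,   C = 3,   D = 4

# A "top 2" cut keeps only A with average ranks (2.5 > 2), but A and B with "first"
names(pts)[avg_rank <= 2]
names(pts)[first_rank <= 2]
```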

Now that our data is ready, we can plot. First we create a static horizontal bar chart using ggplot. It looks like a lot of code, but most of it just formats the plot’s appearance.

# Create Static Plot
staticplot = ggplot(plot_df, aes(rank, group = Player, 
                                       fill = as.factor(Player), color = as.factor(Player))) +
                    scale_fill_viridis(discrete=TRUE) +
                    scale_color_viridis(discrete=TRUE) +
                    geom_tile(aes(y = Total_Pts/2,
                                  height = Total_Pts,
                                  width = 0.9), alpha = 0.8, color = NA) +
                    geom_text(aes(y = 0, label = paste(Player, " ")), vjust = 0.2, hjust = 1, color = "black") +
                    geom_text(aes(y=Total_Pts,label = Value_lbl, hjust=0), color = "black") +
                    coord_flip(clip = "off", expand = FALSE) +
                    scale_y_continuous(labels = scales::comma) +
                    scale_x_reverse() +
                    guides(color = "none", fill = "none") +
                    theme(axis.line=element_blank(),
                          axis.text.x=element_blank(),
                          axis.text.y=element_blank(),
                          axis.ticks=element_blank(),
                          axis.title.x=element_blank(),
                          axis.title.y=element_blank(),
                          legend.position="none",
                          panel.background=element_blank(),
                          panel.border=element_blank(),
                          panel.grid.major=element_blank(),
                          panel.grid.minor=element_blank(),
                          panel.grid.major.x = element_line( linewidth=.1, color="grey" ),  # ggplot2 >= 3.4.0; use size= on older versions
                          panel.grid.minor.x = element_line( linewidth=.1, color="grey" ),
                          plot.title=element_text(size=25, hjust=0.5, face="bold", colour="grey", vjust=-1),
                          plot.subtitle=element_text(size=18, hjust=0.5, face="italic", color="grey"),
                          plot.caption =element_text(size=8, hjust=0.5, face="italic", color="grey"),
                          plot.background=element_blank(),
                          plot.margin = margin(2,2, 2, 4, "cm"))
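
The geom_tile() call may look odd for a bar chart. A tile is a rectangle centred on its y value with the given height, so centring it at Total_Pts/2 with height Total_Pts makes it run from 0 to Total_Pts (drawn horizontally after coord_flip()). The hypothetical helper below does only that arithmetic, for a few made-up totals.

```r
# Hypothetical helper reproducing the tile geometry used above:
# centre y = total / 2 and height = total give a bar spanning 0 to total
tile_extent <- function(total_pts) {
  centre <- total_pts / 2   # the y aesthetic passed to geom_tile()
  height <- total_pts       # the height aesthetic
  data.frame(ymin = centre - height / 2, ymax = centre + height / 2)
}

tile_extent(c(60, 95, 120))   # every bar starts at 0 and ends at its total
```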

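Now we create our animated plot by adding gganimate’s transition_states() function, using the season field to define the states of the animation. view_follow() keeps the rank axis fixed while letting the points axis rescale as totals grow, and the {closest_state} placeholder in the subtitle displays the current season. Finally, we add a title and caption.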

# Animated Plot
anim = staticplot + transition_states(season, transition_length = 4, state_length = 3) +
       view_follow(fixed_x = TRUE)  +
       labs(title = paste0(team_name," All Time Point Leaders"),  
            subtitle  =  "Season : {closest_state}",
            caption  = "Total Regular Season Pts | Data Source: NHL API")

Finally, we pass the animated plot object to the animate() function to render the final file. In this case we render an animated GIF with gifski_renderer(), though gganimate offers other renderers, for example av_renderer() for video output. I suggest experimenting with the parameters (fps, duration, end_pause and so on) to see how they affect the final output.

# Render Plot
animate(anim, fps = 10, duration = 45, width = 600, height = 400, end_pause = 100, detail = 1, rewind = FALSE,
        renderer = gifski_renderer(paste0(gsub(' ','_',team_name),"_alltime.gif")))
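
The timing parameters interact: fps times duration sets the total number of frames (10 × 45 = 450 here), and end_pause repeats the last frame that many times (100 frames, or 10 seconds at 10 fps). Below is a rough sketch for estimating how long each season stays on screen. It is a hypothetical helper that assumes the pause frames come out of the same fps × duration total (my reading of gganimate, not something stated above), and it ignores how transition_states() splits time between transitions and states.

```r
# Hypothetical helper: back-of-the-envelope timing for an animate() call.
# Assumption: end_pause frames are taken out of the fps * duration total.
frame_budget <- function(fps, duration, end_pause, n_states) {
  total_frames  <- fps * duration             # frames in the whole GIF
  motion_frames <- total_frames - end_pause   # frames left for the season-by-season animation
  c(total_frames      = total_frames,
    pause_seconds     = end_pause / fps,
    seconds_per_state = motion_frames / fps / n_states)
}

# n_states = 50 is a made-up season count; in the real script it would be
# length(unique(plot_df$season))
frame_budget(fps = 10, duration = 45, end_pause = 100, n_states = 50)
```

If seasons flash by too quickly, increasing duration (or reducing end_pause) is the simplest lever.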

And that’s it: we’ve created a cool-looking animated ‘top 10’ chart with just a few lines of R code!

Best of all, the code can easily be modified to create charts for different teams or even to use completely different data. There is also a lot of flexibility to customize the appearance. Enjoy, and please tag and share any charts you make, as I’d love to see what you are able to create! Thanks for reading.