YouTube Data Analysis Project

MSBX 5410

Fanyi | Freddie | Hathaway | Jaydip


Project Introduction

This project is an in-depth analysis of trending YouTube video statistics. This website walks you through data exploration and deeper analysis of trending YouTube videos across five countries: the US, Canada, the United Kingdom, France and Germany. It also identifies correlations among several related metrics, such as views, likes and dislikes.

Data Information

Data Context

YouTube, the world-famous video sharing website, maintains a list of the top trending videos on the platform. According to Variety magazine, “To determine the year’s top-trending videos, YouTube uses a combination of factors including measuring users interactions (number of views, shares, comments and likes). Note that they’re not the most-viewed videos overall for the calendar year”. Top performers on the YouTube trending list are music videos (such as the famously viral “Gangnam Style”), celebrity and/or reality TV performances, and the random dude-with-a-camera viral videos that YouTube is well-known for. This dataset is a daily record of the top trending YouTube videos. Note that it is a structurally improved version of an earlier YouTube trending dataset.

Data Content

This dataset includes several months (and counting) of data on daily trending YouTube videos. Data is included for the US, GB, DE, CA, and FR regions (USA, Great Britain, Germany, Canada, and France, respectively), with up to 200 listed trending videos per day.

Each region’s data is in a separate file. Data includes the video title, channel title, publish time, tags, views, likes and dislikes, description, and comment count.

The data also includes a category_id field, which varies between regions. To retrieve the category of a specific video, look up its category_id in the region's associated JSON file. One such file is included for each of the five regions in the dataset.
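As a minimal sketch of that lookup, assuming each region's JSON file follows the YouTube Data API videoCategories response shape (an "items" array whose entries carry an "id" and a "snippet.title"), the category names can be loaded into a dictionary keyed by category_id. The function name load_category_names and the two-entry sample are hypothetical.

```python
import json

def load_category_names(json_text):
    """Map category_id (as a string) to its category title for one region.

    Assumption: the JSON looks like
    {"items": [{"id": "10", "snippet": {"title": "Music"}}, ...]}.
    """
    data = json.loads(json_text)
    return {item["id"]: item["snippet"]["title"] for item in data.get("items", [])}

# Hypothetical, trimmed excerpt of one region's category file.
sample = '''{"items": [
  {"id": "10", "snippet": {"title": "Music"}},
  {"id": "24", "snippet": {"title": "Entertainment"}}
]}'''

categories = load_category_names(sample)
# CSV readers return category_id as text, so string keys match directly.
print(categories["10"])
```

Keys are kept as strings because both the JSON ids and values read from a CSV file arrive as text, so no conversion is needed when joining the two.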

Analysis Topic

  • Correlation between views, likes, and dislikes.
  • Which month has the most views?
  • Occurrence of tags and the popularity of categories in each country.
  • How long videos in each category stay trending.

Views and Preference

Preference Between Countries



Correlation Matrix
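A correlation matrix holds the Pearson correlation coefficient for every pair of variables, here views, likes and dislikes. The sketch below computes one with the standard library only; it is an illustration, not the code behind the chart. The function names and the toy numbers are hypothetical, and in practice the three columns would come from the views, likes and dislikes fields of a region's CSV file.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy values, not taken from the dataset.
columns = {
    "views": [1000, 5000, 20000, 80000],
    "likes": [50, 260, 900, 4100],
    "dislikes": [5, 12, 60, 150],
}
matrix = correlation_matrix(columns)
for (a, b), r in matrix.items():
    print(f"{a:>8} vs {b:<8} r = {r:.3f}")
```

The diagonal is always 1 and the matrix is symmetric, so a chart only needs the values above the diagonal to convey all the information.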

Trending Date vs Views

Date and Views

Publish Date vs Views (Static Version)

The bar chart above compares total views in each country by publish date, from 2006 to 2018. Because the data is incomplete and many dates are missing, you may notice a lot of empty space in the years before 2013.
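A chart like this one rests on a simple aggregation: sum views per country and publish year. The sketch below assumes each row is a dict with a country label added when the region files are combined, plus the dataset's publish_time (ISO 8601 text such as "2017-11-13T17:13:01.000Z") and views fields. The key name country, the function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def views_by_publish_year(rows):
    """Sum views per (country, publish year).

    Assumes 'publish_time' looks like '2017-11-13T17:13:01.000Z'.
    """
    totals = defaultdict(int)
    for row in rows:
        published = datetime.strptime(row["publish_time"], "%Y-%m-%dT%H:%M:%S.%fZ")
        totals[(row["country"], published.year)] += int(row["views"])
    return dict(totals)

# Hypothetical toy rows.
rows = [
    {"country": "US", "publish_time": "2017-11-13T17:13:01.000Z", "views": "1000"},
    {"country": "US", "publish_time": "2017-11-12T05:37:17.000Z", "views": "2500"},
    {"country": "GB", "publish_time": "2012-03-01T10:00:00.000Z", "views": "400"},
]
print(views_by_publish_year(rows))
```

Years in which no trending video was published simply have no entry in the result, which shows up as empty space when the totals are plotted on a continuous year axis.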



Trending Date vs Views

Here is a complete comparison of total views in each country by trending date, from the end of 2017 to the most recent date in the dataset. To avoid most uncertainties and errors, we extracted the most complete portion of the dataset and built a bar chart to compare views across countries.
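Grouping by trending date also answers the question of which month has the most views. The sketch below assumes the dataset's trending_date field uses a yy.dd.mm layout (for example "17.14.11" for 14 November 2017) and that a country key was added when combining the region files; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def views_by_trending_month(rows):
    """Sum views per (country, 'YYYY-MM') of the trending date.

    Assumes 'trending_date' is formatted as yy.dd.mm, e.g. '17.14.11'.
    """
    totals = defaultdict(int)
    for row in rows:
        day = datetime.strptime(row["trending_date"], "%y.%d.%m")
        totals[(row["country"], day.strftime("%Y-%m"))] += int(row["views"])
    return dict(totals)

# Hypothetical toy rows.
rows = [
    {"country": "US", "trending_date": "17.14.11", "views": "1000"},
    {"country": "US", "trending_date": "17.30.11", "views": "500"},
    {"country": "US", "trending_date": "18.02.01", "views": "2500"},
    {"country": "CA", "trending_date": "17.14.11", "views": "700"},
]
totals = views_by_trending_month(rows)
busiest = max(totals, key=totals.get)
print(busiest, totals[busiest])
```

A video that trends on several days appears once per day, so its views are counted once per trending day; these totals therefore measure trending exposure rather than unique views.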

Category and Tags

Category Analysis

This bar chart compares each category's proportion of total views in each country. Because the proportions are calculated from the total number of views, missing values make some bars unrepresentative.
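The proportion for a category is its views divided by the country's total views. A minimal sketch, assuming rows that already carry a category name (for example from the JSON lookup) and a country label; rows with an empty views value are skipped, which is exactly why missing values distort the shares. The function name and sample rows are hypothetical.

```python
from collections import defaultdict

def category_view_shares(rows):
    """Return {country: {category: share of that country's total views}}."""
    views = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if not row["views"]:
            continue  # missing value: this row contributes nothing to any share
        views[row["country"]][row["category"]] += int(row["views"])
    shares = {}
    for country, per_category in views.items():
        total = sum(per_category.values())
        if total == 0:
            continue  # avoid dividing by zero for a country with no views
        shares[country] = {cat: v / total for cat, v in per_category.items()}
    return shares

# Hypothetical toy rows.
rows = [
    {"country": "US", "category": "Music", "views": "300"},
    {"country": "US", "category": "Music", "views": "100"},
    {"country": "US", "category": "Gaming", "views": "100"},
    {"country": "US", "category": "Comedy", "views": ""},
]
print(category_view_shares(rows))
```

Because the Comedy row has no views value, Comedy disappears from the US shares entirely, illustrating how a missing value can make a bar unrepresentative.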

Tags Occurrence

Tags are an interesting YouTube feature. By analyzing them, we can explore why people watch these videos and what makes a video trend.
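Counting tag occurrences is the first step of that analysis. The sketch below assumes the tags field stores quoted tags separated by "|" (for example "music"|"pop") and uses the placeholder [none] for videos without tags; both formats are assumptions about the raw files, and the function name is hypothetical.

```python
from collections import Counter

def count_tags(tag_fields):
    """Count how many videos use each tag.

    Assumes fields like '"music"|"pop"', with '[none]' for untagged videos.
    """
    counts = Counter()
    for field in tag_fields:
        if field == "[none]":
            continue
        for tag in field.split("|"):
            tag = tag.strip().strip('"').lower()  # normalize case so "Music" == "music"
            if tag:
                counts[tag] += 1
    return counts

# Hypothetical toy tag fields.
fields = ['"music"|"pop"|"Official Video"', '"Music"|"live"', '[none]']
tag_counts = count_tags(fields)
print(tag_counts.most_common(3))
```

Lowercasing merges tags that differ only in capitalization; whether that is desirable depends on the question being asked.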

Trending Duration

Multi-line Chart

Donut Graph
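One way to measure how long each category stays trending is to count the distinct trending dates per video and average them by category. This sketch assumes one region's rows at a time (a video_id can also trend in other countries), with the dataset's video_id and trending_date fields plus a category name; it is not necessarily how the charts above were computed, and the function name is hypothetical.

```python
from collections import defaultdict

def average_trending_days(rows):
    """Average number of distinct trending days per video, grouped by category."""
    days_per_video = defaultdict(set)
    category_of = {}
    for row in rows:
        days_per_video[row["video_id"]].add(row["trending_date"])
        category_of[row["video_id"]] = row["category"]
    per_category = defaultdict(list)
    for video_id, days in days_per_video.items():
        per_category[category_of[video_id]].append(len(days))
    return {cat: sum(d) / len(d) for cat, d in per_category.items()}

# Hypothetical toy rows for a single region.
rows = [
    {"video_id": "a", "category": "Music", "trending_date": "17.14.11"},
    {"video_id": "a", "category": "Music", "trending_date": "17.15.11"},
    {"video_id": "a", "category": "Music", "trending_date": "17.16.11"},
    {"video_id": "b", "category": "Music", "trending_date": "17.14.11"},
    {"video_id": "c", "category": "Gaming", "trending_date": "17.14.11"},
]
print(average_trending_days(rows))
```

Using a set of dates means a video listed twice on the same day is still counted as trending for one day.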