From 74992aaa27eb384128924c4a3b93052961a3eaab Mon Sep 17 00:00:00 2001
From: Christian Cleberg
Date: Sat, 27 Apr 2024 17:01:13 -0500
Subject: test conversion back to markdown

---
 content/blog/2020-07-20-video-game-sales.org | 173 ---------------------------
 1 file changed, 173 deletions(-)
 delete mode 100644 content/blog/2020-07-20-video-game-sales.org

(limited to 'content/blog/2020-07-20-video-game-sales.org')

diff --git a/content/blog/2020-07-20-video-game-sales.org b/content/blog/2020-07-20-video-game-sales.org
deleted file mode 100644
index 2967c17..0000000
--- a/content/blog/2020-07-20-video-game-sales.org
+++ /dev/null
@@ -1,173 +0,0 @@
-#+title: Data Exploration: Video Game Sales
-#+date: 2020-07-20
-#+description: Exploring and visualizing data with Python.
-#+filetags: :data:
-
-* Background Information
-This dataset (obtained from [[https://www.kaggle.com/gregorut/videogamesales/data][Kaggle]]) contains a list of video games with sales
-greater than 100,000 copies. It was generated by a scrape of vgchartz.com.
-
-Fields include:
-
-- Rank: Ranking of overall sales
-- Name: The game name
-- Platform: Platform of the game release (i.e. PC,PS4, etc.)
-- Year: Year of the game's release
-- Genre: Genre of the game
-- Publisher: Publisher of the game
-- NA_{Sales}: Sales in North America (in millions)
-- EU_{Sales}: Sales in Europe (in millions)
-- JP_{Sales}: Sales in Japan (in millions)
-- Other_{Sales}: Sales in the rest of the world (in millions)
-- Global_{Sales}: Total worldwide sales.
-
-There are 16,598 records. 2 records were dropped due to incomplete information.
-
-* Import the Data
-#+begin_src python
-# Import the Python libraries we will be using
-import pandas as pd
-import numpy as np
-import seaborn as sns; sns.set()
-import matplotlib.pyplot as plt
-
-# Load the file using the path to the downloaded file
-file = r'video_game_sales.csv'
-df = pd.read_csv(file)
-df
-#+end_src
-
-#+caption: Dataframe Results
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/01_dataframe-min.png]]
-
-* Explore the Data
-#+begin_src python
-# With the description function, we can see the basic stats. For example, we can
-# also see that the 'Year' column has some incomplete values.
-df.describe()
-#+end_src
-
-#+caption: df.describe()
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/02_describe-min.png]]
-
-#+begin_src python
-# This function shows the rows and columns of NaN values. For example, df[179,3] = nan
-np.where(pd.isnull(df))
-
-(array([179, ..., 16553], dtype=int64),
- array([3, ..., 5], dtype=int64))
-#+end_src
-
-* Visualize the Data
-#+begin_src python
-# This function plots the global sales by platform
-sns.catplot(x='Platform', y='Global_Sales', data=df, jitter=False).set_xticklabels(rotation=90)
-#+end_src
-
-#+caption: Plot of Global Sales by Platform
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/03_plot-min.png]]
-
-#+begin_src python
-# This function plots the global sales by genre
-sns.catplot(x='Genre', y='Global_Sales', data=df, jitter=False).set_xticklabels(rotation=45)
-#+end_src
-
-#+caption: Plot of Global Sales by Genre
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/04_plot-min.png]]
-
-#+begin_src python
-# This function plots the global sales by year
-sns.lmplot(x='Year', y='Global_Sales', data=df).set_xticklabels(rotation=45)
-#+end_src
-
-#+caption: Plot of Global Sales by Year
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/05_plot-min.png]]
-
-#+begin_src python
-# This function plots four different lines to show sales from different regions.
-# The global sales plot line is commented-out, but can be included for comparison
-df2 = df.groupby('Year').sum()
-years = range(1980,2019)
-
-a = df2['NA_Sales']
-b = df2['EU_Sales']
-c = df2['JP_Sales']
-d = df2['Other_Sales']
-# e = df2['Global_Sales']
-
-fig, ax = plt.subplots(figsize=(12,12))
-ax.set_ylabel('Region Sales (in Millions)')
-ax.set_xlabel('Year')
-
-ax.plot(years, a, label='NA_Sales')
-ax.plot(years, b, label='EU_Sales')
-ax.plot(years, c, label='JP_Sales')
-ax.plot(years, d, label='Other_Sales')
-# ax.plot(years, e, label='Global_Sales')
-
-ax.legend()
-plt.show()
-#+end_src
-
-#+caption: Plot of Regional Sales by Year
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/06_plot-min.png]]
-
-** Investigate Outliers
-#+begin_src python
-# Find the game with the highest sales in North America
-df.loc[df['NA_Sales'].idxmax()]
-
-Rank                     1
-Name            Wii Sports
-Platform               Wii
-Year                  2006
-Genre               Sports
-Publisher         Nintendo
-NA_Sales             41.49
-EU_Sales             29.02
-JP_Sales              3.77
-Other_Sales           8.46
-Global_Sales         82.74
-Name: 0, dtype: object
-
-# Explore statistics in the year 2006 (highest selling year)
-df3 = df[(df['Year'] == 2006)]
-df3.describe()
-#+end_src
-
-#+caption: Descriptive Statistics of 2006 Sales
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/07_2006_stats-min.png]]
-
-#+begin_src python
-# Plot the results of the previous dataframe (games from 2006) - we can see the year's results were largely carried by Wii Sports
-sns.catplot(x="Genre", y="Global_Sales", data=df3, jitter=False).set_xticklabels(rotation=45)
-#+end_src
-
-#+caption: Plot of 2006 Sales
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/08_plot-min.png]]
-
-#+begin_src python
-# We can see 4 outliers in the graph above, so let's get the top 5 games from that dataframe
-# The results below show that Nintendo had all top 5 games (3 on the Wii and 2 on the DS)
-df3.sort_values(by=['Global_Sales'], ascending=False).head(5)
-#+end_src
-
-#+caption: Outliers of 2006 Sales
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/09_outliers-min.png]]
-
-* Discussion
-The purpose of exploring datasets is to ask questions, answer questions, and
-discover intelligence that can be used to inform decision-making. So, what have
-we found in this dataset?
-
-Today we simply explored a publicly-available dataset to see what kind of
-information it contained. During that exploration, we found that video game
-sales peaked in 2006. That peak was largely due to Nintendo, who sold the top 5
-games in 2006 and has a number of games in the top-10 list for the years
-1980-2020. Additionally, the top four platforms by global sales (Wii, NES, GB,
-DS) are owned by Nintendo.
-
-We didn't explore everything this dataset has to offer, but we can tell from a
-brief analysis that Nintendo seems to rule sales in the video gaming world.
-Further analysis could provide insight into which genres, regions, publishers,
-or world events are correlated with sales.
-- 
cgit v1.2.3-70-g09d2