author Christian Cleberg <hello@cleberg.net> 2024-04-27 17:01:13 -0500
committer Christian Cleberg <hello@cleberg.net> 2024-04-27 17:01:13 -0500
commit 74992aaa27eb384128924c4a3b93052961a3eaab
tree d5193997d72a52f7a6d6338ea5da8a6c80b4eddc /content/blog/2020-07-20-video-game-sales.org
parent 3def68d80edf87e28473609c31970507d9f03467
test conversion back to markdown
Diffstat (limited to 'content/blog/2020-07-20-video-game-sales.org')
-rw-r--r-- content/blog/2020-07-20-video-game-sales.org 173
1 files changed, 0 insertions, 173 deletions
diff --git a/content/blog/2020-07-20-video-game-sales.org b/content/blog/2020-07-20-video-game-sales.org
deleted file mode 100644
index 2967c17..0000000
--- a/content/blog/2020-07-20-video-game-sales.org
+++ /dev/null
@@ -1,173 +0,0 @@
-#+title: Data Exploration: Video Game Sales
-#+date: 2020-07-20
-#+description: Exploring and visualizing data with Python.
-#+filetags: :data:
-
-* Background Information
-This dataset (obtained from [[https://www.kaggle.com/gregorut/videogamesales/data][Kaggle]]) contains a list of video games with sales
-greater than 100,000 copies. It was generated by a scrape of vgchartz.com.
-
-Fields include:
-
-- Rank: Ranking of overall sales
-- Name: The game name
-- Platform: Platform of the game release (e.g., PC, PS4)
-- Year: Year of the game's release
-- Genre: Genre of the game
-- Publisher: Publisher of the game
-- NA_Sales: Sales in North America (in millions)
-- EU_Sales: Sales in Europe (in millions)
-- JP_Sales: Sales in Japan (in millions)
-- Other_Sales: Sales in the rest of the world (in millions)
-- Global_Sales: Total worldwide sales (in millions)
-
-There are 16,598 records. Two records were dropped due to incomplete information.
-
-* Import the Data
-#+begin_src python
-# Import the Python libraries we will be using
-import pandas as pd
-import numpy as np
-import seaborn as sns; sns.set()
-import matplotlib.pyplot as plt
-
-# Load the file using the path to the downloaded file
-file = r'video_game_sales.csv'
-df = pd.read_csv(file)
-df
-#+end_src
-
-#+caption: Dataframe Results
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/01_dataframe-min.png]]
-
-* Explore the Data
-#+begin_src python
-# The describe() method shows basic summary statistics for the numeric columns.
-# For example, it reveals that the 'Year' column has some missing values.
-df.describe()
-#+end_src
-
-#+caption: df.describe()
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/02_describe-min.png]]
-
-#+begin_src python
-# np.where returns the row and column indices of every NaN value. For example,
-# the first pair (179, 3) means that df.iloc[179, 3] is NaN.
-np.where(pd.isnull(df))
-
-# Output:
-# (array([179, ..., 16553], dtype=int64),
-#  array([3, ..., 5], dtype=int64))
-#+end_src
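
The raw index arrays above are hard to read at a glance. A per-column count of missing values is often easier to interpret, and it connects to the note about records dropped for incomplete information. The following is a minimal sketch on a tiny made-up sample (the rows and the name `sample` are assumptions for illustration, not data from the Kaggle file):

```python
import numpy as np
import pandas as pd

# Made-up rows that reuse the dataset's column names; values are illustrative only
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, np.nan, 2008.0, 2010.0],
    "Publisher": ["Nintendo", "Sega", None, "Atari"],
    "Global_Sales": [82.74, 1.20, 0.55, 0.31],
})

# Count the missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Keep only the rows where every field is present
complete = sample.dropna()
print(len(sample) - len(complete), "rows dropped")
```

On the real data, the same two calls could be applied to `df` directly. Whether to drop incomplete rows or keep them depends on which columns a given plot actually needs.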
-
-* Visualize the Data
-#+begin_src python
-# This function plots the global sales by platform
-sns.catplot(x='Platform', y='Global_Sales', data=df, jitter=False).set_xticklabels(rotation=90)
-#+end_src
-
-#+caption: Plot of Global Sales by Platform
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/03_plot-min.png]]
-
-#+begin_src python
-# This function plots the global sales by genre
-sns.catplot(x='Genre', y='Global_Sales', data=df, jitter=False).set_xticklabels(rotation=45)
-#+end_src
-
-#+caption: Plot of Global Sales by Genre
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/04_plot-min.png]]
-
-#+begin_src python
-# This function plots the global sales by year
-sns.lmplot(x='Year', y='Global_Sales', data=df).set_xticklabels(rotation=45)
-#+end_src
-
-#+caption: Plot of Global Sales by Year
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/05_plot-min.png]]
-
-#+begin_src python
-# Plot one line per region to show yearly sales in each region.
-# The global sales line is commented out, but can be included for comparison.
-df2 = df.groupby('Year').sum(numeric_only=True)
-# Use the grouped index for the x-axis so each total lines up with its actual year
-years = df2.index
-
-a = df2['NA_Sales']
-b = df2['EU_Sales']
-c = df2['JP_Sales']
-d = df2['Other_Sales']
-# e = df2['Global_Sales']
-
-fig, ax = plt.subplots(figsize=(12,12))
-ax.set_ylabel('Region Sales (in Millions)')
-ax.set_xlabel('Year')
-
-ax.plot(years, a, label='NA_Sales')
-ax.plot(years, b, label='EU_Sales')
-ax.plot(years, c, label='JP_Sales')
-ax.plot(years, d, label='Other_Sales')
-# ax.plot(years, e, label='Global_Sales')
-
-ax.legend()
-plt.show()
-#+end_src
-
-#+caption: Plot of Regional Sales by Year
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/06_plot-min.png]]
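
The yearly totals behind this plot can also be inspected numerically, for example to find the year with the largest combined sales. This is a rough sketch on made-up numbers (the `sample` frame and its values are assumptions, not results from the dataset). Note that the year containing the single best-selling title is not necessarily the year with the highest total:

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names; values are illustrative only
sample = pd.DataFrame({
    "Year": [2006, 2006, 2007, 2008, 2008],
    "NA_Sales": [4.0, 1.0, 2.0, 3.0, 1.5],
    "EU_Sales": [2.0, 0.5, 1.0, 2.5, 1.0],
    "Global_Sales": [7.0, 2.0, 3.5, 6.5, 3.0],
})

# Sum only the sales columns for each year
yearly = sample.groupby("Year")[["NA_Sales", "EU_Sales", "Global_Sales"]].sum()

# idxmax returns the index label (the year) of the largest total
peak_year = yearly["Global_Sales"].idxmax()

print(yearly)
print("Peak year by total global sales:", peak_year)
```

In this toy sample the top individual title is from 2006, yet 2008 has the larger yearly total, which is why it can be worth checking both views.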
-
-** Investigate Outliers
-#+begin_src python
-# Find the game with the highest sales in North America
-df.loc[df['NA_Sales'].idxmax()]
-
-# Output:
-# Rank                     1
-# Name            Wii Sports
-# Platform               Wii
-# Year                  2006
-# Genre               Sports
-# Publisher         Nintendo
-# NA_Sales             41.49
-# EU_Sales             29.02
-# JP_Sales              3.77
-# Other_Sales           8.46
-# Global_Sales         82.74
-# Name: 0, dtype: object
-
-# Explore statistics in the year 2006 (highest selling year)
-df3 = df[(df['Year'] == 2006)]
-df3.describe()
-#+end_src
-
-#+caption: Descriptive Statistics of 2006 Sales
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/07_2006_stats-min.png]]
-
-#+begin_src python
-# Plot the results of the previous dataframe (games from 2006). We can see that
-# the year's results were largely carried by Wii Sports.
-sns.catplot(x="Genre", y="Global_Sales", data=df3, jitter=False).set_xticklabels(rotation=45)
-#+end_src
-
-#+caption: Plot of 2006 Sales
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/08_plot-min.png]]
-
-#+begin_src python
-# We can see 4 outliers in the graph above, so let's get the top 5 games from that dataframe
-# The results below show that Nintendo had all top 5 games (3 on the Wii and 2 on the DS)
-df3.sort_values(by=['Global_Sales'], ascending=False).head(5)
-#+end_src
-
-#+caption: Outliers of 2006 Sales
-[[https://img.cleberg.net/blog/20200720-data-exploration-video-game-sales/09_outliers-min.png]]
-
-* Discussion
-The purpose of exploring datasets is to ask questions, answer questions, and
-discover intelligence that can be used to inform decision-making. So, what have
-we found in this dataset?
-
-Today we simply explored a publicly available dataset to see what kind of
-information it contained. During that exploration, we found that video game
-sales peaked in 2006. That peak was largely due to Nintendo, which published
-the top 5 games in 2006 and has a number of games in the top-10 list for the
-years 1980-2020. Additionally, the top four platforms by global sales (Wii, NES,
-GB, DS) are Nintendo platforms.
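
The platform ranking above is read off a plot of individual titles. One way to cross-check it would be to aggregate sales per platform and compare the total with the best single title. The sketch below uses made-up numbers (the `sample` frame and the output column names `total`, `best_title`, and `titles` are assumptions for illustration):

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names; values are illustrative only
sample = pd.DataFrame({
    "Platform": ["Wii", "Wii", "DS", "PS2", "PS2", "PS2"],
    "Global_Sales": [10.0, 1.0, 4.0, 3.0, 3.0, 3.0],
})

# Named aggregation: total sales, best single title, and title count per platform
by_platform = (
    sample.groupby("Platform")["Global_Sales"]
    .agg(total="sum", best_title="max", titles="count")
    .sort_values("total", ascending=False)
)
print(by_platform)
```

In this toy data, DS has a stronger best title than PS2 but a smaller total, so the two rankings can disagree. The same may or may not hold for the real dataset.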
-
-We didn't explore everything this dataset has to offer, but we can tell from a
-brief analysis that Nintendo seems to rule sales in the video gaming world.
-Further analysis could provide insight into which genres, regions, publishers,
-or world events are correlated with sales.
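
As a possible starting point for that further analysis, regional sales could be broken down by genre and expressed as shares. This is only a sketch on made-up numbers (the `sample` frame, `regions` list, and `genre_share` name are assumptions, not part of the original analysis):

```python
import pandas as pd

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Made-up rows that reuse the dataset's column names; values are illustrative only
sample = pd.DataFrame({
    "Genre": ["Sports", "Sports", "Role-Playing", "Role-Playing"],
    "NA_Sales": [4.0, 2.0, 1.0, 1.0],
    "EU_Sales": [3.0, 1.0, 1.0, 0.0],
    "JP_Sales": [1.0, 0.0, 3.0, 2.0],
    "Other_Sales": [1.0, 0.0, 1.0, 1.0],
})

# Total sales per genre in each region
genre_totals = sample.groupby("Genre")[regions].sum()

# Divide each row by its own total to get each region's share of a genre's sales
genre_share = genre_totals.div(genre_totals.sum(axis=1), axis=0)
print(genre_share.round(2))
```

Swapping `"Genre"` for `"Publisher"` would give the same breakdown per publisher.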