# Personal Financial Tracking
For the last 6-ish years, I've tracked my finances in a spreadsheet. This is
common practice in the business world, but any good dev will cringe at the
thought of storing long-term data in a spreadsheet. A spreadsheet is not built
for long-term storage, nor as a source from which to pull data and reports.

As I wanted to expand the functionality of my financial data (e.g., adding more
reports), I decided to migrate the data into a database. To run reports, I would
query the database and use a language like Python or JavaScript to process the
data, perform calculations, and visualize the results.
# SQLite
When choosing the type of database I wanted to use for this project, I was split
between three options:

1. MySQL: The database I have the most experience with and have used for years.
2. PostgreSQL: A database I'm new to, but want to learn.
3. SQLite: A database that I've used for a couple of projects and have moderate
   experience with.

I ended up choosing SQLite since it can be maintained within a single `.sqlite`
file, which allows me more flexibility for storage and backup. I keep this file
in my cloud storage and pull it up whenever needed.
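A nice side effect of the single-file approach is that a consistent backup is
one call away. Here is a minimal sketch using Python's built-in `sqlite3`
module (the filenames are illustrative, not my actual setup):

``` python
import sqlite3

# Open (or create) the single-file database -- filename is illustrative
src = sqlite3.connect("finances.sqlite")
src.execute("CREATE TABLE IF NOT EXISTS Accounts (AccountID INTEGER PRIMARY KEY)")
src.commit()

# Because the whole database lives in one file, backing it up before a
# cloud sync is a single call; sqlite3 produces a consistent copy even
# if the source connection is still in use
dst = sqlite3.connect("finances-backup.sqlite")
with dst:
    src.backup(dst)
dst.close()
src.close()
```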
## GUI Editing
Since I didn't want to try to import 1000--1500 records into my new database
via the command line, I opted to use [DB Browser for SQLite
(DB4S)](https://sqlitebrowser.org/) as a GUI tool. This application is
excellent, and I don't see myself going back to the CLI when working in this
database.
DB4S allows you to copy a range of cells from a spreadsheet and paste it
straight into the SQL table. I used this process for all 36 accounts, 1290
account statements, and 126 pay statements. Overall, I'm guessing this took
anywhere between 4--8 hours. In comparison, it probably took me 2--3 days to
initially create the spreadsheet.
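For anyone who would rather script the import than paste cells, the same data
can be appended from a spreadsheet export with pandas. This is a sketch of an
alternative route, not the process I used above, and the table and column
names are assumptions:

``` python
import sqlite3

import pandas as pd

# Rows as they might come from a spreadsheet export (illustrative data)
rows = pd.DataFrame(
    {
        "AccountID": [1, 1],
        "StatementDate": ["2022-01-31", "2022-02-28"],
        "Balance": [100.0, 110.0],
    }
)

# Append into the Statements table; swap ":memory:" for the real
# "finances.sqlite" file to run against the actual database
con = sqlite3.connect(":memory:")
rows.to_sql("Statements", con, if_exists="append", index=False)
count = con.execute("SELECT COUNT(*) FROM Statements").fetchone()[0]
```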
![DB4S](https://img.cleberg.net/blog/20220303-maintaining-a-personal-financial-database/db4s.png)
## Schema
The schema for this database is actually extremely simple and involves only
three tables (for now):
1. Accounts
2. Statements
3. Payroll
**Accounts**
The Accounts table contains summary information about an account, such as a car
loan or a credit card. By viewing this table, you can find high-level data, such
as interest rate, credit line, or owner.
``` sql
CREATE TABLE "Accounts" (
    -- The original column list is elided in this excerpt; the columns
    -- below are illustrative, based on the description above
    "AccountID" INTEGER NOT NULL UNIQUE,
    "AccountType" TEXT,
    "Owner" TEXT,
    "InterestRate" REAL,
    "CreditLine" REAL,
    PRIMARY KEY("AccountID" AUTOINCREMENT)
);
```
**Statements**
The Statements table uses the same unique identifier as the Accounts table,
meaning you can join the tables to find a monthly statement for any of the
accounts listed in the Accounts table. Each statement has an account ID,
statement date, and total balance.
``` sql
CREATE TABLE "Statements" (
    -- The original column list is elided in this excerpt; the columns
    -- below are illustrative, based on the description above
    "StatementID" INTEGER NOT NULL UNIQUE,
    "AccountID" INTEGER NOT NULL,
    "StatementDate" TEXT,
    "Balance" REAL,
    PRIMARY KEY("StatementID" AUTOINCREMENT)
);
```
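Since Statements shares the account identifier with Accounts, any report can
join the two tables. A minimal sketch of that join with pandas, using tiny
in-memory stand-ins for the two tables (the column layouts here are
illustrative, not the exact schema):

``` python
import sqlite3

import pandas as pd

# Tiny illustrative copies of the two tables
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE Accounts ("AccountID" INTEGER, "AccountName" TEXT)')
con.execute(
    'CREATE TABLE Statements ("AccountID" INTEGER, "StatementDate" TEXT, "Balance" REAL)'
)
con.execute("INSERT INTO Accounts VALUES (1, 'Car Loan')")
con.execute("INSERT INTO Statements VALUES (1, '2022-02-28', 1234.56)")

# Join each monthly statement to its account
df = pd.read_sql_query(
    """
    SELECT a.AccountName, s.StatementDate, s.Balance
    FROM Statements s
    JOIN Accounts a ON a.AccountID = s.AccountID
    """,
    con,
)
```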
**Payroll**
The Payroll table is a separate entity, unrelated to the Accounts or Statements
tables. This table contains all the information you would find on a pay
statement from an employer. As you change employers or obtain new
perks/benefits, just add new columns to adapt to the new data.
``` sql
CREATE TABLE "Payroll" (
    -- The original column list is elided in this excerpt; the columns
    -- below are illustrative, based on the reports described later
    "PaycheckID" INTEGER NOT NULL UNIQUE,
    "PayDate" TEXT,
    "GrossPay" REAL,
    "TaxableIncome" REAL,
    "Taxes" REAL,
    "NetPay" REAL,
    PRIMARY KEY("PaycheckID" AUTOINCREMENT)
);
```
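The "just add new columns" approach maps to a one-line schema change in
SQLite. A sketch (the table layout and the new column name are hypothetical):

``` python
import sqlite3

con = sqlite3.connect(":memory:")  # or "finances.sqlite" for the real file
con.execute('CREATE TABLE Payroll ("PayDate" TEXT, "GrossPay" REAL)')

# New employer adds a perk? Extend the table in place.
# ("HSAContribution" is an illustrative column name)
con.execute('ALTER TABLE Payroll ADD COLUMN "HSAContribution" REAL DEFAULT 0')

# PRAGMA table_info returns one row per column; index 1 holds the name
columns = [row[1] for row in con.execute("PRAGMA table_info(Payroll)")]
```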
## Python Reporting
Once I created the database tables and imported all my data, the only step left
was to create a process to report on and visualize various aspects of the data.

To explore and create the reports I'm interested in, I utilized a two-part
process involving Jupyter Notebooks and Python scripts.

### Step 1: Jupyter Notebooks

When I need to explore data, try different things, and re-run my code
cell-by-cell, I use Jupyter Notebooks. For example, I explored the
`Accounts` table until I found the following useful information:
+
``` python
import sqlite3
import pandas as pd
import matplotlib

# Set up database filename and connect
db = "finances.sqlite"
connection = sqlite3.connect(db)
df = pd.read_sql_query("SELECT * FROM Accounts", connection)

# Set global matplotlib variables
# (%matplotlib inline is an IPython magic; it only works inside a notebook)
%matplotlib inline
matplotlib.rcParams['text.color'] = 'white'
matplotlib.rcParams['axes.labelcolor'] = 'white'
matplotlib.rcParams['xtick.color'] = 'white'
matplotlib.rcParams['ytick.color'] = 'white'
matplotlib.rcParams['legend.labelcolor'] = 'black'

# Display graph
df.groupby(['AccountType']).sum().plot.pie(
    title='Credit Line by Account Type',
    y='CreditLine',
    figsize=(5, 5),
    autopct='%1.1f%%',
)
```
+
### Step 2: Python Scripts

Once I explored enough through the notebooks and had a list of reports I
wanted, I moved on to create a Python project with the following structure:
+
``` txt
finance/
├── notebooks/
│   ├── account_summary.ipynb
│   ├── account_details.ipynb
│   └── payroll.ipynb
├── public/
│   ├── image-01.png
│   └── image-0X.png
├── src/
│   └── finance.sqlite
├── venv/
├── __init__.py
├── database.py
├── process.py
├── requirements.txt
└── README.md
```
+
This structure allows me to:

1. Compile all required Python packages into `requirements.txt` for easy
   installation if I move to a new machine.
2. Activate a virtual environment in `venv/` so I don't need to maintain a
   system-wide Python environment just for this project.
3. Keep my `notebooks/` folder to continuously explore the data as I see fit.
4. Maintain a local copy of the database in `src/` for easy access.
5. Export reports, images, HTML files, etc. to `public/`.
+
Now, onto the differences between the code in a Jupyter Notebook and the
actual Python files. To create the report in the Notebook snippet above, I
created the following function inside `process.py`:
+
``` python
# Create summary pie chart
def summary_data(accounts: pandas.DataFrame) -> None:
    accounts_01 = accounts[accounts["Owner"] == "Person01"]
    accounts_02 = accounts[accounts["Owner"] == "Person02"]
    for x in range(1, 4):
        if x == 1:
            df = accounts
            account_string = "All Accounts"
        elif x == 2:
            df = accounts_01
            account_string = "Person01's Accounts"
        elif x == 3:
            df = accounts_02
            account_string = "Person02's Accounts"
        print(f"Generating pie chart summary image for {account_string}...")
        summary_chart = (
            df.groupby(["AccountType"])
            .sum()
            .plot.pie(
                title=f"Credit Line by Type for {account_string}",
                y="CreditLine",
                autopct="%1.1f%%",
            )
        )
        summary_chart.figure.savefig(f"public/summary_chart_{x}.png", dpi=1200)
```

The result? A high-quality pie chart that is read directly by the
`public/index.html` template I use.

![Summary Pie Chart](https://img.cleberg.net/blog/20220303-maintaining-a-personal-financial-database/summary_chart.png)

Other charts generated by this project include:

- Charts of account balances over time.
- Line chart of effective tax rate (taxes divided by taxable income).
- Salary projections and error limits using past income and inflation rates.
- Multi-line chart of gross income, taxable income, and net income.

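The effective-tax-rate series, for example, falls out of the Payroll data in a
couple of lines. A sketch with pandas (the column names are assumed to mirror
the descriptions above, and the data is illustrative):

``` python
import pandas as pd

# Illustrative payroll extract -- one row per pay statement
payroll = pd.DataFrame(
    {
        "TaxableIncome": [4000.0, 4200.0, 4100.0],
        "Taxes": [800.0, 882.0, 820.0],
    }
)

# Effective tax rate = taxes divided by taxable income
payroll["EffectiveRate"] = payroll["Taxes"] / payroll["TaxableIncome"]
# payroll["EffectiveRate"].plot.line() would draw the chart described above
```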
The best thing about this project? I can improve it at any given time, shaping
it into whatever helps me most at that moment. I imagine I will introduce an
asset tracking table soon to track the depreciating value of cars, houses, etc.
Who knows what's next?