Diffstat (limited to 'blog/2022-03-03-financial-database.org')
-rw-r--r--  blog/2022-03-03-financial-database.org  327
1 file changed, 173 insertions(+), 154 deletions(-)
diff --git a/blog/2022-03-03-financial-database.org b/blog/2022-03-03-financial-database.org
index e0f524d..c097c64 100644

#+title: Maintaining a Personal Financial Database
#+date: 2022-03-03

** Personal Financial Tracking
:PROPERTIES:
:CUSTOM_ID: personal-financial-tracking
:END:
For the last 6-ish years, I've tracked my finances in a spreadsheet. This is
common practice in the business world, but any good dev will cringe at the
thought of storing long-term data in a spreadsheet. A spreadsheet isn't built
for long-term storage, nor as a source from which to pull data and reports.

As I wanted to expand the functionality of my financial data (e.g., adding
more reports), I decided to migrate the data into a database.
To run reports, I would query the database and use a language like Python or
JavaScript to process the data, perform calculations, and visualize the
results.

** SQLite
:PROPERTIES:
:CUSTOM_ID: sqlite
:END:
When choosing the type of database I wanted to use for this project, I was
split between three options:

1. MySQL: The database I have the most experience with and have used for
   years.
2. PostgreSQL: A database I'm new to, but want to learn.
3. SQLite: A database that I've used for a couple of projects and with which I
   have moderate experience.

I ended up choosing SQLite since it can be maintained within a single
=.sqlite= file, which allows me more flexibility for storage and backup. I
keep this file in my cloud storage and pull it up whenever needed.

*** GUI Editing
:PROPERTIES:
:CUSTOM_ID: gui-editing
:END:
Since I didn't want to try to import 1,000--1,500 records into my new
database via the command line, I opted to use
[[https://sqlitebrowser.org/][DB Browser for SQLite (DB4S)]] as a GUI tool.
This application is excellent, and I don't see myself going back to the CLI
when working in this database.

DB4S allows you to copy a range of cells from a spreadsheet and paste it
straight into the SQL table. I used this process for all 36 accounts, 1,290
account statements, and 126 pay statements.
Overall, I'm guessing this took anywhere between 4--8 hours. In comparison,
it probably took me 2--3 days to create the initial spreadsheet.

#+caption: DB4S
[[https://img.cleberg.net/blog/20220303-maintaining-a-personal-financial-database/db4s.png]]

*** Schema
:PROPERTIES:
:CUSTOM_ID: schema
:END:
The schema for this database is actually extremely simple and involves only
three tables (for now):

1. Accounts
2. Statements
3. Payroll

*Accounts*

The Accounts table contains summary information about an account, such as a
car loan or a credit card. By viewing this table, you can find high-level
data, such as interest rate, credit line, or owner.
#+begin_src sql
CREATE TABLE "Accounts" (
    "AccountID" INTEGER NOT NULL UNIQUE,
    "AccountType" TEXT,
    "AccountName" TEXT,
    "InterestRate" NUMERIC,
    "CreditLine" NUMERIC,
    "State" TEXT,
    "Owner" TEXT,
    "Co-Owner" TEXT,
    PRIMARY KEY("AccountID" AUTOINCREMENT)
)
#+end_src

*Statements*

The Statements table uses the same unique identifier as the Accounts table,
meaning you can join the two tables to find the monthly statements for any
account listed in the Accounts table. Each statement has an account ID,
statement date, and total balance.

#+begin_src sql
CREATE TABLE "Statements" (
    "StatementID" INTEGER NOT NULL UNIQUE,
    "AccountID" INTEGER,
    "StatementDate" INTEGER,
    "Balance" NUMERIC,
    PRIMARY KEY("StatementID" AUTOINCREMENT),
    FOREIGN KEY("AccountID") REFERENCES "Accounts"("AccountID")
)
#+end_src
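Since Statements shares =AccountID= with Accounts, the monthly history for
any account is one join away. Here's a minimal sketch using Python's built-in
=sqlite3= module against an in-memory copy of a schema subset; the account
and balance values are made up:

#+begin_src python
import sqlite3

# In-memory stand-in for the real .sqlite file; sample data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "Accounts" (
    "AccountID" INTEGER NOT NULL UNIQUE,
    "AccountType" TEXT,
    "AccountName" TEXT,
    PRIMARY KEY("AccountID" AUTOINCREMENT)
);
CREATE TABLE "Statements" (
    "StatementID" INTEGER NOT NULL UNIQUE,
    "AccountID" INTEGER,
    "StatementDate" INTEGER,
    "Balance" NUMERIC,
    PRIMARY KEY("StatementID" AUTOINCREMENT),
    FOREIGN KEY("AccountID") REFERENCES "Accounts"("AccountID")
);
INSERT INTO Accounts (AccountType, AccountName) VALUES ('Loan', 'Car Loan');
INSERT INTO Statements (AccountID, StatementDate, Balance)
    VALUES (1, '2022-01-31', 1500.00), (1, '2022-02-28', 1400.00);
""")

# Join Statements back to Accounts to list each monthly balance by name.
rows = con.execute("""
    SELECT a.AccountName, s.StatementDate, s.Balance
    FROM Statements s
    JOIN Accounts a ON a.AccountID = s.AccountID
    ORDER BY s.StatementDate
""").fetchall()
for name, date, balance in rows:
    print(name, date, balance)
#+end_src

Swap =":memory:"= for the path to the real =.sqlite= file and the same query
works unchanged.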
*Payroll*

The Payroll table is a separate entity, unrelated to the Accounts or
Statements tables. This table contains all of the information you would find
on a pay statement from an employer. As you change employers or obtain new
perks/benefits, just add new columns to adapt to the new data.

#+begin_src sql
CREATE TABLE "Payroll" (
    "PaycheckID" INTEGER NOT NULL UNIQUE,
    "PayDate" TEXT,
    "Payee" TEXT,
    "Employer" TEXT,
    "JobTitle" TEXT,
    "IncomeRegular" NUMERIC,
    "IncomePTO" NUMERIC,
    "IncomeHoliday" NUMERIC,
    "IncomeBonus" NUMERIC,
    "IncomePTOPayout" NUMERIC,
    "IncomeReimbursements" NUMERIC,
    "FringeHSA" NUMERIC,
    "FringeStudentLoan" NUMERIC,
    "Fringe401k" NUMERIC,
    "PreTaxMedical" NUMERIC,
    "PreTaxDental" NUMERIC,
    "PreTaxVision" NUMERIC,
    "PreTaxLifeInsurance" NUMERIC,
    "PreTax401k" NUMERIC,
    "PreTaxParking" NUMERIC,
    "PreTaxStudentLoan" NUMERIC,
    "PreTaxOther" NUMERIC,
    "TaxFederal" NUMERIC,
    "TaxSocial" NUMERIC,
    "TaxMedicare" NUMERIC,
    "TaxState" NUMERIC,
    PRIMARY KEY("PaycheckID" AUTOINCREMENT)
)
#+end_src

*** Python Reporting
:PROPERTIES:
:CUSTOM_ID: python-reporting
:END:
Once I created the database tables and imported all of my data, the only step
left was to create a process to report on and visualize various aspects of
the data.

In order to explore the data and create the reports I'm interested in, I
utilized a two-part process involving Jupyter Notebooks and Python scripts.
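Both parts of that process start the same way: connect to the =.sqlite= file
and pull a table into a pandas DataFrame. Here's a minimal, self-contained
sketch of that setup; the schema subset and sample values are made up, and an
in-memory database stands in for the real file so the snippet runs anywhere:

#+begin_src python
import sqlite3

import pandas as pd

# ":memory:" stands in for the real file path (e.g., the cloud-synced copy).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "Accounts" (
    "AccountID" INTEGER NOT NULL UNIQUE,
    "AccountType" TEXT,
    "CreditLine" NUMERIC,
    PRIMARY KEY("AccountID" AUTOINCREMENT)
);
INSERT INTO Accounts (AccountType, CreditLine)
    VALUES ('Credit Card', 5000), ('Loan', 20000);
""")

# Load a whole table into a DataFrame -- the starting point for every
# notebook and report.
df = pd.read_sql_query("SELECT * FROM Accounts", con)
print(df)

# The same groupby/sum pattern the notebook snippet uses for its pie chart.
by_type = df.groupby("AccountType")["CreditLine"].sum()
#+end_src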
**** Step 1: Jupyter Notebooks
:PROPERTIES:
:CUSTOM_ID: step-1-jupyter-notebooks
:END:
When I need to explore data, try different things, and re-run my code
cell-by-cell, I use Jupyter Notebooks. For example, I explored the =Accounts=
table until I found the following useful information:

#+begin_src python
import sqlite3
import pandas as pd
import matplotlib

# ... (unchanged lines elided by the diff; they presumably connect to the
# database and load the Accounts table into the DataFrame `df`) ...

matplotlib.rcParams['legend.labelcolor'] = 'black'

# Display graph
df.groupby(['AccountType']).sum().plot.pie(
    title='Credit Line by Account Type',
    y='CreditLine',
    figsize=(5, 5),
    autopct='%1.1f%%',
)
#+end_src

**** Step 2: Python Scripts
:PROPERTIES:
:CUSTOM_ID: step-2-python-scripts
:END:
Once I had explored enough through the notebooks and had a list of the
reports I wanted, I moved on to create a Python project with the following
structure (the diff elides the middle of this tree; the =public/=, =src/=,
and =venv/= folders are reconstructed from the list below):

#+begin_example
finance/
├── notebooks/
│   ├── account_summary.ipynb
│   └── …
├── public/
├── src/
├── venv/
├── process.py
├── requirements.txt
└── README.md
#+end_example

This structure allows me to:

1. Compile all required Python packages into =requirements.txt= for easy
   installation if I move to a new machine.
2. Activate a virtual environment in =venv/= so I don't need to maintain a
   system-wide Python environment just for this project.
3. Keep my =notebooks/= folder to continuously explore the data as I see fit.
4. Maintain a local copy of the database in =src/= for easy access.
5. Export reports, images, HTML files, etc. to =public/=.

Now, onto the differences between the code in a Jupyter Notebook and the
actual Python files. To create the report from the Notebook snippet above, I
created the following function inside =process.py=:

#+begin_src python
# Create summary pie chart
def summary_data(accounts: pandas.DataFrame) -> None:
    accounts_01 = accounts[accounts["Owner"] == "Person01"]
    # ... (middle of the function elided by the diff; it presumably filters
    # per owner and builds each pie chart as `summary_chart`) ...
        summary_chart.figure.savefig(f"public/summary_chart_{x}.png", dpi=1200)
#+end_src

The result? A high-quality pie chart that is read directly by the
=public/index.html= template I use.

#+caption: Summary Pie Chart
[[https://img.cleberg.net/blog/20220303-maintaining-a-personal-financial-database/summary_chart.png]]

Other charts generated by this project include:

- Charts of account balances over time.
- A line chart of the effective tax rate (taxes divided by taxable income).
- Salary projections and error limits based on past income and inflation
  rates.
- A multi-line chart of gross income, taxable income, and net income.

The best thing about this project? I can improve it at any given time,
shaping it into whatever helps me the most at that time. I imagine that I
will introduce an asset-tracking table soon to track the depreciating value
of cars, houses, etc. Who knows what's next?
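As an illustration of the effective-tax-rate chart listed above, the whole
report reduces to one calculation per pay period. In this hedged sketch, the
column names follow the Payroll schema, the sample figures are invented, and
treating taxable income as gross income minus pre-tax deductions is my
assumption rather than the exact formula used by the project:

#+begin_src python
import pandas as pd

# Hypothetical pay statements shaped like a subset of the Payroll table.
payroll = pd.DataFrame({
    "PayDate": ["2022-01-15", "2022-01-31"],
    "IncomeRegular": [4000, 4000],
    "PreTax401k": [400, 400],
    "PreTaxMedical": [100, 100],
    "TaxFederal": [500, 520],
    "TaxSocial": [248, 248],
    "TaxMedicare": [58, 58],
    "TaxState": [150, 150],
})

# Total taxes withheld per paycheck.
taxes = payroll[["TaxFederal", "TaxSocial", "TaxMedicare", "TaxState"]].sum(axis=1)

# Assumption: taxable income = gross income minus pre-tax deductions.
taxable = payroll["IncomeRegular"] - payroll[["PreTax401k", "PreTaxMedical"]].sum(axis=1)

payroll["EffectiveTaxRate"] = taxes / taxable
print(payroll[["PayDate", "EffectiveTaxRate"]])
#+end_src

Plotting =EffectiveTaxRate= against =PayDate= with pandas' =plot.line= gives
the line chart described above.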