{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Omaha Incidents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Exploration\n", "\n", "Let\"s explore the data a little bit to see what kind of analysis and visualizations we want to implement." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up environment\n", "\n", "Start by installating and importing the necessary packages. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install packages, if needed\n", "# !pip3 install ipykernel\n", "# !pip3 install --upgrade pandas plotly dash \"nbformat>=4.2.0\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import packages\n", "import pandas as pd\n", "import numpy as np\n", "import sqlite3\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "import plotly.io as pio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Data\n", "\n", "To load the data, we need to connect to the SQLite3 database file and query it for the data we want." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Connect to the database\n", "connection = sqlite3.connect(\"../raw_data/ingress.db\")\n", "cursor = connection.cursor()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If exists, delete extra header rows\n", "# delete_headers = \"DELETE FROM incidents WHERE rb = 'RB Number'\"\n", "# cursor.execute(delete_headers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Grab all data\n", "select_all = \"SELECT * FROM incidents\"\n", "df = pd.read_sql_query(select_all, connection)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Cleaning\n", "\n", "We will clean up the data before we use: inserting NaN, converting types, etc." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Replace empty cells in [lat, lon] with NaN\n", "df = df.replace(r'^\\s*$', np.nan, regex=True)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convert date col to datetime format\n", "df[\"date\"] = pd.to_datetime(df[\"date\"])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting\n", "\n", "Let's test a plot that will show us the top categories of incidents." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# test plotting by sorting & plotting top 5 crime categories\n", "s = dff.value_counts(subset=[\"description\"])\n", "t = s.nlargest(5)\n", "t.head()\n", "t.plot(kind=\"bar\", title=\"Top 5 Incident Categories\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Filtering\n", "\n", "To reduce the workload in this rest of this notebook, I am filtering just for one description and a range of dates.\n", "\n", "If you are doing a lot of analysis, I recommend modifying the query at the beginning to only the pull the data you need instead of filtering after querying." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a smaller dataframe based on a selected date and description\n", "start_date = \"2023-01-01\"\n", "end_date = \"2023-12-31\"\n", "description = \"INJURY\"\n", "\n", "dff = df[(df['date'] > start_date) & (df['date'] < end_date)]\n", "dff = dff.reset_index()\n", "dff = dff[dff.description == description]\n", "\n", "dff_grouped = dff.groupby(by=\"date\").count()\n", "dff_grouped = dff_grouped.reset_index()\n", "\n", "print(dff.head())\n", "print(dff_grouped.head())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dff.size" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dff.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mapping\n", "\n", "Let's create a geo map of the crime data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = px.scatter_mapbox(\n", " dff,\n", " lat=\"lat\",\n", " lon=\"lon\",\n", " color=\"description\",\n", " hover_name=\"description\",\n", " hover_data=[\"date\", \"time\"],\n", " title=\"Incident Count by Coordinates\",\n", " center={\"lat\": 41.257160, \"lon\": -95.995102},\n", " zoom=10\n", ")\n", "\n", "# fig.update_layout(showlegend=False)\n", "fig.update_layout(mapbox_style=\"open-street-map\")\n", "fig.update_layout(margin={\"r\": 0, \"t\": 0, \"l\": 0, \"b\": 0})\n", "fig.update_layout(mapbox_bounds={\"west\": -180, \"east\": -50, \"south\": 20, \"north\": 90})\n", "fig.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optionally, save the figure to an HTML file\n", "# pio.write_html(fig, file=\"test.html\", auto_open=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrapping Up\n", "\n", "To finish, remember to close your database connections and save any data you need." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# clean up and close out the database\n", "connection.commit()\n", "connection.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 4 }