{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Omaha Incidents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Exploration\n",
    "\n",
    "Let\"s explore the data a little bit to see what kind of analysis and visualizations we want to implement."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up environment\n",
    "\n",
    "Start by installating and importing the necessary packages. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install packages, if needed\n",
    "# !pip3 install ipykernel\n",
    "# !pip3 install --upgrade pandas plotly dash \"nbformat>=4.2.0\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import packages\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import sqlite3\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as go\n",
    "import plotly.io as pio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Data\n",
    "\n",
    "To load the data, we need to connect to the SQLite3 database file and query it for the data we want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Connect to the database\n",
    "connection = sqlite3.connect(\"../raw_data/ingress.db\")\n",
    "cursor = connection.cursor()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If exists, delete extra header rows\n",
    "# delete_headers = \"DELETE FROM incidents WHERE rb = 'RB Number'\"\n",
    "# cursor.execute(delete_headers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Grab all data\n",
    "select_all = \"SELECT * FROM incidents\"\n",
    "df = pd.read_sql_query(select_all, connection)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Cleaning\n",
    "\n",
    "We will clean up the data before we use: inserting NaN, converting types, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace empty cells in [lat, lon] with NaN\n",
    "df = df.replace(r'^\\s*$', np.nan, regex=True)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert date col to datetime format\n",
    "df[\"date\"] = pd.to_datetime(df[\"date\"])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plotting\n",
    "\n",
    "Let's test a plot that will show us the top categories of incidents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# test plotting by sorting & plotting top 5 crime categories\n",
    "s = dff.value_counts(subset=[\"description\"])\n",
    "t = s.nlargest(5)\n",
    "t.head()\n",
    "t.plot(kind=\"bar\", title=\"Top 5 Incident Categories\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Filtering\n",
    "\n",
    "To reduce the workload in this rest of this notebook, I am filtering just for one description and a range of dates.\n",
    "\n",
    "If you are doing a lot of analysis, I recommend modifying the query at the beginning to only the pull the data you need instead of filtering after querying."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a smaller dataframe based on a selected date and description\n",
    "start_date = \"2023-01-01\"\n",
    "end_date = \"2023-12-31\"\n",
    "description = \"INJURY\"\n",
    "\n",
    "dff = df[(df['date'] > start_date) & (df['date'] < end_date)]\n",
    "dff = dff.reset_index()\n",
    "dff = dff[dff.description == description]\n",
    "\n",
    "dff_grouped = dff.groupby(by=\"date\").count()\n",
    "dff_grouped = dff_grouped.reset_index()\n",
    "\n",
    "print(dff.head())\n",
    "print(dff_grouped.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dff.size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dff.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Mapping\n",
    "\n",
    "Let's create a geo map of the crime data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = px.scatter_mapbox(\n",
    "    dff,\n",
    "    lat=\"lat\",\n",
    "    lon=\"lon\",\n",
    "    color=\"description\",\n",
    "    hover_name=\"description\",\n",
    "    hover_data=[\"date\", \"time\"],\n",
    "    title=\"Incident Count by Coordinates\",\n",
    "    center={\"lat\": 41.257160, \"lon\": -95.995102},\n",
    "    zoom=10\n",
    ")\n",
    "\n",
    "# fig.update_layout(showlegend=False)\n",
    "fig.update_layout(mapbox_style=\"open-street-map\")\n",
    "fig.update_layout(margin={\"r\": 0, \"t\": 0, \"l\": 0, \"b\": 0})\n",
    "fig.update_layout(mapbox_bounds={\"west\": -180, \"east\": -50, \"south\": 20, \"north\": 90})\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optionally, save the figure to an HTML file\n",
    "# pio.write_html(fig, file=\"test.html\", auto_open=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Wrapping Up\n",
    "\n",
    "To finish, remember to close your database connections and save any data you need."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# clean up and close out the database\n",
    "connection.commit()\n",
    "connection.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}