Introduction to the Data Visualization Tutorial

Data visualization is quite fun. Perhaps when you think of data visualization, you think of ugly Microsoft Excel spreadsheets with half-a$$ed graphs.

This tutorial is meant to push you out of the Excel mindset just a little bit, and introduce you to the popular Python library, matplotlib.

The Project

The project we will create takes the sample data from the repository that you will download (a.k.a. clone) in Part 0: Setup, parse the sample data from columns and rows to a list of dictionaries, then render that data in two different graphs and in GitHub as a map.

The sample data that is included is a snapshot of public crime filings from the San Francisco police. Once you’ve gone through this tutorial, feel free to find other data that interests you, and rework our visualization functions.

Goals

Understand how to:

  • run a Python file from the command line
  • import a Python file
  • take a raw file and parse its data with Python’s data structures
  • make a simple graph
  • produce a GeoJSON file for mapping

What else you will be exposed to:

  • Importing Python’s standard library as well as your own module
  • Installing and importing third party packages
  • Licensing & copyrights when using third-party packages
  • File Input/Output
  • Counter data structure from the collections module
  • Global variables, docstrings, list comprehensions
  • Python’s interactive shell in the terminal
  • Iterators versus Generators

Intro to NumPy and matplotlib

NumPy (pronounced num-pie) is a popular scientific library for Python that gives a developer, academic, or scientist tools to work with high-level mathematical functions as well as multi-dimensional arrays and matrices.

We won’t be using much of NumPy, but it is required that we install this library before we can install and use matplotlib.

matplotlib is another popular scientific library that gives the developer tools to produce 2D figures. No longer do you need your TI-89 calculator where you must punch in long lines of formulas, waiting precious seconds for it to render a graph that may be too zoomed in to realize you are missing an important axis point. Packed with detailed examples, you are able to make publication/presentation-quality graphs from the comfort of your keyboard.

Intro to GeoJSON

GeoJSON is a derivative of JSON, and very similar to TopoJSON. It’s a data format for simple geological feature, including coordinate points.

We’ll be using a third-party module to help us in creating GeoJSON files: geojson.

GitHub has an awesome feature that allows folks to paste GeoJSON files into Gists, and it automatically renders as a map.