Part 3: Map

Plot our sample data on a map with GeoJSON.

Module Setup

Looking at map.py from new-coder/dataviz/tutorial_source/map.py, you’ll see in lines after the preamble that we’re importing geojson which is a third-party package used to build GeoJSON, a derivative of JSON files, as well as our own module, parse as p.

Other ways you could have done the import statements:

1
2
3
4
5
6
from geojson import dumps
import geojson as g

import parse
from parse import parse, MY_FILE
import parse as iLoveParsingSoMuch

Of course, we’re lazy programmers, so we’re not going to import parse as iLoveParsingSoMuch because each time we want to refer to our parse() function in the parse module, we’d have to type out iLoveParsingSoMuch.parse(iLoveParsingSoMuch.MY_FILE, ",") — you can probably see why I elected p.

We also don’t have to import the whole geojson library. Ideally, we want to run lean code, so only import the specific module that you need, or even objects (classes, functions, variables, etc) defined from within that module.

For the Curious

  • A package is a collection of modules (or packages). A module is one python file, so a package is a collection of python files within the same directory.
  • A distributed collection of packages is can be referred to as a library.
  • Python has a standard library already built-in (built-in meaning that you don’t have to download extra packages, it’s default within the language and just have to import what you need), but that standard library contains many packages and modules.
  • A bit of a warning: if you try to run map.py outside of new-coder/dataviz/tutorial_source without adjusting the import parse, you may see an ImportError. When making a package yourself for distribution, there are ways to void this issue, and you can read more in the Python docs.

Create a Map!

Now on to the good stuff. The function create_map(data_file) parses through our data file to create a GeoJSON file.

Again with our initial comment setup:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
def create_map(data_file):
    # Define type of GeoJSON we're creating

    # Define empty list to collect each point to graph

    # Iterate over our data to create GeoJSON document.
    # We're using enumerate() so we get the line, as well
    # the index, which is the line number.

        # Skip any zero coordinates as this will throw off
        # our map.

        # Setup a new dictionary for each iteration.

        # Assign line items to appropriate GeoJSON fields.

        # Add data dictionary to our item_list

    # For each point in our item_list, we add the point to our
    # dictionary.  setdefault creates a key called 'features' that
    # has a value type of an empty list.  With each iteration, we
    # are appending our point to that list.

    # Now that all data is parsed in GeoJSON write to a file so we
    # can upload it to gist.github.com

The first that we need to do is just define the GeoJSON map type. We’re defining the type of GeoJSON as a “FeatureCollection”, since it is a collection of features (features can be points, multi-points, linestring, etc. More information here):

1
2
    # Define type of GeoJSON we're creating
    geo_map = {"type": "FeatureCollection"}

Next, we just want to define an empty list to collect our coordinates/points when iterating over our CSV file:

1
2
    # Define empty list to collect each point to graph
    item_list = []

Next, we iterate through the parsed data (data_file) that we fed the create_map(data_file) and make sure we build a temporary dictionary of data, data so we can add to our empty list, item_list, defined above:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
    # Iterate over our data to create GeoJSOn document.
    # We're using enumerate() so we get the line, as well
    # the index, which is the line number.
    for index, line in enumerate(data_file):

        # Skip any zero coordinates as this will throw off
        # our map.
        if line['X'] == "0" or line['Y'] == "0":
            continue

        # Setup a new dictionary for each iteration.
        data = {}

        # Assigne line items to appropriate GeoJSON fields.
        data['type'] = 'Feature'
        data['id'] = index
        data['properties'] = {'title': line['Category'],
                              'description': line['Descript'],
                              'date': line['Date']}
        data['geometry'] = {'type': 'Point',
                            'coordinates': (line['X'], line['Y'])}

        # Add data dictionary to our item_list
        item_list.append(data)

So for each line in our data_file, we take certain values of that line, X, Y, Category, etc, and assign it to a key that GeoJSON requires (e.g. 'type', 'id', 'properties', etc). If, for whatever instance, longitude (line['X']) or latitude (line['Y']) is 0, we’ll skip over it. The assumption is if the longitude or latitude is 0, then we can’t plot it (or it will be plotted as 0,0 and screw with our map). This is a simple form of skipping over errors in the data.

When we are done with one line, we add it to item_list, then continue with the next line item.

Notice that we are using enumerate(data_file). The enumerate built-in function allows us to go over each item in the datafile, line, and keep count of the line number with index. So with each iteration of our for-loop, the index will increase by 1 as we go onto the next line.

Next, we actually build onto our geo_map dictionary by adding our points from item_list:

1
2
3
4
5
6
    # For each point in our item_list, we add the point to our
    # dictionary.  setdefault creates a key called 'features' that
    # has a value type of an empty list.  With each iteration, we
    # are appending our point to that list.
    for point in item_list:
        geo_map.setdefault('features', []).append(point)

As it says in the comments, for each point in item_list, we append the point to our geo_map dictionary. Here, we’re using the setdefault method on our dictionary. This sets a key to features and its value to an empty list. And so with each iteration over item_list, we append the point to the list. You can read more information about setdefault here.

So we’ve built up our geo_map dictionary to contain every point in our datafile. Now let’s save it as a geojson file:

1
2
3
4
    # Now that all data is parsed in GeoJSON write to a file so we
    # can upload it to gist.github.com
    with open('file_sf.geojson', 'w') as f:
        f.write(geojson.dumps(geo_map))

This is a new loop construct: with — it allows us to not have to worry about closing a file; it will be done automatically for us. More about the with built-in can be read here.

So with open('file_sf.geojson', 'w') as f assigns the opened file as f; it also will either open the file file_sf.geojson or create it (note: it will be in your current directory unless you specify otherwise, like /Users/lynnroot/NotMyDevFolder/file_sf.geojson with absolute file paths), and give it write capabilities (versus read-only).

Then we use the dumps function from the geojson library that we imported. This basically prints the dictionary, geo_map into a GeoJSON-recognizable file.

Let’s see the create_map() function all together:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
def create_map(data_file):
    """Creates a GeoJSON file.

    Returns a GeoJSON file that can be rendered in a GitHub
    Gist at gist.github.com.  Just copy the output file and
    paste into a new Gist, then create either a public or
    private gist.  GitHub will automatically render the GeoJSON
    file as a map.
    """

    # Define type of GeoJSON we're creating
    geo_map = {"type": "FeatureCollection"}

    # Define empty list to collect each point to graph
    item_list = []

    # Iterate over our data to create GeoJSOn document.
    # We're using enumerate() so we get the line, as well
    # the index, which is the line number.
    for index, line in enumerate(data_file):

        # Skip any zero coordinates as this will throw off
        # our map.
        if line['X'] == "0" or line['Y'] == "0":
            continue

        # Setup a new dictionary for each iteration.
        data = {}

        # Assigne line items to appropriate GeoJSON fields.
        data['type'] = 'Feature'
        data['id'] = index
        data['properties'] = {'title': line['Category'],
                              'description': line['Descript'],
                              'date': line['Date']}
        data['geometry'] = {'type': 'Point',
                            'coordinates': (line['X'], line['Y'])}

        # Add data dictionary to our item_list
        item_list.append(data)

    # For each point in our item_list, we add the point to our
    # dictionary.  setdefault creates a key called 'features' that
    # has a value type of an empty list.  With each iteration, we
    # are appending our point to that list.
    for point in item_list:
        geo_map.setdefault('features', []).append(point)

    # Now that all data is parsed in GeoJSON write to a file so we
    # can upload it to gist.github.com
    with open('file_sf.geojson', 'w') as f:
        f.write(geojson.dumps(geo_map))

That’s it! Now we just have some boiler code for that main() function:

1
2
3
4
5
6
7
def main():
    data = p.parse(p.MY_FILE, ",")

    return create_map(data)

if __name__ == "__main__":
    main()

Here we just first parse our data, then return the GeoJSON document using that parsed data.

Next, save this file as map.py into the MySourceFiles directory that we created earlier, and make sure you are in that directory in your terminal by using cd and pwd to navigate as we did before. Also — make sure your virtualenv is active. Now, in your terminal, run:

1
2
(DataVizProj) $ python map.py
(DataVizProj) $ ls

You should see file_sf.geojson file now! You can open it up in your text editor; a snipit should look like this:

{"type": "FeatureCollection", "features": [{"geometry": {"type": "Point", "coordinates": ["-122.424612993055", "37.8014488257836"]}, "type": "Feature", "id": 0, "properties": {"date": "02/18/2003", "description": "FORGERY, CREDIT CARD", "title": "FRAUD"}}, {"geometry": {"type": "Point", "coordinates": ["-122.420120319211", "37.7877570602182"]}, "type": "Feature", "id": 1, "properties": {"date": "04/17/2003", "description": "WARRANT ARREST", "title": "WARRANTS"}}, {"geometry": {"type": "Point", "coordinates": ["-122.42025048261", "37.7800745746105"]},

To see it up on Github, navigate to gist.github.com, then copy the text in the newly-created geojson file, and paste into the Gist, like below:

Make sure to name your gist file with the .geojson ending:

Then select either “Create Private Gist” or “Create Public Gist”, your choice:

And voila!

If you click on a point, you should see more information about the item. This is pulled from the data['properties'] value that we created: