Plot our sample data on a map with GeoJSON.
Looking at map.py from new-coder/dataviz/tutorial_source/map.py, you’ll see in the lines after the preamble that we’re importing geojson, a third-party package used to build GeoJSON (a format based on JSON), as well as our own module, parse, as p.
Other ways you could have done the import statements:
    from geojson import dumps
    import geojson as g

    import parse
    from parse import parse, MY_FILE
    import parse as iLoveParsingSoMuch
Of course, we’re lazy programmers, so we’re not going to import parse as iLoveParsingSoMuch, because each time we want to refer to our parse() function in the parse module, we’d have to type out iLoveParsingSoMuch.parse(iLoveParsingSoMuch.MY_FILE, ","). You can probably see why I elected p.
We also don’t have to import the whole geojson library. Ideally, we want to run lean code, so import only the specific module you need, or even just the objects (classes, functions, variables, etc.) defined within that module.
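To see that these import styles all reach the same function, here is a minimal sketch. It uses the standard-library json module as a stand-in for geojson (an assumption made only so the example runs without installing anything); the names sample, via_module, via_alias and via_name are just for illustration.

```python
# Three ways to reach the same function, shown with the standard-library
# json module as a stand-in for geojson (assumption: used here only so
# the example runs without extra installs).
import json                # whole module, full name
import json as j           # whole module, short alias
from json import dumps     # just the one function we need

sample = {"type": "FeatureCollection", "features": []}

# Each call goes through a different name but runs the same function.
via_module = json.dumps(sample)
via_alias = j.dumps(sample)
via_name = dumps(sample)

print(via_module == via_alias == via_name)  # True
print(j.dumps is dumps)                     # True: same function object
```

Whichever style you pick, the trade-off is between typing less and making it obvious where a name came from.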
If you run map.py outside of new-coder/dataviz/tutorial_source without adjusting the import of parse, you may see an ImportError. When making a package yourself for distribution, there are ways to avoid this issue, and you can read more in the Python docs.

Now on to the good stuff. The function create_map(data_file) takes our parsed data and creates a GeoJSON file.
Again with our initial comment setup:
    def create_map(data_file):
        # Define type of GeoJSON we're creating

        # Define empty list to collect each point to graph

        # Iterate over our data to create GeoJSON document.
        # We're using enumerate() so we get the line, as well
        # as the index, which is the line number.

            # Skip any zero coordinates as this will throw off
            # our map.

            # Setup a new dictionary for each iteration.

            # Assign line items to appropriate GeoJSON fields.

            # Add data dictionary to our item_list

        # For each point in our item_list, we add the point to our
        # dictionary. setdefault creates a key called 'features' that
        # has a value type of an empty list. With each iteration, we
        # are appending our point to that list.

        # Now that all data is parsed in GeoJSON write to a file so we
        # can upload it to gist.github.com
The first thing we need to do is define the GeoJSON map type. We’re defining the type of GeoJSON as a “FeatureCollection”, since it is a collection of features (a feature’s geometry can be a point, multi-point, line string, etc.; see the GeoJSON specification for more information):
    # Define type of GeoJSON we're creating
    geo_map = {"type": "FeatureCollection"}
Next, we just want to define an empty list to collect our coordinates/points when iterating over our CSV file:
    # Define empty list to collect each point to graph
    item_list = []
Next, we iterate through the parsed data (data_file) that we fed to create_map(data_file), building a temporary dictionary, data, for each line so we can add it to our empty list, item_list, defined above:
    # Iterate over our data to create GeoJSON document.
    # We're using enumerate() so we get the line, as well
    # as the index, which is the line number.
    for index, line in enumerate(data_file):

        # Skip any zero coordinates as this will throw off
        # our map.
        if line['X'] == "0" or line['Y'] == "0":
            continue

        # Setup a new dictionary for each iteration.
        data = {}

        # Assign line items to appropriate GeoJSON fields.
        data['type'] = 'Feature'
        data['id'] = index
        data['properties'] = {'title': line['Category'],
                              'description': line['Descript'],
                              'date': line['Date']}
        data['geometry'] = {'type': 'Point',
                            'coordinates': (line['X'], line['Y'])}

        # Add data dictionary to our item_list
        item_list.append(data)
So for each line in our data_file, we take certain values of that line (X, Y, Category, etc.) and assign them to the keys that GeoJSON expects (e.g. 'type', 'id', 'properties', etc.). If, for whatever reason, the longitude (line['X']) or latitude (line['Y']) is 0, we’ll skip over it. The assumption is that if the longitude or latitude is 0, then we can’t plot it (or it will be plotted at 0,0 and throw off our map). This is a simple form of skipping over errors in the data.
When we are done with one line, we add it to item_list, then continue with the next line item.
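One thing worth knowing: the loop above passes X and Y through as strings, exactly as they came out of the CSV, while the GeoJSON specification describes coordinates as numbers. If a stricter tool ever complains, a variation like the sketch below converts them to floats. The helper name make_feature and the second sample row are hypothetical, not part of the tutorial’s map.py; the first row reuses values from our sample data.

```python
def make_feature(index, line):
    """Return a GeoJSON Feature dict, or None for rows we should skip.

    Hypothetical helper, not part of the tutorial's map.py.
    """
    try:
        lon = float(line['X'])
        lat = float(line['Y'])
    except (KeyError, ValueError):
        # Missing or non-numeric coordinates: treat as a bad row.
        return None

    # Comparing numbers also catches "0.0", which the string check misses.
    if lon == 0 or lat == 0:
        return None

    return {
        'type': 'Feature',
        'id': index,
        'properties': {'title': line['Category'],
                       'description': line['Descript'],
                       'date': line['Date']},
        'geometry': {'type': 'Point', 'coordinates': (lon, lat)},
    }


rows = [
    {'X': '-122.424612993055', 'Y': '37.8014488257836',
     'Category': 'FRAUD', 'Descript': 'FORGERY, CREDIT CARD',
     'Date': '02/18/2003'},
    # Made-up row to show the skip behaviour.
    {'X': '0.0', 'Y': '0.0', 'Category': 'MADE UP',
     'Descript': 'NO LOCATION', 'Date': '01/01/2003'},
]

features = [f for f in (make_feature(i, row) for i, row in enumerate(rows))
            if f is not None]
print(len(features))  # 1
```

The tutorial code works as-is for rendering on GitHub, so treat this as an optional tightening rather than a required fix.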
Notice that we are using enumerate(data_file). The enumerate built-in function allows us to go over each item in the data file, line, and keep count of the line number with index. The count starts at 0, so with each iteration of our for-loop, the index will increase by 1 as we go on to the next line.
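Here is a small sketch of how enumerate and continue interact. The rows list is a trimmed-down stand-in for our parsed data (the middle row is made up), and kept_ids is a name used only for this illustration.

```python
rows = [
    {'Category': 'FRAUD', 'X': '-122.424612993055', 'Y': '37.8014488257836'},
    {'Category': 'MADE UP', 'X': '0', 'Y': '0'},  # made-up row with no location
    {'Category': 'WARRANTS', 'X': '-122.420120319211', 'Y': '37.7877570602182'},
]

kept_ids = []
for index, line in enumerate(rows):
    # Same check as in create_map(): skip rows with a zero coordinate.
    if line['X'] == "0" or line['Y'] == "0":
        continue
    kept_ids.append(index)

# The index keeps counting even when a row is skipped.
print(kept_ids)  # [0, 2]
```

So the 'id' of each feature is its position in the original data, not its position among the points that made it onto the map.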
Next, we actually build onto our geo_map dictionary by adding our points from item_list:
    # For each point in our item_list, we add the point to our
    # dictionary. setdefault creates a key called 'features' that
    # has a value type of an empty list. With each iteration, we
    # are appending our point to that list.
    for point in item_list:
        geo_map.setdefault('features', []).append(point)
As it says in the comments, for each point in item_list, we append the point to our geo_map dictionary. Here, we’re using the setdefault method on our dictionary. The first time through, it creates the key 'features' with an empty list as its value and returns that list; on every later call the key already exists, so it just returns the existing list. And so with each iteration over item_list, we append the point to that list. You can read more about setdefault in the Python docs.
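A quick sketch of what setdefault is doing on each pass, using placeholder strings instead of real points (the names first, second and direct are just for illustration):

```python
geo_map = {"type": "FeatureCollection"}

# First call: 'features' is missing, so setdefault stores the empty
# list under that key and hands the list back to us.
first = geo_map.setdefault('features', [])
first.append('point A')

# Second call: the key exists, so the default is ignored and we get
# the very same list back.
second = geo_map.setdefault('features', [])
second.append('point B')

print(first is second)      # True
print(geo_map['features'])  # ['point A', 'point B']

# Assigning a finished list in one step builds the same dictionary.
direct = {"type": "FeatureCollection"}
direct['features'] = ['point A', 'point B']
print(direct == geo_map)    # True
```

Since item_list is already complete by the time we reach this step, the direct assignment would work just as well here; setdefault earns its keep when you don’t know ahead of time whether the key exists.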
So we’ve built up our geo_map dictionary to contain every point in our data file. Now let’s save it as a GeoJSON file:
    # Now that all data is parsed in GeoJSON write to a file so we
    # can upload it to gist.github.com
    with open('file_sf.geojson', 'w') as f:
        f.write(geojson.dumps(geo_map))
This is a new construct: the with statement. It allows us to not have to worry about closing the file; it will be done automatically for us when the block ends. You can read more about the with statement in the Python docs.
So with open('file_sf.geojson', 'w') as f assigns the opened file to f. It will either open the file file_sf.geojson or create it (note: it will be in your current directory unless you specify otherwise with an absolute file path, like /Users/lynnroot/NotMyDevFolder/file_sf.geojson), and give it write capabilities (versus read-only). Keep in mind that opening an existing file with 'w' replaces its contents.
Then we use the dumps function from the geojson library that we imported. This serializes the dictionary, geo_map, into a GeoJSON-formatted string, which f.write() then saves to the file.
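To watch the with statement close the file for us, here is a minimal sketch. It uses the standard-library json module in place of geojson (assuming the two behave the same for a plain dictionary like ours) and writes to a temporary directory so it doesn’t touch your project files; out_path and round_trip are names made up for this example.

```python
import json
import os
import tempfile

geo_map = {"type": "FeatureCollection", "features": []}

# Write somewhere disposable instead of the current directory.
out_path = os.path.join(tempfile.mkdtemp(), 'file_sf.geojson')

with open(out_path, 'w') as f:
    # dumps() gives us a string; write() puts it in the file.
    f.write(json.dumps(geo_map))

# We never called f.close(), but the file is closed anyway.
print(f.closed)  # True

# Reading the file back gives us the same dictionary.
with open(out_path) as f:
    round_trip = json.load(f)
print(round_trip == geo_map)  # True
```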
Let’s see the create_map() function all together:
    def create_map(data_file):
        """Creates a GeoJSON file.

        Writes a GeoJSON file, file_sf.geojson, that can be rendered
        in a GitHub Gist at gist.github.com. Just copy the output file
        and paste into a new Gist, then create either a public or
        private gist. GitHub will automatically render the GeoJSON
        file as a map.
        """

        # Define type of GeoJSON we're creating
        geo_map = {"type": "FeatureCollection"}

        # Define empty list to collect each point to graph
        item_list = []

        # Iterate over our data to create GeoJSON document.
        # We're using enumerate() so we get the line, as well
        # as the index, which is the line number.
        for index, line in enumerate(data_file):

            # Skip any zero coordinates as this will throw off
            # our map.
            if line['X'] == "0" or line['Y'] == "0":
                continue

            # Setup a new dictionary for each iteration.
            data = {}

            # Assign line items to appropriate GeoJSON fields.
            data['type'] = 'Feature'
            data['id'] = index
            data['properties'] = {'title': line['Category'],
                                  'description': line['Descript'],
                                  'date': line['Date']}
            data['geometry'] = {'type': 'Point',
                                'coordinates': (line['X'], line['Y'])}

            # Add data dictionary to our item_list
            item_list.append(data)

        # For each point in our item_list, we add the point to our
        # dictionary. setdefault creates a key called 'features' that
        # has a value type of an empty list. With each iteration, we
        # are appending our point to that list.
        for point in item_list:
            geo_map.setdefault('features', []).append(point)

        # Now that all data is parsed in GeoJSON write to a file so we
        # can upload it to gist.github.com
        with open('file_sf.geojson', 'w') as f:
            f.write(geojson.dumps(geo_map))
That’s it! Now we just have some boilerplate code for the main() function:
    def main():
        data = p.parse(p.MY_FILE, ",")

        return create_map(data)


    if __name__ == "__main__":
        main()
Here we first parse our data, then pass the parsed data to create_map(), which writes the GeoJSON file.
Next, save this file as map.py in the MySourceFiles directory that we created earlier, and make sure you are in that directory in your terminal by using cd and pwd to navigate as we did before. Also, make sure your virtualenv is active. Now, in your terminal, run:
    (DataVizProj) $ python map.py
    (DataVizProj) $ ls
You should now see the file_sf.geojson file! You can open it up in your text editor; a snippet should look like this:
{"type": "FeatureCollection", "features": [{"geometry": {"type": "Point", "coordinates": ["-122.424612993055", "37.8014488257836"]}, "type": "Feature", "id": 0, "properties": {"date": "02/18/2003", "description": "FORGERY, CREDIT CARD", "title": "FRAUD"}}, {"geometry": {"type": "Point", "coordinates": ["-122.420120319211", "37.7877570602182"]}, "type": "Feature", "id": 1, "properties": {"date": "04/17/2003", "description": "WARRANT ARREST", "title": "WARRANTS"}}, {"geometry": {"type": "Point", "coordinates": ["-122.42025048261", "37.7800745746105"]},
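If you’d like a quick sanity check before pasting the file anywhere, a sketch like the one below loads the text back in and reports what it found. The helper name summarize is hypothetical, and sample_text stands in for the file contents: it holds the first two features from the snippet above, with the brackets closed so that it is valid JSON.

```python
import json


def summarize(geojson_text):
    """Hypothetical helper: return (type, feature count, sorted titles)."""
    doc = json.loads(geojson_text)
    features = doc.get('features', [])
    titles = sorted({feat['properties']['title'] for feat in features})
    return doc.get('type'), len(features), titles


# Stand-in for the contents of file_sf.geojson.
sample_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"geometry": {"type": "Point",
                      "coordinates": ["-122.424612993055", "37.8014488257836"]},
         "type": "Feature", "id": 0,
         "properties": {"date": "02/18/2003",
                        "description": "FORGERY, CREDIT CARD",
                        "title": "FRAUD"}},
        {"geometry": {"type": "Point",
                      "coordinates": ["-122.420120319211", "37.7877570602182"]},
         "type": "Feature", "id": 1,
         "properties": {"date": "04/17/2003",
                        "description": "WARRANT ARREST",
                        "title": "WARRANTS"}},
    ],
})

print(summarize(sample_text))
# ('FeatureCollection', 2, ['FRAUD', 'WARRANTS'])
```

For the real file you would read its text first, for example with open('file_sf.geojson') as f followed by summarize(f.read()).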
To see it up on GitHub, navigate to gist.github.com, then copy the text in the newly created GeoJSON file, and paste it into the Gist, like below:
Make sure to name your gist file with the .geojson ending:
Then select either “Create Private Gist” or “Create Public Gist”, your choice:
And voila!
If you click on a point, you should see more information about the item. This is pulled from the data['properties'] value that we created: