7. Reading and Displaying Files¶
To read a file, we must first open the file. This is done using Python’s built-in open() function. The open() function returns a file object, which has a read() method for reading files and storing their contents as variables . Once a file is read and stored as a variable, the original file can be closed. We can then loop through the file variable and do stuff with its contents.
file = open("/Users/student/Desktop/Equakes2.csv", "r")
text = file.readlines()
file.close()
for line in text:
print (line)
Data
You can also download earthquake data from the USGS website.
Points to Note anout the Above Code
The open method has the following modes:
‘r’ – Python read file. Read mode is used when the file is only being read‘w’ – Python write file. Write mode is used to edit and write new information to the file (any existing files with the same name will be erased when this mode is activated)‘a’ – Python append file. Append mode is used to add new data to the end of the file; that is new information is automatically amended to the end‘r+’ – Special read and write mode is used to handle both actions when working with a file
Always close the file after reading or writing to it. In most cases, upon termination of an application or script, a file will be closed eventually. However, there is no guarantee when exactly that will happen. This can lead to unwanted behavior including resource leaks and unwanted behavior.
7.1. Writing File Paths¶
In Python, it is better not to write your file paths with backslashes as shown below:
file = open("C:\Users\student\Desktop\earthquakes.csv", "r")
Writing paths with backslashes is not advisable because a Python string can contain special characters that also incorporate backslashes for other reasons, e.g., “t” (tabs), “n” (newlines), “r” ( carriage returns), etc.
These special characters make it very hard to create a file path that uses single backslashes because file paths in Windows also use single backslashes. There are several workarounds:
1. Use Forward Slashes
file = open("C:/Users/student/Desktop/earthquakes.csv", "r")
2. Use Raw Strings
These are regular strings, but includes an r before the script begins. The r tells the Python Interpreter that the string does not contain any special characters or escape characters.
file = (r"C:\Users\student\Desktop\earthquakes.csv")
3. Use Double Backslahes
In this case, the backslashes are escaped using a second backslash.
"C:\\Users\\student\\Desktop\\earthquakes.csv"
4. Use os.path.join
The os module provides access to operating system functions regardless of the platform you are using, i.e, Windows, Mac OS, Linux, etc.
os.path.join() takes any number of path strings and returns a single path using the platform-specific path separator.
To create a string specifically for the Windows environnment, we can write the os.path.join() string as shown below.
import os
os.path.join("c:/", "Windows")
>>> 'c:/Windows'
To create a string for a platform regardless of whether it is Windows, Mac OS, Linux, etc., we can write the os.path.join() string as shown below. In this situation, the operating system will take care of the separator.
import os
os.path.join('c:', os.sep, 'mydata')
>>> 'c:\\mydata'
Note: os.sep will supply the separator
It is also possible to pass in the current working directory to os.path.join(), as shown in the script below.
import os
dem_path = os.path.join(os.getcwd(), 'Shasta-30m-DEM.tif')
>>> dem_path
>>> dem_path = '/Users/hsemple/Documents/python_gis_tutorials/docs/Shasta-30m-DEM.tif
7.2. Opening a file using the “With” Statement¶
A second way second way to open a file is to use the “with” statement. The with statement automatically takes care of closing the file once it leaves the with block, even in cases of error. I highly recommend that you use the with statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you.
with open("/Users/student/Desktop/Equakes2.csv", 'r') as file:
text = file.readlines()
for line in text:
print (line)
7.3. Opening a csv File using Python’s Standard Library¶
Let’s open a csv data file using some basic Python statements. First, we will let Python read the file and create a file object. Next, we will split the file contents into inidividual columns, and then store the column data in variables. This is a long-winded way of displaying csv files, but the objective with this code sample is to illustrate some details that is involved in opening these files. For regular work, we use libraries such as Pandas which it much easier to display the csv files. Pandas will be discussed shortly.
infile = open("/Users/student/Desktop/earthquakes.csv", 'r')
lines = infile.readlines()
infile.close()
del lines[0] # Remove the first line
#Create empty lists
xvar = []
yvar = []
count = 0
for line in lines:
elements = line.split(",") # splits the line
mag = float(elements[2]) # Get the data in the third column
dep = float(elements[3]) # Get the data in the fourth column
# Add magnitude and depth data to the empty lists
xvar.append(dep)
yvar.append(mag)
#Format the two lists as columns separated with a space
magnitude = "Magnitude"
depth = "Depth"
print ("%-15s %s" %(magnitude, depth))
print ("")
for c1, c2 in zip(xvar, yvar):
print ("%-15s %s" % (c1, c2))
Things to Look up in the above code
Performing Calculations and Making a Graph using Python’s Standard Library
import math
import matplotlib.pyplot as plt
infile = open("C:/Users/student/Desktop/earthquakes.csv, 'r')
lines = infile.readlines()
infile.close()
del lines[0] # Remove the header line
#Create empty lists
xvar = []
yvar = []
std_dev = []
count = 0
sum = 0
for line in lines:
elements = line.split(",") #splits the lines
mag = float(elements[4]) # Get the data in the fifth column
count = count + 1
# Add count and magnitude data to the empty x,y lists
xvar.append(count)
yvar.append(mag)
sum = sum + mag
#Calculate mean
average = sum / count
rint ("Average is", round(average,5))
print ("")
width = 1
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(xvar, yvar, width, color='orangered')
ax.set_yscale('log') # Set y-axis to logarithmic scale
ax.set_xlabel("No. of Earthquakes", fontweight='bold', fontsize=12, color='black')
ax.set_ylabel("Magnitude", fontweight='bold', color='black', fontsize=12)
ax.set_title("Magnitude of Earthquakes")
plt.tight_layout() # Adjust figure spacing
plt.show()
7.4. The Matplotlib Figure Object¶
When working with Python, it is important to know how to manipulate matplotlib’s figure object. Almost all of your graphical output will require the use of this figure object either directly or indirectly. The links below provide some information on ths topic.
In terms of a general overview, in matplotlib’s languge, the figure object is the top level object. It serves as the container for all other plot-related objects.
The figure object can contain one or more axes or plotting objects. Axes are actual X,Y plots or diagrams that appear in the document. Axes contain the X-axis and the Y-axis, which are the XY lines in the plots. Axes also have plot objects, titles, legends, etc.
The easiest way to create a new figure is to use the figure method of the matplotlib pyplot interface:
>>> f = plt.figure()
If you want to control the figure size, you can use the figsize argument of the figure method. Figure size is given in width first then height (in inches)
>>> f = plt.figure.figsize=(12,10)
The above statements will create an empty figure object with no axes.
Often, we encounter code samples, such as the one below, that do not specify any figure object. If no figure object is specified in the code, then pyplot uses a default figure object to draw the figure.
import matplotlib.pyplot as plt
x = [0.3, 3.8, 1.2, 2.5]
y = [11, 25, 9, 26]
plt.scatter (x,y, color='darkgreen', marker='^')
plt.xlim(0.5, 4.5)
plt.show()
The main benefit of explicity specifying a figure is that you can control its size, background, color, etc. In the code sample below, the size of the figure object is specified.
import matplotlib.pyplot as plt
x = [0.3, 3.8, 1.2, 2.5]
y = [11, 25, 9, 26]
fig = plt.figure(figsize=(4,6))
plt.scatter (x,y, color='darkgreen', marker='^')
plt.xlim(0.5, 4.5)
plt.show()
Subplots
Within a figure, there can be one or more “subplots”. Subplots are objects within a figure that divide up the figure. The subplots are positioned in a grid so that 221, means 2 rows, 2 columns, and the first cell.
The code below includes a figure and a subplots. Note that when a subplot is created, it is referred to as an axes. That’s because it is also the plotting area for the figure. Also, unused axes are not shown.
import matplotlib.pyplot as plt
x = [0.3, 3.8, 1.2, 2.5]
y = [11, 25, 9, 26]
fig = plt.figure(figsize=(4,6))
ax = fig.add_subplot(221)
plt.scatter (x,y, color='darkgreen', marker='^')
plt.xlim(0.5, 4.5)
plt.show()
Axes
An axes is the area on which the data is plotted using functions such as plot() and scatter(). Axes can have ticks, labels, etc, associated with it.
- As shown in the previous section, axes can be created from a figure object
ax = fig.add_subplot(111)
- Axes can also be created directly from pyplot:
ax = plt.subplot(221)
Once the axes is created, it can be used to plot various types of charts or to plot images and maps. For example, to plot various types of charts, we write:
When plotting graphs, charts, and images, if no axes is specified in the code, you can still use the pyplot module to plot the objects.
Plotting Multiple Axes on a Single Figure
A single figure can contain multiple axes, each of which can contain separate plots.
import matplotlib.pyplot as plt
import geopandas as gpd
file = "/Users/.../school_districts.shp"
df = gpd.read_file(file)
ax1 = plt.subplot(221)
df.plot(ax=ax1, edgecolor="black")
ax2 = plt.subplot(222)
df.plot(ax=ax2, edgecolor="black")
ax3 = plt.subplot(223)
df.plot(ax=ax3)
plt.show()
Setting Legend
To set the legend for an axes, simply type: ax.legend()
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(6,4))
ax = fig.add_subplot(111)
ax.plot([300, 400,500,600,700,800,900],
[1.00, 1.20, 1.40, 1.60, 1.80, 2.00, 2.20], label = 'Supply')
ax.plot([300, 430,560,680,710,800,925],
[2.20, 2.00,1.80, 1.60,1.40,1.20,1.00], label = 'Demand')
ax. set_ylabel('Price')
ax. set_xlabel('Quantity Demanded and Supplied')
ax.set_title("Demand and Supply Data")
ax.legend()
plt.show
For more information on how to use the matplotlib figure object, please click on the links below:
7.5. Reading CSV Files Using Pandas¶
The script above used many lines of code to open the csv file, compute average, and plot a graph. We can achieve the same results with far fewer lines of code by using a specialized library named Pandas. Pandas is a very powerful, popular and easy to use Python library for data analysis. It has many Excel-like functions. Its primary object is the DataFrame, which can be thought of as an abstract database table or spreadsheet. Once you create a dataframe object, you can use it to display tables, plot columns, create and run queries, with just a few lines of code.
Let’s plot the earthquake dataset using Pandas.
import pandas as pd
df = pd.read_csv("../earthquakes.csv")
df
Displaying Specific Columns of your Dataframe
To select multiple columns, use a list of column names within the selection brackets []. In the example below, we are selecting the earthquake depth and magnitude fields.
import pandas as pd
df = pd.read_csv("../earthquakes.csv")
df[['depth', 'magnitude']]
Get the size of the table.
import pandas as pd
df = pd.read_csv("../earthquakes.csv")
df[["depth", "mag"]].shape
Filtering the Data Frame for Certain Rows
import pandas as pd
df = pd.read_csv("../earthquakes.csv")
above_3.5 = df[["mag"] > 3.5]
above_35.head()
7.6. Plotting Graphs¶
Many types of graphs can be plotted by pandas. Below are seven types of graphs that are useful to know how to create.
The kind parameter accepts eleven different string values and determines which kind of plot you’ll create:
“area” is for area plots.“bar” is for vertical bar charts.“barh” is for horizontal bar charts.“box” is for box plots.“hexbin” is for hexbin plots.“hist” is for histograms.“kde” is for kernel density estimate charts.“density” is an alias for “kde”.“line” is for line graphs.“pie” is for pie charts.“scatter” is for scatter plots.
Line Graph
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("../earthquakes.csv")
df.plot(kind='line',y='depth',color='red', figsize=(6, 8))
plt.show()
Histogram
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("../earthquakes.csv")
#df.plot.line(column = df.columns[3], figsize=(6, 8))
df.plot(kind='hist',y='Depth_mls',color='red',bins = 10, figsize=(6, 8))
plt.show()
Bar Plot
import matplotlib.pyplot as plt
import pandas as pd
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
index = ['snail', 'pig', 'elephant','rabbit', 'giraffe', 'coyote', 'horse']
df = pd.DataFrame({'speed': speed,'lifespan': lifespan}, index=index)
ax = df.plot.bar(rot=10)
plt.title("Speed vs Lifespan, Selected Animals")
plt.show()
Scatterplot
To plot the Depth and Magnitude Data as a scatterplot, write:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("/Users/.../earthquakes.csv")
df.plot(kind='scatter', x='depth',y='magnitude', color='red',figsize=(6, 8))
plt.show()
or
df.plot(kind='scatter',x='Depth_mls',y='Magnitude',color='red')
plt.show()
Notes:
ax is the axes on which to draw the map
cmap is the name of the colormap
legend & legend_kwds control the display of the legend.
7.7. Writing to a File¶
Once we are done with data analysis, we can also write to a file, as shown below.
with open("C:/Users/.../john.txt", "w") as f:
f.write('Hello \n')
f.write('Hello \n')
f.write('Hello \n')
f.write('Hello \n')
f.write('Hello \n')
f.write('Hello \n')
f.write('Hello \n')
f.close
Reading a Data file into Python, splitting its contents by columns, and storing the columns in variables
infile = open("C:/Users/.../earthquakes.csv, 'r')
lines = infile.readlines()
newfile=open("../newfile.txt",mode="a+",encoding="utf-8")
del lines[0] # Remove the first line
#Create empty lists
xvar = []
yvar = []
count = 0
sum = 0
for line in lines:
elements = line.split(",") # splits the line
mag = float(elements[4]) # Get the data in the fifth column
count = count + 1
# Add count and magnitude data to the empty x,y lists
xvar.append(count)
yvar.append(mag)
sum = sum + mag
newfile.write("\n")
newfile.write(str(mag))
#Calculate mean
average = sum / count
newfile2.write(str(average))
infile.close()
print ("The average earthquake magnitude is",round(average, 2))
print ("")
7.8. Plotting Shapefiles¶
There are many libraries available for plotting shapefiles. Such libraries include Geopandas, Shapely, Fiona, OGR, and descartes. Altogether, one of the main contributions of these libraries is that we no longer have to rely on GIS software to view shapefiles and their attribute data. This can be done easily with Python.
Among the various libraries, Geopandas is perhaps the most widely used for plotting shapefiles as it only requires a few lines of code to display the shapefile. Also, attribute tables can be easily manipulated as a panda/geopanda dataframes.
The other libraries are also important depending on the task that you would like to perform. Since geopandas hides almost all its implementation, you will find that studying the code of other libaries gives you a clearer picture of how to handle shapefiles from a Python perspective.
Displaying Shapefiles using Geopandas
To use Geopandas, first, download and install the library. Afterwards, import the library and use it in your code.
As an example, run the code below in your favorite development environment. Note that the code draws the shapefile with a single color because no specific column is being plotted.
import geopandas as gpd
import matplotlib.pyplot as plt
geo_df = gpd.read_file ("/Users/../Michigan.shp")
geo_df.plot()
Viewing the Shapefile’s Attribute Table using Geopandas
To view the attribute table of a shapefile, wit the Geopandas library, use the head() method of the data frame object.
import geopandas
import matplotlib.pyplot as plt
gdf = geopandas("/Users/../Michigan.shp")
#Show data in the attribute table
print(gdf.head())
#Display the shapefile
f, ax = plt.subplots(1, figsize=(8, 11))
gdf.plot(ax = ax, edgecolor='black')
ax.set_title("Water Wells, Washtenaw County, Michigan", fontdict={'fontsize': '14', 'fontweight' : '3'})
plt.show()
Creating a Graduated Color Thematic Map using Geopandas
Often in GIS, we are interested in creating graduated color thematic maps based on specific columns. To do, simply enter parameters in the gpd.plot() method, as shown below.
gdf.plot(ax = ax, column= ‘HISPANIC’, cmap=’OrRd’ , scheme=’fisher_jenks’, legend=True, edgecolor=’black’)
import geopandas
import matplotlib.pyplot as plt
gdf = geopandas.read_file("../Michigan.shp")
#Display the shapefile
f, ax = plt.subplots(1, figsize=(10, 13))
gdf.plot(ax = ax, column= 'HISPANIC', cmap='OrRd' , scheme='fisher_jenks', legend=True, edgecolor='black', legend_kwds={'loc': 'lower left'})
ax.set_title("Mobile Homes, Michigan", fontdict={'fontsize': '14', 'fontweight' : '1'})
plt.show()
#Save the map
f.savefig("/Users/.../map_export.png", dpi=300)
See the Geopandas User Guide for more information.
Plotting World Population Using Data Provided by Geopandas
import geopandas as gpd
df_world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
print (df_world.head())
fig, ax = plt.subplots(1, 1, figsize=(16, 12))
df_world.plot(column='pop_est', cmap='OrRd', ax = ax, ec="black",
legend=True, legend_kwds={"label": "Population", "orientation":"horizontal"})
ax.set_title("Countries of the World by Population")
7.9. Displaying Shapefiles Using Fiona¶
Fiona is a low-level library that focuses on reading and writing geospatial data formats. It acts as a Pythonic interface to the OGR library, providing direct access to the capabilities of OGR for working with vector data. Fiona is known for its efficiency, simplicity, and interoperability with various geospatial file formats. It is often used when specific data access or format-related functionalities are required.
import fiona
import matplotlib.pyplot as plt
from shapely.geometry import shape
# Path to the shapefile
shapefile_path = '/Users/.../cities.shp'
# Open the shapefile using Fiona
with fiona.open(shapefile_path, 'r') as shp:
# Create a new figure
plt.figure(figsize=(10, 10))
# Iterate over each feature in the shapefile
if geometry is not None:
for feature in shp:
geometry = shape(feature['geometry'])
properties = feature['properties']
# Check the type of geometry
if geometry.geom_type == 'Point':
# Handle points
plt.plot(geometry.x, geometry.y, 'ro')
# Add label
plt.text(geometry.x, geometry.y, properties['NAME'], fontsize=8, ha='center', va='bottom')
elif geometry.geom_type == 'LineString':
# Handle lines
plt.plot(*geometry.xy, 'b-')
# Add label
centroid = geometry.centroid
plt.text(centroid.x, centroid.y, properties['NAME'], fontsize=8, ha='center', va='bottom')
elif geometry.geom_type == 'Polygon':
# Handle polygons
plt.plot(*geometry.exterior.xy, 'g-')
for interior in geometry.interiors:
plt.plot(*interior.xy, 'g--')
# Add label
centroid = geometry.centroid
plt.text(centroid.x, centroid.y, properties['NAME'], fontsize=8, ha='center', va='bottom')
# Set the plot limits and labels
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Shapefile Visualization')
plt.grid(True)
# Show the plot
plt.show()
Notes:
ax is the axes on which to draw the map
cmap is the name of the colormap
legend & legend_kwds control the display of the legend.
Displaying a Shapefile Using PyShp
Pyshp is an open source library that can be used to display a shapefile using pure Python. The project is described at this website. PyShp code for displaying a shapefile is lengthier than the one used by geopandas, but it gives us an opportunity to learn about multipart polygons and polylines. Please note that after installing the “Pyshp” library, it is imported into Python using the shapefile keyword, e.g., “import shapefile”.
Prior to running the code,it is useful to know that shapefiles have a ‘parts collection’ and a ‘points collection’. A parts collection is used to keep track of the number of polygons or line segments associated with each record. This is required because in many situations, more than one polygons or polylines must be linked to a record in the attribute table. As an example, since Hawaii is made up of several islands, multiple polygons are needed to represent each island in the state. However in the attribute table, only a single record is used to represent the state.
In a shapefile, a record with one polygon will have one part, but this will show in the parts collection as 0 because the counting system begins with zero.
A points collection refers to the list of points associated with a shape or feature. When working with points collection, be aware that the x-coordinates have an index of [0] while the y-coordinates will have an index value of [1]. The index values allow us to retrieve the x and y coordinates as separate lists.
The illustration below shows how we visualize the relationships between a parts collection and a points collection. The upper section of the illustration shows the polygons associated with a single record while the lower section shows the part and points collection associated with the polygons. Since there are two polygons associated with the record, the parts collection for that record is 1 because the count starts from zero.
The values in the squares in the parts collection represent index numbers that keep track of the starting coordinates for each part. Thus, the first part has coordinates that go from from p1 to p6. The second part starts from index value 6, and has coordinates that go from p7 to p11.
Using Pyshp we can experiment with displaying a shapefile using only its points collection and with both its parts and points collection. The code below shows how to display a shapefile using only its points collection.
import matplotlib.pyplot as plt
import shapefile
sf = shapefile.Reader("Users/../Michigan.shp")
plt.figure(figsize=(7,8))
for each_rec in sf.shapeRecords():
x = [i[0] for i in each_rec.shape.points]
y = [i[1] for i in each_rec.shape.points]
plt.plot(x,y, color="gray")
plt.show()
The code below plots a Michigan shapefile using both the parts and points collection. This is a much better way to plot the shapefile. To display the parts and points of a shapefile, the code must first get the number of parts associated with each polyline or polygon, then loop through each part to get the array of points for the part.
import shapefile as shp
import matplotlib.pyplot as plt
sf = shp.Reader("Users/../Michigan.shp")
plt.figure(figsize = (7,8))
# loop through each record in the shaperecords collection
for each_rec in sf.shapeRecords():
for i in range(len(each_rec.shape.parts)):
i_start = each_rec.shape.parts[i] #Get the starting values for the part.
if i==len(each_rec.shape.parts)-1:
i_end = len(each_rec.shape.points) #Get the length of the points collection.
else:
i_end = each_rec.shape.parts[i+1]
#Get the X,Y coord of the points in each part and make a list
x = [i[0] for i in each_rec.shape.points[i_start:i_end]]
y = [i[1] for i in each_rec.shape.points[i_start:i_end]]
plt.plot(x,y, color = "green")
plt.show()
Displaying Single or Multi-Polygon Shapefiles using Descartes
The Descartes library is another alternative for displaying shapefiles using Python. Descartes uses geometric objects as input for displaying matplotlib paths and patches as lines and polygons. Experiment with the code below to learn about geo_interfaces from the shapefile library, add_patch() from matplotlib, and PolyPatch () from Descartes.
import shapefile as shp
import matplotlib.pyplot as plt
from descartes import PolygonPatch
sf = shp.Reader("../School_Districts.shp")
fig = plt.figure()
ax = fig.gca()
for poly in sf.shapes():
poly_geo=poly.__geo_interface__
ax.add_patch(PolygonPatch(poly_geo, fc='#6699cc', ec='#000000', alpha=0.5, zorder=2 ))
ax.axis('scaled')
plt.show()
In this example, the code above has been modified to display the polygons in the shapefile with multiple colors.
import shapefile as shp
import matplotlib.pyplot as plt
from descartes import PolygonPatch
import random
sf = shp.Reader("/Users/.../school_districts.shp")
fig = plt.figure()
ax = fig.gca()
for poly in sf.shapes():
poly_geo=poly.__geo_interface__
# Generate a random number between 0 and 2^24
color = random.randrange(0, 2**24)
# Convert the number from base-10 (decimal) to base-16 (hexadecimal)
hex_color = hex(color)
std_color = "#" + hex_color[2:]
ax.add_patch(PolygonPatch(poly_geo, fc= str(std_color), ec='#000000', alpha=0.5, zorder=2 ))
ax.axis('scaled')
plt.show()
Displaying a List of XY Coordinates as Points
# import libraries
import pandas as pd # Pandas Read csv file
from shapely.geometry import Point # Shapely for converting latitude/longtitude to geometry
import geopandas as gpd # Geopandas for visualing shapefile
import matplotlib.pyplot as plt
homicides = pd.read_csv('/Users/.../Toronto_Homicides.csv')
#print (homicides.head())
#Download data at - https://data.torontopolice.on.ca/datasets/TorontoPS::homicides-open-data-asr-rc-tbl-002/explore
# creating a geometry column
geometry = [Point(xy) for xy in zip(homicides['X'], homicides['Y'])]
# Coordinate reference system : WGS84
crs = {'init': 'epsg:4326'}
# Creating a Geographic data frame
gdf = gpd.GeoDataFrame(homicides, geometry=geometry)
# Plot all points
gdf.plot(marker='o', color='r', markersize=1.5)
plt.show()
The above task can be done entirely in geopandas using a script such as the one below:
import pandas as pd
import geopandas as gpd
homicide_data = pd.read_csv('/Users/.../Toronto_Homicides2.csv')
homicide_gpd = gpd.GeoDataFrame(homicide_data, geometry = gpd.points_from_xy(homicides_data["X"],homicides_data["Y"] ))
#homicide_gpd
homicide_gpd.plot(markersize = 1.5, color = "red", figsize =(10,10))
Please view the video below for an explanation of the code.
7.10. Displaying Rasters¶
In programming, rasters are considered as arrays. An array is a collection of items of the same data type that can be manipulated as a single entity. In Python, a list is a one dimensional array. However, when we are thinking about rasters, we are typically thinking of two dimensional arrays that are defined by rows and columns.
In Python, one difference between a list and a two-dimensional array is that whereas a list can store multiple data types, a two-dimensional array can store only one data type.
Python has specialized libraries for manipulating arrays. Two popular ones are the “numpy” library and the “array” module. Numpy appears to be more popular. To import Numpy into your script, type:
>>> import numpy as np
The “np” is a nickname by convention. To learn more about arrays, please click on this link.
Displaying a DEM using GDAL and Matplotlib
One way to display a raster is to open the raster file using the gdal library, then convert the raster into an array using GDAL. Afterwards, we can use matplotlib’s pyplot.imshow() to display the array. If you not familiar with imshow, please look it up. Imshow takes a ‘vmin’ argument which sets the minimum display values. In this case elevation values of zero and higher will be displayed. Thus, if the raster has a negative nodata value, it will bot be displayed. Cmap is a color map. Here we are using the matplotlib’s color map named “gist_earth”.
import gdal
import matplotlib.pyplot as plt
#Open raster and read number of rows, columns and bands
ds = gdal.Open("/Users/.../topography/dem")
band1 = ds.GetRasterBand(1)
raster_array = band1.ReadAsArray()
plt.imshow(raster_array,vmin=0, cmap="gist_earth")
plt.show()
In the illustration above, the raster is displayed without any reference to real world coordinates. The coordinates displayed are just counts of the raster cells.
In GIS, it is typical for map layers to be displayed with their real world coordinates. In the sample script below, the geotransform function is introduced. This function is used to recalibrate the cell values in terms of real world coordinates. The script also introduces raster legends.
The GeoTransform contains a tuple of values that are used to establish the coordinates of the upper left (UL) corner of a raster. GeoTransform basically takes these values and use them to renumber the entire image in terms of the real world coordinate values. There are six values in the geotransform tuple. They are shown below.
In the script below, ulx and uly are the coordinates in the the upper left corner in the GeoTranformed raster while lrx and lry are the coordinates in the lower right corner.
import gdal
import matplotlib.pyplot as plt
import matplotlib.colors as colors
#Set directory
ds = gdal.Open('/Users/.../topography/dem')
if ds is None:
print ('Could not open file')
sys.exit(1)
band1 = ds.GetRasterBand(1)
no_data = band1.GetNoDataValue()
raster_array = band1.ReadAsArray()
ulx,xres,xskew,uly,yskew,yres = ds.GetGeoTransform()
lrx = ulx + (ds.RasterXSize * xres)
lry = uly + (ds.RasterYSize * yres)
print (ulx,lrx,lry,uly)
fig, ax = plt.subplots(figsize=(8, 5))
#Array contains many no data values, therefore normalize legend from 0 to 1.
norm = colors.Normalize(vmin = 0, vmax = raster_array.max())
cmap = plt.get_cmap("gist_earth")
img = plt.imshow(raster_array, cmap, norm, extent=(ulx, lrx,lry,uly)) # Get the plot renderer object.
cbar = plt.colorbar(img,shrink=0.75) #Associate the color bar with the plot renderer and axes objects.
cbar.set_label('Meters')
plt.savefig('dem_plot.png', dpi=300, bbox_inches='tight')
plt.show()
References
Raster data processing with Python and GDAL - https://notebook.community/Automating-GIS-processes/Lesson-7-Automating-Raster-Data-Processing/Python-and-Gdal
Introduction to NumPy and OpenCV. http://vision.deis.unibo.it/~smatt/DIDATTICA/Sistemi_Digitali_M/PDF/Introduction_to_NumPy_and_OpenCV.pdf
Displaying a Raster using Rasterio and Matplotlib
Rasterio is a popular open source Python library used for viewing and manipulating rasters. Rasterio utilizes the gdal library to display rasters. With rasterio, viewing a raster can be done with just a few lines of code, like the example below.
Rasterio has a show( ) method for displaying rasters. However, the library also uses pyplot’s imshow method to display the data.
import rasterio
from matplotlib import pyplot
src = rasterio.open("/Users/.../topography/dem")
# Get the bounding box coordinates of the raster
extent=[src.bounds[0], src.bounds[2], src.bounds[1], src.bounds[3]]
#Convert the raster into an array
src_array = src.read(1)
fig, ax = pyplot.subplots(1, figsize=(8, 5))
# Get the plot renderer object.
img = ax.imshow(src_array, cmap=plt.get_cmap('jet'), extent=extent, norm = colors.Normalize(vmin = 0, vmax = src_array.max()))
ax.set_title("Digital Elevation Model, Wayne County")
#Associate the figure object with plot renderer and axes objects.
fig.colorbar(img, ax=ax)
#Let the axes object set the length of the colobar.
ax.set_aspect('auto')
pyplot.show()
Displaying a three-band Aerial Photo with GDAL
import numpy as np
from osgeo import gdal
import matplotlib.pyplot as plt
aerial = gdal.Open("/Users/semple/Desktop/Detroit.png
bnd1 = aerial.GetRasterBand(1)
bnd2 = aerial.GetRasterBand(2)
bnd3 = aerial.GetRasterBand(3)
#Now turn each band into a ndarray:
img1 = bnd1.ReadAsArray()
img2 = bnd2.ReadAsArray()
img3 = bnd3.ReadAsArray()
#Then stack them to have a 3 band image
img = np.dstack((img1,img2,img3))
plt.imshow(img)
plt.show()
Displaying Satellite Imagery with GDAL
import numpy as np
from osgeo import gdal
import matplotlib.pyplot as plt
#Image-2019
band5 = "/Users/.../Landsat/LT05_L1TP_019031_20111106_20160830_01_T1_B5.TIF"
band4 = "/Users/.../Landsat/LT05_L1TP_019031_20111106_20160830_01_T1_B4.TIF"
band3 = "/Users/.../Landsat/LT05_L1TP_019031_20111106_20160830_01_T1_B3.TIF"
#Open the Landsat image bands with GDAL
B5_data = gdal.Open(band5)
B4_data= gdal.Open(band4)
B3_data = gdal.Open(band3)
B5_Array = B5_data.GetRasterBand(1).ReadAsArray().astype(np.float32)
B4_Array = B4_data.GetRasterBand(1).ReadAsArray().astype(np.float32)
B3_Array = B5_data.GetRasterBand(1).ReadAsArray().astype(np.float32)
#Then stack them to have a 3 band image
img = np.dstack((B5_Array,B4_Array,B3_Array))
plt.imshow(img)
plt.show()
Displaying a Three-band Raster with Rasterio
Rasterio is a popular open source Python library used for viewing and manipulating rasters. Rasterio utilizes the gdal library to display rasters. With rasterio, viewing a raster can be done with just a few lines of code, like the example below.
Rasterio has a show( ) method for displaying rasters. However, the library can also use pyplot’s imshow method to display the data. The example below uses the show method.
import rasterio
from rasterio.plot import show
src = rasterio.open("/Users/.../Detroit.png")
show(src)
Displaying a Raster in QGIS Using Python
Displaying Web Maps and Feature Layers Stored in ArcGIS Online Using ArcGIS API for Python
Making an Anonymous Connection to ArcGIS Online
The code sample code below lets you connect anonymously to Arcgis Online and search for content that are related to ‘Michigan’ and that are publicly available.
from arcgis.gis import GIS
gis = GIS() # Connect to ArcGIS Online as an anonymous user
search_subset = gis.content.search("Michigan", max_items=100)
search_subset
The results should be similar to the illustration below.
Now, let’s search for a particular layer named “BA Web App Suitability Analysis Data”
from arcgis.gis import GIS
gis = GIS() # Connect to ArcGIS Online as an anonymous user
search_subset = gis.content.search("BA Web App Suitability Analysis Data")
search_subset
The result is:
- [<Item title:”Justice40 Tracts Map April 2022” type:Web Map owner:esri_demographics>,
<Item title:”Justice40 Tracts Map May 2022” type:Web Map owner:esri_demographics>, <Item title:”Live Fire Web App (Hosted by Paul Doherty) “ type:Web Mapping Application owner:pjdohertygis>, <Item title:”10 Minute Drive Access to Grocery Stores” type:Map Image Layer owner:UOdocent>, <Item title:”10 Minute Walk Access to Grocery Stores” type:Map Image Layer owner:UOdocent>, <Item title:”BA Web App Suitability Analysis Data” type:Web Mapping Application owner:bpeverall22@COAGIS>,
Let’s get the first item from the list.
from arcgis.gis import GIS
gis = GIS() # Connect to ArcGIS Online as an anonymous user
search_subset = gis.content.search("BA Web App Suitability Analysis Data")
subset_item = search_subset[0]
subset_item
Connect to your ArcGIS Online Account
Let’s log into our own ArcGIS Online Account and display some of the content. The general syntax is shown below. Notice that we are passing in the URL of the ArcGIS Online server (or portal), our username, and password to “GIS”.
import arcgis
from arcgis.gis import GIS
#connect to your GIS
gis = GIS("https://www.arcgis.com","username","password")
my_content = gis.content.search(query="owner:" + gis.users.me.username, max_items=100)
my_content
Connect to a Feature Layer
# Establish a connection to your GIS.
from arcgis.gis import GIS
from IPython.display import display
gis = GIS("https://www.arcgis.com","username","password")
# Search for 'USA major cities' feature layer collection
search_results = gis.content.search('title: USA Major Cities',
'Feature Layer')
# Access the first Item that's returned
major_cities_item = search_results[0]
major_cities_item
Connect to a Feature Layer using the Feature’s ItemID
from arcgis.gis import GIS
from IPython.display import display
gis = GIS() # anonymous connection to www.arcgis.com
freeways = gis.content.get('91c6a5f6410b4991ab0db1d7c26daacb')
freeways
For more information, please visit the link below:
https://developers.arcgis.com/python/guide/working-with-feature-layers-and-features/
7.11. Exercises¶
Visit this NFL website and copy the quarter back data. Paste the data into Excel and save it in CSV format. Use the standard library in Python or pandas to plot a simple histogram and boxplot of the data in the passing attempts field (ATT). Also, calculate the mean and standard deviation of passing attempts. Explain what your data is portraying in short paragraph.
Using Geopandas, create a thematic map for the USA or Michigan showing the distribution of Covid19 cases across the country or state for the date for which you have data. Write comments to explain what your code is doing. Repeat the process using another Python library of your choice.
Extend the sample code in this Module that uses Descartes to display shapefiles in which the polygons are labeled.
Complete the tutorial on this website including the challenge task at the end.
Review the tutorial on this page and be prepared to discuss the code, particularly how Matplotlib is how used.
Modify the ArcGIS API for Python code above to enable you to log into your own ArcGIS Online account and display a web map.
7.12. Resources¶
Python Shapefile Library - https://pythonhosted.org/Python%20Shapefile%20Library/
Land Cover Change Analysis with Python and GDAL - Tutorial - https://hatarilabs.com/ih-en/land-cover-change-analysis-with-python-and-gdal-tutorial
Rasterio - https://geobgu.xyz/py/rasterio.html#
Create Random Hex Color Code Using Python - https://www.geeksforgeeks.org/create-random-hex-color-code-using-python/#
Plotting large shapefiles with matplotlib - https://gis.stackexchange.com/questions/202839/plotting-large-shapefiles-with-matplotlib/266675#266675
ArcGIS API for Python. Working with web maps and web scenes - https://developers.arcgis.com/python/guide/working-with-web-maps-and-web-scenes/
https://mapscaping.com/how-to-read-a-shapefile-using-python/