Getting Started with GeoPandas: A Library for Spatial Data Analysis in Python¶
1. Introduction:¶
GeoPandas is an open-source Python library that simplifies working with geospatial data. It helps users perform operations such as plotting maps, working with shapefiles, and carrying out geographic analysis. In this article, we introduce some important features of GeoPandas, show how to install it, and walk through basic operations.
2. Installation:¶
Ensure you have Python 3.x installed on your system, then install GeoPandas with pip:
pip install geopandas
3. Key Features & Explanation:¶
1. GeoDataFrames:¶
A GeoDataFrame stores both tabular data and geometric data in a single structure. It extends a regular pandas DataFrame with a special geometry column whose values are objects such as Point, LineString, and Polygon.
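As a minimal sketch of this structure, the snippet below builds a small GeoDataFrame by hand. The city names, coordinates, and the `sensors` column are made-up illustration values, not data from this article.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical records: names, coordinates, and counts are illustrative only
cities = gpd.GeoDataFrame(
    {"city": ["Delhi", "Mumbai"], "sensors": [12, 8]},
    geometry=[Point(77.21, 28.61), Point(72.88, 19.08)],  # Point(longitude, latitude)
    crs="EPSG:4326",
)

print(cities.dtypes)              # regular columns plus one geometry column
print(cities.geometry.geom_type)  # each value is a shapely geometry
```

The ordinary columns behave exactly as they would in pandas, while the geometry column carries the shapes and the CRS.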
2. Easy Reading and Writing of Geospatial Data¶
GeoPandas supports several common geospatial data formats, allowing you to read and write spatial data with a single line of code. Examples include Shapefiles (.shp), GeoJSON (.geojson), KML (.kml), and GeoPackage (.gpkg).
3. Spatial Operations:¶
Spatial operations are important for tasks like proximity analysis, spatial joins, or finding intersections between geographic features. GeoPandas makes these operations simple and efficient.
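The three kinds of operation mentioned above can be sketched on toy data. The zones and points below are hypothetical planar shapes with no CRS, chosen so the results are easy to check by eye.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical planar coordinates (no CRS), chosen to keep the arithmetic obvious
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(3, 0), (5, 0), (5, 2), (3, 2)]),
    ],
)
points = gpd.GeoDataFrame(
    {"site": ["p1", "p2", "p3"]},
    geometry=[Point(1, 1), Point(4, 1), Point(10, 10)],
)

# Spatial join: attach the zone that each point intersects (p3 matches none)
joined = gpd.sjoin(points, zones, how="inner")
print(joined[["site", "zone"]])

# Proximity: distance from every point to the "west" zone
dist_to_west = points.distance(zones.geometry.iloc[0])
print(dist_to_west)

# Intersection test between the two zone polygons
zones_touch = zones.geometry.iloc[0].intersects(zones.geometry.iloc[1])
print(zones_touch)
```

With `how="inner"`, points that fall in no zone are dropped from the result; `how="left"` would keep them with missing zone values.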
4. Plotting and Visualization¶
GeoPandas works with Matplotlib to visualize geospatial data. It lets us easily plot geometries, customize map colors, and add elements such as titles and legends.
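A small sketch of these options, using hypothetical regions and values. The `Agg` backend and the output file name `regions.png` are assumptions so that the snippet also runs as a plain script without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

import geopandas as gpd
from shapely.geometry import box

# Hypothetical regions and values, for illustration only
regions = gpd.GeoDataFrame(
    {"region": ["A", "B", "C"], "value": [10, 25, 40]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(figsize=(6, 3))
# Color each polygon by its "value" column and add a color legend
regions.plot(column="value", cmap="viridis", edgecolor="black", legend=True, ax=ax)
ax.set_title("Regions colored by value")
fig.savefig("regions.png")  # in a notebook, plt.show() would display the figure instead
```

Passing `ax=ax` is what lets several layers share one map, which the sensor-location example in this article relies on.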
5. Coordinate Reference Systems (CRS)¶
A CRS defines how coordinates (for example, latitude and longitude) relate to locations on the Earth’s surface. Using a consistent CRS ensures that our data aligns correctly when combined with other datasets.
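As a brief sketch, the snippet below declares a CRS on a single illustrative point and reprojects it. The coordinates and the choice of Web Mercator (EPSG:3857) as the target are assumptions made for the example.

```python
import geopandas as gpd
from shapely.geometry import Point

# A single point in longitude/latitude (EPSG:4326); the coordinates are illustrative
pts = gpd.GeoSeries([Point(77.2, 28.6)], crs="EPSG:4326")
print(pts.crs)  # geographic CRS, units are degrees

# Reproject to Web Mercator (EPSG:3857), whose units are meters
pts_3857 = pts.to_crs(epsg=3857)
print(pts_3857.iloc[0])
```

Two layers can be compared or overlaid safely once both have been brought to the same CRS with `to_crs`.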
4. Code Examples:¶
1. Loading the Shapefile¶
import geopandas as gpd
shapefile_path = r"C:\Users\prabhanshu chouhan\Downloads\110m_cultural\ne_110m_admin_0_countries.shp"
gdf = gpd.read_file(shapefile_path)
# Display the first few rows
print(gdf.head())
        featurecla  scalerank  LABELRANK                   SOVEREIGNT SOV_A3  \
0  Admin-0 country          1          6                         Fiji    FJI
1  Admin-0 country          1          3  United Republic of Tanzania    TZA
2  Admin-0 country          1          7               Western Sahara    SAH
3  Admin-0 country          1          2                       Canada    CAN
4  Admin-0 country          1          2     United States of America    US1

   ADM0_DIF  LEVEL               TYPE TLC                        ADMIN  ...  \
0         0      2  Sovereign country   1                         Fiji  ...
1         0      2  Sovereign country   1  United Republic of Tanzania  ...
2         0      2      Indeterminate   1               Western Sahara  ...
3         0      2  Sovereign country   1                       Canada  ...
4         1      2            Country   1     United States of America  ...

      FCLASS_TR     FCLASS_ID     FCLASS_PL FCLASS_GR FCLASS_IT  \
0          None          None          None      None      None
1          None          None          None      None      None
2  Unrecognized  Unrecognized  Unrecognized      None      None
3          None          None          None      None      None
4          None          None          None      None      None

      FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA  \
0          None      None      None      None
1          None      None      None      None
2  Unrecognized      None      None      None
3          None      None      None      None
4          None      None      None      None

                                            geometry
0  MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...
1  POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...
2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3  MULTIPOLYGON (((-122.84 49, -122.97421 49.0025...
4  MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...

[5 rows x 169 columns]
2. Plotting the Geospatial Data¶
import matplotlib.pyplot as plt
# Plot the geospatial data
gdf.plot()
# Add a title and show the plot
plt.title("Countries World Map")
plt.show()
3. Filtering Data (e.g., Select a Specific Country)¶
# Filter the GeoDataFrame to select India
india = gdf[gdf['NAME'] == 'India']
# Plot India
india.plot()
plt.title("India")
plt.show()
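4. Calculating the Distance Between Countries¶
# Reproject to UTM zone 33N (EPSG:32633) so that distances are in meters.
# Note: this zone is designed for 12°E to 18°E, so distances involving India and the UK are only approximate.
gdf_utm = gdf.to_crs(epsg=32633)
# Filter India and the United Kingdom (the dataset has no separate entry for England)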
india_utm = gdf_utm[gdf_utm['NAME'] == 'India']
england_utm = gdf_utm[gdf_utm['NAME'] == 'United Kingdom']
# Calculate the centroids of both countries in the new CRS (UTM)
india_centroid_utm = india_utm.geometry.centroid.iloc[0]
england_centroid_utm = england_utm.geometry.centroid.iloc[0]
# Calculate the distance in meters between the centroids
distance_meters = india_centroid_utm.distance(england_centroid_utm)
# Convert distance from meters to kilometers
distance_km = distance_meters / 1000
# Print the distance in kilometers
print(f"The distance between India and England is: {distance_km} kilometers.")
The distance between India and England is: 9028.0805211224 kilometers.
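A planar distance like the one above depends on the chosen projection, and a UTM zone far from both countries can differ noticeably from the true ground distance. One projection-independent cross-check is the great-circle (haversine) distance on a sphere. The sketch below is an illustration rather than the method used above: the helper `haversine_km` is hypothetical, the Earth radius of 6371 km is a spherical approximation, and the coordinates are assumed, approximate locations of New Delhi and London, not the country centroids.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points on a sphere, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


# Assumed, approximate coordinates of New Delhi and London (not the country centroids)
delhi = (28.61, 77.21)
london = (51.51, -0.13)
print(f"Great-circle distance: {haversine_km(*delhi, *london):.0f} km")
```

For centroid-to-centroid distances, the same function can be fed the latitude and longitude of each centroid taken in EPSG:4326.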
5. Finding the Nearest Neighbor (Based on Distance Between Country Geometries)¶
import geopandas as gpd
# Read the shapefile with countries
shapefile_path = r"C:\Users\prabhanshu chouhan\Downloads\110m_cultural\ne_110m_admin_0_countries.shp"
gdf = gpd.read_file(shapefile_path)
# Filter India from the GeoDataFrame
india = gdf[gdf['NAME'] == 'India'].iloc[0] # Get the first (and presumably only) row for India
# Calculate the distance from India to all other countries
gdf['distance_to_india'] = gdf.geometry.distance(india.geometry)
# Exclude India from the results (distance to itself is zero)
gdf_without_india = gdf[gdf['NAME'] != 'India']
# Find the index of the nearest country
nearest_country_index = gdf_without_india['distance_to_india'].idxmin()
# Get the name of the nearest country
nearest_country_name = gdf.loc[nearest_country_index, 'NAME']
print(f"The nearest country to India is: {nearest_country_name}.")
The nearest country to India is: Myanmar.
UserWarning: Geometry is in a geographic CRS. Results from 'distance' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
6. Calculating the Area of Each Country¶
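# Reproject to a projected CRS so that areas are in square meters.
# Note: UTM zone 33N (EPSG:32633) is designed for 12°E to 18°E, so areas of countries far from that zone are distorted.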
gdf_utm = gdf.to_crs(epsg=32633)
# Calculate the area of each country (in square meters)
gdf_utm['area_sqm'] = gdf_utm.geometry.area
# Convert the area to square kilometers
gdf_utm['area_sqkm'] = gdf_utm['area_sqm'] / 1e6
# Print the area of the first few countries
print(gdf_utm[['NAME', 'area_sqkm']].head())
                       NAME     area_sqkm
0                      Fiji  2.080502e+04
1                  Tanzania  1.053473e+06
2                 W. Sahara  1.167059e+05
3                    Canada  1.326423e+07
4  United States of America  1.810020e+07
RuntimeWarning: invalid value encountered in area
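For areas across the whole globe, an equal-area projection is usually a safer choice than a single UTM zone. The sketch below assumes EPSG:6933 (a global cylindrical equal-area CRS) as one such option and measures a hypothetical one-degree cell at the equator rather than the country data.

```python
import geopandas as gpd
from shapely.geometry import box

# A 1-degree by 1-degree cell at the equator (hypothetical test shape)
cell = gpd.GeoSeries([box(0, 0, 1, 1)], crs="EPSG:4326")

# EPSG:6933 is a global cylindrical equal-area CRS with units in meters
cell_equal_area = cell.to_crs(epsg=6933)
area_sqkm = cell_equal_area.area.iloc[0] / 1e6
print(f"Area: {area_sqkm:.0f} square kilometers")
```

The same two steps, `to_crs(epsg=6933)` followed by `.area`, could be applied to the country GeoDataFrame in place of the UTM reprojection.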
7. Finding the Centroid of a Country¶
import matplotlib.pyplot as plt
# Filter for India
india = gdf[gdf['NAME'] == 'India']
# Calculate the centroid of India
india_centroid = india.geometry.centroid.iloc[0]
# Plot the country and its centroid
fig, ax = plt.subplots()
india.plot(ax=ax, edgecolor='black', facecolor='lightblue')
ax.scatter(india_centroid.x, india_centroid.y, color='red', marker='x')
plt.title('Centroid of India')
plt.show()
UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
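One way to follow the advice in this warning is to compute the centroid in a projected CRS and then convert the resulting point back to longitude and latitude. The sketch below uses a hypothetical rectangle in place of a real country outline, and EPSG:6933 is an assumed choice of projected CRS.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical rectangle in longitude/latitude, standing in for a country outline
shape = gpd.GeoSeries([box(68, 8, 97, 37)], crs="EPSG:4326")

# Compute the centroid in a projected CRS (EPSG:6933, an assumed choice),
# then convert the resulting point back to longitude/latitude
centroid_projected = shape.to_crs(epsg=6933).centroid
centroid_lonlat = centroid_projected.to_crs(shape.crs).iloc[0]
print(centroid_lonlat.x, centroid_lonlat.y)
```

The longitude stays at the middle of the rectangle, while the latitude shifts away from the naive midpoint of 22.5 degrees because the projection weights the shape by area.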
8. Plotting PM2.5 Sensor Locations on the Map of India from a Provided Dataset¶
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from shapely.geometry import Point
# Load the datasets (pandas is required for read_csv)
df = pd.read_csv('Data.csv')
state = pd.read_csv('State_data.csv')
NCAP = pd.read_csv('NCAP_Funding.csv')
# Path to the extracted shapefile you downloaded from Natural Earth
shapefile_path = r"C:\Users\prabhanshu chouhan\Downloads\110m_cultural\ne_110m_admin_0_countries.shp"
# Load the shapefile using GeoPandas
world = gpd.read_file(shapefile_path)
# Filter the world data to get India
india = world[world['ADMIN'] == 'India'] # Replace 'ADMIN' with the correct column name
# Extract sensor locations from your provided dataframe (latitude and longitude)
sensor_locations = df[['latitude', 'longitude']].dropna() # Ensure no missing coordinates
sensor_locations = [Point(lon, lat) for lat, lon in zip(sensor_locations['latitude'], sensor_locations['longitude'])]
# Create a GeoDataFrame for sensor locations (assuming WGS84 longitude/latitude)
gdf_sensors = gpd.GeoDataFrame(geometry=sensor_locations, crs="EPSG:4326")
# Plotting the map of India
fig, ax = plt.subplots(figsize=(10, 10)) # Set figure size
india.plot(ax=ax, color='lightgray') # Plot India in light gray
# Plotting the sensor locations on the map
gdf_sensors.plot(ax=ax, color='red', markersize=50, label='Sensor Locations')
# Adding labels and title
plt.title('Map of India with Sensor Locations')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
# Show the legend
plt.legend()
# Show the plot
plt.show()
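As an alternative to building `Point` objects in a list comprehension, `gpd.points_from_xy` creates the geometry column directly and keeps the other columns attached to each sensor. The table below is a hypothetical stand-in for `Data.csv`: the `station` column and all values are made up, and only the `latitude` and `longitude` column names are taken from the code above.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sensor table; the coordinate column names mirror those used above
df_demo = pd.DataFrame(
    {
        "station": ["S1", "S2", "S3"],
        "latitude": [28.61, 19.08, None],
        "longitude": [77.21, 72.88, 80.27],
    }
)

# Drop rows with missing coordinates, then build point geometries in one call
clean = df_demo.dropna(subset=["latitude", "longitude"])
gdf_demo = gpd.GeoDataFrame(
    clean,
    geometry=gpd.points_from_xy(clean["longitude"], clean["latitude"]),
    crs="EPSG:4326",  # assumption: coordinates are WGS84 longitude/latitude
)
print(gdf_demo)
```

Because the attribute columns survive, the points can later be colored or filtered by any field in the sensor data.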
5. Use Cases:¶
1. Visualization of Countries¶
GeoPandas offers a clear visual representation of where capitals are located in relation to countries and neighboring regions.
It helps in quickly identifying the geographical distribution of capital cities.
2. Distance Calculation¶
We can calculate the distance from a capital city to the nearest border of the country.
This type of analysis is useful in geopolitical studies.
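A minimal sketch of this calculation with Shapely geometries is shown below. The square country outline and the capital location are hypothetical and are expressed in a projected CRS with units of meters.

```python
from shapely.geometry import Point, Polygon

# Hypothetical country outline and capital in a projected CRS (units: meters)
country = Polygon([(0, 0), (100_000, 0), (100_000, 100_000), (0, 100_000)])
capital = Point(30_000, 40_000)

# Distance to the polygon itself is 0 for an interior point,
# so measure against the boundary line instead
distance_to_border_km = country.boundary.distance(capital) / 1000
print(f"Distance to nearest border: {distance_to_border_km:.1f} km")
```

Here the nearest edge is the western side of the square, so the script prints 30.0 km.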
3. Regional Planning and Decision Making¶
GeoPandas helps governments and businesses plan the growth and expansion of regions, including development projects, infrastructure expansion, and service delivery, based on geographical data.
4. Travel and Tourism Analysis¶
GeoPandas can assist tourism agencies and governments by analyzing the distance between capitals and major tourist destinations.
Travel agencies can make informed decisions on creating travel routes or designing tourism packages.
5. Climate and Environmental Studies¶
With climate projections such as temperature increases, rising sea levels, or extreme weather events, GeoPandas enables the assessment of capitals' vulnerability to these phenomena.
GeoPandas can be used to assess pollution levels in and around capital cities.
6. Public Health and Disease Tracking¶
GeoPandas can help in tracking the spread of infectious diseases and understanding spatial patterns of outbreaks.
7. Social Studies¶
GeoPandas can help to analyze population movements, density, and migration trends.
It can also help assess the state of education, healthcare, and job availability in a geographical region.
6. Conclusion¶
For Python users, GeoPandas is a key library for handling and analyzing geospatial data. It offers wide-ranging capabilities for spatial data manipulation, intuitive visualization, and smooth integration with other libraries such as pandas and Shapely. It provides flexible, user-friendly solutions across business, public health, ecological studies, and urban planning, and it supports data-driven decision-making because it can handle fairly involved analyses, including distance calculations and geometric transformations. It also has some drawbacks: it consumes a lot of memory when dealing with very large datasets, and it relies on several external libraries, which can lead to installation or version compatibility issues. Overall, though, GeoPandas is a very helpful library.
7. References & Further Reading:¶
- GeoPandas Official Documentation: https://geopandas.org/
- GeoPandas GitHub Repository: https://github.com/geopandas/geopandas
- GeoPandas Tutorial - DataCamp: https://www.datacamp.com/community/tutorials/geospatial-data-python