Understanding Coordinate Systems in Geospatial Analysis

Aug 3, 2024

I spent three hours debugging why my geospatial clustering wasn't working.

The distances were completely wrong. A cluster in Manhattan was somehow "closer" to Brooklyn than to itself. Points that should be 500 meters apart were calculating as 5 kilometers.

Then I realized: I was calculating distances in degrees, not meters.

One line of code to transform the coordinate system, and everything worked perfectly.

This is the most common mistake in geospatial analysis, and it'll bite you if you don't understand coordinate systems. The difference between WGS84 and EPSG:3857 isn't academic trivia—it's the difference between your analysis working or producing garbage.

If you're working with location data (and if you're reading this, you probably are), you need to understand this. Let me save you the three hours I wasted and explain coordinate systems in a way that actually makes sense.

No math PhD required. Just practical knowledge for working with real geospatial data.

Why Coordinate Systems Matter

Here's the fundamental problem: The Earth is round. Your screen is flat.

Every time you work with geospatial data, you're dealing with this tension between reality (a sphere) and representation (a flat map).

The consequences of getting this wrong:

Distance calculations are wildly inaccurate

  • "This location is 10 kilometers away" → Actually it's 2 kilometers

  • Your proximity analysis is useless

Areas are completely wrong

  • "This property is 5000 square meters" → Actually it's 3000

  • Your real estate analysis is garbage

Clustering doesn't work

  • Points that should cluster together don't

  • DBSCAN, K-means, any spatial algorithm fails

  • Your hotspot detection is meaningless

Visualization looks broken

  • Shapes are distorted

  • Distances don't match visual representation

  • Maps look weird and unprofessional

All because you used the wrong coordinate system.

The Two Types of Coordinate Systems

There are two fundamentally different ways to represent locations on Earth.

1. Geographic Coordinate Systems (Latitude/Longitude)

This is what you're used to: degrees of latitude and longitude.

Example: New York City is at (40.7128° N, -74.0060° W)

How it works:

  • Latitude: Distance north/south of the equator (0° to 90° or -90°)

  • Longitude: Distance east/west of the Prime Meridian (0° to 180° or -180°)

  • Units: Degrees (not meters, not feet)

The problem: Degrees are not constant distances.

1° of longitude at the equator ≈ 111 km 1° of longitude at New York (40°N) ≈ 85 km
1° of longitude at the North Pole ≈ 0 km

Translation: You can't do accurate distance calculations in degrees. Period.

Most common: WGS84 (EPSG:4326)

  • Used by GPS

  • What you get from Google Maps API

  • Standard for storing location data

  • Great for storage, terrible for calculations

2. Projected Coordinate Systems (X/Y in Meters)

This converts the round Earth to a flat map with coordinates in meters (or feet).

Example: Same NYC location in Web Mercator is (-8238310.94, 4970270.41) meters

How it works:

  • Take the sphere (Earth)

  • Project it onto a flat surface (like peeling an orange and laying it flat)

  • Coordinates are now in meters from a reference point

  • Units: Meters (or feet)

The advantage: You can now do math.

python


# Calculate distance between two points
distance = sqrt((x2 - x1)² + (y2 - y1)²)

# This actually works in projected coordinates
# It's garbage in lat/lon

Most common: Web Mercator (EPSG:3857)

  • Used by Google Maps, OpenStreetMap, Mapbox

  • Optimized for web mapping

  • Units are meters

  • Great for calculations, distorts areas near poles

WGS84 vs Web Mercator: The Most Important Comparison

These are the two you'll use 95% of the time.

WGS84 (EPSG:4326) - Geographic

What it is: Latitude and longitude in degrees

When to use:

  • ✅ Storing location data in databases

  • ✅ GPS coordinates

  • ✅ API responses (Google Maps, etc.)

  • ✅ Displaying points on a map

  • ✅ Global datasets

When NOT to use:

  • ❌ Distance calculations

  • ❌ Area calculations

  • ❌ Spatial clustering (DBSCAN, K-means)

  • ❌ Buffer operations ("find everything within 500m")

  • ❌ Any math-based spatial operations

Example data:

python


# WGS84 - Lat/Lon in degrees
locations = [
    {"lat": 40.7128, "lon": -74.0060},  # NYC
    {"lat": 40.7589, "lon": -73.9851},  # Times Square
]

Web Mercator (EPSG:3857) - Projected

What it is: X/Y coordinates in meters from the equator and Prime Meridian

When to use:

  • ✅ Distance calculations

  • ✅ Area measurements

  • ✅ Spatial clustering

  • ✅ Buffer operations

  • ✅ Proximity analysis

  • ✅ Any math on coordinates

When NOT to use:

  • ❌ Storing data (use WGS84 instead)

  • ❌ Global analysis (distorts near poles)

  • ❌ Accurate area at high latitudes

Example data:

python


# EPSG:3857 - X/Y in meters
locations = [
    {"x": -8238310.94, "y": 4970270.41},  # NYC
    {"x": -8233190.12, "y": 4975851.23},  # Times Square
]

The difference in distance calculation:

python


# WGS84 (WRONG for distance)
lat1, lon1 = 40.7128, -74.0060  # NYC
lat2, lon2 = 40.7589, -73.9851  # Times Square

# This is WRONG
bad_distance = sqrt((lat2 - lat1)**2 + (lon2 - lon1)**2)
# Result: 0.0529 degrees ← Meaningless!

# EPSG:3857 (CORRECT for distance)
x1, y1 = -8238310.94, 4970270.41  # NYC
x2, y2 = -8233190.12, 4975851.23  # Times Square

# This is CORRECT
good_distance = sqrt((x2 - x1)**2 + (y2 - y1)**2)
# Result: 6,730 meters ← Accurate!

The Critical Workflow: Transform Before You Calculate

Here's the pattern you'll use constantly:

Step 1: Data arrives in WGS84 (lat/lon) Step 2: Transform to projected CRS (EPSG:3857 or local) Step 3: Do your calculations (distance, clustering, buffers) Step 4: Transform back to WGS84 for storage/display (if needed)

In Python with GeoPandas:

python


import geopandas as gpd
from shapely.geometry import Point

# Step 1: Create data in WGS84
data = [
    {"name": "Location A", "lat": 40.7128, "lon": -74.0060},
    {"name": "Location B", "lat": 40.7589, "lon": -73.9851},
]

# Convert to GeoDataFrame with WGS84
geometry = [Point(d['lon'], d['lat']) for d in data]
gdf = gpd.GeoDataFrame(data, geometry=geometry, crs="EPSG:4326")

print("Original CRS:", gdf.crs)  # EPSG:4326

# Step 2: Transform to Web Mercator for calculations
gdf_projected = gdf.to_crs("EPSG:3857")

print("Projected CRS:", gdf_projected.crs)  # EPSG:3857

# Step 3: Now you can do accurate calculations
distance = gdf_projected.iloc[0].geometry.distance(
    gdf_projected.iloc[1].geometry
)
print(f"Distance: {distance:.0f} meters")  # 6,730 meters

# Step 4: Transform back to WGS84 for storage
gdf_final = gdf_projected.to_crs("EPSG:4326")

This is the pattern. Memorize it. Use it every time.

My Airbnb Project: A Real Example

Let me show you how I used this in my Airbnb Price Hotspot Analyzer.

The problem:

  • 48,000 NYC Airbnb listings with lat/lon coordinates

  • Need to cluster them to find price hotspots

  • Need to calculate distances to landmarks

  • DBSCAN requires distance in meters (not degrees!)

Step 1: Load data in WGS84

python


import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Load Airbnb data
df = pd.read_csv('AB_NYC_2019.csv')

# Create geometry from lat/lon
geometry = [Point(xy) for xy in zip(df['longitude'], df['latitude'])]

# Create GeoDataFrame with WGS84
gdf = gpd.GeoDataFrame(
    df,
    geometry=geometry,
    crs="EPSG:4326"  # WGS84 - the data's native format
)

print(f"Loaded {len(gdf)} listings in WGS84")

Step 2: Transform to EPSG:3857 for calculations

python


# Transform to Web Mercator for meter-based calculations
gdf_meters = gdf.to_crs("EPSG:3857")

print("Coordinate system changed to EPSG:3857")
print(f"Now coordinates are in meters, not degrees")

Step 3: Spatial clustering with DBSCAN

python


from sklearn.cluster import DBSCAN
import numpy as np

# Extract coordinates (now in meters!)
coords = np.array([[geom.x, geom.y] for geom in gdf_meters.geometry])

# DBSCAN with eps in meters
clustering = DBSCAN(
    eps=300,        # 300 meters (not degrees!)
    min_samples=10
).fit(coords)

gdf_meters['cluster'] = clustering.labels_

print(f"Found {gdf_meters['cluster'].nunique()} clusters")

Without transforming to EPSG:3857, this would fail:

  • eps=300 would mean 300 degrees (nonsense)

  • Clusters would be wildly wrong

  • Results would be garbage

Step 4: Calculate distance to landmarks

python


# Define landmarks in WGS84
landmarks = gpd.GeoDataFrame({
    'name': ['Times Square', 'Central Park', 'Empire State'],
    'geometry': [
        Point(-73.9851, 40.7589),
        Point(-73.9654, 40.7829),
        Point(-73.9857, 40.7484)
    ]
}, crs="EPSG:4326")

# Transform landmarks to EPSG:3857
landmarks_meters = landmarks.to_crs("EPSG:3857")

# Calculate distance from each listing to nearest landmark
def nearest_landmark_distance(point):
    distances = landmarks_meters.geometry.distance(point)
    return distances.min()

gdf_meters['distance_to_landmark'] = gdf_meters.geometry.apply(
    nearest_landmark_distance
)

print(f"Distance range: {gdf_meters['distance_to_landmark'].min():.0f} to "
      f"{gdf_meters['distance_to_landmark'].max():.0f} meters")

Results:

  • Accurate distance calculations

  • Meaningful clusters

  • Proximity analysis that actually works

All because I transformed to the right coordinate system first.

Other Important Coordinate Systems

Beyond WGS84 and Web Mercator, here are a few you might encounter:

NAD83 / State Plane (US-specific)

What: Projected CRS optimized for specific US states

Example: EPSG:2263 (New York State Plane, Long Island)

When to use:

  • ✅ High-accuracy measurements in a specific state

  • ✅ Surveying and engineering projects

  • ✅ Local government GIS data

Why: More accurate than Web Mercator for a specific region

python


# NYC data is often in NY State Plane
gdf_stateplane = gdf.to_crs("EPSG:2263")

UTM (Universal Transverse Mercator)

What: Divides world into 60 zones, each with its own projection

Example: EPSG:32618 (UTM Zone 18N - covers NYC)

When to use:

  • ✅ Scientific applications

  • ✅ High-accuracy local measurements

  • ✅ Military and aviation

Why: Very accurate within a specific zone

python


# NYC is in UTM Zone 18N
gdf_utm = gdf.to_crs("EPSG:32618")

Local Custom Projections

Some regions have their own coordinate systems optimized for local accuracy.

Examples:

  • British National Grid (EPSG:27700)

  • Irish Grid (EPSG:29903)

  • Swiss Grid (EPSG:21781)

When to use: Working with official government data from that country

How to Choose the Right Coordinate System

Decision tree for choosing a CRS:

Question 1: Are you doing calculations?

NO → Use WGS84 (EPSG:4326)

  • Just displaying points on a map

  • Storing in database

  • Sending to an API

YES → Continue to Question 2

Question 2: What's your geographic scope?

Global or multi-continent → Web Mercator (EPSG:3857)

  • Works everywhere (except poles)

  • Standard for web mapping

  • Good enough for most use cases

Single country → Local projection

  • US: State Plane or UTM

  • UK: British National Grid

  • More accurate for local analysis

Single city → UTM or local projection

  • Find the UTM zone for that city

  • Or use local government's preferred CRS

Question 3: What's your accuracy requirement?

Rough/approximate → Web Mercator (EPSG:3857)

  • Good enough for most business applications

  • Error is usually <1% for mid-latitudes

High accuracy → Local projection

  • Surveying

  • Engineering

  • Legal boundaries

  • Use State Plane, UTM, or local grid

Common Coordinate System Mistakes

Let me save you from these painful errors:

Mistake #1: Calculating Distance in WGS84

WRONG:

python


# This is garbage
lat1, lon1 = 40.7128, -74.0060
lat2, lon2 = 40.7589, -73.9851

distance = ((lat2 - lat1)**2 + (lon2 - lon1)**2)**0.5
# Result: 0.0529 ← Meaningless degrees!

CORRECT:

python


from shapely.geometry import Point
import geopandas as gpd

# Create points in WGS84
p1 = gpd.GeoSeries([Point(-74.0060, 40.7128)], crs="EPSG:4326")
p2 = gpd.GeoSeries([Point(-73.9851, 40.7589)], crs="EPSG:4326")

# Transform to meters
p1_m = p1.to_crs("EPSG:3857")
p2_m = p2.to_crs("EPSG:3857")

# Calculate distance
distance = p1_m.distance(p2_m)[0]
# Result: 6730 meters ← Accurate!

Mistake #2: Mixing Coordinate Systems

WRONG:

python


# Points in different CRS!
gdf1 = gpd.GeoDataFrame(..., crs="EPSG:4326")  # WGS84
gdf2 = gpd.GeoDataFrame(..., crs="EPSG:3857")  # Web Mercator

# This will give wrong results
distances = gdf1.geometry.distance(gdf2.geometry)

CORRECT:

python


# Make sure both use same CRS
gdf1 = gpd.GeoDataFrame(..., crs="EPSG:4326")
gdf2 = gpd.GeoDataFrame(..., crs="EPSG:4326")

# Transform both to same projected CRS
gdf1_m = gdf1.to_crs("EPSG:3857")
gdf2_m = gdf2.to_crs("EPSG:3857")

# Now calculations work
distances = gdf1_m.geometry.distance(gdf2_m.geometry)

Mistake #3: Not Setting CRS When Creating GeoDataFrame

WRONG:

python


# No CRS specified!
gdf = gpd.GeoDataFrame(df, geometry=geometry)
print(gdf.crs)  # None ← GeoPandas doesn't know what it is

CORRECT:

python


# Always specify CRS
gdf = gpd.GeoDataFrame(
    df,
    geometry=geometry,
    crs="EPSG:4326"  # Explicitly set
)
print(gdf.crs)  # EPSG:4326 ← GeoPandas knows how to transform

Mistake #4: Using Web Mercator for Global Area Calculations

WRONG:

python


# Web Mercator distorts areas, especially near poles
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world_merc = world.to_crs("EPSG:3857")

# Areas will be wrong (Greenland looks huge!)
areas = world_merc.geometry.area

CORRECT:

python


# Use equal-area projection for area calculations
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world_equal_area = world.to_crs("ESRI:54009")  # Mollweide equal-area

# Now areas are accurate
areas = world_equal_area.geometry.area

Practical Tips for Working with Coordinate Systems

Tip #1: Always Check Your CRS

python


# Always check CRS when loading data
gdf = gpd.read_file('data.geojson')
print(f"CRS: {gdf.crs}")

# If it's None, set it manually
if gdf.crs is None:
    gdf.set_crs("EPSG:4326", inplace=True)

Tip #2: Keep Original WGS84 Data

python


# Keep original data in WGS84
gdf_original = gpd.read_file('data.geojson')

# Create projected copy for calculations
gdf_projected = gdf_original.to_crs("EPSG:3857")

# Do calculations on projected
distances = gdf_projected.geometry.distance(...)

# Add results back to original
gdf_original['distance'] = distances

# Save in WGS84
gdf_original.to_file('output.geojson')

Tip #3: Use .to_crs() Liberally

python


# Transform whenever needed, it's cheap
gdf_wgs84 = gdf.to_crs("EPSG:4326")  # For storage
gdf_meters = gdf.to_crs("EPSG:3857")  # For calculations
gdf_display = gdf.to_crs("EPSG:4326")  # For mapping

# GeoPandas handles the math

Tip #4: Understand Your Data's Native CRS

python


# Different data sources use different CRS
google_maps_api = "EPSG:4326"  # WGS84
openstreetmap = "EPSG:4326"    # WGS84
us_census = "EPSG:4269"        # NAD83
uk_ordnance = "EPSG:27700"     # British National Grid

# Check documentation or metadata

When Coordinate Systems Really Matter

Let me show you scenarios where getting this wrong is catastrophic:

Scenario 1: Real Estate Price Analysis

python


# Finding properties within 500m of subway stations

# WRONG - using WGS84
properties_wgs84 = gpd.read_file('properties.geojson')  # EPSG:4326
subways_wgs84 = gpd.read_file('subway.geojson')        # EPSG:4326

# This buffer is in degrees, not meters!
buffer_wrong = subways_wgs84.buffer(0.005)  # 0.005 degrees ≈ 500m? NO!

# CORRECT - using projected CRS
properties_m = properties_wgs84.to_crs("EPSG:3857")
subways_m = subways_wgs84.to_crs("EPSG:3857")

# This buffer is actually 500 meters
buffer_correct = subways_m.buffer(500)  # 500 meters

# Find properties within buffer
near_subway = properties_m[properties_m.geometry.within(buffer_correct.unary_union)]

Impact of getting it wrong: Mislabeled properties, bad investment decisions, unhappy clients

Scenario 2: Emergency Response Planning

python


# Finding hospitals within 10km of incident

# WRONG
incident = Point(-74.0060, 40.7128)  # NYC
hospitals_wgs84 = gpd.read_file('hospitals.geojson')

# This is meaningless
nearby = hospitals_wgs84[
    hospitals_wgs84.geometry.distance(incident) < 0.09  # 0.09 degrees ≈ 10km? NO!
]

# CORRECT
incident_gdf = gpd.GeoDataFrame(
    {'geometry': [Point(-74.0060, 40.7128)]},
    crs="EPSG:4326"
)
incident_m = incident_gdf.to_crs("EPSG:3857")
hospitals_m = hospitals_wgs84.to_crs("EPSG:3857")

# Actually 10,000 meters
nearby = hospitals_m[
    hospitals_m.geometry.distance(incident_m.iloc[0].geometry) < 10000
]

Impact of getting it wrong: Wrong hospitals dispatched, delayed response, lives at risk

Scenario 3: Fraud Detection

python


# Detecting impossible travel (credit card fraud)

# Transaction 1: NYC at 2:00 PM
# Transaction 2: Boston at 2:30 PM
# Time difference: 30 minutes

# WRONG - distance in degrees
nyc = Point(-74.0060, 40.7128)
boston = Point(-71.0589, 42.3601)

deg_distance = ((boston.x - nyc.x)**2 + (boston.y - nyc.y)**2)**0.5
# Result: 3.3 degrees ← Meaningless!

# CORRECT - distance in meters
nyc_gdf = gpd.GeoDataFrame({'geometry': [nyc]}, crs="EPSG:4326")
boston_gdf = gpd.GeoDataFrame({'geometry': [boston]}, crs="EPSG:4326")

nyc_m = nyc_gdf.to_crs("EPSG:3857")
boston_m = boston_gdf.to_crs("EPSG:3857")

meter_distance = nyc_m.geometry.distance(boston_m.geometry)[0]
# Result: 306,000 meters (306 km)

# Check if travel is possible
speed_required = meter_distance / (30 * 60)  # meters per second
speed_kmh = speed_required * 3.6
# Result: 612 km/h ← Impossible! Flag as fraud

Impact of getting it wrong: Missed fraud, financial losses, compromised accounts

Quick Reference: Common EPSG Codes

Global:

  • EPSG:4326 - WGS84 (lat/lon in degrees) - GPS standard

  • EPSG:3857 - Web Mercator (meters) - Google Maps, OpenStreetMap

United States:

  • EPSG:4269 - NAD83 (lat/lon)

  • EPSG:2163 - US National Atlas Equal Area

  • EPSG:2263 - New York State Plane (Long Island)

  • EPSG:32618 - UTM Zone 18N (NYC area)

Europe:

  • EPSG:27700 - British National Grid

  • EPSG:29903 - Irish Grid

  • EPSG:25832 - ETRS89 / UTM Zone 32N (Central Europe)

Equal-Area (for area calculations):

  • ESRI:54009 - Mollweide

  • ESRI:54034 - World Cylindrical Equal Area

Find more at: https://epsg.io/

Conclusion

Coordinate systems aren't optional knowledge. They're fundamental to geospatial analysis.

The key principles:

  1. WGS84 (EPSG:4326) for storage and display

    • Lat/lon in degrees

    • GPS standard

    • Don't do math in it

  2. Projected CRS (EPSG:3857 or local) for calculations

    • X/Y in meters

    • Transform before calculating

    • Web Mercator works for most cases

  3. Always transform before spatial operations

    • Distance, area, clustering, buffers

    • Check your CRS: gdf.crs

    • Use .to_crs() liberally

  4. Match CRS when combining datasets

    • Don't mix WGS84 and Web Mercator

    • Transform to same CRS first

    • GeoPandas will warn you (sometimes)

The workflow:

python


# 1. Load in WGS84
gdf = gpd.read_file('data.geojson')

# 2. Transform to projected
gdf_m = gdf.to_crs("EPSG:3857")

# 3. Do calculations
distances = gdf_m.geometry.distance(...)
clusters = DBSCAN(eps=500).fit(coords)

# 4. Transform back if needed
gdf_final = gdf_m.to_crs("EPSG:4326")

Remember: Degrees are for storing. Meters are for calculating.

Get this right, and your geospatial analysis will actually work. Get it wrong, and you'll waste hours debugging nonsense results.

Want to see this in action?

Check out my Airbnb Price Hotspot Analyzer project where I use these exact techniques to analyze 48,000 listings:

  • GitHub: github.com/Shodexco/airbnb-hotspot-analyzer

Questions? Let's connect:

  • Portfolio: jonathansodeke.framer.website

  • GitHub: github.com/Shodexco

  • LinkedIn: [Your LinkedIn]

Now go fix your coordinate systems. Your spatial analysis will thank you.

About the Author

Jonathan Sodeke is a Data Engineer and ML Engineer specializing in geospatial data processing and production MLOps systems. He's built systems processing millions of location records and learned coordinate systems the hard way so you don't have to.

When he's not transforming coordinate systems at 2am, he's building AI systems and teaching others to work with geospatial data.

Portfolio: jonathansodeke.framer.website
GitHub: github.com/Shodexco
LinkedIn: www.linkedin.com/in/jonathan-sodeke

Sign Up To My Newsletter

Get notified when a new article is posted.

Sign Up To My Newsletter

Get notified when a new article is posted.

Sign Up To My Newsletter

Get notified when a new article is posted.

© Jonathan Sodeke 2025

© Jonathan Sodeke 2025

© Jonathan Sodeke 2025

Create a free website with Framer, the website builder loved by startups, designers and agencies.