Large-scale autonomous driving datasets such as the Waymo Open Motion Dataset (WOMD) provide rich scenario data with detailed lane graphs, vehicle trajectories, and traffic signal states. However, it is unknown which intersection types the training data actually covers: scenarios exist as isolated coordinate frames with no connection to real-world road networks, which makes it impossible to assess geographic bias or coverage gaps.
We present WayGraph, a method for mapping WOMD scenarios onto real-world OpenStreetMap (OSM) road networks using star pattern fingerprinting. Our approach extracts a compact 48-dimensional topology descriptor from each intersection's lane graph, encoding the center intersection properties and its 1-hop neighborhood structure. By hierarchically matching these fingerprints against a database of 5,904 OSM intersections, we achieve 90% top-1 accuracy on San Francisco scenarios.
This mapping reveals the geographic blind spots in training data — which intersection types are over- or under-represented, and how dataset composition (e.g., San Francisco's grid networks) may limit generalizability to cities with different road topologies (e.g., Montreal's diagonal streets and irregular intersections). Furthermore, we demonstrate temporal continuity validation — overlapping 9.1s scenario windows can be chained into continuous driving sequences (483 connections, position error <0.01m), enabling corridor-level traffic analysis and providing independent validation of our matching accuracy (56% path connectivity vs 3.9% baseline, p<0.001).
WayGraph operates as a three-stage pipeline that transforms raw WOMD lane graphs into geolocated intersection matches on OpenStreetMap. Each stage progressively refines the search space, enabling efficient and accurate matching across thousands of candidate intersections.
Raw WOMD lane graphs contain hundreds of lane segments with complex connectivity. We identify intersection centers by detecting lane convergence zones, then extract the local topology: approach arms, lane counts per arm, traffic control types, and connectivity between approaching and departing lanes.
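The convergence-zone idea can be sketched as greedy clustering of lane endpoints: wherever the start or end points of many distinct lanes fall close together, the cluster centroid is treated as a candidate intersection center. This is a minimal illustration, not WayGraph's actual detector. The function name `find_intersection_centers`, the 15 m clustering radius, and the six-lane threshold are hypothetical, and lanes are assumed to be given as polylines of (x, y) points in metres.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def find_intersection_centers(
    lanes: Dict[int, List[Point]],
    radius_m: float = 15.0,  # hypothetical clustering radius
    min_lanes: int = 6,      # hypothetical: distinct lanes needed to call a zone an intersection
) -> List[Point]:
    """Greedy clustering of lane endpoints; dense clusters are convergence zones."""
    # Collect both ends of every lane polyline, remembering which lane they belong to.
    endpoints = []
    for lane_id, poly in lanes.items():
        if len(poly) >= 2:
            endpoints.append((poly[0], lane_id))
            endpoints.append((poly[-1], lane_id))

    clusters: List[List[Tuple[Point, int]]] = []
    for pt, lane_id in endpoints:
        # Join the first cluster whose current centroid lies within radius_m.
        for cluster in clusters:
            cx = sum(p[0] for p, _ in cluster) / len(cluster)
            cy = sum(p[1] for p, _ in cluster) / len(cluster)
            if math.hypot(pt[0] - cx, pt[1] - cy) <= radius_m:
                cluster.append((pt, lane_id))
                break
        else:
            clusters.append([(pt, lane_id)])  # no nearby cluster: start a new one

    # Keep clusters touched by enough distinct lanes and report their centroids.
    centers = []
    for cluster in clusters:
        if len({lid for _, lid in cluster}) >= min_lanes:
            cx = sum(p[0] for p, _ in cluster) / len(cluster)
            cy = sum(p[1] for p, _ in cluster) / len(cluster)
            centers.append((cx, cy))
    return centers
```

In a real lane graph the endpoint test would typically be combined with the lane graph's own successor/predecessor links; plain geometry is used here only to keep the sketch self-contained.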
Each intersection is encoded as a 48-dimensional feature vector capturing both the center intersection properties (6 features) and its 1-hop neighborhood (6 arms × 7 features per arm). This compact descriptor is rotation-invariant and robust to geometric noise in the WOMD lane graphs.
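One way to obtain rotation invariance is to describe arms only by angles relative to one another and then pick a canonical starting arm, for example the rotation whose (lane count, relative angle) sequence is lexicographically largest. The sketch below assumes that approach. The section does not say how WayGraph canonicalizes arm order, and `canonical_arm_order`, the whole-degree rounding, and the zero-padding of unused slots are illustrative assumptions.

```python
from typing import List, Tuple

MAX_ARMS = 6  # the star pattern reserves six arm slots


def canonical_arm_order(arms: List[Tuple[float, int]]) -> List[Tuple[int, int]]:
    """arms: (bearing_deg, lane_count) per approach arm, in any order.

    Returns (lane_count, angle_from_start_arm_deg) per arm, starting from a
    canonical arm and going counterclockwise, zero-padded to MAX_ARMS slots.
    """
    ccw = sorted(arms, key=lambda a: a[0] % 360.0)  # counterclockwise order
    n = len(ccw)
    rotations = []
    for start in range(n):
        seq = []
        for k in range(n):
            bearing, lanes = ccw[(start + k) % n]
            # Only angle differences are used, so a global rotation cancels out.
            rel = round((bearing - ccw[start][0]) % 360.0)
            seq.append((lanes, rel))
        rotations.append(seq)
    # Pick the lexicographically largest rotation as the canonical one.
    best = max(rotations)[:MAX_ARMS] if rotations else []
    return best + [(0, 0)] * (MAX_ARMS - len(best))
```

Because only angle differences enter the output, rotating the whole intersection leaves the result unchanged, up to the rounding of relative angles.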
Candidate OSM intersections are filtered hierarchically: first by approach count (exact match), then by traffic control type, and finally by star pattern distance (weighted Euclidean distance in 48D space). This reduces the search space from 5,904 candidates to a small shortlist before fine-grained matching.
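The ranking in the final filtering stage can be written as below, where $\mathbf{q}$ is the query's star pattern, $\mathbf{c}$ a surviving OSM candidate, and $w_i \ge 0$ the per-feature weights. The section does not give the weight values, so they are treated here as free, configurable parameters.

$$
d_w(\mathbf{q}, \mathbf{c}) = \sqrt{\sum_{i=1}^{48} w_i \,(q_i - c_i)^2}
$$

With all $w_i = 1$ this reduces to the ordinary Euclidean distance; a larger $w_i$ makes a mismatch in feature $i$ cost more when ranking candidates.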
The core innovation of WayGraph is the star pattern — a structured 48-dimensional vector that captures the topological fingerprint of an intersection and its immediate neighborhood. The name derives from the star-shaped pattern formed by the intersection center and its radiating approach arms.
Center intersection features (6):
Intersection type (signalized, stop-controlled, uncontrolled)
Number of approaches (3-way, 4-way, 5+ way)
Total lane count across all arms
Traffic control configuration
Symmetry indicators
Connectivity density
Per-arm features (7 per arm, for each of up to 6 arms):
Approach direction (angular encoding)
Road type (arterial, collector, local)
Inbound lane count
Outbound lane count
Traffic control at arm (signal phase, stop sign)
Speed limit category
Neighbor intersection distance and type
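Assembling the descriptor from the two feature groups above is simple concatenation: 6 center features plus 7 features for each of up to 6 arms gives 6 + 6 × 7 = 48 values. The sketch assumes every feature has already been encoded as a number and that unused arm slots are zero-filled. `build_star_pattern` and the example encodings are hypothetical.

```python
from typing import List, Sequence

N_CENTER, N_ARMS, N_ARM_FEATS = 6, 6, 7
DIM = N_CENTER + N_ARMS * N_ARM_FEATS  # 6 + 6 * 7 = 48


def build_star_pattern(center: Sequence[float], arms: List[Sequence[float]]) -> List[float]:
    """Concatenate 6 center features with up to 6 arms of 7 features each.
    Missing arms (e.g. at a 3-way intersection) are zero-padded."""
    if len(center) != N_CENTER:
        raise ValueError(f"expected {N_CENTER} center features, got {len(center)}")
    if len(arms) > N_ARMS:
        raise ValueError(f"at most {N_ARMS} arms are supported, got {len(arms)}")
    vec = [float(x) for x in center]
    for arm in arms:
        if len(arm) != N_ARM_FEATS:
            raise ValueError(f"expected {N_ARM_FEATS} features per arm, got {len(arm)}")
        vec.extend(float(x) for x in arm)
    vec.extend([0.0] * (N_ARM_FEATS * (N_ARMS - len(arms))))  # empty arm slots
    return vec


# Illustrative 3-way intersection; all numeric encodings are made up.
center = [1, 3, 8, 1, 0.5, 0.7]
arms = [[0.0, 2, 1, 1, 1, 2, 120.0],
        [90.0, 2, 1, 1, 0, 2, 80.0],
        [180.0, 2, 1, 1, 1, 2, 150.0]]
vec = build_star_pattern(center, arms)
print(len(vec))  # 48
```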
We evaluate the matching pipeline on San Francisco scenarios from the Waymo Open Motion Dataset, where ground-truth intersection locations are available for validation. The hierarchical filtering strategy progressively narrows candidates before computing star pattern distances.
Step 1 — Approach Count Filter: Eliminate all OSM intersections with a different number of approaches (e.g., 3-way vs 4-way). This alone reduces candidates by ~60%.
Step 2 — Traffic Control Filter: Keep only candidates whose traffic control configuration (signalized vs stop-controlled) matches the query. This reduces the remaining candidates by a further ~50%.
Step 3 — Star Pattern Distance: Compute weighted Euclidean distance in 48D space between the query and remaining candidates. Rank by distance and return top-K matches.
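The three steps map directly onto two list filters and a sort. The sketch below is a minimal illustration under stated assumptions: `Intersection`, `hierarchical_match`, the string labels for control type, and the uniform example weights are hypothetical, and the weights WayGraph actually uses are not given in this section.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Intersection:
    osm_id: int
    n_approaches: int        # 3, 4, 5, ...
    control: str             # hypothetical labels: "signal", "stop", "none"
    star: Sequence[float]    # star pattern vector (48-D in WayGraph)


def weighted_distance(q: Sequence[float], c: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Euclidean distance between two star pattern vectors."""
    return math.sqrt(sum(wi * (qi - ci) ** 2 for qi, ci, wi in zip(q, c, w)))


def hierarchical_match(query: Intersection, db: List[Intersection],
                       weights: Sequence[float], k: int = 5) -> List[Tuple[int, float]]:
    # Step 1: keep only candidates with exactly the same number of approaches.
    cands = [c for c in db if c.n_approaches == query.n_approaches]
    # Step 2: keep only candidates with the same traffic control type.
    cands = [c for c in cands if c.control == query.control]
    # Step 3: rank the survivors by weighted distance and return the top-K.
    scored = [(c.osm_id, weighted_distance(query.star, c.star, weights)) for c in cands]
    scored.sort(key=lambda t: t[1])
    return scored[:k]
```

If the two filters remove roughly 60% and then 50% of candidates as quoted above, about 5,904 × 0.4 × 0.5 ≈ 1,180 intersections reach the distance step, so Step 3 ranks on the order of a thousand vectors per query.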
Beyond intersection-level matching, WayGraph performs route-level localization by matching complete driving routes across consecutive scenarios. This validates geographic accuracy through spatial consistency of matched sequences.
Waymo extracts overlapping 9.1-second sliding windows from continuous recording drives. We discovered that scenarios within each shard share a global coordinate frame, so hundreds of these short clips can be chained back into continuous driving sequences of 13–18 seconds each. Across 252 shards, we found 483 temporal connections forming 369 chains, with ego positions in the overlap regions agreeing to sub-centimeter precision.
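Detecting that two scenarios come from the same drive reduces to finding a step offset at which their ego trajectories coincide; chaining then follows successor links. A minimal sketch, assuming ego positions are already extracted as (x, y) tuples in the shared per-shard frame and sampled at a common rate. `find_overlap`, `build_chains`, and the 10-step minimum overlap are hypothetical, while the 0.01 m tolerance mirrors the sub-centimeter agreement reported above.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def find_overlap(a: List[Point], b: List[Point], tol_m: float = 0.01,
                 min_overlap: int = 10) -> Optional[int]:
    """Return the step offset s such that b[i] coincides with a[s + i] (within
    tol_m) for every overlapping step, or None if no such offset exists."""
    for s in range(1, len(a) - min_overlap + 1):
        n = min(len(a) - s, len(b))
        if n < min_overlap:
            continue
        if all(math.dist(a[s + i], b[i]) < tol_m for i in range(n)):
            return s
    return None


def build_chains(successor: Dict[str, str]) -> List[List[str]]:
    """successor maps a scenario id to the id of the scenario that continues it.
    Assumes every scenario has at most one predecessor and one successor."""
    has_predecessor = set(successor.values())
    chains = []
    for head in successor:
        if head in has_predecessor:
            continue  # not the start of a chain
        chain = [head]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        chains.append(chain)
    return chains
```

With WOMD's 10 Hz sampling (91 steps per 9.1 s scenario), an offset of 40 steps would give a combined 13.1 s sequence.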
This is the key enabler for corridor-level traffic analysis: instead of treating each 9.1s scenario as an isolated snapshot, we can reconstruct extended driving trajectories along real road segments. It also provides independent validation of our star pattern matching. The chaining relies only on Waymo's shared coordinate frames, not on OSM, so when temporally consecutive scenarios independently match to adjacent OSM intersections, that agreement is strong evidence that the geographic localization is correct.
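The adjacency test can be sketched as a hop-limited breadth-first search over the OSM intersection graph, applied to the matches of each consecutive scenario pair; the fraction of pairs that pass is a connectivity rate. This is an illustrative sketch, assuming an undirected adjacency list keyed by intersection id (one-way restrictions ignored). `connected_within`, `connectivity_rate`, and the default of one hop are hypothetical choices; a baseline could be obtained by running the same function on randomly re-paired matches.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Graph = Dict[int, List[int]]  # OSM intersection id -> neighbouring intersection ids


def connected_within(g: Graph, src: int, dst: int, max_hops: int = 1) -> bool:
    """Breadth-first search that follows at most max_hops road segments."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in g.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def connectivity_rate(g: Graph, pairs: Iterable[Tuple[int, int]], max_hops: int = 1) -> float:
    """Fraction of (match_of_scene_A, match_of_scene_B) pairs that are connected."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(connected_within(g, a, b, max_hops) for a, b in pairs) / len(pairs)
```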
Figure: Pairs of overlapping 9.1s scenarios from the same continuous drive (one pair per panel). Orange = Scene A, blue = Scene B, green = overlap region where both scenes observe the same physical space.
Matched routes enable aggregation of traffic data at the corridor level, providing travel time reliability metrics and identifying speed transition zones across the road network.
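The section does not name the reliability metrics. As one standard choice, the sketch below computes the buffer index, (95th-percentile travel time - mean) / mean, and the planning time index, 95th-percentile travel time / free-flow travel time, from travel times pooled along a corridor. It also flags speed transition zones as large drops in mean speed between consecutive positions. The function names, the free-flow input, and the 5 m/s drop threshold are assumptions.

```python
import statistics
from typing import List, Sequence, Tuple


def reliability_metrics(travel_times_s: Sequence[float], free_flow_s: float) -> dict:
    """Buffer index and planning time index from corridor travel times (needs >= 2 samples)."""
    mean_tt = statistics.fmean(travel_times_s)
    # quantiles(n=100) returns the 1st..99th percentiles; index 94 is the 95th.
    p95 = statistics.quantiles(travel_times_s, n=100, method="inclusive")[94]
    return {
        "mean_s": mean_tt,
        "p95_s": p95,
        "buffer_index": (p95 - mean_tt) / mean_tt,
        "planning_time_index": p95 / free_flow_s,
    }


def speed_transitions(profile: Sequence[Tuple[float, float]],
                      min_drop_mps: float = 5.0) -> List[float]:
    """profile: (distance_m, mean_speed_mps) samples ordered along the corridor.
    Returns distances where mean speed drops by at least min_drop_mps."""
    return [d2 for (d1, v1), (d2, v2) in zip(profile, profile[1:]) if v1 - v2 >= min_drop_mps]
```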
By joining matched intersections with historical crash records, WayGraph enables safety analysis of the scenarios represented in the Waymo dataset, revealing systematic sampling biases.
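A minimal sketch of the join, assuming crash records and matched intersections share a planar coordinate system in metres. Each crash is assigned to the nearest matched intersection within a radius, and each intersection's share of scenarios is then divided by its share of crashes, so values well below 1 flag places the dataset under-samples relative to their crash history. The 30 m radius, the ratio itself, and both function names are hypothetical; the section does not specify how WayGraph performs the join.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def crashes_per_intersection(intersections: Dict[int, Point], crashes: List[Point],
                             radius_m: float = 30.0) -> Counter:
    """Assign each crash to the nearest intersection within radius_m."""
    counts = Counter()
    for crash in crashes:
        best_id, best_d = None, radius_m
        for iid, pos in intersections.items():
            d = math.dist(crash, pos)
            if d <= best_d:
                best_id, best_d = iid, d
        if best_id is not None:  # crashes farther than radius_m are dropped
            counts[best_id] += 1
    return counts


def coverage_ratio(scenario_counts: Counter, crash_counts: Counter) -> Dict[int, float]:
    """Scenario share / crash share per intersection with at least one crash."""
    total_s = sum(scenario_counts.values())
    total_c = sum(crash_counts.values())
    if total_s == 0 or total_c == 0:
        return {}
    return {iid: (scenario_counts[iid] / total_s) / (n / total_c)
            for iid, n in crash_counts.items() if n > 0}
```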
Once scenarios are localized to real-world intersections, WayGraph extracts detailed traffic data from the vehicle trajectories within each matched scenario. This bridges the gap between autonomous driving datasets and traditional traffic engineering, enabling data-driven calibration of microsimulation models.
Vehicle trajectories within each scenario are classified by movement type (left turn, through, right turn, U-turn) based on their entry and exit arms. Aggregating across multiple scenarios at the same intersection yields statistically robust turning ratio estimates.
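Given the bearings of a vehicle's entry and exit arms, the movement follows from the signed turn angle between the inbound heading and the outbound direction. The sketch assumes bearings measured counterclockwise from the +x axis and pointing outward from the intersection center (so a positive turn angle is a left turn), and uses 45° and 135° as illustrative thresholds. `classify_movement` and `turning_ratios` are hypothetical names, not WayGraph's API.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def classify_movement(entry_arm_deg: float, exit_arm_deg: float) -> str:
    """Arm bearings point outward from the centre, counterclockwise from +x."""
    heading_in = entry_arm_deg + 180.0  # the vehicle drives toward the centre
    turn = (exit_arm_deg - heading_in + 180.0) % 360.0 - 180.0  # signed, in [-180, 180)
    if abs(turn) < 45.0:
        return "through"
    if 45.0 <= turn < 135.0:
        return "left"
    if -135.0 < turn <= -45.0:
        return "right"
    return "u_turn"  # exit on (or nearly on) the entry arm


def turning_ratios(trips: Iterable[Tuple[int, float, float]]) -> Dict[int, Dict[str, float]]:
    """trips: (entry_arm_id, entry_arm_deg, exit_arm_deg), pooled across scenarios."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for arm_id, entry_deg, exit_deg in trips:
        counts[arm_id][classify_movement(entry_deg, exit_deg)] += 1
    return {arm: {mv: n / sum(c.values()) for mv, n in c.items()}
            for arm, c in counts.items()}
```

Pooling trips from several scenarios at the same intersection before calling `turning_ratios` is what yields the more stable estimates described above.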
Instantaneous and approach speeds are extracted from trajectory data, providing per-intersection speed profiles segmented by movement type and time of day. These distributions directly calibrate microsimulation speed parameters.
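Speeds can be derived from positions by first differences. The sketch assumes positions sampled at WOMD's 10 Hz rate with all states valid, and reports the mean and the 15th, 50th, and 85th percentile speeds per group (the 85th percentile is a common traffic-engineering reference speed). The grouping keys and function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

DT_S = 0.1  # WOMD tracks are sampled at 10 Hz


def speeds_from_positions(xy: Sequence[Tuple[float, float]], dt: float = DT_S) -> List[float]:
    """Instantaneous speed (m/s) from consecutive positions."""
    return [math.dist(p, q) / dt for p, q in zip(xy, xy[1:])]


def speed_profile(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """samples: speeds pooled per group key, e.g. 'left' or 'through_pm_peak'."""
    out = {}
    for key, v in samples.items():
        if len(v) < 2:
            continue  # quantiles need at least two samples
        q = statistics.quantiles(v, n=20, method="inclusive")  # 5% steps
        out[key] = {"mean": statistics.fmean(v), "p15": q[2], "p50": q[9], "p85": q[16]}
    return out
```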
For unprotected movements that must yield to conflicting traffic (e.g., left turns at signals during permissive phases), gap acceptance behavior is extracted by analyzing the time headways between conflicting vehicles and each driver's decision to proceed or yield.
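Once each conflicting gap has been labelled accepted or rejected, a critical gap can be estimated. The sketch below swaps in Raff's method, a classic estimator that finds the time t at which the number of accepted gaps shorter than t matches the number of rejected gaps longer than t. The section does not say which estimator WayGraph uses, and the grid step and function name are assumptions.

```python
from typing import Sequence


def raff_critical_gap(accepted_s: Sequence[float], rejected_s: Sequence[float],
                      step_s: float = 0.05) -> float:
    """Raff's method on a time grid: the smallest grid time t at which the number
    of accepted gaps shorter than t reaches the number of rejected gaps longer than t."""
    if not accepted_s or not rejected_s:
        raise ValueError("need at least one accepted and one rejected gap")
    t_max = max(max(accepted_s), max(rejected_s))
    for k in range(int(t_max / step_s) + 2):
        t = k * step_s  # multiply rather than accumulate to avoid float drift
        accepted_below = sum(g < t for g in accepted_s)
        rejected_above = sum(g > t for g in rejected_s)
        if accepted_below >= rejected_above:
            return t
    return t_max
```

The first count only grows with t and the second only shrinks, so the first grid point where they cross is well defined.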
To verify matching accuracy beyond quantitative metrics, we visually validated 50 randomly selected scenarios against satellite imagery from Google Maps.
WayGraph is released as an open-source Python toolkit with a modular architecture. Each stage of the pipeline is implemented as a standalone module, allowing researchers to use individual components (e.g., just the topology extraction, or just the OSM matching) independently.
Parse WOMD protobuf files and extract lane graphs, vehicle trajectories, and traffic signal states into a unified data structure.
Identify intersection centers, extract approach arms, compute lane counts, and classify traffic control types from raw lane graph data.
Compute the 48D star pattern vector for any intersection, with configurable feature weights and normalization options.
Download and process OSM road networks for any geographic area, build an intersection database with star pattern fingerprints, and perform hierarchical matching.
Extract turning ratios, speed distributions, and gap acceptance parameters from matched scenarios, with export to standard traffic engineering formats.
WayGraph has been validated on scenarios from the two major metropolitan areas covered by the Waymo Open Motion Dataset: San Francisco and Phoenix. The San Francisco region provides the densest coverage with 5,904 OSM intersections in the matching database, while Phoenix extends coverage to a different urban morphology with wider roads and grid-pattern intersections.