Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction
Description
Understanding drivers’ decision-making is crucial for road safety. Although predicting the ego-vehicle’s path is valuable for driver-assistance systems, existing methods mainly focus on external factors like other vehicles’ motions, often neglecting the driver’s attention and intent. To address this gap, we infer the ego-trajectory by integrating the driver’s gaze and the surrounding scene. We introduce RouteFormer, a novel multimodal ego-trajectory prediction network combining GPS data, environmental context, and driver field-of-view, comprising first-person video and gaze fixations. We also present the Path Complexity Index (PCI), a new metric for trajectory complexity that enables a more nuanced evaluation of challenging scenarios. To tackle data scarcity and enhance diversity, we introduce GEM, a comprehensive dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data. Extensive evaluations on GEM and DR(eye)VE demonstrate that RouteFormer significantly outperforms state-of-the-art methods, achieving notable improvements in prediction accuracy across diverse conditions. Ablation studies reveal that incorporating driver field-of-view data yields significantly better average displacement error, especially in challenging scenarios with high PCI scores, underscoring the importance of modeling driver attention. All data and code are available at meakbiyik.github.io/routeformer.
Date of Publication
2025-01-22
Publication Type
Conference Item
Keyword(s)
Ego-trajectory prediction • driver attention • multimodal learning • field-of-view • gaze fixations • deep learning • autonomous driving • driver behavior modeling • dataset creation
Language(s)
en
Contributor(s)
Akbiyik, M. Eren
Savov, Nedko
Paudel, Danda Pani
Popovic, Nikola
Hilliges, Otmar
Van Gool, Luc
Wang, Xi
Publisher
Cornell University
Access(Rights)
restricted