AN EMPIRICAL ANALYSIS OF TRAJECTORY
PREDICTION TECHNIQUES FOR MOTION
PREDICTION IN WAYMO DATASET
Devansh Arora
Indraprastha Institute of Information Technology (IIIT) Delhi, India
devansh20053@iiitd.ac.in
Parul Arora
Dept. of Computer Science and Applications. Bharati Vidyapeeth’s Institute of
Computer Applications and Management (BVICAM). Delhi, India
paruldevsum@gmail.com
Ritika Wason*
Dept. of Computer Science and Applications. Bharati Vidyapeeth’s Institute of
Computer Applications and Management (BVICAM). Delhi, India
ritika.wason@bvicam.in
Reception: 15/02/2023 Acceptance: 21/04/2023 Publication: 10/07/2023
Suggested citation:
Devansh, A., Parul, A. And Ritika, W. (2023). An Empirical Analysis of
Trajectory Prediction Techniques for Motion Prediction in Waymo
Dataset. 3C Tecnología. Glosas de innovación aplicada a la pyme, 12(2),
49-63. https://doi.org/10.17993/3ctecno.2023.v12n2e44.49-63
https://doi.org/10.17993/3ctecno.2023.v12n2e44.49-63
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254-4143
Ed.44 | Iss.12 | N.2 April - June 2023
ABSTRACT
The Waymo Open Dataset is the largest and most diverse autonomous driving dataset,
and it is extended every year. Motion prediction remains a considerable challenge in 2023.
This manuscript analyses five notable methods, namely MTR-A, Wayformer,
DenseTNT, Golfer and MultiPath++, with respect to the techniques they apply. The analysis
revealed that Transformer networks can achieve state-of-the-art trajectory
prediction and scale to many workloads.
KEYWORDS
Trajectory Prediction, Waymo Dataset, Motion Prediction, Transformer Network,
Autonomous Driving.
INDEX
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. ADDING LABELS AND CHALLENGES TO WAYMO OPEN DATASET.
3. WAYMO OPEN DATASET: MOTION PREDICTION CHALLENGE
4. LEADERBOARD BEST SOLUTIONS
5. RESULTS
6. CONCLUSION
7. FUTURE SCOPE
REFERENCES
ABOUT THE AUTHORS
1. INTRODUCTION
Google, Uber, Tesla, Mobileye, and numerous automakers have recently made
substantial investments in autonomous driving systems, a futuristic application [7]. Autonomous
driving technology allows a car to drive itself without human assistance [15]. A car
with autonomous driving capability senses its surroundings, determines its position,
and drives safely to a given destination without human intervention [27]. Demand for
this technology continues to rise, resulting in increased industry investment [17].
Mobileye is a pioneer in computer-vision-based autonomous driving technology, and
Intel acquired the company in 2017 for $15.3 billion. Forecasts indicate that by 2035
the market for driverless vehicles will be worth $77 billion [4]. The number of
autonomous vehicles is expected to reach 18 million by 2035, representing 25% of
the market [3].
From robotaxis to self-driving trucks, autonomous driving technology is expected
to enable a vast array of applications with the potential to save numerous lives
[1],[18]. The public availability of large-scale datasets and benchmarks has led to
substantial progress in image classification, object detection, object tracking,
semantic segmentation, and instance segmentation. The Waymo Open Dataset, the
largest and most diverse multimodal autonomous driving dataset to date, consists of
images from multiple high-resolution cameras and sensor readings from multiple
high-quality LiDAR scanners mounted on a fleet of autonomous vehicles [6].
Compared with other autonomous driving datasets, it captures a far wider
geographical range, both in overall area covered and in the distribution of that
coverage across geographies [13]. Data were recorded in several cities, including
San Francisco, Phoenix, and Mountain View, under a variety of environmental
conditions and over a large geographical area within each city [5],[13],[20]. The
differences between these regions result in a significant domain gap, which opens up
research opportunities in the field of domain adaptation [6].
Both 3D ground truth bounding boxes for the LiDAR data and tightly fitting 2D
bounding boxes for the camera images are included in the Waymo dataset in large
numbers [12]. All ground truth boxes carry track IDs to support object tracking [26].
Finally, with the provided rolling-shutter-aware projection software, researchers can
derive 2D amodal camera boxes from the 3D LiDAR boxes [2]. The multimodal ground
truth supports studies that combine LiDAR and camera annotations. There are about
12 million LiDAR box annotations, about 12 million camera box annotations, about
113 thousand LiDAR object tracks, and about 250 thousand camera image tracks [10].
Professional labelers created and verified all annotations using production-level
labelling tools. All sensor data in the dataset were captured with an
industrial-strength sensor suite consisting of multiple high-resolution cameras and
multiple high-quality LiDAR sensors. Moreover, the camera and LiDAR data are
synchronized, which enables cross-domain learning and transfer [2]. The range images
also include, for every pixel, accurate vehicle pose information in addition to
sensor attributes such as elongation. Since this is the first dataset to offer such
low-level, synchronized information, it will facilitate studies of alternative LiDAR input formats to the
standard 3D point set format [6],[8]. The dataset currently contains 1,000 scenes for
training and validation and 150 scenes for testing; each scene lasts 20 seconds [6].
To assess how well models trained on the dataset generalize to new environments,
test-set scenes can be drawn from a geographical holdout area [24],[26].
2. ADDING LABELS AND CHALLENGES TO WAYMO
OPEN DATASET.
To broaden the scope of academic research, new labels have been added to the
Waymo Open Dataset [6]. The extension includes the following. Key point labels and
pose estimation can usefully extend perception and behavior prediction models,
allowing them to pick up subtle cues such as a cyclist signaling a turn. The key
point label release is the largest dataset of its kind that is freely accessible for
research into autonomous vehicles, and it gives the wider research community an
opportunity to advance the field of human pose estimation.
Although segmentation has long been recognized as a valuable tool in the
academic world, the vast majority of publicly available datasets for autonomous
driving provide only bounding boxes to localize and categorize objects, which can
omit critical information. Segmentation labelling assigns every pixel in an image, or
every point in a LiDAR point cloud, to a particular object class [11]. The Waymo
Open Dataset now offers this level of granularity through 3D segmentation labels for
23 classes across 1,150 segments [6],[17].
Matching the 2D camera bounding boxes with their 3D equivalents in the LiDAR
labels can be confusing and time-consuming. To promote further research on sensor
fusion for object recognition and detection, labels providing the 2D-to-3D bounding
box correspondence have been added.
Along with these new labels, Waymo has also launched the 2023 Waymo Open
Dataset Challenges, in which participants forecast the positions of up to eight
agents eight seconds into the future using only the agents' one-second track
histories and an associated map [14],[23],[25].
3. WAYMO OPEN DATASET: MOTION PREDICTION
CHALLENGE
The ability to predict the behavior of other road users is essential for safe and
successful driving [25]. Typical questions include: Is that pedestrian about to cross
the street? Is that car parallel parked, or is it about to enter my lane? Will the
speeding car roll through the stop sign? One of the most demanding
aspects of autonomous driving is accurately predicting the behavior of other road
users. Safety is also at stake: precisely predicting the actions of other drivers is
crucial for avoiding collisions. While researchers in the field of autonomous
vehicles have made significant strides in recent years on the problem of motion
prediction, the field would benefit from access to even more high-quality
open-source motion data.
To the best of our knowledge, the Waymo Open Motion Dataset is the largest
interactive dataset released to date for the study of behavior prediction and motion
forecasting for autonomous driving, and it is extended every year. To help research
groups that want to construct their own high-quality motion data, we review the
articles describing the state-of-the-art perception methods used to annotate the
motion dataset. High-quality motion data are difficult to come by and often
expensive to obtain.
An advanced perception system is needed to build a motion dataset with
high-quality labels, because it must reliably identify agents and objects from
camera and LiDAR data and track their movement through the scene. Collecting
compelling motion data is similarly difficult. Most drives are uneventful and
therefore provide little useful information for developing a system that anticipates
what can happen on the road in unusual circumstances. As a result, publicly
available datasets usually contain only a few interesting interactions.
The Waymo Open Dataset is designed to address these issues. The task is to
predict the positions of up to eight agents eight seconds into the future, given
their tracks over the preceding one second and a corresponding map. The future
ground truth for the test set is hidden from challenge participants, so the test set
contains only one second of history. The validation set contains the future ground
truth for use in model development. In addition, the test and validation sets list
up to eight object tracks per scene that are to be predicted. These are chosen for
their interesting behavior and to cover a variety of object types.
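To make the shape of the task concrete, the following minimal sketch extrapolates each agent's last observed velocity over the eight-second horizon. It is an illustrative baseline only, not one of the methods analysed here. The 10 Hz history rate and the function name `constant_velocity_forecast` are assumptions made for this example; the 2 Hz output rate follows the submission format described in Section 4.

```python
import numpy as np

HISTORY_HZ = 10       # assumption: history sampled at 10 Hz (not stated in the text)
FUTURE_SECONDS = 8    # prediction horizon of the challenge
OUTPUT_HZ = 2         # output sampling rate of submitted trajectories


def constant_velocity_forecast(history_xy: np.ndarray) -> np.ndarray:
    """Extrapolate each agent's last observed velocity over the horizon.

    history_xy: shape (num_agents, num_history_steps, 2), oldest step first.
    Returns an array of shape (num_agents, FUTURE_SECONDS * OUTPUT_HZ, 2).
    """
    if history_xy.ndim != 3 or history_xy.shape[-1] != 2:
        raise ValueError("history_xy must have shape (agents, steps, 2)")
    if history_xy.shape[1] < 2:
        raise ValueError("need at least two history steps to estimate velocity")

    dt = 1.0 / HISTORY_HZ
    # Velocity from the two most recent observations, shape (agents, 2).
    velocity = (history_xy[:, -1] - history_xy[:, -2]) / dt
    # Future time offsets in seconds: 0.5, 1.0, ..., 8.0.
    horizons = np.arange(1, FUTURE_SECONDS * OUTPUT_HZ + 1) / OUTPUT_HZ
    last_position = history_xy[:, -1, None, :]  # (agents, 1, 2)
    return last_position + velocity[:, None, :] * horizons[None, :, None]
```

A baseline of this kind produces a single trajectory per agent and ignores the map and the other agents, which is precisely the information the leaderboard methods are designed to exploit.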
4. LEADERBOARD BEST SOLUTIONS
Each Scenario Predictions proto within a motion prediction submission corresponds
to a single scenario in the test set and contains up to eight predictions for the
objects listed in the tracks_to_predict field of the scenario [19],[9]. Because the
forecasts are made for individual objects, each Joint Predictions proto contains a
prediction for a single object. Each Multi Modal Prediction proto includes a maximum
of six trajectory predictions, each accompanied by a confidence value. Each
trajectory forecast must include exactly 16 position samples covering the next 8
seconds at a rate of 2 Hz.
Wayformer consists of a simple attention-based scene encoder and decoder [16].
Nayakanti et al. study early, late, and hierarchical input
fusion [16]. For each fusion type, factorized attention or latent query attention is
used to balance efficiency and quality. Their results indicate that early fusion,
despite its simplicity, is modality agnostic and performs at the top of the Waymo
Open Motion Dataset (WOMD) and Argoverse leaderboards.
Shi et al. propose the Motion Transformer framework for multimodal motion
prediction, which introduces a small set of novel motion query pairs that produce
better multimodal future trajectories by performing intention localization and
iterative motion refinement jointly [19].
Varadarajan et al. use compact polylines to describe road features and consume
raw agent state information (e.g., position, velocity, acceleration) directly [22].
They revisit pre-defined, static anchors and develop a model that learns latent
anchor embeddings end-to-end. They also adopt ensembling and output aggregation
techniques from other areas of machine learning to find suitable probabilistic
multimodal output representations.
Zhang et al. introduce a real-time 2D object detection algorithm for images [25].
They aggregate several popular one-stage object detectors and independently train
models with various input strategies to improve multi-scale detection of each
category, particularly small objects. TensorRT is used to optimize the inference
time of the detection pipeline.
Gu et al. propose an anchor-free model, DenseTNT, which performs dense goal
probability estimation for trajectory prediction [9]. Because it does not rely on
the quality of heuristically predefined goal anchors, its performance improves
substantially. In the next section we compare and analyze the leaderboard solutions
and identify research areas where further work can be done.
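The submission constraints described in this section, together with the minADE and minFDE metrics named in Table 1, can be sketched as follows. This is a minimal illustration under stated assumptions: `ScoredTrajectory`, `validate_scenario`, and `min_ade_fde` are hypothetical names introduced for this example and are not the challenge's proto types or its official metric code, which may additionally handle validity masks and per-horizon evaluation.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

MAX_AGENTS = 8         # up to eight predicted objects per scenario
MAX_TRAJECTORIES = 6   # up to six trajectories per object
NUM_SAMPLES = 16       # 8 seconds sampled at 2 Hz


@dataclass
class ScoredTrajectory:
    """Hypothetical container for one predicted trajectory (illustration only)."""
    xy: np.ndarray      # positions, shape (NUM_SAMPLES, 2)
    confidence: float   # confidence value attached to this trajectory


def validate_scenario(predictions: List[List[ScoredTrajectory]]) -> None:
    """Raise ValueError if the per-agent predictions exceed the stated limits."""
    if len(predictions) > MAX_AGENTS:
        raise ValueError(f"at most {MAX_AGENTS} agents per scenario")
    for trajectories in predictions:
        if not 1 <= len(trajectories) <= MAX_TRAJECTORIES:
            raise ValueError(f"each agent needs 1 to {MAX_TRAJECTORIES} trajectories")
        for trajectory in trajectories:
            if trajectory.xy.shape != (NUM_SAMPLES, 2):
                raise ValueError(f"each trajectory needs shape ({NUM_SAMPLES}, 2)")


def min_ade_fde(trajectories: List[ScoredTrajectory],
                ground_truth: np.ndarray) -> Tuple[float, float]:
    """Return (minADE, minFDE) for one agent under the commonly used definitions.

    minADE: smallest mean point-wise distance over the candidate trajectories.
    minFDE: smallest distance at the final time step over the candidates.
    """
    stacked = np.stack([t.xy for t in trajectories])                 # (K, 16, 2)
    errors = np.linalg.norm(stacked - ground_truth[None], axis=-1)   # (K, 16)
    return float(errors.mean(axis=1).min()), float(errors[:, -1].min())
```

Because both metrics take the minimum over the candidates, a method is rewarded for covering the true outcome with at least one of its six trajectories, which is why the selection of those six from a larger candidate set matters.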
5. RESULTS
Table 1 below highlights the research gaps we identified in our analysis of the five
methodologies above. These gaps lay the ground for considerable future research.
TABLE 1. Comparison of Considerable Leaderboard Solutions for Motion Prediction in
Waymo Dataset
S. No.: 1
Title: MTR-A: 1st Place Solution for 2022 Waymo Open Dataset Challenge - Motion Prediction
Ref.: [19]
Findings: The authors introduce the Motion Transformer, a novel architecture for multimodal motion prediction that uses simultaneous intention localization and iterative motion refinement to generate better multimodal future trajectories. To further improve the performance of the final model, a basic model ensemble technique with non-maximum suppression is employed. The approach ranked first on the leaderboard and outperformed all other submissions in terms of Soft mAP, mAP, and miss rate, which indicates that the method is better at predicting multimodal future trajectories.
Research Gaps: Agent-centric modelling forecasts the multimodal future trajectories of a single agent of interest while redundantly encoding the scene for each additional agent of interest. Building a multimodal motion prediction system that handles several agents jointly therefore remains an open problem. Even with a rule-based post-processing method, accuracy in terms of minADE/minFDE can be low. A more robust framework would learn how to produce 6 future trajectories from the full set of multimodal predictions (e.g., 64 predictions).
S. No.: 2
Title: DenseTNT: Waymo Open Dataset Motion Prediction
Ref.: [9]
Findings: DenseTNT is an anchor-free model that performs dense goal probability estimation for trajectory prediction. The authors first extract sparse scene context features and then apply dense probability estimation to construct the probability distribution over the goal candidates. A trajectory completion module then generates trajectories from a set of selected goals. The goal candidates are densely distributed over the map, and the paper visualizes the probabilities of the dense goals together with the trajectories predicted from the selected goals. DenseTNT produces diverse predictions, including going straight, turning left or right, and making U-turns.
Research Gaps: Complex trajectory generation in DenseTNT is computationally intensive and time-consuming, especially in dynamic environments with moving obstacles, which may limit its usefulness in real-time settings. In addition, DenseTNT is highly sensitive to the initial conditions: even a small shift in the starting position of the robot or the obstacles can lead to a dramatically different path. This can be problematic in applications where the initial conditions are uncertain or may change during execution of the plan.
S. No.: 3
Title: Wayformer: Motion Forecasting via Simple & Efficient Attention Networks
Ref.: [16]
Findings: The paper introduces Wayformer, a simple and unified family of attention-based architectures for motion prediction. The core of the Wayformer model is an attention-based scene encoder and decoder. The authors explore early, late, and hierarchical input fusion in the scene encoder, and for every fusion type they examine how to balance speed and accuracy using either factorized or latent query attention. The results obtained by Wayformer on the Waymo Open Motion Dataset (WOMD) and the Argoverse leaderboards validate the effectiveness of this design philosophy and show that early fusion is not only modality agnostic but also delivers state-of-the-art outcomes.
Research Gaps: The scope of the investigation is limited in the following ways. Processing the same data repeatedly is a burden for ego-centric modelling in complex settings; this can be avoided by encoding the scene only once, in a global reference frame. The input to the system is a coarse, generalized description of the world, which leaves out important details in complex situations, such as human gaze cues or fine-grained contour and wheel angle information for vehicles. A combined understanding of perception and prediction could pave the way for progress. The distribution over possible futures is modelled separately for each agent, and for each agent it is modelled as conditionally independent in time and space given the agent's goal. These
S. No.: 4
Title: Golfer: Trajectory Prediction with Masked Goal Conditioning MnM Network
Ref.: [21]
Findings: For AV trajectory prediction, the authors propose a general Transformer-like architectural module, the MnM network, together with novel masked goal conditioning training methods. The resulting MnM network, which consists solely of MnM blocks stacked on top of one another, is reported to be superior, as it can predict trajectories from point-like agent and road inputs. On May 23, 2022, the authors' trajectory prediction model, named Golfer and enhanced with the new masked goal conditioning and MnM network, ranked second on the Waymo Open Motion Dataset leaderboard.
Research Gaps: The proposed Mix and Match (MnM) block, a general kind of set transformation, has been shown to be particularly useful for learning cross-correlations between items in a set. This building block may not, however, be suitable for all circumstances.
S. No.: 5
Title: Multipath++: efficient information fusion and trajectory aggregation for behavior prediction
Ref.: [20]
Findings: The framework can cope with the problem of a multimodal output space by using a Gaussian Mixture Model to characterize the highly multimodal output distributions. With the help of static trajectory anchors, an external input to the model, this method can overcome the common problem of mode collapse in the learning process. This useful technique provides experts with a fundamental strategy for guaranteeing consistency and an extra measure of control for modelers through the creation of such The model performs at the state-of-the-art level in both the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge. The authors show that sparse encoding, efficient fusion methods, control-based approaches, and learned anchors are all crucial. Furthermore, they provide practical guidance on training and inference procedures that enhance robustness, diversity, missing data handling, and training convergence speed.
Research Gaps: Multipath++ can only predict a path a few seconds into the future. While this may be sufficient for some applications, others may require more advanced prediction techniques.
6. CONCLUSION
MTR-A, Golfer and Wayformer underline that Transformers can be trained
substantially faster than designs based on recurrent or convolutional layers. The
Transformer uses multi-head attention in three distinct ways. In encoder-decoder
attention layers, the memory keys and values are produced by the encoder, while the
queries come from the previous decoder layer. This allows every position in the
decoder to attend to the entire input sequence, mirroring the attention mechanisms
of typical encoder-decoder sequence-to-sequence models. The encoder contains
self-attention layers, in which all keys, values, and queries come from the output
of the previous encoder layer, so each position in the encoder can attend to all
positions in the previous layer. Similarly, self-attention layers in the decoder
allow each position in the decoder to attend to all decoder positions up to and
including itself. To preserve the auto-regressive property, leftward information
flow in the decoder must be blocked.
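The attention patterns summarized above can be illustrated with a single-head scaled dot-product attention in NumPy. This is a minimal sketch under stated assumptions, not the implementation used by any of the analysed methods: the function name `scaled_dot_product_attention` is introduced for this example, and learned projections and multiple heads are omitted. Passing the same array as queries, keys, and values gives self-attention; passing decoder states as queries and encoder outputs as keys and values gives encoder-decoder attention; setting `causal=True` applies the mask that keeps decoder self-attention auto-regressive.

```python
import numpy as np


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Single-head attention without learned projections (illustration only).

    queries: (num_queries, d_k); keys: (num_keys, d_k); values: (num_keys, d_v).
    Returns (output, weights) with shapes (num_queries, d_v) and
    (num_queries, num_keys).
    """
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    if causal:
        # Position i may only attend to positions j <= i.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights
```

The returned weight matrix also shows which input positions contributed most to each output position, which is the basis for the interpretability argument made for attention-based models.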
7. FUTURE SCOPE
Based on our analysis, we conclude that Transformer networks whose baseline
architecture has been modified in terms of input encodings and overall model design
produce the best results. Thanks to their attention mechanisms, Transformers also
make it possible to interpret which parts of the input sequence are most crucial to
generating the output. These properties allow Transformers to achieve
state-of-the-art results in trajectory prediction and to scale to a wide range of tasks.
REFERENCES
(1) Bansal, P., & Kockelman, K. M. (2017). Forecasting Americans’ long-term
adoption of connected and autonomous vehicle technologies. Transportation
Research Part A: Policy and Practice, 95, 49–63. https://doi.org/10.1016/
J.TRA.2016.10.013
(2) Chong, Y. L., Lee, C. D. W., Chen, L., Shen, C., Chan, K. K. H., & Ang, M. H.
(2022). Online Obstacle Trajectory Prediction for Autonomous Buses. Machines,
10(3), 1–19. https://doi.org/10.3390/machines10030202
(3) Clements, L. M., & Kockelman, K. M. (2017). Economic Effects of Automated
Vehicles. Https://Doi.Org/10.3141/2606-14, 2606(1), 106–114. https://doi.org/
10.3141/2606-14
(4) Cohen, T., & Rabinovitch, A. L. (2017). Intel’s $15 billion purchase of Mobileye
shakes up driverless car sector | Reuters. Technology, Media & Telecom-
Innovation. https://www.reuters.com/article/us-intel-mobileye-idUSKBN16K0ZP
(5) CVPR 2020 Open Access Repository. (n.d.). Retrieved March 20, 2023, from
https://openaccess.thecvf.com/content_CVPR_2020/html/
Sun_Scalability_in_Perception_for_Autonomous_Driving_Waymo_Open_Datase
t_CVPR_2020_paper.html
(6) Ettinger, S., Cheng, S., Caine, B., Liu, C., Zhao, H., Pradhan, S., Chai, Y., Sapp,
B., Qi, C., Zhou, Y., Yang, Z., Chouard, A., Sun, P., Ngiam, J., Vasudevan, V.,
McCauley, A., Shlens, J., & Anguelov, D. (2021). Large Scale Interactive Motion
Forecasting for Autonomous Driving: The Waymo Open Motion Dataset.
Proceedings of the IEEE International Conference on Computer Vision, 9690–
9699. https://doi.org/10.1109/ICCV48922.2021.00957
(7) Fagnant, D. J., & Kockelman, K. (2015). Preparing a nation for autonomous
vehicles: opportunities, barriers and policy recommendations. Transportation
Research Part A: Policy and Practice, 77, 167–181.
https://doi.org/10.1016/J.TRA.2015.04.003
the previous layer's encoder is used as the source for all keys, values, and queries.
The encoder's architecture allows for all of the previous layer's positions to be
serviced from any given place. Like the encoder, the decoder has self-attention layers
that allow any location in the decoder to pay attention to all other positions. The auto-
regressive property can only be preserved by blocking leftward information flow in the
decoder.
7. FUTURE SCOPE
Based on what we learned from our analysis, we conclude that Transformers
networks, modified to improve their baseline architecture of input encodings and
overall models, produce the best results. With transformers, one can interpret which
parts of the input sequence are most crucial to generating the output thanks to their
attention mechanisms. This allows transformers to achieve state-of-the-art results in
the case of trajectory prediction and scale to a wide range of tasks.
REFERENCES
(1) Bansal, P., & Kockelman, K. M. (2017). Forecasting Americans’ long-term
adoption of connected and autonomous vehicle technologies. Transportation
Research Part A: Policy and Practice, 95, 4963. https://doi.org/10.1016/
J.TRA.2016.10.013
(2) Chong, Y. L., Lee, C. D. W., Chen, L., Shen, C., Chan, K. K. H., & Ang, M. H.
(2022). Online Obstacle Trajectory Prediction for Autonomous Buses. Machines,
10(3), 119. https://doi.org/10.3390/machines10030202
(3) Clements, L. M., & Kockelman, K. M. (2017). Economic Effects of Automated
Vehicles. Https://Doi.Org/10.3141/2606-14, 2606(1), 106–114. https://doi.org/
10.3141/2606-14
(4) Cohen, T., & Rabinovitch, A. L. (2017). Intels $15 billion purchase of Mobileye
shakes up driverless car sector | Reuters. Technology, Media & Telecom-
Innovation. https://www.reuters.com/article/us-intel-mobileye-idUSKBN16K0ZP
(5) CVPR 2020 Open Access Repository. (n.d.). Retrieved March 20, 2023, from
https://openaccess.thecvf.com/content_CVPR_2020/html/
Sun_Scalability_in_Perception_for_Autonomous_Driving_Waymo_Open_Datase
t_CVPR_2020_paper.html
(6) Ettinger, S., Cheng, S., Caine, B., Liu, C., Zhao, H., Pradhan, S., Chai, Y., Sapp,
B., Qi, C., Zhou, Y., Yang, Z., Chouard, A., Sun, P., Ngiam, J., Vasudevan, V.,
McCauley, A., Shlens, J., & Anguelov, D. (2021). Large Scale Interactive Motion
Forecasting for Autonomous Driving: The WAYMO OPEN MOTION DATASET.
Proceedings of the IEEE International Conference on Computer Vision, 9690–
9699. https://doi.org/10.1109/ICCV48922.2021.00957
(7) Fagnant, D. J., & Kockelman, K. (2015). Preparing a nation for autonomous
vehicles: opportunities, barriers and policy recommendations. Transportation
Research Part A: Policy and Practice, 77, 167181. https://doi.org/10.1016/
J.TRA.2015.04.003
https://doi.org/10.17993/3ctecno.2023.v12n2e44.49-63
(8) Gressenbuch, L., Esterle, K., Kessler, T., & Althoff, M. (2022). MONA: The
Munich Motion Dataset of Natural Driving. IEEE Conference on Intelligent
Transportation Systems, Proceedings, ITSC, 2022-October, 2093–2100.
https://doi.org/10.1109/ITSC55140.2022.9922263
(9) Gu, J., Sun, Q., & Zhao, H. (2021). DenseTNT: Waymo Open Dataset Motion
Prediction Challenge 1st Place Solution. 1–5. http://arxiv.org/abs/2106.14160
(10) Hu, X., Zheng, Z., Chen, D., Zhang, X., & Sun, J. (2022). Processing, assessing,
and enhancing the Waymo autonomous vehicle open dataset for driving
behavior research. Transportation Research Part C: Emerging Technologies,
134(December). https://doi.org/10.1016/j.trc.2021.103490
(11) Hula, A., de Zwart, R., Mons, C., Weijermars, W., Boghani, H., & Thomas, P.
(2023). Using reaction times and accident statistics for safety impact prediction
of automated vehicles on road safety of vulnerable road users. Safety Science,
162. https://doi.org/10.1016/j.ssci.2023.106091
(12) LaMondia, J. J., Fagnant, D. J., Qu, H., Barrett, J., & Kockelman, K. (2016).
Shifts in long-distance travel mode due to automated vehicles: Statewide mode-
shift simulation experiment and travel survey analysis. Transportation Research
Record, 2566, 1–10. https://doi.org/10.3141/2566-01
(13) Leon, F., & Gavrilescu, M. (2021). A review of tracking and trajectory prediction
methods for autonomous driving. Mathematics, 9(6), 660.
https://doi.org/10.3390/math9060660
(14) Mahmoud, A., Hu, J. S. K., & Waslander, S. L. (2023). Dense Voxel Fusion for
3D Object Detection (pp. 663–672).
(15) May, A. D., Shepherd, S., Pfaffenbichler, P., & Emberger, G. (2020). The
potential impacts of automated cars on urban transport: An exploratory analysis.
Transport Policy, 98, 127–138. https://doi.org/10.1016/j.tranpol.2020.05.007
(16) Nayakanti, N., Al-Rfou, R., Zhou, A., Goel, K., Refaat, K. S., & Sapp, B. (2022).
Wayformer: Motion Forecasting via Simple & Efficient Attention Networks. 1–20.
http://arxiv.org/abs/2207.05844
(17) Notz, D., Becker, F., Kuhbeck, T., & Watzenig, D. (2020). Extraction and
Assessment of Naturalistic Human Driving Trajectories from Infrastructure
Camera and Radar Sensors. IEEE International Conference on Automation
Science and Engineering, 2020-August, 455–462.
https://doi.org/10.1109/CASE48305.2020.9216992
(18) Shaheen, S. A., Cohen, A. P., & Martin, E. (2010). Carsharing parking policy.
Transportation Research Record, 2187, 146–156.
https://doi.org/10.3141/2187-19
(19) Shi, S., Jiang, L., Dai, D., & Schiele, B. (2022). MTR-A: 1st Place Solution for
2022 Waymo Open Dataset Challenge -- Motion Prediction.
http://arxiv.org/abs/2209.10033
(20) Sun, P., Kretzschmar, H., Dotiwalla, X., Chouard, A., Patnaik, V., Tsui, P., Guo,
J., Zhou, Y., Chai, Y., Caine, B., Vasudevan, V., Han, W., Ngiam, J., Zhao, H.,
Timofeev, A., Ettinger, S., Krivokon, M., Gao, A., Joshi, A., … Anguelov, D.
(2020). Scalability in Perception for Autonomous Driving: Waymo Open Dataset
(pp. 2446–2454). http://www.waymo.com/open
(21) Tang, X., Eshkevari, S. S., Chen, H., Wu, W., Qian, W., & Wang, X. (2022).
Golfer: Trajectory Prediction with Masked Goal Conditioning MnM Network. 1–4.
Retrieved from http://arxiv.org/abs/2207.00738
(22) Varadarajan, B., Hefny, A., Srivastava, A., Refaat, K. S., Nayakanti, N.,
Cornman, A., Chen, K., Douillard, B., Lam, C. P., Anguelov, D., & Sapp, B.
(2022). MultiPath++: Efficient Information Fusion and Trajectory Aggregation for
Behavior Prediction. Proceedings - IEEE International Conference on Robotics
and Automation, 7814–7821. https://doi.org/10.1109/ICRA46639.2022.9812107
(23) WACV 2023 Open Access Repository. (n.d.). Retrieved March 20, 2023, from
https://openaccess.thecvf.com/content/WACV2023/html/Mahmoud_Dense_Voxel_Fusion_for_3D_Object_Detection_WACV_2023_paper.html
(24) Wang, J. (2019). Estimation And Tracking Algorithm For Autonomous Vehicles
And Humans.
(25) Wang, Y., Chen, S., Huang, L., Ge, R., Hu, Y., Ding, Z., & Liao, J. (2020). 1st
Place Solutions for Waymo Open Dataset Challenges -- 2D and 3D Tracking. 1–8.
Retrieved from http://arxiv.org/abs/2006.15506
(26) Ward, E. (2018). Models Supporting Trajectory Planning in Autonomous Vehicles
[Doctoral thesis, KTH Royal Institute of Technology]. Retrieved from
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-224870
(27) You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for
autonomous vehicles using reinforcement learning and deep inverse
reinforcement learning. Robotics and Autonomous Systems, 114, 1–18.
https://doi.org/10.1016/j.robot.2019.01.003
ABOUT THE AUTHORS
Mr. Devansh Arora
Mr. Devansh Arora is a student at Indraprastha Institute of Information Technology
(IIIT), Delhi. An artificial intelligence and machine learning enthusiast, he has many
projects and papers to his credit.
Dr. Parul Arora
Dr. Parul Arora is working as an Associate Professor at Bharati Vidyapeeth's Institute
of Computer Applications and Management (BVICAM), New Delhi. An avid researcher,
she has published many research papers in renowned journals and conferences.
Dr. Ritika Wason
Dr. Ritika Wason is working as an Associate Professor at Bharati Vidyapeeth's
Institute of Computer Applications and Management (BVICAM), New Delhi. She is
also the managing editor of the International Journal of Information Technology (IJIT), an
official journal of Bharati Vidyapeeth’s Institute of Computer Applications and
https://doi.org/10.17993/3ctecno.2023.v12n2e44.49-63
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254-4143
Ed.44 | Iss.12 | N.2 April - June 2023
62
(21) Tang, X., Eshkevari, S. S., Chen, H., Wu, W., Qian, W., & Wang, X. (2022).
Golfer: Trajectory Prediction with Masked Goal Conditioning MnM Network. 1–4.
Retrieved from http://arxiv.org/abs/2207.00738
(22) Varadarajan, B., Hefny, A., Srivastava, A., Refaat, K. S., Nayakanti, N.,
Cornman, A., Chen, K., Douillard, B., Lam, C. P., Anguelov, D., & Sapp, B.
(2022). MultiPath++: Efficient Information Fusion and Trajectory Aggregation for
Behavior Prediction. Proceedings - IEEE International Conference on Robotics
and Automation, 78147821. https://doi.org/10.1109/ICRA46639.2022.9812107
(23) WACV 2023 Open Access Repository. (n.d.). Retrieved March 20, 2023, from
https://openaccess.thecvf.com/content/WACV2023/html/
Mahmoud_Dense_Voxel_Fusion_for_3D_Object_Detection_WACV_2023_paper
.html
(24) Wang, J. (2019). Estimation And Tracking Algorithm For Autonomous Vehicles
And Humans.
(25) Wang, Y., Chen, S., Huang, L., Ge, R., Hu, Y., Ding, Z., & Liao, J. (2020). 1st
Place Solutions for Waymo Open Dataset Challenges -- 2D and 3D Tracking. c,
18. Retrieved from http://arxiv.org/abs/2006.15506
(26) Ward, E. (2018). Models Supporting Trajectory Planning in Autonomous Vehicles
[KTH Royal Institute of Technology]. In Doctoral Thesis. Retrieved from http://
urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-224870
(27) You, C., Lu, J., Filev, D., & Tsiotras, P. (2019). Advanced planning for
autonomous vehicles using reinforcement learning and deep inverse
reinforcement learning. Robotics and Autonomous Systems, 114, 1–18. https://
doi.org/10.1016/j.robot.2019.01.003
ABOUT THE AUTHORS
Mr. Devansh Arora
Mr Devansh Arora is a student at Indraprastha Institute of Information Technology
(IIIT), Delhi. An artificial intelligence and machine learning enthusiast he has many
projects and papers to his credit.
Dr. Parul Arora
Dr Parul Arora is working as Associate Professor with Bharati Vidyapeeth's Institute
of Computer Applications and Management (BVICAM), New Delhi. An avid researcher
she has many research papers published in many renowned journals and
conferences.
Dr. Ritika Wason
Dr Ritika Wason is working as Associate Professor with Bharati Vidyapeeth's
Institute of Computer Applications and Management (BVICAM), New Delhi. She is
also the managing editor for International Journal of Information Technology (IJIT), an
official Journal of Bharati Vidyapeeths Institute of Computer Applications and
https://doi.org/10.17993/3ctecno.2023.v12n2e44.49-63
Management (BVICAM), co-published with Springer and indexed in UGC-CARE and
Scopus. An avid researcher, she is also the editor of CSI Communications, a
monthly magazine published by the Computer Society of India (CSI). A certified
Mendeley trainer, she has trained several professionals and scholars on Mendeley.
She has also authored many books and papers published by leading publishers.