With an emphasis on DDS communication, ROS 2 performance, and practical Autoware applications, this paper offers a thorough performance comparison of containerized versus bare-metal deployments for autonomous car systems. The findings demonstrate that, on both x86 and aarch64 systems, containerized environments consistently lower resource usage, jitter, and delay.With an emphasis on DDS communication, ROS 2 performance, and practical Autoware applications, this paper offers a thorough performance comparison of containerized versus bare-metal deployments for autonomous car systems. The findings demonstrate that, on both x86 and aarch64 systems, containerized environments consistently lower resource usage, jitter, and delay.

Containerization Solves ROS 2's Biggest Performance Challenges

2025/10/15 15:00
9 min read
For feedback or concerns regarding this content, please contact us at crypto.news@mexc.com

:::info Authors:

(1) Tobias Betz, Technical University of Munich, Germany;

(2) Long Wen, Technical University of Munich, Germany;

(3) Fengjunjie Pan, Technical University of Munich, Germany;

(4) Gemb Kaljavesi, Technical University of Munich, Germany;

(5) Alexander Zuepke, Technical University of Munich, Germany;

(6) Andrea Bastoni, Technical University of Munich, Germany;

(7) Marco Caccamo, Technical University of Munich, Germany;

(8) Alois Knoll, Technical University of Munich, Germany;

(9) Johannes Betz, Technical University of Munich, Germany.

:::

Abstract and I. Introduction

II. Related Work

III. Microservice Architecture for an Autonomous Driving Software

IV. Experiments

V. Results

VI. Discussion

VII. Conclusion, Acknowledgments, and References

\

V. RESULTS

A. DDS Communication

\ The performance benchmark results offer valuable insights into the effects of containerization. As described in Section IV-B1, we used CycloneDDS and measured the round trip latency of specific message sizes and the number of successfully delivered packets. Table IV presents the measured results for both of the compute platforms from message sizes ranging from 1 kB to 8 MB. On both platforms, container-based deployment achieves lower latencies than bare-metal deployment in almost all scenarios. This effect is more evident on aarch64 and particularly pronounced for small message sizes. On aarch64, bare-metal configurations never perform better than containers, while only 2 MB bare-metal messages achieve a lower latency on x86. The latency improvement is more evident on aarch64 than x86. For example, for 1 kB, on aarch64 containers achieve around 15% lower latency (around 8-10% on x86), while for 64 kB, the improvement is even higher (18% vs. 8%). Multi-containers can consistently perform better than single-containers. This is the case for large message sizes on aarch64, where multi-container setups always perform better than single-container starting from 512 kB message sizes. This trend is not confirmed on x86, where single-container setups perform better for large message sizes. In a real application such as the Autoware software stack shown later, the observed message sizes are in the range of 1 kB to 128 kB. This is a range where containerized versions on both systems showed smaller latencies.

\ B. ROS 2

\ We further investigated the combined performance implications of DDS and ROS 2 using ros2benchmark. We present the experimental setup in Section IV-B2. Image data with a size of 0.92 MB is transferred for the data set used. In addition to the pure DDS time, the end-to-end latency now also includes the computation time of the detection algorithm. Therefore, in percentage terms, the DDS time has a much smaller share. Table V shows the measured results of the conducted benchmark. Contrary to ddsperf, latency values in ros2benchmark are very close across all setups. Although containerization can achieve slightly lower latency than baremetal for low fps (10 and 30), bare-metal performs slightly better at 60 fps. Differences in latency are minimal (between

\ Fig. 4: End-to-end latency histograms (a) using the bare-metal setup, (b) using the single-container setup, and (c) using the multi-container setup (blue=x86, red=aarch64).

\ 0.4% and 4.6%). Looking at the jitter shows a reduction due to containerization in almost all cases. For the 10 fps setup, the occurring jitter is reduced by 10.3% for single-container and 8.1% for multi-container. At 60 fps by 6.2% and 3.10%. At maximum throughput, only minor differences occurred. An outlier occurs in the multi-container deployment only in the 30 fps setup. The aarch64 platform also shows this behavior in all setups except for the maximum throughput. At 10 fps, the jitter is reduced by 30.1% and 31.4%. Again, for the other setups, we observe the same behavior. Only the multi-container deployment in the last setup exhibits slightly increased jitter. Overall, we conclude that containerization may lead to a reduction in latency. Additionally, we observe slight differences in CPU utilization. Specifically, for x86, the single-container setup demonstrates the lowest utilization compared to other deployments. Conversely, for aarch64 at lower fps, the bare metal benchmark outperforms the containerized benchmark, while for higher fps, the single-container setup performs the best.

\ C. Real-World Autonomous Driving Application

\ At last, we evaluate the developed multi-container microservice architecture (see Section III) for an autonomous vehicle

\ TABLE IV: Mean latency and package count for different message sizes for the ddsperf benchmark.(BM=bare-metal, SC=single-container, MC=multi-container). Min values across configurations are marked in bold.

\ TABLE V: Mean latency, jitter, and CPU utilization for different setups of the ros2_benchmark.

\ TABLE VI: Consolidated statistics for different metrics for Autoware.

\ Fig. 5: CPU utilization of the systems for the different deployment variants.

\ Fig. 6: Memory utilization of the systems for the different deployment variants.

\ and compare it with the bare-metal execution and the execution within a single container. We further split the end-to-end latency into its components (Idle, DDS Communication, Computation) to gain a better understanding of each contribution. Fig. 4 shows histograms of the different deployment variants on the respective compute platforms. Table VI presents the detailed measurement values for the entire experiment. As explained in Section IV-D the latencies are shown for one computation chain from the sensor to the control output.

\ E2E Latency. The histograms of bare-metal deployments (Fig. 4(a)) show a bimodal distribution with (x86) high kurtosis value of 19.94 and a long tail visible in the Q75 and P99 values (Table VI). A similar behavior is evident on the aarch64 system. Instead, for single-containers, we see a reduced standard deviation and a significantly lower kurtosis of only 0.35. This is reflected in the histogram that show a compact distribution.

\ The bare-metal implementation of Autoware on the x86 platform shows an end-to-end latency of 194.67 ms. Instead, for the developed microservice architecture, the mean latency is reduced by 8.1% to 178.73 ms for x86 and by 5.6% to 244.11 ms for aarch64.

\ A relatively large maximum value with 1437.94 ms is measured for bare-metal. In the single-container scenario, this maximum value drops to only 553.79 ms. We also see a reduction in the various quantiles. From 515.87 ms in the 99th percentile to 300.79 ms, which is an improvement of 41.7%.

\ Idle Latency. Looking at the idle latency, which describes the time data waits for processing via a timer callback, we observe the following for x86. The native deployment shows an idle latency of 133.81 ms. Experiments on aarch64 show lower mean (112.76 ms) and maximum values. Executing Autoware in a single-container environment eliminates the two peaks, resulting in a more even distribution for both x86 and aarch64. The mean latency value decreases to 125.92 ms for x86 and 97.75 ms for aarch64, indicating improvements of 5.9% and 13.3%, respectively. This improvement extends to quantile values as well. The 99th percentile sees an improvement of 41.6% to 237.34 ms for x86 and 17.3% to 223.01 ms for aarch64. In the multi-container deployment, the mean idle latency is higher than that of single containers, but still lower than the bare-metal setup for x86. Conversely, a higher measurement value compared to bare-metal is observed for aarch64 (4.5%). For x86, quantiles 25 and 50 are lower compared to single containers, but higher for aarch64. Both systems exhibit increased values for P99 and the maximum.

\ Communication Latency. The communication latency represents the smallest portion of the entire end-to-end latency. On the x86 in the native deployment, this latency has a mean of 8.10 ms, while on the aarch64, it has a mean of 10.666 ms. However, both systems exhibit maximum values of 402.42 ms and 468.110 ms respectively, resulting in distributions with long right tails. Notably, the distribution of the single-container variant shows a reduced tail: the mean DDS latency improves by 61.2% to 3.14 ms for x86, with a smaller improvement (22.0%) observed for aarch64. Both systems also see reductions in their maximum values. In the multi-container variant, DDS values for both systems are similar to those of the single-container setup, with reduced mean and quantile values within a negligible range compared to single-container. Only the maximum value increases slightly to 298.26 ms for x86, still smaller than in the bare-metal deployment. On the aarch64, the maximum value is only slightly higher than that of the single-container setup.

\ Computation Latency. The computation latency is reduced in the multi-container deployment, resulting in improved mean values (47.83 ms for x86 and 118.63 ms for aarch64). These improvements lead to lower values than those observed in both the bare-metal and single-container scenarios. However, we note increased values for P99 and the maximum on the x86 variant. However, the mean values diverge significantly, and the aarch64 system experiences higher values compared to before. The results deviate from a normal distribution, as no clear peak is visible, but rather multiple peaks in both cases.

\ CPU and Memory Utilization. Fig. 5 shows the distributions of CPU usage measured over all runs, i.e.,, from the execution of the ROS 2 launch file until the vehicle reaches the target position. We observe the same behavior noted by [24] regarding the startup of the software. With bare-metal and singlecontainer, the mean ramp-up phase of the software lasted 13.60 s and 14.4 s, on the x86. On the aarch64 it is 21.3 s and 23.1 s. Multi-container has a much lower ramp-up phase, where all nodes of the software startup the fastest, 3.8 s for x86 and 4.6 s for aarch64 respectively. This ramp-up phase is clearly visible in both of the plots in the lower part of the graph. As visible in Fig. 5 and Fig. 6, containerized applications, regardless of whether single or multi-container, have a lower CPU and memory utilization. Further analyzing CPU utilization, a significant scatter is observed for baremetal, ranging from approximately 50% to 90% after node initialization. The variance is notably lower for the singlecontainer setup, with a slight difference in utilization, where aarch64 exhibits slightly higher values compared to x86. Regarding memory consumption, bare-metal deployments on x86 cluster at around 10 GB RAM, whereas aarch64 displays higher memory consumption with a significantly higher variance. Once again, transitioning the application to a container environment, whether single- or multi-container, leads to a reduction in memory consumption by almost a factor of two for x86. Similarly, on aarch64, a drastic decrease in memory consumption is observed. However, both container variants still exhibit higher memory consumption compared to the x86 platform, as seen with bare-metal deployment.

\

:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::

\

Market Opportunity
Metal Blockchain Logo
Metal Blockchain Price(METAL)
$0.13844
$0.13844$0.13844
+0.21%
USD
Metal Blockchain (METAL) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact crypto.news@mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

CEO Sandeep Nailwal Shared Highlights About RWA on Polygon

CEO Sandeep Nailwal Shared Highlights About RWA on Polygon

The post CEO Sandeep Nailwal Shared Highlights About RWA on Polygon appeared on BitcoinEthereumNews.com. Polygon CEO Sandeep Nailwal highlighted Polygon’s lead in global bonds, Spiko US T-Bill, and Spiko Euro T-Bill. Polygon published an X post to share that its roadmap to GigaGas was still scaling. Sentiments around POL price were last seen to be bearish. Polygon CEO Sandeep Nailwal shared key pointers from the Dune and RWA.xyz report. These pertain to highlights about RWA on Polygon. Simultaneously, Polygon underlined its roadmap towards GigaGas. Sentiments around POL price were last seen fumbling under bearish emotions. Polygon CEO Sandeep Nailwal on Polygon RWA CEO Sandeep Nailwal highlighted three key points from the Dune and RWA.xyz report. The Chief Executive of Polygon maintained that Polygon PoS was hosting RWA TVL worth $1.13 billion across 269 assets plus 2,900 holders. Nailwal confirmed from the report that RWA was happening on Polygon. The Dune and https://t.co/W6WSFlHoQF report on RWA is out and it shows that RWA is happening on Polygon. Here are a few highlights: – Leading in Global Bonds: Polygon holds 62% share of tokenized global bonds (driven by Spiko’s euro MMF and Cashlink euro issues) – Spiko U.S.… — Sandeep | CEO, Polygon Foundation (※,※) (@sandeepnailwal) September 17, 2025 The X post published by Polygon CEO Sandeep Nailwal underlined that the ecosystem was leading in global bonds by holding a 62% share of tokenized global bonds. He further highlighted that Polygon was leading with Spiko US T-Bill at approximately 29% share of TVL along with Ethereum, adding that the ecosystem had more than 50% share in the number of holders. Finally, Sandeep highlighted from the report that there was a strong adoption for Spiko Euro T-Bill with 38% share of TVL. He added that 68% of returns were on Polygon across all the chains. Polygon Roadmap to GigaGas In a different update from Polygon, the community…
Share
BitcoinEthereumNews2025/09/18 01:10
👨🏿‍🚀TechCabal Daily – Folded by a paper cut

👨🏿‍🚀TechCabal Daily – Folded by a paper cut

In today's edition: Mpact’s paper mill is shutting down || An e-commerce play for SA’s Post Office || Kenya’s traffic cop
Share
Techcabal2026/03/10 14:05
MTN Plans Starlink Launch in Zambia

MTN Plans Starlink Launch in Zambia

MTN’s Starlink launch plan in Zambia signals a new phase for satellite internet expansion, aiming to accelerate rural connectivity and support the country’s digital
Share
Furtherafrica2026/03/10 14:00