LiveRamp Cuts Clean Room Runtimes by Up to 50% Using Apache DataFusion Comet
LiveRamp has integrated Apache DataFusion Comet to accelerate Apache Spark within its clean room architecture for cross-media data collaboration and ad-tech analytics. By moving Spark execution from JVM-based, row-oriented processing to native columnar execution built on Apache Arrow and Rust, the company achieved runtime reductions of up to 50% on complex queries. The optimization is intended to lower resource overhead and infrastructure costs for brands, publishers, and platforms running multi-terabyte ad measurement and attribution workloads.
Key Takeaways
- Native execution via Apache Arrow and Rust reduced runtimes by up to 50% for high-complexity queries.
- Benchmark testing using Spark 3.5.5 and Comet 0.11.0 covered multi-terabyte datasets across AWS, GCP, and Azure.
- The implementation functions as a drop-in accelerator, maintaining 100% compatibility with existing PySpark and Scala workloads.
- Technical gains include reduced JVM garbage collection overhead, more efficient memory usage, and SIMD-optimized columnar processing.
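To make the "drop-in" point concrete, the sketch below shows roughly what enabling Comet on an existing Spark job can look like: only launch-time configuration changes, while the PySpark or Scala job code stays as it is. This is an illustration, not LiveRamp's actual setup. The helper names (`comet_spark_conf`, `to_spark_submit_args`), the jar path, and the off-heap size are assumptions; the configuration keys follow the Apache DataFusion Comet installation guide and should be checked against the documentation for the Comet version in use.

```python
from typing import Dict, List


def comet_spark_conf(comet_jar: str, offheap_size: str = "16g") -> Dict[str, str]:
    """Build Spark settings that load the Comet plugin.

    comet_jar and offheap_size are illustrative values, not taken from
    the article; tune them for the actual cluster.
    """
    return {
        # Make the Comet jar visible to the driver and the executors.
        "spark.jars": comet_jar,
        "spark.driver.extraClassPath": comet_jar,
        "spark.executor.extraClassPath": comet_jar,
        # Register the Comet plugin and its shuffle manager.
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.shuffle.manager": (
            "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager"
        ),
        # Log the reasons when a query plan falls back to regular Spark execution.
        "spark.comet.explainFallback.enabled": "true",
        # Native execution works on off-heap memory.
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": offheap_size,
    }


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the settings as `--conf key=value` pairs for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Hypothetical jar location; the file name depends on the Spark, Scala,
    # and Comet versions in use.
    jar = "/opt/comet/comet-spark-spark3.5_2.12-0.11.0.jar"
    print(" ".join(["spark-submit", *to_spark_submit_args(comet_spark_conf(jar)), "job.py"]))
```

Because these are ordinary Spark properties, removing them returns the job to stock JVM execution, which makes before-and-after runtime comparisons on the same workload straightforward.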
Why It Matters
As data clean rooms move from experimental pilots to production-scale infrastructure, compute efficiency has become a critical competitive lever. By bypassing much of the JVM's overhead, LiveRamp addresses the primary cost bottleneck of multi-party data collaboration: the massive join and aggregation workloads required for cross-platform attribution. The move to native columnar execution signals a broader industry shift toward 'deconstructed' data systems that separate query planning from high-performance native execution. For the streaming ecosystem, this means faster delivery of reach and frequency metrics across fragmented CTV environments. Watch for other neutral clean room providers, such as Decentriq or InfoSum, to adopt similar native acceleration in order to compete with the performance of vertically integrated, warehouse-native clean rooms in Snowflake or AWS.
Additional Context
The clean room market is entering a phase of rapid commoditization and consolidation, underscored by Publicis Groupe's reported $2.5 billion acquisition of LiveRamp in May 2026. The deal highlights the strategic value of interoperable data infrastructure as walled gardens face increasing antitrust scrutiny. Per IDC MarketScape reporting from May 2025, LiveRamp was positioned as a leader in the global clean room segment, cited specifically for its cross-cloud interoperability and its identity resolution capabilities via RampID. That leadership is being challenged by cloud-native providers; for instance, NIQ launched its own global clean room on Snowflake in October 2025 to streamline ad measurement for consumer goods brands.
On the technical side, the shift toward native execution engines like Apache DataFusion is gaining momentum across the Spark ecosystem. Recent industry benchmarks from the 2025 Iceberg Summit indicated that native Rust-based query engines can provide up to a 2x speedup (equivalent to halving runtime) for TPC-DS workloads at the 1 TB scale. By adopting Comet, LiveRamp is following a performance path also explored by major players like Apple, which has actively contributed to the DataFusion-Comet integration to accelerate large-scale Parquet scans. In addition, the IAB Tech Lab has released a stable specification for Open Private Join & Activation (OPJA), giving the industry a standardized framework for private audience activation, the kind of workload these performance upgrades are meant to power.
Read full article at liveramp.com