The Microsoft Fabric Blog presents a technical overview of Adaptive Target File Size Management in Fabric Spark, detailing how this new capability helps data teams automatically optimize performance and reduce maintenance in Spark lakehouses.

Adaptive Target File Size Management in Fabric Spark

Set-It-and-Forget-It Target File Size Optimization

This article explores how Fabric Spark introduces Adaptive Target File Size to automate and optimize file layouts for Delta tables. Traditionally, data teams had to manually tune file size configurations, leading to inconsistent performance and additional maintenance as tables grew and access patterns changed.

Key Challenges with Traditional File Size Management

  • Multiple configurations: Different Spark operations (OPTIMIZE, Auto Compaction, Optimized Writes) require separate settings, making consistency hard.
  • Table growth: As tables scale from MBs to TBs, optimal file sizes change, demanding ongoing retuning.
  • Configuration sprawl: Each table in a lakehouse may need a different file size, making individualized management unrealistic at scale.
  • Expertise requirement: Optimal tuning often requires deep Spark and data distribution knowledge.
  • Hidden performance costs: Poor settings can cause issues such as excessive metadata overhead, limited data skipping, bottlenecked parallelism, and write amplification on updates.

Introducing Adaptive Target File Size for Fabric Spark

Adaptive Target File Size leverages Delta table telemetry to estimate and set the ideal file size based on the current table size and workload, updating targets as your data grows or access patterns shift. With a single session configuration (SET spark.microsoft.delta.targetFileSize.adaptive.enabled = True), all data layout operations align to this adaptive strategy for improved consistency and less manual intervention.
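
As a minimal sketch, assuming a Fabric PySpark notebook session, enabling the feature might look like this (the configuration key is the one named in the paragraph above):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Turn on adaptive target file sizing for this session
    # (configuration key taken from the article).
    spark.conf.set("spark.microsoft.delta.targetFileSize.adaptive.enabled", "true")

    # From here on, OPTIMIZE, Optimized Writes, and Auto Compaction in this
    # session all follow the adaptively chosen target file size.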

Table-Level Intelligence

Adaptive sizing enables every table to have its own optimal file size, scaling automatically, as illustrated by the tiers below and the sketch that follows:

  • Small table (1GB): Target 128MB
  • Medium table (500GB): Target 256MB
  • Large table (4TB): Target 512MB
  • Very large table (10TB): Target 1GB
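
The exact selection heuristic is internal to Fabric Spark; purely to illustrate the tiers quoted above, a hypothetical lookup might behave like this:

    def illustrative_target_file_size(table_size_gb: float) -> str:
        """Map a table size to the target-file-size tiers quoted above.
        Illustration only; the real selection logic is internal to Fabric Spark."""
        if table_size_gb <= 1:
            return "128MB"
        if table_size_gb <= 500:
            return "256MB"
        if table_size_gb <= 4_096:  # up to ~4TB
            return "512MB"
        return "1GB"

    print(illustrative_target_file_size(500))     # 256MB
    print(illustrative_target_file_size(10_240))  # 1GB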

Unified Configuration

A single property (delta.targetFileSize) controls all file-size-related operations (OPTIMIZE, Optimized Writes, Auto Compaction) for an individual table. Setting the property explicitly overrides adaptive sizing, granting fine-grained control where desired.
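
Since the override lives in standard Delta table properties, it can be inspected with ordinary Spark SQL. A quick check might look like the following sketch (sales_orders is a hypothetical table name):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # List the table's properties and filter for the file-size setting,
    # if one has been set explicitly.
    (spark.sql("SHOW TBLPROPERTIES sales_orders")
          .where("key = 'delta.targetFileSize'")
          .show(truncate=False))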

Cascading Benefits

  • Enhanced data skipping: Right-sized files let Spark skip irrelevant data more efficiently.
  • Reduced update costs: More granular files mean MERGE and UPDATE rewrite less data.
  • Improved parallelism: Balanced file sizes maximize the use of clusters, improving throughput.
  • Operational consistency: A single source of truth for file size management makes troubleshooting easier.

Demonstrated Results

  • Up to 2.8x faster compaction jobs
  • 1.6x faster compaction and 1.2x faster queries in TPC-DS 1TB tests when compaction ran with adaptive sizing
  • 30% reduction in ELT cycle time cited in customer benchmarks

How to Enable

  • For new tables: Enable adaptive sizing via the session config; the system applies optimal file sizes at creation and OPTIMIZE time.
  • For existing tables: Enable the session config; the first OPTIMIZE operation sets the baseline.
  • To override: Use ALTER TABLE <table> SET TBLPROPERTIES ('delta.targetFileSize' = '<value>'), as shown in the sketch after this list.
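
Putting the steps together, a minimal end-to-end sketch for an existing table might look like this (sales_orders and the 256mb value are hypothetical examples):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # 1. Enable adaptive sizing for the session, per the article's setting.
    spark.sql("SET spark.microsoft.delta.targetFileSize.adaptive.enabled = true")

    # 2. For an existing table, the first OPTIMIZE establishes the adaptive baseline.
    spark.sql("OPTIMIZE sales_orders")

    # 3. Optional: pin an explicit target for this one table, overriding
    #    adaptive sizing where fine-grained control is needed.
    spark.sql(
        "ALTER TABLE sales_orders "
        "SET TBLPROPERTIES ('delta.targetFileSize' = '256mb')"
    )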

How Teams Benefit

  • Data Engineers: Focus on pipeline building, not config tuning.
  • Platform Teams: Reduced maintenance and more predictable performance.
  • Analytics Teams: Faster queries as tables scale.
  • Cost Management: Eliminates the need to over-provision for peak scenarios.

Summary

Adaptive Target File Size Management in Fabric Spark provides a unified and intelligent approach to file layout optimization for Delta tables, delivering predictable performance and maintenance simplicity as data grows.

This post appeared first on the Microsoft Fabric Blog.