Apache Spark lineage now available in Amazon SageMaker Unified Studio for IDC-based domains

Published
February 4, 2026
https://aws.amazon.com/about-aws/whats-new/2026/02/apache-spark-lineage-amazon-sageMaker-unified-studio

Amazon SageMaker Data Lineage for Apache Spark Jobs

Amazon SageMaker has announced the general availability of Data Lineage for Apache Spark jobs executed on Amazon EMR and AWS Glue in SageMaker Unified Studio for IDC-based domains (domains that use AWS IAM Identity Center). Lineage helps you trace the root cause of complex data issues and understand the downstream impact of changes.

The feature captures lineage for the schema and transformations of data assets and columns from Spark executions in EMR-EC2, EMR-Serverless, EMR-EKS, and AWS Glue. You can explore this lineage visually as a graph in SageMaker Unified Studio or query it using APIs. You can also use lineage to compare transformations across a Spark job's history.
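As an illustration of querying lineage through an API, here is a minimal sketch. The announcement does not name the API, so this assumes lineage is read through the Amazon DataZone `get_lineage_node` operation, as exposed by the boto3 `datazone` client. The helper name `upstream_lineage`, the `max_nodes` limit, and the response fields it reads (`id`, `name`, `upstreamNodes`) are assumptions made for illustration and should be checked against the current API reference.

```python
from collections import deque


def upstream_lineage(client, domain_id, start_node_id, max_nodes=50):
    """Breadth-first walk of the nodes upstream of one lineage node.

    `client` is assumed to behave like boto3.client("datazone"): it must
    offer get_lineage_node(domainIdentifier=..., identifier=...) and return
    a dict that may contain "id", "name" and "upstreamNodes".
    Returns a list of (node_id, name) tuples in visit order.
    """
    visited = set()
    ordered = []
    queue = deque([start_node_id])

    while queue and len(ordered) < max_nodes:
        node_id = queue.popleft()
        if node_id in visited:
            continue  # a node can be reachable through several paths
        visited.add(node_id)

        node = client.get_lineage_node(
            domainIdentifier=domain_id,
            identifier=node_id,
        )
        ordered.append((node.get("id", node_id), node.get("name")))

        # Enqueue the direct parents; they are fetched on later iterations.
        for parent in node.get("upstreamNodes", []):
            queue.append(parent["id"])

    return ordered
```

With AWS credentials configured, a call might look like `upstream_lineage(boto3.client("datazone"), "<domain-id>", "<lineage-node-id>")`, where both identifiers are placeholders for values from your own domain. Passing the client in as a parameter keeps the traversal independent of AWS, so it can be exercised with a stub.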

What to do

  • Explore the new Data Lineage feature in SageMaker Unified Studio.
  • Refer to the documentation for detailed information on how to get started.

Source: AWS release notes




