Apache Spark lineage now available in Amazon SageMaker Unified Studio for IDC-based domains

Amazon SageMaker Data Lineage for Apache Spark Jobs
Amazon SageMaker has announced the general availability of Data Lineage for Apache Spark jobs executed on Amazon EMR and AWS Glue in SageMaker Unified Studio for IAM Identity Center (IDC)-based domains. Lineage helps you trace the root cause of complex data issues and understand the impact of changes.
The feature captures lineage for the schema and transformations of data assets and columns from Spark executions on Amazon EMR on EC2, EMR Serverless, Amazon EMR on EKS, and AWS Glue. You can explore this lineage visually as a graph in SageMaker Unified Studio or query it using APIs. You can also use lineage to compare transformations across a Spark job's history.
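As a rough illustration of querying lineage through an API, the sketch below reads one lineage node and lists the nodes directly upstream and downstream of it. It assumes the lineage is served through the Amazon DataZone API's GetLineageNode operation (boto3 client name "datazone"), which the announcement itself does not state. The function name and the domain and node identifiers are hypothetical placeholders.

```python
from typing import Any, Dict, List


def summarize_lineage_node(client: Any, domain_id: str, node_id: str) -> Dict[str, List[str]]:
    """Return the ids of the nodes directly upstream and downstream of one lineage node.

    Assumption: `client` behaves like boto3.client("datazone") and exposes
    get_lineage_node(domainIdentifier=..., identifier=...).
    """
    response = client.get_lineage_node(
        domainIdentifier=domain_id,
        identifier=node_id,
    )
    # Each neighbour reference is expected to carry an "id" field.
    return {
        "upstream": [ref["id"] for ref in response.get("upstreamNodes", [])],
        "downstream": [ref["id"] for ref in response.get("downstreamNodes", [])],
    }


if __name__ == "__main__":
    import boto3  # requires AWS credentials and a configured region

    datazone = boto3.client("datazone")
    # Hypothetical identifiers; replace with values from your own domain.
    print(summarize_lineage_node(datazone, "dzd_example123", "example-node-id"))
```

Passing the client in as a parameter keeps the helper easy to test with a stub and leaves credential and region handling to the caller.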
What to do
- Explore the new Data Lineage feature in SageMaker Unified Studio.
- Refer to the documentation for detailed information on how to get started.
Source: AWS release notes
If you need further guidance on AWS, our experts are available at AWS@westloop.io. You may also reach us by submitting the Contact Us form.



