Master Microsoft DP-750: Azure Databricks Data Engineering Prep
You have an Azure Databricks workspace that is enabled for Unity Catalog.
You have a complex job named Job1 that contains eight tasks. Job1 takes multiple hours to complete.
During the last job run, the final task fails due to a transient issue.
You need to retry the last task without rerunning tasks that have already completed.
What should you do?
Correct: B
CORRECT ANSWER: B - Repair the current job run.
According to Microsoft Learn on Lakeflow Jobs run repair, the 'Repair Run' feature allows you to re-run only the failed task (and any tasks that depend on it) within an existing job run, while skipping all tasks that already completed successfully. This directly satisfies: 'retry the last task without rerunning tasks that have already completed.' For a complex job with eight tasks that takes multiple hours, re-running from scratch (Option C - Restart) would waste significant time and compute resources. Option A (update job parameters) changes parameters for future runs but doesn't re-execute the failed task. Option D (disable and re-enable the schedule) creates a new run from the beginning rather than repairing the existing run.
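The same repair can be requested programmatically. The following is a minimal sketch, assuming the Jobs API repair endpoint (`POST /api/2.1/jobs/runs/repair`) with its `run_id` and `rerun_tasks` fields. The run ID, task keys, and the simplified task dictionaries are hypothetical, and the code only builds the request body rather than calling a workspace.

```python
import json


def failed_task_keys(tasks):
    """Return the keys of tasks that did not succeed.

    `tasks` uses a simplified, hypothetical shape:
    [{"task_key": "...", "result_state": "SUCCESS" | "FAILED"}, ...]
    """
    return [t["task_key"] for t in tasks if t["result_state"] != "SUCCESS"]


def build_repair_payload(run_id, tasks):
    """Build a request body that reruns only the failed tasks of an existing run."""
    return {"run_id": run_id, "rerun_tasks": failed_task_keys(tasks)}


# Hypothetical run of Job1: seven tasks succeeded, the eighth failed.
tasks = [{"task_key": f"task{i}", "result_state": "SUCCESS"} for i in range(1, 8)]
tasks.append({"task_key": "task8", "result_state": "FAILED"})

payload = build_repair_payload(run_id=12345, tasks=tasks)
print(json.dumps(payload))
```

Because only `task8` is listed in `rerun_tasks`, the seven completed tasks are left untouched, which mirrors what the Repair Run button does in the UI.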
You have a Lakeflow Spark Declarative Pipelines (SDP) pipeline in Azure Databricks. The pipeline ingests transaction data into a table named Table1.
You need to ensure that in the event of an invalid record, the pipeline continues to run. The solution must meet the following requirements:
* Invalid records must NOT be written to Table1.
* Invalid records must be preserved for review.
* Minimize development effort.
What should you do?
Correct: B
CORRECT ANSWER: B - Define a pipeline expectation.
According to Microsoft Learn on Lakeflow Spark Declarative Pipelines (SDP) data quality expectations, defining a pipeline expectation with the @dlt.expect_or_drop decorator is the intended approach when the pipeline must continue running, invalid records must NOT be written to the target table, and invalid records must be preserved for review. SDP records the number of dropped records as expectation metrics in the pipeline event log, which this explanation treats as satisfying the 'preserve for review' requirement with minimal development effort. Option A (advanced quarantine logic) requires additional development effort to implement custom routing. Option C (WHERE clauses in downstream queries) filters records at query time rather than at ingestion, meaning invalid records would still be written to Table1. Option D (check constraint on Table1) would cause write failures and stop the pipeline, violating the 'pipeline continues to run' requirement.
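To make the drop-on-violation behavior concrete, here is a plain-Python model of it. This is not the pipeline API: in a real pipeline the rule would be attached with the `@dlt.expect_or_drop` decorator, which takes an expectation name and a SQL constraint string. The helper name, the record shape, and the metric keys below are all hypothetical.

```python
def apply_expect_or_drop(records, name, predicate):
    """Model of drop-on-violation: keep valid rows, count the rest.

    Returns the rows that would reach the target table plus a small
    metrics dictionary (hypothetical keys) similar in spirit to what an
    event log would report for the expectation.
    """
    kept = [r for r in records if predicate(r)]
    metrics = {
        "expectation": name,
        "passed_records": len(kept),
        "dropped_records": len(records) - len(kept),
    }
    return kept, metrics


# Hypothetical transactions; the second one violates the rule amount > 0.
transactions = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": -4.0},
    {"id": 3, "amount": 10.5},
]

table1_rows, metrics = apply_expect_or_drop(
    transactions, "valid_amount", lambda r: r["amount"] > 0
)
print(table1_rows)
print(metrics)
```

Processing continues past the invalid row, only valid rows reach the target, and the violation shows up as a count in the metrics.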
You have an Azure Databricks workspace that is attached to a Unity Catalog metastore named metastore1. Metastore1 contains a catalog named catalog1.
You need to create a new schema named schema2 that meets the following requirements:
* Is contained in catalog1
* Uses abfss://container@storageaccount.dfs.core.windows.net/data as the managed location
Which SQL statement should you execute?
Correct: A
CORRECT ANSWER: A - CREATE SCHEMA catalog1.schema2 MANAGED LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/data';
According to Microsoft Learn on Unity Catalog schema management, the correct DDL syntax to create a schema within a specific catalog and set a custom managed storage location is: CREATE SCHEMA <catalog>.<schema> MANAGED LOCATION '
You use Databricks Asset Bundles to manage two jobs and an app.
You need to deploy the bundle to development and production environments. The solution must meet the following requirements:
* Deploy the app to both environments.
* Deploy only one job to development.
* Minimize administrative effort.
What should you use?
Correct: D
CORRECT ANSWER: D - A targets node in a databricks.yml file.
According to Microsoft Learn on Databricks Asset Bundles (DAB), the targets node in databricks.yml defines environment-specific configurations (development, staging, production). Within each target, a resources mapping can add or override resource definitions for that environment only. The requirement to 'deploy the app to both environments' and 'deploy only one job to development' with 'minimize administrative effort' is best achieved through a single databricks.yml with a targets node, where the resources shared by both environments are declared once and the second job is declared only under the production target. Option A (resources node) defines all resources but doesn't handle environment-specific filtering. Option B (separate databricks.yml files) requires maintaining multiple files and increases administrative effort. Option C (variables node) handles parameterization but not resource inclusion/exclusion.
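The per-target behavior can be illustrated with a small Python model of a bundle configuration. This is only a sketch under simplifying assumptions: the dictionary mirrors the `resources` and `targets` keys of databricks.yml, the resource names (`app1`, `job1`, `job2`) and target names (`dev`, `prod`) are hypothetical, and the real CLI merges settings more deeply than this shallow merge does.

```python
# Hypothetical bundle: the app and job1 are shared, job2 exists only in prod.
bundle = {
    "resources": {
        "apps": {"app1": {}},
        "jobs": {"job1": {}},
    },
    "targets": {
        "dev": {},
        "prod": {"resources": {"jobs": {"job2": {}}}},
    },
}


def effective_resources(bundle, target):
    """Combine top-level resources with the ones declared under a target."""
    merged = {kind: dict(items) for kind, items in bundle.get("resources", {}).items()}
    target_resources = bundle["targets"][target].get("resources", {})
    for kind, items in target_resources.items():
        merged.setdefault(kind, {}).update(items)
    return merged


def resource_names(bundle, target):
    """List the resources a target would deploy as sorted 'kind.name' strings."""
    resources = effective_resources(bundle, target)
    return sorted(f"{kind}.{name}" for kind, items in resources.items() for name in items)


print("dev: ", resource_names(bundle, "dev"))
print("prod:", resource_names(bundle, "prod"))
```

In this model the development target ends up with the app and one job, while production gets the app and both jobs, all from one configuration.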
You have an Azure Databricks workspace that is enabled for Unity Catalog.
You have an Apache Spark Structured Streaming job that writes data to a Delta table.
After the cluster restarts, the streaming job reprocesses previously ingested data.
You need to prevent the streaming job from reprocessing the data after the cluster restarts.
What should you do?
Correct: B
CORRECT ANSWER: B - Configure a checkpoint location for the streaming query.
According to Microsoft Learn on Apache Spark Structured Streaming, checkpointing is the mechanism that enables fault tolerance and exactly-once processing semantics. The checkpoint stores the streaming query's progress, including the offset of the last successfully processed batch, in a durable storage location (typically ADLS Gen2 or DBFS). When the cluster restarts, the streaming query reads the checkpoint to determine the last committed offset and resumes from that point, preventing reprocessing of already-ingested data. Option A (increase trigger interval) controls how frequently micro-batches run but does not prevent reprocessing on restart. Option C (watermark) handles late-arriving data in event-time processing but does not prevent reprocessing on restart. Option D (enable CDF) tracks changes to a Delta table but does not affect streaming source offset management.
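A minimal PySpark sketch of a checkpointed streaming write follows. It assumes an existing `SparkSession` is passed in as `spark`; the function name, table names, and checkpoint path are hypothetical placeholders.

```python
def start_checkpointed_stream(spark, source_table, target_table, checkpoint_path):
    """Start a streaming write whose progress is tracked in checkpoint_path.

    With the checkpointLocation option set, a restarted query resumes from
    the last committed offsets instead of reading the source from scratch.
    """
    return (
        spark.readStream.table(source_table)              # streaming read of the source table
        .writeStream
        .option("checkpointLocation", checkpoint_path)    # durable progress store
        .toTable(target_table)                            # starts the query, returns a StreamingQuery
    )
```

In a notebook this might be called as `start_checkpointed_stream(spark, "catalog1.schema1.source", "catalog1.schema1.target", "abfss://.../checkpoints/target")`, reusing the same checkpoint path on every restart. Each streaming query needs its own checkpoint location.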
Total 58 questions