Databricks-Certified-Professional-Data-Engineer Exam Questions - Real Practice Questions for Guaranteed Success

Question 1

The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.

What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?

A. Can manage

B. Can edit

C. Can run

D. Can Read

Correct : D

Granting a user 'Can Read' permissions on a notebook within Databricks allows them to view the notebook's content without the ability to execute or edit it. This level of permission ensures that the new team member can review the production logic for learning or auditing purposes without the risk of altering the notebook's code or affecting production data and workflows. This approach aligns with best practices for maintaining security and integrity in production environments, where strict access controls are essential to prevent unintended modifications. Reference: Databricks documentation on access control and permissions for notebooks within the workspace (https://docs.databricks.com/security/access-control/workspace-acl.html).
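The reasoning above can be sketched as a small lookup: permission levels are cumulative, so the answer is the strongest level that grants none of the risky capabilities. This is a minimal illustration only; the capability labels and the `max_safe_level` helper are assumptions made for the example and are not Databricks API identifiers.

```python
# Cumulative notebook permission levels, ordered weakest to strongest.
# Capability names are illustrative labels, not Databricks API identifiers.
LEVELS = [
    ("Can Read", {"view"}),
    ("Can Run", {"view", "run"}),
    ("Can Edit", {"view", "run", "edit"}),
    ("Can Manage", {"view", "run", "edit", "change_permissions"}),
]


def max_safe_level(forbidden):
    """Return the strongest level that grants none of the forbidden capabilities."""
    safe = [name for name, caps in LEVELS if not (caps & forbidden)]
    return safe[-1] if safe else None


# Running a production notebook can change data; editing can change code.
print(max_safe_level({"run", "edit", "change_permissions"}))
```

If only edits were considered risky, the same helper would return 'Can Run', which is why the question's wording about changes to both code and data matters.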

Question 2

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

MERGE INTO customers
USING (
  SELECT updates.customer_id AS merge_key, updates.*
  FROM updates
  UNION ALL
  SELECT NULL AS merge_key, updates.*
  FROM updates JOIN customers
  ON updates.customer_id = customers.customer_id
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customer_id = merge_key
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, end_date = staged_updates.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current, effective_date, end_date)
  VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)

Which statement describes this implementation?

A. The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

B. The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

C. The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

D. The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

Correct : C

The provided MERGE statement is a classic implementation of a Type 2 slowly changing dimension (SCD). Historical data is preserved by keeping old records, marked as no longer current, and adding new records for changes. When a staged row matches a current customer row whose address has changed, the existing row is updated to current = false and given an end date (end_date = staged_updates.effective_date). The second branch of the UNION ALL stages the same update again with a NULL merge_key; because NULL never equals a customer_id, that copy falls through to WHEN NOT MATCHED and is inserted as the new current row. Brand-new customers also reach WHEN NOT MATCHED and are inserted. The table therefore keeps the full history of changes to customer information, which allows time-based analysis of customer data. Reference: Databricks documentation on implementing SCDs using Delta Lake and the MERGE statement (https://docs.databricks.com/delta/delta-update.html#upsert-into-a-table-using-merge).
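To make the row-level behaviour concrete, here is a rough pure-Python simulation of the same logic on lists of dictionaries. It is a sketch, not how Delta Lake executes MERGE: the `scd2_merge` function and the sample rows are hypothetical, and it assumes at most one current row per customer and at most one update per customer.

```python
def scd2_merge(customers, updates):
    """Simulate the Type 2 MERGE on lists of dict rows; returns a new list."""
    # Assumption: each customer has at most one row with current == True.
    current = {c["customer_id"]: c for c in customers if c["current"]}

    # staged_updates: every update keyed by customer_id, plus a second copy
    # with merge_key None when the customer's current address has changed.
    staged = [dict(u, merge_key=u["customer_id"]) for u in updates]
    staged += [
        dict(u, merge_key=None)
        for u in updates
        if u["customer_id"] in current
        and current[u["customer_id"]]["address"] != u["address"]
    ]

    target = [dict(c) for c in customers]  # copy so the input is not mutated
    existing_ids = {c["customer_id"] for c in customers}
    inserts = []
    for s in staged:
        if s["merge_key"] is not None and s["merge_key"] in existing_ids:
            # WHEN MATCHED: close out the current row if the address changed.
            for row in target:
                if (row["customer_id"] == s["merge_key"] and row["current"]
                        and row["address"] != s["address"]):
                    row["current"] = False
                    row["end_date"] = s["effective_date"]
        else:
            # WHEN NOT MATCHED: a None merge_key or an unknown customer_id.
            inserts.append({
                "customer_id": s["customer_id"],
                "address": s["address"],
                "current": True,
                "effective_date": s["effective_date"],
                "end_date": None,
            })
    return target + inserts


# Hypothetical sample data: customer 1 moves, customer 2 is new.
customers = [{"customer_id": 1, "address": "A", "current": True,
              "effective_date": "2023-01-01", "end_date": None}]
updates = [{"customer_id": 1, "address": "B", "effective_date": "2024-01-01"},
           {"customer_id": 2, "address": "C", "effective_date": "2024-01-01"}]
for row in scd2_merge(customers, updates):
    print(row)
```

With this sample input, customer 1 ends up with two rows (the closed-out old address and the new current one) and customer 2 with a single current row, which is the Type 2 pattern described in option C.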

Question 3

The data governance team is reviewing code for deleting records for compliance with GDPR. The following logic has been implemented to propagate delete requests from the user_lookup table to the user_aggregates table.

Assuming that user_id is a unique identifying key and that all users who have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the records to be deleted from the user_aggregates table are no longer accessible and why?

A. No: files containing deleted records may still be accessible with time travel until a VACUUM command is used to remove invalidated data files.

B. Yes: Delta Lake ACID guarantees provide assurance that the DELETE command succeeded fully and permanently purged these records.

C. No: the change data feed only tracks inserts and updates, not deleted records.

D. No: the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command.

Correct : A

The DELETE operation in Delta Lake is ACID compliant, which means that once the operation is successful, the records are logically removed from the table. However, the underlying files that contained these records may still exist and be accessible via time travel to older versions of the table. To ensure that these records are physically removed and compliance with GDPR is maintained, a VACUUM command should be used to clean up these data files after a certain retention period. The VACUUM command will remove the files from the storage layer, and after this, the records will no longer be accessible.
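The difference between a logical delete and physical removal can be illustrated with a toy model of a versioned table. This is only a sketch of the idea: the `ToyVersionedTable` class is hypothetical, it is not Delta Lake's implementation, and its `vacuum` ignores the retention period that the real command applies.

```python
class ToyVersionedTable:
    """Toy model: each table version lists the data files it references."""

    def __init__(self):
        self.files = {}       # file name -> rows (stands in for cloud storage)
        self.versions = [[]]  # version number -> referenced file names
        self._counter = 0

    def _write_file(self, rows):
        self._counter += 1
        name = f"part-{self._counter}"
        self.files[name] = list(rows)
        return name

    def append(self, rows):
        self.versions.append(self.versions[-1] + [self._write_file(rows)])

    def delete(self, predicate):
        """Rewrite files that contain matching rows; old files stay on storage."""
        new_version = []
        for name in self.versions[-1]:
            rows = self.files[name]
            kept = [r for r in rows if not predicate(r)]
            if len(kept) == len(rows):
                new_version.append(name)          # file untouched
            elif kept:
                new_version.append(self._write_file(kept))
        self.versions.append(new_version)

    def read(self, version=-1):
        """Read a version; raises KeyError if one of its files was vacuumed."""
        return [r for name in self.versions[version] for r in self.files[name]]

    def vacuum(self):
        """Remove files the latest version no longer references."""
        live = set(self.versions[-1])
        self.files = {n: rows for n, rows in self.files.items() if n in live}


t = ToyVersionedTable()
t.append([{"user_id": 1}, {"user_id": 2}])
t.delete(lambda r: r["user_id"] == 1)
print(t.read())           # latest version: the deleted row is gone
print(t.read(version=1))  # time travel: the deleted row is still readable
t.vacuum()                # after this, version 1 can no longer be read
```

In the toy model, reading version 1 after `vacuum()` fails because its data file no longer exists, which mirrors why option A names VACUUM as the step that makes deleted records inaccessible.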

Question 4

A data engineer wants to refactor the following DLT code, which includes multiple table definitions with very similar code:

In an attempt to programmatically create these tables using a parameterized table definition, the data engineer writes the following code.

The pipeline runs an update with this refactored code, but generates a different DAG showing incorrect configuration values for tables.

How can the data engineer fix this?

A. Convert the list of configuration values to a dictionary of table settings, using table names as keys.

B. Convert the list of configuration values to a dictionary of table settings, using different input the for loop.

C. Load the configuration values for these tables from a separate file, located at a path provided by a pipeline parameter.

D. Wrap the loop inside another table definition, using generalized names and properties to replace with those from the inner table.

Correct : A

The issue with the refactored code is that it tries to use string interpolation to dynamically create table names within the dlt.table decorator, which will not correctly interpret the table names. Instead, by using a dictionary with table names as keys and their configurations as values, the data engineer can iterate over the dictionary items and use the keys (table names) to properly configure the table settings. This way, the decorator can correctly recognize each table name, and the corresponding configuration settings can be applied appropriately.
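Since the original code is not reproduced here, the following is only a sketch of the dictionary-driven pattern. It does not use the real dlt module: `table` is a hypothetical stand-in decorator that just records functions in `REGISTRY`, and the table names and settings are made up. The `define_table` helper is an addition beyond what the answer states; it is a common way to bind each iteration's values and avoid Python's late-binding closures inside a loop.

```python
REGISTRY = {}  # stand-in for the pipeline's set of defined tables


def table(name):
    """Hypothetical stand-in for a table decorator; it only records the function."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register


# Table settings keyed by table name (names and values are made up).
TABLE_SETTINGS = {
    "orders_eu": {"region": "EU"},
    "orders_us": {"region": "US"},
}


def define_table(name, settings):
    """Bind one table's name and settings at definition time."""
    @table(name)
    def build():
        # A real pipeline would return a DataFrame; a query string is enough here.
        return f"SELECT * FROM orders WHERE region = '{settings['region']}'"
    return build


for table_name, table_settings in TABLE_SETTINGS.items():
    define_table(table_name, table_settings)

print({name: fn() for name, fn in REGISTRY.items()})
```

Each registered function keeps the settings that belong to its own key, so the resulting definitions do not all end up with the values from the last loop iteration.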

Question 5

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

A. Size on Disk is > 0

B. The number of Cached Partitions > the number of Spark Partitions

C. The RDD Block Name included the '' annotation signaling failure to cache

D. On Heap Memory Usage is within 75% of Off Heap Memory Usage

Correct : C

In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally would be the presence of the _disk annotation in the RDD Block Name. This annotation indicates that some partitions of the cached data have been spilled to disk because there wasn't enough memory to hold them. This is suboptimal because accessing data from disk is much slower than from memory. The goal of caching is to keep data in memory for fast access, and a spill to disk means that this goal is not fully achieved.
