Master Google Professional Data Engineer Exam with Reliable Practice Questions

Page: 1 out of Viewing questions 1-5 out of 375 questions

Last exam update: Apr 10,2025

Upgrade to Premium

Question 1

You have one BigQuery dataset which includes customers' street addresses. You want to retrieve all occurrences of street addresses from the dataset. What should you do?

ACreate a deep inspection job on each table in your dataset with Cloud Data Loss Prevention and create an inspection template that includes the STREET_ADDRESS infoType.

BCreate a de-identification job in Cloud Data Loss Prevention and use the masking transformation.

CWrite a SQL query in BigQuery by using REGEXP_CONTAINS on all tables in your dataset to find rows where the word 'street' appears.

DCreate a discovery scan configuration on your organization with Cloud Data Loss Prevention and create an inspection template that
includes the STREET_ADDRESS infoType.

Correct : A

To retrieve all occurrences of street addresses from a BigQuery dataset, the most effective and comprehensive method is to use Cloud Data Loss Prevention (DLP). Here's why option A is the best choice:

Cloud Data Loss Prevention (DLP):

Cloud DLP is designed to discover, classify, and protect sensitive information. It includes pre-defined infoTypes for various kinds of sensitive data, including street addresses.

Using Cloud DLP ensures thorough and accurate detection of street addresses based on advanced pattern recognition and contextual analysis.

Deep Inspection Job:

A deep inspection job allows you to scan entire tables for sensitive information.

By creating an inspection template that includes the STREET_ADDRESS infoType, you can ensure that all instances of street addresses are detected across your dataset.

Scalability and Accuracy:

Cloud DLP is scalable and can handle large datasets efficiently.

It provides a high level of accuracy in identifying sensitive data, reducing the risk of missing any occurrences.

Steps to Implement:

Set Up Cloud DLP:

Enable the Cloud DLP API in your Google Cloud project.

Create an Inspection Template:

Create an inspection template in Cloud DLP that includes the STREET_ADDRESS infoType.

Run Deep Inspection Jobs:

Create and run a deep inspection job for each table in your dataset using the inspection template.

Review the inspection job results to retrieve all occurrences of street addresses.

Cloud DLP Documentation

Creating Inspection Jobs

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ACreate a deep inspection job on each table in your dataset with Cloud Data Loss Prevention and create an inspection template that includes the STREET_ADDRESS infoType.

BCreate a de-identification job in Cloud Data Loss Prevention and use the masking transformation.

CWrite a SQL query in BigQuery by using REGEXP_CONTAINS on all tables in your dataset to find rows where the word 'street' appears.

DCreate a discovery scan configuration on your organization with Cloud Data Loss Prevention and create an inspection template that
includes the STREET_ADDRESS infoType.

0 / 1500

Question 2

You are administering a BigQuery on-demand environment. Your business intelligence tool is submitting hundreds of queries each day that aggregate a large (50 TB) sales history fact table at the day and month levels. These queries have a slow response time and are exceeding cost expectations. You need to decrease response time, lower query costs, and minimize maintenance. What should you do?

ABuild materialized views on top of the sales table to aggregate data at the day and month level.

BBuild authorized views on top of the sales table to aggregate data at the day and month level.

CEnable Bl Engine and add your sales table as a preferred table.

DCreate a scheduled query to build sales day and sales month aggregate tables on an hourly basis.

Correct : A

To improve response times and reduce costs for frequent queries aggregating a large sales history fact table, materialized views are a highly effective solution. Here's why option A is the best choice:

Materialized Views:

Materialized views store the results of a query physically and update them periodically, offering faster query responses for frequently accessed data.

They are designed to improve performance for repetitive and expensive aggregation queries by precomputing the results.

Efficiency and Cost Reduction:

By building materialized views at the day and month level, you significantly reduce the computation required for each query, leading to faster response times and lower query costs.

Materialized views also reduce the need for on-demand query execution, which can be costly when dealing with large datasets.

Minimized Maintenance:

Materialized views in BigQuery are managed automatically, with updates handled by the system, reducing the maintenance burden on your team.

Steps to Implement:

Identify Aggregation Queries:

Analyze the existing queries to identify common aggregation patterns at the day and month levels.

Create Materialized Views:

Create materialized views in BigQuery for the identified aggregation patterns. For example

CREATE MATERIALIZED VIEW project.dataset.sales_daily_summary AS

SELECT

DATE(transaction_time) AS day,

SUM(amount) AS total_sales

FROM

project.dataset.sales

GROUP BY

day;

CREATE MATERIALIZED VIEW project.dataset.sales_monthly_summary AS

SELECT

EXTRACT(YEAR FROM transaction_time) AS year,

EXTRACT(MONTH FROM transaction_time) AS month,

SUM(amount) AS total_sales

FROM

project.dataset.sales

GROUP BY

year, month;

Query Using Materialized Views:

Update existing queries to use the materialized views instead of directly querying the base table.

BigQuery Materialized Views

Optimizing Query Performance

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ABuild materialized views on top of the sales table to aggregate data at the day and month level.

BBuild authorized views on top of the sales table to aggregate data at the day and month level.

CEnable Bl Engine and add your sales table as a preferred table.

DCreate a scheduled query to build sales day and sales month aggregate tables on an hourly basis.

0 / 1500

Question 3

You have a Standard Tier Memorystore for Redis instance deployed in a production environment. You need to simulate a Redis instance failover in the most accurate disaster recovery situation, and ensure that the failover has no impact on production dat

a. What should you do?

ACreate a Standard Tier Memorystore for Redis instance in a development environment. Initiate a manual failover by using the force-data-loss data protection mode.

BInitiate a manual tailover by using the limited-data-loss data protection mode to the Memorystore for Redis instance in the
production environment.

CIncrease one replica to Redis instance in production environment. Initiate a manual failover by using the force-data-loss data
protection mode.

DCreate a Standard Tier Memorystore for Redis instance in the development environment. Initiate a manual failover by using the limited-data-loss data protection mode.

Correct : D

To simulate a Redis instance failover in a production-like environment without impacting production data, the best approach is to use a development environment. Here's why option D is the best choice:

Standard Tier Memorystore for Redis:

The Standard Tier provides high availability and automatic failover capabilities. It's suitable for testing failover scenarios in a controlled environment.

Development Environment:

Using a development environment ensures that any potential data loss or impact from the failover simulation does not affect production data, maintaining the integrity and availability of the production system.

Limited-Data-Loss Mode:

The limited-data-loss mode for manual failover ensures that data loss is minimized during the failover process, making it a realistic simulation of a production failover scenario.

Steps to Implement:

Create a Development Environment:

Set up a development environment with a Standard Tier Memorystore for Redis instance that mirrors the configuration of your production instance.

Initiate Manual Failover:

Initiate a manual failover using the limited-data-loss data protection mode to simulate a failover scenario:

gcloud redis instances failover INSTANCE_ID --data-protection-mode=limited-data-loss

Verify Failover:

Monitor and verify the failover process to ensure it behaves as expected, simulating the disaster recovery scenario accurately.

Memorystore for Redis Documentation

Manual Failover in Memorystore

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ACreate a Standard Tier Memorystore for Redis instance in a development environment. Initiate a manual failover by using the force-data-loss data protection mode.

BInitiate a manual tailover by using the limited-data-loss data protection mode to the Memorystore for Redis instance in the
production environment.

CIncrease one replica to Redis instance in production environment. Initiate a manual failover by using the force-data-loss data
protection mode.

DCreate a Standard Tier Memorystore for Redis instance in the development environment. Initiate a manual failover by using the limited-data-loss data protection mode.

0 / 1500

Question 4

You are planning to use Cloud Storage as pad of your data lake solution. The Cloud Storage bucket will contain objects ingested from external systems. Each object will be ingested once, and the access patterns of individual objects will be random. You want to minimize the cost of storing and retrieving these objects. You want to ensure that any cost optimization efforts are transparent to the users and applications. What should you do?

ACreate a Cloud Storage bucket with Autoclass enabled.

BCreate a Cloud Storage bucket with an Object Lifecycle Management policy to transition objects from Standard to Coldline storage class if an object age reaches 30 days.

CCreate a Cloud Storage bucket with an Object Lifecycle Management policy to transition objects from Standard to Coldline storage class if an object is not live.

DCreate two Cloud Storage buckets. Use the Standard storage class for the first bucket, and use the Coldline storage class for the second bucket. Migrate objects from the first bucket to the second bucket after 30 days.

Correct : A

To minimize the cost of storing and retrieving objects in a Cloud Storage bucket while ensuring that cost optimization efforts are transparent to the users and applications, enabling Autoclass is the best approach. Here's why:

Autoclass Feature:

Autoclass automatically transitions objects between different storage classes (Standard, Nearline, Coldline, and Archive) based on their access patterns.

It ensures that frequently accessed data is kept in lower-latency, higher-cost storage classes and infrequently accessed data is moved to higher-latency, lower-cost storage classes.

Cost Optimization:

Autoclass optimizes storage costs by automatically moving objects to the most cost-effective storage class based on actual usage patterns, without manual intervention.

This feature ensures that objects are stored in the most economical class appropriate for their access frequency, reducing storage costs over time.

Transparency to Users:

The transition of objects between storage classes is handled automatically by Cloud Storage, making the process transparent to users and applications.

Users and applications interact with the objects in the same way, regardless of the underlying storage class, ensuring seamless access.

Steps to Implement:

Create a Cloud Storage Bucket:

When creating a new Cloud Storage bucket, enable the Autoclass feature.

Configure Autoclass:

Autoclass configuration is typically a straightforward process in the Google Cloud Console, where you enable it during bucket creation.

Monitor and Adjust:

Monitor the storage and access patterns through the Google Cloud Console to ensure that Autoclass is optimizing costs as expected.

Google Cloud Storage Autoclass

Optimizing Storage Costs with Autoclass

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ACreate a Cloud Storage bucket with Autoclass enabled.

BCreate a Cloud Storage bucket with an Object Lifecycle Management policy to transition objects from Standard to Coldline storage class if an object age reaches 30 days.

CCreate a Cloud Storage bucket with an Object Lifecycle Management policy to transition objects from Standard to Coldline storage class if an object is not live.

0 / 1500

Question 5

You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a

machine-learning process. You want to support a logistic regression model. You also need to monitor and

adjust for null values, which must remain real-valued and cannot be removed. What should you do?

AUse Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud
Dataproc job.

BUse Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud
Dataprep job.

CUse Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud
Dataprep job.

DUse Cloud Dataflow to find null values in sample source data. Convert all nulls to using a custom script.

Correct : C

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AUse Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud
Dataproc job.

BUse Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud
Dataprep job.

CUse Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud
Dataprep job.

DUse Cloud Dataflow to find null values in sample source data. Convert all nulls to using a custom script.

0 / 1500