A data scientist wants to efficiently tune the hyperparameters of a scikit-learn model in parallel. They elect to use the Hyperopt library to facilitate this process.
Which of the following Hyperopt tools provides the ability to optimize hyperparameters in parallel?
Correct : B
The SparkTrials class in the Hyperopt library allows for parallel hyperparameter optimization on a Spark cluster. This enables efficient tuning of hyperparameters by distributing the optimization process across multiple nodes in a cluster.
from hyperopt import fmin, tpe, hp, SparkTrials

# Define the hyperparameter search space.
search_space = {
    'x': hp.uniform('x', 0, 1),
    'y': hp.uniform('y', 0, 1)
}

# Objective function to minimize.
def objective(params):
    return params['x'] ** 2 + params['y'] ** 2

# SparkTrials distributes trials across the Spark cluster, running up to 4 in parallel.
spark_trials = SparkTrials(parallelism=4)

best = fmin(fn=objective, space=search_space, algo=tpe.suggest,
            max_evals=100, trials=spark_trials)
Hyperopt Documentation
A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
Hyperparameter 1: [2, 5, 10]
Hyperparameter 2: [50, 100]
Which of the following represents the number of machine learning models that can be trained in parallel during this process?
Correct : D
To determine the number of machine learning models that can be trained in parallel, we need to calculate the total number of combinations of hyperparameters. The given hyperparameter grid includes:
Hyperparameter 1: [2, 5, 10] (3 values)
Hyperparameter 2: [50, 100] (2 values)
The total number of combinations is the product of the number of values for each hyperparameter: 3 (values of Hyperparameter 1) × 2 (values of Hyperparameter 2) = 6
With 3-fold cross-validation, each combination of hyperparameters is evaluated 3 times, so the total number of models trained is: 6 (combinations) × 3 (folds) = 18
However, the number of models that can be trained in parallel is equal to the number of hyperparameter combinations, not the total number of models considering cross-validation. Therefore, 6 models can be trained in parallel.
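As a quick illustration (not part of the original question), the grid size can be verified with scikit-learn's ParameterGrid; the hyperparameter names max_depth and n_estimators below are assumptions, since the actual names are not given:

from sklearn.model_selection import ParameterGrid

# Hypothetical names for the two hyperparameters in the question.
param_grid = {
    "max_depth": [2, 5, 10],      # Hyperparameter 1: 3 values
    "n_estimators": [50, 100],    # Hyperparameter 2: 2 values
}

# 3 x 2 = 6 hyperparameter combinations in the grid.
print(len(ParameterGrid(param_grid)))  # 6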
Databricks documentation on hyperparameter tuning: Hyperparameter Tuning
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?
Correct : A
The suggestion not to one-hot encode categorical feature variables within the feature repository is justified because one-hot encoding can be problematic for some machine learning algorithms. Specifically, one-hot encoding increases the dimensionality of the data, which can be computationally expensive and may lead to issues such as multicollinearity and overfitting. Additionally, some algorithms, such as tree-based methods, can handle categorical variables directly without requiring one-hot encoding.
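A minimal sketch of the dimensionality concern, using a hypothetical high-cardinality categorical column (the column name and values are illustrative, not from the question):

import pandas as pd

# One categorical column with 1,000 distinct levels.
df = pd.DataFrame({"country": [f"country_{i}" for i in range(1000)]})

# One-hot encoding turns the single column into 1,000 binary feature columns.
encoded = pd.get_dummies(df, columns=["country"])

print(df.shape[1])       # 1 column before encoding
print(encoded.shape[1])  # 1000 columns after one-hot encoding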
Databricks documentation on feature engineering: Feature Engineering
A data scientist has created a linear regression model that uses log(price) as the label variable. Using this model, they have performed inference, and the predictions and actual label values are stored in the Spark DataFrame preds_df.
They are using the following code block to evaluate the model:
regression_evaluator.setMetricName("rmse").evaluate(preds_df)
Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?
Correct : D
When evaluating the RMSE for a model that predicts log-transformed prices, the predictions need to be transformed back to the original scale to obtain an RMSE that is comparable with the actual price values. This is done by exponentiating the predictions before computing the RMSE. The RMSE should be computed on the same scale as the original data to provide a meaningful measure of error.
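A minimal sketch of this approach, assuming the prediction and label columns in preds_df are named "prediction" and "label" (the actual column names are not given in the question):

from pyspark.sql import functions as F
from pyspark.ml.evaluation import RegressionEvaluator

# Exponentiate both the predicted and actual log(price) values back to the price scale.
preds_exp_df = (preds_df
    .withColumn("prediction_price", F.exp("prediction"))
    .withColumn("actual_price", F.exp("label")))

# Compute RMSE on the price scale so the error is comparable with price.
regression_evaluator = RegressionEvaluator(
    predictionCol="prediction_price",
    labelCol="actual_price",
    metricName="rmse"
)
rmse_price = regression_evaluator.evaluate(preds_exp_df)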
Databricks documentation on regression evaluation: Regression Evaluation
A data scientist is working with a feature set with the following schema:
The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?
Correct : B
For the feature set schema provided, the columns that need to be imputed using the most common value (mode) are typically the categorical columns. In this case, loyalty_tier is the only categorical column that should be imputed using the most common value. customer_id is a unique identifier and should not be imputed, while spend and units are numerical columns that should typically be imputed using the mean or median values, not the mode.
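A minimal sketch of this imputation strategy using pandas, with made-up values since the actual feature data is not provided:

import pandas as pd

# Hypothetical sample of the feature set described above.
features_df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "spend": [100.0, None, 250.0, 80.0],
    "units": [3, 5, None, 2],
    "loyalty_tier": ["gold", None, "silver", "gold"],
})

# Categorical column: impute with the most common value (mode).
features_df["loyalty_tier"] = features_df["loyalty_tier"].fillna(
    features_df["loyalty_tier"].mode()[0])

# Numerical columns: impute with the median (or mean), not the mode.
for col in ["spend", "units"]:
    features_df[col] = features_df[col].fillna(features_df[col].median())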
Databricks documentation on missing value imputation: Handling Missing Data