Machine Learning with BigQuery ML
BigQuery is a serverless, highly scalable data warehouse. It is also an excellent machine learning platform. This combination is significant for organizations that handle privacy-sensitive or confidential data as you can do machine learning without having to extract data out of the data warehouse. Not having extracts of data floating around in people’s projects is important for security. Also, the auditability that BigQuery provides out of the box means that you know exactly who created the model, and which data was used in which model.
This lab uses a set of code samples and scripts developed for Data Science on the Google Cloud Platform, 2nd Edition from O'Reilly Media, Inc.
In this lab, you:
- Deploy a Vertex AI Workbench instance
- Build a logistic regression model with BigQuery ML
- Evaluate the model
- Make predictions from the model
- Add additional airport information to improve the model
Setup and requirements
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
What you need
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab.
Note: If you are using a Chrome OS device, open an Incognito window to run this lab.
How to start your lab and sign in to the Google Cloud Console
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.
Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Open the tabs in separate windows, side-by-side.
In the Sign in page, paste the username that you copied from the left panel. Then copy and paste the password.
Important: You must use the credentials from the left panel. Do not use your Google Cloud Training credentials. If you have your own Google Cloud account, do not use it for this lab, to avoid incurring charges to your account.
Click through the subsequent pages:
- Accept the terms and conditions.
- Do not add recovery options or two-factor authentication (because this is a temporary account).
- Do not sign up for free trials.
After a few moments, the Cloud Console opens in this tab.
Task 1. Deploy a Vertex AI Workbench instance
On the Navigation Menu, click More products, and then Vertex AI > Workbench.
On the Notebook instances page, click New Notebook > TensorFlow Enterprise > TensorFlow Enterprise 2.6 (with LTS) > Without GPUs.
In the New notebook dialog, name the notebook bqml-notebook.
For Region, select a region.
For Zone, select a zone within the selected region.
Leave all other fields with their default options, and click Create.
As the notebook spins up, bqml-notebook is listed in the Workbench notebook list. When creation completes,
Open JupyterLab appears inline with the notebook name.
- Click Open JupyterLab.
When the Vertex AI notebook instance starts, you will see a suggestion to start a build to add tensorflow_model_analysis and xai_tabular_widget.
- Click Build.
Your notebook is now set up.
- In the Notebook launcher section click Python 3 to open a new notebook.
To use a notebook, you enter commands into a cell. Run the commands in the cell either by pressing Shift + Enter, or by clicking the triangle on the notebook top menu to Run selected cells and advance.
Task 2. Build logistic regression model
Create the training dataset
The first step in BigQuery ML is to create the training dataset. We want the three features (departure delay, taxi-out time, and distance) and the label (ontime), so let’s use SQL to craft the dataset just the way we want it.
Enter the following code into a new cell, and then run the cell.
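The lab's exact query is not reproduced here; the sketch below follows the pattern used in the book's companion repository. The dataset and table names (dsongcp, flights_tzcorr, trainday), the column names, and the join key are assumptions and may differ in your environment.

```sql
-- Sketch of the training-dataset query; dsongcp, flights_tzcorr, and trainday
-- are assumed names from the book's companion repository, not verified in this lab.
CREATE OR REPLACE TABLE dsongcp.flights_train_data AS
SELECT
  IF(f.arr_delay < 15, 1, 0) AS ontime,   -- label: 1 if the flight arrived on time
  f.dep_delay,                            -- feature: departure delay (minutes)
  f.taxi_out,                             -- feature: taxi-out time (minutes)
  f.distance,                             -- feature: flight distance (miles)
  IF(t.is_train_day = 'True', False, True) AS is_eval_day  -- True on held-out days
FROM dsongcp.flights_tzcorr AS f
JOIN dsongcp.trainday AS t USING (fl_date)
```

Pre-computing the is_eval_day Boolean here is what lets the CREATE MODEL statement in the next section split the data by day rather than randomly.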
Create the model
In this section, you create a logistic regression model in which the label column is called ontime and the remaining columns are used as input features to the model. Here, you do not want to randomly split the flight data: you need to avoid having correlated flights from the same day split between the training and test datasets.
That’s why, when you created the training data in the previous section, you pre-split the data and created a table that specifies which days should be used for training and which days for evaluation. The resulting model parameters are stored in a BigQuery model object called arr_delay_lm in your dataset.
You can explicitly tell BigQuery ML to use a column in the training dataset to split the data. Add that column to your SELECT statement as a Boolean value by joining the flight data against the table of prespecified training days.
- Create a logistic regression model in BigQuery ML by running the following query in a new cell.
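A minimal sketch of the CREATE MODEL statement, assuming the table and column names introduced above; data_split_method='CUSTOM' tells BigQuery ML to hold out for evaluation the rows where the BOOL column named in data_split_col is TRUE, and that column is automatically excluded from the features:

```sql
-- Sketch; model, dataset, and table names are assumptions from the book's repo.
CREATE OR REPLACE MODEL dsongcp.arr_delay_lm
OPTIONS(model_type='logistic_reg',        -- binary logistic regression
        input_label_cols=['ontime'],      -- label column
        data_split_method='CUSTOM',       -- split by our precomputed column
        data_split_col='is_eval_day') AS  -- TRUE rows become evaluation data
SELECT ontime, dep_delay, taxi_out, distance, is_eval_day
FROM dsongcp.flights_train_data
```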
In the above query, BigQuery trains a logistic regression model and puts the weights into the model
arr_delay_lm. Obtain the training error (called the loss) by querying the result of the special function ML.TRAINING_INFO.
- Enter the following into a new cell and run it to obtain the training error.
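Assuming the model name above, the training-info query looks like this:

```sql
-- ML.TRAINING_INFO returns one row per training iteration,
-- including the training loss and the evaluation loss.
SELECT *
FROM ML.TRAINING_INFO(MODEL dsongcp.arr_delay_lm)
ORDER BY iteration
```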
Task 3. Evaluate the model
Evaluate the model by calling
ML.EVALUATE in SQL. BigQuery then evaluates the model on the withheld data (the days that were held out for evaluation rather than training), but using a threshold of 0.7.
Run the query below in a new cell to evaluate the model.
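A sketch of the evaluation call, with the model name assumed as above; the STRUCT argument sets the classification threshold:

```sql
-- Evaluates on the held-out split; the threshold affects
-- precision, recall, accuracy, and f1_score (not roc_auc).
SELECT *
FROM ML.EVALUATE(MODEL dsongcp.arr_delay_lm,
                 STRUCT(0.7 AS threshold))
```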
The query returns evaluation statistics including precision, recall, accuracy, f1_score, log_loss, and roc_auc.
The precision is how often the model is right when it reports a flight as being on time. The recall is the fraction of on-time flights correctly classified. The area under the Receiver Operating Characteristic (ROC) curve is a threshold-independent measure of classifier performance.
Task 4. Make prediction from the model
The ML.WEIGHTS function lets you see the underlying weights used by a model during prediction. This function applies to linear and logistic regression models.
- Run the following query in a new cell to obtain the weights by calling ML.WEIGHTS.
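Assuming the model name used earlier:

```sql
-- Returns one row per input feature, plus a row for the model intercept.
SELECT *
FROM ML.WEIGHTS(MODEL dsongcp.arr_delay_lm)
```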
The output contains one weight per input feature, along with the model intercept.
However, there is usually little point in getting just the weights. Instead, what you want is the predicted value for some set of inputs.
- Run the query below in a new cell to carry out a prediction.
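A sketch of a single prediction; the feature values below are made-up inputs, and the model name is assumed as above:

```sql
-- Predicts the ontime class and per-class probabilities for one hypothetical flight.
SELECT *
FROM ML.PREDICT(MODEL dsongcp.arr_delay_lm,
  (SELECT 16.0 AS dep_delay,    -- hypothetical departure delay (minutes)
          13.0 AS taxi_out,     -- hypothetical taxi-out time (minutes)
          1231.0 AS distance))  -- hypothetical distance (miles)
```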
While you can use ML.PREDICT to actually carry out predictions, the predictions are subject to the typical BigQuery latency of a second or so. Therefore, ML.PREDICT is typically used for batch predictions over large datasets. For online prediction (i.e., exposing the prediction service as a microservice over REST), you can extract the model as a TensorFlow model and deploy it to Vertex AI.
You can also use ML.PREDICT when you want to compute some other metric from the predictions.
- Run the query below in a new cell to compute the Root Mean Squared Error (RMSE).
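A sketch of the RMSE computation over the evaluation days; the table and model names, and the shape of the predicted_ontime_probs output column, are assumptions based on BigQuery ML's logistic-regression output format:

```sql
-- For classification models, ML.PREDICT returns a predicted_ontime_probs array
-- of (label, prob) structs; UNNEST flattens it so we can keep the "on time" class.
SELECT SQRT(AVG((ontime - prob) * (ontime - prob))) AS rmse
FROM (
  SELECT ontime, p.prob AS prob
  FROM ML.PREDICT(MODEL dsongcp.arr_delay_lm,
         (SELECT * FROM dsongcp.flights_train_data WHERE is_eval_day)),
       UNNEST(predicted_ontime_probs) AS p
  WHERE p.label = 1  -- keep only the probability of the "on time" class
)
```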
The query above pulls the probability field out of the predictions (it’s an array, with one entry per category, hence the UNNEST) and uses it to compute the RMSE. The resulting RMSE is 0.2131.
Task 5. Create, evaluate, and predict from the model by adding additional airport information
Create the BigQuery ML logistic regression model
arr_delay_airports_lm by adding airport information to the model (note the two additional columns: origin and dest). This seemingly simple change adds two categorical variables that, when one-hot encoded, add 600+ new columns to the model.
- To showcase the scalability of BigQuery, create the model with the two additional fields, origin and dest, by running the following query in a new cell.
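A sketch of the new model, reusing the assumed names from earlier; BigQuery ML one-hot encodes the STRING columns origin and dest automatically:

```sql
-- Sketch; dataset, table, and model names are assumptions from the book's repo.
CREATE OR REPLACE MODEL dsongcp.arr_delay_airports_lm
OPTIONS(model_type='logistic_reg',
        input_label_cols=['ontime'],
        data_split_method='CUSTOM',
        data_split_col='is_eval_day') AS
SELECT
  IF(f.arr_delay < 15, 1, 0) AS ontime,
  f.dep_delay, f.taxi_out, f.distance,
  f.origin, f.dest,                       -- categorical; one-hot encoded automatically
  IF(t.is_train_day = 'True', False, True) AS is_eval_day
FROM dsongcp.flights_tzcorr AS f
JOIN dsongcp.trainday AS t USING (fl_date)
```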
- Evaluate the model
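Evaluation is the same call as before, pointed at the new model (names assumed):

```sql
SELECT *
FROM ML.EVALUATE(MODEL dsongcp.arr_delay_airports_lm,
                 STRUCT(0.7 AS threshold))
```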
- Make prediction from the model
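The RMSE computation is the same shape as before, against the airports model; all names and the probs-array structure are assumptions:

```sql
-- RMSE over the evaluation days for the model with airport information.
SELECT SQRT(AVG((ontime - prob) * (ontime - prob))) AS rmse
FROM (
  SELECT ontime, p.prob AS prob
  FROM ML.PREDICT(MODEL dsongcp.arr_delay_airports_lm,
         (SELECT IF(f.arr_delay < 15, 1, 0) AS ontime,
                 f.dep_delay, f.taxi_out, f.distance, f.origin, f.dest
          FROM dsongcp.flights_tzcorr AS f
          JOIN dsongcp.trainday AS t USING (fl_date)
          WHERE t.is_train_day = 'False')),   -- evaluation days only
       UNNEST(predicted_ontime_probs) AS p
  WHERE p.label = 1
)
```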
The model that includes the airport information has an RMSE of 0.2098, which is an improvement over the original model created above.
BigQuery ML provides a simple and powerful SQL interface for machine learning. You created a classifier model for predicting flight delays using BigQuery ML. Now you know how to use BigQuery ML to build and evaluate models, and to make batch predictions, with just SQL.
Finish your Quest
This self-paced lab is part of the Quest Data Science on Google Cloud: Machine Learning. A Quest is a series of related labs that form a learning path. Completing this Quest earns you a badge that recognizes your achievement. You can make your badges public and link to them in your online resume or social media account. Enroll in this Quest to get immediate completion credit if you've already taken this lab. See other available Quests.
Take your next Lab
Next steps / learn more
The source of this lab:
Explore other notebooks. Repeat the steps in this lab for the following notebooks (in order).
Google Cloud Training & Certification
...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual Last Updated April 5, 2022
Lab Last Tested March 23, 2021
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.