Ingesting Data Into The Cloud

60m access · 45m completion
5 Credits

GSP194

Google Cloud Self-Paced Labs

Overview

In this lab you'll learn how to use a bash script to download selected data from a large public data set available on the internet. The data provides historical information about domestic flights in the United States.

The techniques used to ingest this data from the US Bureau of Transportation Statistics (BTS) website into the cloud apply generally to other comprehensive, real-world data sets that need to be parsed and cleaned up before they can be used.
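The download step described above can be sketched roughly as follows. This is a minimal illustration, not the lab's actual script: `BASE_URL`, the `YYYYMM.zip` file naming scheme, and the function names are all hypothetical placeholders, since the real BTS endpoints and request parameters are not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL and the file naming scheme are hypothetical
# placeholders, not the real BTS endpoints.
BASE_URL="${BASE_URL:-https://example.com/flights}"

# Build the (hypothetical) file name for one month.
# year: four digits; month: 1-12 without a leading zero.
month_file() {
  local year="$1" month="$2"
  printf '%04d%02d.zip' "$year" "$month"
}

# Download one month of data into the given directory.
download_month() {
  local year="$1" month="$2" dest="${3:-.}"
  local name
  name="$(month_file "$year" "$month")"
  # --fail treats HTTP errors as failures; --location follows redirects.
  curl --fail --silent --show-error --location \
       --output "${dest}/${name}" "${BASE_URL}/${name}"
}

# Download every month of one year, stopping at the first failure.
download_year() {
  local year="$1" dest="${2:-.}" month
  mkdir -p "$dest"
  for month in $(seq 1 12); do
    download_month "$year" "$month" "$dest" || return 1
  done
}
```

With a real endpoint substituted for the placeholder, a call such as `download_year 2015 ./data` would fetch twelve monthly files into `./data`. Stopping at the first failed month makes a partial download easy to notice before any cleanup or upload step runs.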

This data will be used in other labs in the Data Science on Google Cloud Platform quest to demonstrate a wide range of data science concepts and techniques using the Google Cloud Platform.

Objectives

  • Retrieve initial data from the BTS website.

  • Store the data on Google Cloud Storage.
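As a rough illustration of the second objective, the sketch below wraps two standard `gsutil` commands: `mb` to make a bucket and `cp` to copy files into it. The bucket name, the `us-central1` location, the `flights/raw` object prefix, and the function names are assumptions chosen for the example, not values taken from the lab.

```shell
#!/usr/bin/env bash
# Sketch only: bucket name, location, and object prefix are illustrative
# assumptions, not values prescribed by the lab.

# Turn a bucket name into its gs:// URI.
bucket_uri() {
  printf 'gs://%s' "$1"
}

# Create a bucket in the given location (default: us-central1).
create_bucket() {
  local bucket="$1" location="${2:-us-central1}"
  gsutil mb -l "$location" "$(bucket_uri "$bucket")"
}

# Copy all CSV files from a local directory into the bucket.
# -m runs the copy in parallel, which helps when there are many files.
upload_csvs() {
  local src_dir="$1" bucket="$2" prefix="${3:-flights/raw}"
  gsutil -m cp "${src_dir}"/*.csv "$(bucket_uri "$bucket")/${prefix}/"
}
```

For example, `create_bucket my-unique-bucket` followed by `upload_csvs ./data my-unique-bucket` would create the bucket and copy every `.csv` file in `./data` to it. Bucket names are globally unique, so a real run needs a name that is not already taken.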

Setup and Requirements

What you'll need

To complete this lab, you’ll need:

  • Access to a standard internet browser (Chrome browser recommended).

  • Time. Note the lab’s Completion time in Qwiklabs. This is an estimate of the time it should take to complete all steps. Plan your schedule so you have time to complete the lab. Once you start the lab, you will not be able to pause and return later (you begin at step 1 every time you start a lab).

  • The lab's Access time is how long your lab resources will be available. If you finish your lab with access time still available, you will be able to explore the Google Cloud Platform or work on any section of the lab that was marked "if you have time". Once the Access time runs out, your lab will end and all resources will terminate.

  • You DO NOT need a Google Cloud Platform account or project. An account, project and associated resources are provided to you as part of this lab.

  • If you already have your own GCP account, make sure you do not use it for this lab.

  • If your lab prompts you to log into the console, use only the student account provided to you by the lab. This prevents you from incurring charges for lab activities in your personal GCP account.

Start your lab

When you are ready, click Start Lab. You can track your lab’s progress with the status bar at the top of your screen.

Find Your Lab’s GCP Username and Password

To access the resources and console for this lab, locate the Connection Details panel in Qwiklabs. Here you will find the account ID and password for the account you will use to log in to the Google Cloud Platform:

[Screenshot: Connection Details panel with the Open Google Console button]

If your lab provides other resource identifiers or connection-related information, it will appear on this panel as well.

Log in to Google Cloud Console

Using the Qwiklabs browser tab/window or the separate browser you are using for the Qwiklabs session, copy the Username from the Connection Details panel and click the “Open Google Console” button.

You'll be asked to choose an account. Click Use another account.

[Screenshot: Google "Choose an account" dialog]

Paste in the Username, and then the Password as prompted:

[Screenshot: "Sign in to continue to Google Cloud Platform" dialog]

Accept the terms and conditions.

Since this is a temporary account that you can access only for this one lab:

  • Do not add recovery options
  • Do not sign up for free trials

Activate Google Cloud Shell

Google Cloud Shell provides command-line access to your GCP resources.

From the GCP Console, click the Cloud Shell icon on the top right toolbar:

[Screenshot: Cloud Shell icon]

Then click START CLOUD SHELL:

[Screenshot: Start Cloud Shell dialog]

It takes a few moments to provision and connect to the environment:

[Screenshot: Cloud Shell terminal]

Cloud Shell is a virtual machine loaded with all the development tools you’ll need. It offers a persistent 5 GB home directory and runs on Google Cloud, which greatly improves network performance and simplifies authentication.

Once connected to Cloud Shell, you'll see that you are already authenticated and that the project is set to your PROJECT_ID:

gcloud auth list

Output:

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

gcloud config list project

Output:

[core]
project = <PROJECT_ID>
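The two checks above can also be made from a script, so that later steps fail early if Cloud Shell is not set up as expected. The following is a minimal sketch; the function names `active_account` and `require_project` are hypothetical, while the `gcloud` subcommands and flags are standard ones.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of the lab.

# Print the currently active account, if any.
active_account() {
  gcloud auth list --filter=status:ACTIVE --format="value(account)"
}

# Print the configured project ID, or fail if none is set.
require_project() {
  local project
  project="$(gcloud config get-value project 2>/dev/null)"
  # Treat an empty value or the literal "(unset)" as "no project".
  if [ -z "$project" ] || [ "$project" = "(unset)" ]; then
    echo "No project is configured for gcloud" >&2
    return 1
  fi
  echo "$project"
}
```

A script could then begin with `PROJECT_ID="$(require_project)" || exit 1` and reuse `$PROJECT_ID` in later commands instead of hard-coding it.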
