Running R at Scale on Google Compute Engine

90m access · 60m completion
GSP134

Google Cloud Self-Paced Labs

Overview

This Qwiklab shows you how to run R scripts across multiple nodes on Google Cloud Platform (GCP). R is an open source programming language that statisticians and economists use extensively for modeling and data visualization. Many of these models require far more memory and computational power than a single node or virtual machine provides. Computational clusters address this by aggregating memory and computation across tens to hundreds of nodes and thousands of cores. This lab shows you how to use computational clusters with R so you can start scaling your own analytic models.

R has a number of packages that make it easy to program a cluster of nodes for your modeling and analytics:

  • Snow provides clustering capabilities over standard sockets or over the high-performance Message Passing Interface (MPI).
  • RHIPE provides an interface to Hadoop for analysis of data from within R, using the map-reduce approach to parallelism.
  • Rslurm provides functions for submitting R scripts to the Slurm workload manager.
  • Rmpi provides a low-level interface to the MPI parallel API.

R supports many other packages for parallelism.

This lab uses Rmpi largely because it supports a number of different libraries. With Rmpi, an R developer submits a job to a high-performance computing (HPC) cluster through a workload manager. The job consists of an R script that uses the Rmpi interface to create processes across the nodes in the cluster and to send and receive messages between those processes.
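
To make the shape of such a job concrete, here is a minimal sketch of an Rmpi script. It is an illustration, not the script used in this lab. It assumes that Rmpi and an MPI runtime that supports spawning processes are already installed, and the worker count `n_workers` is a made-up value. The master process spawns workers, asks each one to report its rank, and gathers the replies.

```r
# Minimal Rmpi sketch (illustrative only; not the lab's own script).
# Assumes the Rmpi package and a working MPI runtime are installed.
library(Rmpi)

n_workers <- 2  # hypothetical value; choose it to match the cluster

# Create the worker R processes. MPI places them on the nodes it was given.
mpi.spawn.Rslaves(nslaves = n_workers)

# Evaluate the same expression on every worker and gather the results on
# the master. mpi.comm.size() counts the master as well as the workers.
greetings <- mpi.remote.exec(
  paste("worker", mpi.comm.rank(), "of", mpi.comm.size())
)
print(greetings)

# Shut the workers down, then leave the MPI environment cleanly.
mpi.close.Rslaves()
mpi.exit()
```

`mpi.exit()` ends MPI but leaves the R session running. In a batch script submitted to a workload manager, `mpi.quit()` could be used instead to end the R session as well.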

Objectives

  • Install a small 5-node compute cluster using the cluster-provisioning tool ElastiCluster and the Slurm workload manager.

  • Customize ElastiCluster to install additional software packages.

  • Submit a job to the workload manager to run an R script that leverages the computation capabilities across the cluster.

Prerequisites

This is an expert-level lab. Before taking it, you should be comfortable with at least the basics of R, clusters, and shell programming. Here are some Qwiklabs that can get you up to speed:

Once you're prepared, scroll down to learn how you can run R at scale.