Running R at Scale on Compute Engine
This Qwiklab shows you how to run R scripts across multiple nodes on Google Cloud Platform (GCP). R is an open source programming language that statisticians and economists use extensively for modeling and data visualization. Many of these models require far more memory and computational power than a single node or virtual machine can provide. Computational clusters address this by aggregating memory and computation across tens to hundreds of nodes and thousands of cores. This lab shows you how to use computational clusters with R so you can start scaling your own analytic models.
R has a number of packages that make it easy to program a cluster of nodes for your modeling and analytics:
- Snow provides clustering capabilities with standard sockets or with high-performance Message Passing Interface (MPI).
- RHIPE provides an interface to Hadoop for analysis of data from within R, using the map-reduce approach to parallelism.
- Rslurm provides functions to allow submitting R scripts to a Slurm cluster workload manager.
- Rmpi provides a low-level interface to the MPI parallel API.
R supports many other packages for parallelism.
This lab uses Rmpi largely because it supports a number of different libraries. With Rmpi, an R developer uses high-performance computing (HPC) clusters and workload managers to submit a job. The job consists of an R script that uses the Rmpi interface to create processes across the nodes in the cluster, and to send and receive messages across those nodes.
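As a rough illustration of what such a script might look like, the sketch below spawns worker processes with Rmpi, has each worker sum its own slice of a range, and combines the partial results on the master. It is not taken from the lab itself: the helper names `chunk_for_rank` and `run_on_cluster`, the worker count of 4, the range 1 to 100, and the `SLURM_JOB_ID` guard are all assumptions chosen for illustration.

```r
# Minimal Rmpi sketch (illustrative only, not the lab's own script).
# Assumes the Rmpi package and an MPI runtime are installed on every node.

# Hypothetical helper: the slice of 1..n handled by one worker.
# Spawned Rmpi workers have ranks 1..n_workers; rank 0 is the master.
chunk_for_rank <- function(rank, n_workers, n) {
  idx <- seq_len(n)
  idx[(idx - 1) %% n_workers == rank - 1]
}

# Hypothetical driver: spawn workers, run one command on each, combine results.
run_on_cluster <- function(n_workers, n) {
  library(Rmpi)
  mpi.spawn.Rslaves(nslaves = n_workers)   # create the worker processes

  # Copy the helper and its inputs into each worker's global environment.
  mpi.bcast.Robj2slave(chunk_for_rank)
  mpi.bcast.Robj2slave(n_workers)
  mpi.bcast.Robj2slave(n)

  # Each worker evaluates this expression using its own rank.
  partial <- mpi.remote.exec(sum(chunk_for_rank(mpi.comm.rank(), n_workers, n)))

  mpi.close.Rslaves()                      # shut the workers down
  sum(unlist(partial))                     # combine partial sums on the master
}

# Assumption: only run the MPI part when launched inside a Slurm job,
# which sets the SLURM_JOB_ID environment variable.
if (nzchar(Sys.getenv("SLURM_JOB_ID")) && requireNamespace("Rmpi", quietly = TRUE)) {
  print(run_on_cluster(n_workers = 4, n = 100))
  Rmpi::mpi.quit()                         # finalize MPI and exit R
}
```

Outside a Slurm job the script only defines the two functions, so the work-splitting logic can be checked on a single machine before the job is submitted. How the master process is launched (for example, through `mpirun` with a single process inside a batch script) depends on the MPI library and cluster configuration.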
In this lab, you will:
- Install a small 5-node compute cluster using the cluster-provisioning tool ElastiCluster and the Slurm workload manager.
- Customize ElastiCluster to install additional software packages.
- Submit a job to the workload manager to run an R script that leverages the computation capabilities across the cluster.
This is an expert-level lab. Before taking it, you should be comfortable with at least the basics of R, clusters, and shell programming. Here are some Qwiklabs that can get you up to speed:
- Getting Started with Cloud Shell & gcloud
- Creating a Virtual Machine
- Awwvision: Cloud Vision API from a Kubernetes Cluster
- Introduction to Kubeflow on Google Kubernetes Engine
Once you're prepared, scroll down to learn how you can run R at scale.