Orchestrating a Data Pipeline with BigQuery and Cloud Storage

INTERMEDIATE
90 minutes
5 tasks

In this lab, you'll learn to orchestrate a data pipeline using Google Cloud services such as BigQuery and Cloud Storage. You'll design a pipeline that reads data from CSV files stored in Cloud Storage, transforms it using BigQuery SQL, and loads the results into a BigQuery table. This hands-on experience will prepare you to automate, schedule, and monitor data processing tasks in GCP.

Scenario

Your company, Data Insights Inc., processes millions of rows of transaction data daily. The business needs to analyze daily sales data by region to adjust marketing efforts effectively. You'll set up a data pipeline that ingests transaction data into BigQuery for analytical processing.
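
As a rough illustration of the analysis this scenario calls for, the sketch below builds a BigQuery SQL statement that totals sales per region for a single day. The table ID `my-project.sales.transactions` and the columns `region`, `amount`, and `transaction_date` are assumptions made for this example; the lab's actual schema may differ.

```python
def daily_sales_by_region_sql(table: str) -> str:
    """Return a BigQuery SQL statement that totals sales per region for one day.

    The day is passed as the named query parameter @sales_date, so the same
    statement can be reused for any date.
    """
    return (
        "SELECT region, SUM(amount) AS total_sales, COUNT(*) AS transactions\n"
        f"FROM `{table}`\n"
        "WHERE transaction_date = @sales_date\n"
        "GROUP BY region\n"
        "ORDER BY total_sales DESC"
    )


# Hypothetical table ID, used only for illustration.
sql = daily_sales_by_region_sql("my-project.sales.transactions")
print(sql)
```

Keeping the date as a query parameter rather than pasting it into the string means the statement stays the same from day to day; only the parameter value changes.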

Learning Objectives

  • Design a simple data pipeline using BigQuery and Cloud Storage.
  • Load and transform CSV data in BigQuery.
  • Automate and schedule data processing tasks with BigQuery scheduled queries.
  • Monitor the data pipeline using Cloud Logging and Cloud Monitoring.
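
The scheduling objective above could look roughly like the following when done through the BigQuery Data Transfer Service Python client instead of the console. This is a minimal sketch, not the lab's prescribed method: the project, dataset, table, and column names are hypothetical, and `create_scheduled_query` needs the `google-cloud-bigquery-datatransfer` package and valid credentials, so it is defined here but not called.

```python
def scheduled_query_settings(query: str, dataset_id: str, table: str) -> dict:
    """Build the settings for a scheduled query that runs once a day."""
    return {
        "destination_dataset_id": dataset_id,
        "display_name": "Daily sales by region",
        "data_source_id": "scheduled_query",
        "params": {
            "query": query,
            "destination_table_name_template": table,
            # Append each day's result instead of overwriting earlier days.
            "write_disposition": "WRITE_APPEND",
        },
        "schedule": "every 24 hours",
    }


def create_scheduled_query(project_id: str, settings: dict):
    """Register the scheduled query (requires credentials; not run here)."""
    # Imported lazily so the helper above works without the library installed.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    return client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=bigquery_datatransfer.TransferConfig(**settings),
    )


# Hypothetical table and column names; @run_date is filled in by the
# scheduler with the date of each run.
query = (
    "SELECT @run_date AS sales_date, region, SUM(amount) AS total_sales "
    "FROM `my-project.sales.transactions` "
    "WHERE transaction_date = @run_date "
    "GROUP BY region"
)
settings = scheduled_query_settings(query, "sales", "daily_sales_by_region")
print(settings["schedule"])
```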

Tasks (5)

Task 1: Create a Cloud Storage bucket for data storage

10 min

Task 2: Upload CSV data files to the Cloud Storage bucket

15 min

Task 3: Load CSV data into BigQuery and create a table

20 min

Task 4: Schedule a daily query to analyze transactions

25 min

Task 5: Monitor the pipeline using Cloud Monitoring and Cloud Logging

20 min
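
For readers who prefer code to the console, the first three tasks can be sketched with the Google Cloud Python client libraries. This is one possible approach under stated assumptions, not necessarily the steps the lab walks through: the bucket, file, and table names are hypothetical, schema autodetection and a single header row are assumed, and the two functions that contact Google Cloud need the `google-cloud-storage` and `google-cloud-bigquery` packages plus valid credentials, so they are defined but not called.

```python
def gcs_uri(bucket_name: str, object_name: str) -> str:
    """Return the gs:// URI that BigQuery uses to locate a Cloud Storage object."""
    return f"gs://{bucket_name}/{object_name}"


def create_bucket_and_upload(bucket_name: str, local_path: str,
                             object_name: str, location: str = "US") -> str:
    """Tasks 1 and 2: create a bucket and upload one CSV file to it."""
    # Imported lazily so gcs_uri() works without the library installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.create_bucket(bucket_name, location=location)
    bucket.blob(object_name).upload_from_filename(local_path)
    return gcs_uri(bucket_name, object_name)


def load_csv_into_bigquery(uri: str, table_id: str) -> int:
    """Task 3: load the CSV into a BigQuery table and return its row count."""
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the file starts with one header row
        autodetect=True,      # assumes BigQuery can infer the schema
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes
    return client.get_table(table_id).num_rows


# Hypothetical bucket and object names, used only for illustration.
uri = gcs_uri("example-transactions-bucket", "transactions/2024-01-01.csv")
print(uri)
```

With credentials in place, the two cloud functions would be chained: the URI returned by `create_bucket_and_upload` is passed to `load_csv_into_bigquery` together with a table ID of the form `project.dataset.table`.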

Prerequisites

  • Basic understanding of Google Cloud Storage concepts.
  • Familiarity with writing SQL queries in BigQuery.

Skills Tested

  • Design and implement simple data pipelines with BigQuery and Cloud Storage.
  • Schedule and automate data processing tasks using BigQuery scheduled queries.
  • Monitor the data pipeline using Cloud Logging and Cloud Monitoring.
    Orchestrating a Data Pipeline with BigQuery and Cloud Storage - Hands-On Lab - CertiPass