Implementing a Streaming Data Processing Pipeline with Amazon Kinesis

INTERMEDIATE
95 minutes
5 tasks

In this lab, you'll build a real-time data processing pipeline using Amazon Kinesis services. You will configure Kinesis Data Streams to ingest streaming data, process it with AWS Lambda functions, and store the results in Amazon S3. You will then catalog the stored data with AWS Glue and query it with Amazon Athena. The lab gives you hands-on experience with AWS services that are foundational for real-time analytics, and shows how streaming ingestion and batch-style querying fit together in one pipeline.
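The processing step can be sketched as a small Lambda handler. This is a minimal illustration, not the lab's reference solution: it relies on the fact that Lambda delivers each Kinesis payload base64-encoded under `kinesis.data`, and it assumes JSON payloads with a hypothetical `video_id` field.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Decode Kinesis records and count events per video (assumed schema)."""
    views = Counter()
    for record in event.get("Records", []):
        # Each Kinesis payload arrives base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        activity = json.loads(payload)
        # "video_id" is an assumed field name, not taken from the lab text.
        views[activity["video_id"]] += 1
    return {"processed": sum(views.values()), "views_by_video": dict(views)}
```

The handler can be exercised locally by passing a dictionary shaped like a Kinesis event, which is useful before wiring the function to a real stream.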

Scenario

A streaming video platform needs to process user activity data in real time to provide insights into viewer engagement and content popularity. As the data engineer, you'll implement a processing pipeline that can handle hundreds of data streams simultaneously and deliver near-real-time analytics.
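To make the scenario concrete, here is a sketch of how one user activity event might be sent to a stream. `StreamName`, `Data`, and `PartitionKey` are the parameters of the Kinesis PutRecord call in boto3; the stream name `video-activity-stream` and the event fields (`user_id`, `video_id`, `action`) are assumptions made for illustration.

```python
import json


def build_put_record_args(stream_name, event):
    """Return keyword arguments for a Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        # Data must be bytes; JSON encoding is an assumption of this sketch.
        "Data": json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": str(event["user_id"]),
    }


def send(event, stream_name="video-activity-stream"):
    """Send one event; the stream name is a hypothetical placeholder."""
    # boto3 is imported lazily so build_put_record_args works without it.
    import boto3

    client = boto3.client("kinesis")
    return client.put_record(**build_put_record_args(stream_name, event))
```

Using the user ID as the partition key keeps each viewer's events in order within a shard. Whether that spreads load evenly depends on how activity is distributed across users.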

Learning Objectives

  • Set up Amazon Kinesis Data Streams to ingest real-time data.
  • Process streaming data with AWS Lambda functions.
  • Store processed data in Amazon S3 for persistence.
  • Use AWS Glue to catalog the S3 data for analytics.
  • Query the cataloged data using Amazon Athena.
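The storage and cataloging objectives are connected by how objects are laid out in S3. The sketch below assumes a Hive-style `key=value` prefix layout, which Glue crawlers can recognize as partitions, and newline-delimited JSON, which Athena's JSON SerDes expect. The prefix, batch ID, and partition columns are hypothetical choices, not requirements of the lab.

```python
import json
from datetime import datetime, timezone


def partitioned_key(prefix, event_time, batch_id):
    """Build a Hive-style S3 key with year/month/day partition folders."""
    return (
        f"{prefix}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/{batch_id}.json"
    )


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


if __name__ == "__main__":
    when = datetime(2024, 1, 5, tzinfo=timezone.utc)
    print(partitioned_key("processed", when, "batch-0001"))
    print(to_json_lines([{"video_id": "v1", "action": "play"}]), end="")
```

The returned key and body could be passed to an S3 `put_object` call as `Key` and `Body`. Partitioning by date lets Athena skip objects outside the queried date range.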

Tasks (5)

Task 1: Create Kinesis Data Stream for video platform ingestion

15 min

Task 2: Develop a Lambda function to process incoming data

25 min

Task 3: Store processed data in Amazon S3

20 min

Task 4: Catalog S3 data with AWS Glue

20 min

Task 5: Query processed data using Amazon Athena

15 min
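As an illustration of the final querying task, the sketch below builds a SQL statement that ranks videos by play count. The table name and the `video_id` and `action` columns are assumptions; the actual schema depends on what the Glue crawler infers from the stored data.

```python
import re


def top_videos_query(table, limit=10):
    """Build a SQL query ranking videos by number of 'play' events."""
    # Allow only simple identifiers so the table name cannot inject SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"invalid table name: {table!r}")
    return (
        "SELECT video_id, COUNT(*) AS views\n"
        f"FROM {table}\n"
        "WHERE action = 'play'\n"
        "GROUP BY video_id\n"
        "ORDER BY views DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # "video_activity" is a hypothetical table name.
    print(top_videos_query("video_activity", limit=5))
```

The resulting string can be pasted into the Athena query editor, or passed as `QueryString` to the boto3 Athena client's `start_query_execution` call.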

Prerequisites

  • Basic understanding of AWS services like Lambda and S3
  • Familiarity with data streaming concepts

Skills Tested

  • Implement real-time data ingestion with Amazon Kinesis Data Streams
  • Configure serverless data processing using AWS Lambda
  • Store and manage processed data in Amazon S3
  • Use AWS Glue for data cataloging
  • Perform data queries with Amazon Athena