Introduction
In the era of big data, real-time data streaming is crucial for modern applications that need immediate insights and actions based on incoming data. Amazon Kinesis is a family of services that lets developers collect, process, and analyze streaming data in real time. This post is an in-depth guide to Amazon Kinesis, covering how to set up data streams and integrate them with other AWS services for real-time analytics. By the end of this tutorial, you will understand how to use Amazon Kinesis to build applications that handle and process large streams of data in real time.
Overview of Amazon Kinesis
Amazon Kinesis is a family of services designed to enable real-time data streaming and analytics. It includes :-
Amazon Kinesis Data Streams : Enables you to build real-time custom applications that process or analyze streaming data.
Amazon Kinesis Data Firehose : The easiest way to reliably load streaming data into data lakes, data stores and analytics services.
Amazon Kinesis Data Analytics : Enables you to process and analyze streaming data using standard SQL.
Amazon Kinesis Video Streams : Makes it easy to securely stream video from connected devices to AWS for analytics, machine learning and other processing.
Setting Up Amazon Kinesis Data Streams
Step 1 :- Create a Kinesis Data Stream
Sign in to the AWS Management Console and open the Amazon Kinesis console.
Choose Data Streams in the left navigation pane.
Select Create data stream.
Name your data stream (e.g. my-data-stream ).
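Set the number of shards. Each shard provides a fixed unit of capacity: up to 1 MB or 1,000 records per second for writes, and up to 2 MB per second for reads. If the console asks for a capacity mode, choose Provisioned so you can set the shard count yourself. For this example we'll use 1 shard.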
Choose Create data stream.
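The shard count can also be estimated from the expected write load, and the stream can be created from code rather than the console. The following is a minimal sketch, not a definitive implementation: the function names shards_needed and create_stream_and_wait are illustrative, the limits are the approximate per-shard write quotas, and the client argument is assumed to be a Boto3 Kinesis client.

```python
import math

# Approximate per-shard write limits for provisioned streams
MAX_WRITE_BYTES_PER_SEC = 1_000_000   # about 1 MB per second
MAX_WRITE_RECORDS_PER_SEC = 1_000     # 1,000 records per second


def shards_needed(records_per_sec, avg_record_bytes):
    """Estimate the shard count needed to absorb a given write load."""
    by_count = math.ceil(records_per_sec / MAX_WRITE_RECORDS_PER_SEC)
    by_bytes = math.ceil(records_per_sec * avg_record_bytes / MAX_WRITE_BYTES_PER_SEC)
    return max(1, by_count, by_bytes)


def create_stream_and_wait(client, name, shard_count=1):
    """Create a provisioned stream and block until it is ready to use.

    `client` is assumed to be a Boto3 Kinesis client, for example
    boto3.client('kinesis', region_name='us-east-1').
    """
    client.create_stream(StreamName=name, ShardCount=shard_count)
    # The 'stream_exists' waiter polls until the stream becomes ACTIVE
    client.get_waiter('stream_exists').wait(StreamName=name)


print(shards_needed(records_per_sec=10, avg_record_bytes=100))
print(shards_needed(records_per_sec=2_500, avg_record_bytes=200))
print(shards_needed(records_per_sec=500, avg_record_bytes=5_000))
```

The three sample loads need 1, 3, and 3 shards: the second is limited by record count and the third by bytes per second. With real credentials, create_stream_and_wait(boto3.client('kinesis', region_name='us-east-1'), 'my-data-stream') would stand in for the console steps above.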
Step 2 :- Writing Data to the Stream
To write data to the stream, we will use the AWS SDK for Python (Boto3). Make sure your AWS credentials are configured, for example with aws configure, since Boto3 reads the same credentials as the AWS CLI.
Install Boto3 :-
pip install boto3
Write a Python script to send data to your stream :-
import datetime
import json
import random
import time

import boto3

# Initialize the Kinesis client
kinesis_client = boto3.client('kinesis', region_name='us-east-1')


def generate_data():
    """Build one random event."""
    return {
        'event_time': datetime.datetime.now().isoformat(),
        'event_type': random.choice(['click', 'view', 'purchase']),
        'user_id': random.randint(1, 100)
    }


def send_data_to_kinesis():
    while True:
        data = generate_data()
        print(f'Sending data: {data}')
        kinesis_client.put_record(
            StreamName='my-data-stream',
            Data=json.dumps(data).encode('utf-8'),
            # Records with the same partition key land on the same shard
            PartitionKey=str(data['user_id'])
        )
        # Pause briefly so a single shard's write limit is not exceeded
        time.sleep(0.1)


if __name__ == '__main__':
    send_data_to_kinesis()
This script continuously sends random events to the Kinesis data stream.
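Sending one record per call is simple but slow for larger volumes. The PutRecords API accepts up to 500 records per request and reports failures per record, so only the rejected ones need to be resent. The sketch below illustrates that pattern; the helper names to_entries, chunked, and put_batch are hypothetical, and client is assumed to be a Boto3 Kinesis client.

```python
import json


def to_entries(events, key_field='user_id'):
    """Convert event dicts into PutRecords entries."""
    return [
        {
            'Data': json.dumps(event).encode('utf-8'),
            'PartitionKey': str(event[key_field]),
        }
        for event in events
    ]


def chunked(entries, size=500):
    """Yield slices of at most `size` entries (PutRecords accepts up to 500)."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def put_batch(client, stream_name, entries, max_attempts=3):
    """Send one batch, resending only the entries that were rejected.

    Returns the entries that still failed after `max_attempts` tries.
    """
    pending = entries
    for _ in range(max_attempts):
        if not pending:
            break
        response = client.put_records(StreamName=stream_name, Records=pending)
        # Results come back in request order; failed ones carry an ErrorCode
        pending = [
            entry
            for entry, result in zip(pending, response['Records'])
            if 'ErrorCode' in result
        ]
    return pending
```

A producer would call put_batch once per chunk from chunked(to_entries(events)). This sketch retries immediately; a real producer would probably wait between attempts, since rejected records usually mean the shard is being throttled.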
Step 3 :- Reading Data from the Stream
To read data from the stream we will again use the AWS SDK for Python (Boto3).
import json
import time

import boto3

# Initialize the Kinesis client
kinesis_client = boto3.client('kinesis', region_name='us-east-1')

# Get a shard iterator that starts at the newest records
shard_iterator = kinesis_client.get_shard_iterator(
    StreamName='my-data-stream',
    ShardId='shardId-000000000000',
    ShardIteratorType='LATEST'
)['ShardIterator']

# Continuously read data from the stream
while True:
    response = kinesis_client.get_records(ShardIterator=shard_iterator, Limit=10)
    for record in response['Records']:
        print(f'Record: {json.loads(record["Data"])}')
    shard_iterator = response['NextShardIterator']
    # GetRecords allows about 5 calls per second per shard, so pause between polls
    time.sleep(1)
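This script reads records from the first shard of the Kinesis data stream and prints them to the console. Because it uses the LATEST iterator type, it only sees records written after it starts, so run it before or alongside the producer.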
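A stream with more than one shard needs one iterator per shard, and a consumer that should also see older records can start from TRIM_HORIZON instead of LATEST. The following sketch shows one way to do both under those assumptions; list_shard_ids and read_shard are illustrative names, client is assumed to be a Boto3 Kinesis client, and the polling is deliberately bounded so the function returns.

```python
import json
import time


def list_shard_ids(client, stream_name):
    """Return the IDs of all shards in the stream, following pagination."""
    shard_ids = []
    kwargs = {'StreamName': stream_name}
    while True:
        response = client.list_shards(**kwargs)
        shard_ids.extend(shard['ShardId'] for shard in response['Shards'])
        token = response.get('NextToken')
        if not token:
            return shard_ids
        # Follow-up calls pass only the token, not the stream name
        kwargs = {'NextToken': token}


def read_shard(client, stream_name, shard_id, max_polls=5, pause=1.0):
    """Read decoded JSON records from one shard, oldest first."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType='TRIM_HORIZON',
    )['ShardIterator']
    events = []
    for _ in range(max_polls):
        response = client.get_records(ShardIterator=iterator, Limit=100)
        events.extend(json.loads(r['Data']) for r in response['Records'])
        iterator = response.get('NextShardIterator')
        if iterator is None:  # the shard has been closed and fully read
            break
        time.sleep(pause)
    return events
```

Calling read_shard for each ID returned by list_shard_ids covers the whole stream. For long-running consumers, the Kinesis Client Library handles shard discovery and checkpointing, which this sketch leaves out.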
Integrating with Other AWS Services
Amazon Kinesis Data Firehose
Kinesis Data Firehose (renamed Amazon Data Firehose in 2024) is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), and Splunk.
Step 1 :- Create a Firehose Delivery Stream
Sign in to the AWS Management Console and open the Amazon Kinesis console.
Choose Data Firehose in the left navigation pane.
Select Create delivery stream.
Name your delivery stream (e.g. my-firehose-stream ).
Select the source. For this example we'll use "Direct PUT or other sources".
Choose the destination (e.g. Amazon S3).
Configure destination settings (e.g. specify the S3 bucket).
Choose Create delivery stream.
Step 2 :- Send Data to Firehose
Modify the previous Python script to send data to the Firehose delivery stream instead of the Kinesis data stream.
import datetime
import json
import random
import time

import boto3

# Initialize the Firehose client
firehose_client = boto3.client('firehose', region_name='us-east-1')


def generate_data():
    """Build one random event."""
    return {
        'event_time': datetime.datetime.now().isoformat(),
        'event_type': random.choice(['click', 'view', 'purchase']),
        'user_id': random.randint(1, 100)
    }


def send_data_to_firehose():
    while True:
        data = generate_data()
        print(f'Sending data: {data}')
        firehose_client.put_record(
            DeliveryStreamName='my-firehose-stream',
            # Firehose concatenates records as-is, so end each one with a
            # newline to keep the delivered objects one JSON document per line
            Record={'Data': (json.dumps(data) + '\n').encode('utf-8')}
        )
        # Pause briefly to keep the example's request rate modest
        time.sleep(0.1)


if __name__ == '__main__':
    send_data_to_firehose()
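Firehose also has a batch call, PutRecordBatch, which accepts up to 500 records per request and reports failures per record. The sketch below shows the idea under a few assumptions: to_firehose_records and send_batch are hypothetical helper names, client is assumed to be a Boto3 Firehose client, and each record is newline-terminated so the objects delivered to S3 hold one JSON document per line.

```python
import json


def to_firehose_records(events):
    """Encode events as newline-terminated JSON, one Firehose record each."""
    return [{'Data': (json.dumps(e) + '\n').encode('utf-8')} for e in events]


def send_batch(client, delivery_stream, events):
    """Send up to 500 events in one call and return those that were rejected."""
    records = to_firehose_records(events)
    response = client.put_record_batch(
        DeliveryStreamName=delivery_stream,
        Records=records,
    )
    # Responses are in request order; rejected records carry an ErrorCode
    return [
        event
        for event, result in zip(events, response['RequestResponses'])
        if 'ErrorCode' in result
    ]
```

The events returned by send_batch can be passed back in for another attempt, ideally after a short delay.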
Amazon Kinesis Data Analytics
Kinesis Data Analytics enables you to process and analyze streaming data using standard SQL.
Step 1 :- Create a Kinesis Data Analytics Application
Sign in to the AWS Management Console and open the Amazon Kinesis console.
Choose Data Analytics in the left navigation pane.
Select Create application.
Name your application (e.g. my-analytics-app ).
Select the source. Choose "Connect to a stream" and specify your Kinesis data stream.
Choose Create application.
Step 2 :- Write SQL Queries
Within the Kinesis Data Analytics console you can write SQL queries to process the data.
Example SQL query to count the number of events by type :-
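CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "event_type" VARCHAR(64),
    "event_count" INTEGER
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        "event_type",
        COUNT(*) AS "event_count"
    FROM "SOURCE_SQL_STREAM_001"
    -- Streaming aggregates need a time window; this uses 60-second tumbling windows
    GROUP BY
        "event_type",
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
This query groups events by type within each 60-second window, counts the occurrences, and inserts the results into the destination stream. The window is needed because a stream never ends, so an aggregate without one would have no point at which to emit a result.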
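Kinesis Data Analytics evaluates aggregates like this over time windows rather than over the whole unbounded stream. As a rough local illustration of what a tumbling-window count computes, the following Python sketch buckets events into fixed 60-second windows. It is a simplification: it windows on the event_time field of each record, whereas the service would typically window on the time the row entered the application, and the sample events are made up.

```python
from collections import Counter
from datetime import datetime, timezone


def tumbling_counts(events, window_seconds=60):
    """Count events per (window start, event_type)."""
    counts = Counter()
    for event in events:
        ts = datetime.fromisoformat(event['event_time']).timestamp()
        # Round the timestamp down to the start of its window
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, event['event_type'])] += 1
    return counts


# Made-up sample events with explicit UTC offsets
sample = [
    {'event_time': '2024-01-01T00:00:05+00:00', 'event_type': 'click'},
    {'event_time': '2024-01-01T00:00:30+00:00', 'event_type': 'view'},
    {'event_time': '2024-01-01T00:00:50+00:00', 'event_type': 'click'},
    {'event_time': '2024-01-01T00:01:10+00:00', 'event_type': 'click'},
]
for (window_start, event_type), count in sorted(tumbling_counts(sample).items()):
    label = datetime.fromtimestamp(window_start, tz=timezone.utc).strftime('%H:%M:%S')
    print(label, event_type, count)
```

The first window (starting 00:00:00) holds two clicks and one view, and the click at 00:01:10 falls into the next window, so each window produces its own set of rows.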
Amazon Kinesis Video Streams
Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning and other processing.
Step 1 :- Create a Kinesis Video Stream
Sign in to the AWS Management Console and open the Amazon Kinesis console.
Choose Video Streams in the left navigation pane.
Select Create video stream.
Name your video stream (e.g. my-video-stream ).
Choose Create video stream.
Step 2 :- Stream Video Data
Boto3 can create and describe video streams and look up their data endpoints, but it does not expose the PutMedia API that uploads video. To send video from a device, use the Kinesis Video Streams Producer SDK (available for C, C++, and Java) or its GStreamer plugin, kvssink. The script below uses Boto3 for the part it supports: checking the stream and looking up the endpoint that a producer sends media to.
import boto3

# Initialize the Kinesis Video Streams client
kvs_client = boto3.client('kinesisvideo', region_name='us-east-1')

# Confirm the stream exists and check its status
stream_info = kvs_client.describe_stream(StreamName='my-video-stream')['StreamInfo']
print(f"Stream status: {stream_info['Status']}")

# Get the data endpoint that producers use for PutMedia
response = kvs_client.get_data_endpoint(
    StreamName='my-video-stream',
    APIName='PUT_MEDIA'
)
endpoint = response['DataEndpoint']
print(f'PUT_MEDIA endpoint: {endpoint}')
This script checks the stream and prints the endpoint for uploads. Capturing frames from the device's camera and sending them as an MKV stream is the job of the Producer SDK or the kvssink plugin.
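Once video is arriving in the stream, it can be played back from Python-side tooling by requesting an HLS session URL. The following is a minimal sketch under stated assumptions: get_hls_url and make_archived_client are hypothetical names, kvs_client is assumed to be a Boto3 'kinesisvideo' client, and the stream is assumed to have data retention enabled and to carry video in a codec that HLS playback supports.

```python
def get_hls_url(kvs_client, make_archived_client, stream_name):
    """Return a short-lived HLS playback URL for live video in the stream.

    kvs_client: a Boto3 'kinesisvideo' client.
    make_archived_client: callable that takes an endpoint URL and returns a
        Boto3 'kinesis-video-archived-media' client bound to that endpoint.
    """
    # Each Kinesis Video Streams data API is served from a per-stream endpoint
    endpoint = kvs_client.get_data_endpoint(
        StreamName=stream_name,
        APIName='GET_HLS_STREAMING_SESSION_URL',
    )['DataEndpoint']
    archived = make_archived_client(endpoint)
    response = archived.get_hls_streaming_session_url(
        StreamName=stream_name,
        PlaybackMode='LIVE',
    )
    return response['HLSStreamingSessionURL']
```

With real credentials, make_archived_client could be lambda url: boto3.client('kinesis-video-archived-media', endpoint_url=url, region_name='us-east-1'). The returned URL can be opened in a player that supports HLS.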
Real-Time Analytics with Amazon Kinesis
Example Use Case :- Real-Time Clickstream Analytics
Imagine you run a website and want to analyze user behavior in real time. You can use Amazon Kinesis to collect clickstream data, process it with Kinesis Data Analytics, and visualize the results as they arrive.
Collect Clickstream Data : Use a script or web application to send clickstream data to a Kinesis data stream.
Process Data with Kinesis Data Analytics : Create a Kinesis Data Analytics application to process the clickstream data and generate insights.
Visualize Results : Use Amazon QuickSight or another visualization tool to create real-time dashboards and reports.
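The collection step can be sketched as a small event builder. The example below is illustrative only: the page list, field names, and helper names (make_click_event, to_kinesis_record) are assumptions, chosen so that each event carries the page and user_id fields a per-page analysis would need.

```python
import datetime
import json
import random

PAGES = ['/home', '/products', '/cart', '/checkout']  # hypothetical site pages


def make_click_event(rng, now=None):
    """Build one clickstream event with the fields the analysis needs."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        'event_time': now.isoformat(),
        'page': rng.choice(PAGES),
        'user_id': rng.randint(1, 100),
    }


def to_kinesis_record(event):
    """Shape an event as keyword arguments for put_record (minus StreamName)."""
    return {
        'Data': json.dumps(event).encode('utf-8'),
        # Keying by user keeps each user's clicks in order on one shard
        'PartitionKey': str(event['user_id']),
    }


rng = random.Random(42)  # seeded so repeated runs produce the same events
event = make_click_event(rng)
record = to_kinesis_record(event)
print(event['page'] in PAGES, record['PartitionKey'] == str(event['user_id']))
```

A producer would pass the result to a Boto3 Kinesis client as put_record(StreamName='my-data-stream', **record). Partitioning by user_id is one reasonable choice; partitioning by page would instead group each page's clicks on one shard, at the cost of uneven load when a few pages dominate.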
Example SQL Query for Clickstream Analysis
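CREATE OR REPLACE STREAM "CLICKSTREAM_ANALYTICS_STREAM" (
    "page" VARCHAR(64),
    "click_count" INTEGER,
    "unique_users" INTEGER
);

CREATE OR REPLACE PUMP "CLICKSTREAM_PUMP" AS
    INSERT INTO "CLICKSTREAM_ANALYTICS_STREAM"
    SELECT STREAM
        "page",
        COUNT(*) AS "click_count",
        COUNT(DISTINCT "user_id") AS "unique_users"
    FROM "SOURCE_SQL_STREAM_001"
    -- Aggregate over 60-second tumbling windows
    GROUP BY
        "page",
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
This query counts the number of clicks and unique users per page in each 60-second window. It assumes the incoming records carry a page field in addition to user_id, which the sample events generated earlier in this post do not include.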
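To sanity-check results from a query like this, the same two measures can be computed locally over a captured batch of events. This is a small sketch with a hypothetical helper name, page_stats, and it assumes each event is a dict with page and user_id keys.

```python
from collections import defaultdict


def page_stats(events):
    """Return {page: (click_count, unique_users)} for a batch of events."""
    clicks = defaultdict(int)
    users = defaultdict(set)
    for event in events:
        clicks[event['page']] += 1
        users[event['page']].add(event['user_id'])
    return {page: (clicks[page], len(users[page])) for page in clicks}


batch = [
    {'page': '/home', 'user_id': 1},
    {'page': '/home', 'user_id': 1},
    {'page': '/home', 'user_id': 2},
    {'page': '/cart', 'user_id': 2},
]
print(page_stats(batch))
```

Here /home has three clicks from two distinct users and /cart has one click from one user, which shows the difference between the two counts.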
Conclusion
Amazon Kinesis provides a robust platform for real-time data streaming and analytics. By using Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams, you can build applications that process and analyze large streams of data in real time. Whether you're working with clickstream data, IoT device data, or video streams, Amazon Kinesis offers the tools you need to gain immediate insights and drive informed decisions.
Stay tuned for more insights in our upcoming blog posts.