In this article, I will describe how to use Athena and QuickSight to make BI/DataViz of your current Loadbalancer Access Logs on AWS: ELB and ALB.

From AWS: “Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

In few minutes, you will be able to query your access log, even if you have a huge amount of logs with this serverless service. I will show you how to use AWS DataViz product called QuickSight to get some great visuals.

In my case, Athena ran over 250GB of sample logs in only 50s.

Loadbalancer: Enable Access Logs

First, you’ll need to activate access logs on your loadbalancers. These logs will be stored in S3 bucket of your choice.

Go to EC2 -> Load Balancers -> Edit attributes -> Access logs

Access_Logs

Athena: Create Database and table

To begin, go the AWS Athena, then create logs database. Then, we will use the following SQL command to create the proper Athena table.

Choice the one corresponding to your actual logs: ELB or ALB Don’t forget to edit the last line with the location of your S3 bucket.

ELB

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs (
 request_timestamp string,
 elb_name string,
 request_ip string,
 request_port int,
 backend_ip string,
 backend_port int,
 request_processing_time double,
 backend_processing_time double,
 client_response_time double,
 elb_response_code string,
 backend_response_code string,
 received_bytes bigint,
 sent_bytes bigint,
 request_verb string,
 url string,
 protocol string,
 user_agent string,
 ssl_cipher string,
 ssl_protocol string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
 'serialization.format' = '1',
 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' )
LOCATION 's3://{bucket}/prefix/AWSLogs/AWS_account_ID/elasticloadbalancing/';

ALB

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
CREATE EXTERNAL TABLE IF NOT EXISTS logs.alb_logs (
 type string,
 time string,
 elb string,
 client_ip string,
 client_port int,
 target_ip string,
 target_port int,
 request_processing_time double,
 target_processing_time double,
 response_processing_time double,
 elb_status_code string,
 target_status_code string,
 received_bytes bigint,
 sent_bytes bigint,
 request_verb string,
 request_url string,
 request_proto string,
 user_agent string,
 ssl_cipher string,
 ssl_protocol string,
 target_group_arn string,
 trace_id string,
 domain_name string,
 chosen_cert_arn string
 )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) (.*) (.*) (.*)' )
LOCATION 's3://{bucket}/AWSLogs/elasticloadbalancing/';

ref: ELB Docs|ALB Docs

Athena: Run Queries

You will find in this section few examples of what you can get with standard SQL queries.

This query will show you the number of the same IP requesting your backend with HTTP response code 200:

1
2
3
4
5
6
SELECT request_ip, COUNT(*) as count
FROM web_elb
WHERE elb_response_code=200
GROUP BY request_ip
ORDER BY count DESC
LIMIT 10

This one will show you which company behind a proxy are using on default browser “Firefox”

1
2
3
4
5
6
SELECT request_ip, COUNT(*) as client_ip
FROM web_elb
WHERE user_agent LIKE '%Firefox%'
GROUP BY request_ip
ORDER BY client_ip DESC
LIMIT 10

Example result:

Query_Result

Find out which company are using macOS by replacing Firefox by Macintosh ;-)

QuickSight: Create your visuals

Create your QuickSight account

In your AWS Console, Go to QuickSight and follow the quick sign-up process.

Create your dataset

Select Athena dataset type and choice your Athena DB and your ELB/ALB table.

Create visuals / restitution

You will need to play with visuals, different graph type to meet your goals, it could take some time, but for example, you can get this kind of restitution:

QuickSight_Example1

QuickSight_Example2

That’s all folks, I hope you enjoyed this quick howto.

Don’t hesitate to ping me with any question.

That’s all folks!

zoph.