Configure AWS Application Load Balancer Access Logs and Query with Amazon Athena
Many resources, applications, and services run in the cloud. Load balancers play an essential role in distributing incoming requests efficiently across these workloads, so that provisioned capacity is fully used without overloading any single resource and degrading performance.
A load balancer distributes incoming traffic across a group of backend servers, checks the health of those workloads, and gives you the flexibility to add or remove computing resources. It helps reduce downtime and lets you scale while spreading requests efficiently across workloads. It also offers capabilities such as listener rules, routing algorithms, session persistence, target health checks, and access logs.
In AWS, Elastic Load Balancing supports Application Load Balancers, Network Load Balancers, Gateway Load Balancers, and Classic Load Balancers, so you can choose the type of load balancer that best suits your needs.
In this blog, I’ll cover how to enable access logs for your Application Load Balancer, which capture detailed information about the requests sent to it, and how to query those logs with Amazon Athena to see traffic sources, latency, request paths, server responses, and more for traffic between clients, the load balancer, and backend applications.
Enable access logging for Application Load Balancer
Step 1: Create Amazon S3 Bucket
Elastic Load Balancing stores Application Load Balancer access logs in an Amazon S3 bucket, and that bucket must be in the same Region as the load balancer.
The bucket can belong to a different AWS account than the load balancer. This is useful when your organization uses a separate Log Archive account for centralized logging.
To create an Amazon S3 bucket:
- Open the Amazon S3 console.
- Choose Create bucket to open the Create bucket wizard.
- For Bucket name, enter a globally unique name.
- For AWS Region, select the Region where you want to create the bucket. It must be the same Region as your load balancer.
- Configure Object Ownership, Block Public Access settings for this bucket, Bucket Versioning, and Default encryption as needed. It’s recommended to keep all Block Public Access settings enabled and to enable Bucket Versioning and Default encryption.
- Choose Create bucket.
For more information on creating a bucket, visit Creating a bucket.
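If you prefer to script this step, the following is a minimal sketch using the AWS SDK for Python (boto3), assuming boto3 is installed and AWS credentials are configured. The helper names create_bucket_kwargs and create_log_bucket are hypothetical. The sketch applies the settings recommended above. SSE-S3 (Amazon S3 managed keys) is used for default encryption as a conservative choice; check Access logs for your Application Load Balancer for the encryption options that log delivery supports.

```python
# Sketch only: assumes boto3 is installed and AWS credentials are configured.
# Bucket names and Regions passed to these helpers are placeholders.


def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for the S3 create_bucket call.

    S3 expects no CreateBucketConfiguration for us-east-1; any other Region
    is passed as the LocationConstraint.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_log_bucket(bucket_name: str, region: str) -> None:
    """Create the log bucket with the settings recommended above."""
    import boto3  # imported here so create_bucket_kwargs works without boto3

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
    # Keep all four Block Public Access settings enabled.
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    # Turn on Bucket Versioning.
    s3.put_bucket_versioning(
        Bucket=bucket_name, VersioningConfiguration={"Status": "Enabled"}
    )
    # Default encryption with Amazon S3 managed keys (SSE-S3).
    s3.put_bucket_encryption(
        Bucket=bucket_name,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
        },
    )
```

For example, create_log_bucket("my-alb-logs", "us-east-2") would create a bucket with the hypothetical name my-alb-logs in us-east-2. Keeping the argument-building logic in a separate function makes the us-east-1 special case easy to check without calling AWS.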
Step 2: Create a bucket policy
To create a bucket policy that allows Elastic Load Balancing permission to write the access logs to your bucket:
- Select the previously created bucket.
- Choose the Permissions tab.
- Under Bucket policy, choose Edit and insert the following policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::elb-account-id:root"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::bucket-name/prefix/AWSLogs/your-aws-account-id/*"
    }
  ]
}
- In Principal, replace elb-account-id with the Elastic Load Balancing account ID for your Region; each Region has a different ID. To find it, see Bucket permissions.
- In Resource, replace bucket-name with the name of the previously created bucket, prefix with the prefix you plan to use (or remove prefix/ if you won't use one), and your-aws-account-id with the ID of the AWS account that owns the load balancer.
- Choose Save changes.
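The placeholder substitution in this policy is easy to get wrong, especially the optional prefix. Here's a minimal sketch that builds the same policy in Python and can optionally attach it with boto3's put_bucket_policy. The function names are hypothetical, and the account IDs in the example are placeholders, not real Elastic Load Balancing account IDs.

```python
import json


def alb_log_bucket_policy(
    bucket_name: str, elb_account_id: str, lb_account_id: str, prefix: str = ""
) -> dict:
    """Return the Step 2 bucket policy with its placeholders filled in.

    elb_account_id: Elastic Load Balancing account ID for the Region.
    lb_account_id:  ID of the AWS account that owns the load balancer.
    prefix:         optional log prefix; left out of the path when empty.
    """
    prefix = prefix.strip("/")
    path = f"{prefix}/" if prefix else ""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{elb_account_id}:root"},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/{path}AWSLogs/{lb_account_id}/*",
            }
        ],
    }


def apply_bucket_policy(bucket_name: str, policy: dict) -> None:
    """Attach the policy to the bucket (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))


# Placeholder values for illustration only.
policy = alb_log_bucket_policy("my-alb-logs", "111122223333", "444455556666", prefix="prod")
print(json.dumps(policy, indent=2))
```

Printing the policy before applying it is a cheap way to confirm that the Resource path matches the prefix you will enter when enabling access logging.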
Step 3: Enable access logging
Next, point the load balancer at the Amazon S3 bucket you created previously.
To enable access logging:
- Open the Amazon EC2 console.
- In the navigation pane, choose Load Balancers.
- Select the Application Load Balancer for which you want to enable access logging.
- Choose the Description tab, and then choose Edit attributes.
- On the Edit load balancer attributes page, select Enable for Access logs.
- For S3 location, enter the name of the bucket you created previously. Optionally, add a prefix; it must match the prefix used in the bucket policy's Resource.
- Choose Save.
For more information on Elastic Load Balancing Application Load Balancer Access logs, visit Access logs for your Application Load Balancer.
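The same setting can be applied through the Elastic Load Balancing API. This sketch builds the attribute list for boto3's elbv2 modify_load_balancer_attributes call using the access_logs.s3.enabled, access_logs.s3.bucket, and access_logs.s3.prefix keys, and also computes the S3 path under which log files for a given account and Region are written (bucket, optional prefix, then AWSLogs/account-id/elasticloadbalancing/region/). The function names, bucket name, and account ID are hypothetical.

```python
def access_log_attributes(bucket_name: str, prefix: str = "") -> list:
    """Attributes for elbv2 modify_load_balancer_attributes that turn on access logs."""
    return [
        {"Key": "access_logs.s3.enabled", "Value": "true"},
        {"Key": "access_logs.s3.bucket", "Value": bucket_name},
        {"Key": "access_logs.s3.prefix", "Value": prefix.strip("/")},
    ]


def log_location(bucket_name: str, account_id: str, region: str, prefix: str = "") -> str:
    """S3 path under which the load balancer writes its log files."""
    prefix = prefix.strip("/")
    path = f"{prefix}/" if prefix else ""
    return f"s3://{bucket_name}/{path}AWSLogs/{account_id}/elasticloadbalancing/{region}/"


def enable_access_logs(load_balancer_arn: str, bucket_name: str, prefix: str = "") -> None:
    """Enable access logging on an ALB (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("elbv2").modify_load_balancer_attributes(
        LoadBalancerArn=load_balancer_arn,
        Attributes=access_log_attributes(bucket_name, prefix),
    )


print(log_location("my-alb-logs", "444455556666", "us-east-2", prefix="prod"))
```

The value returned by log_location is the path that belongs in the LOCATION clause when you create the Athena table for these logs.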
Query Application Load Balancer Access Logs with Amazon Athena
Step 1: Specifying a query result location
Before you run a query in Amazon Athena, you need to specify an Amazon S3 location to store query results, and the bucket must be in the Region where you use Athena.
I created a separate Amazon S3 bucket for query results. To create one, you can follow the Creating a bucket user guide.
To specify a query result location:
- Open the Amazon Athena console.
- In the navigation pane, choose Query editor.
- Choose the Settings tab, and then choose Manage.
- Select the appropriate workgroup. By default, each account has a workgroup named primary. For more information, see Using workgroups for running queries.
- For Location of query result, enter the S3 path with the s3:// prefix, or browse to select the S3 bucket.
- (Optional) For Expected bucket owner, enter the AWS account ID of the owner of the query result bucket. Leave it blank if the bucket is in the same account.
- (Optional) Enable Encrypt query results to encrypt the query results stored in Amazon S3. For Encryption type, choose CSE-KMS, SSE-KMS, or SSE-S3. It’s recommended to encrypt Athena query results stored in Amazon S3.
- (Optional) Select Assign bucket owner full control over query results to grant the owner of the S3 query results bucket full control over the query results.
- Choose Save.
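If you manage Athena settings with boto3, a minimal sketch of the same configuration could call update_work_group on the primary workgroup. Only the result location and encryption are set here; the optional expected-bucket-owner and bucket-owner-full-control settings are left out. The helper names and the bucket in the example are hypothetical.

```python
from typing import Optional

ENCRYPTION_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def result_configuration_updates(
    output_location: str, encryption_option: str = "SSE_S3", kms_key: Optional[str] = None
) -> dict:
    """Build ConfigurationUpdates for athena.update_work_group."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must start with s3://")
    if encryption_option not in ENCRYPTION_OPTIONS:
        raise ValueError(f"unsupported encryption option: {encryption_option}")
    if encryption_option != "SSE_S3" and not kms_key:
        raise ValueError("SSE_KMS and CSE_KMS need a KMS key")
    if not output_location.endswith("/"):
        output_location += "/"
    encryption = {"EncryptionOption": encryption_option}
    if kms_key:
        encryption["KmsKey"] = kms_key
    return {
        "ResultConfigurationUpdates": {
            "OutputLocation": output_location,
            "EncryptionConfiguration": encryption,
        }
    }


def set_result_location(output_location: str, workgroup: str = "primary") -> None:
    """Apply the result location to a workgroup (requires boto3 and credentials)."""
    import boto3

    boto3.client("athena").update_work_group(
        WorkGroup=workgroup,
        ConfigurationUpdates=result_configuration_updates(output_location),
    )
```

For example, set_result_location("s3://my-athena-results") would store results for the primary workgroup in that (hypothetical) bucket, encrypted with SSE-S3.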
Step 2: Creating a database
Athena groups tables logically into databases, so first create a database in the Amazon Athena query editor.
To create a database:
- Open the Amazon Athena console.
- In the navigation pane, choose Query editor.
- Enter the following DDL statement to create the access_logs database. You can use a different database name instead of access_logs.
CREATE DATABASE access_logs;
- Choose Run or press Ctrl+ENTER.
- Select the created database from the Database menu.
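The statements in this section can also be run from code. Here's a minimal sketch of a helper that submits a statement (such as the CREATE DATABASE statement above) with Athena's start_query_execution and polls get_query_execution until the query reaches a final state. The helper name and its parameters are assumptions; it takes the client as an argument so it can be exercised without AWS access.

```python
import time
from typing import Optional


def run_query(
    client,
    sql: str,
    database: Optional[str] = None,
    output_location: Optional[str] = None,
    workgroup: str = "primary",
    poll_seconds: float = 1.0,
) -> str:
    """Start an Athena query, wait until it finishes, and return its execution ID."""
    kwargs = {"QueryString": sql, "WorkGroup": workgroup}
    if database:
        kwargs["QueryExecutionContext"] = {"Database": database}
    if output_location:  # otherwise the workgroup's result location is used
        kwargs["ResultConfiguration"] = {"OutputLocation": output_location}
    query_id = client.start_query_execution(**kwargs)["QueryExecutionId"]
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"Query {query_id} {state}: {reason}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

With a real client, a call might look like run_query(boto3.client("athena"), "CREATE DATABASE access_logs"). A fixed polling interval keeps the sketch simple; longer-running queries might call for a backoff.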
Step 3: Create ALB access logs table
- Use the following CREATE TABLE statement:
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
type string,
time string,
elb string,
client_ip string,
client_port int,
target_ip string,
target_port int,
request_processing_time double,
target_processing_time double,
response_processing_time double,
elb_status_code int,
target_status_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
request_url string,
request_proto string,
user_agent string,
ssl_cipher string,
ssl_protocol string,
target_group_arn string,
trace_id string,
domain_name string,
chosen_cert_arn string,
matched_rule_priority string,
request_creation_time string,
actions_executed string,
redirect_url string,
lambda_error_reason string,
target_port_list string,
target_status_code_list string,
classification string,
classification_reason string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'input.regex' =
'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"')
LOCATION 's3://<alb-logs-s3-bucket>/AWSLogs/<ACCOUNT-ID>/elasticloadbalancing/<REGION>/'
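- In LOCATION, replace <alb-logs-s3-bucket> with your bucket name (followed by /your-prefix if you configured a prefix), <ACCOUNT-ID> with the ID of the AWS account that owns the load balancer, and <REGION> with the load balancer's Region.
- Run the statement in the Athena query editor. Athena creates the alb_logs table, which reads the log files in place from Amazon S3, and you can start running queries against it.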
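To see how the RegexSerDe maps capture groups to columns, the following sketch applies the same regular expression with Python's re module to one log entry. The entry is an illustrative line written in the documented field order, not real traffic, and Python's re stands in for the Java regex engine that Athena uses; the pattern uses only syntax both engines accept. As with RegexSerDe, the whole line must match, and the Nth capture group becomes the Nth column of the table.

```python
import re

# Column names in the same order as the CREATE TABLE statement.
COLUMNS = [
    "type", "time", "elb", "client_ip", "client_port", "target_ip", "target_port",
    "request_processing_time", "target_processing_time", "response_processing_time",
    "elb_status_code", "target_status_code", "received_bytes", "sent_bytes",
    "request_verb", "request_url", "request_proto", "user_agent", "ssl_cipher",
    "ssl_protocol", "target_group_arn", "trace_id", "domain_name", "chosen_cert_arn",
    "matched_rule_priority", "request_creation_time", "actions_executed", "redirect_url",
    "lambda_error_reason", "target_port_list", "target_status_code_list",
    "classification", "classification_reason",
]

# The input.regex value from the table definition, split only for readability.
ALB_LOG_REGEX = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) '
    r'([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) '
    r'\"([^ ]*) (.*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-_]+) ([A-Za-z0-9.-]*) ([^ ]*) '
    r'\"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) '
    r'\"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
assert ALB_LOG_REGEX.groups == len(COLUMNS)  # one capture group per column


def parse_alb_log_line(line: str):
    """Map one log entry to column names, or return None if it does not match."""
    match = ALB_LOG_REGEX.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


# Illustrative entry in the documented field order (not real traffic).
SAMPLE = (
    'http 2022-07-10T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/ HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354" "-" "-" '
    '0 2022-07-10T22:22:48.364000Z "forward" "-" "-" "10.0.0.1:80" "200" "-" "-"'
)

row = parse_alb_log_line(SAMPLE)
print(row["client_ip"], row["elb_status_code"], row["request_url"])
# 192.168.131.39 200 http://www.example.com:80/
```

Every captured value is a string at this point; in Athena, the column types in the CREATE TABLE statement (int, double, bigint) determine how each value is read.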
Step 4: Querying Application Load Balancer logs
After completing the steps above, you can run queries against the requests logged by your Application Load Balancer.
The following are some sample queries:
To count the total number of requests:
SELECT COUNT(*)
FROM alb_logs;
To get all requests between a specific date and time:
SELECT *
FROM alb_logs
WHERE time BETWEEN '2022-07-10T00:00:00' AND '2022-07-20T23:59:00';
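The time column is declared as a string, so BETWEEN compares text rather than timestamps. That works because ALB timestamps are fixed-width ISO 8601 values, whose alphabetical order matches chronological order. One consequence: the upper bound '2022-07-20T23:59:00' excludes requests logged later in that minute, such as '2022-07-20T23:59:00.250000Z', because a longer string sorts after its own prefix. The short sketch below checks both points (the helper names are hypothetical). If you prefer timestamp comparisons in Athena itself, from_iso8601_timestamp(time) converts the column to a timestamp.

```python
from datetime import datetime

START = "2022-07-10T00:00:00"
END = "2022-07-20T23:59:00"


def parse_alb_time(value: str) -> datetime:
    """Parse an ALB log timestamp such as 2022-07-15T10:20:30.123456Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def in_window(value: str, start: str = START, end: str = END) -> bool:
    """Plain string comparison, as BETWEEN does on the string time column."""
    return start <= value <= end


samples = [
    "2022-07-20T23:58:59.999999Z",
    "2022-07-09T23:59:59.999999Z",
    "2022-07-15T10:20:30.123456Z",
    "2022-07-20T23:59:00.250000Z",
]

# For fixed-width ISO 8601 values, string order equals chronological order.
assert sorted(samples) == sorted(samples, key=parse_alb_time)

for ts in samples:
    print(ts, in_window(ts))
```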
To get all requests from a particular client IP address between a specific date and time, in chronological order:
SELECT *
FROM alb_logs
WHERE time BETWEEN '2022-07-10T00:00:00' AND '2022-07-20T23:59:00'
AND client_ip = '203.187.228.168'
ORDER BY time;
To list the 10 most frequently requested URLs in descending order:
SELECT COUNT(*) AS request_count,
request_url
FROM alb_logs
GROUP BY request_url
ORDER BY request_count DESC
LIMIT 10;
To list the requests for which the load balancer returned HTTP 503 (Service Unavailable), for example because the target group has no registered targets:
SELECT *
FROM alb_logs
WHERE elb_status_code = 503;
To list the requests for a particular string in the URL:
SELECT *
FROM alb_logs
WHERE request_url LIKE '%example.com%'
ORDER BY time DESC;
To list the requests with the longest request processing time first:
SELECT *
FROM alb_logs
ORDER BY request_processing_time DESC,
time DESC;
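If you run these queries from code rather than the console, get_query_results returns rows of VarCharValue cells, and for a SELECT the first row of the first page holds the column names. The following sketch, assuming a boto3 Athena client and a query that has already succeeded, flattens all result pages into a list of dictionaries using the client's get_query_results paginator. The function name is hypothetical; cells without a VarCharValue (which is how NULLs appear) become None.

```python
def fetch_results(client, query_id: str) -> list:
    """Return all rows of a finished Athena SELECT query as a list of dicts."""
    paginator = client.get_paginator("get_query_results")
    records = []
    columns = None
    for page in paginator.paginate(QueryExecutionId=query_id):
        result_set = page["ResultSet"]
        if columns is None:
            columns = [c["Name"] for c in result_set["ResultSetMetadata"]["ColumnInfo"]]
            rows = result_set["Rows"][1:]  # skip the header row on the first page
        else:
            rows = result_set["Rows"]
        for row in rows:
            values = (cell.get("VarCharValue") for cell in row["Data"])
            records.append(dict(zip(columns, values)))
    return records
```

Every value comes back as a string, so numeric columns such as a request count need converting (for example with int) before doing arithmetic on them.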
In the same way, you can write and run Athena queries for other use cases.
In this blog, you learned how to configure Application Load Balancer access logs and query them with Amazon Athena.
Analyzing Application Load Balancer access logs with Amazon Athena can help you troubleshoot HTTP errors, analyze traffic distribution and patterns, and investigate latency issues.
In addition, you can use Application Load Balancer and target group CloudWatch metrics to monitor performance and troubleshoot issues by creating Amazon CloudWatch dashboards and alarms.
For more information on Application Load Balancer Access Logs files, entries, and fields, visit Access logs for your Application Load Balancer.
For information on Querying Application Load Balancer logs with Amazon Athena, visit Querying Application Load Balancer logs.
For troubleshooting issues with Application Load Balancer, visit Troubleshoot your Application Load Balancers.
For information on Application Load Balancer CloudWatch metrics, visit CloudWatch metrics for your Application Load Balancer.