Imagine a user claims their data was deleted without consent. You check the logs only to find… nothing. No record, no timeline, no defense.
When building a product, especially one that handles sensitive data, there inevitably comes a point where you need to implement a system for auditing user actions.
An audit system acts as a detailed record of what happened, who did it, and when it occurred. This isn't about logging activity out of internal curiosity; it's a critical layer of accountability and traceability that protects the company against legal risks, compliance violations, and potential disputes.
Audit logging won’t help your sales team close a deal, and it doesn’t make the UI prettier, but it’s critical infrastructure. And because it often flies under the radar, it’s easy to under-invest in it until it’s too late.
The challenge is to strike the right balance: building an audit system that’s reliable without becoming a drag on product velocity or budget. And that's what we'll talk about today.
From the beginning, I knew the system had to be serverless. Since audit logging is not a high-frequency operation, there was no reason to keep a virtual machine running 24/7 just to handle occasional events.
The more difficult question was: where should we store the logs?
Logs are inherently unstructured, so my first instinct was to use a NoSQL database, specifically MongoDB. But a MongoDB cluster can get really expensive as the system grows, so I needed a way to keep it as small as possible.
So I decided on a hybrid approach: keep recent logs in MongoDB for easy access, and move older records to a cheaper, long-term storage like AWS S3. Eventually, we could even migrate the oldest logs into colder storage tiers such as AWS Glacier.
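The cold-tier migration doesn't require custom code; an S3 lifecycle rule can transition old objects automatically. Here's a minimal sketch with boto3, where the bucket name, key prefix, and one-year threshold are assumptions for illustration, not values from the project:

```python
# Sketch: transition old audit-log objects to the Glacier storage class.
# Bucket name, prefix, and the 365-day threshold are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-audit-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-audit-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "database/records/"},  # assumed prefix
                "Transitions": [
                    # After ~1 year, move objects to Glacier for cheaper storage
                    {"Days": 365, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```

A rule like this stays useful later on, since the final design keeps everything in S3 anyway.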
Let's break down the costs of this architecture:
In this first approach, MongoDB is clearly the most expensive component of the infrastructure. I was confident there had to be a more cost-effective alternative that still met the system's requirements.
After some research, I found what seemed like an optimal solution: AWS Athena. Athena is a serverless query engine that lets you run SQL-like queries directly on data stored in AWS S3, and you pay only for the amount of data each query scans.
The key to using Athena efficiently is to strategically partition your data in S3 based on your application's access patterns. In my case, a single partition by date was enough. This allows the system to efficiently retrieve records from the last N days without scanning the entire dataset.
The data in S3 can be stored as .json.gz or .parquet files; in my case, I chose .json.gz to retain the flexibility of unstructured documents, just like in MongoDB.
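To make the partition layout concrete, here's a minimal sketch in Python (boto3) of how a record could be gzipped and written under a Hive-style dt=YYYY-MM-DD prefix. The bucket name, key prefix, and record shape are illustrative assumptions, not the project's actual code:

```python
# Sketch: write one audit record as a gzipped JSON file under a date partition.
# Bucket, key prefix, and the record shape are illustrative assumptions.
import gzip
import json
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

now = datetime.now(timezone.utc)
record = {
    "id": str(uuid.uuid4()),
    "kind": "entity.deleted",
    "occurredAt": now.isoformat(),
    "author": {"id": "user-123", "name": "Jane Doe", "email": "jane@example.com"},
}

# Hive-style partitioning: Athena prunes partitions by the dt= prefix,
# so a query for the last N days only scans those folders.
key = f"database/records/dt={now:%Y-%m-%d}/{record['id']}.json.gz"

body = gzip.compress(json.dumps(record).encode("utf-8"))
s3.put_object(Bucket="my-audit-bucket", Key=key, Body=body)
```

Whether each file holds a single record or many newline-delimited ones is an implementation detail; what matters for Athena is the dt= prefix, which is what enables partition pruning.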
It’s worth noting that as the dataset grows, Athena queries will slow down, often taking a few seconds to execute. But for an audit logging system, that tradeoff is typically acceptable.
To start using Athena with S3, you first need to create an AWS Glue database and table. Here's a Terraform example for reference:
resource "aws_glue_catalog_database" "this" { name = "audit" location_uri = "s3://${data.aws_s3_bucket.this.bucket}/database/" tags = local.tags } resource "aws_glue_catalog_table" "this" { name = "records" database_name = aws_glue_catalog_database.this.name table_type = "EXTERNAL_TABLE" parameters = { classification = "json" compressionType = "gzip" EXTERNAL = "TRUE" has_encrypted_data = "false" } storage_descriptor { location = "${aws_glue_catalog_database.this.location_uri}/records/" input_format = "org.apache.hadoop.mapred.TextInputFormat" output_format = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat" ser_de_info { name = "json" serialization_library = "org.openx.data.jsonserde.JsonSerDe" parameters = { "ignore.malformed.json" = "true" } } columns { name = "id" type = "string" } columns { name = "kind" type = "string" } columns { name = "occurredAt" type = "string" # ISO timestamp as string; you can cast in Athena queries } columns { name = "description" type = "struct<template:string,values:map<string,string>>" } columns { name = "entity" type = "struct<id:string,title:string,url:string,kind:struct<kind:string,label:string>>" } columns { name = "author" type = "struct<id:string,name:string,email:string,ip:string,userAgent:string>" } columns { name = "fields" type = "array<struct<name:string,label:string,before:struct<value:string,label:string>,after:struct<value:string,label:string>>>" } } partition_keys { name = "dt" // Avoid using reserver word "date" type = "date" } lifecycle { ignore_changes = [ parameters, ] } }
Let's break down the costs for this approach:
Even with generous usage, you'll likely scan well under 1 TB of data per month. And even if you do cross that mark, you're looking at around $30/month in Athena costs for a highly scalable and low-maintenance audit system.
The final architecture looks like the following:
Designing an audit system is a balancing act between reliability, simplicity, and cost. While traditional approaches like MongoDB offer quick wins in flexibility and familiarity, they can become prohibitively expensive and hard to scale. By rethinking the architecture and leveraging AWS services like S3 and Athena, we were able to build a solution that is both serverless and cost-efficient—without compromising on traceability.
This approach may not be the fastest for real-time queries, but for audit logs, speed isn’t the top priority—maintainability and affordability are. With strategic partitioning, compression, and thoughtful schema design, Athena over S3 proves to be a practical alternative that meets the core requirements of an audit system with minimal overhead.
In the end, audit logging may not close deals or generate revenue, but it’s what keeps your product defensible, your users accountable, and your company out of trouble. Investing in a sustainable audit architecture early on pays off when you need it the most.
You can find the repository containing the application source code at: https://github.com/RafaLopesMelo/audit