TechTrends Now - Tech News for Builders and Operators

Most operational infrastructure starts this way: a requirement appears before the architecture does.

One day the team needed a database audit solution. Not in a planning doc — someone asked, and I had to build something. I'd owned the Oracle audit pipeline already, so I knew what the destination looked like. The question was what it would take to get there on Aurora PostgreSQL and AWS.

The short answer: more than you'd expect. The longer answer is this article.

What We're Actually Auditing — and Why It Matters

The scope here is specific: individual human users making direct DML changes (SELECT, INSERT, UPDATE, DELETE) to application tables.

Application service accounts are expected to modify data — that's their function. The risk surface is direct human access. A developer connected via psql. A support engineer running an ad-hoc update. A credential that shouldn't have had access to begin with. Those are the actions that need an audit trail in a regulated environment.

This distinction shapes every architectural decision: we enable pgAudit per individual user account, not cluster-wide. Application accounts are excluded entirely.

The Oracle Baseline

Before getting into the build, it's worth framing the comparison.

On Oracle, Unified Auditing (an Enterprise-tier feature) handles this with a single policy definition at the intersection of user, action, and object:

CREATE AUDIT POLICY user_dml_activity
  ACTIONS SELECT, INSERT, UPDATE, DELETE ON app_schema.orders
  WHEN 'SYS_CONTEXT(''USERENV'',''SESSION_USER'') != ''APP_SVC_ACCOUNT'''
  EVALUATE PER SESSION;

AUDIT POLICY user_dml_activity;

Records land in UNIFIED_AUDIT_TRAIL — a structured, SQL-queryable audit view. From there, the data can be exported on a schedule to any downstream analytics platform such as Elasticsearch, Splunk, OpenSearch, or a dedicated audit store. Scheduled queries or alerting rules can then detect patterns like bulk deletes, after-hours access, or unexpected DDL changes and trigger notifications through PagerDuty, Opsgenie, or an observability platform via HTTP webhook. Once the audit policy is defined, the downstream detection and alerting pipeline is relatively small because the audit data is already structured and queryable.

One important operational caveat with Oracle's approach: if the audit tablespace fills up, Oracle halts the database rather than silently dropping audit records — it guarantees completeness at the cost of availability. Tablespace monitoring becomes a hard operational requirement. pgAudit writing to the log stream instead of a table sidesteps this entirely: if log delivery has issues, the database keeps running.

pgAudit is different. There's no policy-based object/action targeting. You set a log class (read, write, ddl, all) per user. Records go to the PostgreSQL log stream — on Aurora, that means CloudWatch Logs. There's no queryable view. You're working with text:

AUDIT: SESSION,1,1,WRITE,INSERT,TABLE,public.orders,
"INSERT INTO orders (customer_id, amount) VALUES ($1, $2)"

It works. But extracting structured, alertable signal from a log stream requires deliberate engineering.

The Architecture

Aurora PostgreSQL  (pgAudit per-user)
        │
        ▼
CloudWatch Logs
        │
EventBridge  rate(5 min)
        │
        ▼
Lambda  → runs Log Insights query → filters noise → publishes count metric
        │
        ▼
CloudWatch Alarm  (count > 0)
        │
        ▼
SNS → incident platform + email

The Lambda exists because CloudWatch Alarms cannot evaluate Log Insights query results directly — there's no native bridge. Lambda runs the query, counts filtered results, and publishes a standard CloudWatch metric. The alarm evaluates that metric. Lambda invocation logs also give you an independent timestamped record of every detection event — useful when compliance asks "when was this first noticed?"

Full code, IAM policies, SNS configurations, and Terraform module: github.com/pcraavi/PostgreSQL-Audit

Gotcha 1: The Silent pgAudit Setup Failure

Enable pgaudit in shared_preload_libraries, reboot the cluster. Simple.

Except — shared_preload_libraries only loads the binary into shared memory. You also need:

CREATE EXTENSION pgaudit;

Without it: zero audit records. Zero error messages. Nothing in CloudWatch. The cluster had been rebooted, the parameter was confirmed, logs were being exported — and there was nothing to show for it.

The diagnostic:

SELECT * FROM pg_extension WHERE extname = 'pgaudit';
-- 0 rows returned

That was the entire problem. One missing statement.

Then enable per-user logging:

ALTER USER john_doe SET pgaudit.log TO 'all';

-- Verify
SELECT usename, useconfig FROM pg_user WHERE usename = 'john_doe';

Gotcha 2: The Noise Problem

The first time you look at raw audit logs from a production cluster, the signal-to-noise ratio is genuinely bad:

AUDIT: SESSION,...,READ,SELECT,,,"SELECT version()"
AUDIT: SESSION,...,READ,SELECT,,,"SELECT * FROM pg_shdescription..."
AUDIT: SESSION,...,READ,SELECT,,,"SET application_name='DBeaver 23.2.0'"
AUDIT: SESSION,...,READ,SELECT,,,"SELECT oid, typarray FROM pg_type WHERE typname=$1"

Every database tool (DBeaver, DataGrip, pgAdmin) fires a catalog query sequence on connect. Every JDBC driver runs initialization SELECTs. Every connection pool runs health checks. Alert on this raw stream and you're alerting on noise constantly — which means you stop paying attention, which defeats the audit entirely.

The filter query was built incrementally by watching actual production traffic:

fields @timestamp, @message, @logStream, @log
| filter @message like /AUDIT:/
| filter (
    @message like /SELECT/ or @message like /INSERT/ or
    @message like /UPDATE/ or @message like /DELETE/
  )
| filter @message not like /SELECT version()/
| filter @message not like /pg_shdescription/
| filter @message not like /pg_catalog/
| filter @message not like /information_schema/
| filter @message not like /SET application_name/
| filter @message not like /DBeaver/
| filter @message not like /PostgreSQL JDBC Driver/
| filter @message not like /datname = \$1/
| sort @timestamp desc

Your exclusion list will grow. Every application stack generates its own connection init patterns.

Gotcha 3: The KMS + CloudWatch Alarms Incompatibility

Encrypt the SNS topic at rest. Straightforward — use alias/aws/sns, the AWS-managed SNS key. Done.

Except the alarm stopped delivering.

CloudWatch Alarms does not have authorization to access the SNS topic encryption key

AWS-managed keys have immutable key policies. You cannot grant CloudWatch kms:GenerateDataKey. The fix is a customer-managed KMS key with an explicit CloudWatch service grant in the key policy. The managed key works fine if Lambda is publishing directly to SNS — the limitation is specific to the CloudWatch Alarms → SNS delivery path.

Key policy and full SNS security configuration (HTTPS enforcement, cross-account lockdown): github.com/pcraavi/PostgreSQL-Audit/tree/main/sns

Gotcha 4: The Incident Platform Deduplication Problem

First alert: fired correctly.
Second alert, third, fourth: nothing.

The incident platform logs showed requests arriving and the "Create Alert" action starting — but no alert created. The issue: platforms like Opsgenie, PagerDuty, and VictorOps deduplicate by alarm name (used as the alert alias). If that alert is already open or acknowledged, subsequent triggers are suppressed.

Correct behavior for most alarms. For an audit alert that should fire every detection window, it needs handling.

Fix options:

Timestamp the alarm name at deploy time ($(date +%s) suffix) — unique alias per deployment
Configure auto-close when the alarm returns to OK — fresh alert on each ALARM transition
Override the alias template in the integration to include a timestamp from the alert payload

Why Not Database Activity Streams?

Experienced AWS engineers will immediately ask this. DAS provides near-real-time activity streaming to Kinesis with structured output. It's a real solution.

The reason we didn't use it: it introduces Kinesis as a required dependency, adds per-event cost, and requires a consumer layer for decryption and processing. For teams without an existing Kinesis pipeline, that's meaningful operational surface area.

pgAudit + CloudWatch uses infrastructure Aurora already depends on. Lambda and SNS are general-purpose services. There's no proprietary tooling, no specialized operational knowledge required. Any engineer with standard AWS experience can maintain it. That portability and low learning curve matters when you're deploying across multiple accounts.

GuardDuty RDS Protection is complementary — it handles threat detection, not structured audit trails. OpenSearch subscription filters would work but introduce an OpenSearch cluster as a dependency, which adds cost and operational overhead.

A Note on pgAudit Overhead

Overhead is minimal at this audit scope. Individual human users in a regulated production environment are not expected to generate high transaction volumes — the audit target is ad-hoc access, not application traffic. The per-session overhead of writing to the PostgreSQL log is negligible when audit scope is limited to non-application users.

For high-volume clusters where audit is a compliance requirement: account for log overhead in instance sizing. Set CloudWatch log retention to match your compliance window rather than keeping logs indefinitely — use aws logs put-retention-policy to enforce it.

What I'd Do Differently

Ship filtered audit records to a structured store. CloudWatch Log Insights is a query tool, not a data store. Extending the Lambda to write each filtered record to DynamoDB or an RDS audit table gives you the closest equivalent to Oracle's UNIFIED_AUDIT_TRAIL: indexed, queryable, long-term audit history. Kinesis Firehose → S3 → Athena is another path for columnar query performance over large windows. The Lambda architecture makes this extension natural — the query and filtering logic is already there.

Terraform from day one. The full stack — Lambda, EventBridge rule, SNS topic, IAM role, CloudWatch alarm, KMS key — fits in a single reusable module. Deploying to additional accounts should be terraform apply with environment variables, not a re-run of CLI commands.

Tag every resource. Environment, ClusterName, Owner. Untagged audit infrastructure across multiple accounts becomes unauditable infrastructure.

Final Thought

Each component here has a clean interface and behaves predictably in isolation. The engineering is in the integration contracts: Log Insights results don't flow to alarms without mediation; AWS-managed KMS keys don't work with CloudWatch delivery; incident platform deduplication suppresses repeated alarm transitions. None of these are documented prominently — they surface when components are wired together under production conditions.

Start with the Log Insights filter query. Validate the signal before building anything around it. The filtering logic is the foundation. The rest is plumbing.

Full implementation: Lambda code, IAM policies, SNS topic policy, KMS key policy, CloudWatch alarm, Terraform module → github.com/pcraavi/PostgreSQL-Audit

Drop a comment if you've hit different noise patterns in your environment, or if you've implemented the Lambda → structured audit table extension — curious how others have approached long-term audit retention on Aurora.

What It Actually Takes to Audit Aurora PostgreSQL on AWS

What We're Actually Auditing — and Why It Matters

The Oracle Baseline

The Architecture

Gotcha 1: The Silent pgAudit Setup Failure

Gotcha 2: The Noise Problem

Gotcha 3: The KMS + CloudWatch Alarms Incompatibility

Gotcha 4: The Incident Platform Deduplication Problem

Why Not Database Activity Streams?

A Note on pgAudit Overhead

What I'd Do Differently

Final Thought

Comments (0)

United States

Related News

Trump Calls Off AI Executive Order Over Concern It Could Weaken US Tech Edge

Microservices Didn't Fail. People Did

Meta Settles Lawsuit That Claimed Social Media Addiction Screwed Up Schools

Centralized Authentication for a Multi-Brand Laravel Ecosystem

Gizmo Guard - Safeguard Bot (Powered by Gemma4)