Building a Bot-Filtered View Counter with reCAPTCHA v3 and AWS

September 24, 2025 at 10:18 PM CDT · Wayne Workman · 8 min read

TL;DR

What: Bot-filtered analytics using reCAPTCHA v3 + AWS serverless (Lambda, DynamoDB, API Gateway)

Why: Clean human-only traffic data without the bot noise that pollutes server logs

Cost: Under $1/month for typical blog traffic (~10K views, as of Sep 2025)

When NOT to use: Privacy-critical sites, JavaScript-disabled users, need for real-time granular metrics

Key gotcha: reCAPTCHA tokens expire after 2 minutes - execute immediately on page load

The bot problem in analytics

So I've built a few web apps over the years. If you've ever looked at raw server logs, you know the deal. The majority of your traffic isn't human. Last time I analyzed my logs, the traffic was mostly automated: crawlers, scrapers, vulnerability scanners, the works.

When you're trying to understand how real people use your site, all that bot noise makes the data basically useless. Sure, you could use server-side user agent filtering, but bots fake user agents all the time. You need something smarter.

That's where I landed on using Google reCAPTCHA v3 as a bot filter for page view analytics. Yeah, it's a bit unconventional, but it actually works really well when you combine it with AWS serverless infrastructure. Let me walk you through the complete implementation, frontend to backend.

The Architecture

The solution leverages AWS serverless components with reCAPTCHA v3 as the gatekeeper: the browser grabs a reCAPTCHA token on page load and POSTs it to API Gateway, which invokes a Lambda that verifies the token with Google and, if the score passes, increments a counter in DynamoDB.

The whole thing costs less than a dollar per month for a typical blog. Seriously. I mean, when you think about what traditional analytics services charge, especially the ones that actually filter bots properly, you're looking at real money every month. This approach? Basically free.

Frontend Implementation

Let's start with the frontend because that's where the magic begins. reCAPTCHA v3 is completely invisible: no puzzles, no challenges, just a score from 0.0 to 1.0 indicating how human the interaction seems.

Basic Setup

First, grab your keys from Google reCAPTCHA. Make sure you select v3, not v2. You'll get a site key (public, embedded in the page) and a secret key (kept on the backend).

Warning: Tokens expire after 2 minutes. If you generate a token on page load and wait, that token's dead. For page view tracking, though, we execute immediately anyway, so no problem there.

Production-ready analytics tracker

Here's what I actually use in production. This goes in a separate JavaScript file that you'll include on every page:

// File: public/js/analytics.js
(function() {
    // Skip on localhost to avoid errors
    if (window.location.hostname === 'localhost' ||
        window.location.hostname === '127.0.0.1') {
        console.debug('Analytics disabled on localhost');
        return;
    }

    const SITE_KEY = 'YOUR_RECAPTCHA_V3_SITE_KEY';
    const API_ENDPOINT = 'https://your-api.example.com/view-counter';

    function trackPageView() {
        grecaptcha.ready(function() {
            grecaptcha.execute(SITE_KEY, {action: 'page_view'})
                .then(function(token) {
                    const payload = {
                        path: window.location.pathname,
                        recaptchaToken: token,
                        timestamp: new Date().toISOString(),
                        timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
                        userAgent: navigator.userAgent,
                        referrer: document.referrer || 'direct',
                        hostname: window.location.hostname
                    };

                    // Fire and forget, don't block on analytics
                    fetch(API_ENDPOINT, {
                        method: 'POST',
                        headers: {
                            'Content-Type': 'application/json',
                        },
                        body: JSON.stringify(payload)
                    }).catch(function(err) {
                        console.debug('Analytics request failed:', err);
                    });
                })
                .catch(function(err) {
                    console.debug('reCAPTCHA execution failed:', err);
                });
        });
    }

    // Wait for reCAPTCHA to load before executing
    function waitForRecaptchaAndTrack() {
        if (typeof grecaptcha === 'undefined') {
            // reCAPTCHA not loaded yet, check again in 300ms
            setTimeout(waitForRecaptchaAndTrack, 300);
            return;
        }

        // reCAPTCHA is loaded, track the page view
        trackPageView();
    }

    // Execute when DOM is ready
    if (document.readyState === 'loading') {
        document.addEventListener('DOMContentLoaded', waitForRecaptchaAndTrack);
    } else {
        waitForRecaptchaAndTrack();
    }
})();

Key decisions here:

Skip localhost: local development shouldn't throw errors or pollute your counts.
Poll for grecaptcha: the reCAPTCHA script loads asynchronously, so don't assume it's there.
Fire and forget: a failed analytics call should never affect the page.
console.debug, not console.error: failures stay out of the console's way.

How to Include on Every Page

Once you have the analytics.js file, include these two scripts at the bottom of every HTML page, right before the closing </body> tag:

<!-- Google reCAPTCHA v3 -->
<script src="https://www.google.com/recaptcha/api.js?render=YOUR_SITE_KEY" async></script>
<!-- Your analytics tracker -->
<script src="/js/analytics.js" defer></script>

The async on reCAPTCHA and defer on analytics ensure they don't block page render. But watch out, there's a race condition here! The reCAPTCHA script loads asynchronously, so it might not be available when analytics.js runs on a first page load. That's why the code polls for grecaptcha to be defined before executing. Without this check, you'd only track views on reloads, when the script is already cached. Ask me how I know.

Hiding the badge

That reCAPTCHA badge in the corner? You can hide it legally as long as you include attribution elsewhere (per Google's policy):

// File: public/js/analytics.js (append to file)
const style = document.createElement('style');
style.textContent = '.grecaptcha-badge { visibility: hidden !important; }';
document.head.appendChild(style);

Then add this to your footer:

This site is protected by reCAPTCHA and the Google
<a href="https://policies.google.com/privacy">Privacy Policy</a> and
<a href="https://policies.google.com/terms">Terms of Service</a> apply.

Backend Implementation

Now for the fun part. I'm using AWS Lambda with Python because it's cheap, scales automatically, and urllib3 comes pre-installed (no Lambda layers needed). Lambda's free tier is generous enough that this literally costs nothing most months.

DynamoDB table structure

First, let's talk about the data model. I use monthly aggregation to keep costs down:

# File: terraform/dynamodb.tf
resource "aws_dynamodb_table" "analytics" {
  name         = "blog-analytics"
  billing_mode = "PAY_PER_REQUEST"

  hash_key  = "pk"  # Page path
  range_key = "sk"  # Year-Month (YYYY-MM)

  attribute {
    name = "pk"
    type = "S"
  }

  attribute {
    name = "sk"
    type = "S"
  }
}

Each item represents one page's traffic for one month. So instead of storing millions of individual page views, you get maybe 100 items per year. Smart, right? Costs stay tiny (pennies at my level), which keeps my wallet happy.
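
To see what the data looks like in practice, here's a minimal read-side sketch (not part of the deployment) that pulls one page's monthly counts. It assumes the table and key names from the Terraform above and local AWS credentials:

# Local script: fetch monthly view counts for a single page.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('blog-analytics')

def monthly_counts(path):
    """Return {'YYYY-MM': view_count} for one page path."""
    response = table.query(KeyConditionExpression=Key('pk').eq(path))
    return {item['sk']: int(item['view_count']) for item in response['Items']}

print(monthly_counts('/blog/post'))  # e.g. {'2025-08': 412, '2025-09': 389}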

Lambda function

Here's the Lambda that does all the heavy lifting. Note how I use urllib3 instead of requests, it's already available in Lambda's Python runtime:

# File: lambda/handler.py
import json
import boto3
import urllib3
from datetime import datetime
from decimal import Decimal
import os

dynamodb = boto3.resource('dynamodb')
secrets_client = boto3.client('secretsmanager')
http = urllib3.PoolManager()

def verify_recaptcha(token, secret_key):
    """Verify reCAPTCHA v3 token with Google"""
    verification_url = 'https://www.google.com/recaptcha/api/siteverify'

    fields = {
        'secret': secret_key,
        'response': token
    }

    response = http.request(
        'POST',
        verification_url,
        fields=fields,
        encode_multipart=False  # send as application/x-www-form-urlencoded
    )

    return json.loads(response.data.decode('utf-8'))

def lambda_handler(event, context):
    # CORS headers for all responses
    cors_headers = {
        'Access-Control-Allow-Origin': os.environ['ALLOWED_ORIGIN'],  # e.g. 'https://wayne.theworkmans.us'
        'Content-Type': 'application/json'
    }

    try:
        body = json.loads(event.get('body', '{}'))

        # Verify hostname matches
        if body.get('hostname') != os.environ['ALLOWED_HOSTNAME']:
            return {
                'statusCode': 403,
                'headers': cors_headers,
                'body': json.dumps({'error': 'Invalid hostname'})
            }

        # Get reCAPTCHA secret from Secrets Manager
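        # (fetched on every invocation for simplicity; caching it at module
        # scope would cut latency and Secrets Manager API calls)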
        secret_arn = os.environ['SECRET_ARN']
        response = secrets_client.get_secret_value(SecretId=secret_arn)
        secrets = json.loads(response['SecretString'])

        # Verify reCAPTCHA token
        result = verify_recaptcha(
            body['recaptchaToken'],
            secrets['secret_key']
        )

        # Check score threshold (0.5 is reasonable)
        score = result.get('score', 0)
        if not result.get('success') or score < 0.5:
            print(f"Low score: {score}")
            return {
                'statusCode': 403,
                'headers': cors_headers,
                'body': json.dumps({'error': 'Low trust score'})
            }

        # Verify action matches
        if result.get('action') != 'page_view':
            return {
                'statusCode': 403,
                'headers': cors_headers,
                'body': json.dumps({'error': 'Invalid action'})
            }

        # Parse timestamp for monthly aggregation
        timestamp = datetime.fromisoformat(
            body['timestamp'].replace('Z', '+00:00')
        )
        year_month = timestamp.strftime('%Y-%m')

        # Strip query params and fragments from path
        path = body['path'].split('?')[0].split('#')[0]

        # Update counter atomically
        table = dynamodb.Table(os.environ['DYNAMODB_TABLE'])
        table.update_item(
            Key={
                'pk': path,
                'sk': year_month
            },
            UpdateExpression='ADD view_count :inc SET last_viewed = :ts, recaptcha_score = :score',
            ExpressionAttributeValues={
                ':inc': Decimal(1),
                ':ts': body['timestamp'],
                ':score': Decimal(str(score))
            }
        )

        return {
            'statusCode': 200,
            'headers': cors_headers,
            'body': json.dumps({'success': True})
        }

    except Exception as e:
        print(f'Error: {str(e)}')
        return {
            'statusCode': 500,
            'headers': cors_headers,
            'body': json.dumps({'error': 'Internal error'})
        }

The beauty of DynamoDB's ADD operation is that it's atomic. If the item doesn't exist, it creates it with view_count = 1. If it exists, it increments. No read-before-write race conditions.

Secrets management

Never hardcode secrets. I use AWS Secrets Manager:

# File: terraform/secrets.tf
resource "aws_secretsmanager_secret" "recaptcha" {
  name = "recaptcha-v3-keys"
}

resource "aws_secretsmanager_secret_version" "recaptcha" {
  secret_id = aws_secretsmanager_secret.recaptcha.id
  secret_string = jsonencode({
    site_key   = "PLACEHOLDER"
    secret_key = "PLACEHOLDER"
  })

  lifecycle {
    ignore_changes = [secret_string]
  }
}

That lifecycle rule prevents Terraform from overwriting production secrets. I learned that one the hard way, long ago.
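
To load the real values, I push them out-of-band. A minimal sketch with boto3 (run once, locally; never commit the real keys anywhere):

# Local one-time script: replace the placeholder secret with the real keys.
import json
import boto3

boto3.client('secretsmanager').put_secret_value(
    SecretId='recaptcha-v3-keys',  # matches the Terraform name above
    SecretString=json.dumps({
        'site_key': 'YOUR_REAL_SITE_KEY',
        'secret_key': 'YOUR_REAL_SECRET_KEY',
    })
)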

API Gateway configuration

Warning: CORS configuration is critical. Without proper CORS headers, browsers will block your analytics requests.

The API Gateway setup is straightforward, but don't forget CORS. Because the tracker POSTs JSON cross-origin, browsers send a preflight OPTIONS request first, so the resource also needs an OPTIONS method returning the CORS headers (a MOCK integration works) alongside the POST shown here:

# File: terraform/api_gateway.tf
resource "aws_api_gateway_rest_api" "api" {
  name = "blog-analytics-api"
}

resource "aws_api_gateway_resource" "view_counter" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  parent_id   = aws_api_gateway_rest_api.api.root_resource_id
  path_part   = "view-counter"
}

resource "aws_api_gateway_method" "view_counter_post" {
  rest_api_id   = aws_api_gateway_rest_api.api.id
  resource_id   = aws_api_gateway_resource.view_counter.id
  http_method   = "POST"
  authorization = "NONE"
}

resource "aws_api_gateway_integration" "view_counter" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  resource_id = aws_api_gateway_resource.view_counter.id
  http_method = aws_api_gateway_method.view_counter_post.http_method

  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.view_counter.invoke_arn
}
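
Once deployed, a quick smoke test is to POST a garbage token: if the verification path is wired up, the request should come back 403. A sketch, substituting your real endpoint and hostname:

# Local smoke test: a bogus token must be rejected with 403.
import json
import urllib3

http = urllib3.PoolManager()
response = http.request(
    'POST',
    'https://your-api.example.com/view-counter',
    body=json.dumps({
        'path': '/smoke-test',
        'recaptchaToken': 'not-a-real-token',
        'timestamp': '2025-09-24T12:00:00Z',
        'hostname': 'your-site.example.com',
    }),
    headers={'Content-Type': 'application/json'}
)
print(response.status, response.data.decode())  # expect 403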

IAM permissions

Lambda needs specific permissions:

# File: terraform/iam.tf
resource "aws_iam_role_policy" "lambda_policy" {
  name = "view-counter-policy"
  role = aws_iam_role.lambda.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:*:*:*"  # Scope to specific log group in production
      },
      {
        Effect = "Allow"
        Action = ["secretsmanager:GetSecretValue"]
        Resource = aws_secretsmanager_secret.recaptcha.arn
      },
      {
        Effect = "Allow"
        Action = [
          "dynamodb:UpdateItem",
          "dynamodb:GetItem"
        ]
        Resource = aws_dynamodb_table.analytics.arn
      }
    ]
  })
}

Performance and cost analysis

Frontend performance impact:

reCAPTCHA script: ~130KB (45KB gzipped)
Execution time: 200-400ms
Network overhead: 1 extra POST request

But it runs async and fails silently, so zero impact on user experience.

Backend costs (10K views/month, as of Sep 2025):

DynamoDB: ~$0.01
Lambda: $0 (free tier)
API Gateway: ~$0.03
Secrets Manager: $0.40
Total: under $1/month

Critical implementation details

Path sanitization

The Lambda strips query parameters and fragments, so /blog/post, /blog/post?ref=twitter, and /blog/post#section all increment the same counter. Otherwise you'd end up with hundreds of variations of the same page. The tracker sends window.location.pathname, which already excludes both, but the API is open to anyone, so the Lambda sanitizes defensively anyway.
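
The same two-step split in isolation:

def sanitize(path):
    # Drop everything after the first '?' or '#'
    return path.split('?')[0].split('#')[0]

assert sanitize('/blog/post') == '/blog/post'
assert sanitize('/blog/post?ref=twitter') == '/blog/post'
assert sanitize('/blog/post#section') == '/blog/post'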

Score threshold

I use 0.5 as the cutoff. Google suggests this as a starting point. In practice, real users typically score 0.7 to 0.9, while bots score 0.1 to 0.3. That middle ground catches sophisticated bots without blocking legitimate users on VPNs or unusual browsers.
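
If you'd rather have the checks in one place than inline in the handler, here's a sketch of the same logic as a helper. The fields follow Google's siteverify response (success, score, action, error-codes), where an expired or reused token shows up in error-codes as timeout-or-duplicate:

def passes_checks(result, threshold=0.5, expected_action='page_view'):
    """Same success/score/action logic the handler runs inline."""
    if not result.get('success'):
        # e.g. ['timeout-or-duplicate'] for an expired token
        print(f"Verification failed: {result.get('error-codes', [])}")
        return False
    if result.get('action') != expected_action:
        return False
    return result.get('score', 0) >= threshold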

Error handling philosophy

Note: Every error must return proper CORS headers. Without them, the browser console will light up with errors.

The frontend implements fire-and-forget: if analytics fails, the user never knows. This is critical. Analytics should NEVER impact user experience. I've seen too many sites where an analytics failure actually breaks functionality, and that's just embarrassing.

Monitoring and debugging

CloudWatch Logs capture everything the Lambda prints: the low-score rejections and any unexpected exceptions from the catch-all handler.
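
A quick way to pull the low-score rejections out with boto3 (a sketch; the log group name assumes the Lambda function is named view-counter):

# Local script: list recent low-score rejections from the Lambda's logs.
import boto3

logs = boto3.client('logs')
events = logs.filter_log_events(
    logGroupName='/aws/lambda/view-counter',  # assumption: function name
    filterPattern='"Low score"',              # matches the handler's print
)
for event in events['events']:
    print(event['message'].strip())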

Trade-offs and limitations

Let's be real about what this solution isn't:

Not privacy-friendly: every page load runs Google's script, so Google sees your traffic. If that's a dealbreaker for your audience, look elsewhere.
Not universal: visitors with JavaScript disabled, or Google's domains blocked, are never counted.
Not real-time or granular: you get monthly rollups per path, not sessions, funnels, or live dashboards.

For a public technical blog, these trade-offs seemed worth it for easy, accurate metrics. If you need true privacy or real-time granular data, this isn't the solution for you.

Lessons learned

The key to success was working with the constraints rather than against them:

  1. Use what Lambda provides. urllib3 is there, requests isn't. Don't fight it.
  2. Aggregate aggressively. Individual page views are expensive to store and query. Monthly rollups are cheap and sufficient.
  3. Fail silently. Analytics errors should never affect users.
  4. Token expiration is real. That 2-minute timeout will bite you if you're not careful.
  5. CORS headers on errors too. Otherwise the browser console lights up like a Christmas tree.

Wrapping up

This serverless architecture gives you bot-filtered analytics for essentially free. By combining reCAPTCHA v3's invisible verification with DynamoDB's atomic operations and Lambda's pay-per-use model, you get accurate view counts without the complexity or cost of traditional analytics platforms.

Is it perfect? No. But for a personal blog or small business site, it's more than sufficient. You get clean data showing actual human traffic patterns, and you can build whatever dashboards you want on top of the DynamoDB data.

The whole implementation took me a single Wednesday evening after work. Sometimes the simple solution really is the best solution. Give it a try, worst case, you learn something about reCAPTCHA and AWS. Best case, you finally get analytics data you can actually trust without the bot noise.
