Building a Bot-Filtered View Counter with reCAPTCHA v3 and AWS
TL;DR
What: Bot-filtered analytics using reCAPTCHA v3 + AWS serverless (Lambda, DynamoDB, API Gateway)
Why: Clean human-only traffic data without the bot noise that pollutes server logs
Cost: Under $1/month for typical blog traffic (~10K views/month, pricing as of Sep 2025)
When NOT to use: Privacy-critical sites, JavaScript-disabled users, need for real-time granular metrics
Key gotcha: reCAPTCHA tokens expire after 2 minutes - execute immediately on page load
The bot problem in analytics
So I've built a few web apps over the years, and if you've ever looked at raw server logs, you know the deal: the majority of your traffic isn't human. The last time I analyzed mine, most of it was automated: crawlers, scrapers, vulnerability scanners, the works.
When you're trying to understand how real people use your site, all that bot noise makes the data basically useless. Sure, you could use server-side user agent filtering, but bots fake user agents all the time. You need something smarter.
That's where I landed on using Google reCAPTCHA v3 as a bot filter for page view analytics. Yeah, it's a bit unconventional, but it actually works really well when you combine it with AWS serverless infrastructure. Let me walk you through the complete implementation, frontend to backend.
The Architecture
The solution leverages AWS serverless components with reCAPTCHA v3 as the gatekeeper:
- Frontend: Pure JavaScript that executes reCAPTCHA and sends tokens
- API Gateway: HTTPS endpoint with CORS support
- Lambda: Python function for token validation and storage
- DynamoDB: Monthly aggregated view counts
- Secrets Manager: Secure storage for reCAPTCHA keys
The whole thing costs less than a dollar per month for a typical blog. Seriously. I mean, when you think about what traditional analytics services charge, especially the ones that actually filter bots properly, you're looking at real money every month. This approach? Basically free.
Frontend Implementation
Let's start with the frontend because that's where the magic begins. reCAPTCHA v3 is completely invisible: no puzzles, no challenges, just a score from 0.0 to 1.0 indicating how human the interaction seems.
Basic Setup
First, grab your keys from the Google reCAPTCHA admin console. Make sure you select v3, not v2. You'll get a site key (public, used in the frontend) and a secret key (kept on the backend).
Production-ready analytics tracker
Here's what I actually use in production. This goes in a separate JavaScript file that you'll include on every page:
// File: public/js/analytics.js
(function() {
  // Skip on localhost to avoid errors
  if (window.location.hostname === 'localhost' ||
      window.location.hostname === '127.0.0.1') {
    console.debug('Analytics disabled on localhost');
    return;
  }

  const SITE_KEY = 'YOUR_RECAPTCHA_V3_SITE_KEY';
  const API_ENDPOINT = 'https://your-api.example.com/view-counter';

  function trackPageView() {
    grecaptcha.ready(function() {
      grecaptcha.execute(SITE_KEY, {action: 'page_view'})
        .then(function(token) {
          const payload = {
            path: window.location.pathname,
            recaptchaToken: token,
            timestamp: new Date().toISOString(),
            timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
            userAgent: navigator.userAgent,
            referrer: document.referrer || 'direct',
            hostname: window.location.hostname
          };

          // Fire and forget, don't block on analytics
          fetch(API_ENDPOINT, {
            method: 'POST',
            headers: {
              'Content-Type': 'application/json'
            },
            body: JSON.stringify(payload)
          }).catch(function(err) {
            console.debug('Analytics request failed:', err);
          });
        })
        .catch(function(err) {
          console.debug('reCAPTCHA execution failed:', err);
        });
    });
  }

  // Wait for reCAPTCHA to load before executing
  function waitForRecaptchaAndTrack() {
    if (typeof grecaptcha === 'undefined') {
      // reCAPTCHA not loaded yet, check again in 300ms
      setTimeout(waitForRecaptchaAndTrack, 300);
      return;
    }
    // reCAPTCHA is loaded, track the page view
    trackPageView();
  }

  // Execute when DOM is ready
  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', waitForRecaptchaAndTrack);
  } else {
    waitForRecaptchaAndTrack();
  }
})();
Key decisions here:
- Skip localhost (reCAPTCHA will error otherwise)
- Poll for reCAPTCHA availability (fixes first-load race condition)
- Execute immediately when available (avoid token expiration)
- Include context like path and referrer
- Fail silently; analytics should never break your site
How to Include on Every Page
Once you have the analytics.js file, include these two scripts at the bottom of every HTML page, right before the closing </body> tag:
<!-- Google reCAPTCHA v3 -->
<script src="https://www.google.com/recaptcha/api.js?render=YOUR_SITE_KEY" async></script>
<!-- Your analytics tracker -->
<script src="/js/analytics.js" defer></script>
The async on the reCAPTCHA script and the defer on the analytics script ensure they don't block page render. But watch out, there's a race condition here: the reCAPTCHA script loads asynchronously, so it might not be available yet when analytics.js runs on a first page load. That's why the code polls for grecaptcha to be defined before executing. Without that check, you'd only track views on page reloads, once the script is already cached. Ask me how I know.
Hiding the badge
That reCAPTCHA badge in the corner? You're allowed to hide it as long as you include the reCAPTCHA attribution elsewhere in the user flow (per Google's policy):
// File: public/js/analytics.js (append to file)
const style = document.createElement('style');
style.textContent = '.grecaptcha-badge { visibility: hidden !important; }';
document.head.appendChild(style);
Then add this to your footer:
This site is protected by reCAPTCHA and the Google
<a href="https://policies.google.com/privacy">Privacy Policy</a> and
<a href="https://policies.google.com/terms">Terms of Service</a> apply.
Backend Implementation
Now for the fun part. I'm using AWS Lambda with Python because it's cheap, scales automatically, and urllib3 comes pre-installed (no Lambda layers needed). Lambda's free tier is generous enough that this literally costs nothing most months.
DynamoDB table structure
First, let's talk about the data model. I use monthly aggregation to keep costs down:
# File: terraform/dynamodb.tf
resource "aws_dynamodb_table" "analytics" {
name = "blog-analytics"
billing_mode = "PAY_PER_REQUEST"
hash_key = "pk" # Page path
range_key = "sk" # Year-Month (YYYY-MM)
attribute {
name = "pk"
type = "S"
}
attribute {
name = "sk"
type = "S"
}
}
Each item represents one page's traffic for one month. So instead of storing millions of individual page views, you get maybe 100 items per year. Smart, right? Costs stay tiny (pennies at my level), which keeps my wallet happy.
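Reading the data back is just a DynamoDB Query on the page path, which returns that page's monthly items in one shot. Here's a minimal sketch of what that looks like, assuming the table name from the Terraform above; the script name and the example item in the comment are mine, not part of the deployed stack:

# File: scripts/monthly_views.py (illustrative sketch, not part of the deployed stack)
import boto3
from boto3.dynamodb.conditions import Key

# Table name matches terraform/dynamodb.tf above
table = boto3.resource('dynamodb').Table('blog-analytics')

def monthly_views(path):
    # Items come back roughly like:
    # {'pk': '/blog/post', 'sk': '2025-09', 'view_count': Decimal('42'),
    #  'last_viewed': '2025-09-28T18:04:11.000Z', 'recaptcha_score': Decimal('0.9')}
    response = table.query(KeyConditionExpression=Key('pk').eq(path))
    return response['Items']  # Query results come back ordered by sort key (year-month)

for item in monthly_views('/blog/post'):
    print(item['sk'], item['view_count'])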
Lambda function
Here's the Lambda that does all the heavy lifting. Note how I use urllib3 instead of requests; it's already available in Lambda's Python runtime:
# File: lambda/handler.py
import json
import boto3
import urllib3
from datetime import datetime
from decimal import Decimal
import os

dynamodb = boto3.resource('dynamodb')
secrets_client = boto3.client('secretsmanager')
http = urllib3.PoolManager()


def verify_recaptcha(token, secret_key):
    """Verify reCAPTCHA v3 token with Google"""
    verification_url = 'https://www.google.com/recaptcha/api/siteverify'
    fields = {
        'secret': secret_key,
        'response': token
    }
    response = http.request(
        'POST',
        verification_url,
        fields=fields
    )
    return json.loads(response.data.decode('utf-8'))


def lambda_handler(event, context):
    # CORS headers for all responses
    cors_headers = {
        'Access-Control-Allow-Origin': os.environ['ALLOWED_ORIGIN'],  # e.g., 'https://wayne.theworkmans.us'
        'Content-Type': 'application/json'
    }

    try:
        body = json.loads(event.get('body', '{}'))

        # Verify hostname matches
        if body.get('hostname') != os.environ['ALLOWED_HOSTNAME']:
            return {
                'statusCode': 403,
                'headers': cors_headers,
                'body': json.dumps({'error': 'Invalid hostname'})
            }

        # Get reCAPTCHA secret from Secrets Manager
        secret_arn = os.environ['SECRET_ARN']
        response = secrets_client.get_secret_value(SecretId=secret_arn)
        secrets = json.loads(response['SecretString'])

        # Verify reCAPTCHA token
        result = verify_recaptcha(
            body['recaptchaToken'],
            secrets['secret_key']
        )

        # Check score threshold (0.5 is reasonable)
        score = result.get('score', 0)
        if not result.get('success') or score < 0.5:
            print(f"Low score: {score}")
            return {
                'statusCode': 403,
                'headers': cors_headers,
                'body': json.dumps({'error': 'Low trust score'})
            }

        # Verify action matches
        if result.get('action') != 'page_view':
            return {
                'statusCode': 403,
                'headers': cors_headers,
                'body': json.dumps({'error': 'Invalid action'})
            }

        # Parse timestamp for monthly aggregation
        timestamp = datetime.fromisoformat(
            body['timestamp'].replace('Z', '+00:00')
        )
        year_month = timestamp.strftime('%Y-%m')

        # Strip query params and fragments from path
        path = body['path'].split('?')[0].split('#')[0]

        # Update counter atomically
        table = dynamodb.Table(os.environ['DYNAMODB_TABLE'])
        table.update_item(
            Key={
                'pk': path,
                'sk': year_month
            },
            UpdateExpression='ADD view_count :inc SET last_viewed = :ts, recaptcha_score = :score',
            ExpressionAttributeValues={
                ':inc': Decimal(1),
                ':ts': body['timestamp'],
                ':score': Decimal(str(score))
            }
        )

        return {
            'statusCode': 200,
            'headers': cors_headers,
            'body': json.dumps({'success': True})
        }

    except Exception as e:
        print(f'Error: {str(e)}')
        return {
            'statusCode': 500,
            'headers': cors_headers,
            'body': json.dumps({'error': 'Internal error'})
        }
The beauty of DynamoDB's ADD operation is that it's atomic. If the item doesn't exist, it creates it with view_count = 1. If it exists, it increments. No read-before-write race conditions.
Secrets management
Never hardcode secrets. I use AWS Secrets Manager:
# File: terraform/secrets.tf
resource "aws_secretsmanager_secret" "recaptcha" {
name = "recaptcha-v3-keys"
}
resource "aws_secretsmanager_secret_version" "recaptcha" {
secret_id = aws_secretsmanager_secret.recaptcha.id
secret_string = jsonencode({
site_key = "PLACEHOLDER"
secret_key = "PLACEHOLDER"
})
lifecycle {
ignore_changes = [secret_string]
}
}
That lifecycle rule prevents Terraform from overwriting production secrets. I learned that one the hard way, long ago.
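Because Terraform only ever writes placeholder values and then ignores them, you set the real keys out of band. A quick boto3 sketch of that step; the secret name matches the Terraform above, and the key values are obviously placeholders for your own:

# File: scripts/set_recaptcha_secret.py (one-off sketch; run after terraform apply)
import json
import boto3

client = boto3.client('secretsmanager')

# Writes a new version of the secret; Terraform leaves it alone thanks to ignore_changes
client.put_secret_value(
    SecretId='recaptcha-v3-keys',
    SecretString=json.dumps({
        'site_key': 'YOUR_RECAPTCHA_V3_SITE_KEY',
        'secret_key': 'YOUR_RECAPTCHA_V3_SECRET_KEY'
    })
)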
API Gateway configuration
The API Gateway setup is straightforward, but don't forget CORS. The snippet below shows only the core resources; for the browser's preflight to succeed you also need an OPTIONS method returning the matching CORS headers, plus the usual deployment, stage, and aws_lambda_permission so API Gateway can invoke the function:
# File: terraform/api_gateway.tf
resource "aws_api_gateway_rest_api" "api" {
name = "blog-analytics-api"
}
resource "aws_api_gateway_resource" "view_counter" {
rest_api_id = aws_api_gateway_rest_api.api.id
parent_id = aws_api_gateway_rest_api.api.root_resource_id
path_part = "view-counter"
}
resource "aws_api_gateway_method" "view_counter_post" {
rest_api_id = aws_api_gateway_rest_api.api.id
resource_id = aws_api_gateway_resource.view_counter.id
http_method = "POST"
authorization = "NONE"
}
resource "aws_api_gateway_integration" "view_counter" {
rest_api_id = aws_api_gateway_rest_api.api.id
resource_id = aws_api_gateway_resource.view_counter.id
http_method = aws_api_gateway_method.view_counter_post.http_method
integration_http_method = "POST"
type = "AWS_PROXY"
uri = aws_lambda_function.view_counter.invoke_arn
}
IAM permissions
Lambda needs specific permissions:
# File: terraform/iam.tf
resource "aws_iam_role_policy" "lambda_policy" {
name = "view-counter-policy"
role = aws_iam_role.lambda.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "arn:aws:logs:*:*:*" # Scope to specific log group in production
},
{
Effect = "Allow"
Action = ["secretsmanager:GetSecretValue"]
Resource = aws_secretsmanager_secret.recaptcha.arn
},
{
Effect = "Allow"
Action = [
"dynamodb:UpdateItem",
"dynamodb:GetItem"
]
Resource = aws_dynamodb_table.analytics.arn
}
]
})
}
Performance and cost analysis
| Frontend Performance | Impact |
|---|---|
| reCAPTCHA script | ~130KB (45KB gzipped) |
| Execution time | 200-400ms |
| Network overhead | 1 extra POST request |
But it runs async and fails silently, so zero impact on user experience.
| Backend Costs (10K views/month) | Monthly Cost (Sep 2025) |
|---|---|
| DynamoDB | ~$0.01 |
| Lambda | $0 (free tier) |
| API Gateway | ~$0.03 |
| Secrets Manager | $0.40 |
| Total | < $1/month |
Critical implementation details
Path sanitization
The Lambda strips query parameters and fragments. So /blog/post, /blog/post?ref=twitter, and /blog/post#section all increment the same counter. Otherwise you'd have hundreds of variations of the same page.
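The normalization itself is the one-liner from the handler above; a tiny sketch of its behavior:

# Same normalization as in lambda/handler.py
def normalize(path):
    return path.split('?')[0].split('#')[0]

assert normalize('/blog/post') == '/blog/post'
assert normalize('/blog/post?ref=twitter') == '/blog/post'
assert normalize('/blog/post#section') == '/blog/post'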
Score threshold
I use 0.5 as the cutoff. Google suggests this as a starting point. In practice, real users typically score 0.7 to 0.9, while bots score 0.1 to 0.3. That middle ground catches sophisticated bots without blocking legitimate users on VPNs or unusual browsers.
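If you want to tune the cutoff without touching code, reading it from an environment variable is an easy tweak. A sketch of the idea; SCORE_THRESHOLD is an assumed variable name, not something the stack above defines:

# Hypothetical tweak for lambda/handler.py: make the cutoff configurable
import os

def score_threshold():
    # SCORE_THRESHOLD is an assumed env var; falls back to the 0.5 default used above
    return float(os.environ.get('SCORE_THRESHOLD', '0.5'))

Then the check becomes score < score_threshold() instead of the hardcoded score < 0.5.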
Error handling philosophy
The frontend is fire-and-forget: if analytics fails, the user never knows. This is critical. Analytics should NEVER impact user experience. I've seen too many sites where an analytics failure actually breaks functionality, and that's just embarrassing.
Monitoring and debugging
CloudWatch Logs capture everything; there's a small logging sketch after this list that makes the scores easier to query later:
- reCAPTCHA scores for tuning thresholds
- Failed validations to spot attacks
- Path patterns to understand traffic
- Error traces for debugging
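One tweak worth making: print a JSON line per tracked view instead of free-form text, so scores and paths are easy to filter later. A sketch of the idea, not something the handler above already does:

# Hypothetical addition to lambda/handler.py: one structured log line per tracked view
import json

def log_view(path, score, action):
    # CloudWatch Logs keeps whatever you print; JSON lines are easy to search and filter
    print(json.dumps({
        'event': 'page_view',
        'path': path,
        'score': score,
        'action': action
    }))

log_view('/blog/post', 0.9, 'page_view')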
Trade-offs and limitations
Let's be real about what this solution isn't:
- Not privacy-first - Google tracks users across the web to generate scores
- Requires JavaScript - Users with JS disabled won't be counted
- Monthly granularity - For real-time or hourly stats, use CloudWatch Metrics instead
- Google dependency - You're trusting Google's scoring algorithm
For a public technical blog, these trade-offs seemed worth it for easy, accurate metrics. If you need true privacy or real-time granular data, this isn't the solution for you.
Lessons learned
The key to success was working with the constraints rather than against them:
- Use what Lambda provides. urllib3 is there, requests isn't. Don't fight it.
- Aggregate aggressively. Individual page views are expensive to store and query. Monthly rollups are cheap and sufficient.
- Fail silently. Analytics errors should never affect users.
- Token expiration is real. That 2-minute timeout will bite you if you're not careful.
- CORS headers on errors too. Otherwise the browser console lights up like a Christmas tree.
Wrapping up
This serverless architecture gives you bot-filtered analytics for essentially free. By combining reCAPTCHA v3's invisible verification with DynamoDB's atomic operations and Lambda's pay-per-use model, you get accurate view counts without the complexity or cost of traditional analytics platforms.
Is it perfect? No. But for a personal blog or small business site, it's more than sufficient. You get clean data showing actual human traffic patterns, and you can build whatever dashboards you want on top of the DynamoDB data.
The whole implementation took me a Wednesday evening after work. Sometimes the simple solution really is the best solution. Give it a try; worst case, you learn something about reCAPTCHA and AWS. Best case, you finally get analytics data you can actually trust, without the bot noise.