WPEngine Monitoring
Automated monitoring of 228 production WordPress sites across all WPEngine installs. Backup, SSL, disk, and bandwidth metrics are collected every 2 hours, and Slack alerts are sent when a health threshold is crossed.
Quick Start
# Check monitoring status
tail -100 ~/logs/flypilot-wpengine.log
# Check alerts
tail -100 ~/logs/flypilot-wpengine-alerts.log
# View recent metrics in Supabase
psql $SUPABASE_DB_URL -c "SELECT * FROM v_wpengine_health_summary LIMIT 10;"
What's Being Monitored
228 Production Sites
Monitoring covers:
- OMG Interactive sites - Property management sites across multiple clients
- Learned Media sites - Client WordPress installations
- FLYPILOT sites - ShopSanel, SPS Fitness, SHAI USA, etc.
4 Metric Types
| Metric | Fields | Update Frequency |
|---|---|---|
| Backup | status, size_mb, hours_since | Every 2 hours |
| SSL | issuer, valid_to, days_remaining | Every 2 hours |
| Disk | used_mb, total_mb, percent_used | Every 2 hours |
| Bandwidth | used_gb, included_gb, percent_used | Every 2 hours |
Health Thresholds
Backup Monitoring
| Status | Condition | Action |
|---|---|---|
| 🔴 Critical | Backup failed | Slack alert immediately |
| ⚠️ Warning | >48 hours since last backup | Slack alert |
| ✅ Healthy | Backup within 48 hours | No action |
SSL Monitoring
| Status | Condition | Action |
|---|---|---|
| 🔴 Critical | ≤7 days until expiry | Slack alert immediately |
| ⚠️ Warning | ≤30 days until expiry | Slack alert |
| ✅ Healthy | >30 days until expiry | No action |
Disk Monitoring
| Status | Condition | Action |
|---|---|---|
| 🔴 Critical | ≥90% disk used | Slack alert immediately |
| ⚠️ Warning | ≥75% disk used | Slack alert |
| ✅ Healthy | <75% disk used | No action |
Bandwidth Monitoring
| Status | Condition | Action |
|---|---|---|
| 🔴 Critical | ≥90% bandwidth used | Slack alert immediately |
| ⚠️ Warning | ≥75% bandwidth used | Slack alert |
| ✅ Healthy | <75% bandwidth used | No action |
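The four threshold tables above can be summarised as a few small classification functions. This is a minimal sketch, not the code in scripts/wpengine_alerts.py: the function names, the return strings, and the assumption that a failed backup is reported as the status string "failed" are all illustrative.

```python
def classify_percent(percent_used: float, warning: float = 75, critical: float = 90) -> str:
    """Classify disk or bandwidth usage; both tables use the same cut-offs."""
    if percent_used >= critical:
        return "critical"
    if percent_used >= warning:
        return "warning"
    return "healthy"


def classify_ssl(days_remaining: int) -> str:
    """Critical at 7 days or fewer, warning at 30 days or fewer."""
    if days_remaining <= 7:
        return "critical"
    if days_remaining <= 30:
        return "warning"
    return "healthy"


def classify_backup(status: str, hours_since: float) -> str:
    """Assumes a failed backup is reported as status == "failed" (hypothetical value)."""
    if status == "failed":
        return "critical"
    if hours_since > 48:
        return "warning"
    return "healthy"
```

Checking the critical condition first matters: a certificate with 5 days remaining also satisfies the warning condition, so the most severe match has to win.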
Cron Schedule
Metrics Collection
# Every 2 hours
0 */2 * * * cd ~/GitHub/flypilot && source .venv/bin/activate && python scripts/sync_wpengine_metrics.py
What it does:
- Connects to WPEngine API
- Fetches metrics for all 228 sites
- Stores in Supabase wpengine_metrics table
- Updates last_synced_at timestamp
Alert Processing
# Every 15 minutes during business hours (7:00am through 8:45pm, Mon-Fri)
*/15 7-20 * * 1-5 cd ~/GitHub/flypilot && source .venv/bin/activate && python scripts/wpengine_alerts.py
What it does:
- Queries wpengine_metrics table
- Evaluates health thresholds
- Sends Slack alerts to #tech-alerts for issues
- Logs all activity to ~/logs/flypilot-wpengine-alerts.log
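To make the alert schedule concrete, the sketch below re-implements the cron expression `*/15 7-20 * * 1-5` as a plain predicate. It is only an illustration of when the job fires; the helper name in_alert_window is hypothetical and nothing like it is known to exist in the repository.

```python
from datetime import datetime


def in_alert_window(dt: datetime) -> bool:
    """Return True if cron would fire `*/15 7-20 * * 1-5` at this minute."""
    is_weekday = dt.weekday() < 5          # Monday=0 ... Friday=4
    in_hours = 7 <= dt.hour <= 20          # hour field 7-20 is inclusive
    on_quarter = dt.minute % 15 == 0       # */15 -> minutes 0, 15, 30, 45
    return is_weekday and in_hours and on_quarter
```

Because the hour range is inclusive, the last run of the day is at 8:45pm rather than 8:00pm, and no alert checks run on weekends.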
Slack Alerts
All alerts are sent to the #tech-alerts channel.
Message format:
🔴 CRITICAL: SSL Certificate Expiring
Site: clientsite.com
Install: omg-client-1
Days Remaining: 5
Expires: 2026-02-08
Action Required: Renew SSL certificate immediately
Alert types:
- 🔴 Critical - Requires immediate action
- ⚠️ Warning - Requires attention soon
- ✅ Resolved - Issue has been fixed
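As a rough illustration of the message format above, the following sketch builds the critical SSL alert text from its fields. The function name and parameters are assumptions made for this example; the real script may assemble its messages differently.

```python
from datetime import date


def format_ssl_alert(site: str, install: str, days_remaining: int, expires: date) -> str:
    """Build the text of a critical SSL alert in the format shown above."""
    lines = [
        "🔴 CRITICAL: SSL Certificate Expiring",
        f"Site: {site}",
        f"Install: {install}",
        f"Days Remaining: {days_remaining}",
        f"Expires: {expires.isoformat()}",  # ISO date, e.g. 2026-02-08
        "Action Required: Renew SSL certificate immediately",
    ]
    return "\n".join(lines)
```

Calling format_ssl_alert("clientsite.com", "omg-client-1", 5, date(2026, 2, 8)) reproduces the sample message.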
Supabase Dashboard Views
5 pre-built views for querying metrics:
v_wpengine_health_summary
Overall health score (0-100) per site based on all metrics.
SELECT * FROM v_wpengine_health_summary
WHERE health_score < 70
ORDER BY health_score ASC;
Health score calculation:
- Backup status: 40% weight
- SSL validity: 30% weight
- Disk usage: 20% weight
- Bandwidth usage: 10% weight
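The weighting above can be read as a weighted average of four per-metric sub-scores. The sketch below shows the arithmetic only. How the view derives each sub-score is not documented here, so the mapping of healthy/warning/critical to 100/50/0 is purely an assumption for illustration.

```python
# Weights from the list above, in percent.
WEIGHTS = {"backup": 40, "ssl": 30, "disk": 20, "bandwidth": 10}

# Hypothetical sub-scores; the actual view may score each metric differently.
SUB_SCORES = {"healthy": 100, "warning": 50, "critical": 0}


def health_score(statuses: dict[str, str]) -> float:
    """Combine per-metric statuses into a 0-100 score using the documented weights."""
    total = sum(WEIGHTS[metric] * SUB_SCORES[statuses[metric]] for metric in WEIGHTS)
    return total / 100
```

Under these assumptions, a site whose only problem is an SSL warning scores 85, while a site with a failed backup and everything else healthy scores 60 and would appear in the `health_score < 70` query.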
v_wpengine_backup_status
Backup-specific metrics and status.
SELECT * FROM v_wpengine_backup_status
WHERE hours_since_last_backup > 48
ORDER BY hours_since_last_backup DESC;
v_wpengine_ssl_status
SSL certificate metrics and expiry tracking.
SELECT * FROM v_wpengine_ssl_status
WHERE days_remaining <= 30
ORDER BY days_remaining ASC;
v_wpengine_disk_status
Disk usage metrics and capacity tracking.
SELECT * FROM v_wpengine_disk_status
WHERE percent_used >= 75
ORDER BY percent_used DESC;
v_wpengine_bandwidth_status
Bandwidth usage metrics and overage tracking.
SELECT * FROM v_wpengine_bandwidth_status
WHERE percent_used >= 75
ORDER BY percent_used DESC;
Manual Operations
Run Metrics Collection Manually
cd ~/GitHub/flypilot
source .venv/bin/activate
# Collect metrics for all sites
python scripts/sync_wpengine_metrics.py
# Check logs
tail -50 ~/logs/flypilot-wpengine.log
Run Alert Check Manually
cd ~/GitHub/flypilot
source .venv/bin/activate
# Dry-run (no Slack alerts sent)
python scripts/wpengine_alerts.py --dry-run -v
# Run for real (sends Slack alerts)
python scripts/wpengine_alerts.py
Test Alert Filtering
# See what alerts would be triggered (case-insensitive, so "CRITICAL" also matches)
python scripts/wpengine_alerts.py --dry-run -v | grep -i -A 5 "Critical\|Warning"
# Count matching lines by severity
python scripts/wpengine_alerts.py --dry-run -v | grep -ci "Critical"
python scripts/wpengine_alerts.py --dry-run -v | grep -ci "Warning"
Phase 1 Deployment Status
Status: ✅ Production (deployed 2026-02-03)
What's Live:
- ✅ Metrics collection (backup, SSL, disk, bandwidth)
- ✅ Cron job running every 2 hours
- ✅ Data syncing to Supabase
- ✅ Dashboard views created
- ✅ Alert system deployed
- ✅ Slack notifications working
Completion: 100%
Future Enhancements (Phase 2-3)
Phase 2: Enhanced Monitoring
- Performance metrics (page load times, TTFB)
- Uptime monitoring (ping checks)
- PHP error log monitoring
- Plugin update notifications
Phase 3: Automation
- Automatic SSL renewal reminders
- Backup verification testing
- Disk cleanup recommendations
- Bandwidth optimization suggestions
Troubleshooting
"No metrics collected"
# Check cron is running
crontab -l | grep wpengine
# Check logs for errors
tail -100 ~/logs/flypilot-wpengine.log | grep -i error
# Test API connection
python scripts/sync_wpengine_metrics.py
"Alerts not showing in Slack"
# Verify Slack bot is in #tech-alerts channel
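# Check that the Slack bot token is set (without printing the secret)
[ -n "$SLACK_BOT_TOKEN" ] && echo "SLACK_BOT_TOKEN is set" || echo "SLACK_BOT_TOKEN is missing"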
# Test alert system
python scripts/wpengine_alerts.py --dry-run -v
"Too many SSL alerts"
# Check SSL filtering in script
python scripts/wpengine_alerts.py --dry-run -v | grep -A 5 "SSL"
# Expected: fewer than 10 SSL alerts per run
# If more, adjust filtering threshold
"Metrics out of date"
# Check last collection time
psql $SUPABASE_DB_URL -c "SELECT MAX(last_synced_at) FROM wpengine_metrics;"
# Should be within 2 hours of current time
# If older, check cron and logs
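The freshness rule above (the latest last_synced_at should be within 2 hours of now) can be expressed as a small check. This is a sketch under the assumption that the timestamp has already been fetched and is timezone-aware; the function name is_stale is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_synced_at: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=2)) -> bool:
    """Return True if the most recent sync is older than the collection interval."""
    return now - last_synced_at > max_age


# Example: compare a fetched timestamp against the current UTC time.
now = datetime.now(timezone.utc)
print(is_stale(now - timedelta(hours=3), now))
```

In practice a small grace period on top of the 2 hours may be sensible, since a collection run over 228 sites does not finish instantly.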
Configuration
Environment Variables
Required in ~/.config/flypilot/automation.env:
WPENGINE_USER_ID="your-wpengine-user-id"
WPENGINE_PASSWORD="your-wpengine-password"
SLACK_BOT_TOKEN="xoxb-your-slack-bot-token"
SUPABASE_URL="https://your-project.supabase.co"
SUPABASE_SERVICE_ROLE_KEY="your-service-role-key"
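A quick way to confirm that the file defines everything listed above is to parse it and report missing keys. This is a minimal sketch that assumes simple KEY="value" lines with no shell expansion; how the scripts actually load this file is not shown in this document, and the helper names are hypothetical.

```python
REQUIRED_KEYS = (
    "WPENGINE_USER_ID",
    "WPENGINE_PASSWORD",
    "SLACK_BOT_TOKEN",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY="value" lines, skipping blanks and comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Usage (reads the real file; uncomment to run against it):
# from pathlib import Path
# text = Path("~/.config/flypilot/automation.env").expanduser().read_text()
# print(missing_keys(parse_env(text)))
```

The check reports only key names, so it is safe to run in a shared terminal without exposing credentials.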
Alert Thresholds
Thresholds are configurable in scripts/wpengine_alerts.py:
THRESHOLDS = {
    'backup_warning_hours': 48,
    'ssl_critical_days': 7,
    'ssl_warning_days': 30,
    'disk_critical_percent': 90,
    'disk_warning_percent': 75,
    'bandwidth_critical_percent': 90,
    'bandwidth_warning_percent': 75,
}