Certificate Monitoring for Account Health

Introduction

The Account Health monitoring stack now includes certificate monitoring, which alerts teams to upcoming certificate expiries. Alerting well ahead of the expiration date leaves enough time to renew a certificate before it causes a service disruption.

Certificate monitoring builds on the existing Account Health monitoring infrastructure: it uses CloudWatch metrics and alarms to send task-level and critical-level notifications based on configurable expiry thresholds.

Features

Certificate Expiry Monitoring

The feature automatically scans AWS Certificate Manager (ACM) certificates and raises alerts for those approaching their expiration dates, so that certificates can be renewed before they expire and cause a service outage.

Key Capabilities

Implementation Details

Lambda Function

The certificate monitoring is implemented through a Python 3.11 Lambda function (certificatemonitor) that:

  1. Lists all certificates in the account using the ACM API
  2. Calculates days until expiry for each certificate
  3. Publishes metrics to CloudWatch with appropriate dimensions
  4. Runs on an hourly schedule via EventBridge
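The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual certificatemonitor source: the namespace and metric name are taken from the dashboard example in this document, the CertificateArn dimension name is hypothetical, and the sketch assumes the ListCertificates response includes a NotAfter timestamp for issued certificates.

```python
from datetime import datetime, timezone

NAMESPACE = "Base2Monitoring"  # namespace used in the dashboard example
METRIC_NAME = "DaysToExpiry"   # metric name used in the dashboard example


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days remaining until expiry; negative once expired."""
    return (not_after - now).days


def build_metric_data(summaries: list[dict], now: datetime) -> list[dict]:
    """Turn ACM certificate summaries into CloudWatch MetricData entries."""
    data = []
    for cert in summaries:
        not_after = cert.get("NotAfter")
        if not_after is None:  # e.g. a certificate that has not been issued yet
            continue
        data.append({
            "MetricName": METRIC_NAME,
            # Hypothetical dimension name; the real one is not given in this document.
            "Dimensions": [{"Name": "CertificateArn", "Value": cert["CertificateArn"]}],
            "Value": days_until_expiry(not_after, now),
        })
    return data


def handler(event, context):
    import boto3  # imported here so the helpers above work without the AWS SDK

    acm = boto3.client("acm")
    cloudwatch = boto3.client("cloudwatch")
    summaries = []
    for page in acm.get_paginator("list_certificates").paginate():
        summaries.extend(page["CertificateSummaryList"])
    data = build_metric_data(summaries, datetime.now(timezone.utc))
    if data:
        cloudwatch.put_metric_data(Namespace=NAMESPACE, MetricData=data)
```

Keeping the date arithmetic in pure helper functions means it can be unit-tested without AWS credentials; only handler touches the ACM and CloudWatch APIs.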

The Lambda function publishes two types of metrics:

CloudWatch Alarms

Two CloudWatch alarms are configured to monitor certificate expiries:

  1. Task-Level Alarm (CertificateMonitoringTaskAlarm):

    • Triggers when any certificate has fewer than 30 days until expiry (configurable)
    • Sends notifications to the task SNS topic
    • Allows teams to plan certificate renewals with adequate lead time
  2. Critical-Level Alarm (CertificateMonitoringCriticalAlarm):

    • Triggers when any certificate has fewer than 10 days until expiry (configurable)
    • Sends notifications to the critical SNS topic
    • Indicates urgent action is required to prevent service disruption
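As a rough illustration of how the two thresholds relate, the sketch below classifies the smallest days-to-expiry value across all certificates. The function name is hypothetical, and it assumes the two alarms evaluate independently with a strict "fewer than" comparison, so a certificate inside the critical window also falls inside the task window.

```python
def alarm_levels(min_days_to_expiry: int,
                 task_threshold: int = 30,
                 critical_threshold: int = 10) -> list[str]:
    """Return the alert levels that would fire for the soonest-expiring certificate."""
    levels = []
    if min_days_to_expiry < task_threshold:
        levels.append("task")      # notify the task SNS topic
    if min_days_to_expiry < critical_threshold:
        levels.append("critical")  # notify the critical SNS topic
    return levels
```

For example, with the default thresholds a certificate with 29 days remaining would raise only the task-level alert, while one with 9 days remaining would raise both.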

Configuration Parameters

The following new parameters have been added to the monitoring stack:

Parameter                           Required  Default  Description
EnableCertificateMonitoring         false     true     Boolean value that sets whether certificate monitoring should be enabled
CertificateExpiryTaskThreshold      false     30       Number of days until certificate expiry to trigger task-level alerts
CertificateExpiryCriticalThreshold  false     10       Number of days until certificate expiry to trigger critical-level alerts
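A small pre-deployment check can catch threshold combinations that are unlikely to be intended. The sketch below mirrors the three parameters and their defaults; the class name and the rule that the critical threshold should be smaller than the task threshold are assumptions made for illustration, not constraints the stack is known to enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertificateMonitoringParams:
    """Mirrors the stack parameters above, with the same defaults."""
    enable_certificate_monitoring: bool = True
    certificate_expiry_task_threshold: int = 30
    certificate_expiry_critical_threshold: int = 10

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the values look sane."""
        issues = []
        if self.certificate_expiry_task_threshold <= 0:
            issues.append("task threshold must be a positive number of days")
        if self.certificate_expiry_critical_threshold <= 0:
            issues.append("critical threshold must be a positive number of days")
        # Assumed rule: critical alerts should fire later than task alerts.
        if self.certificate_expiry_critical_threshold >= self.certificate_expiry_task_threshold:
            issues.append("critical threshold should be smaller than task threshold")
        return issues
```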

IAM Permissions

The certificate monitor Lambda function requires the following IAM permissions:

certificatemonitor:
  cloudwatch:
    action:
      - cloudwatch:PutMetricData
  acm:
    action:
      - acm:ListCertificates
      - acm:GetCertificate

Examples

Deployment Example

To deploy the Account Health stack with certificate monitoring enabled using default settings:

# Certificate monitoring is enabled by default with:
# - Task alerts at 30 days before expiry
# - Critical alerts at 10 days before expiry

Customizing Expiry Thresholds

To customize the expiry thresholds during deployment:

# Set task alerts at 60 days and critical alerts at 14 days.
# update-stack needs a template source; --use-previous-template reuses the deployed one.
aws cloudformation update-stack \
  --stack-name account-health-monitoring \
  --use-previous-template \
  --parameters \
    ParameterKey=CertificateExpiryTaskThreshold,ParameterValue=60 \
    ParameterKey=CertificateExpiryCriticalThreshold,ParameterValue=14
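The same threshold change can be scripted with boto3. The sketch below is one possible approach, with hypothetical helper names: it reuses the deployed template and marks every parameter that is not being overridden with UsePreviousValue so that its current value is retained. If the stack creates IAM resources, update_stack would also need a Capabilities argument, which is left out here.

```python
def build_parameters(existing_keys: list[str], overrides: dict[str, str]) -> list[dict]:
    """Override selected parameters and keep the current value of all others."""
    params = []
    for key in existing_keys:
        if key in overrides:
            params.append({"ParameterKey": key, "ParameterValue": str(overrides[key])})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


def update_thresholds(stack_name: str, task_days: int, critical_days: int) -> None:
    import boto3  # imported here so build_parameters works without the AWS SDK

    cfn = boto3.client("cloudformation")
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    existing = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=build_parameters(existing, {
            "CertificateExpiryTaskThreshold": str(task_days),
            "CertificateExpiryCriticalThreshold": str(critical_days),
        }),
    )
```

Calling update_thresholds("account-health-monitoring", 60, 14) would correspond to the CLI example above.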

Disabling Certificate Monitoring

To disable certificate monitoring:

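# update-stack needs a template source; --use-previous-template reuses the deployed one.
aws cloudformation update-stack \
  --stack-name account-health-monitoring \
  --use-previous-template \
  --parameters \
    ParameterKey=EnableCertificateMonitoring,ParameterValue=false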

Example Scenario

The PR includes a practical example demonstrating the feature:

  1. Initial State: Deploy the feature with default parameters (monitoring enabled, 30-day threshold)
  2. Test Normal Operation: With a certificate expiring on August 24, 2025, the Lambda pushes a metric value of 0 indicating no certificates are within the 30-day threshold
  3. Simulate Alert: Update the CertificateExpiryTaskThreshold parameter to 385 days to include the test certificate
  4. Verify Alert: The alarm triggers as expected, detecting the certificate that will expire within the threshold
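The arithmetic behind this scenario can be checked with a few lines of date math. The sketch below assumes a strict "fewer than N days" comparison on whole calendar days; the function names are illustrative only.

```python
from datetime import date, timedelta

EXPIRY = date(2025, 8, 24)  # expiry date of the test certificate in the scenario


def in_threshold(today: date, threshold_days: int, expiry: date = EXPIRY) -> bool:
    """True when the certificate has fewer than threshold_days left."""
    return (expiry - today).days < threshold_days


def first_alerting_day(threshold_days: int, expiry: date = EXPIRY) -> date:
    """Earliest date on which the certificate falls inside the threshold."""
    # days_left < threshold  <=>  today > expiry - threshold
    return expiry - timedelta(days=threshold_days) + timedelta(days=1)


if __name__ == "__main__":
    print(first_alerting_day(385))  # 2024-08-05
    print(first_alerting_day(30))   # 2025-07-26
```

Under these assumptions, the scenario is consistent with a test run on any date from 5 August 2024 through 25 July 2025: the certificate is outside the 30-day window but inside the 385-day window.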

Monitoring Dashboard

The certificate expiry metrics can be visualized in CloudWatch dashboards:

{
  "metrics": [
    [ "Base2Monitoring", "DaysToExpiry", { "stat": "Minimum" } ],
    [ "...", { "stat": "Average" } ]
  ],
  "period": 300,
  "stat": "Minimum",
  "region": "ap-southeast-2",
  "title": "Certificate Expiry Status"
}
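To place this widget on a dashboard programmatically, the properties above can be wrapped in a dashboard body. The sketch below only builds the JSON string; the function name, widget position, and size are arbitrary choices for illustration. The resulting string could then be passed as DashboardBody to the CloudWatch put_dashboard call.

```python
import json

# Widget properties copied from the example above.
WIDGET_PROPERTIES = {
    "metrics": [
        ["Base2Monitoring", "DaysToExpiry", {"stat": "Minimum"}],
        ["...", {"stat": "Average"}],
    ],
    "period": 300,
    "stat": "Minimum",
    "region": "ap-southeast-2",
    "title": "Certificate Expiry Status",
}


def dashboard_body(properties: dict, width: int = 12, height: int = 6) -> str:
    """Wrap metric widget properties in a single-widget dashboard body."""
    widget = {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": width,    # arbitrary layout values
        "height": height,
        "properties": properties,
    }
    return json.dumps({"widgets": [widget]})
```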

Conclusion

The certificate monitoring feature adds proactive certificate expiry tracking and alerting to the Account Health monitoring stack. With configurable thresholds and integration with the existing monitoring infrastructure, it helps prevent certificate-related outages.

Teams can now:

This feature is enabled by default in version 0.5.3 of the Account Health stack, and its thresholds remain configurable to suit each organization's requirements.