# Alerts & Token Usage

Stay informed about what’s happening in your clusters with configurable alerts. Plus, track and control your AI token usage.

## Alerts Overview

Alerts tell you when something needs attention. You control:

- What triggers alerts
- How you get notified
- Whether AI should diagnose them

### Alert Types

| Type | What it watches |
|---|---|
| GPU Usage | GPU utilization exceeds threshold |
| Node Not Ready | A node becomes unavailable |
| Pod Crash | Pod enters CrashLoopBackOff |
| Memory Pressure | Memory usage is too high |
| CPU Pressure | CPU usage is too high |
| Disk Pressure | Disk space is running low |
| Custom | Your own conditions |

### Alert Severity

Alerts have three severity levels:

| Severity | Color | Meaning |
|---|---|---|
| Critical | Red | Needs immediate attention |
| Warning | Yellow | Should be addressed soon |
| Info | Blue | Good to know |

## Creating Alert Rules

### Built-in Rules

The console comes with preset rules:
| Rule | Condition | Severity |
|---|---|---|
| GPU Usage Critical | >90% for 5 min | Critical |
| Node Not Ready | Any node NotReady for 1 min | Critical |
| Pod Crash Loop | >5 restarts in 10 min | Warning |
| Memory Pressure | >85% for 5 min | Warning (disabled by default) |

### Custom Rules

Create your own rules:

1. Go to the Alerts dashboard (/alerts)
2. Click “Create Rule”
3. Configure:
   - Name - What to call this rule
   - Condition - What triggers it
   - Threshold - The limit
   - Duration - How long the condition must hold before the alert fires
   - Severity - Critical, Warning, or Info
   - Clusters - Which clusters to watch
   - Namespaces - Which namespaces to watch
4. Enable AI Diagnose for automatic analysis (optional)
5. Save

#### Example: High Memory Alert

```
Name: High Memory Usage
Condition: Memory > 80%
Duration: 10 minutes
Severity: Warning
Clusters: production-*
AI Diagnose: Enabled
```

## Notification Channels

Choose how you want to be notified:

### Browser Notifications

- Pop-up notifications in your browser
- Click a notification to jump to the alert
- Works even when the tab is in the background

How to enable:

1. Go to Settings
2. Allow browser notifications when prompted

### Slack

- Send alerts to a Slack channel
- Include alert details
- Great for team visibility

How to set up:

1. Create a Slack incoming webhook URL
2. Go to Settings > Alert Channels
3. Add the Slack webhook
4. Select which alerts go to Slack
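
Slack incoming webhooks accept a plain JSON POST with a `text` field, so you can sanity-check the URL before adding it to the console. A minimal test using only the Python standard library (the webhook URL below is a placeholder):

```python
import json
import urllib.request

# Placeholder - use the incoming webhook URL Slack generates for your channel.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

# Slack incoming webhooks accept a JSON body with a "text" field.
payload = {"text": "Test alert from the console"}

req = urllib.request.Request(
    WEBHOOK_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # Slack returns 200 with body "ok" on success
```

If the message appears in your channel, the same URL will work in Settings > Alert Channels.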

### Webhooks

- Send alerts to any URL
- Use for custom integrations
- JSON payload with alert details

Webhook payload:

```json
{
  "alert": {
    "name": "GPU Usage Critical",
    "severity": "critical",
    "status": "firing",
    "cluster": "prod-east",
    "message": "GPU usage at 95%"
  },
  "timestamp": "2024-01-15T10:30:00Z"
}
```
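
A custom receiver only needs to accept a JSON POST in this shape. Here is a minimal sketch using Python's standard library (the port and routing logic are arbitrary choices for illustration, not part of the console):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON body the console POSTs.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))

        alert = body["alert"]
        # Route however you like - here we just log firing critical alerts.
        if alert["status"] == "firing" and alert["severity"] == "critical":
            print(f'[{body["timestamp"]}] {alert["cluster"]}: {alert["message"]}')

        # Acknowledge receipt with a 200.
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), AlertWebhookHandler).serve_forever()
```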

## AI Diagnosis for Alerts

When enabled, AI automatically analyzes alerts.

### What AI Provides

- Summary - Quick overview of the issue
- Root Cause - What’s likely causing it
- Suggestions - How to fix it
- Mission Link - Click to start AI troubleshooting

### Example AI Diagnosis

```
Alert: Pod Crash Loop

Summary: The nginx-abc123 pod has restarted 7 times
in the last 10 minutes.

Root Cause: The container is exiting with code 137,
indicating an OOM (Out of Memory) kill.

Suggestions:
1. Increase memory limit from 128Mi to 256Mi
2. Check application for memory leaks
3. Review recent deployments for changes
```
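
The first suggestion maps to a one-line kubectl change, assuming the pod is owned by a Deployment named nginx (the names here are illustrative):

```bash
# Raise the container memory limit on the owning Deployment
kubectl set resources deployment nginx --limits=memory=256Mi

# Watch the pod to confirm the restarts stop
kubectl get pods -l app=nginx --watch
```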

## Managing Alerts

### Alert Statuses

| Status | Meaning |
|---|---|
| Firing | Alert is active |
| Pending | Condition met; waiting out the configured duration |
| Resolved | Issue no longer present |
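
The Pending state is why a brief spike doesn't immediately notify anyone: the condition must hold for the rule's full duration before the alert fires. A simplified sketch of that evaluation logic (illustrative only, not the console's actual implementation):

```python
import time

def evaluate(rule: dict, condition_met: bool, state: dict) -> str:
    """One evaluation tick for a rule. `state` tracks `since` (when the
    condition first became true) and `status`. Illustrative sketch only."""
    now = time.time()
    if not condition_met:
        # Condition cleared: a firing alert auto-resolves.
        state["status"] = "resolved" if state["status"] == "firing" else "inactive"
        state["since"] = None
    elif state["since"] is None:
        # Condition just became true: start the duration clock.
        state["since"] = now
        state["status"] = "pending"
    elif now - state["since"] >= rule["duration_seconds"]:
        # Condition has held for the full duration: fire.
        state["status"] = "firing"
    return state["status"]
```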

### Acknowledging Alerts

Click Acknowledge to:

- Record that you’ve seen it
- Track who acknowledged and when
- Keep a record for future reference

### Clearing Alerts

Alerts auto-resolve when the condition clears. You can also manually close alerts that are no longer relevant.

## Token Usage

AI features use tokens. Track and control your usage.

### Where to See Usage

- Header bar - Quick percentage view
- Settings page - Detailed breakdown

### Understanding Tokens

Tokens are the units of text that AI models read and write (roughly a word or word fragment each):

- Reading your question uses tokens
- AI’s response uses tokens
- Longer conversations use more tokens
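
There is no exact word-to-token conversion, but a common rule of thumb for English text is roughly four characters per token. A back-of-the-envelope estimator (the heuristic is an approximation for budgeting, not how the console meters usage):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic.
    Real tokenizers vary by model; use this only for ballpark budgeting."""
    return max(1, len(text) // 4)

question = "Why is the nginx pod in prod-east crash looping?"
print(estimate_tokens(question))  # ~12 tokens for a one-line question
```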

### Token Limits

Set limits to control costs:

| Setting | What it does |
|---|---|
| Monthly Limit | Maximum tokens per month |
| Warning Threshold | Usage % at which a warning is shown |
| Critical Threshold | Usage % at which features are restricted |

### What Happens at Limits

- At Warning (e.g., 80%) - You see a notification
- At Critical (e.g., 95%) - Some AI features disabled
- At Limit (100%) - AI features pause until next month
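
Behind this is a simple percentage comparison against the configured thresholds. A sketch of the logic, with threshold values mirroring the examples above (illustrative, not the console's code):

```python
def usage_state(used: int, monthly_limit: int,
                warning_pct: float = 80.0, critical_pct: float = 95.0) -> str:
    """Classify current token usage against the configured thresholds."""
    pct = 100.0 * used / monthly_limit
    if pct >= 100.0:
        return "limit"     # AI features pause until next month
    if pct >= critical_pct:
        return "critical"  # some AI features disabled
    if pct >= warning_pct:
        return "warning"   # show a notification
    return "ok"

print(usage_state(850_000, 1_000_000))  # -> "warning" at 85% usage
```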

### Reducing Token Usage

- Use Low AI mode for routine work
- Keep mission conversations focused
- Avoid asking the same question repeatedly
- Use direct kubectl for simple queries (see the examples below)
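
Many routine questions have direct kubectl equivalents that cost zero tokens, for example:

```bash
# "Which pods are not running?"
kubectl get pods -A --field-selector=status.phase!=Running

# "What's using the most CPU right now?" (requires metrics-server)
kubectl top pods -A --sort-by=cpu
```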

## Settings Page

The Settings page shows:
- Current AI mode
- Token usage this month
- Alert notification preferences
- Connected channels (Slack, webhooks)

## Best Practices

### For Alerts

- Start with defaults - Built-in rules are a good starting point
- Don’t over-alert - Too many alerts lead to alert fatigue
- Use severity wisely - Reserve Critical for real emergencies
- Enable AI Diagnose - Let AI help you understand issues

### For Token Usage

- Match mode to task - High mode for troubleshooting, Low for monitoring
- Review usage weekly - Stay ahead of limits
- Set reasonable limits - Balance cost vs. usefulness

### For Notifications

- Use Slack for teams - Everyone sees important alerts
- Keep your browser open - For personal notifications
- Don’t over-notify - Send Critical alerts to Slack and the rest to the browser