Agent 02 · Reliability Agent · for Databricks
Stop lakehouse drift before it bites.
Lakehouses don't break; they decay. The agent maintains health continuously, so you can stop firefighting.
What the agent maintains
Six drift patterns. Continuously.
Small-files explosion
OPTIMIZE staleness
Missing partitioning
Recurring failed queries
Expensive query patterns
Cluster configuration drift
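As a rough illustration of how the first pattern might be spotted: Delta Lake's DESCRIBE DETAIL reports numFiles and sizeInBytes for a table, and a low average file size is a common small-files signal. The sketch below assumes those two numbers have already been fetched. The function name and the 32 MiB threshold are hypothetical choices for illustration, not the agent's actual logic.

```python
def has_small_files_problem(
    num_files: int,
    size_in_bytes: int,
    min_avg_file_bytes: int = 32 * 1024 * 1024,  # assumed threshold, tune per workload
) -> bool:
    """Return True when the table's average data file is below the threshold."""
    if num_files <= 0:
        # An empty table has no files to compact.
        return False
    return size_in_bytes / num_files < min_avg_file_bytes
```

A table holding 10 GiB across 10,000 files averages about 1 MiB per file and would be flagged. The same 10 GiB in 10 files would not.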
Conversational investigation
When something breaks, just ask the agent.
› why is the nightly_billing_etl job failing?
Investigating...
3 of the last 5 runs failed at task load_invoice_lines with OutOfMemoryError.
Root cause: input partition skew increased 4× last week. The customers table grew, but the partition strategy didn't adapt.
Recommended fix: re-partition by customer_region.
Estimated savings: $4,080/yr.
Want me to draft the change?
Same MCP tool registry. Same audit trail. Just conversational.
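The "partition skew" in the exchange above can be made concrete with a simple ratio. This is a minimal sketch under one assumed definition of skew: the largest partition divided by the median partition. The function name and the sample sizes are hypothetical, and the agent may measure skew differently.

```python
from statistics import median
from typing import Iterable


def skew_ratio(partition_sizes: Iterable[float]) -> float:
    """Largest partition size divided by the median partition size."""
    sizes = list(partition_sizes)
    if not sizes:
        raise ValueError("need at least one partition size")
    mid = median(sizes)
    if mid == 0:
        # Avoid dividing by zero when most partitions are empty.
        return float("inf") if max(sizes) > 0 else 1.0
    return max(sizes) / mid


# Hypothetical partition sizes (in MB) before and after the table grew.
before = skew_ratio([100, 100, 100, 100])
after = skew_ratio([100, 100, 100, 400])
```

With these made-up sizes the ratio goes from 1.0 to 4.0, which is the kind of change that can push a single task past its memory limit while the others finish comfortably.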
Health, on autopilot.
Tell the agent which tables you care about. It maintains them. You stop being on call.