Omnitrace

What Databricks system tables tell you about cost, and what they still do not

System tables are the foundation for Databricks observability, but cost optimization still needs context, ownership, and action.

The Omnitrace team - - 8 min read

Databricks system tables are one of the most important building blocks for lakehouse operations. They centralize operational data for billing, compute, query history, jobs, lineage, access, and other account-level signals. For FinOps and platform teams, that is a major step forward.

System tables help answer questions that used to require brittle exports, scattered APIs, or manual investigation. They can show usage by product, SKU, workspace, custom tags, identities, jobs, materialized views, streaming tables, and query activity. They can also support dashboards and alerts for spend monitoring.
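As a rough illustration of that kind of slicing, the Python sketch below rolls up billing usage by a custom tag, converts DBUs to list-price dollars, and flags owners who are over a budget. It assumes the rows have already been read from `system.billing.usage` using a few of its columns (`workspace_id`, `sku_name`, `usage_quantity`, `custom_tags`). The sample rows, SKU names, per-DBU prices, the `team` tag key, and the budgets are all hypothetical. Real list prices come from `system.billing.list_prices` and change over time.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows shaped like a few columns of system.billing.usage.
USAGE_ROWS = [
    {"workspace_id": "ws-1", "sku_name": "JOBS_COMPUTE",
     "usage_quantity": Decimal("120"), "custom_tags": {"team": "growth"}},
    {"workspace_id": "ws-1", "sku_name": "SQL_PRO",
     "usage_quantity": Decimal("40"), "custom_tags": {"team": "growth"}},
    {"workspace_id": "ws-2", "sku_name": "JOBS_COMPUTE",
     "usage_quantity": Decimal("60"), "custom_tags": {}},  # untagged: no owner yet
]

# Hypothetical per-DBU prices. Real prices come from system.billing.list_prices and
# must be matched on the usage time window and cloud, not just on the SKU.
PRICE_PER_DBU = {"JOBS_COMPUTE": Decimal("0.15"), "SQL_PRO": Decimal("0.55")}


def cost_by_tag(rows, prices, tag_key="team"):
    """Sum list-price cost per value of `tag_key`; untagged rows go to '(untagged)'."""
    totals = defaultdict(Decimal)
    for row in rows:
        owner = row["custom_tags"].get(tag_key, "(untagged)")
        totals[owner] += row["usage_quantity"] * prices[row["sku_name"]]
    return dict(totals)


def over_budget(totals, budgets):
    """Return owners whose spend exceeds their budget; owners without a budget never alert."""
    return sorted(o for o, spent in totals.items() if o in budgets and spent > budgets[o])


totals = cost_by_tag(USAGE_ROWS, PRICE_PER_DBU)
print(totals)
print(over_budget(totals, {"growth": Decimal("30")}))
```

The sketch keeps the untagged bucket visible instead of dropping it. That way, the report shows how much spend still has no owner.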

But system tables are not the whole cost-optimization workflow. They are the evidence layer.

What system tables are good at

System tables are strong when the question is, "What happened?" The billing usage table records billable usage together with metadata that identifies the resource or object that incurred it. The query history table records queries run on SQL warehouses and serverless compute, where that table is available. Compute tables track cluster configuration and lifecycle changes over time. Databricks' job-cost examples show how to attribute spend to Lakeflow Jobs and Pipelines, each within a specific billing scope.
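Attributing spend to jobs follows a similar pattern. The Python sketch below groups usage rows by the `job_id` and `job_run_id` fields of the `usage_metadata` struct in `system.billing.usage` and then reports cost per run. It assumes each row has already been priced into a hypothetical `cost_usd` field, and the sample rows are made up. It also ignores the billing-scope details that a production query would need to handle.

```python
from collections import defaultdict

# Hypothetical usage rows. In system.billing.usage the job identifiers live in the
# usage_metadata struct; cost_usd is an assumed, already-priced field.
USAGE = [
    {"usage_metadata": {"job_id": "101", "job_run_id": "r1"}, "cost_usd": 12.0},
    {"usage_metadata": {"job_id": "101", "job_run_id": "r2"}, "cost_usd": 8.0},
    {"usage_metadata": {"job_id": "202", "job_run_id": "r3"}, "cost_usd": 30.0},
    {"usage_metadata": {"job_id": None, "job_run_id": None}, "cost_usd": 5.0},  # not job usage
]


def job_cost_summary(rows):
    """Return {job_id: (total_cost, run_count, cost_per_run)} for rows tied to a job."""
    cost = defaultdict(float)
    runs = defaultdict(set)
    for row in rows:
        meta = row["usage_metadata"]
        if meta.get("job_id") is None:
            continue  # skip usage that cannot be attributed to a job
        cost[meta["job_id"]] += row["cost_usd"]
        runs[meta["job_id"]].add(meta["job_run_id"])
    return {job: (cost[job], len(runs[job]), cost[job] / len(runs[job])) for job in cost}


for job_id, (total, n_runs, per_run) in job_cost_summary(USAGE).items():
    print(job_id, total, n_runs, per_run)
```

When a job's schedule changes, cost per run is a steadier trend line than total cost. A job that starts running twice as often should cost roughly twice as much even if nothing is wrong with it.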

This is exactly the right foundation for historical observability. A cost program needs facts before it can assign ownership or propose changes.

Where the workflow gets harder

The gap appears when the question changes from "What happened?" to "What should we do now?"

A production team usually needs to connect multiple layers:

  • Billing usage and product features
  • Query history and SQL warehouse behavior
  • Compute configuration and policy drift
  • Job runtime, failures, retries, and ownership
  • Cloud infrastructure cost outside the Databricks bill
  • Tags, teams, service principals, and approval workflow

That correlation work is where many organizations spend the most effort. Cost data may say a workload is expensive. Query history may show latency. Compute metadata may show configuration drift. Tags may be incomplete. The owner may be a service principal. The fix may require approval from someone who does not live in the dashboard.
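The Python sketch below shows the shape of that correlation step. It merges separate signals (cost, query latency, configuration drift, ownership) into one finding per expensive workload. It assumes the hardest part, mapping raw rows from different system tables onto a shared workload id via `warehouse_id`, `job_id`, or `cluster_id`, has already been done. The thresholds, workload ids, and the set of known service principals are hypothetical. This is not a description of how Omnitrace implements correlation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkloadFinding:
    workload_id: str
    monthly_cost_usd: float
    owner: Optional[str] = None
    issues: list = field(default_factory=list)


def correlate(cost, p95_latency_s, drift, owners, service_principals,
              cost_threshold_usd=1000.0, latency_threshold_s=60.0):
    """Merge per-workload signals from separate sources into one finding per costly workload.

    cost, p95_latency_s, drift, and owners are dicts keyed by a shared workload id;
    service_principals is a set of identities that cannot approve changes themselves.
    """
    findings = []
    for wid, monthly_cost in cost.items():
        if monthly_cost < cost_threshold_usd:
            continue  # below the threshold: not worth a finding yet
        finding = WorkloadFinding(wid, monthly_cost, owner=owners.get(wid))
        if p95_latency_s.get(wid, 0.0) > latency_threshold_s:
            finding.issues.append("slow queries")
        finding.issues.extend(f"config drift: {d}" for d in drift.get(wid, []))
        if finding.owner is None:
            finding.issues.append("missing owner tag")
        elif finding.owner in service_principals:
            finding.issues.append("owned by a service principal; needs a human approver")
        findings.append(finding)
    # Most expensive first, so review time goes where the money is.
    return sorted(findings, key=lambda f: f.monthly_cost_usd, reverse=True)


# Hypothetical inputs, one dict per evidence layer.
COST = {"wh-analytics": 4200.0, "job-etl": 1800.0, "job-small": 90.0}
LATENCY = {"wh-analytics": 95.0}
DRIFT = {"job-etl": ["auto-termination disabled"]}
OWNERS = {"wh-analytics": "analyst@example.com", "job-etl": "sp-etl-prod"}
SERVICE_PRINCIPALS = {"sp-etl-prod"}

for f in correlate(COST, LATENCY, DRIFT, OWNERS, SERVICE_PRINCIPALS):
    print(f.workload_id, f.monthly_cost_usd, f.owner, f.issues)
```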

System table data also has a security boundary

Databricks notes that system table information can expose sensitive operational data if it is mishandled. That does not mean teams should avoid system tables. It means the product architecture around them matters.

A good operations system should request only the tables and permissions it actually needs, avoid reading customer table contents and query results, and, when required, keep its reasoning inside the boundary the customer chooses. Metadata alone is powerful enough for cost and reliability work, but it still deserves enterprise-grade handling.
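One simple way to enforce a small scope is to project every metadata record onto an explicit allowlist before it leaves the customer's boundary. The Python sketch below does that for a query-history record. The column names are assumed to follow `system.query.history` and should be checked against the actual schema. The allowlist itself, and the sample record, are hypothetical. Statement text is dropped because it can embed literal values taken from customer data.

```python
# Hypothetical allowlist of query-history fields needed for cost and reliability work.
# Column names are assumed to follow system.query.history; check them against your schema.
ALLOWED_FIELDS = frozenset({
    "statement_id", "executed_by", "execution_status",
    "total_duration_ms", "read_bytes", "compute",
})


def minimize(record, allowed=ALLOWED_FIELDS):
    """Project a record onto allowlisted fields; everything else is dropped.

    An allowlist fails closed: a column added to the source table later stays
    excluded until someone decides it is needed, unlike a denylist of known-bad fields.
    """
    return {key: value for key, value in record.items() if key in allowed}


row = {
    "statement_id": "01ef-example",
    "executed_by": "analyst@example.com",
    "execution_status": "FINISHED",
    "total_duration_ms": 84000,
    "read_bytes": 5_000_000_000,
    "compute": {"warehouse_id": "wh-123"},
    "statement_text": "SELECT * FROM sales WHERE customer = 'ACME'",  # may contain literals
}
print(minimize(row))
```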

What Omnitrace adds above the table layer

Omnitrace treats system tables and APIs as evidence sources, not as the final user experience. The agent correlates operational metadata, identifies repeated patterns, estimates impact, routes findings, applies approved fixes, and verifies the result.
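The "identify repeated patterns, estimate impact" steps can be sketched as grouping findings by pattern and ranking the patterns that recur across several workloads by their estimated monthly cost. In the Python sketch below, the pattern names, the per-finding `daily_waste_usd` figures, and the flat 30-day month are all hypothetical inputs. System tables do not report "waste" directly, and this linear estimate is only a stand-in for whatever model a real agent would use.

```python
from collections import defaultdict

# Hypothetical findings; daily_waste_usd is an assumed estimate, not a system-table column.
FINDINGS = [
    {"workload": "job-a", "pattern": "idle all-purpose cluster", "daily_waste_usd": 14.0},
    {"workload": "job-b", "pattern": "idle all-purpose cluster", "daily_waste_usd": 6.0},
    {"workload": "wh-1", "pattern": "oversized warehouse", "daily_waste_usd": 40.0},
    {"workload": "job-c", "pattern": "retry storm", "daily_waste_usd": 3.0},
    {"workload": "job-d", "pattern": "retry storm", "daily_waste_usd": 2.5},
]


def rank_patterns(findings, min_workloads=2, days_per_month=30):
    """Group findings by pattern; keep patterns seen on several workloads, ranked by impact.

    Returns (pattern, workload_count, estimated_monthly_usd) tuples, largest impact first.
    """
    workloads = defaultdict(set)
    daily_waste = defaultdict(float)
    for f in findings:
        workloads[f["pattern"]].add(f["workload"])
        daily_waste[f["pattern"]] += f["daily_waste_usd"]
    ranked = [
        (pattern, len(workloads[pattern]), daily_waste[pattern] * days_per_month)
        for pattern in workloads
        if len(workloads[pattern]) >= min_workloads
    ]
    return sorted(ranked, key=lambda item: item[2], reverse=True)


for pattern, count, monthly in rank_patterns(FINDINGS):
    print(f"{pattern}: {count} workloads, ~${monthly:,.0f}/month")
```

Requiring a pattern to appear on more than one workload favors fixes that can be templated once and reused, such as a cluster-policy change, over one-off tuning.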

That does not remove the need for Databricks-native observability. It makes that observability actionable. Instead of asking platform engineers to inspect a dashboard and manually assemble a ticket, the agent packages the finding with owner context, cost evidence, a recommended strategy, an autonomy level, and verification criteria.
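A packaged finding of the kind described above can be modeled as a small record that carries its own approval rule and verification check. The Python sketch below is a hypothetical data model, not Omnitrace's actual schema. The three autonomy levels, the field names, and the "observed cost fell by at least a fixed fraction" criterion are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    RECOMMEND_ONLY = "recommend_only"            # a human makes the change
    APPLY_WITH_APPROVAL = "apply_with_approval"  # agent applies after the owner approves
    AUTO_APPLY = "auto_apply"                    # agent applies within pre-agreed guardrails


@dataclass(frozen=True)
class PackagedFinding:
    workload_id: str
    owner: str                      # who receives the finding and approves it
    evidence: str                   # pointer to the cost evidence, e.g. a saved query
    baseline_daily_cost_usd: float
    strategy: str                   # recommended change
    autonomy: Autonomy
    min_savings_fraction: float     # verification criterion agreed before the change

    def may_apply(self, owner_approved: bool) -> bool:
        """Whether the agent itself may make the change at this autonomy level."""
        if self.autonomy is Autonomy.RECOMMEND_ONLY:
            return False
        if self.autonomy is Autonomy.APPLY_WITH_APPROVAL:
            return owner_approved
        return True

    def verify(self, observed_daily_cost_usd: float) -> bool:
        """Pass only if observed cost fell by at least the agreed fraction of baseline."""
        saved = self.baseline_daily_cost_usd - observed_daily_cost_usd
        return saved >= self.min_savings_fraction * self.baseline_daily_cost_usd


finding = PackagedFinding(
    workload_id="job-etl",
    owner="data-platform@example.com",
    evidence="billing usage for job-etl, last 30 days",
    baseline_daily_cost_usd=40.0,
    strategy="enable auto-termination",
    autonomy=Autonomy.APPLY_WITH_APPROVAL,
    min_savings_fraction=0.25,
)
print(finding.may_apply(owner_approved=False), finding.verify(28.0))
```

The record fixes the verification criterion before anything changes, so the success bar cannot quietly move afterward. Comparing a single day is noisy. A real check would more likely compare cost windows of equal length before and after the change.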

The result is a different operating model: system tables provide the facts; the agent closes the loop.

