Performance as a Declarative Policy: Implementing Intent-Based Scaling and Circuit-Breaking with OPA

When your autoscaling logic is scattered across Helm charts, custom controllers, and application code, changing a single threshold means touching multiple repos, redeploying services, and hoping nothing breaks. Circuit-breaking rules suffer the same fate: hardcoded timeouts and retry budgets that require a full release cycle to adjust. This guide presents an alternative: treat performance and resilience rules as declarative policies evaluated by Open Policy Agent (OPA). You define what you want (intent), not how to achieve it, and OPA becomes the decision engine for scaling and circuit-breaking.

This approach is not for greenfield prototypes. It suits teams already running OPA for authorization or compliance who want to extend its reach into operational decisions. If you have experienced SREs, a mature observability stack, and a willingness to write Rego, the payoff is faster iteration and a single source of truth for resilience rules.

Why Intent-Based Scaling and Circuit-Breaking Fail Without a Policy Engine

Most teams start with simple threshold-based autoscaling: CPU at 80% triggers a scale-up. That works until traffic patterns become bursty, or a dependency slows down and causes cascading scale events. Circuit-breaking is equally brittle: a hardcoded 500ms timeout might work in one region but cause false positives in another with higher latency. The core problem is that scaling and circuit-breaking decisions depend on context—current load, error rates, time of day, deployment version—and embedding that context into code or config maps creates tight coupling.

Without a policy engine, changing a rule requires a code change, a build, a deployment, and a rollback plan. Teams often end up with multiple copies of the same logic: one in the HPA config, one in the service mesh, one in the application itself. When a rule needs updating, some instances get missed, leading to inconsistent behavior. OPA centralizes policy evaluation: you write the rule once in Rego, and every component that needs to make a scaling or circuit-breaking decision queries OPA. This separation of concerns means the policy can change without touching any running service.

Another failure mode is the lack of composability. A simple rule like 'scale up if error rate > 5%' might conflict with 'scale down if CPU < 30%'. Without a policy engine, these rules are evaluated independently, often causing oscillation. OPA allows you to write policies that consider multiple signals simultaneously and produce a single, coherent decision. You can express intent like 'prefer availability over cost during business hours' and let OPA resolve the trade-offs.
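As a rough illustration of that kind of intent, the sketch below folds the two conflicting rules into one decision. The package name, the action values, and the definition of business hours (weekdays, 09:00 to 18:00 UTC) are assumptions made for the example. It reads the same input fields as the scaling input shown later in this guide.

```rego
package scaling_intent

import rego.v1

# Assumption: business hours are weekdays, 09:00-17:59 UTC.
business_hours if {
    ns := time.parse_rfc3339_ns(input.time)
    day := time.weekday(ns)
    not day in {"Saturday", "Sunday"}
    [hour, _, _] := time.clock(ns)
    hour >= 9
    hour < 18
}

pressure if input.metrics.error_rate > 0.05

idle if input.metrics.cpu_avg < 0.30

default action := "hold"

# Availability first: error pressure always wins over an idle CPU.
action := "scale_up" if pressure

# Cost second: shrink only when idle, healthy, and outside business hours.
action := "scale_down" if {
    idle
    not pressure
    not business_hours
}
```

Because the scale-down rule requires the absence of pressure, the two rules can never both fire, which is what removes the oscillation between independently evaluated thresholds.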

Finally, auditing and compliance become easier. Every decision OPA makes can be logged with the full input context. When an incident occurs, you can replay the exact inputs and see which policy fired. This is nearly impossible when rules are scattered across codebases.

Prerequisites: What You Need Before Writing Rego for Performance

Before diving into Rego policies, ensure your infrastructure meets a few prerequisites. First, you need a reliable data source for the signals your policies will evaluate. OPA is a decision engine, not an observability system. You must feed it metrics like request latency, error rates, CPU utilization, and queue depths. Typically, this data comes from Prometheus, Datadog, or a custom metrics API. OPA can pull data at evaluation time with its built-in http.send function, have it loaded ahead of time into its data document, or receive it as part of the input document when a service queries it.
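If you choose the pull model, a lookup could look roughly like the sketch below. The Prometheus address and the recording-rule name are hypothetical, and the response handling assumes the standard Prometheus instant-query format. Treat it as a starting point rather than a recommended integration.

```rego
package metrics_lookup

import rego.v1

# Hypothetical Prometheus address and recording-rule name.
prometheus_query_url := "http://prometheus.monitoring.svc:9090/api/v1/query"

error_rate_query := "service:http_error_rate:ratio_1m"

# The response is cached for 15 seconds so OPA does not call Prometheus
# on every evaluation. error_rate stays undefined if the call fails or
# returns no samples.
error_rate := rate if {
    response := http.send({
        "method": "GET",
        "url": sprintf("%s?query=%s", [prometheus_query_url, urlquery.encode(error_rate_query)]),
        "timeout": "500ms",
        "raise_error": false,
        "force_cache": true,
        "force_cache_duration_seconds": 15,
    })
    response.status_code == 200

    # Prometheus encodes each instant-vector sample as [timestamp, "value"].
    rate := to_number(response.body.data.result[0].value[1])
}
```

Passing metrics in the input document keeps evaluation free of network calls, so the pull model is usually worth it only when the caller cannot supply the data itself.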

Second, you need a way to trigger policy evaluation at the right moments. For autoscaling, you might integrate OPA with the Kubernetes Horizontal Pod Autoscaler (HPA) through a custom metrics adapter that queries OPA and publishes the decision as a metric. The HPA consumes metrics rather than replica counts, so the desired replica count has to be exposed in a form the HPA can act on. For circuit-breaking, a proxy such as Envoy, on its own or as the data plane of a mesh like Istio, can call OPA through its external authorization (ext_authz) filter before routing traffic. Alternatively, you can embed the OPA Go SDK into your application for lower latency, but that reintroduces some coupling.

Third, your team must be comfortable writing Rego. Rego is a declarative query language with a learning curve. If your team has no experience with logic programming or policy-as-code, start with simple rules and iterate. Invest in unit tests for your policies using OPA's built-in test framework. Without tests, policy changes become risky.

Fourth, consider the performance of OPA itself. OPA can evaluate policies in sub-millisecond time for simple rules, but complex policies with many data lookups may take longer. If your circuit-breaking decision needs to happen in under 10 milliseconds, you may need to pre-load data or use partial evaluation. Benchmark your policies with realistic input sizes before relying on them in production.

Finally, have a rollback plan. OPA policies can introduce bugs just like code. Use version control for your policy bundles, and deploy them through a CI/CD pipeline. If a policy causes incorrect scaling or unnecessary circuit breaks, you need to revert quickly. OPA supports bundle signing, and because it polls the bundle endpoint for updates, rolling back amounts to republishing the previous bundle revision.

Core Workflow: Writing Rego Policies for Scaling and Circuit-Breaking

The workflow consists of three phases: define the input schema, write the policy, and integrate with the decision point. Let's walk through each with concrete examples.

Define the Input Schema

Every OPA policy receives an input document that contains the context for the decision. For scaling, the input might include the current replica count, average CPU, memory, request latency, and error rate. For circuit-breaking, the input might include the destination service, the recent error rate, the current request rate, and the circuit state (closed, open, half-open). Agree on this schema with your team before writing policies. Use JSON Schema or OpenAPI to document it.

Example input for a scaling decision:

{
  "namespace": "production",
  "deployment": "checkout",
  "current_replicas": 3,
  "metrics": {
    "cpu_avg": 0.75,
    "memory_avg": 0.60,
    "p99_latency_ms": 1200,
    "error_rate": 0.02
  },
  "time": "2025-03-15T14:30:00Z"
}

Write the Policy

In Rego, you write rules that evaluate to a decision. For scaling, the decision could be the desired number of replicas. For circuit-breaking, the decision could be 'allow', 'deny', or 'open_circuit'. Here is a simple scaling policy that scales up if latency exceeds 1000ms or error rate exceeds 5%, but never beyond 10 replicas:

package scaling

import rego.v1

max_replicas := 10

# Scale up if p99 latency is high
high_latency if input.metrics.p99_latency_ms > 1000

# Scale up if error rate is high
high_errors if input.metrics.error_rate > 0.05

# Either signal adds one replica; otherwise the step stays 0.
# (A default value must be a constant, so it cannot be input.current_replicas.)
default step := 0

step := 1 if high_latency

step := 1 if high_errors

# Cap at max_replicas
desired_replicas := min([input.current_replicas + step, max_replicas])

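For circuit-breaking, you might write a policy that denies traffic to a service once its error rate reaches 10%, unless the request is from an admin. The measurement window (for example, the last minute) is determined by whoever computes input.metrics.error_rate:

package circuit_breaker

import rego.v1

# Fail closed: a missing error_rate leaves allow at its default.
default allow := false

# Allow while the destination's error rate is under 10%
allow if input.metrics.error_rate < 0.10

# Admin requests bypass the breaker
allow if input.request.metadata.role == "admin"

# Report why the request was refused
deny contains reason if {
    not allow
    reason := "error_rate_too_high"
}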
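The input schema described earlier also carries a circuit state, and the decision can be richer than allow or deny. The following sketch shows one way to return 'allow', 'deny', or 'open_circuit'. The field names input.circuit_state and input.probe, and the idea that the caller marks probe requests, are assumptions for illustration.

```rego
package circuit_breaker_states

import rego.v1

# An open circuit, or any unrecognized state, falls through to deny.
default decision := "deny"

# Closed and healthy: traffic flows.
decision := "allow" if {
    input.circuit_state == "closed"
    input.metrics.error_rate < 0.10
}

# Closed but unhealthy: tell the caller to open the circuit.
decision := "open_circuit" if {
    input.circuit_state == "closed"
    input.metrics.error_rate >= 0.10
}

# Half-open: only requests the caller has marked as probes get through.
decision := "allow" if {
    input.circuit_state == "half_open"
    input.probe == true
}
```

The rule bodies are mutually exclusive, so the policy always yields exactly one decision. The state itself still lives in the caller, since a Rego evaluation cannot store anything between requests.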

Integrate with the Decision Point

For Kubernetes HPA, you can use a custom metrics adapter that queries OPA for the desired replica count. The adapter periodically sends the current metrics to OPA and publishes the returned replica count as a metric for the HPA to act on. The HPA itself only reads metrics, so if you want the count applied directly, a small controller has to update the Deployment's replicas instead. For circuit-breaking with Istio, you can use OPA as an external authorizer via the Envoy ext_authz filter. When a request arrives, Envoy calls OPA, which returns allow or deny based on the circuit state. Both integrations require careful latency monitoring to avoid adding too much overhead.

Tools and Setup: Running OPA for Performance Decisions

OPA can be deployed as a sidecar, a standalone server, or embedded via its Go SDK. For autoscaling, a standalone OPA server with a bundle endpoint works well. You deploy OPA as a Deployment in Kubernetes, configure it to pull policy bundles from an HTTP endpoint (such as an S3 bucket populated by a pipeline that builds bundles from your Git repository), and expose its REST API. The custom metrics adapter then sends a POST request to OPA's /v1/data/<package path> endpoint with the input and reads the decision from the result field of the response.

For circuit-breaking, the sidecar pattern is common. Each service instance runs an OPA sidecar that caches policies and evaluates requests locally. This reduces latency because the decision happens in the same pod, over localhost. The sidecar reloads policies without restarting the service: run OPA in server mode with bundle polling configured, and it activates new bundle revisions as they are published, giving you zero-downtime updates.

Another option is to use OPA's partial evaluation feature. If your policies depend on data that changes slowly (like a list of degraded services), you can pre-compute partial results and only evaluate the dynamic parts at request time. This can cut evaluation time substantially for policies dominated by static lookups, though the gain depends on the policy, so measure it. Partial evaluation also adds complexity and is only worth it if you have high throughput (thousands of requests per second).

Monitoring OPA itself is critical. Scrape the Prometheus metrics OPA serves on its /metrics endpoint, enable decision logs, and alert on high evaluation latency or policy errors. Use the built-in print() function during development to debug policies, but remove the calls before production to avoid log spam.

Variations for Different Constraints: When to Adjust the Approach

The basic workflow above works for many teams, but your constraints may require variations. Here are three common scenarios and how to adapt.

Low-Latency Requirements (Sub-5ms Decisions)

If your circuit-breaking decision must complete in under 5 milliseconds, a remote OPA call is likely too slow. Embed the OPA Go SDK directly into your application. This eliminates network round-trips and serialization overhead. You still get the benefits of declarative policies, but you must manage the policy lifecycle within your application. Use the rego package from OPA's Go module to prepare (compile) policies at startup and evaluate them per request with the input. Be mindful of the footprint: linking OPA increases binary size, and large policies or data documents increase memory usage.

No Service Mesh (Bare Kubernetes or VMs)

If you are not using a service mesh, you can still implement circuit-breaking with OPA by integrating at the application level. Each service makes an HTTP call to OPA before sending a request to a downstream service. This adds latency but works without mesh infrastructure. To reduce overhead, batch decisions or use local caching with a short TTL. For autoscaling, the HPA custom metrics adapter works regardless of mesh presence.

Multi-Region Deployments with Different Constraints

In a multi-region setup, you may want different scaling rules for each region. For example, the US region might tolerate higher latency before scaling than the EU region due to different user expectations. OPA supports this by including region information in the input. You can write policies that branch on input.region. Alternatively, deploy separate OPA instances per region with region-specific policy bundles. The latter avoids policy complexity but requires more infrastructure.
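A minimal sketch of the single-bundle variant follows. The region keys and threshold values are invented for illustration, and input.region is assumed to be added to the input by the caller.

```rego
package scaling_regions

import rego.v1

# Illustrative per-region latency thresholds in milliseconds.
latency_threshold_ms := {"us": 1200, "eu": 800}

# Unknown or missing regions fall back to this value.
default threshold_ms := 1000

threshold_ms := regional if regional := latency_threshold_ms[input.region]

high_latency if input.metrics.p99_latency_ms > threshold_ms
```

Keeping the thresholds in a lookup table rather than in per-region rules means adding a region is a data change, not a logic change.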

Pitfalls and Debugging: What to Check When the Policy Doesn't Work

Even with careful design, policies can misbehave. Here are common pitfalls and how to debug them.

Policy Not Firing as Expected

If your scaling policy never triggers, check the input data. OPA silently returns the default value if no rule matches, or an undefined result if the rule has no default. Use OPA's tracing to see which rules evaluated and why: run opa eval with --explain=full (or --explain=fails to show only failing expressions) on a sample input to see the evaluation tree. Also verify that the input schema matches what your integration sends. A missing field makes every expression that references it undefined, so the rule quietly does not fire.
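One way to surface missing fields instead of silently falling back to a default is to have the policy report them. The sketch below assumes the metric names from the scaling input example. The package and rule names are hypothetical.

```rego
package scaling_input

import rego.v1

required_metrics := {"cpu_avg", "memory_avg", "p99_latency_ms", "error_rate"}

# True for any defined value, including 0 and false.
has_metric(name) if _ = input.metrics[name]

missing_metrics contains name if {
    some name in required_metrics
    not has_metric(name)
}

default valid := false

valid if count(missing_metrics) == 0
```

An integration can query missing_metrics alongside the decision and log or alert when it is non-empty, which tends to make schema drift visible much earlier.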

Oscillation in Autoscaling

If the replica count bounces up and down, your policy might be too sensitive to short-term spikes. Add a stabilization window: only scale up if the condition persists for N consecutive evaluations. Rego evaluation has no side effects, so a policy cannot remember its own previous decisions. The caller has to supply recent history, either in the input or by pushing it into OPA's data document through the REST API, or you can handle stabilization in the HPA adapter. Another approach is hysteresis: scale up at 80% CPU but only scale down below 60%.
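Both ideas can be combined in a few lines. In this sketch the caller is assumed to send its most recent CPU samples in input.recent_cpu_avg (a hypothetical field), and the window of three samples is arbitrary.

```rego
package scaling_hysteresis

import rego.v1

scale_up_cpu := 0.80

scale_down_cpu := 0.60

# Number of consecutive samples that must agree before acting.
window := 3

sustained_high if {
    count(input.recent_cpu_avg) >= window
    every sample in input.recent_cpu_avg {
        sample >= scale_up_cpu
    }
}

sustained_low if {
    count(input.recent_cpu_avg) >= window
    every sample in input.recent_cpu_avg {
        sample < scale_down_cpu
    }
}

default action := "hold"

action := "scale_up" if sustained_high

action := "scale_down" if sustained_low
```

Samples between 60% and 80% fall into the dead band and produce "hold", as does any mixed or too-short history.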

Circuit-Breaking False Positives

If the circuit opens too often, your error rate threshold might be too low, or your measurement window too short. Consider using a sliding window of at least 30 seconds to avoid reacting to transient blips. Also check that the input metrics are accurate: if your error rate calculation includes health check failures, those might inflate the rate. Log every circuit-breaking decision with the input context to audit false positives.

OPA Performance Bottleneck

If OPA becomes a bottleneck, profile your policies. Use OPA's built-in profiling (opa eval --profile) to see which rules consume the most time. Common culprits are rules that iterate over large arrays or make many HTTP calls to external data sources. Consider pre-loading data into OPA's in-memory data document or using partial evaluation. If all else fails, scale OPA horizontally behind a load balancer.

FAQ: Common Questions About Declarative Performance Policies

Q: Can I use OPA for both scaling and circuit-breaking with the same instance? Yes. You can deploy a single OPA server that exposes multiple policy packages. The scaling adapter queries /v1/data/scaling, and the circuit-breaking authorizer queries /v1/data/circuit_breaker. Each package is evaluated only against the input sent to its own path, so the two input schemas do not need to match.

Q: How do I handle policy rollback? Use versioned bundles. When you push a new bundle, OPA automatically loads it. If something goes wrong, you can push the previous bundle version. Keep at least two previous bundles available on your bundle server. Also, before deploying, replay historical inputs (for example, from decision logs) through opa eval or opa test in development to see how the new policy would have decided.

Q: What about cost? OPA is open-source, but running it adds compute overhead. For most teams, the overhead is small. A single OPA instance can typically handle thousands of simple decisions per second, though you should benchmark with your own policies. The cost of running OPA is usually far less than the cost of incidents caused by misconfigured scaling or circuit-breaking. If you are cost-sensitive, embed the Go SDK to avoid running a separate server.

Q: Can I write policies that depend on external data (e.g., a list of critical services)? Yes. Rules can reference anything under OPA's data document. You can ship that data inside bundles, push it with the REST API (PUT /v1/data/<path>), or fetch it at evaluation time with http.send. For example, you could maintain a list of 'degraded services' in a ConfigMap and have a sidecar such as kube-mgmt load it into OPA, or publish it as part of a bundle that OPA polls.
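For instance, a policy consuming such a list might look like the sketch below. It assumes the list is loaded at data.degraded_services and that the caller names the target service in input.destination. Both names are hypothetical.

```rego
package circuit_breaker_data

import rego.v1

# data.degraded_services is assumed to be an array or set of service names.
degraded if input.destination in data.degraded_services

default allow := false

allow if not degraded
```

Note that this version fails open: if the list has not been loaded, degraded is undefined and every request is allowed. Whether that is the right trade-off depends on how much you trust the data pipeline.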

Q: How do I test policies? Use OPA's test framework. Write unit tests in Rego that define input and expected output. Run them with 'opa test'. Integrate these tests into your CI pipeline. Also, consider using OPA's 'eval' command with sample inputs to manually verify behavior before deploying.
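A test file could look roughly like this. To keep the example self-contained, the rule under test is defined in the same file; in practice the tests would usually live in a separate _test.rego file next to the policy. The package name is made up.

```rego
package circuit_breaker_example

import rego.v1

# Rule under test, repeated here so the file runs on its own.
default allow := false

allow if input.metrics.error_rate < 0.10

test_allows_healthy_service if {
    allow with input as {"metrics": {"error_rate": 0.02}}
}

test_denies_at_threshold if {
    not allow with input as {"metrics": {"error_rate": 0.10}}
}

test_denies_when_metric_missing if {
    not allow with input as {}
}
```

Running opa test against the file executes every rule whose name starts with test_. The third case pins down the fail-closed behavior for incomplete input, which is easy to break by accident.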

Next Steps: From Pilot to Production

Start small. Pick one service and one decision type—either autoscaling or circuit-breaking—and implement it with OPA in a staging environment. Run it for a week alongside your existing logic, comparing decisions. This gives you confidence in the policies and the integration.

Once the pilot succeeds, expand to more services. Create a policy library with reusable Rego modules. For example, define common metric thresholds in a shared module that all scaling policies import. This reduces duplication and makes policy changes consistent.
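A shared module along those lines might start as small as the sketch below. The package path and the threshold values are placeholders.

```rego
package shared.thresholds

import rego.v1

max_p99_latency_ms := 1000

max_error_rate := 0.05

high_latency(metrics) if metrics.p99_latency_ms > max_p99_latency_ms

high_errors(metrics) if metrics.error_rate > max_error_rate

# In a scaling policy kept in a separate file:
#
#   import data.shared.thresholds
#
#   scale_up if thresholds.high_latency(input.metrics)
#   scale_up if thresholds.high_errors(input.metrics)
```

Exposing the checks as functions rather than reading input directly lets each importing policy decide which part of its own input to pass in.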

Invest in monitoring and alerting for OPA itself. Set up dashboards for decision latency, error rates, and policy version. Alert on any decision that deviates from expected patterns. Without observability, you are flying blind.

Finally, document your policies in plain language alongside the Rego code. Use comments and a README to explain the intent behind each rule. This helps new team members understand why a policy exists and when to change it. Treat policies as code—review them in pull requests, test them, and version them.

Declarative performance policies with OPA are not a silver bullet. They add complexity and require a cultural shift toward policy-as-code. But for teams that already value infrastructure-as-code and want to decouple operational decisions from application logic, this approach provides a powerful way to manage resilience at scale.
