Mar 30, 2026

Building a Reliable AI Infrastructure Layer

Pranav Pusarla

Rox runs thousands of different agents over its various products. That means millions of model calls per day that need to execute reliably and at scale all day long. At that volume, infrastructure matters.

We rely on a combination of standard techniques to keep things running smoothly: rate limiting, fallbacks, dynamic routing. However, as things scale, you begin to realize that standard techniques break under the pressure. That's where the fun begins: breaking it down to the basics and innovating on the fly.

Here are a few of our stories.

Rate Limiting

Every platform that uses AI agents depends on multiple model providers: OpenAI, Anthropic, and open-source models, to name a few. With that usage come rate limits in two common forms: RPM (requests per minute) and TPM (tokens per minute). What providers don't always tell you is that once you hit those limits, they quietly start throttling your service, creating unpredictability at exactly the wrong moments.

So we built our own internal rate limiting service. The goal was simple: take back control and protect the reliability of our product. We used Redis via the pyrate-limiter package, wrapped with a retry layer and jitter to handle traffic spikes gracefully.
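To make the "retry layer and jitter" idea concrete, here is a minimal sketch of exponential backoff with full jitter. It is not our production wrapper: the `RateLimited` exception, the `call_with_retry` helper, and the default delays are all hypothetical names and values chosen for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the limiter rejects a call."""


def call_with_retry(fn, attempts=5, base=0.5, cap=8.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on RateLimited with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random fraction of the capped exponential delay,
            # so callers that were rejected together do not retry together.
            sleep(rng() * min(cap, base * 2 ** attempt))
```

The random delay is the important part: without it, every caller rejected during a spike retries at the same instant and recreates the spike.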

We shipped it to production. The next morning, we had a SEV-0.

Redis memory usage had gone through the roof. Core parts of our product were coming to a halt. After a few hours of digging, we found the culprit: pyrate-limiter uses a Leaky Bucket algorithm that stores every request as an individual timestamp. For a single request carrying 20,000 tokens, the package was making 20,000 separate Redis calls: one per token.

Ex. pseudocode from package

-- One sorted-set write per token: a 20,000-token request runs this loop 20,000 times
for i = 1, TOKENS do
    redis.call('ZADD', bucket, now, i)
end

You can imagine our reaction. We were routing millions of tokens through this thing.

We went back to basics, researched industry approaches, and landed on the rolling window algorithm: a method that groups all requests within a time window into a single bucket, storing usage as a simple key-value pair. We wrote the bucket implementation from scratch using our own custom Lua scripts and patched it into the package. The Redis call overhead dropped dramatically. It is an approximation compared with the full precision of Leaky Bucket, but you can increase the number of buckets to get comparable accuracy. For our use case, it was the perfect fit.

Figure 1. Rolling Window Bucket Algorithm
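The bucketed approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Lua script we run: an in-memory dictionary stands in for Redis, and the class name, parameters, and injectable clock are hypothetical.

```python
import time


class BucketedWindowLimiter:
    """Approximate rolling-window limiter that keeps one counter per time bucket."""

    def __init__(self, limit, window_seconds=60.0, buckets=6, clock=time.monotonic):
        self.limit = limit                      # e.g. a tokens-per-minute budget
        self.buckets = buckets                  # more buckets -> finer approximation
        self.bucket_len = window_seconds / buckets
        self.clock = clock                      # injectable so the sketch can be tested
        self.counts = {}                        # bucket index -> usage in that bucket

    def try_acquire(self, cost):
        """Record `cost` units if the window has room; return whether it was accepted."""
        current = int(self.clock() // self.bucket_len)
        oldest_live = current - self.buckets + 1
        # Expire buckets that have slid out of the window.
        self.counts = {i: c for i, c in self.counts.items() if i >= oldest_live}
        if sum(self.counts.values()) + cost > self.limit:
            return False
        # One counter update per request, regardless of how many tokens it carries.
        self.counts[current] = self.counts.get(current, 0) + cost
        return True
```

A 20,000-token request here costs a single counter update rather than 20,000 writes. The trade-off is granularity: usage expires one whole bucket at a time, which is why adding buckets tightens the approximation.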

We shipped the fix. It's holding up well.

Search Provider Reliability

One of our most popular features is Clever Columns, which lets users generate answers to any question across a list of companies/accounts. Under the hood, it works by scraping public URLs through search providers and extracting the information needed to answer each query.

But what happens when a search provider goes down? We built a fallback system with multiple layers of redundancy. What we didn't anticipate were the problems that came with juggling multiple external services.
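At its simplest, a fallback chain tries each provider in priority order and moves on when one fails. The sketch below shows only that core loop; the function and exception names are hypothetical, and a real system would likely add timeouts, health checks, and more selective error handling.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def search_with_fallback(
    query: str,
    providers: Sequence[tuple[str, Callable[[str], list]]],
) -> tuple[str, list]:
    """Try each (name, search_fn) pair in order; return the first successful result."""
    errors = {}
    for name, search in providers:
        try:
            return name, search(query)
        except Exception as exc:  # a real system would catch narrower error types
            errors[name] = exc    # remember why this provider was skipped
    raise AllProvidersFailed(errors)
```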

External providers offer SDKs to make integration easy and secure, and for a while that convenience was great. Then we started noticing something strange: bizarre lag spikes under traffic, potential memory leaks, and unusually high CPU usage. After days of investigation, we found the source.

One of our primary search providers was opening a brand new HTTP connection for every single request. It was baked right into their SDK. Every new connection means a full TCP and TLS handshake, expensive in both latency and compute. The fix was connection pooling: we patched their internal parameters on our end to use our own pooling client. It worked, until the same issue surfaced with another search provider.

Ex. code that caused bug

# Keep-alive disabled: no idle connections are retained, so each request opens a new connection
limits = httpx.Limits(max_keepalive_connections=0)

That kicked off one of the longest internal debates we've ever had. One side wanted to ditch the SDK approach entirely. Patching each new provider's internals wasn't scalable. It was better to own the logic ourselves and talk directly to the source of truth: the API. The other side was worried about the maintenance burden. If we bypass the SDK, we have to track every upstream API change ourselves.

We chose to trust our engineers’ capabilities and build from the ground up. We scrapped the SDK, built our own connection pool from scratch, and created lightweight provider clients that talk directly to each API.

Figure 2. HTTPX connection pooling architecture

A week later, it was in production, ready before the weekend. Within days of that, we were able to scale our search providers by 3x without worrying about external provider behavior.

Reliability as a Service

There's a phrase that gets repeated a lot in our office: "Think from first principles." It sounds simple, but it becomes essential when you're building the layer that holds everything else up.

Whenever we start a new infrastructure problem, we ask ourselves three questions:

Why does this need to exist?
How will this affect the user's experience?
How do we build this to scale from the very beginning?

In a space moving as fast as AI, what makes any platform truly usable isn't just the intelligence of the models but the reliability underneath them. The two problems we described here are just a small part of the engineering framework we're building every day.
