
Rox Outreach: Equipping Agents to Write Sales Emails

December 23, 2025

When we first started building Outreach at Rox, the goal sounded deceptively simple: help reps write better sales emails, faster. But as the models improved and our platform grew richer, we realized the hard part was no longer the model’s ability to write. It was everything around it.

This post is about a few lessons I learned while modernizing Rox Outreach’s email generation, specifically around UX and agent design. The core theme is straightforward. With frontier models, success is less about raw intelligence and more about what you choose to put into context, when you do it, and how much control you give the user along the way.

What is Rox Outreach?

Rox Outreach lets users configure multi-step campaigns across email and non-email channels. Each campaign is grounded in rich context. We automatically inject information about the contact, their company, and the user’s own organization. Users can layer in background research via clever columns, attach organization documents directly, and add explicit model instructions to steer tone or strategy.
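
Roughly speaking, the per-contact context looks something like the structure below. This is only an illustration; the field names are not our actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class CampaignContext:
    contact: dict                                        # CRM fields for the prospect
    company: dict                                        # firmographics for the prospect's company
    seller_org: dict                                     # the user's own organization
    research: dict = field(default_factory=dict)         # outputs of clever columns
    documents: list[str] = field(default_factory=list)   # attached organization documents
    user_instructions: str = ""                          # explicit steering on tone or strategy
```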

The promise is that agents do not start from a blank page. They start informed. But as we learned, more context does not mean better output.

Platform lessons: keeping context alive

Earlier versions of Outreach followed a one-and-done workflow. Users defined the campaign upfront, specified the number of steps, wrote instructions for the AI, and generated emails. That was it. No iteration, no edits. If something felt off, the only option was to start over.

This design mirrored an older mental model of AI systems: you prompt once, you accept the output. But when humans write, they make multiple iterative passes over their drafts until they’re satisfied. We should not expect AI-assisted writing to be any different.

The UX mismatch was obvious in hindsight, but it quickly became our top priority to resolve as we modernized Outreach. Campaigns needed to be living objects. Users should be able to adjust instructions, fix mistakes, or update assumptions without blowing everything away. That forced us to think carefully about how context is stored, recomposed, and refreshed as the campaign evolves.

When I say “context” here, I mean not only the user inputs described above, but also the actual context of a given outreach sequence: a given email should be generated with the history of previously sent emails in context. After all, if a user’s campaign opens with a template they have written, followed by agent-generated emails, and they later edit that template with new messaging, the agent emails should react accordingly.
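
As a rough sketch of what that threading looks like (the names are illustrative, not our actual data model), each step’s prompt carries the bodies of the steps before it:

```python
from dataclasses import dataclass

@dataclass
class Step:
    index: int
    body: str
    is_template: bool   # True if the user wrote this step themselves

def build_sequence_history(prior_steps: list[Step]) -> str:
    """Thread earlier emails in the campaign into the context for the next step."""
    blocks = []
    for step in prior_steps:
        label = "user-written template" if step.is_template else "agent-generated"
        blocks.append(f"--- Step {step.index} ({label}) ---\n{step.body}")
    return "\n\n".join(blocks)
```

Because this history is recomposed at generation time rather than captured once, an edit to the opening template changes what every later step sees.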

The hard problem here was reconciling this with our platform’s task framework. Since each campaign may target hundreds of contacts, email generation is orchestrated as async background tasks. When a campaign is created or edited, it launches uncancellable tasks with a static set of instructions to populate emails per contact. If the user kept editing their campaign with tasks in flight (behavior we formerly blocked), the content generated by those tasks would be stale by the time it landed.

The solution we devised was to apply versioning to campaigns. We continuously launch tasks as campaigns are created or edited; with each edit, we bump the version on the campaign configuration in our database and mark the edited step and any subsequent AI steps as “regenerating”. Tasks in flight “listen” for newer versions throughout their lifecycle and drop out of execution without any DB writes when a new version is detected, under the assumption that the new edit will trigger its own generation task. The final outcome is that campaign edits are idempotent and free of race conditions, with only the most recent set of instructions “winning”.
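
In simplified form, the pattern looks roughly like this. The database helpers and draft function are hypothetical stand-ins, not our actual task APIs.

```python
def generate_step_emails(db, draft_fn, campaign_id: str, task_version: int, contact_ids: list[str]) -> None:
    """One background task: generate emails for a batch of contacts at a fixed campaign version."""
    for contact_id in contact_ids:
        draft = draft_fn(campaign_id, contact_id)        # expensive model call
        # "Listen" for newer versions: if an edit bumped the campaign version
        # while we were drafting, drop out without writing anything.
        if db.get_campaign_version(campaign_id) != task_version:
            return
        db.save_email(campaign_id, contact_id, draft, version=task_version)
```

The newer edit’s own task regenerates whatever this one skipped, so the most recent instructions always win.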

Keeping context up to date is not flashy. If we do our job right, the user simply has their expectations met. But at the end of the day, a system that cannot gracefully absorb change will always feel brittle.

Agent lessons: separation of concerns

On the agent side, our biggest realization was that we were asking one agent to do too much at once.

Campaigns can include a lot of inputs: CRM fields, company firmographics, clever columns, uploaded documents, and freeform instructions. Feeding all of that directly into a single prompt overwhelmed the model, and the output suffered for it.

Our response was to split the agent into two phases.

The first agent focuses on research. It looks at the prospect and their company and produces high-level bullet points. These are not polished sentences. They are ideas worth mentioning, angles to hit, and signals that matter in the prospect’s world.

The second agent, which we call the style agent, takes those bullets and drafts the actual email. It applies best practices in email structure (personalized intros/triggers, value props, calls to action), balances tone, and importantly, decides what to leave in versus out.
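
Conceptually, the two phases chain together something like this. The `llm.complete` interface and the prompt wording are illustrative, not our production prompts.

```python
def generate_email(llm, research_inputs: str, user_instructions: str) -> str:
    # Phase 1: the research agent distills the raw inputs into loose bullet points.
    bullets = llm.complete(
        system="List the angles, signals, and ideas worth mentioning for this "
               "prospect. Loose bullet points only, no prose.",
        prompt=research_inputs,   # CRM fields, firmographics, clever columns, documents
    )
    # Phase 2: the style agent drafts the email from the bullets, deciding what to leave out.
    return llm.complete(
        system="Draft a cold email with a personalized intro or trigger, a value prop, "
               "and a call to action. Use only the bullets that keep it sounding natural.",
        prompt=f"Research bullets:\n{bullets}\n\nUser instructions:\n{user_instructions}",
    )
```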

This split also changed how we think about summaries. In agent development, we often complain about lossiness when we manage context length. Document summaries throw away detail and remove nuance. Usually that’s a bug, but here, it became a feature.

The prior outreach agent would often latch onto hyper-specific facts from raw documents and force them into emails. The result sounded unnatural, like a research report pretending to be a cold email. By introducing a deliberately lossy intermediate representation, we gave the style agent room to sound human. The email did not need to say everything. It needed to say the right things.

Prompting, revisited

There is a saying in data science: if you torture the data, eventually it will confess. We found the same applies to prompting. Our old Outreach agent was guilty of this. The prompt was long, prescriptive, and opinionated. It told the model exactly how emails should sound.

The result was predictable. Emails worked, but they all sounded “samey”. Worse, the agent often ignored explicit user instructions because the prompt itself was shouting louder.

This time around, we treated prompt design as a UX problem. The default behavior had to be good enough for a first-time user with no instructions at all. But it also had to be steerable for power users who wanted to experiment.

That meant designing our prompts so that the only things set in stone are the best practices on cold outreach we gathered from discussions with our sales team, customers, and articles online. More prescriptive guidance on things like style stays overridable by user prompts, so emails look good out of the box but are visibly responsive to instructions.
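
As a toy illustration of that layering (the specific rules here are placeholders, not our production prompt), the non-negotiable guidance is always present, while style defaults step aside when the user supplies their own instructions:

```python
BEST_PRACTICES = (
    "Open with a trigger specific to the prospect. "
    "Make one clear call to action."
)
STYLE_DEFAULTS = "Default tone: concise and conversational."

def compose_system_prompt(user_instructions: str) -> str:
    parts = [BEST_PRACTICES]
    if user_instructions.strip():
        # Explicit user instructions override the style defaults entirely.
        parts.append("Follow the user's instructions below, even where they "
                     "conflict with the default style.")
        parts.append(user_instructions)
    else:
        parts.append(STYLE_DEFAULTS)
    return "\n\n".join(parts)
```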

Conclusion

The common thread across all of this is that model capability is no longer the bottleneck. Frontier models can write, reason, and adapt tone exceedingly well. The challenge is choosing what context earns a place in front of them and how that context evolves over time.

Good systems require strong source data, clear signals, and deep domain knowledge. But above all, they require respect for the user. The user is king. If the system is not steerable, dynamic, and responsive, then it does not matter how impressive the agent is.
