How We Are Building Rox’s System of Record: For Contact Data

November 29, 2025

Modern revenue systems consume data from a wide range of independent sources: CRMs, enrichment vendors, public web crawlers, marketing automation tools, CSV uploads, and manual rep entries. Each source often contributes partial, conflicting, or differently structured representations of the same real-world data points. Traditional CRMs attempt to serve as a system of record over this mess, but their assumptions oversimplify what the data actually looks like. They treat a person as a row, with a single authoritative email, name, and company association. Yet almost no source delivers data that cleanly.

At Rox, we take a different approach. Instead of forcing real-world identity into a row-level structure, we treat identity as computed from context. This led us to develop a System of Context, the data architecture underlying how Rox Agents unify, normalize, and reconcile public and private signals into a consistent, canonical model of people. Today, we will dive into how Rox’s system produces a true system of record for contact data, built on principles that reflect how identity actually behaves in distributed, multi-source environments.

Identity as an Emergent Property, Not a Row

A single executive (John Smith) can appear across a company’s data ecosystem in forms like:

The CRM identifies him as john.smith@acme.com
A data vendor (like ZoomInfo) identifies him as jsmith@acme.com
A rep uploads a data point with john.smith+marketing@acme.com
LinkedIn lists his public profile as linkedin.com/in/johnsmith-acme

These identifiers are not interchangeable, nor are they authoritative on their own. They are observations of the same underlying real-world entity, and any of them may be incomplete, time-shifted, or contradictory. Database-level uniqueness constraints often collapse these variations prematurely, forcing decisions without context. CRMs end up storing whichever version was seen last, whichever source happened to write it, or whichever field a rep edited manually, and drift accumulates over time.

Rox’s view is that identity is not a stored object; it is a conclusion drawn from context. To answer “Who is this person?” the system must evaluate identifiers, provenance, timing, and trustworthiness simultaneously. This shifts the design from CRUD operations on rows to a context-rich computation.

The System of Context

Rox’s System of Context provides the semantic and structural foundation for building an authoritative people graph. It captures not only the what (the data) but the how, when, and from whom (the context). Every signal entering Rox (an email, job title, profile URL, domain, or enrichment payload) is ingested with metadata describing:

source type (CRM, CSV, vendor, public web, user input)
timestamp of observation
organizational scope
trust level associated with the source
identifier semantics (primary, alias, historical, inferred)

Input signals are then normalized into consistent formats to produce derived matching keys, while raw inputs are preserved. This contextual information is retained indefinitely and is not collapsed into a single row.
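
As an illustration only, a signal carrying this metadata could be modeled as an immutable record with a separately derived matching key. This is a minimal sketch, not Rox's actual schema: the names `Observation`, `SourceType`, `IdentifierRole`, and `email_match_key` are hypothetical, the 0-to-1 trust scale is assumed, and the normalization rules (trim, lowercase, drop a "+tag" suffix) are one plausible choice rather than the rules Rox applies.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class SourceType(Enum):
    CRM = "crm"
    CSV = "csv"
    VENDOR = "vendor"
    PUBLIC_WEB = "public_web"
    USER_INPUT = "user_input"


class IdentifierRole(Enum):
    PRIMARY = "primary"
    ALIAS = "alias"
    HISTORICAL = "historical"
    INFERRED = "inferred"


@dataclass(frozen=True)
class Observation:
    """One signal plus its context (hypothetical shape)."""
    raw_value: str              # preserved exactly as received
    source_type: SourceType
    observed_at: datetime       # timestamp of observation
    org_scope: str              # hypothetical organization identifier
    trust: float                # assumed scale: 0.0 (noisy) to 1.0 (high fidelity)
    role: IdentifierRole


def email_match_key(raw: str) -> str:
    """Derive a matching key from an email without altering the raw input.

    Assumed rules: trim whitespace, lowercase, drop a "+tag" suffix
    from the local part.
    """
    cleaned = raw.strip().lower()
    local, sep, domain = cleaned.partition("@")
    if not sep:
        # Not an email-shaped value; fall back to the cleaned string.
        return cleaned
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


obs = Observation(
    raw_value="John.Smith+marketing@Acme.com",
    source_type=SourceType.CSV,
    observed_at=datetime(2025, 11, 1, tzinfo=timezone.utc),
    org_scope="org-acme",
    trust=0.4,
    role=IdentifierRole.ALIAS,
)
match_key = email_match_key(obs.raw_value)
```

Here the rep-uploaded variant and the CRM address would share a matching key, while `obs.raw_value` still holds the original string for audit purposes.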

Provenance-driven conflict resolution

Not all sources carry equal authority. Rox enforces a trust hierarchy that prioritizes high-fidelity sources over noisy ones. When multiple sources assert conflicting emails, titles, or company associations, the canonical value is chosen based on trust and recency, but all assertions remain stored in the graph. The canonical view is a projection, not a destructive update.
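
To make the idea of a non-destructive projection concrete, here is a small sketch. It assumes trust is compared first and recency only breaks ties; the post does not specify how the two are combined, so treat that ordering, the integer trust ranks, and the names `Assertion` and `canonical_value` as illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Assertion:
    """A single source's claim about one field (hypothetical shape)."""
    field: str
    value: str
    source: str
    trust: int              # assumed rank; higher means more trusted
    observed_at: datetime


def canonical_value(assertions, field):
    """Pick the winning assertion for a field by (trust, recency).

    The input list is only read, never modified, so every competing
    assertion remains available after the canonical value is chosen.
    """
    candidates = [a for a in assertions if a.field == field]
    if not candidates:
        return None
    return max(candidates, key=lambda a: (a.trust, a.observed_at))


# Hypothetical trust ranks: CRM (3) > vendor (2) > CSV upload (1).
assertions = [
    Assertion("email", "john.smith@acme.com", "crm", 3, datetime(2025, 3, 1)),
    Assertion("email", "jsmith@acme.com", "vendor", 2, datetime(2025, 9, 1)),
    Assertion("email", "john.smith+marketing@acme.com", "csv", 1, datetime(2025, 10, 1)),
]
winner = canonical_value(assertions, "email")
```

In this example the CRM assertion wins even though it is the oldest, because its assumed trust rank is higher. The vendor and CSV assertions are still in the list and would be reconsidered if the CRM value were later withdrawn.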

Entity graph construction

All identifiers link to person nodes via typed edges: primary email, alias email, historical employment, public profile URL, and so on. As new identifiers appear, the system computes whether they belong to existing entities or warrant new ones. This graph-based representation supports multi-identifier resolution, conflict isolation, and non-destructive recomputation as data evolves.

This is the heart of the System of Context. It is what turns a fragmented set of external feeds into a coherent internal model.
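
A toy version of such a graph can be sketched as follows. The matching rule used here (reuse an existing person whenever any incoming identifier is already known) is a deliberate simplification for illustration; the system described above also weighs provenance, timing, and trust. `PersonGraph`, `EdgeType`, and the `person-N` id format are hypothetical names.

```python
from enum import Enum


class EdgeType(Enum):
    PRIMARY_EMAIL = "primary_email"
    ALIAS_EMAIL = "alias_email"
    HISTORICAL_EMPLOYMENT = "historical_employment"
    PUBLIC_PROFILE_URL = "public_profile_url"


class PersonGraph:
    """Person nodes connected to identifiers by typed edges (toy model)."""

    def __init__(self):
        self._edges = {}     # person_id -> set of (EdgeType, identifier)
        self._owner = {}     # identifier -> person_id
        self._next_id = 1

    def resolve_or_create(self, identifiers):
        """Attach (EdgeType, value) pairs to an existing or new person.

        Simplified rule (an assumption): the first identifier that is
        already known decides which person the whole batch belongs to.
        Batches that span two existing people are not handled here.
        """
        for _, value in identifiers:
            if value in self._owner:
                person_id = self._owner[value]
                break
        else:
            person_id = f"person-{self._next_id}"
            self._next_id += 1
        edges = self._edges.setdefault(person_id, set())
        for edge_type, value in identifiers:
            edges.add((edge_type, value))
            self._owner.setdefault(value, person_id)
        return person_id

    def identifiers_of(self, person_id, edge_type=None):
        """Return a person's identifiers, optionally filtered by edge type."""
        edges = self._edges.get(person_id, set())
        return sorted(v for t, v in edges if edge_type is None or t == edge_type)


graph = PersonGraph()
first = graph.resolve_or_create([
    (EdgeType.PRIMARY_EMAIL, "john.smith@acme.com"),
    (EdgeType.PUBLIC_PROFILE_URL, "linkedin.com/in/johnsmith-acme"),
])
second = graph.resolve_or_create([
    (EdgeType.ALIAS_EMAIL, "jsmith@acme.com"),
    (EdgeType.PUBLIC_PROFILE_URL, "linkedin.com/in/johnsmith-acme"),
])
```

Both calls return the same person id because the shared profile URL links the two batches, and the vendor email is recorded as an alias edge rather than overwriting the primary email.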

Direct Resolution

We’ve built a Contact Resolution Engine where Rox projects its System of Context into a usable, canonical contact for both the agentic and product layers. A key design choice is using direct resolution, not chained merges. Many legacy dedupe systems merge records sequentially, forming long pointer chains that make lookups slow and rollback nearly impossible. Rox avoids chain structures entirely. Each variant attaches directly to a single canonical person:

Variant A ↘

Variant B → Canonical Person

Variant C ↗

This guarantees:

deterministic lookups
constant-time resolution paths
no merge-order dependency
instant reversibility
stable performance under high ingest velocity

The resolution engine uses normalized identifiers, trust scores, timestamps, and relational evidence (such as company or role linkage) to attach or detach variants. When upstream data changes or when conflicting information arrives, the system recalculates the canonical projection without rewriting history.
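
The difference from chained merges can be shown with a flat mapping in which every variant points straight at its canonical person. This is a sketch under assumptions, not the engine itself: `DirectResolver` and its methods are hypothetical, and the `merge` helper is added only to show how two canonical people could be combined without ever creating a canonical-to-canonical pointer.

```python
class DirectResolver:
    """Flat variant -> canonical mapping; every lookup is a single hop."""

    def __init__(self):
        self._canonical_of = {}   # variant_id -> canonical_id

    def attach(self, variant_id, canonical_id):
        """Point a variant directly at a canonical person."""
        self._canonical_of[variant_id] = canonical_id

    def detach(self, variant_id):
        """Undo an attachment without touching any other variant."""
        self._canonical_of.pop(variant_id, None)

    def resolve(self, variant_id):
        """One dictionary lookup, regardless of how many merges occurred."""
        return self._canonical_of.get(variant_id)

    def merge(self, from_canonical, into_canonical):
        """Re-point each affected variant directly at the surviving person.

        No canonical -> canonical pointer is stored, so chains never form.
        This sketch scans all variants; a reverse index would avoid that.
        """
        for variant_id, canonical_id in list(self._canonical_of.items()):
            if canonical_id == from_canonical:
                self._canonical_of[variant_id] = into_canonical


resolver = DirectResolver()
resolver.attach("variant-a", "person-1")
resolver.attach("variant-b", "person-1")
resolver.attach("variant-c", "person-2")
resolver.merge("person-2", "person-1")   # variant-c now points at person-1
resolver.detach("variant-b")             # reversal affects only variant-b
```

Because the mapping is flat, the result does not depend on the order in which variants were attached, and detaching one variant leaves the others exactly where they were.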

The Service Layer

Once the underlying graph is consistent, Rox generates a canonical contact for operational use. This record surfaces:

a high-trust primary email
verified identity attributes
company association with historical context
role metadata sourced from the most reliable provider
normalized identifiers for matching
a stable, globally unique person identifier

Importantly, this “system of record” is not a table in the traditional sense. It is a projection computed from the System of Context so it is always consistent with the full set of observed data. If a rep corrects a title, or if a vendor updates a domain, or if a public crawl detects a job change, the canonical view recalculates deterministically.

This dynamic projection is what makes Rox’s contact layer behave more like a living model than a static database.

Operational Impact

This architecture delivers practical advantages beyond data cleanliness:

High ingest throughput: multiple teams and tools can add data concurrently without creating duplicates.
Real-world tolerance: partial identifiers, outdated emails, and conflicting sources do not destabilize the system.
Performance: the majority of resolutions complete in under one second due to precomputed normalization and direct mappings.
Immutability and auditability: every value has a timestamp, source label, and trust tag.
Self-healing: changes in trusted sources propagate automatically, avoiding manual corrections.

For sales teams, this means fewer duplicate sequences, fewer bounced emails, and a more reliable CRM. For operations teams, it means a stable identity layer that supports enrichment, attribution, segmentation, routing, and analytics without constant cleanup.

Closing

Rox’s Contact Resolution Engine is the visible tip of a larger architectural system designed to compute identity rather than assume it. The underlying System of Context, with its ingestion logic, provenance model, trust hierarchy, and entity graph, is what makes an accurate System of Record possible.

By treating identity as contextual and dynamic, Rox produces contacts that are stable, auditable, and reflective of real-world people as they evolve. This is the difference between a CRM that stores data and a platform that understands it.
