Research Note

The future of AI runs on a context graph

June 30, 2026

Everyone is suddenly talking about context graphs

In late 2025, Jaya Gupta and Ashu Garg of Foundation Capital published an essay calling context graphs “AI’s trillion-dollar opportunity.” Their argument was sharp and largely right. The last era of enterprise software got rich by becoming the system of record for what happened: Salesforce, Workday, SAP. The next one will be won by capturing why things happened: the decisions, exceptions, and reasoning that today live in Slack threads, deal-desk calls, and people’s heads. They called this “a living record of decision traces stitched across entities and time so precedent becomes searchable.” Make that machine-readable and an AI agent can finally act with judgment, not just retrieve facts.

The next system of record is the reasoning, not the record.

The essay landed, and within weeks the term was everywhere. Every other infrastructure startup had a take, a thread, a diagram, a library.

And that is exactly where it got frustrating. A compelling vision is not a working system. When we went looking for something real to build on (a way to actually construct one of these graphs over a live organization’s data), we found mostly noise. Plenty of libraries. Plenty of architecture diagrams. Very little that survived contact with a real company. We took the available approaches, pointed them at an actual client’s data, and watched them produce graphs that were noisy, duplicated, and quietly wrong.

So we built our own. This note is what we learned in the process: three principles that, as far as we can tell, separate a context graph that demos well from one that holds up in front of a client.

First, what we even mean by the term, because half the noise comes from people meaning different things by it.

What a context graph actually is

Strip away the hype and a context graph is simple to describe. It’s a map of the durable things in your world (the people, companies, projects, and products) and how they connect. It’s kept current from the systems you already use. And it’s something you can ask questions of: what’s open on this project, across every tool? Who owns it? Where does it stand?

The Foundation Capital framing emphasizes the glamorous end of this: the why, the decision traces, the reasoning behind a call. That end is real, and it’s the prize. But you cannot capture why a decision was made on a project until you can reliably capture which project it was, who was involved, and what state it was in, across a dozen tools that each name things differently. That unglamorous foundation is the part nobody writes essays about, and it is precisely the part that breaks.

Where context actually lives

Context isn’t one thing in one place. In any organization it’s spread across a spectrum, running from the most unstructured to the most structured. Knowing the shape of that spectrum is half the problem.

At the messy end:

Then you cross into the structured world, where a surprising amount of the load-bearing context actually lives:

Here’s the part that’s easy to miss. As you move from the messy end to the structured end, a huge share of what matters turns out to be sitting on the structured side already, captured cleanly by the tools people use every day. It isn’t lost. It’s just scattered.

Context lives in six places, from messiest to most structured, and today they're islands.

And underneath all six there’s a single, elegant taxonomy waiting to be assembled. A project sits at the center. People work on it. It contributes to a product. It’s related to companies or clients. And it moves certain metrics. That shape is the same whether you read it off a Jira board, a CRM record, a meeting transcript, or a quarterly dashboard. The organization already speaks this language; it just speaks it in six different rooms at once.

A caveat that makes this exact rather than merely tidy: the entity at the center isn’t always a project. A law firm organizes around matters, a hospital around patients or cases, an agency around campaigns. What generalizes is the pattern (one core entity, a small set of durable relationships around it, and the fact that your tools already capture all of it), not the specific shape. Find your center; the rest of the taxonomy hangs off it the same way.
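One way to picture this shape is a small schema with one core entity type and a handful of durable relationship types around it. The sketch below is illustrative only: the names `Entity`, `Relation`, and `RELATIONS`, and the relationship labels themselves, are hypothetical and simply restate the taxonomy described above.

```python
from dataclasses import dataclass

# Hypothetical relationship vocabulary around a core "project" entity.
# Swap "project" for "matter", "patient", or "campaign" if the center differs.
RELATIONS = {
    "works_on": ("person", "project"),
    "contributes_to": ("project", "product"),
    "related_to": ("project", "company"),
    "moves": ("project", "metric"),
}


@dataclass(frozen=True)
class Entity:
    kind: str
    name: str


@dataclass(frozen=True)
class Relation:
    source: Entity
    label: str
    target: Entity

    def __post_init__(self):
        # Reject edges that do not fit the taxonomy.
        expected = RELATIONS.get(self.label)
        if expected != (self.source.kind, self.target.kind):
            raise ValueError(
                f"{self.label!r} cannot link {self.source.kind} to {self.target.kind}"
            )


lighthouse = Entity("project", "Lighthouse")
edge = Relation(Entity("person", "Ana"), "works_on", lighthouse)
```

Changing the center of the taxonomy means editing the `RELATIONS` table; the rest of the shape stays the same.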

Underneath all six sources, the same simple shape. The org already speaks this language.

The trouble is that no tool brings the six rooms together. Each is an island. The project tool doesn’t know what was promised on the call; the transcript doesn’t know the ticket’s status; the CRM knows neither. And while a language model can read any one of these rooms, the volume you would have to sift through to assemble a single concrete answer across all six (“here is where this project really stands, who’s on it, and what it’s moving”) is enormous.

That assembly is the real job of a context graph. Not “can a model read text.” It can. The job is to unify the six islands along the taxonomy they already share, without drowning in the volume. All of this structure already exists; the tragedy is that nobody brings it together. And the most common attempt to do so is exactly what goes wrong.

The graph breaks because of how almost everyone tries to build it. Here is the obvious way: point a large language model at everything (every email, every meeting, every document) and ask it to read all of it, find the entities, draw the relationships, and keep the whole thing current.

It sounds right. It is the approach most of the field has taken. And in practice it produces a graph that is noisy, brittle, and quietly wrong, for reasons that have nothing to do with how good the model is. To see why, picture the kind of worker this approach actually builds.

The intern who re-derives the company every morning

Imagine you hired a brilliant new analyst, and on their first day you gave them no org chart, no project list, no access to your systems: only the raw firehose. Every email the company has ever sent. Every meeting recording. Every chat log.

To be useful, this analyst has to reconstruct the organization from scratch. They read everything and infer that there’s a project called “Lighthouse.” They keep reading and decide there’s also a “pricing review,” not realizing it’s the same project under a different name. They see “WhatsApp integration” mentioned forty times and create forty slightly different things. By lunch they have a sprawling, duplicated, half-hallucinated picture of the company.

Re-deriving the whole company from raw text, every day, and re-making the same mistakes.

Now make it worse: ask them to do this again every morning, because things change. Each day they re-read the firehose and rebuild the picture, and each day the duplicates and mistakes come back in new shapes.

That is, more or less, what an extract-everything context graph does. The model is brilliant. The method is the problem. We are asking it to re-derive (from raw conversation) knowledge the organization already had, cleanly, in a structured form, the whole time.

The better worker is not a faster reader. It’s a seasoned chief of staff who already knows the org chart cold, pulls structure straight from the systems that hold it, and spends their attention only on the genuinely ambiguous parts, and on what changed.

Everything below is how we build that chief of staff instead of that intern. It comes down to three principles, and the three are really one idea:

Respect the structure that already exists, instead of asking a general-purpose model to invent it back.

That structure lives in three places: your tools, the nature of time, and your people. Each principle reuses the structure in one of them.

The prevailing approach, stated fairly

Before the principles, a fair word on the state of the art, because we are not the first to build a context graph and we don’t want to pretend otherwise.

The leading open approach (Graphiti, from the team behind Zep, is the best-known example and a genuinely good piece of work) treats the graph as something a language model extracts and maintains over time. The model reads each new piece of text, pulls out entities and relationships, and writes them into the graph. To handle change, it stamps each relationship with the dates it was true between, so the graph carries its own history.

This is a reasonable design, and for a stream of unstructured conversation it is close to the best you can do. Our argument is not that it is wrong. Our argument is that an organization is not a stream of unstructured conversation. It already has structure (in three forms), and a design that leans on that structure beats a design that re-derives it. We learned this the hard way, on a real engagement.

Same inputs. The difference is how much you ask the model to invent.

What a real engagement taught us

We first ran into all three principles building a context graph for a market-intelligence firm, the kind of organization you’d want one for: many concurrent projects, a large analyst team, knowledge spread across tools and documents and calls. We set out to build the model the obvious way, and we hit a wall almost immediately.

The wall was this. A huge share of what we needed the graph to “know” was already known: sitting in their project tools, cleanly structured, perfectly linked, obvious to any analyst on the team. Which project a task belonged to. Who owned it. What state it was in. None of this needed to be inferred. It was right there.

And yet the extract-everything method ignored all of it and tried to rebuild it from the raw text, at great cost in compute, and worse, at great cost in noise. The model would re-derive a project that already existed and give it a slightly new name. It would read a closed task as if it were open. It would manufacture three versions of one thing because three people described it three ways.

We were paying a model to laboriously and unreliably reconstruct facts the organization had already written down. That is the moment the three principles became obvious.

Principle 1: Don’t re-derive what your tools already know

Most of the load-bearing knowledge in an organization is already structured, already linked, and already correct, inside the tools people use every day. A task in Jira already knows its project, its owner, and its status. A CRM already knows which company a contact belongs to. A document already has a title and an author.

To a human this is so obvious it’s invisible. The mistake of the extract-everything approach is to throw all that structure away and ask a language model to recover it from prose. The recovery is expensive, and it is lossy: the model both misses real things and invents false ones.

Our first principle is to take the structure as given. When a connector syncs a project tool, we read the project, the owner, and the status as the facts they already are: directly, deterministically, no model required. The language model is never asked to re-derive what a system already asserts. It is reserved for the genuinely ambiguous work: the action item buried in a meeting, the decision mentioned only in passing, the thing no tool wrote down.
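As a rough illustration of reading structure as given, the sketch below turns a synced task record into facts with plain field access and routes only unstructured items to a model. The record layout (`id`, `project`, `owner`, `status`, `kind`) and the names `Fact`, `facts_from_task`, and `needs_model` are assumptions made for this example, not the schema of any real connector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str  # which system asserted the fact


def facts_from_task(task: dict, source: str = "project_tool") -> list[Fact]:
    """Read a synced task record as facts. No model call, no inference."""
    task_id = task["id"]
    return [
        Fact(task_id, "belongs_to", task["project"], source),
        Fact(task_id, "owned_by", task["owner"], source),
        Fact(task_id, "has_status", task["status"], source),
    ]


def needs_model(item: dict) -> bool:
    """Only unstructured items are handed to the language model."""
    return item.get("kind") in {"transcript", "email", "chat"}


# Hypothetical record, as a connector might deliver it.
task = {"id": "TASK-1", "project": "Lighthouse", "owner": "ana",
        "status": "open", "kind": "task"}
facts = facts_from_task(task)
```

The point of the split is that `facts_from_task` is deterministic: the same record always yields the same three facts, so there is nothing for a model to miss or invent.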

The cost of getting this wrong is not abstract. In an early version of our own graph, before we enforced this principle, the model was left to mint entities freely from meeting chatter. We ended up with 203 “products” in a graph that had 3 real ones. The other two hundred were noise: fragments of conversation the model had promoted to first-class facts. They didn’t just clutter the picture; they poisoned search, because now a query for a real product surfaced a dozen hallucinated cousins.

What the model proposes vs. what earns a place.

The fix was not a better model. It was a rule: a thing earns a place in the graph because a structured source vouched for it, or because several independent sources corroborate it, not because a model mentioned it once. Liberal extraction is fine as a way to notice candidates. It is a terrible way to decide what’s real.
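The admission rule can be sketched in a few lines. Everything specific here is an assumption for illustration: the set of source kinds treated as structured, the threshold of three for "several", and the names `Mention` and `earns_place`.

```python
from dataclasses import dataclass

# Assumed for this sketch: which source kinds count as structured,
# and how many independent sources count as "several".
STRUCTURED_SOURCES = {"project_tool", "crm", "issue_tracker"}
MIN_INDEPENDENT_SOURCES = 3


@dataclass(frozen=True)
class Mention:
    candidate: str    # the thing the extractor noticed
    source_kind: str  # e.g. "crm", "transcript"
    source_id: str    # the specific record, meeting, or email


def earns_place(mentions: list[Mention]) -> bool:
    """Admit a candidate if a structured source vouches for it,
    or if enough distinct sources corroborate it."""
    if any(m.source_kind in STRUCTURED_SOURCES for m in mentions):
        return True
    independent = {m.source_id for m in mentions}
    return len(independent) >= MIN_INDEPENDENT_SOURCES


one_off = [Mention("WhatsApp integration", "transcript", "meeting-1")]
vouched = [Mention("Lighthouse", "project_tool", "proj-7")]
corroborated = [Mention("Q3 pricing", "transcript", f"meeting-{i}") for i in range(3)]
same_room = [Mention("Q3 pricing", "transcript", "meeting-1")] * 3
```

Note that repeated mentions inside one meeting count once: forty mentions in a single transcript are still one source.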

A mention is not a fact. A thing earns its place; it isn't granted one.

The principle, in one line: let the model find what’s missing, never re-derive what’s already there.

Principle 2: Freeze what doesn’t change; let time hang off the side

This is the principle we’re most sure about, and it’s where we differ most sharply from the standard approach.

Time is the hardest thing to get right in any graph. The relationships in an organization are not static, but they are not uniformly dynamic either, and that distinction is everything.

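Look closely and you’ll see two very different kinds of fact: facts of identity, which rarely change (who the people, companies, projects, and products are, and the durable relationships between them), and facts of state, which change all the time (a project’s status, a person’s current role).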

The prevailing approach mixes these together. It puts time inside the graph: stamping every single relationship with the window it was valid for, so the graph itself becomes a tangle of dated, expiring edges. It is elegant in theory and brittle in practice. Every fact, even the ones that never change, now carries temporal bookkeeping. The graph is never still. Querying it means reasoning about time at every hop.

We do the opposite. The graph itself holds nothing temporal.

We build a stable, timeless core (we call it the spine) out of the facts that don’t change: the people, companies, projects, products, and the durable relationships between them. The spine is frozen. It is the org chart that holds still.

Then everything that does change in time we lift out of the graph entirely and hang off the side, as timelines attached to the spine. A project’s status history is a timeline on the project. A person’s role changes are a timeline on the person. The spine stays still; the timelines move.

Don't bake time into the graph. Freeze the spine; hang the timelines off the side.

The payoff is large and immediate. The part of the system you search and reason over (the spine) is simple, stable, and small. The part that churns (the timelines) is cleanly separated, easy to append to, and never destabilizes the structure. You ask “what is this project, and what does it relate to” against something that holds still, and “where does it stand now” against a timeline that’s just a list. The two questions stop fighting each other.
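A minimal data-structure sketch of the separation, under assumed names (`Spine`, `Timeline`, and the example project and dates are all made up): the spine stores undated edges, and each timeline is an append-only list keyed by a spine node and an attribute.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Timeline:
    """Append-only list of dated values hanging off one spine node."""
    events: list = field(default_factory=list)  # (date, value) pairs

    def append(self, when: date, value: str) -> None:
        self.events.append((when, value))

    def current(self) -> Optional[str]:
        # Latest value by date; None if nothing has been recorded yet.
        return max(self.events)[1] if self.events else None


@dataclass
class Spine:
    edges: set = field(default_factory=set)       # (source, label, target), no dates
    timelines: dict = field(default_factory=dict)  # (node, attribute) -> Timeline

    def related(self, node: str) -> set:
        """'What is this, and what does it relate to?' Asked of the still part."""
        return {(label, dst) for src, label, dst in self.edges if src == node}

    def timeline(self, node: str, attribute: str) -> Timeline:
        """'Where does it stand now?' Asked of the moving part."""
        return self.timelines.setdefault((node, attribute), Timeline())


spine = Spine()
spine.edges.add(("Lighthouse", "contributes_to", "Pricing"))
status = spine.timeline("Lighthouse", "status")
status.append(date(2026, 1, 5), "planned")
status.append(date(2026, 3, 2), "in progress")
```

Appending a status never touches `edges`, which is the property the paragraph above describes: the searchable structure does not churn when state does.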

The honest edge case (and the first thing a skeptic asks) is: what if something you filed as identity turns out to change? People do switch companies; a project can be reassigned to a different product. The boundary isn’t sacred; crossing it is a demotion. The moment a “fixed” fact starts moving, it stops being a spine relationship and becomes a timeline: the company link becomes an employment history, the product link becomes a reassignment record. Identity is the default for things that don’t move, not a promise that they never will. And getting the call wrong is cheap to fix: you move one fact from the spine to the side, which is only possible because we kept the two separate in the first place.
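The demotion can be pictured as a single small operation. This is a sketch under assumptions: the function name `demote`, the tuple layout, and the example people, companies, and dates are all hypothetical.

```python
from datetime import date


def demote(spine_edges: set, timelines: dict, edge: tuple, observed: date) -> None:
    """Move a (source, label, target) fact off the spine and onto a timeline."""
    source, label, target = edge
    spine_edges.discard(edge)
    # The old "fixed" value becomes the first entry of the new history.
    timelines.setdefault((source, label), []).append((observed, target))


spine_edges = {("ana", "works_at", "Acme"), ("ana", "works_on", "Lighthouse")}
timelines = {}

# Ana changes employer: the link stops being identity and becomes a history.
demote(spine_edges, timelines, ("ana", "works_at", "Acme"), date(2024, 1, 1))
timelines[("ana", "works_at")].append((date(2026, 6, 1), "Globex"))
```

Only the one fact that started moving is touched; every other spine edge stays where it was.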

The principle, in one line: identity in the graph, time on the timeline.

Principle 3: People resolve what words cannot

The third wall is the duplicate problem, and it has a subtle cause.

The same thing gets called different names in different rooms. One team says “the pricing review,” another says “Project Lighthouse,” a third says “the Q3 thing.” A feature shows up under three descriptions across three meetings. For the graph to be any good, it has to know these are one thing, and this turns out to be genuinely hard.

The instinct is to solve it with similarity: convert the names to numbers (embeddings) and merge the ones that are close. But general-purpose embeddings are trained on the whole internet, not on your organization. They have no idea that, in your world, “Lighthouse” and “the pricing review” are the same initiative. To the model, those two phrases aren’t similar at all. So similarity alone either misses real duplicates or (if you turn it up) collapses things that should stay separate.

The words aren't similar. The people are identical.

There is a far stronger signal sitting right there, and it isn’t in the words. It’s in the people.

If “Project Lighthouse” and “the pricing review” keep being discussed by the same group of people, from the same organizations, they are almost certainly the same thing. The room is the fingerprint. A project is identified less by what it’s called and more by who’s around it. Two names with the same cast of characters resolve to one; the same name with two completely different casts probably splits into two.

So when our system has to decide whether two things are the same, it doesn’t just compare the words. It looks at who’s involved (the surrounding people and organizations) and uses that as the deciding signal. This is something general embeddings simply cannot see, because they were never trained on your org’s social graph. It is the part of the structure that lives in your people, and reusing it is how we dedupe what word-similarity can’t.
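To make the idea concrete, here is one simple way to score "same cast": Jaccard overlap between the sets of people seen around each name. Jaccard overlap and the 0.5 threshold are stand-ins chosen for this sketch, not a statement of how any production system scores it, and the names and casts are invented.

```python
def jaccard(a: set, b: set) -> float:
    """Share of people common to both casts, out of everyone in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_entity(cast_a: set, cast_b: set, threshold: float = 0.5) -> bool:
    """Decide identity from who is involved, ignoring the names entirely."""
    return jaccard(cast_a, cast_b) >= threshold


# Hypothetical casts collected from meetings, tickets, and threads.
lighthouse = {"ana", "ben", "chen", "dee"}
pricing_review = {"ana", "ben", "chen", "eli"}
other_lighthouse = {"fay", "gus", "hal"}

merge = same_entity(lighthouse, pricing_review)   # different names, shared cast
split = same_entity(lighthouse, other_lighthouse)  # same name, different cast
```

In practice a signal like this would likely be combined with name similarity rather than replace it; the sketch isolates the cast signal to show that it separates cases the words alone cannot.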

Same cast, same project, even when the words don't match.

The principle, in one line: the people in the room disambiguate the thing.

What this gives you

Put the three together and you get a context graph with properties the extract-everything approach struggles to reach.

It’s clean. Because real structure is taken as given and the model only fills genuine gaps, the graph isn’t drowning in hallucinated near-duplicates. The 203 imaginary products never get in.

It’s stable and fast to query. Because identity and time are separated, the part you search holds still. You’re not reasoning about dated edges at every hop.

It resolves correctly. Because deduplication leans on who’s involved, not just what things are called, the same project under three names becomes one node, and three genuinely different projects don’t get crushed together.

It’s auditable. Every fact in the graph can point back to where it came from (the email, the document, the meeting) with a sense of how confident we are and why. This matters enormously for the kind of reader who has to defend a conclusion, not just reach one.
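A sketch of what "point back to where it came from" could look like as a record. The field names, the 0-to-1 confidence scale, and the example evidence are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source_kind: str   # "email", "document", "meeting"
    source_id: str     # identifier of the specific source
    confidence: float  # assumed scale: 0.0 to 1.0
    reason: str        # why this source supports the fact


@dataclass
class Fact:
    statement: str
    evidence: list

    def explain(self) -> list:
        """List supporting sources, strongest first."""
        ranked = sorted(self.evidence, key=lambda e: e.confidence, reverse=True)
        return [
            f"{e.source_kind}:{e.source_id} ({e.confidence:.2f}) {e.reason}"
            for e in ranked
        ]


fact = Fact(
    "ana owns Lighthouse",
    [
        Evidence("meeting", "m-12", 0.6, "mentioned in passing"),
        Evidence("document", "d-7", 0.9, "named in project brief"),
    ],
)
trail = fact.explain()
```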

It respects who can see what. A real organization has boundaries. The graph honors them: a person only ever sees the slice they’re entitled to, and context drawn from one corner never leaks into another.
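One conservative way to express this boundary is to filter both facts and their supporting sources by what the viewer may read. The policy shown (a fact is visible only through sources the viewer is entitled to) and the names `visible_slice`, `statement`, and `sources` are assumptions for this sketch.

```python
def visible_slice(facts: list, allowed: set) -> list:
    """Return only the facts, and only the sources, a viewer is entitled to."""
    out = []
    for fact in facts:
        sources = [s for s in fact["sources"] if s in allowed]
        if sources:  # drop facts supported only by sources the viewer cannot see
            out.append({"statement": fact["statement"], "sources": sources})
    return out


# Hypothetical facts with the sources that support them.
facts = [
    {"statement": "Lighthouse is in progress", "sources": ["jira", "board-notes"]},
    {"statement": "Acme renewal at risk", "sources": ["exec-email"]},
]
analyst_view = visible_slice(facts, allowed={"jira"})
```

Filtering the source list as well as the facts keeps a visible fact from revealing the existence of a source the viewer cannot open.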

And because the structure is clean and the timelines are live, the graph can do more than remember: it can close the loop. A commitment made in a meeting can be matched to its ticket and followed all the way to done. The graph stops being a filing cabinet you query and becomes something closer to a chief of staff that tracks the work.

A small, concrete illustration from our own system: ask it about our company and it returns a single node that has quietly unified the company across five different names, drawn from email, the issue tracker, the meeting tool, and the project tool: all resolved to one identity, each mention traceable to its source. No one told it those five names were the same entity. The structure did.

A real node from our own graph: one identity, unified from five names across the connected stack, every mention traceable.

Why we build it this way

None of these three principles is exotic. Each is, in hindsight, almost obvious, which is exactly the point. They’re obvious to humans, because humans never re-derive their own organization from scratch every morning, never confuse a project’s identity with its weekly status, and never decide what two things are by their names alone while ignoring who’s in the room.

The mistake the field made was handing all three jobs to a general-purpose model and asking it to do them from raw text. The model is extraordinary, but it was being used to reconstruct knowledge the organization already had: in its tools, in the difference between identity and time, and in its people.

So our whole design philosophy reduces to one sentence:

Don’t make the model re-derive your organization. Reuse the structure that’s already there, and spend the model only on what’s genuinely missing.

Three principles, one idea.

That is what makes a context graph that’s clean enough to trust, stable enough to query, and honest enough to put in front of a client.


Canvas Labs builds Sketch: an open-core context graph for teams. The engine is open-source and self-hostable: audit it, run it air-gapped. A commercial platform layer adds managed hosting, enterprise connectors, governance, and support on top.