Total Internal Reflection: Episode 1

By Edmund Cuthbert · March 1, 2026

Total Internal Reflection is where we capture the real AGI moments between two cofounders building an N-person unicorn.

We're not just building agents for our customers — every day we build them for ourselves. This series documents the agents we build internally, the breakthroughs that give us goosebumps, and what it actually feels like to build agents that are smarter than us.

Transcript

Li: This is the craziest question I got asked by the agent. This question gives me more goosebumps than the GPT-4 release.

Li: So Edmund showed me something crazy today, and I want to show this to the world. I don't think I've felt any AI moment like this since the GPT-4 release. That was two years ago. And the thing you built is literally giving me the same shock effect I had with the GPT-4 release. So I want as many people as possible to know this thing exists. I feel like this is going to be huge for all the founders out there who are constantly underwater, overwhelmed by all the things moving around.

Li: I think we actually built the first chief of staff plus executive assistant that's 24 hours online, always on the clock. And this is a Claude Code subscription.

See how Superposition works.
The agent we built for founders is even better than the ones we build for ourselves.

Edmund: Let me take you through it. I do think this thing is pretty cool. So every morning on Slack, my agent sends me my daily brief, and this is all the context of what I need to do as a founder each day. This is some of the stuff that I think is particularly interesting. It's looking through all the newsletters I subscribe to and then making actual high-taste syntheses between that context and what we do as a company.

Li: How many newsletters do you subscribe to?

Edmund: Right now I'd say it's gone up about 10x since I started building it, because interestingly, now I don't have to read all of them. What I care about is the cross-pollination of context. What's something that's interesting in one newsletter that overlaps with something in another that then is related to something we are actually doing that we care about. So far, so standard, right? You would expect an agent to be able to read these newsletters and synthesize them.

Li: But being able to combine the newsletters with the company's context and give you correlated suggestions. I think that's something — if you're just using Claude or Superhuman or any email assistant thing without any context about yourself or about the company, then it's basically just summarizing. But your agent is already able to take the events happening in the world and all the newsletters and actually pick out the relevant signals and flag them to you. I think that's why you're able to subscribe to 10 times more newsletters — because you don't need to manually curate through them to find the signals. The agent will find the signals and flag them to you.

Edmund: It's the scaffolding. It's the way that the agent stores context and the way it reasons over it that allows it to do this well. And the same is true of competitor intel. This is super cool. It's gone to look at open jobs at a competitor on Glassdoor to figure out and infer what their strategy might be. I didn't tell it to do that. We have a database of competitors in Notion and it reasons over them and goes and does web search over them, but it has very broad latitude to do stuff itself.

Edmund: And so I think the key themes here of why this works really well is we're using a store of context — Notion — that is both human legible and agent legible. This isn't all just jammed into markdown files. And the agent has really wide parameters for what it's allowed to do.

Edmund: Same for bug reports. I haven't written any of these. The agent is reasoning through my Granola notes from my onboardings and calls with customers and just generating them. And these are actually being actioned — these are already being put into a Slack channel and your agent is drafting them and putting them in the backlog.

Li: Yeah. This one is especially interesting because now your agent is getting all of the bug reports from the client onboarding calls or conversations with clients, automatically generating the bug reports, posting to the bug channel on Slack, and my agent now picks them up, runs them through my pipeline, creates an actual bug ticket in Notion in the bug tracker database. And then the other agent can just pick the bugs up and create a PR to fix them. So it's from the customer conversation all the way to bug fix, everything's automated. I only need to do the code reviews.

Edmund: The interesting thing here is there are parts of the pipeline that still have human legibility. We are not going straight from Granola into PR. We are putting it in a Slack channel where any human in the company can weigh in and start talking and debating in thread about whether that actually is a bug. And similarly, I get this in my daily brief. So it's an automated pipeline with human visibility into certain gates in the pipeline, but we're not humans in the loop. It can do it itself, and we can weigh in as and when necessary.
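As a rough sketch of the pipeline shape described above: one agent extracts bug reports from call notes, posts them to a human-visible channel, and another agent promotes them into a tracker backlog. All of the names here (`BugReport`, `SlackChannel`, `NotionDatabase`, `extract_bug_reports`) are hypothetical stand-ins; the real system calls the Granola, Slack, and Notion APIs, and an LLM does the extraction.

```python
from dataclasses import dataclass, field

@dataclass
class BugReport:
    title: str
    source_call: str
    status: str = "needs-triage"   # humans can debate in-thread before triage

@dataclass
class SlackChannel:
    """Human-legible gate: reports land here before becoming tickets."""
    messages: list = field(default_factory=list)

    def post(self, report: BugReport) -> None:
        self.messages.append(report)

@dataclass
class NotionDatabase:
    """Stand-in for the Notion bug-tracker database."""
    tickets: list = field(default_factory=list)

    def create_ticket(self, report: BugReport) -> dict:
        ticket = {"title": report.title, "source": report.source_call,
                  "status": "backlog"}
        self.tickets.append(ticket)
        return ticket

def extract_bug_reports(call_notes: str, call_id: str) -> list[BugReport]:
    # In the real pipeline an LLM reasons over the transcript; here we just
    # flag lines that read like problems, to show the data flow.
    return [BugReport(title=line.strip(), source_call=call_id)
            for line in call_notes.splitlines() if "broken" in line.lower()]

# One agent posts to the channel; the other picks reports up into the tracker.
channel, tracker = SlackChannel(), NotionDatabase()
notes = "Onboarding went well.\nCSV export is broken for large files."
for report in extract_bug_reports(notes, call_id="onboarding-2026-02-28"):
    channel.post(report)              # human-visible gate: debate happens here
for report in channel.messages:
    tracker.create_ticket(report)     # ticket enters the backlog for a PR agent
```

The key design point is the middle hop: the Slack channel is a gate where humans can intervene, but nothing blocks on them.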

Li: What's your setup?

Edmund: The setup here is that it goes and gets context from Granola, which has my calls, from emails, now from Slack as well. It needs structure of what to do with that context. So the V1 that we used to have was just markdown. But there's no information hierarchy. It's too flat. I want an assistant that can be proactive and high taste. And so I took the OpenClaude repo and just pasted it into Claude. And I asked, how does OpenClaude handle state and memory? And what do you think's deficient about it?

Edmund: And what was apparent to me was that a two-tier system that's just about time decay isn't enough, because there's stuff that's super important but infrequent that actually needs some top-down information hierarchy. And that's what I've actually created. But you can't just have that in markdown. You need it in a store of record that is also human legible, that has some structure — basically a database. And so something like Obsidian or Notion is a really, really good place to create context. So the scaffolding for this agent is raw context in from an email or a call transcript, it being reasoned over and then appended into ongoing context in Notion.
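The ingest loop just described — raw context in, reasoned over, appended into a structured store — could be sketched like this. The `classify` function is a stand-in for the agent's reasoning step, and the in-memory `store` is a stand-in for the Notion databases; both names are illustrative only.

```python
from collections import defaultdict
from datetime import date

# level -> list of records, analogous to one Notion database per hierarchy level
store = defaultdict(list)

def classify(raw: str) -> str:
    """Stand-in for the agent's reasoning: route context to a hierarchy level."""
    if "we decided" in raw.lower():
        return "decisions"
    return "themes"

def ingest(raw: str, source: str) -> dict:
    # Raw context in from an email or call transcript...
    record = {"text": raw, "source": source,
              "created": date.today().isoformat()}
    # ...reasoned over, then appended into the ongoing structured store.
    store[classify(raw)].append(record)
    return record

ingest("We decided to prioritize the taste layer this sprint.", source="standup")
ingest("Headhunting vs inbound keeps coming up with customers.", source="call")
```

The point is that every record lands in a level of the hierarchy, not a flat markdown file, so later recall can be scoped to the right tier.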

Edmund: And so I have different databases for different levels of hierarchy of information, the highest of which is themes. So these are ongoing themes that we talk about a lot. I don't manually create them. I didn't go in and hard code themes in Notion. I let the agent derive them from the context.

Li: Can you show us an example? I think that's super cool because a lot of people, including myself, when we're building the agent — the biggest challenge is how do I organize all the information? I connect Claude to all of my stuff: Gmail, calendar, Notion, Slack. But then the agent gets confused because it doesn't know where to pull the relevant context. So I have to constantly manually feed it the context and then manually check the markdown files it generates. It's very tedious, and as soon as you stop giving the context to your agent, it starts losing touch with reality and starts hallucinating and stops being valuable.

Li: What you did here — the most impressive thing is it's an automated system that constantly gathers the context, puts it into the right hierarchies, and keeps learning on its own without much human input. You just live your day, doing your things, sending emails, attending calls, and then all the core decisions and themes are distilled and put into the right place, and the agent will be able to go back and reference them. I'd literally pay for this.

Edmund: It's because I was drowning. There's too much context coming in constantly and we need a log of what we care about and a log of what's going on.

Li: And I think that's why I want to record this session because I think all the founders out there are having the same problem. So we're just sharing this system with the world. And Edmund here is the expert.

Edmund: And also, I'm obsessed with talking to founders about stuff like this. We want people to come chat to us and figure out how we're going to actually implement this for folks. If you're curious — these themes are emergent, again, I didn't go hard code these.

Edmund: Here we go. This is great. So this is inside baseball about us. We're thinking about headhunting versus inbound applicants. We build an AI recruiter. Obviously this is something we think about a lot. When a theme is generated, the agent will create a callout of its synthesis, and then every time it takes in new context and sees something that relates to a theme, it appends that new bit of context and then reruns the synthesis given that new context. And so this callout is a constantly, dynamically updating piece of context about what's actually going on. All of these then end up generating more and more ideas and our latest thoughts and insights on a given theme. And the agent actually has full latitude to enrich this database in both axes.
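The append-and-resynthesize loop could be sketched as below. `synthesize` stands in for the LLM call that actually writes the callout; the rest shows the shape of the loop: new context is appended to the theme's history, and the callout is regenerated from the full history, not just the new item.

```python
def synthesize(entries: list[str]) -> str:
    # Stand-in for the model's synthesis; deterministic here for illustration.
    return f"Synthesis of {len(entries)} entries (latest: {entries[-1]})"

class Theme:
    def __init__(self, name: str):
        self.name = name
        self.entries: list[str] = []
        self.callout = ""

    def append_context(self, new_context: str) -> None:
        self.entries.append(new_context)
        # Rerun synthesis over the whole history so the callout is a
        # constantly, dynamically updating view of the theme.
        self.callout = synthesize(self.entries)

theme = Theme("headhunting-vs-inbound")
theme.append_context("Customer asked about sourcing outbound candidates.")
theme.append_context("Newsletter piece on inbound pipelines for AI roles.")
```

Rerunning over the full history is what keeps the callout current rather than a stale first impression.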

Edmund: It can create new themes, but it can also create new properties. And this is actually, I think, one of the really valuable things I gave it the ability to do: every single Notion database I created for it, I created blank. All it has is a created date, plus the ability, if it needs to create some taxonomy, to create a status. All of it, it just does on the fly. And then when it creates a new property, it runs through the other records in that database and appends that property to them. And so it's a constantly evolving set of properties that give it its own information architecture.
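A minimal sketch of that on-the-fly schema evolution, under the assumption that a database is just records plus a property set (the `AgentDatabase` name and methods are illustrative, not a Notion API): databases start effectively blank, and when the agent invents a new property it backfills every existing record so the taxonomy stays consistent.

```python
class AgentDatabase:
    def __init__(self):
        self.properties = {"created"}   # databases start effectively blank
        self.records: list[dict] = []

    def add_record(self, **values) -> dict:
        record = {prop: values.get(prop) for prop in self.properties}
        self.records.append(record)
        return record

    def add_property(self, name: str, default=None) -> None:
        """Agent invents a taxonomy on the fly, then backfills old records."""
        self.properties.add(name)
        for record in self.records:
            record.setdefault(name, default)

db = AgentDatabase()
db.add_record(created="2026-02-27")
db.add_property("status", default="open")    # created later, by the agent
db.add_record(created="2026-03-01", status="triaged")
```

The backfill step is the interesting part: without it, older records would silently fall out of any taxonomy the agent invents later.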

Li: And I think that's a core learning. Give the agents maximum freedom to explore and do things and reflect. That's the key. The more guardrails you put on the agent, the less useful it is because you're basically putting constraints on it instead of letting it explore and come up with its own structure. Which is extremely powerful.

Edmund: And so themes — then one layer down from themes is decisions. So these are specific things that have been decided. Crucially, I actually built this initially because I wanted the agent to have a record of its own decisions. Because I realized if everything is just in a flat project in markdown, there's no versioning. So it doesn't know what's been decided and what's been overwritten.

Edmund: But then I very quickly realized that it needed to capture our decisions as well. And so you can see here, again, this was flat. It's created all these taxonomies of relationships between decisions and themes, sources. What category it was. And we have to make some meaty decisions. As a company, we're doing recruitment and so in this case — customers ask us, can your agent only search for female engineers? This is something that requires debate and taste and actual EQ to think about, and now we actually have a record of it, but we would never sit down and document this. We would just make an on the fly decision, maybe have a thread about it on Slack, and then it would be ephemeral and it would just be gone.

Edmund: But now, because the agent's reasoning over the calls we have, we have Granola on in every single call that we do internally. All of this creates this strong time series data of our opinions and views on something at a snapshot in time, which allows the evolution of ideas and a record of it that creates maximal context, both for humans and for agents. I think that's one of the core design principles here. This is in Notion because it's human legible. Any new person who joins the company or new agent who joins the company has this to very quickly ingest and get up to speed.

Li: Right. So this is the source of truth. This is actually our agent to human protocol — a mental model for the company layer to share context. And I think this decision table is extremely relevant to all of our audience. As a startup founder, especially at the early stages, we're making tons of decisions every single day. Every single moment. Give the agent the full authority to record all decisions. Whether it's from a meeting recording or from email or from this kind of conversation — that will give you basically a git history of the company.

Edmund: Context is ultimately what we want. Because as a startup, your ability to make decisions quickly is a function of how easy it is to learn from them. One of our values is that decision frequency is better than decision quality. And being able to quickly decide and move on and know that the context is just going to get captured and you can revisit it. Almost every single decision you make as a startup is reversible. And so the ability to reason over past decisions — the agent itself produces this reasoning in the conversations between us and the agents, and then any actions it takes are stored as decisions in the database.

Li: Defining the clear architecture or the hierarchy of the information and teaching the agent how to funnel different pieces of context into different layers is the key. I think that's something not a lot of people have actually figured out yet. I don't think we have fully figured it out either, but I think we are getting very close.

Edmund: There are two interesting axes here: give the agent complete latitude for actions, but apply rigor to the information hierarchy and generate the categories of information yourself. So properties on individual decisions — the agent does what the hell it likes — but the hierarchy of themes, decisions, and then these roll up into ideas and then actual tickets and stuff that gets shipped. That's us trying to figure out a stateful way of storing context to create scaffolding for the agent to make high-taste decisions.

Edmund: This can literally just generate these decisions in conversations that only we would ever be in — that only me and my agent would ever be in. That's then the next tier of this, which we've just started playing around with, which is — okay, the agent has the firehose of context. Yes, it goes through my email, Slack, et cetera. That's the base level. Then it can append that information in a highly contextual way so it can recall from it as it makes decisions. That's how it can create that good daily brief, because it has the themes and the decisions and the ideas.

Edmund: How can we then use it to generate new ideas? And I created this skill called Reflect that takes all of the context that's been appended recently, reasons over it in light of all the context it has about me and the company, and then asks me just a single clarifying question that it thinks is the most important thing I should be thinking about that day.
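The shape of a skill like Reflect could be sketched as follows, under the assumption that appended context carries timestamps. `ask_model` is a hypothetical stand-in for the LLM call (here it just returns the question from later in this conversation); `recent` and `reflect` are illustrative names, not the actual skill's code.

```python
from datetime import datetime, timedelta

def recent(entries: list[dict], days: int = 1) -> list[dict]:
    """Select context appended within the last `days` days."""
    cutoff = datetime.now() - timedelta(days=days)
    return [e for e in entries if e["when"] >= cutoff]

def ask_model(prompt: str) -> str:
    # Stand-in: the real skill sends the prompt to the model.
    return ("What percentage of current engineering time is actually "
            "building the taste layer versus the plumbing around it?")

def reflect(entries: list[dict], standing_context: str) -> str:
    # Fold recent context into the standing context about the founder and
    # company, and ask for exactly one clarifying question.
    context = "\n".join(e["text"] for e in recent(entries))
    prompt = (f"Company context:\n{standing_context}\n\n"
              f"Recent context:\n{context}\n\n"
              "Ask the single most important clarifying question.")
    return ask_model(prompt)

entries = [{"text": "Standup: shipping onboarding fixes.",
            "when": datetime.now()}]
question = reflect(entries, standing_context="We build an AI recruiter.")
```

Constraining the output to a single question is the design choice doing the work: it forces the model to rank everything it knows rather than summarize it.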

Edmund: And we ran this for real yesterday. First time I showed this to you.

Li: I mean that was the goosebump moment. That was the AGI moment.

Edmund: Should we show them what it asked?

Li: Yeah, yeah, yeah. It's crazy.

Edmund: This was the output just in the command line of yesterday when we tried this. We left it running during our standup, and by the end of the standup the agent asked us: what percentage of Li's current engineering time is actually building the taste layer versus the plumbing around it?

Li: Such a good question. I was still reflecting on this question today. It's such a high taste question that — I don't know anyone in the company at the current stage who would have the context and experience to ask this question. This is the question that probably those big corporations are paying a hundred K or whatever amount of consulting fees for.

Edmund: But also no consultant — no one from McKinsey is going to have the context to ask that question exactly. The people who have the context don't have the time. I'm not going to sit there in standup and start stroking my imaginary beard going, “but what percentage is spent on the taste layer versus the plumbing?” Right. It's going to be — ship, let's ship this for this onboarding now in 10 minutes.

Li: There's a million things every day, a million fires. I mean, this is the thing — it's something you would maybe come up with on a Sunday night, locking yourself in a room and forcing yourself to meditate for a couple minutes to clear your mind. But now the agent can do it while we're handling the day-to-day operations in the standup.

Edmund: No meditation needed, no ayahuasca, don't need to go to Burning Man. Just run a command in the terminal.

Li: Oh dude, this is going to replace Burning Man. But Burning Man is dead.

Edmund: And it is the context. It's the scaffolding. It can now — again, this is an emergent property of thinking hard about information hierarchy and entrusting the agent to figure out the rest.

Li: So whoever is listening to this — go explore it. This is the craziest question I got asked by the agent. This question gives me more goosebumps than the GPT-4 release. “Oh, I can talk to an AI chatbot now” versus “oh, AI can run the company now.”

Edmund: And so if we were going to give people three tactical bits of advice to go away and try and set something like this up themselves, what would that be?

Li: Two things. One — give the agent the freedom to create and explore. Don't put too many constraints on it. And two — define your own information hierarchy.

Edmund: And I would say — you don't have to use Notion. You can use Obsidian, use whatever you want. You could use Supabase, really. But ultimately what you need is something that has a table-like structure with the ability to create arbitrary properties on it.

Li: Let your agent create its own skills.

Edmund: Oh, a hundred percent. Yeah. It's better at writing skills than we are. Don't design skills — workshop the job to be done of what the skill will solve. And then let it suggest elegant ways to wrap combinations of the skills up into commands.

Li: This also leads me to think — how should we design and evolve our own agent? I know a lot of AI agent companies out there are building their products with a lot of guardrails, with a lot of test suites to cover the edge cases and make sure the agent actually behaves. But this gives me another perspective. While still keeping the guardrails that stop the agent from going wrong, how can we give the agent more freedom to create, to self-evolve? It's just fantastic. It's so good.