Building Persistent Memory for an Enterprise AI Agent

Nishar
AI Platform Engineer
Apr 20, 2026

Dayos builds Hero, an agentic AI platform for enterprise ERP back-office automation. We deploy inside your existing business applications.

Every Conversation Started from Zero

At Dayos, we build Hero, an AI agent that lives inside your existing ERP system and handles back-office operations: closing the books, running depreciation, and generating financial reports. The stuff finance teams do every month or quarter.

The problem showed up fast: every chat session started cold. A user would say, "Close the books," and the agent would ask, "Which ledger? Which period? Do you want FA depreciation included?" Same questions, every time, even for users who'd done this exact thing a dozen times before.

The agent needed to remember.

Why Not Mem0?

We tried Mem0 first. It's a managed memory layer with a clean API. We integrated it, and it worked, sort of.

Mem0 gives you semantic memory: it extracts facts and preferences from conversations and surfaces them later. "User prefers US Primary Ledger." "User usually closes on the last day of the month." That's useful.

But we hit a wall pretty quickly. Mem0 only supports that one type of memory. For Hero, we needed something else: episodic memory. We needed the agent to remember sequences of what happened, which tools ran, in what order, with which parameters, and whether they succeeded or failed.

When a user asks, "Why did last month's close fail?", you can't answer that from semantic facts alone. You need the episode.

So we dropped Mem0 and built on LangMem.

Two Memories, One Agent

LangMem gave us the abstraction we needed. We built a custom LangMemMemoryService implementing Google ADK's MemoryService interface, so ADK automatically injects memories into the agent context before each turn via PreloadMemoryTool. We didn't have to wire any of that ourselves.

The memory system has two namespaces per user, per tenant:

  • (tenant_id, "semantic", user_id)

  • (tenant_id, "episodic", user_id)

Both live in PostgreSQL with the pgvector extension. The store table holds the raw JSON memory blobs. The store_vectors table holds 1536-dimensional embeddings (OpenAI's text-embedding-3-small), indexed with HNSW for cosine-similarity search. Recall is fast even at scale.
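At its core, recall is cosine similarity between a query embedding and the stored memory embeddings, which the HNSW index approximates at scale. Here's a minimal pure-Python sketch of that ranking step, with toy 3-dimensional vectors standing in for the real 1536-dimensional ones (the memory texts and query are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the metric pgvector's HNSW index approximates."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d embeddings standing in for text-embedding-3-small's 1536 dims.
memories = {
    "User prefers US Primary Ledger": [0.9, 0.1, 0.0],
    "User closes on last day of month": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # embedding of "which ledger does the user use?"
best = max(memories, key=lambda k: cosine_similarity(query, memories[k]))
# The ledger-preference memory ranks highest for this query.
```

The HNSW index trades a tiny amount of recall accuracy for sub-linear search, which is why lookups stay fast as the store grows.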

Semantic Memory: Learning Who You Are

Semantic memory captures facts, preferences, and persistent context about the user. After each session, an LLM extraction pass runs over the conversation text with instructions like:

"Extract user preferences, facts about the user, and persistent context. Focus on: who the user is, what they prefer, which entities they work with (ledgers, periods, companies, asset books), and any explicit instructions or constraints they have given. Express all facts in plain business language."

The result might be:

  • "User typically works with the US Primary Ledger."

  • "User prefers detailed reports over summaries."

  • "User closes on the last day of the month."

  • "User's company is Acme Corp, fiscal year starts in April."

LangMem handles deduplication automatically. It doesn't just append facts; it runs ADD / UPDATE / DELETE operations against existing memories. If the user switches to a new ledger, the old fact gets updated, not duplicated.
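The consolidation loop is easiest to picture as applying LLM-proposed operations against the existing memory set. A sketch of that idea follows; the op schema and keys here are our illustration, not LangMem's internal format:

```python
def apply_memory_ops(memories: dict[str, str], ops: list[dict]) -> dict[str, str]:
    """Apply ADD / UPDATE / DELETE operations proposed by the extraction
    LLM. Illustrative schema, not LangMem's actual representation."""
    result = dict(memories)
    for op in ops:
        if op["action"] in ("ADD", "UPDATE"):
            result[op["key"]] = op["content"]   # update replaces in place
        elif op["action"] == "DELETE":
            result.pop(op["key"], None)
    return result

existing = {"ledger": "User typically works with the US Primary Ledger."}
ops = [{"action": "UPDATE", "key": "ledger",
        "content": "User typically works with the EU Primary Ledger."}]
updated = apply_memory_ops(existing, ops)
# One ledger fact remains: the old value was overwritten, not appended.
```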

One thing we're explicit about in the extraction prompt: no sensitive data. Passwords, API keys, auth tokens, PINs, credit card numbers. The LLM is instructed to never extract these. It's filtered at the extraction layer, not stored for later filtering.
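Prompt-level filtering is the primary guard, but a cheap pattern scrub makes a useful second line of defense. This is a hypothetical belt-and-braces sketch (the patterns and the scrub function are our illustration, not Hero's actual filter):

```python
import re

# Illustrative defense-in-depth patterns; the real guard is the prompt.
SENSITIVE_PATTERNS = [
    re.compile(r"(?i)\b(password|api[_ ]?key|auth[_ ]?token|pin)\b\s*[:=]\s*\S+"),
    re.compile(r"\b(?:\d[ -]*?){13,16}\b"),  # card-number-like digit runs
]

def scrub(fact: str):
    """Drop any extracted fact that looks like it carries a secret."""
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(fact):
            return None
    return fact
```

Anything the extraction prompt misses still gets dropped before it reaches the store.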

Episodic Memory: Learning What You Did

Episodic memory is more interesting from an engineering standpoint, and it's what Mem0 couldn't give us.

At the end of each session that involved tool calls, we collapse the entire sequence into a single narrative memory:

"User requested FA period close for US Primary Ledger, period 11-25. run_fa_depreciation(ledger_id=US_PRIMARY, period=11-25) → SUCCESS. run_accounting_transfer(...) → SUCCESS. run_period_close(ledger_id=US_PRIMARY, period=11-25) → FAILURE: open subledgers exist. Overall: Partial success, depreciation complete, period close blocked."
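The collapse itself is straightforward string assembly over the session's tool-call records. A sketch under an assumed record shape (the `tool`/`args`/`ok`/`error` fields are our illustration, not Hero's actual event schema):

```python
def collapse_episode(request: str, calls: list[dict]) -> str:
    """Collapse a session's tool calls into one narrative episodic memory.
    Record shape is illustrative, not Hero's actual event schema."""
    steps = []
    for call in calls:
        args = ", ".join(f"{k}={v}" for k, v in call["args"].items())
        outcome = "SUCCESS" if call["ok"] else f"FAILURE: {call['error']}"
        steps.append(f"{call['tool']}({args}) → {outcome}")
    return f"{request} " + " ".join(steps)

episode = collapse_episode(
    "User requested FA period close for US Primary Ledger, period 11-25.",
    [
        {"tool": "run_fa_depreciation",
         "args": {"ledger_id": "US_PRIMARY", "period": "11-25"}, "ok": True},
        {"tool": "run_period_close",
         "args": {"ledger_id": "US_PRIMARY", "period": "11-25"},
         "ok": False, "error": "open subledgers exist"},
    ],
)
```

One narrative string per session keeps episodic recall cheap: a single vector hit brings back the whole sequence, not a scatter of per-call fragments.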

Filling Forms Automatically: The Payoff

Here's where memory turns into tangible time savings. Hero generates structured forms for ERP operations: fields like ledger, accounting period, category, and source. These are the same fields users fill in every month.

Before memory: A controller running month-end close would spend 3-4 minutes re-entering the same parameters every single time. Ledger selection, period dates, depreciation settings, category mappings: all manual, all repetitive.

After memory: The first time a user runs FA period close, they fill out the full form. The second time, the form arrives prepopulated with their previous choices. They review, confirm, and go. What used to take minutes of re-entry now takes one confirmation click.

This isn't magic; it's structured. The precedence rule is explicit:

Explicit user request > Accounting expert inference > Memory defaults

If the user says "close period 12-25", that wins. Failing that, if the accounting context suggests a specific period, that wins. Memory defaults are the fallback: a sensible suggestion, not a forced override.
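The precedence chain reduces to a first-non-empty pick per form field. A minimal sketch (the function name and argument names are our illustration):

```python
def resolve_field(explicit=None, inferred=None, memory_default=None):
    """Pick a form-field value by precedence: explicit user input beats
    accounting-expert inference, which beats the memory default."""
    for candidate in (explicit, inferred, memory_default):
        if candidate is not None:
            return candidate
    return None

# User said "close period 12-25": the explicit value wins over memory.
period = resolve_field(explicit="12-25", memory_default="11-25")
# Nothing explicit or inferred: memory prefills the field.
prefill = resolve_field(memory_default="11-25")
```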

The Write Path: Fire and Forget at Session End

Memory saves happen asynchronously, triggered after the agent finishes streaming its last message. The flow:

  1. Agent stream completes

  2. Hook fires save_memory() as an async task

  3. Session is fetched from ADK's DatabaseSessionService

  4. add_session_to_memory validates preconditions (minimum 2 messages, or at least one tool call)

  5. Messages are extracted and split: text-only turns → semantic, collapsed tool sequence → episodic

  6. Both LangMem managers are invoked concurrently via asyncio.gather

  7. LangMem calls the LLM, extracts facts, runs deduplication, and stores to Postgres
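The steps above can be sketched with stub extraction passes standing in for the two LangMem managers (the function names and the simplified two-message precondition are our illustration):

```python
import asyncio

# Illustrative stand-ins for the semantic and episodic LangMem managers.
async def extract_semantic(messages):
    return ["User prefers the US Primary Ledger."]

async def extract_episodic(messages):
    return ["FA depreciation succeeded; period close blocked."]

async def save_memory(messages: list[str]):
    """Fire-and-forget save hook: validate preconditions, then run both
    extraction passes concurrently. Precondition simplified here."""
    if len(messages) < 2:          # skip trivial sessions
        return None
    return await asyncio.gather(extract_semantic(messages),
                                extract_episodic(messages))

# In production this is scheduled as a background task after the agent's
# final stream chunk, so it never blocks the response path.
results = asyncio.run(save_memory(["Close the books.", "Done: see summary."]))
```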

The Read Path: Invisible Recall

Reading memory is handled by ADK's PreloadMemoryTool, which is included in every agent's tool set. Before each agent's turn, ADK calls search_memory with the current conversation as the query.

Our implementation fires two parallel vector searches:

```python
semantic_raw, episodic_raw = await asyncio.gather(
    store.asearch((tenant, "semantic", user_id), query=query, limit=12),
    store.asearch((tenant, "episodic", user_id), query=query, limit=8),
)
```

Results are ranked by updated_at DESC; recency matters more than raw similarity for an ERP agent. A preference from last week beats a preference from six months ago. We cap at 6 semantic and 4 episodic results, sorted in newest-first order, and ADK injects them into the system context.
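The re-rank-and-cap step can be sketched like this, with an assumed item shape of `{"text", "updated_at"}` (our illustration, not the actual row format):

```python
from datetime import datetime

def rank_memories(semantic: list[dict], episodic: list[dict]) -> list[dict]:
    """Re-rank vector hits by recency (updated_at DESC) and cap each
    type: 6 semantic, 4 episodic. Item shape is illustrative."""
    def newest_first(hits, cap):
        return sorted(hits, key=lambda m: m["updated_at"], reverse=True)[:cap]
    return newest_first(semantic, 6) + newest_first(episodic, 4)

hits = [
    {"text": "prefers summaries", "updated_at": datetime(2025, 10, 1)},
    {"text": "prefers detailed reports", "updated_at": datetime(2026, 4, 1)},
]
ranked = rank_memories(hits, [])
# The newer preference outranks the older one, whatever their raw similarity.
```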

The agent never explicitly "looks up" memory. It's just there, in context, before every response.

TTL

Memories expire. We set a 30-day idle TTL with refresh-on-read: accessing memory resets its timer. If you haven't talked to Hero in 30 days, your memories expire. This keeps the store clean without requiring manual management. We also built user-facing endpoints (GET /memory, DELETE /memory/{key}) so users can inspect and delete their own memories.
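Refresh-on-read reduces to a timestamp check and reset at read time. A minimal sketch under an assumed record shape (our illustration):

```python
from datetime import datetime, timedelta

IDLE_TTL = timedelta(days=30)

def read_memory(record: dict, now: datetime):
    """Refresh-on-read: a read within the idle window resets the 30-day
    timer; an expired record reads as gone. Record shape is illustrative."""
    if now - record["last_accessed"] > IDLE_TTL:
        return None                    # expired: eligible for cleanup
    record["last_accessed"] = now      # reset the idle clock
    return record["value"]

rec = {"value": "User prefers the US Primary Ledger.",
       "last_accessed": datetime(2026, 3, 1)}
value = read_memory(rec, datetime(2026, 3, 20))   # read within 30 days
```

Active users keep their memories indefinitely; dormant ones age out without any manual cleanup job logic beyond the timestamp comparison.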

What We Learned

LangMem was the right call. The managed extraction loop, in which the LLM decides what to ADD, UPDATE, or DELETE against existing memories, ensures we never accumulate stale or duplicate facts. Mem0 would have given us an ever-growing pile of facts with no deduplication. LangMem treats memory as a living knowledge base.

Two memories are better than one. Semantic and episodic aren't just categories: they serve different retrieval patterns. Semantic answers "what does this user prefer?" Episodic answers "what happened last time?" Conflating them into a single store would mean that episodic entries pollute fact recall, and vice versa.

What's Next

We're at Memory v1. The next step is letting the agent reason about its memory: not just retrieve facts, but synthesize patterns. "You've run FA period close 8 times this year. Here's what's been different each time." That requires episodic memory retrieval plus inference, not just prefilling forms.

We're building toward an agent that doesn't just remember what happened. It watches what's happening now and flags when something deviates from the pattern. Memory-driven anomaly detection looks like: "Last time this exact workflow ran, it succeeded in 12 minutes. This time it's at 20 and still going: something might be stuck." That kind of contextual awareness is only possible when the agent remembers episodes and can compare them in real time.

The foundation is there.
