TalkToBodhi: Building a True Intelligence Partner for Researchers

Introduction: Why Scientists Need More Than Generic AI

We have seen AI copilots transform how software engineers write code. But what about scientists? While building bodh scientific™, we envisioned an intelligence partner for scientists working across scientific domains, whether material development, formulation science, process engineering, or quality control & testing.

A polymer scientist wants to discover the relationship between crystallinity and tensile strength in their polymer blends. A bioscientist wants a summary of the enzyme-inhibition pathways for specific protein kinases. A materials scientist may wonder how the porosity of their ceramic thermal-barrier composites compares with that of their pilot-scale samples.

Generic AI tools can answer the general-knowledge part of such questions, but they cannot connect those answers to the proprietary experimental data, trial results, and project objectives that drive innovation in R&D, all while maintaining security and privacy.

That is the gap we set out to close with our context-aware AI Agent: ‘TalkToBodhi’.

Technical Grounding: Why Context and Intelligence Matter

Context is everything, and TalkToBodhi is built around it.

In science, the meaning of a query is inseparable from its environment: the project goals, the experiment history, and the terminology of the field. On the bodh scientific™ platform, every experiment, trial, project, and uploaded file is curated into an AI-ready state the moment it enters the system, so the agent always works from current data. Metadata keeps that data organized in ways that follow scientific logic.

But this agent goes further. We wanted an intelligence that feels subconscious: an omni-present layer in the scientist’s workflow. Not a tool you call separately, but a presence that understands the project’s objectives, adapts to the scientist’s working style, and evolves with the experimental repository.

We call this an Intelligence Augmented scientific workflow, one that empowers R&D teams and individuals to drive both efficiency and innovation.

This intelligence is not isolated. It binds with custom machine learning, advanced analytics, optimization algorithms, and automated Design of Experiments (DoE). This means the AI doesn’t just answer, it actively supports decision-making in the research cycle.

This grounding enabled us to make foundational design choices to build a long-term, scalable architecture. Below, we outline the pillars behind TalkToBodhi.

Your research partner that thinks in the background.

"This flow shows how a scientist’s query journeys through bodh scientific™. From understanding intent and pulling in project memories, to smartly blending semantic and full-text search, and weaving in domain agents for analytics, optimization, and formulations, every step is designed to turn raw questions into answers with evidence. And while you wait, it can also run background AI tasks and workflows to push the science forward."

Pillar 1: A Universal Knowledge Backbone

Early on, we made a pivotal decision: instead of splitting data across specialized stores, we standardized the entire platform on PostgreSQL, given its proven relational core and extensibility.

Hybrid storage: JSONB lets us represent flexible metadata for trial datasets and custom client databases.
Ranked full-text search: tsvector handles precise factual lookups (e.g., “Find tensile strength > 40 MPa in epoxy resin trials”).
Semantic retrieval: PostgreSQL’s pgvector extension powers embedding search across research papers, patents, lab experiment notes, and bodh scientific’s AI-aware spreadsheets.
Advanced datatypes: arrays, ranges, and geometries serve scientific and analytics workloads.

With a coordinated search plan instead of a single-shot retrieval, PostgreSQL allows us to blend semantic, factual, and structured queries into one system.
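
One way to picture a coordinated search plan is as rank fusion: run the full-text and vector searches separately, then merge their rankings. The sketch below uses reciprocal rank fusion, which is our choice for illustration and not necessarily how TalkToBodhi blends results. The trial IDs, the function name, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each ID scores 1 / (k + rank) in every list it appears in; the
    scores are summed, so items ranked well by several retrievers rise.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical result lists, as they might come back from a tsvector
# query and a pgvector nearest-neighbour query respectively.
full_text_hits = ["trial-17", "trial-04", "trial-22"]
semantic_hits = ["trial-04", "trial-31", "trial-17"]

fused = reciprocal_rank_fusion([full_text_hits, semantic_hits])
print(fused)  # ['trial-04', 'trial-17', 'trial-31', 'trial-22']
```

Trials that both retrievers surface land at the top, while a trial found by only one retriever still makes the list.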

Pillar 2: Powerful, Fail-Safe Agentic Orchestration

We built TalkToBodhi on the LangGraph SDK, leveraging its infrastructure for long-running, stateful workflows. This enabled us to run a multi-agent system with context management and long-term memory.

Interrupts and branching: agents can pause, delegate, and resume.
Dynamic tool routing: queries are classified and mapped to retrievals, inference pipelines, or sub-agents.
Shared state: context (project goals, experiment history, user style) persists across steps.
Time travel and restoration: the workflow can return to an earlier checkpoint and resume from that state.
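
To make routing, shared state, and restoration concrete, here is a minimal plain-Python sketch. It does not use LangGraph's API; the class, the keyword router, and the agent names are hypothetical stand-ins that only illustrate the ideas.

```python
import copy


class CheckpointedState:
    """Shared agent state that snapshots itself after every update."""

    def __init__(self, initial):
        self.state = dict(initial)
        self.history = [copy.deepcopy(self.state)]  # checkpoint 0

    def update(self, **changes):
        """Apply changes and return the index of the new checkpoint."""
        self.state.update(changes)
        self.history.append(copy.deepcopy(self.state))
        return len(self.history) - 1

    def restore(self, checkpoint):
        """Rewind the live state to an earlier checkpoint."""
        self.state = copy.deepcopy(self.history[checkpoint])


def route(query):
    """Toy keyword router; a real classifier would likely be a model call."""
    text = query.lower()
    if "optimi" in text:
        return "optimization_agent"
    if "summar" in text:
        return "document_retrieval"
    return "trial_lookup"


session = CheckpointedState({"project_goal": "thermal barrier coating"})
first = session.update(last_route=route("Summarize the sintering reports"))
session.update(last_route=route("Optimize cost vs porosity"))
session.restore(first)  # step back to the state after the first query
print(session.state["last_route"])  # document_retrieval
```

Snapshots are deep copies, so restoring one checkpoint never mutates the others.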

It’s fast by design, graceful under load. It lives where the work lives.

Pillar 3: UX Centered Around Intuitiveness and Trust

In R&D, trust and usability are everything. We designed the UX around three principles:

Recency: all context updates in real time. As new data is uploaded, its ingestion status is visible.
Traceability: every answer cites its sources (trials, documentation, external search, domain repositories).
Native integration: voice-enabled, conversational, embedded directly in the project workspace.

It captures the essence of your work: your hypotheses, not the distractions.

Pillar 4: Document AI for Hybrid Chunking and Rich Context

Data in R&D often comes from research papers, patents, product sheets, and project reports. These are not simple text files; they contain tables, figures, formulae, and charts. If an AI agent cannot interpret these, it will miss critical context.

To address this, we integrated Docling (IBM’s open-source Document AI) into our ingestion pipeline:

Rich extraction: parses not just text, but also tables, formulae, charts, and figures.
Context integration: all extracted content is embedded (pgvector) and stored with metadata (JSONB).
Hybrid chunking: segments documents into semantically meaningful and token-efficient chunks.
BYOM: support pipelines with custom or fine-tuned local models.

This ensures that uploaded content is AI-ready the moment it enters the platform, regardless of format.
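
The idea behind hybrid chunking can be sketched in a few lines: respect document structure first, then enforce a token budget. This is not Docling's implementation. The function name is hypothetical, the input is assumed to be already-parsed (heading, text) elements, and tokens are approximated by whitespace-separated words.

```python
def hybrid_chunks(elements, max_tokens=10):
    """Chunk (heading, text) elements by structure, then by size.

    Split pass: an element longer than max_tokens is cut into windows.
    Merge pass: a piece joins the previous chunk when both share a
    heading and the combined length still fits the budget.
    """
    chunks = []
    for heading, text in elements:
        words = text.split()
        for start in range(0, len(words), max_tokens):
            piece = words[start:start + max_tokens]
            last = chunks[-1] if chunks else None
            fits = (
                last is not None
                and last["heading"] == heading
                and len(last["words"]) + len(piece) <= max_tokens
            )
            if fits:
                last["words"].extend(piece)
            else:
                chunks.append({"heading": heading, "words": list(piece)})
    # Keep the heading with every chunk so context survives the split.
    return [{"heading": c["heading"], "text": " ".join(c["words"])} for c in chunks]


# Hypothetical parsed elements from a project report.
elements = [
    ("Methods", "Samples were cured at 120 C"),
    ("Methods", "for two hours"),
    ("Results", "Tensile strength rose with crystallinity across all epoxy "
                "blends tested in the pilot batch"),
]
chunks = hybrid_chunks(elements)
for chunk in chunks:
    print(chunk)
```

Here the two short Methods elements merge into one chunk, while the long Results element is split in two. No chunk crosses a section boundary.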

The Query Understanding Breakthrough

We split thinking from speaking: first the system understands, then it speaks.

One of the biggest learnings in building TalkToBodhi was that understanding a query is a complex task in itself. Initially, we merged query interpretation and answer generation into a single prompt. It worked sometimes, but it also exposed two problems:

Cognitive overload: too much context for the model to process at once.
Lack of transparency: unclear what the model understood before producing an answer.

The breakthrough came when we split TalkToBodhi into two distinct components:

Stage 1: Query Understanding - the agent analyzes the query, session history, past answers, and project memories. It identifies the type, complexity, keywords, embedded sub-questions, and tools/agents to invoke. It does not attempt to answer; instead, it creates a structured understanding record.

Stage 2: Answer Curation - the agent retrieves from the right sources and synthesizes a response with citations.
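
The two stages can be sketched as two functions joined by an inspectable record. Everything here is illustrative: the field names, the rule-based logic standing in for model calls, and the tool names are assumptions, not TalkToBodhi's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class QueryUnderstanding:
    """Structured record produced by Stage 1; it contains no answer."""
    query: str
    query_type: str
    sub_questions: list = field(default_factory=list)
    tools: list = field(default_factory=list)


def understand(query):
    """Stage 1 stand-in: rule-based here; in practice likely an LLM call."""
    is_comparison = "compare" in query.lower()
    sub_questions = [p.strip() for p in query.split(" and ") if p.strip()]
    tools = ["trial_search"] + (["analytics_agent"] if is_comparison else [])
    query_type = "comparison" if is_comparison else "lookup"
    return QueryUnderstanding(query, query_type, sub_questions, tools)


def curate_answer(record, sources):
    """Stage 2 stand-in: gather evidence per tool and cite every source."""
    citations = [doc for tool in record.tools for doc in sources.get(tool, [])]
    summary = f"{len(citations)} source(s) found for a {record.query_type} query"
    return {"answer": summary, "citations": citations}


# Hypothetical sources keyed by the tool that would retrieve them.
sources = {
    "trial_search": ["trial-12", "trial-19"],
    "analytics_agent": ["porosity-report"],
}
record = understand("Compare porosity in pilot samples and summarize sintering conditions")
result = curate_answer(record, sources)
print(record)
print(result)
```

Because Stage 1 returns a plain record, it can be logged or shown to the user before any answer is generated, which addresses the transparency problem described above.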

Beyond TalkToBodhi: The Scientific AI Ecosystem

TalkToBodhi is one part of a broader agentic and custom AI ecosystem on the bodh scientific™ platform:

Formulation Agent: suggests new candidate formulations in polymers, coatings, drug excipients and more.
Optimization Agents: multi-objective optimization (e.g., cost vs strength in composites).
Automated DoE: proposes minimal, information-rich experiment sets powered by state-of-the-art optimizers.
Custom ML Pipelines: retrain predictive models as repositories grow and incorporate lab trial data and feedback.
Analytical + Deep Learning Models: integrated into the orchestration layer.
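
As a small illustration of what multi-objective optimization means in the cost-versus-strength example, the sketch below finds the Pareto front of a candidate set: the options that no other option beats on both objectives. The candidates and their values are invented, and the platform's optimizers are presumably far more sophisticated.

```python
def pareto_front(candidates):
    """Return names of candidates not dominated on (cost, strength).

    Lower cost and higher strength are better. A candidate is dominated
    when another is at least as good on both and strictly better on one.
    """
    front = []
    for name, cost, strength in candidates:
        dominated = any(
            c <= cost and s >= strength and (c < cost or s > strength)
            for _, c, s in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical composites: (name, cost, tensile strength).
composites = [
    ("A", 10.0, 40.0),
    ("B", 12.0, 55.0),
    ("C", 15.0, 50.0),  # dominated by B: costs more and is weaker
    ("D", 20.0, 70.0),
]
print(pareto_front(composites))  # ['A', 'B', 'D']
```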

Conclusion: An Omni-Present Intelligence for R&D

TalkToBodhi is a subconscious, omni-present intelligence layer inside the scientist’s workflow. It is aligned with project objectives, aware of experimental history, and fluent in the language of science.

By binding together custom ML, advanced analytics, optimization, and smart DoE, it delivers an Intelligence Augmented scientific workflow: empowering individuals and teams to accelerate discovery and create IP that moves industries forward.

This is the future of AI in R&D: not a bolt-on tool, but a native intelligence that grows with science itself.