Sequential Tool Calling and Response by Tomobobo710 · Pull Request #73 · platinum-hill/cobolt

Tomobobo710 · 2025-06-08T07:01:42Z

Description

The AI can now call tools and use the results before a response is completed.

Re-do the way the AI processes tool calling, thinking, and chat rather than forcing a rigid structure like think->tool->chat.
In addition, refactor the ChatInterface.tsx into smaller chunks, breaking it down into these MessageBlock components:

ChatInput.tsx
MessageBlock.tsx
TextBlock.tsx
ThinkingBlock.tsx
ToolCallBlock.tsx

Which makes it less daunting to individually manage each component of the chat rendering.

Changes Made:

Responses now are returned in "blocks".
Blocks can contain thinking, tool calls, or chat.
AI can call multiple tools in line during response.
MCP tool call context is sequentially fed back to the AI to inform the AI of the tool result during response.
Tool calls are no longer bunched together and now have individual dropdowns per-tool.
Tool and Reasoning dropdowns appear "in position" within the response, not fixed at the top of the response.
AI "chat" responses (not tool, not thinking) will similarly be rendered in sequential position without dropdown.
Tool dropdowns have tool request and response streamed.
All dropdowns are collapsed by default, but can be expanded.
A badge for the dropdown will show its active status ("THINKING", "EXECUTING", "ERROR").
When complete, dropdown headers will show a badge which displays the total time taken by the execution for that block.
Any of the react rendering tags are scrubbed to ensure the AI context is not contaminated.

Additionally:
The cancel generation button has been moved outside of the chat input box.
A clickable button to send the chat message to the AI has been added in it's place.
Allows the use of right click contextual tools like copy, paste, etc. in debug mode.

But Why?

This lets the AI use "in-response learning" by essentially streaming in relevant external info during their response.
A simple example would be AI tool use -> fail -> AI learns from the tool response -> 2nd tool use in a single response.
This is largely model dependent, but in testing, several models were able to use this flow in an organic way.
Without this opportunity, tool use feels heavily restricted and much less useful when compared with a UI that supports this.
Fixes #22

Thinking Ahead

I do plan on looking into options for actual tool request streaming, but this PR is focused on establishing building blocks not only for individual tool request streaming, but the ability to break even further away from any structure controlled even by the AI itself. Ultimately, https://ollama.com/blog/streaming-tool is only integrated tool result context during the response, and contrary to the name of the feature, the tool call is not streamed. This PR refactors a lot of the big bad functions into much smaller bites, and lets Cobolt use the "tool streaming" features of Ollama.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Performance improvement
Code refactoring (no functional changes)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

Screenshots

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have tested the app on Mac OS (If not, leave unchecked so we can test before merging)
I have tested the app on Windows (If not, leave unchecked so we can test before merging)
I have tested the app on Linux (If not, leave unchecked so we can test before merging)

gauravagerwala · 2025-06-08T07:43:46Z

+
+        // Get ollama client and constants
+        const ollama = getOllamaClient();
+        const defaultTemperature = 1.0;


these variables should be in config

What config? Sorry..

gauravagerwala · 2025-06-08T07:44:50Z

+        let response;
+        try {
+          response = await ollama.chat({
+            model: MODELS.CHAT_MODEL, // Use chat model, not tools model


we are using a chat model, and a system prompt that does not use tools, even though we are passing tools to Ollama

This is a concept I'm pretty sure I ran over entirely, so I'd love to hear you guy's overall concepts and goals in this area so we can get on the same page about the overall operation and scope of Cobolt as a whole.

Like, from my original perspective like, it felt much too early to try to implement high level concepts like this, and of course this is from the perspective of redoing the entire rendering and generation pipelines just to implement visible tool calling and it's use during response like, this concept is too heady for where the codebase was at the start of this journey. I guess I'm saying that I didn't totally mow this stuff down on purpose, and left this point of entry for future improvement. Because I know to you guys it has importance, but in the scope of this PR it is on life support.

I share @gauravagerwala's concern here. It is possible that the the chat model we are using does not support tool calling. Since the models are executed locally, it would be unreasonable to expect a single model to perform optimally across all tasks. Our approach with Cobolt has been to utilize smaller, specialized models that excel at specific functions, such as tool calling and reasoning.

I think I understand now. Your plan here is for internal tools to use baby models to complete tasks within Cobolt. Is that correct? This wasn't clicking for me until just now...

gauravagerwala · 2025-06-08T07:54:56Z

+
+        // If no tool calls, conversation is complete
+        if (detectedToolCalls.length === 0) {
+          conversationComplete = true;


If no tool calls found for a query, we should call with a different prompt. And ask it to respond to user (instead of use tools)

Similarly, I probably ran over any kind of prompt concept you guys were working towards. So this would be like, something again we can discuss and figure out. Like, I see you guys have plans for agentic situations, but for me, I am more focused on revealing the true nature of the models themselves.

And the feelings I have on this like, during this crazy PR I was torn and constantly scared of stepping on toes here because my intentions were not to just to mow down and destroy the work you guys have done in this kind of prompt switching logic. It's just that like to me, the conductor handles concepts like this similarly, so in my mind I guess I was thinking eventually it could be implemented in that way and not through this prompt concept but idk, to keep it, you're right it would need to be re-implemented here.

I'd love to discuss this further.

gauravagerwala · 2025-06-08T08:12:04Z

+  /**
+   * Creates a generator for sequential inline tool calling response
+   */
+  private async *createSequentialResponseGenerator(


This function is too big. we need to break it down

Totally agree. I have a mentality which lets things grow to ridiculous proportions, which lets concepts grow and mature into fleshed out concepts so that clear and complete separations can be made, and already have made this split over in conductor mode, reducing query_engine.ts to like 150 lines, making a util for tools, separating generators and simple chat into their own files, which definitely breaks down the generator a lot by refactoring the tool logic, but I am totally on board breaking it down further.

gauravagerwala · 2025-06-08T08:13:59Z

- * @param toolCalls - the list of FunctionTools to pass with the query
- * @returns - The response from the LLM
- */
-async function queryOllamaWithTools(requestContext: RequestContext,


We need this to process tool calls

This one I'm not sure I follow.. I've just totally run over this function and it was swallowed whole by the new processRagRatQuery().

gauravagerwala · 2025-06-08T08:44:55Z

+        });
+
+        // SAVE TO MEMORY AFTER EVERY RESPONSE (if enabled)
+        if (isMemoryEnabled() && chatContent.trim()) {


we dont need this. Saving the initial query and final response is sufficient. This will pollute the memory a lot, and make the app much slower.
FYI - multiple LLM calls are made to update memories

I was really scared of memory. In testing it was unreliable at best. I'd love a demystification of your concepts surrounding it. I considered it out of scope for this pr's concept, and though I bumped into it often, to me it's totally unknown. I made my best attempts here to keep it functional, and what you've pinned here is literally me scrambling to debug it for hours.

In fact, my recommendation would be to replace it with MCP. It would reduce the code complexity, and modularize the concept entirely. Though I see models without tool calling might be problematic idk, it's like a whole can of worms.

Like, in practice, the memories saved aren't even raw query or chat, they're like AI summaries somehow..

And there's no way to manage the memory like, idk, to me it felt unfinished. And at a certain point I glazed over it and focused on the massive UI and engine changes. Would love clarity.

Hope you have clarity on this now.

gauravagerwala · 2025-06-08T08:47:10Z

    }

+    // Load the chat history for this specific chat
+    chatHistory.clear();


was this a bug?

Yes, and I think by the time conductor is in it's gone, but the code still remains in main.. Ultimately this should be investigated further.

gauravagerwala · 2025-06-08T08:47:23Z

  setupDevelopmentEnvironment(): void {
    this.mainWindow.webContents.on('context-menu', (_, props) => {
      const { x, y } = props;
+      const { selectionText, editFlags } = props;


this is super helpful. thanks!

Tomobobo710 · 2025-06-08T11:56:09Z

Essentially, my original intentions and involvement with Cobolt secretly included the whole conductor concept. The plan was always to get to the conductor mode, but I started small with the transparent tool calling, then moved toward what eventually became conductor with the PR that was reverted from main. Seeing the problems with that PR, made me scope down a little and that's the focus of this current PR, just get the "block" concept cleanly in place so that in the future, the conductor can formulate the AI responses from these blocks. Like steps on a ladder.

Initial proposal

1f99b58

Tomobobo710 requested review from CoderHam, Rishabh4275, gauravagerwala and pulkitjuneja as code owners June 8, 2025 07:01

revert

3324042

gauravagerwala reviewed Jun 8, 2025

View reviewed changes

Tomobobo710 mentioned this pull request Jun 8, 2025

Implement conductor mode #74

Open

11 tasks

gauravagerwala reviewed Jun 8, 2025

View reviewed changes

pulkitjuneja mentioned this pull request Jun 8, 2025

Add MCP tool call error handling and Retries #22

Closed

Restore multi-model, multi-prompt, small model for internal tools

e900506

Conversation

Tomobobo710 commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes Made:

But Why?

Thinking Ahead

Type of Change

How Has This Been Tested?

Screenshots

Checklist:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Tomobobo710 commented Jun 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Tomobobo710 commented Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 Jun 8, 2025 •

edited

Loading

Tomobobo710 commented Jun 8, 2025 •

edited

Loading