
Sequential Tool Calling and Response #73

Open
Tomobobo710 wants to merge 3 commits into platinum-hill:main from Tomobobo710:feature/sequential-tool-calling

Conversation

Tomobobo710 (Contributor) commented Jun 8, 2025

Description

The AI can now call tools and use the results before a response is completed.

This PR reworks how the AI processes tool calling, thinking, and chat, rather than forcing a rigid think -> tool -> chat structure.
In addition, it refactors ChatInterface.tsx into smaller chunks, breaking it down into these components:

  • ChatInput.tsx
  • MessageBlock.tsx
  • TextBlock.tsx
  • ThinkingBlock.tsx
  • ToolCallBlock.tsx

This makes each piece of the chat rendering easier to manage individually.

Changes Made:

  • Responses are now returned in "blocks".
  • Blocks can contain thinking, tool calls, or chat.
  • The AI can call multiple tools inline during a response.
  • MCP tool call context is sequentially fed back to the AI, informing it of each tool result during the response.
  • Tool calls are no longer bunched together; each tool now has its own dropdown.
  • Tool and reasoning dropdowns appear in position within the response, not fixed at the top.
  • AI "chat" responses (not tool, not thinking) are likewise rendered in sequential position, without a dropdown.
  • Tool dropdowns stream both the tool request and its response.
  • All dropdowns are collapsed by default but can be expanded.
  • A badge on each dropdown shows its active status ("THINKING", "EXECUTING", "ERROR").
  • When complete, dropdown headers show a badge with the total execution time for that block.
  • Any React rendering tags are scrubbed to ensure the AI context is not contaminated.
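The block model described above can be sketched as a small discriminated union. This is a hypothetical illustration, not Cobolt's actual types; the status names beyond the badges quoted above (e.g. `DONE`) are assumptions:

```typescript
// Hypothetical sketch of the block-based response model described above.
type ThinkingBlock = { kind: 'thinking'; content: string; durationMs?: number };
type ToolCallBlock = {
  kind: 'tool';
  name: string;
  request: string;       // streamed tool request
  response?: string;     // streamed tool result
  status: 'EXECUTING' | 'ERROR' | 'DONE'; // 'DONE' is an assumed terminal state
};
type TextBlock = { kind: 'text'; content: string };
type MessageBlock = ThinkingBlock | ToolCallBlock | TextBlock;

// Blocks are kept in the order the model produced them, so the UI can render
// thinking / tool / chat segments "in position" with a badge per dropdown.
function renderBadge(block: MessageBlock): string {
  switch (block.kind) {
    case 'thinking':
      return 'THINKING';
    case 'tool':
      return block.status;
    case 'text':
      return ''; // plain chat renders without a dropdown or badge
  }
}
```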

Additionally:
  • The cancel generation button has been moved outside of the chat input box.
  • A clickable button to send the chat message to the AI has been added in its place.
  • Right-click contextual tools (copy, paste, etc.) are now available in debug mode.

But Why?

This lets the AI do "in-response learning" by streaming relevant external information into its response as it is generated.
A simple example: the AI calls a tool, the call fails, the AI learns from the tool response, and it makes a second tool call within the same response.
This is largely model-dependent, but in testing, several models were able to use this flow in an organic way.
Without this opportunity, tool use feels heavily restricted and much less useful compared with a UI that supports it.
Fixes #22
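The fail-then-retry flow above reduces to a loop: ask the model, execute any tool calls it emits, feed the results back, and repeat until a round produces no tool calls. A hedged sketch of that loop; the `chat` and `execTool` callbacks are stand-ins for the real Ollama client and MCP dispatch, not Cobolt's actual API:

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };
type ChatTurn = { role: 'user' | 'assistant' | 'tool'; content: string };
type ChatResult = { content: string; toolCalls: ToolCall[] };

// Sequential tool-calling loop: keep chatting until the model stops
// requesting tools. `chat` and `execTool` are injected stand-ins for the
// real model call and MCP tool execution.
async function runSequentialTurn(
  history: ChatTurn[],
  chat: (history: ChatTurn[]) => Promise<ChatResult>,
  execTool: (call: ToolCall) => Promise<string>,
  maxRounds = 5, // guard against a model that requests tools forever
): Promise<ChatTurn[]> {
  for (let round = 0; round < maxRounds; round++) {
    const result = await chat(history);
    history.push({ role: 'assistant', content: result.content });
    if (result.toolCalls.length === 0) break; // conversation complete
    for (const call of result.toolCalls) {
      // Feed each tool result (including failures) back so the model can
      // react to it within the same response.
      const output = await execTool(call);
      history.push({ role: 'tool', content: output });
    }
  }
  return history;
}
```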

Thinking Ahead

I do plan to look into options for actual tool request streaming, but this PR is focused on establishing building blocks: not only for per-tool request streaming, but for breaking even further away from any structure controlled even by the AI itself. Note that https://ollama.com/blog/streaming-tool only integrates tool result context during the response; contrary to the feature's name, the tool call itself is not streamed. This PR refactors several oversized functions into much smaller pieces and lets Cobolt use Ollama's "tool streaming" features.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

Screenshots

(Animated demo and screenshots attached to the PR.)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested the app on Mac OS (If not, leave unchecked so we can test before merging)
  • I have tested the app on Windows (If not, leave unchecked so we can test before merging)
  • I have tested the app on Linux (If not, leave unchecked so we can test before merging)

Comment thread src/cobolt-backend/query_engine.ts Outdated

// Get ollama client and constants
const ollama = getOllamaClient();
const defaultTemperature = 1.0;
Collaborator
these variables should be in config

Contributor Author
What config? Sorry..
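For what it's worth, "in config" presumably means hoisting tunables like the temperature into a central settings module instead of hard-coding literals in query_engine.ts. A minimal hypothetical sketch; the names and shape are assumptions, not Cobolt's actual config:

```typescript
// Hypothetical config module along the lines the reviewer suggests.
interface ModelConfig {
  defaultTemperature: number;
  chatModel: string;
}

const DEFAULT_CONFIG: ModelConfig = {
  defaultTemperature: 1.0,      // matches the literal in the snippet above
  chatModel: 'example-model',   // placeholder, not a real Cobolt setting
};

// Callers read tunables from config instead of hard-coding literals.
function getTemperature(config: ModelConfig = DEFAULT_CONFIG): number {
  return config.defaultTemperature;
}
```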

Comment thread src/cobolt-backend/query_engine.ts Outdated
let response;
try {
response = await ollama.chat({
model: MODELS.CHAT_MODEL, // Use chat model, not tools model
Collaborator
we are using a chat model, and a system prompt that does not use tools, even though we are passing tools to Ollama

Tomobobo710 (Contributor Author), Jun 8, 2025
This is a concept I'm pretty sure I ran over entirely, so I'd love to hear your overall concepts and goals in this area so we can get on the same page about the operation and scope of Cobolt as a whole.

From my original perspective, it felt much too early to implement high-level concepts like this. Redoing the entire rendering and generation pipelines just to implement visible tool calling and its use during the response was already a big lift, and this concept felt too heady for where the codebase was at the start of this journey. I'm saying I didn't mow this stuff down on purpose, and I left this point of entry for future improvement. I know it has importance to you, but within the scope of this PR it is on life support.

Collaborator
I share @gauravagerwala's concern here. It is possible that the chat model we are using does not support tool calling. Since the models are executed locally, it would be unreasonable to expect a single model to perform optimally across all tasks. Our approach with Cobolt has been to use smaller, specialized models that excel at specific functions, such as tool calling and reasoning.

Tomobobo710 (Contributor Author), Jun 8, 2025
I think I understand now. Your plan here is for internal tools to use small, specialized models to complete tasks within Cobolt. Is that correct? This wasn't clicking for me until just now.

Comment thread src/cobolt-backend/query_engine.ts Outdated

// If no tool calls, conversation is complete
if (detectedToolCalls.length === 0) {
conversationComplete = true;
Collaborator
If no tool calls are found for a query, we should call the model again with a different prompt and ask it to respond to the user (instead of using tools).

Tomobobo710 (Contributor Author), Jun 8, 2025
Similarly, I probably ran over whatever prompt concept you were working towards, so this is something else we can discuss and figure out. I see you have plans for agentic situations, but I am more focused on revealing the true nature of the models themselves.

During this PR I was torn and constantly worried about stepping on toes; my intention was never to mow down and destroy the work you've done on this prompt-switching logic. It's just that, to me, the conductor handles concepts like this in a similar way, so in my mind it could eventually be implemented there rather than through this prompt concept. But you're right: to keep it, it would need to be re-implemented here.

I'd love to discuss this further.

Comment thread src/cobolt-backend/query_engine.ts Outdated
/**
* Creates a generator for sequential inline tool calling response
*/
private async *createSequentialResponseGenerator(
Collaborator
This function is too big; we need to break it down.

Tomobobo710 (Contributor Author), Jun 8, 2025
Totally agree. I tend to let things grow to ridiculous proportions so that concepts can mature and clean, complete separations become obvious. I've already made this split over in conductor mode: query_engine.ts is down to about 150 lines, tools have their own util, and generators and simple chat are in their own files, which breaks down the generator a lot by refactoring the tool logic. But I'm totally on board with breaking it down further.

* @param toolCalls - the list of FunctionTools to pass with the query
* @returns - The response from the LLM
*/
async function queryOllamaWithTools(requestContext: RequestContext,
Collaborator
We need this to process tool calls

Contributor Author
This one I'm not sure I follow. I've completely removed this function; it was swallowed whole by the new processRagRatQuery().

Tomobobo710 mentioned this pull request on Jun 8, 2025 (11 tasks).
Comment thread src/cobolt-backend/query_engine.ts Outdated
});

// SAVE TO MEMORY AFTER EVERY RESPONSE (if enabled)
if (isMemoryEnabled() && chatContent.trim()) {
Collaborator
We don't need this. Saving the initial query and final response is sufficient. This will pollute the memory a lot and make the app much slower.
FYI, multiple LLM calls are made to update memories.
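The suggestion above amounts to gating the memory save on the final response of an exchange rather than saving after every block. A hedged sketch of that gate; the function name and parameters are hypothetical, not Cobolt's actual memory API:

```typescript
// Hypothetical predicate implementing the reviewer's suggestion: persist
// memory once per exchange (final response only) instead of after every block.
function shouldSaveToMemory(
  memoryEnabled: boolean,
  isFinalBlock: boolean,
  content: string,
): boolean {
  // Only the final, non-empty response is worth saving; saving intermediate
  // thinking/tool blocks pollutes memory, and each save triggers extra LLM
  // calls to update memories.
  return memoryEnabled && isFinalBlock && content.trim().length > 0;
}
```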

Tomobobo710 (Contributor Author), Jun 8, 2025
I was really wary of memory. In testing it was unreliable at best, and I'd love a demystification of your concepts surrounding it. I considered it out of scope for this PR's concept, and though I bumped into it often, it's largely unknown to me. I made my best attempt to keep it functional, and what you've pinned here is literally me scrambling to debug it for hours.

Tomobobo710 (Contributor Author), Jun 8, 2025
In fact, my recommendation would be to replace it with MCP. That would reduce code complexity and modularize the concept entirely. Though I can see that models without tool calling might be problematic; it's a whole can of worms.

Tomobobo710 (Contributor Author), Jun 8, 2025
In practice, the memories saved aren't even the raw query or chat; they're AI-generated summaries somehow.

Tomobobo710 (Contributor Author), Jun 8, 2025
And there's no way to manage the memory; to me it felt unfinished. At a certain point I glossed over it and focused on the massive UI and engine changes. Would love clarity.

Collaborator
Hope you have clarity on this now.

Comment thread src/main/main.ts
}

// Load the chat history for this specific chat
chatHistory.clear();
Collaborator
was this a bug?

Tomobobo710 (Contributor Author), Jun 8, 2025
Yes, and I think by the time conductor is in, it's gone, but the code still remains in main. Ultimately this should be investigated further.

Comment thread src/main/menu.ts
setupDevelopmentEnvironment(): void {
this.mainWindow.webContents.on('context-menu', (_, props) => {
const { x, y } = props;
const { selectionText, editFlags } = props;
Collaborator
this is super helpful. thanks!

Contributor Author
(reaction GIF)

Tomobobo710 (Contributor Author) commented Jun 8, 2025

Essentially, my original intentions and involvement with Cobolt secretly included the whole conductor concept. The plan was always to get to conductor mode, but I started small with transparent tool calling, then moved toward what eventually became conductor in the PR that was reverted from main. Seeing the problems with that PR made me scope down a little, and that's the focus of this current PR: just get the "block" concept cleanly in place so that, in the future, the conductor can formulate AI responses from these blocks. Like steps on a ladder.



Development

Successfully merging this pull request may close these issues.

Add MCP tool call error handling and Retries

3 participants