Sequential Tool Calling and Response #73
Tomobobo710 wants to merge 3 commits into platinum-hill:main
// Get ollama client and constants
const ollama = getOllamaClient();
const defaultTemperature = 1.0;
These variables should be in config.
What config? Sorry.
let response;
try {
  response = await ollama.chat({
    model: MODELS.CHAT_MODEL, // Use chat model, not tools model
We are using a chat model and a system prompt that does not use tools, even though we are passing tools to Ollama.
This is a concept I'm pretty sure I ran over entirely, so I'd love to hear your overall concepts and goals in this area so we can get on the same page about the operation and scope of Cobolt as a whole.
From my original perspective, it felt much too early to implement high-level concepts like this. I was redoing the entire rendering and generation pipelines just to implement visible tool calling and its use during a response, and this concept was too heady for where the codebase was at the start of this journey. I didn't mow this stuff down on purpose, and I left this point of entry for future improvement. I know it is important to you, but within the scope of this PR it is on life support.
I share @gauravagerwala's concern here. It is possible that the chat model we are using does not support tool calling. Since the models are executed locally, it would be unreasonable to expect a single model to perform optimally across all tasks. Our approach with Cobolt has been to use smaller, specialized models that excel at specific functions, such as tool calling and reasoning.
I think I understand now. Your plan here is for internal tools to use baby models to complete tasks within Cobolt. Is that correct? This wasn't clicking for me until just now...
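As a rough illustration of the specialized-model approach described above, the sketch below routes each kind of task to its own model. Only `MODELS.CHAT_MODEL` appears in the diff; the `TOOLS_MODEL` and `REASONING_MODEL` keys, the model name strings, and `pickModel` are hypothetical and stand in for whatever Cobolt actually configures.

```typescript
// Hypothetical sketch: route each kind of work to a small, specialized model.
type Task = "chat" | "tools" | "reasoning";

// Only CHAT_MODEL is named in the diff; the other keys and all values are placeholders.
const MODELS = {
  CHAT_MODEL: "example-chat-model",
  TOOLS_MODEL: "example-tools-model",
  REASONING_MODEL: "example-reasoning-model",
} as const;

// One entry per task, so adding a task without choosing a model is a type error.
const MODEL_FOR_TASK: Record<Task, string> = {
  chat: MODELS.CHAT_MODEL,
  tools: MODELS.TOOLS_MODEL,
  reasoning: MODELS.REASONING_MODEL,
};

function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}
```

Under this assumption, a request that passes tools would call `pickModel("tools")` rather than reusing the chat model.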
// If no tool calls, conversation is complete
if (detectedToolCalls.length === 0) {
  conversationComplete = true;
If no tool calls are found for a query, we should call the model again with a different prompt that asks it to respond to the user instead of using tools.
Similarly, I probably ran over any kind of prompt concept you were working towards, so this is again something we can discuss and figure out. I see you have plans for agentic situations, but I am more focused on revealing the true nature of the models themselves.
During this PR I was torn and constantly scared of stepping on toes, because my intention was not to mow down and destroy the work you have done on this prompt-switching logic. To me, the conductor handles concepts like this similarly, so I was thinking it could eventually be implemented that way rather than through this prompt concept. But you're right: to keep it, it would need to be re-implemented here.
I'd love to discuss this further.
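For discussion, here is a minimal sketch of the prompt-switching idea raised in this thread: ask once with a tool-oriented prompt, and if no tool calls come back, ask again with a prompt that requests a direct answer. `ChatTurn`, `AskModel`, both prompt strings, and `answerWithFallback` are hypothetical; the model call is injected so nothing here depends on the Ollama client.

```typescript
// Hypothetical shapes; Cobolt's real types and prompts may differ.
interface ChatTurn {
  content: string;
  toolCalls: string[]; // names of the tools the model asked for
}

// Injected model call, so the sketch does not depend on any client library.
type AskModel = (systemPrompt: string, userQuery: string) => Promise<ChatTurn>;

const TOOL_PROMPT = "Decide which tools, if any, are needed for this query.";
const ANSWER_PROMPT = "Respond directly to the user without using tools.";

async function answerWithFallback(ask: AskModel, query: string): Promise<ChatTurn> {
  const first = await ask(TOOL_PROMPT, query);
  // Tool calls were requested: hand the turn back so the caller can execute them.
  if (first.toolCalls.length > 0) return first;
  // No tool calls: ask again with a prompt aimed at a plain answer.
  return ask(ANSWER_PROMPT, query);
}
```

The cost of this design is a second model call whenever no tools are needed, which is worth weighing against the conductor-based alternative mentioned above.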
/**
 * Creates a generator for sequential inline tool calling response
 */
private async *createSequentialResponseGenerator(
This function is too big. We need to break it down.
Totally agree. I tend to let things grow to ridiculous proportions so that concepts can mature until clear and complete separations can be made. I have already made this split over in conductor mode: query_engine.ts is reduced to about 150 lines, there is a util for tools, and the generators and simple chat are separated into their own files. Refactoring the tool logic breaks the generator down a lot, but I am totally on board with breaking it down further.
 * @param toolCalls - the list of FunctionTools to pass with the query
 * @returns - The response from the LLM
 */
async function queryOllamaWithTools(requestContext: RequestContext,
We need this to process tool calls.
I'm not sure I follow this one. I've totally run over this function, and it was swallowed whole by the new processRagRatQuery().
});

// SAVE TO MEMORY AFTER EVERY RESPONSE (if enabled)
if (isMemoryEnabled() && chatContent.trim()) {
We don't need this. Saving the initial query and final response is sufficient. This will pollute the memory a lot and make the app much slower.
FYI: multiple LLM calls are made to update memories.
I was really scared of memory. In testing it was unreliable at best. I'd love a demystification of your concepts surrounding it. I considered it out of scope for this PR, and though I bumped into it often, it's totally unknown to me. I made my best attempts to keep it functional, and what you've pinned here is literally me scrambling to debug it for hours.
In fact, my recommendation would be to replace it with MCP. That would reduce the code complexity and modularize the concept entirely. Models without tool calling might be problematic, though; it's a whole can of worms.
In practice, the memories saved aren't even the raw query or chat; they appear to be AI summaries.
And there's no way to manage the memory; to me it felt unfinished. At a certain point I glazed over it and focused on the massive UI and engine changes. Would love clarity.
Hope you have clarity on this now.
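To make the suggestion in this thread concrete, the sketch below saves one memory entry per exchange (initial query plus final response) instead of one per response block. `MemoryStore`, `saveExchange`, and the choice to treat the concatenated chat text as the "final response" are assumptions for illustration; Cobolt's memory layer makes LLM calls and is likely asynchronous, which this synchronous sketch leaves out.

```typescript
// Hypothetical sketch: persist one memory entry per exchange, not one per block.
interface MemoryEntry {
  query: string;
  response: string;
}

interface MemoryStore {
  add(entry: MemoryEntry): void;
}

function saveExchange(
  store: MemoryStore,
  memoryEnabled: boolean,
  query: string,
  chatBlocks: string[], // chat text emitted over the whole response
): boolean {
  // Assumption: the "final response" is all chat text joined together.
  const finalResponse = chatBlocks.join("").trim();
  if (!memoryEnabled || finalResponse === "") return false;
  store.add({ query, response: finalResponse });
  return true;
}
```

Calling this once after the generator finishes, rather than inside the loop, keeps the number of memory updates at one per user query.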
}

// Load the chat history for this specific chat
chatHistory.clear();
Yes, and I think it's gone by the time conductor is in, but the code still remains in main. Ultimately this should be investigated further.
setupDevelopmentEnvironment(): void {
  this.mainWindow.webContents.on('context-menu', (_, props) => {
    const { x, y } = props;
    const { selectionText, editFlags } = props;
This is super helpful. Thanks!
Essentially, my original intentions and involvement with Cobolt secretly included the whole conductor concept. The plan was always to get to conductor mode, but I started small with transparent tool calling, then moved toward what eventually became conductor with the PR that was reverted from main. Seeing the problems with that PR made me scope down a little, and that is the focus of this current PR: get the "block" concept cleanly in place so that in the future the conductor can formulate the AI responses from these blocks, like steps on a ladder.

Description
The AI can now call tools and use the results before a response is completed.
Re-do the way the AI processes tool calling, thinking, and chat rather than forcing a rigid structure like think->tool->chat.
In addition, refactor ChatInterface.tsx into smaller chunks, breaking it down into these MessageBlock components:
ChatInput.tsx
MessageBlock.tsx
TextBlock.tsx
ThinkingBlock.tsx
ToolCallBlock.tsx
This makes it less daunting to individually manage each component of the chat rendering.
Changes Made:
Responses now are returned in "blocks".
Blocks can contain thinking, tool calls, or chat.
AI can call multiple tools in line during response.
MCP tool call context is sequentially fed back to the AI to inform the AI of the tool result during response.
Tool calls are no longer bunched together and now have individual dropdowns per-tool.
Tool and Reasoning dropdowns appear "in position" within the response, not fixed at the top of the response.
AI "chat" responses (not tool, not thinking) will similarly be rendered in sequential position without dropdown.
Tool dropdowns have tool request and response streamed.
All dropdowns are collapsed by default, but can be expanded.
A badge for the dropdown will show its active status ("THINKING", "EXECUTING", "ERROR").
When complete, dropdown headers will show a badge which displays the total time taken by the execution for that block.
All React rendering tags are scrubbed from the AI context so it is not contaminated.
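The block model listed above could be represented roughly as follows. This is a sketch under assumptions, not the PR's actual types: every field name, the extra `"COMPLETE"` status, and the helpers `badgeFor` and `hasDropdown` are hypothetical. Only the three block kinds and the "THINKING", "EXECUTING", and "ERROR" badges come from the description.

```typescript
// Hypothetical block shapes; field names are illustrative, not taken from the PR.
// "COMPLETE" is an assumed extra status marking a finished block.
type BlockStatus = "THINKING" | "EXECUTING" | "ERROR" | "COMPLETE";

interface BaseBlock {
  id: string;
  startedAt: number; // milliseconds
  finishedAt?: number; // set once the block is done
}

interface ThinkingBlock extends BaseBlock {
  kind: "thinking";
  text: string;
  status: BlockStatus;
}

interface ToolCallBlock extends BaseBlock {
  kind: "tool";
  toolName: string;
  request: string; // streamed tool request
  response: string; // streamed tool response
  status: BlockStatus;
}

interface TextBlock extends BaseBlock {
  kind: "text";
  text: string;
}

type MessageBlock = ThinkingBlock | ToolCallBlock | TextBlock;

// Badge text for a dropdown header: live status while running, elapsed time when done.
function badgeFor(block: ThinkingBlock | ToolCallBlock): string {
  if (block.status !== "COMPLETE" || block.finishedAt === undefined) return block.status;
  return `${((block.finishedAt - block.startedAt) / 1000).toFixed(1)}s`;
}

// Thinking and tool blocks get dropdowns; chat text renders inline in position.
function hasDropdown(block: MessageBlock): block is ThinkingBlock | ToolCallBlock {
  return block.kind !== "text";
}
```

A discriminated union on `kind` lets the renderer switch on one field and keeps each component (TextBlock.tsx, ThinkingBlock.tsx, ToolCallBlock.tsx) responsible for a single shape.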
Additionally:
The cancel generation button has been moved outside of the chat input box.
A clickable button to send the chat message to the AI has been added in its place.
Right-click context menu tools like copy and paste are available in debug mode.
But Why?
This lets the AI use "in-response learning" by essentially streaming in relevant external info during their response.
A simple example would be AI tool use -> fail -> AI learns from the tool response -> 2nd tool use in a single response.
This is largely model dependent, but in testing, several models were able to use this flow in an organic way.
Without this opportunity, tool use feels heavily restricted and much less useful when compared with a UI that supports this.
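The tool use -> fail -> learn -> second tool use flow can be sketched as a loop that feeds each tool result, including failures, back into the message list before the model continues. The types and `runSequential` below are hypothetical and deliberately avoid the Ollama and MCP client APIs; the model and tool runner are injected, and `maxRounds` is an assumed safety cap.

```typescript
// Hypothetical shapes for illustration; not the Ollama or MCP client APIs.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelTurn {
  content: string;
  toolCalls: ToolCall[];
}

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

type CallModel = (messages: Message[]) => Promise<ModelTurn>;
type RunTool = (call: ToolCall) => Promise<string>;

async function runSequential(
  callModel: CallModel,
  runTool: RunTool,
  query: string,
  maxRounds = 5, // assumed cap so a looping model cannot run forever
): Promise<Message[]> {
  const messages: Message[] = [{ role: "user", content: query }];
  for (let round = 0; round < maxRounds; round++) {
    const turn = await callModel(messages);
    messages.push({ role: "assistant", content: turn.content });
    // No tool calls: the conversation is complete.
    if (turn.toolCalls.length === 0) break;
    for (const call of turn.toolCalls) {
      let result: string;
      try {
        result = await runTool(call);
      } catch (err) {
        // Failures are fed back too, so the model can adjust its next call.
        result = `ERROR: ${err instanceof Error ? err.message : String(err)}`;
      }
      messages.push({ role: "tool", content: result });
    }
  }
  return messages;
}
```

Whether a given model actually recovers after seeing the error text is model dependent, as noted above; the loop only guarantees that the failure is visible to it.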
Fixes #22
Thinking Ahead
I do plan on looking into options for actual tool request streaming, but this PR is focused on establishing building blocks, not only for individual tool request streaming but also for the ability to break even further away from any structure, even one controlled by the AI itself. Ultimately, https://ollama.com/blog/streaming-tool only integrates tool result context during the response; contrary to the name of the feature, the tool call itself is not streamed. This PR refactors a lot of the big bad functions into much smaller bites, and lets Cobolt use the "tool streaming" features of Ollama.