# How-To: Define and use agents with Ragbits
Ragbits Agent combines the reasoning power of LLMs with the ability to execute custom code through tools. This makes it possible to handle complex tasks by giving the model access to your own Python functions.
When using tool-enabled agents, the LLM reviews the system prompt and incoming messages to decide whether a tool should be called. Instead of just generating a text response, the model can choose to invoke a tool or combine both approaches.
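Under the hood, function-calling frameworks expose your Python functions to the model as JSON schemas derived from their signatures and docstrings. The sketch below is a simplified, hypothetical illustration of that derivation (not Ragbits' actual implementation), using a stub `get_weather` function:

```python
import inspect

# Minimal mapping from Python annotations to JSON Schema types (illustrative only)
_TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool_schema(func) -> dict:
    """Derive an OpenAI-style tool schema from a function's signature and docstring."""
    signature = inspect.signature(func)
    properties = {
        name: {"type": _TYPE_MAP.get(param.annotation, "string")}
        for name, param in signature.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            # Use the first docstring line as the tool description
            "description": (func.__doc__ or "").strip().split("\n")[0],
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }


def get_weather(location: str) -> str:
    """Returns the current weather for a given location."""
    return f"Sunny in {location}"


schema = function_to_tool_schema(get_weather)
print(schema["function"]["name"])  # get_weather
```

The model sees only this schema; when it decides to call the tool, the framework executes the matching Python function with the arguments the model produced.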
Before using tools, you can check whether your selected model supports function calling.
If function calling is supported and tools are enabled, the agent interprets the user input, decides whether a tool is needed, executes it if necessary, and returns a final response enriched with tool results.
This response is encapsulated in an AgentResult, which includes the model's output, additional metadata, conversation history, and any tool calls performed.
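To make the shape of such a result concrete, here is a hypothetical sketch of the fields a result object of this kind typically carries. The class and field names below are illustrative only; consult the `AgentResult` API reference for the real ones:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchAgentResult:
    """Illustrative stand-in for an agent result; field names are approximate."""

    content: str                                                     # the model's final text output
    metadata: dict[str, Any] = field(default_factory=dict)           # e.g. token usage
    history: list[dict[str, Any]] = field(default_factory=list)      # full message exchange
    tool_calls: list[dict[str, Any]] = field(default_factory=list)   # tools invoked and their results


result = SketchAgentResult(
    content="It is 22 degrees celsius in Paris.",
    tool_calls=[{"name": "get_weather", "arguments": {"location": "Paris"}}],
)
print(result.content)
```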
## How to build an agent with Ragbits
This guide walks you through building a simple agent that uses a get_weather tool to return weather
data based on a location.
### Define a tool function
First, define the function you want your agent to call. It should take regular Python arguments and return a JSON-serializable result.
```python
import json


def get_weather(location: str) -> str:
    """
    Returns the current weather for a given location.

    Args:
        location: The location to get the weather for.

    Returns:
        The current weather for the given location.
    """
    if "tokyo" in location.lower():
        return json.dumps({"location": "Tokyo", "temperature": "10", "unit": "celsius"})
    elif "san francisco" in location.lower():
        return json.dumps({"location": "San Francisco", "temperature": "72", "unit": "fahrenheit"})
    elif "paris" in location.lower():
        return json.dumps({"location": "Paris", "temperature": "22", "unit": "celsius"})
    else:
        return json.dumps({"location": location, "temperature": "unknown"})
```
### Define a prompt
Use a structured prompt to instruct the LLM. For details on writing prompts with Ragbits, see the Guide to Prompting.
```python
from pydantic import BaseModel

from ragbits.core.prompt import Prompt


class WeatherPromptInput(BaseModel):
    """
    Input format for the WeatherPrompt.
    """

    location: str


class WeatherPrompt(Prompt[WeatherPromptInput]):
    """
    Prompt that returns weather for a given location.
    """

    system_prompt = """
    You are a helpful assistant that responds to user questions about weather.
    """

    user_prompt = """
    Tell me the temperature in {{ location }}.
    """
```
### Run the agent
Create the agent, attach the prompt and tool, and run it:
```python
import asyncio

from ragbits.agents import Agent, AgentOptions
from ragbits.core.llms import LiteLLM


async def main() -> None:
    """
    Run the example.
    """
    llm = LiteLLM(model_name="gpt-4o-2024-08-06", use_structured_output=True)
    agent = Agent(
        llm=llm,
        prompt=WeatherPrompt,
        tools=[get_weather],
        default_options=AgentOptions(max_total_tokens=500, max_turns=5),
    )
    response = await agent.run(WeatherPromptInput(location="Paris"))
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```
As described above, the result is an AgentResult carrying the model's output, metadata, conversation history, and any tool calls performed.
You can find the complete code example in the Ragbits repository here.
## Alternative approach: inheritance with `prompt_config`
In addition to explicitly attaching a Prompt instance, Ragbits also supports defining agents through a combination of inheritance and the @Agent.prompt_config decorator.
This approach lets you bind input (and optionally output) models directly to your agent class. The agent then derives its prompt structure automatically, without requiring a prompt argument in the constructor.
```python
from pydantic import BaseModel

from ragbits.agents import Agent


class WeatherAgentInput(BaseModel):
    """
    Input format for the WeatherAgent.
    """

    location: str


@Agent.prompt_config(WeatherAgentInput)
class WeatherAgent(Agent):
    """
    Agent that returns weather for a given location.
    """

    system_prompt = """
    You are a helpful assistant that responds to user questions about weather.
    """

    user_prompt = """
    Tell me the temperature in {{ location }}.
    """
```
The decorator can also accept an output type, allowing you to strongly type both the inputs and outputs of the agent. If you do not explicitly define a user_prompt, Ragbits will default to {{ input }}.
Once defined, the agent class can be used directly, just like any other subclass of Agent:
```python
import asyncio

from ragbits.agents import AgentOptions
from ragbits.core.llms import LiteLLM


async def main() -> None:
    """
    Run the example.
    """
    llm = LiteLLM(model_name="gpt-4o-2024-08-06", use_structured_output=True)
    agent = WeatherAgent(
        llm=llm,
        tools=[get_weather],
        default_options=AgentOptions(max_total_tokens=500, max_turns=5),
    )
    response = await agent.run(WeatherAgentInput(location="Paris"), tool_choice=get_weather)
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```
You can find the complete code example in the Ragbits repository here.
## Tool choice
To control which tool is used on the first call, pass the `tool_choice` parameter. The following options are available:

- `"auto"`: let the model decide whether a tool call is needed
- `"none"`: do not call any tool
- `"required"`: enforce tool usage (the model decides which one)
- a `Callable`: force one of the provided tools
## Conversation history
Agents can retain conversation context across multiple interactions by enabling the keep_history flag when initializing the agent. This is useful when you want the agent to understand follow-up questions without needing the user to repeat earlier details.
To enable this, simply set keep_history=True when constructing the agent. The full exchange—including messages, tool calls, and results—is stored and can be accessed via the AgentResult.history property.
### Example of context preservation
The following example demonstrates how an agent with history enabled maintains context between interactions:
```python
async def main() -> None:
    """Run the weather agent with conversation history."""
    llm = LiteLLM(model_name="gpt-4o-2024-08-06", use_structured_output=True)
    agent = Agent(llm=llm, prompt=WeatherPrompt, tools=[get_weather], keep_history=True)

    await agent.run(WeatherPromptInput(location="Paris"))

    # Follow-up question about Tokyo - the agent retains weather context
    response = await agent.run("What about Tokyo?")
    print(response)
```
In this scenario, the agent recognizes that the follow-up question "What about Tokyo?" refers to weather information, thanks to the preserved conversation history, and returns an AgentResult answering with Tokyo's weather.
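Conceptually, keeping history just means carrying the accumulated message list into each subsequent model call. The toy class below sketches that mechanism under simplified assumptions; it is not Ragbits' implementation:

```python
class HistoryKeepingAgent:
    """Toy agent that prepends stored history to every new request (illustrative only)."""

    def __init__(self, keep_history: bool = False):
        self.keep_history = keep_history
        self.history: list[dict[str, str]] = []

    def run(self, user_message: str) -> list[dict[str, str]]:
        # Messages sent to the model: prior history (if kept) plus the new user turn
        messages = (self.history if self.keep_history else []) + [
            {"role": "user", "content": user_message}
        ]
        # ...here the real agent would call the LLM with `messages`...
        reply = {"role": "assistant", "content": f"(answer about: {user_message})"}
        if self.keep_history:
            self.history = messages + [reply]
        return messages


agent = HistoryKeepingAgent(keep_history=True)
agent.run("Tell me the temperature in Paris.")
second_call = agent.run("What about Tokyo?")
print(len(second_call))  # 3: the earlier user turn, the assistant reply, and the follow-up
```

Because the Paris exchange is replayed on the second call, the model has the context needed to interpret "What about Tokyo?" as a weather question.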
## Long-term memory tool
While `keep_history` maintains context within a single session, the long-term memory tool enables agents to store and retrieve information across multiple separate conversations. It uses a vector store for semantic search and organizes memories by key, allowing personalized context based on a provided user ID.
```python
import asyncio

from pydantic import BaseModel

from ragbits.agents import Agent
from ragbits.agents.tools.memory import LongTermMemory, create_memory_tools
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.vector_stores.in_memory import InMemoryVectorStore


class ConversationInput(BaseModel):
    message: str


class ConversationPrompt(Prompt[ConversationInput, str]):
    """Prompt for conversation with memory capabilities."""

    system_prompt = """
    You are a helpful assistant with long-term memory. You can remember information
    from previous conversations and use it to provide more personalized responses.

    You have access to memory tools that allow you to:
    - Store important facts from conversations
    - Retrieve relevant memories based on queries

    Store all information about the user that might be useful in future conversations.
    Always start by retrieving memories to provide a more relevant and personalized experience.
    """

    user_prompt = """
    Message: {{ message }}
    """


async def main() -> None:
    # Initialize components
    llm = LiteLLM(model_name="gpt-4o-mini")
    embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
    vector_store = InMemoryVectorStore(embedder=embedder)
    long_term_memory = LongTermMemory(vector_store=vector_store)
    memory_tools = create_memory_tools(long_term_memory, user_id="user_1")
    agent = Agent(llm=llm, prompt=ConversationPrompt, tools=[*memory_tools])

    # Provide context
    await agent.run(ConversationInput(
        message="I love hiking in the mountains. I'm planning a trip to Rome next month."
    ))

    # New session: a fresh agent with the same long-term memory still recalls the user
    llm = LiteLLM(model_name="gpt-4o-mini")
    memory_tools = create_memory_tools(long_term_memory, user_id="user_1")
    agent = Agent(llm=llm, prompt=ConversationPrompt, tools=[*memory_tools])

    response2 = await agent.run(ConversationInput(
        message="What outdoor activities would you recommend for my trip?"
    ))
    print(response2.content)
    # The agent remembers the Rome trip and the hiking preference


if __name__ == "__main__":
    asyncio.run(main())
```
## Binding dependencies via `AgentRunContext`
You can bind your external dependencies before the first access and safely use them in tools. After first attribute lookup, the dependencies container freezes to prevent mutation during a run.
```python
import asyncio
from dataclasses import dataclass

from ragbits.agents import Agent, AgentRunContext
from ragbits.core.llms.mock import MockLLM, MockLLMOptions


@dataclass
class Deps:
    api_host: str


def get_api_host(context: AgentRunContext | None) -> str:
    """Return the API host taken from the bound dependencies in context."""
    assert context is not None
    return context.deps.api_host


async def main() -> None:
    llm = MockLLM(
        default_options=MockLLMOptions(
            response="Using dependencies from context.",
            tool_calls=[{"name": "get_api_host", "arguments": "{}", "id": "example", "type": "function"}],
        )
    )
    agent = Agent(llm=llm, prompt="Retrieve API host", tools=[get_api_host])

    context = AgentRunContext()
    context.deps.value = Deps(api_host="https://api.local")

    result = await agent.run("What host are we using?", context=context)
    print(result.tool_calls[0].result)


if __name__ == "__main__":
    asyncio.run(main())
```
See the runnable example in examples/agents/dependencies.py.
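The freeze-on-first-read behavior can be illustrated with a small, hypothetical container. This is not Ragbits' actual `deps` implementation, just a sketch of the idea: writes are accepted until the first read, after which the container becomes immutable for the rest of the run.

```python
class FrozenAfterReadDeps:
    """Toy dependency container that freezes on first read (illustrative only)."""

    def __init__(self):
        self._frozen = False
        self._value = None

    def set(self, value) -> None:
        if self._frozen:
            raise RuntimeError("Dependencies are frozen once a run has started reading them")
        self._value = value

    def get(self):
        # The first read freezes the container, preventing mutation mid-run
        self._frozen = True
        return self._value


deps = FrozenAfterReadDeps()
deps.set({"api_host": "https://api.local"})
print(deps.get()["api_host"])  # https://api.local

try:
    deps.set({"api_host": "https://other"})  # too late: the container is frozen
except RuntimeError as exc:
    print(exc)
```

Freezing on first access keeps tool executions deterministic: no tool can observe a dependency value that another part of the run later swaps out.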
## Streaming agent responses
For use cases where you want to process partial outputs from the LLM as they arrive (e.g., in chat UIs),
the Agent class supports streaming through the run_streaming() method.
This method returns an AgentResultStreaming object — an async iterator that yields parts of the LLM response and
tool-related events in real time.
```python
import asyncio

from ragbits.agents import Agent, ToolCall, ToolCallResult
from ragbits.core.llms import LiteLLM


async def main() -> None:
    """Run the weather agent with streaming output."""
    llm = LiteLLM(model_name="gpt-4o-2024-08-06", use_structured_output=True)
    agent = Agent(llm=llm, prompt=WeatherPrompt, tools=[get_weather])

    async for chunk in agent.run_streaming(WeatherPromptInput(location="Paris")):
        if isinstance(chunk, ToolCall):
            print(f"Calling tool: {chunk.name}({chunk.arguments})")
        elif isinstance(chunk, ToolCallResult):
            print(f"Tool result: {chunk.result}")
        elif isinstance(chunk, str):
            print(chunk, end="", flush=True)


if __name__ == "__main__":
    asyncio.run(main())
```
### Streaming custom events from tools
Tools can emit custom events during execution that are surfaced through the streaming loop.
To do this, define your tool as an async generator that yields intermediate events and a final
ToolReturn value:
```python
from collections.abc import AsyncGenerator

from pydantic import BaseModel

from ragbits.agents.tool import ToolReturn


class MyEvent(BaseModel):
    """Custom event that we want to stream from an agent."""

    name: str
    description: str


async def my_tool() -> AsyncGenerator[ToolReturn | str | int | MyEvent, None]:
    """
    Example tool that streams intermediate events before yielding its final ToolReturn value.
    """
    yield "My Event 1"
    yield 2
    yield MyEvent(name="My Event 3", description="Another event that will be yielded in an agent")
    # Finally, yield a ToolReturn carrying the tool's result (see the ToolReturn API for its fields)
```
Events yielded before the ToolReturn are collected and available via result.tool_events
after the stream completes:
```python
import asyncio

from ragbits.agents import Agent
from ragbits.core.llms import LiteLLM


async def main() -> None:
    """Run the agent with streaming events from the tools."""
    llm = LiteLLM(model_name="gpt-4o")
    agent = Agent(llm, prompt="Call my_tool for every answer", tools=[my_tool])
    result = agent.run_streaming("Hello, please call my_tool and tell me what is the number it returned!")

    async for event in result:
        print(event)
    print()


if __name__ == "__main__":
    asyncio.run(main())
```
You can find the complete code example in the Ragbits repository here.
## Native OpenAI tools
Ragbits supports selected native OpenAI tools (`web_search_preview`, `image_generation`, and `code_interpreter`). You can use them alongside your own tools.
```python
import asyncio

from ragbits.agents import Agent
from ragbits.agents.tools import get_web_search_tool
from ragbits.core.llms import LiteLLM


async def main() -> None:
    """Run the weather agent with an additional native tool."""
    model_name = "gpt-4o-2024-08-06"
    llm = LiteLLM(model_name=model_name, use_structured_output=True)
    agent = Agent(llm=llm, prompt=WeatherPrompt, tools=[get_web_search_tool(model_name)], keep_history=True)

    response = await agent.run(WeatherPromptInput(location="Paris"))
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```
Tool descriptions are available here, with detailed information on the corresponding sub-pages (e.g. here for web search). You can use the default parameters or specify your own as a dict.