How to Autoconfigure Your Pipeline#

Ragbits provides a feature that allows users to automatically configure hyperparameters for a pipeline. This functionality is agnostic to the type of optimized structure, with the only requirements being the following:

  • The optimized pipeline must inherit from ragbits.evaluate.pipelines.base.EvaluationPipeline.
  • The definition of optimized metrics must adhere to the ragbits.evaluate.metrics.base.Metric interface.
  • These metrics should be gathered into an instance of ragbits.evaluate.metrics.base.MetricSet.
  • An instance of a class inheriting from ragbits.evaluate.loaders.base.DataLoader must be provided as the data source for optimization.

Supported Parameter Types#

The optimized parameters can be of the following types:

  • Continuous
  • Ordinal
  • Categorical

Continuous parameters take float values and ordinal parameters take integer values. Categorical parameters support more sophisticated structures, including nested parameters of other types.

Each optimized variable should be marked with the optimize=True flag in the configuration.

For categorical variables, you must also provide the choices field, which lists all possible values to be considered during optimization. For continuous and ordinal variables, the range field should be specified as a two-element list defining the minimum and maximum values of interest. For continuous parameters, the elements must be floats, while for ordinal parameters, they must be integers.
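
To make the format concrete, here is a minimal sketch of a parameter space covering all three kinds. The parameter names (temperature, top_k, system_prompt_content) are purely illustrative and not tied to any particular pipeline:

from omegaconf import OmegaConf

# Illustrative parameter space; the keys are hypothetical and only demonstrate the format.
example_params = OmegaConf.create(
    {
        # Continuous: float range [min, max]
        "temperature": {"optimize": True, "range": [0.0, 1.0]},
        # Ordinal: integer range [min, max]
        "top_k": {"optimize": True, "range": [1, 10]},
        # Categorical: explicit list of choices
        "system_prompt_content": {
            "optimize": True,
            "choices": ["Be concise.", "Answer in a single sentence."],
        },
    }
)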

Example Usage#

In this example, we will optimize the system prompt of a question-answering pipeline so that its answers use as few tokens as possible.

Define the Optimized Pipeline Structure#

from dataclasses import dataclass
from ragbits.evaluate.pipelines.base import EvaluationResult, EvaluationPipeline
from ragbits.core.llms.litellm import LiteLLM
from ragbits.core.prompt import Prompt
from pydantic import BaseModel


@dataclass
class RandomQuestionPipelineResult(EvaluationResult):
    answer: str


class QuestionRespondPromptInput(BaseModel):
    system_prompt_content: str
    question: str


class QuestionRespondPrompt(Prompt[QuestionRespondPromptInput]):
    system_prompt = "{{ system_prompt_content }}"
    user_prompt = "{{ question }}"


class RandomQuestionRespondPipeline(EvaluationPipeline):
    async def __call__(self, data: dict[str, str]) -> RandomQuestionPipelineResult:
        llm = LiteLLM()
        # The optimized value is injected by the optimizer through the pipeline config.
        input_prompt = QuestionRespondPrompt(
            QuestionRespondPromptInput(
                system_prompt_content=self.config.system_prompt_content,
                question=data["question"],
            )
        )
        answer = await llm.generate(prompt=input_prompt)
        return RandomQuestionPipelineResult(answer=answer)

Define the Data Loader#

Next, we define the data loader. We'll use the Ragbits generation stack to build a data loader that generates an artificial dataset of questions:

from ragbits.evaluate.loaders.base import DataLoader, DataT
from ragbits.core.llms.litellm import LiteLLM
from ragbits.core.prompt import Prompt
from pydantic import BaseModel
from omegaconf import OmegaConf


class DatasetGenerationPromptInput(BaseModel):
    topic: str


class DatasetGenerationPrompt(Prompt[DatasetGenerationPromptInput]):
    system_prompt = "Be a provider of random questions on a topic specified by the user."
    user_prompt = "Generate a question about {{ topic }}"


class RandomQuestionsDataLoader(DataLoader):
    async def load(self) -> list[dict[str, str]]:
        questions = []
        llm = LiteLLM()
        # Generate `num_questions` questions about `question_topic`, both taken from the loader config.
        for _ in range(self.config.num_questions):
            question = await llm.generate(
                DatasetGenerationPrompt(DatasetGenerationPromptInput(topic=self.config.question_topic))
            )
            questions.append({"question": question})
        return questions


dataloader_config = OmegaConf.create(
    {"num_questions": 10, "question_topic": "conspiracy theories"}
)
dataloader = RandomQuestionsDataLoader(dataloader_config)

Define the Metrics and Run the Experiment#

from pprint import pp as pprint
import tiktoken
from ragbits.evaluate.optimizer import Optimizer
from ragbits.evaluate.metrics.base import Metric, MetricSet, ResultT
from omegaconf import OmegaConf


class TokenCountMetric(Metric):
    def compute(self, results: list[ResultT]) -> dict[str, float]:
        # Score each trial by the average number of tokens per answer,
        # counted with tiktoken's cl100k_base encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
        num_tokens = [len(encoding.encode(out.answer)) for out in results]
        return {"num_tokens": sum(num_tokens) / len(num_tokens)}


metrics = MetricSet(TokenCountMetric())

optimization_cfg = OmegaConf.create(
    {"direction": "minimize", "n_trials": 4, "max_retries_for_trial": 3}
)
optimizer = Optimizer(optimization_cfg)

optimized_params = OmegaConf.create(
    {
        "system_prompt_content": {
            "optimize": True,
            "choices": [
                "Be a friendly bot answering user questions. Be as concise as possible",
                "Be a silly bot answering user questions. Use as few tokens as possible",
                "Be informative and straight to the point",
                "Respond to user questions in as few words as possible",
            ],
        }
    }
)
configs_with_scores = optimizer.optimize(
    pipeline_class=RandomQuestionRespondPipeline,
    config_with_params=optimized_params,
    metrics=metrics,
    dataloader=dataloader,
)
pprint(configs_with_scores)

After executing the code, your console should display an output structure similar to this:

[({'system_prompt_content': 'Be a silly bot answering user questions. Use as few tokens as possible'},
  6.0,
  {'num_tokens': 6.0}),
 ({'system_prompt_content': 'Be a silly bot answering user questions. Use as few tokens as possible'},
  10.7,
  {'num_tokens': 10.7}),
 ({'system_prompt_content': 'Be a friendly bot answering user questions. Be as concise as possible'},
  37.8,
  {'num_tokens': 37.8}),
 ({'system_prompt_content': 'Be informative and straight to the point'},
  113.2,
  {'num_tokens': 113.2})]

The output is a list of tuples, each containing three elements:

  1. The configuration used in the trial.
  2. The score achieved.
  3. A dictionary of detailed metrics that contribute to the score.

The tuples are ordered from the best to the worst configuration based on the score.
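
Because the list is sorted, the best configuration is simply its first element. A minimal follow-up sketch, reusing the variables from the example above (the print formatting is only illustrative):

best_config, best_score, best_metrics = configs_with_scores[0]
print(f"Best config: {best_config}")
print(f"Score: {best_score}, detailed metrics: {best_metrics}")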

Please note that the details may vary between runs due to the non-deterministic nature of both the LLM and the optimization algorithm.