Data Loaders
ragbits.evaluate.dataloaders.base.DataLoader
DataLoader(source: Source, *, split: str = 'data', required_keys: set[str] | None = None)
Bases: WithConstructionConfig, Generic[EvaluationDataT], ABC
Evaluation data loader.
Initialize the data loader.
PARAMETER | DESCRIPTION |
---|---|
source | The source to load the evaluation data from. TYPE: `Source` |
split | The split to load the data from. The split is fixed to "data" for data loaders, but you can slice it using the Hugging Face API. TYPE: `str` DEFAULT: `'data'` |
required_keys | The required columns for the evaluation data. TYPE: `set[str] | None` DEFAULT: `None` |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
subclass_from_config
classmethod
Initializes the class with the provided configuration. May return a subclass of the class, if requested by the configuration.
PARAMETER | DESCRIPTION |
---|---|
config | A model containing configuration details for the class. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided configuration. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The class can't be found or is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
subclass_from_factory
classmethod
Creates the class using the provided factory function. May return a subclass of the class, if requested by the factory. Supports both synchronous and asynchronous factory functions.
PARAMETER | DESCRIPTION |
---|---|
factory_path | A string representing the path to the factory function in the format of "module.submodule:factory_name". |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided factory function. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The factory can't be found or the object returned is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
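The "module.submodule:factory_name" format can be resolved with the standard library alone. The following is a minimal sketch of that idea, not the ragbits implementation: `resolve_factory` and `call_factory` are hypothetical helper names, and the `collections:OrderedDict` path is used only because it is importable everywhere.

```python
import asyncio
import importlib
import inspect
from collections.abc import Callable
from typing import Any


def resolve_factory(factory_path: str) -> Callable[..., Any]:
    """Import the callable named by a "module.submodule:factory_name" path."""
    module_path, sep, factory_name = factory_path.partition(":")
    if not (sep and module_path and factory_name):
        raise ValueError(f"Expected 'module:factory_name', got {factory_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, factory_name)


def call_factory(factory_path: str) -> Any:
    """Call a zero-argument factory (hypothetical helper, not a ragbits API)."""
    result = resolve_factory(factory_path)()
    if inspect.iscoroutine(result):
        # Asynchronous factory: run the coroutine to completion.
        # asyncio.run() must not be called from inside a running event loop.
        return asyncio.run(result)
    return result


# Any importable zero-argument callable works as a stand-in factory here.
instance = call_factory("collections:OrderedDict")
```

A path without a colon, such as "json.loads", is rejected by this sketch with a `ValueError`; the real method reports lookup problems through `InvalidConfigError` as documented above.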
preferred_subclass
classmethod
preferred_subclass(config: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None) -> Self
Tries to create an instance by looking at the project's component preferences, either from YAML or from the factory. Takes optional overrides for both, which take precedence over the preferences.
PARAMETER | DESCRIPTION |
---|---|
config | The CoreConfig instance containing preferred factory and configuration details. TYPE: `CoreConfig` |
factory_path_override | A string representing the path to the factory function in the format of "module.submodule:factory_name". TYPE: `str | None` DEFAULT: `None` |
yaml_path_override | The path to the YAML file containing the Ragstack instance configuration. TYPE: `Path | None` DEFAULT: `None` |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | If the default factory or configuration can't be found. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
from_config
classmethod
Create an instance of DataLoader from a configuration dictionary.
PARAMETER | DESCRIPTION |
---|---|
config | A dictionary containing configuration settings for the data loader. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the data loader class initialized with the provided configuration. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
load
async
Load the data.
RETURNS | DESCRIPTION |
---|---|
Iterable[EvaluationDataT] | The loaded evaluation data. |
RAISES | DESCRIPTION |
---|---|
DataLoaderIncorrectFormatDataError | If the evaluation dataset is incorrectly formatted. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
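`load` raises `DataLoaderIncorrectFormatDataError` when the dataset is incorrectly formatted. One plausible check, assuming the error is tied to the `required_keys` columns, is to compare the required column names with those actually present. The sketch below is a stand-alone illustration under that assumption: `IncorrectFormatError` and `check_required_keys` are hypothetical names defined locally, not imports from ragbits.

```python
class IncorrectFormatError(Exception):
    """Local stand-in for a 'dataset is incorrectly formatted' error."""

    def __init__(self, missing: set[str]) -> None:
        super().__init__(f"Dataset is missing required columns: {sorted(missing)}")
        self.missing = missing


def check_required_keys(column_names: list[str], required_keys: set[str]) -> None:
    """Raise if any required column is absent from the dataset columns."""
    missing = required_keys - set(column_names)
    if missing:
        raise IncorrectFormatError(missing)


# Passes silently: both required columns are present.
check_required_keys(["question", "answer", "context"], {"question", "answer"})
```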
map
abstractmethod async
Map the dataset to the evaluation data.
PARAMETER | DESCRIPTION |
---|---|
dataset | The dataset to map. |
RETURNS | DESCRIPTION |
---|---|
Iterable[EvaluationDataT] | The evaluation data. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
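Because `map` is abstract and asynchronous, each concrete loader supplies its own conversion from raw rows to evaluation records. The sketch below mirrors that pattern with standard-library classes only; `RowLoader`, `UppercaseLoader`, and the in-memory rows are hypothetical stand-ins, and nothing here is imported from ragbits.

```python
import asyncio
from abc import ABC, abstractmethod
from collections.abc import Iterable
from typing import Generic, TypeVar

T = TypeVar("T")


class RowLoader(Generic[T], ABC):
    """Stand-in base class: holds raw rows and delegates conversion to map()."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.rows = rows

    async def load(self) -> Iterable[T]:
        # A real loader would fetch and validate the dataset before mapping it.
        return await self.map(self.rows)

    @abstractmethod
    async def map(self, dataset: list[dict[str, str]]) -> Iterable[T]:
        """Convert raw rows into typed evaluation records."""


class UppercaseLoader(RowLoader[str]):
    """Toy subclass: each record is the upper-cased question text."""

    async def map(self, dataset: list[dict[str, str]]) -> Iterable[str]:
        return [row["question"].upper() for row in dataset]


records = asyncio.run(UppercaseLoader([{"question": "what is rag?"}]).load())
```

Instantiating the stand-in base class directly fails with a `TypeError`, which is the usual effect of leaving an abstract method unimplemented.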
ragbits.evaluate.dataloaders.document_search.DocumentSearchDataLoader
DocumentSearchDataLoader(source: Source, *, split: str = 'data', question_key: str = 'question', document_ids_key: str = 'document_ids', passages_key: str = 'passages', page_numbers_key: str = 'page_numbers')
Bases: DataLoader[DocumentSearchData]
Document search evaluation data loader.
The source used for this data loader should point to a file that can be loaded by Hugging Face.
Initialize the document search data loader.
PARAMETER | DESCRIPTION |
---|---|
source | The source to load the data from. TYPE: `Source` |
split | The split to load the data from. The split is fixed to "data" for data loaders, but you can slice it using the Hugging Face API. TYPE: `str` DEFAULT: `'data'` |
question_key | The dataset column name that contains the question. TYPE: `str` DEFAULT: `'question'` |
document_ids_key | The dataset column name that contains the document ids. Document ids are optional. TYPE: `str` DEFAULT: `'document_ids'` |
passages_key | The dataset column name that contains the passages. Passages are optional. TYPE: `str` DEFAULT: `'passages'` |
page_numbers_key | The dataset column name that contains the page numbers. Page numbers are optional. TYPE: `str` DEFAULT: `'page_numbers'` |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/document_search.py
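The `*_key` parameters name dataset columns, and only the question column is described as mandatory. As a rough illustration of how such key-based mapping can work, the sketch below builds records from plain dictionaries; `SearchRecord` and `map_rows` are hypothetical names, and the real `DocumentSearchData` schema may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SearchRecord:
    """Hypothetical record; only the question is mandatory."""

    question: str
    document_ids: list[str] | None = None
    passages: list[str] | None = None
    page_numbers: list[int] | None = None


def map_rows(
    rows: list[dict],
    *,
    question_key: str = "question",
    document_ids_key: str = "document_ids",
    passages_key: str = "passages",
    page_numbers_key: str = "page_numbers",
) -> list[SearchRecord]:
    """Build records, leaving optional fields as None when a column is absent."""
    return [
        SearchRecord(
            question=row[question_key],
            document_ids=row.get(document_ids_key),
            passages=row.get(passages_key),
            page_numbers=row.get(page_numbers_key),
        )
        for row in rows
    ]


# The question lives in a "query" column here, so the key is overridden.
rows = [
    {"query": "What is RAG?", "document_ids": ["doc-1"]},
    {"query": "Who maintains it?", "page_numbers": [3, 4]},
]
records = map_rows(rows, question_key="query")
```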
subclass_from_config
classmethod
Initializes the class with the provided configuration. May return a subclass of the class, if requested by the configuration.
PARAMETER | DESCRIPTION |
---|---|
config | A model containing configuration details for the class. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided configuration. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The class can't be found or is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
subclass_from_factory
classmethod
Creates the class using the provided factory function. May return a subclass of the class, if requested by the factory. Supports both synchronous and asynchronous factory functions.
PARAMETER | DESCRIPTION |
---|---|
factory_path | A string representing the path to the factory function in the format of "module.submodule:factory_name". |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided factory function. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The factory can't be found or the object returned is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
preferred_subclass
classmethod
preferred_subclass(config: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None) -> Self
Tries to create an instance by looking at the project's component preferences, either from YAML or from the factory. Takes optional overrides for both, which take precedence over the preferences.
PARAMETER | DESCRIPTION |
---|---|
config | The CoreConfig instance containing preferred factory and configuration details. TYPE: `CoreConfig` |
factory_path_override | A string representing the path to the factory function in the format of "module.submodule:factory_name". TYPE: `str | None` DEFAULT: `None` |
yaml_path_override | The path to the YAML file containing the Ragstack instance configuration. TYPE: `Path | None` DEFAULT: `None` |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | If the default factory or configuration can't be found. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
from_config
classmethod
Create an instance of DataLoader from a configuration dictionary.
PARAMETER | DESCRIPTION |
---|---|
config | A dictionary containing configuration settings for the data loader. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the data loader class initialized with the provided configuration. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
load
async
Load the data.
RETURNS | DESCRIPTION |
---|---|
Iterable[EvaluationDataT] | The loaded evaluation data. |
RAISES | DESCRIPTION |
---|---|
DataLoaderIncorrectFormatDataError | If the evaluation dataset is incorrectly formatted. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
map
async
Map the dataset to the document search data schema.
PARAMETER | DESCRIPTION |
---|---|
dataset | The dataset to map. |
RETURNS | DESCRIPTION |
---|---|
Iterable[DocumentSearchData] | The document search data. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/document_search.py
ragbits.evaluate.dataloaders.question_answer.QuestionAnswerDataLoader
QuestionAnswerDataLoader(source: Source, *, split: str = 'data', question_key: str = 'question', answer_key: str = 'answer', context_key: str = 'context')
Bases: DataLoader[QuestionAnswerData]
Question answer evaluation data loader.
The source used for this data loader should point to a file that can be loaded by Hugging Face.
Initialize the question answer data loader.
PARAMETER | DESCRIPTION |
---|---|
source | The source to load the data from. TYPE: `Source` |
split | The split to load the data from. TYPE: `str` DEFAULT: `'data'` |
question_key | The dataset column name that contains the question. TYPE: `str` DEFAULT: `'question'` |
answer_key | The dataset column name that contains the answer. TYPE: `str` DEFAULT: `'answer'` |
context_key | The dataset column name that contains the context. Context is optional. TYPE: `str` DEFAULT: `'context'` |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/question_answer.py
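A JSON Lines file is one common layout that Hugging Face tooling can load, with one row per line and the column names as keys. The sketch below parses such rows from an in-memory string to show which columns the default keys refer to and that the context column may be missing. `QARecord` and `parse_rows` are hypothetical names, the sample rows are invented for illustration, and the real `QuestionAnswerData` schema may differ.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any

# Two rows in JSON Lines form; the second row has no "context" value.
JSONL = """\
{"question": "What is RAG?", "answer": "Retrieval-augmented generation.", "context": ["RAG combines retrieval with generation."]}
{"question": "Is context required?", "answer": "No."}
"""


@dataclass
class QARecord:
    """Hypothetical record; the context field is optional."""

    question: str
    answer: str
    context: Any | None = None


def parse_rows(
    text: str,
    *,
    question_key: str = "question",
    answer_key: str = "answer",
    context_key: str = "context",
) -> list[QARecord]:
    """Parse JSON Lines text into records, tolerating a missing context."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # Skip blank lines between rows.
        row = json.loads(line)
        records.append(
            QARecord(row[question_key], row[answer_key], row.get(context_key))
        )
    return records


records = parse_rows(JSONL)
```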
subclass_from_config
classmethod
Initializes the class with the provided configuration. May return a subclass of the class, if requested by the configuration.
PARAMETER | DESCRIPTION |
---|---|
config | A model containing configuration details for the class. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided configuration. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The class can't be found or is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
subclass_from_factory
classmethod
Creates the class using the provided factory function. May return a subclass of the class, if requested by the factory. Supports both synchronous and asynchronous factory functions.
PARAMETER | DESCRIPTION |
---|---|
factory_path | A string representing the path to the factory function in the format of "module.submodule:factory_name". |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided factory function. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The factory can't be found or the object returned is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
preferred_subclass
classmethod
preferred_subclass(config: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None) -> Self
Tries to create an instance by looking at the project's component preferences, either from YAML or from the factory. Takes optional overrides for both, which take precedence over the preferences.
PARAMETER | DESCRIPTION |
---|---|
config | The CoreConfig instance containing preferred factory and configuration details. TYPE: `CoreConfig` |
factory_path_override | A string representing the path to the factory function in the format of "module.submodule:factory_name". TYPE: `str | None` DEFAULT: `None` |
yaml_path_override | The path to the YAML file containing the Ragstack instance configuration. TYPE: `Path | None` DEFAULT: `None` |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | If the default factory or configuration can't be found. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
from_config
classmethod
Create an instance of DataLoader from a configuration dictionary.
PARAMETER | DESCRIPTION |
---|---|
config | A dictionary containing configuration settings for the data loader. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the data loader class initialized with the provided configuration. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
load
async
Load the data.
RETURNS | DESCRIPTION |
---|---|
Iterable[EvaluationDataT] | The loaded evaluation data. |
RAISES | DESCRIPTION |
---|---|
DataLoaderIncorrectFormatDataError | If the evaluation dataset is incorrectly formatted. |
Source code in packages/ragbits-evaluate/src/ragbits/evaluate/dataloaders/base.py
map
async
Map the dataset to the question answer data schema.
PARAMETER | DESCRIPTION |
---|---|
dataset | The dataset to map. |
RETURNS | DESCRIPTION |
---|---|
Iterable[QuestionAnswerData] | The question answer data. |