Hybrid Vector Store & Fusion Strategies
ragbits.core.vector_stores.hybrid.HybridSearchVectorStore
HybridSearchVectorStore(*vector_stores: VectorStore, retrieval_strategy: HybridRetrivalStrategy | None = None)
Bases: VectorStore
A vector store that takes multiple vector store objects and proxies requests to them, returning the union of results.
Constructs a new HybridSearchVectorStore instance.
PARAMETER | DESCRIPTION |
---|---|
vector_stores | The vector stores to proxy requests to. TYPE: VectorStore |
retrieval_strategy | The retrieval strategy to use when combining results. Uses OrderedHybridRetrivalStrategy by default. TYPE: HybridRetrivalStrategy \| None DEFAULT: None |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py
retrieval_strategy
instance-attribute
retrieval_strategy = retrieval_strategy or OrderedHybridRetrivalStrategy()
subclass_from_config
classmethod
Initializes the class with the provided configuration. May return a subclass of the class, if requested by the configuration.
PARAMETER | DESCRIPTION |
---|---|
config | A model containing configuration details for the class. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided configuration. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The class can't be found or is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
subclass_from_factory
classmethod
Creates the class using the provided factory function. May return a subclass of the class, if requested by the factory.
PARAMETER | DESCRIPTION |
---|---|
factory_path | A string representing the path to the factory function in the format of "module.submodule:factory_name". |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided factory function. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The factory can't be found or the object returned is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
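To illustrate the "module.submodule:factory_name" format, here is a minimal standard-library sketch of how such a path could be resolved to a callable. The helper name `resolve_factory` is hypothetical and this is not the ragbits implementation; `json:dumps` merely stands in for a real factory function.

```python
import importlib
from typing import Any, Callable


def resolve_factory(factory_path: str) -> Callable[..., Any]:
    """Resolve a "module.submodule:factory_name" string to a callable (hypothetical helper)."""
    module_name, separator, attribute_name = factory_path.partition(":")
    if not separator or not module_name or not attribute_name:
        raise ValueError(f"Expected 'module.submodule:factory_name', got {factory_path!r}")
    # Import the module part, then look up the factory by name on it.
    module = importlib.import_module(module_name)
    factory = getattr(module, attribute_name)
    if not callable(factory):
        raise TypeError(f"{factory_path!r} does not point to a callable")
    return factory


# A standard-library function stands in for a project-specific factory.
dumps = resolve_factory("json:dumps")
```

A real factory would return an instance of the class (or one of its subclasses) rather than a string.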
preferred_subclass
classmethod
preferred_subclass(config: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None) -> Self
Tries to create an instance by looking at the project's component preferences, either from YAML or from the factory. Takes optional overrides for both, which take higher precedence.
PARAMETER | DESCRIPTION |
---|---|
config | The CoreConfig instance containing preferred factory and configuration details. TYPE: CoreConfig |
factory_path_override | A string representing the path to the factory function in the format of "module.submodule:factory_name". TYPE: str \| None DEFAULT: None |
yaml_path_override | The path to the YAML file containing the Ragstack instance configuration. TYPE: Path \| None DEFAULT: None |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | If the default factory or configuration can't be found. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
from_config
classmethod
Initializes the class with the provided configuration.
PARAMETER | DESCRIPTION |
---|---|
config | A dictionary containing configuration details for the class. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided configuration. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
store
async
store(entries: list[VectorStoreEntry]) -> None
Store entries in the vector stores.
Sends entries to all vector stores to be stored, although individual vector stores are free to implement their own logic regarding which entries to store. For example, some vector stores may only store entries with a specific type of content (images, text, etc.).
PARAMETER | DESCRIPTION |
---|---|
entries | The entries to store. TYPE: list[VectorStoreEntry] |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py
retrieve
async
retrieve(text: str, options: VectorStoreOptions | None = None) -> list[VectorStoreResult]
Retrieve entries from the vector stores most similar to the provided text. The results are combined using the retrieval strategy provided in the constructor.
PARAMETER | DESCRIPTION |
---|---|
text | The text to query the vector store with. TYPE: str |
options | The options for querying the vector stores. TYPE: VectorStoreOptions \| None DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
list[VectorStoreResult] | The entries. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py
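The proxy-then-combine flow of retrieve can be pictured with a small self-contained sketch. Everything here is an assumption made for illustration: `Result` and `ToyStore` are hypothetical stand-ins for VectorStoreResult and a real vector store, the stores are queried concurrently with `asyncio.gather`, and `best_score_join` plays the role of the retrieval strategy. The actual class may query its stores differently.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical stand-in for VectorStoreResult: an entry ID and a score."""

    entry_id: str
    score: float


class ToyStore:
    """Hypothetical store that returns the same canned results for any query."""

    def __init__(self, results: list[Result]) -> None:
        self._results = results

    async def retrieve(self, text: str) -> list[Result]:
        return list(self._results)


def best_score_join(results: list[list[Result]]) -> list[Result]:
    """Keep the best score per entry, then sort by score, highest first."""
    best: dict[str, Result] = {}
    for result_list in results:
        for result in result_list:
            current = best.get(result.entry_id)
            if current is None or result.score > current.score:
                best[result.entry_id] = result
    return sorted(best.values(), key=lambda r: r.score, reverse=True)


async def hybrid_retrieve(stores: list[ToyStore], text: str) -> list[Result]:
    # Send the same query to every store, then pass all result lists to the strategy.
    per_store = await asyncio.gather(*(store.retrieve(text) for store in stores))
    return best_score_join(list(per_store))


dense = ToyStore([Result("a", 0.9), Result("b", 0.4)])
sparse = ToyStore([Result("b", 0.7), Result("c", 0.2)])
merged = asyncio.run(hybrid_retrieve([dense, sparse], "query"))
```

Entry "b" appears in both stores and survives once, with the higher of its two scores.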
remove
async
remove(ids: list[UUID]) -> None
Remove entries from all vector stores.
PARAMETER | DESCRIPTION |
---|---|
ids | The list of entries' IDs to remove. TYPE: list[UUID] |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py
list
async
list(where: WhereQuery | None = None, limit: int | None = None, offset: int = 0) -> list[VectorStoreEntry]
List entries from the vector stores. The entries can be filtered, limited and offset. Vector stores are queried in the order they were provided in the constructor.
PARAMETER | DESCRIPTION |
---|---|
where | The filter dictionary: keys are field names and values are the values to filter by. A field whose key is not specified is not filtered on. TYPE: WhereQuery \| None DEFAULT: None |
limit | The maximum number of entries to return. TYPE: int \| None DEFAULT: None |
offset | The number of entries to skip. TYPE: int DEFAULT: 0 |
RETURNS | DESCRIPTION |
---|---|
list[VectorStoreEntry] | The entries. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid.py
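One plausible reading of how limit and offset apply across several stores is sketched below. This is an assumption, not a statement of the library's behavior: stores are modeled as plain lists of entry IDs, they are walked in constructor order, an entry already seen in an earlier store is skipped, and the offset and limit are applied to the combined sequence. The function name `list_across_stores` is hypothetical.

```python
from itertools import islice
from typing import Iterator, Optional


def list_across_stores(
    stores: list[list[str]], limit: Optional[int] = None, offset: int = 0
) -> list[str]:
    """List entry IDs from several stores in order, without duplicates (hypothetical helper)."""
    seen: set[str] = set()

    def unique_entries() -> Iterator[str]:
        # Walk the stores in the order given; skip IDs seen in an earlier store.
        for store in stores:
            for entry_id in store:
                if entry_id not in seen:
                    seen.add(entry_id)
                    yield entry_id

    stop = None if limit is None else offset + limit
    return list(islice(unique_entries(), offset, stop))


first_store = ["a", "b", "c"]
second_store = ["c", "d"]
page = list_across_stores([first_store, second_store], limit=2, offset=2)
```

Because the generator is consumed lazily, later stores are only read when the requested page actually reaches them.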
ragbits.core.vector_stores.hybrid_strategies.OrderedHybridRetrivalStrategy
Bases: HybridRetrivalStrategy
A class that orders the results by score and deduplicates them by choosing the first occurrence of each entry. This algorithm is also known as "Relative Score Fusion".
Constructs a new OrderedHybridRetrivalStrategy instance.
PARAMETER | DESCRIPTION |
---|---|
sum_scores | If True, sums the scores of the same entries; otherwise keeps the best score (i.e., the biggest one). Summing scores boosts the results that are present in results from multiple vector stores. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid_strategies.py
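The ordering and deduplication described above, including the effect of sum_scores, can be sketched in a few lines. This is an illustrative standalone function, not the library's code: results are modeled as hypothetical `(entry_id, score)` tuples, and keeping the maximum score per entry is used as an equivalent of "sort by score, then keep the first occurrence".

```python
def ordered_join(
    results: list[list[tuple[str, float]]], sum_scores: bool = False
) -> list[tuple[str, float]]:
    """Merge result lists by entry ID and sort by score, highest first (illustrative sketch)."""
    combined: dict[str, float] = {}
    for result_list in results:
        for entry_id, score in result_list:
            if entry_id not in combined:
                combined[entry_id] = score
            elif sum_scores:
                # Entries found by several stores accumulate their scores.
                combined[entry_id] += score
            else:
                # Only the best score of a duplicated entry is kept.
                combined[entry_id] = max(combined[entry_id], score)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


dense = [("a", 0.875), ("b", 0.5)]
sparse = [("b", 0.625), ("c", 0.25)]

best_only = ordered_join([dense, sparse])
summed = ordered_join([dense, sparse], sum_scores=True)
```

With these inputs, entry "b" ranks second when only the best score is kept and first when scores are summed, which is the boost the parameter description refers to. Note that raw scores from different stores are compared directly here, so the sketch only makes sense when the stores score on comparable scales.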
join
Joins the multiple lists of results into a single list.
PARAMETER | DESCRIPTION |
---|---|
results | The lists of results to join. |
RETURNS | DESCRIPTION |
---|---|
list[VectorStoreResult] | The joined list of results. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid_strategies.py
ragbits.core.vector_stores.hybrid_strategies.ReciprocalRankFusion
Bases: HybridRetrivalStrategy
An implementation of Reciprocal Rank Fusion (RRF) for combining search results, based on the paper "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods": https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
Constructs a new ReciprocalRankFusion instance.
PARAMETER | DESCRIPTION |
---|---|
k_constant | The "k" constant used in the RRF formula, meant to mitigate the impact of high rankings by outlier systems. The value of 60 is recommended by the authors of the RRF paper. Qdrant uses a value of 2. |
sum_scores | If True, sums the scores of the same entries; otherwise keeps the best score (i.e., the biggest one). Summing scores boosts the results that are present in results from multiple vector stores. Not summing results in very simple behavior: the list includes the first results from all vector stores, then the second results (excluding duplicates), and so on. The original version of RRF sums the scores, so the default value is True. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid_strategies.py
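As a rough illustration of the formula from the RRF paper, each entry receives 1 / (k + rank) from every list it appears in, with ranks starting at 1. The sketch below is a standalone function written for this page, not the library's code: rankings are modeled as hypothetical lists of entry IDs ordered best first, and whether the library counts ranks from 0 or 1 is not stated here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k_constant: float = 60.0, sum_scores: bool = True
) -> list[tuple[str, float]]:
    """Fuse ranked lists of entry IDs with Reciprocal Rank Fusion (illustrative sketch)."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, as in the RRF paper.
        for rank, entry_id in enumerate(ranking, start=1):
            contribution = 1.0 / (k_constant + rank)
            if sum_scores:
                fused[entry_id] += contribution
            else:
                fused[entry_id] = max(fused[entry_id], contribution)
    # sorted() is stable, so tied entries keep the order in which they were first seen.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


dense = ["a", "b", "c"]
sparse = ["b", "d", "a"]

summed_order = [entry_id for entry_id, _ in reciprocal_rank_fusion([dense, sparse])]
best_order = [
    entry_id for entry_id, _ in reciprocal_rank_fusion([dense, sparse], sum_scores=False)
]
```

With summing, "b" overtakes "a" because it ranks near the top of both lists. Without summing, the output simply interleaves the lists rank by rank, skipping duplicates, as the sum_scores description explains.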
join
Joins the multiple lists of results into a single list using Reciprocal Rank Fusion.
PARAMETER | DESCRIPTION |
---|---|
results | The lists of results to join. |
RETURNS | DESCRIPTION |
---|---|
list[VectorStoreResult] | The joined list of results. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid_strategies.py
ragbits.core.vector_stores.hybrid_strategies.DistributionBasedScoreFusion
Bases: HybridRetrivalStrategy
An implementation of Distribution-Based Score Fusion (DBSF) for combining search results, based on the "Distribution-Based Score Fusion (DBSF), a new approach to Vector Search Ranking" post: https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18
Constructs a new DistributionBasedScoreFusion instance.
PARAMETER | DESCRIPTION |
---|---|
sum_scores | If True, sums the scores of the same entries; otherwise keeps the best score (i.e., the biggest one). Summing scores boosts the results that are present in results from multiple vector stores. The original DBSF article remains neutral on this matter, so the default value is False. Many implementations (like Qdrant) use summing. |
Source code in packages/ragbits-core/src/ragbits/core/vector_stores/hybrid_strategies.py
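DBSF is commonly described as rescaling each store's scores with that store's own mean and standard deviation, using mean ± 3 standard deviations as the lower and upper bounds, before the lists are merged. The sketch below follows that description under stated assumptions: results are hypothetical `(entry_id, score)` tuples, the population standard deviation is used, and a list with no spread is mapped to 0.5. It is not the library's implementation.

```python
from statistics import fmean, pstdev


def dbsf_normalize(scores: list[float]) -> list[float]:
    """Rescale scores so that mean - 3*std maps to 0 and mean + 3*std maps to 1."""
    if not scores:
        return []
    mean, std = fmean(scores), pstdev(scores)
    if std == 0:
        return [0.5 for _ in scores]  # assumption: identical scores land mid-range
    low, high = mean - 3 * std, mean + 3 * std
    return [(score - low) / (high - low) for score in scores]


def dbsf_join(
    results: list[list[tuple[str, float]]], sum_scores: bool = False
) -> list[tuple[str, float]]:
    """Normalize each list separately, then merge by entry ID (illustrative sketch)."""
    fused: dict[str, float] = {}
    for result_list in results:
        normalized = dbsf_normalize([score for _, score in result_list])
        for (entry_id, _), score in zip(result_list, normalized):
            if entry_id not in fused:
                fused[entry_id] = score
            elif sum_scores:
                fused[entry_id] += score
            else:
                fused[entry_id] = max(fused[entry_id], score)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Scores on very different scales, e.g. cosine similarity versus a keyword score.
dense = [("a", 0.9), ("b", 0.8), ("c", 0.7)]
sparse = [("b", 12.0), ("d", 5.0), ("a", 1.0)]

best_order = [entry_id for entry_id, _ in dbsf_join([dense, sparse])]
summed_order = [entry_id for entry_id, _ in dbsf_join([dense, sparse], sum_scores=True)]
```

Normalizing per list is what lets a store with scores near 1 and a store with scores near 10 contribute on equal footing; without it, the larger-scaled store would dominate the merged ranking.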
join
Joins the multiple lists of results into a single list using Distribution-Based Score Fusion.
PARAMETER | DESCRIPTION |
---|---|
results | The lists of results to join. |
RETURNS | DESCRIPTION |
---|---|
list[VectorStoreResult] | The joined list of results. |