Document Search
ragbits.document_search.DocumentSearchOptions
Bases: Options, Generic[QueryRephraserOptionsT, VectorStoreOptionsT, RerankerOptionsT]
Object representing the options for the document search.
ATTRIBUTE | DESCRIPTION |
---|---|
query_rephraser_options | The options for the query rephraser. |
vector_store_options | The options for the vector store. |
reranker_options | The options for the reranker. |
model_config (class-attribute, instance-attribute)
query_rephraser_options (class-attribute, instance-attribute)
vector_store_options (class-attribute, instance-attribute)
reranker_options (class-attribute, instance-attribute)
dict
dict() -> dict[str, Any]
Creates a dictionary representation of the Options instance. If a value is None, it will be replaced with a provider-specific not-given sentinel.
RETURNS | DESCRIPTION |
---|---|
dict[str, Any] | A dictionary representation of the Options instance. |
Source code in packages/ragbits-core/src/ragbits/core/options.py
ragbits.document_search.DocumentSearch
DocumentSearch(vector_store: VectorStore[VectorStoreOptionsT], *, query_rephraser: QueryRephraser[QueryRephraserOptionsT] | None = None, reranker: Reranker[RerankerOptionsT] | None = None, default_options: DocumentSearchOptions[QueryRephraserOptionsT, VectorStoreOptionsT, RerankerOptionsT] | None = None, ingest_strategy: IngestStrategy | None = None, parser_router: DocumentParserRouter | None = None, enricher_router: ElementEnricherRouter | None = None)
Bases: ConfigurableComponent[DocumentSearchOptions[QueryRephraserOptionsT, VectorStoreOptionsT, RerankerOptionsT]]
Main entry point to the document search functionality. It provides methods for document retrieval and ingestion.
Retrieval
- Uses QueryRephraser to rephrase the query.
- Uses VectorStore to retrieve the most relevant elements.
- Uses Reranker to rerank the elements.
Ingestion
- Uses IngestStrategy to orchestrate the ingestion process.
- Uses DocumentParserRouter to route each document to the appropriate DocumentParser, which parses its content.
- Uses ElementEnricherRouter to route each element to the appropriate ElementEnricher, which enriches it.
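The three retrieval steps (rephrase, retrieve, rerank) can be sketched with toy components. This is a minimal, self-contained illustration assuming simple async methods named rephrase, retrieve and rerank; ToyRephraser, ToyVectorStore, ToyReranker, Element and the word-overlap scoring are all hypothetical and do not reflect the real ragbits interfaces or similarity metrics.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a retrieved element (chunk)."""

    text: str
    score: float = 0.0


class ToyRephraser:
    async def rephrase(self, query: str) -> list[str]:
        # Hypothetical: return the original query plus a lowercase variant.
        return [query, query.lower()]


class ToyVectorStore:
    def __init__(self, texts: list[str]) -> None:
        self._texts = texts

    async def retrieve(self, query: str) -> list[Element]:
        # Toy "similarity": number of query words that appear in the text.
        words = set(query.lower().split())
        return [Element(t, float(len(words & set(t.lower().split())))) for t in self._texts]


class ToyReranker:
    async def rerank(self, elements: list[Element], top_k: int) -> list[Element]:
        # Sort by score (highest first, ties keep their order) and keep top_k.
        return sorted(elements, key=lambda e: e.score, reverse=True)[:top_k]


async def search(query: str, rephraser: ToyRephraser, store: ToyVectorStore,
                 reranker: ToyReranker, top_k: int = 2) -> list[Element]:
    queries = await rephraser.rephrase(query)
    # Retrieve for every rephrased query, keeping the best score per text.
    seen: dict[str, Element] = {}
    for q in queries:
        for el in await store.retrieve(q):
            if el.text not in seen or el.score > seen[el.text].score:
                seen[el.text] = el
    return await reranker.rerank(list(seen.values()), top_k)


texts = ["vector stores hold embeddings", "rerankers reorder results", "stores of vector data"]
result = asyncio.run(search("Vector Stores", ToyRephraser(), ToyVectorStore(texts), ToyReranker()))
print([e.text for e in result])
```

Keeping each stage behind its own small interface is what lets the rephraser, vector store and reranker be swapped independently.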
Initialize the DocumentSearch instance.
PARAMETER | DESCRIPTION |
---|---|
vector_store | The vector store to use for retrieval. |
query_rephraser | The query rephraser to use for retrieval. |
reranker | The reranker to use for retrieval. |
default_options | The default options for the search. |
ingest_strategy | The strategy used to orchestrate ingestion. If None, SequentialIngestStrategy() is used. |
parser_router | The document parser router to use for ingestion. |
enricher_router | The element enricher router to use for ingestion. |
Source code in packages/ragbits-document-search/src/ragbits/document_search/_main.py
options_cls (class-attribute, instance-attribute)
options_cls: type[DocumentSearchOptions] = DocumentSearchOptions
ingest_strategy (instance-attribute)
ingest_strategy = ingest_strategy or SequentialIngestStrategy()
subclass_from_config (classmethod)
Initializes the class from the provided configuration. May return a subclass of the class if the configuration requests one.
PARAMETER | DESCRIPTION |
---|---|
config | A model containing configuration details for the class. |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided configuration. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The class can't be found or is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
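The lookup-and-check behaviour described for subclass_from_config can be sketched as follows. This assumes a hypothetical configuration shape with a "module:ClassName" type string and a dict of constructor keyword arguments; ObjectConfig and the local InvalidConfigError are stand-ins, and the real ragbits configuration model may resolve class names differently.

```python
from dataclasses import dataclass, field
from importlib import import_module
from typing import Any


class InvalidConfigError(Exception):
    """Hypothetical stand-in for the library's InvalidConfigError."""


@dataclass
class ObjectConfig:
    # Hypothetical shape: a "module:ClassName" path plus constructor kwargs.
    type: str
    config: dict[str, Any] = field(default_factory=dict)


def subclass_from_config(base: type, cfg: ObjectConfig) -> Any:
    module_name, _, class_name = cfg.type.partition(":")
    try:
        cls = getattr(import_module(module_name), class_name)
    except (ImportError, AttributeError, ValueError) as exc:
        # Missing module, missing attribute, or an empty module name.
        raise InvalidConfigError(f"Class {cfg.type!r} not found") from exc
    # Only accept classes inside the expected hierarchy.
    if not (isinstance(cls, type) and issubclass(cls, base)):
        raise InvalidConfigError(f"{cfg.type!r} is not a subclass of {base.__name__}")
    return cls(**cfg.config)


od = subclass_from_config(dict, ObjectConfig("collections:OrderedDict", {"a": 1}))
print(type(od).__name__, dict(od))
```

Here dict plays the role of the base class and collections.OrderedDict the requested subclass; asking for collections:deque would raise InvalidConfigError because deque is not a dict subclass.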
subclass_from_factory (classmethod)
Creates an instance of the class using the provided factory function. May return a subclass of the class if the factory returns one.
PARAMETER | DESCRIPTION |
---|---|
factory_path | A string representing the path to the factory function in the format of "module.submodule:factory_name". |
RETURNS | DESCRIPTION |
---|---|
Self | An instance of the class initialized with the provided factory function. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | The factory can't be found or the object returned is not a subclass of the current class. |
Source code in packages/ragbits-core/src/ragbits/core/utils/config_handling.py
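A factory path in the "module.submodule:factory_name" format can be resolved with the standard library's importlib. The sketch below is an assumption about how such resolution might work, not the ragbits implementation; resolve_factory and the local InvalidConfigError are hypothetical, and collections:OrderedDict is used only because it is a zero-argument callable that returns a dict subclass.

```python
from importlib import import_module
from typing import Any, Callable


class InvalidConfigError(Exception):
    """Hypothetical stand-in for the library's InvalidConfigError."""


def resolve_factory(factory_path: str) -> Callable[[], Any]:
    # Split "module.submodule:factory_name" at the colon.
    module_name, sep, attr_name = factory_path.partition(":")
    if not sep or not module_name or not attr_name:
        raise InvalidConfigError(f"Expected 'module.submodule:factory_name', got {factory_path!r}")
    try:
        return getattr(import_module(module_name), attr_name)
    except (ImportError, AttributeError) as exc:
        raise InvalidConfigError(f"Factory {factory_path!r} not found") from exc


def subclass_from_factory(base: type, factory_path: str) -> Any:
    obj = resolve_factory(factory_path)()
    # Reject factories that return something outside the expected hierarchy.
    if not isinstance(obj, base):
        raise InvalidConfigError(f"{factory_path!r} did not return a {base.__name__}")
    return obj


print(type(subclass_from_factory(dict, "collections:OrderedDict")).__name__)
```

Checking the returned object with isinstance, rather than trusting the factory, mirrors the documented InvalidConfigError case where the factory returns something that is not an instance of the current class.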
from_config (classmethod)
Creates and returns an instance of the DocumentSearch class from the given configuration.
PARAMETER | DESCRIPTION |
---|---|
config | A configuration object containing the configuration for initializing the DocumentSearch instance. |
RETURNS | DESCRIPTION |
---|---|
DocumentSearch | An initialized instance of the DocumentSearch class. |
RAISES | DESCRIPTION |
---|---|
ValidationError | If the configuration doesn't follow the expected format. |
InvalidConfigError | If one of the specified classes can't be found or is not the correct type. |
Source code in packages/ragbits-document-search/src/ragbits/document_search/_main.py
preferred_subclass (classmethod)
preferred_subclass(config: CoreConfig, factory_path_override: str | None = None, yaml_path_override: Path | None = None) -> Self
Tries to create an instance by looking at the project's component preferences, either from YAML or from the factory. Accepts optional overrides for both, which take precedence over the project preferences.
PARAMETER | DESCRIPTION |
---|---|
config | The CoreConfig instance containing preferred factory and configuration details. |
factory_path_override | A string representing the path to the factory function in the format of "module.submodule:factory_name". |
yaml_path_override | The path to the YAML file containing the Ragstack instance configuration. The configuration is looked up under the key "document_search"; if it is not found, the class is instantiated with the preferred configuration for each component. |
RAISES | DESCRIPTION |
---|---|
InvalidConfigError | If the default factory or configuration can't be found. |
Source code in packages/ragbits-document-search/src/ragbits/document_search/_main.py
search (async)
search(query: str, options: DocumentSearchOptions[QueryRephraserOptionsT, VectorStoreOptionsT, RerankerOptionsT] | None = None) -> Sequence[Element]
Search for the most relevant chunks for a query.
PARAMETER | DESCRIPTION |
---|---|
query | The query to search for. |
options | The document search retrieval options. |
RETURNS | DESCRIPTION |
---|---|
Sequence[Element] | A list of chunks. |
Source code in packages/ragbits-document-search/src/ragbits/document_search/_main.py
ingest (async)
ingest(documents: str | Iterable[DocumentMeta | Document | Source], fail_on_error: bool = True) -> IngestExecutionResult
Ingest documents into the search index.
PARAMETER | DESCRIPTION |
---|---|
documents | A string representing a source-specific URI (e.g., "gcs://bucket/") or an iterable of DocumentMeta, Document, or Source objects. |
fail_on_error | If True, raises IngestExecutionError when any errors are encountered during ingestion. If False, returns all errors encountered in the IngestExecutionResult. |
RETURNS | DESCRIPTION |
---|---|
IngestExecutionResult | An IngestExecutionResult containing the results of the ingestion process. |
RAISES | DESCRIPTION |
---|---|
IngestExecutionError | If fail_on_error is True and any errors are encountered during ingestion. |
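The fail_on_error contract can be sketched as "collect errors, then either raise or return them". This minimal sketch assumes every document is attempted before the error is raised, which the reference above does not state; IngestResult, IngestError and ingest_one are hypothetical stand-ins for IngestExecutionResult, IngestExecutionError and the real parsing and enrichment steps.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    """Hypothetical stand-in for IngestExecutionResult."""

    successful: list[str] = field(default_factory=list)
    failed: list[tuple[str, Exception]] = field(default_factory=list)


class IngestError(Exception):
    """Hypothetical stand-in for IngestExecutionError; carries the full result."""

    def __init__(self, result: IngestResult) -> None:
        super().__init__(f"{len(result.failed)} document(s) failed to ingest")
        self.result = result


async def ingest_one(doc: str) -> None:
    # Toy ingestion step: documents whose names start with "bad" fail.
    if doc.startswith("bad"):
        raise ValueError(f"cannot parse {doc}")


async def ingest(documents: list[str], fail_on_error: bool = True) -> IngestResult:
    result = IngestResult()
    for doc in documents:  # sequential, one document at a time
        try:
            await ingest_one(doc)
            result.successful.append(doc)
        except Exception as exc:
            # Record the error instead of stopping at the first failure.
            result.failed.append((doc, exc))
    if fail_on_error and result.failed:
        raise IngestError(result)
    return result


res = asyncio.run(ingest(["a.pdf", "bad.pdf"], fail_on_error=False))
print(res.successful, [doc for doc, _ in res.failed])  # ['a.pdf'] ['bad.pdf']
```

Attaching the result to the raised error means a caller using fail_on_error=True can still inspect which documents succeeded.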