Document Processing
ragbits.document_search.ingestion.document_processor.DocumentProcessorRouter
The DocumentProcessorRouter routes each document to the appropriate provider based on the document metadata, such as the document type.
Source code in packages/ragbits-document-search/src/ragbits/document_search/ingestion/document_processor.py
from_dict_to_providers_config
staticmethod
Creates a ProvidersConfig from a dictionary config. Example of the dictionary config: { "txt": { "type": "UnstructuredProvider" } }
PARAMETER | DESCRIPTION |
---|---|
dict_config | The dictionary with configuration. |

RETURNS | DESCRIPTION |
---|---|
ProvidersConfig | ProvidersConfig object. |
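The conversion described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes each document-type key maps to a single dictionary with a "type" entry, and `DocumentType`, `UnstructuredProvider`, `PROVIDER_REGISTRY`, and `providers_config_from_dict` are hypothetical stand-ins defined here so the example runs on its own.

```python
from enum import Enum


class DocumentType(str, Enum):
    """Stand-in for the library's document type enum (two members only)."""

    TXT = "txt"
    PDF = "pdf"


class UnstructuredProvider:
    """Stand-in provider; the real class lives in ragbits."""


# Hypothetical registry used to resolve the "type" string to a class.
# How the library resolves this name is not shown in this section.
PROVIDER_REGISTRY = {"UnstructuredProvider": UnstructuredProvider}


def providers_config_from_dict(dict_config: dict) -> dict:
    """Map each document-type key to an instance of the provider named by "type"."""
    providers_config = {}
    for type_name, provider_spec in dict_config.items():
        provider_cls = PROVIDER_REGISTRY[provider_spec["type"]]
        # DocumentType("txt") looks the member up by its value.
        providers_config[DocumentType(type_name)] = provider_cls()
    return providers_config


config = providers_config_from_dict({"txt": {"type": "UnstructuredProvider"}})
print(list(config))
```

Keying the result by an enum member rather than the raw string means an unknown document type fails early, at conversion time, with the enum's own `ValueError`.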
from_config
classmethod
from_config(providers_config: ProvidersConfig | None = None) -> DocumentProcessorRouter
Create a DocumentProcessorRouter from a configuration. If no configuration is provided, the default configuration is used. If a configuration is provided, it is merged with the default configuration and overrides the default values for the document types it defines.
Example of the configuration: { DocumentType.TXT: YourCustomProviderClass(), DocumentType.PDF: UnstructuredProvider() }
PARAMETER | DESCRIPTION |
---|---|
providers_config | The dictionary with the providers configuration, mapping the document types to the provider class. |

RETURNS | DESCRIPTION |
---|---|
DocumentProcessorRouter | The DocumentProcessorRouter. |
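The merge behaviour described for from_config can be illustrated with a plain dictionary update. This is a sketch under stated assumptions: `DocumentType`, `DefaultProvider`, `CustomProvider`, `DEFAULT_PROVIDERS`, and `merge_with_defaults` are hypothetical names defined here, and the library's actual default mapping is not shown in this section.

```python
from __future__ import annotations

from enum import Enum


class DocumentType(str, Enum):
    """Stand-in for the library's document type enum."""

    TXT = "txt"
    PDF = "pdf"


class DefaultProvider:
    """Hypothetical provider used as the default for every type."""


class CustomProvider:
    """Hypothetical user-supplied provider."""


# Assumed defaults; the real default mapping is defined by the library.
DEFAULT_PROVIDERS = {
    DocumentType.TXT: DefaultProvider(),
    DocumentType.PDF: DefaultProvider(),
}


def merge_with_defaults(providers_config: dict | None = None) -> dict:
    """Return the defaults, overridden per document type by providers_config."""
    merged = dict(DEFAULT_PROVIDERS)  # copy so the defaults are never mutated
    if providers_config is not None:
        merged.update(providers_config)
    return merged


defaults_only = merge_with_defaults()
overridden = merge_with_defaults({DocumentType.TXT: CustomProvider()})
print(type(overridden[DocumentType.TXT]).__name__)  # CustomProvider
print(type(overridden[DocumentType.PDF]).__name__)  # DefaultProvider
```

Only the document types present in the supplied configuration change; every other type keeps its default provider.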
get_provider
Get the provider for the document.
PARAMETER | DESCRIPTION |
---|---|
document_meta | The document metadata. |

RETURNS | DESCRIPTION |
---|---|
BaseProvider | The provider for processing the document. |

RAISES | DESCRIPTION |
---|---|
ValueError | If no provider is found for the document type. |
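The lookup and its failure mode might look roughly like the sketch below. It is not the library's code: `SketchRouter`, `DocumentMeta` with a `document_type` attribute, and `TextProvider` are hypothetical stand-ins, and the error message wording is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class DocumentType(str, Enum):
    """Stand-in for the library's document type enum."""

    TXT = "txt"
    PDF = "pdf"


@dataclass
class DocumentMeta:
    """Hypothetical metadata object; only the document type is modelled."""

    document_type: DocumentType


class TextProvider:
    """Hypothetical provider for plain-text documents."""


class SketchRouter:
    """Illustrative router holding a mapping from document type to provider."""

    def __init__(self, providers: dict) -> None:
        self._providers = providers

    def get_provider(self, document_meta: DocumentMeta):
        """Return the provider for the document's type or raise ValueError."""
        provider = self._providers.get(document_meta.document_type)
        if provider is None:
            raise ValueError(
                f"No provider found for the document type {document_meta.document_type.value}"
            )
        return provider


text_provider = TextProvider()
router = SketchRouter({DocumentType.TXT: text_provider})

found = router.get_provider(DocumentMeta(DocumentType.TXT))

try:
    router.get_provider(DocumentMeta(DocumentType.PDF))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Callers that ingest mixed document types can catch the ValueError to skip or report unsupported files instead of aborting the whole ingestion run.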