# How to use local LLMs with Ragbits
This guide explains how to set up and use local LLMs in Ragbits. It covers installation, model initialization, and configuration options.
## Setting up and using a local model
To use local LLMs, you need to install the 'local' extra requirements:
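For example, with pip (assuming the `local` extra is exposed by the main `ragbits` package; adjust the package name to your setup):

```bash
# the "local" extra pulls in the local-inference dependencies (assumed package layout)
pip install ragbits[local]
```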
Local LLMs in Ragbits use `AutoModelForCausalLM` with `device_map="auto"`. This setting automatically fills all available space on the GPU(s) first, then the CPU, and finally the hard drive (the absolute slowest option) if there is still not enough memory. See the Hugging Face documentation for more details.
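Under the hood this corresponds roughly to the standard Hugging Face loading pattern sketched below; it illustrates what `device_map="auto"` does and is not the exact code Ragbits executes:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" lets Accelerate place weights on available GPUs first,
# then spill over to CPU RAM, and finally offload to disk if memory runs out.
model = AutoModelForCausalLM.from_pretrained("mistral-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("mistral-7b")
```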
Using a local model is as simple as:
```python
from ragbits.core.llms.local import LocalLLM

local_llm = LocalLLM(model_name="mistral-7b")
response = local_llm.generate("Tell me a science fact.")
print(response)
```
The `model_name` parameter can be specified in several ways:

- a string representing the model ID of a pretrained model hosted on the Hugging Face Hub, such as `"mistral-7b"`,
- a path to a directory containing a model, e.g., `"./my_model_directory/"`,
- a path or URL to a saved configuration JSON file, e.g., `"./my_model_directory/configuration.json"`.
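For instance, a model previously saved to a local directory can be loaded by passing its path instead of a Hub ID (the directory below is a placeholder):

```python
from ragbits.core.llms.local import LocalLLM

# "./my_model_directory/" is a placeholder for a directory containing a model
# saved with save_pretrained(); adjust the path to your setup.
local_llm = LocalLLM(model_name="./my_model_directory/")
response = local_llm.generate("Summarize the benefits of running models locally.")
print(response)
```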
## Local LLM Options
The `LocalLLMOptions` class provides a set of parameters to fine-tune the behavior of local LLMs. These options are described in the Hugging Face documentation.
Example usage:
```python
from ragbits.core.llms.local import LocalLLM, LocalLLMOptions

options = LocalLLMOptions(
    temperature=0.7,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.9,
)

local_llm = LocalLLM(model_name="mistral-7b", default_options=options)
response = local_llm.generate("Explain quantum mechanics in simple terms.")
print(response)
```