RAG Pipeline Using FastEmbed for Embeddings Generation
Last Updated: September 19, 2024
FastEmbed is a lightweight, fast Python library for embedding generation, maintained by Qdrant. It is well suited to generating embeddings quickly and efficiently on CPU-only machines.
In this notebook, we will use the FastEmbed-Haystack integration to generate embeddings for indexing and RAG.
Haystack 2.0 Useful Sources
Install dependencies
!pip install fastembed-haystack qdrant-haystack wikipedia transformers
Download contents and create docs
favourite_bands="""Audioslave
Green Day
Muse (band)
Foo Fighters (band)
Nirvana (band)""".split("\n")
import wikipedia
from haystack.dataclasses import Document
raw_docs=[]
for title in favourite_bands:
page = wikipedia.page(title=title, auto_suggest=False)
doc = Document(content=page.content, meta={"title": page.title, "url":page.url})
raw_docs.append(doc)
Clean, split and index documents on Qdrant
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder
from haystack.document_stores.types import DuplicatePolicy
document_store = QdrantDocumentStore(
":memory:",
    embedding_dim=384,  # BAAI/bge-small-en-v1.5 produces 384-dimensional vectors
recreate_index=True,
return_embedding=True,
wait_result_from_api=True,
)
cleaner = DocumentCleaner()
splitter = DocumentSplitter(split_by='sentence', split_length=3)
splitted_docs = splitter.run(cleaner.run(raw_docs)["documents"])
len(splitted_docs["documents"])
493
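With `split_by="sentence"` and `split_length=3`, the splitter groups every three sentences into one chunk, which is why the five Wikipedia pages expand into 493 documents. A plain-Python sketch of that chunking logic (a simplified illustration, not Haystack's actual implementation, which uses a proper sentence segmenter):

```python
def split_by_sentence(text: str, split_length: int = 3) -> list[str]:
    # Naive sentence segmentation on ". " as a rough stand-in
    # for the splitter's real sentence boundary detection.
    sentences = [s for s in text.split(". ") if s]
    # Group consecutive sentences into chunks of `split_length`.
    return [
        ". ".join(sentences[i : i + split_length])
        for i in range(0, len(sentences), split_length)
    ]

chunks = split_by_sentence(
    "Nirvana was formed in 1987. Kurt Cobain was the singer. "
    "Krist Novoselic played bass. Dave Grohl joined in 1990. "
    "The band broke up in 1994."
)
# Five sentences with split_length=3 yield two chunks.
```

Smaller chunks like these tend to retrieve more precisely, at the cost of each chunk carrying less surrounding context.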
FastEmbed Document Embedder
Here we are initializing the FastEmbed Document Embedder and using it to generate embeddings for the documents.
We are using a small but high-quality model, BAAI/bge-small-en-v1.5, and setting the `parallel` parameter to 0 to use all available CPU cores for embedding generation.
⚠️ If you are running this notebook on Google Colab, note that Colab provides only 2 CPU cores, so embedding generation may be slower than on a standard multi-core machine.
For more information on FastEmbed-Haystack integration, please refer to the documentation and API reference.
document_embedder = FastembedDocumentEmbedder(model="BAAI/bge-small-en-v1.5", parallel=0, meta_fields_to_embed=["title"])
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(splitted_docs["documents"])
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 148034.26it/s]
Calculating embeddings: 100%|██████████| 493/493 [00:35<00:00, 13.73it/s]
document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)
500it [00:00, 4262.26it/s]
493
RAG Pipeline using Zephyr-7B
from haystack import Pipeline
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from pprint import pprint
# Enter your Hugging Face Token
# this is needed to use Zephyr, calling the free Hugging Face Inference API
from getpass import getpass
import os
os.environ["HF_API_TOKEN"] = getpass("Enter your Hugging Face Token: https://huggingface.co/settings/tokens ")
generator = HuggingFaceAPIGenerator(api_type="serverless_inference_api",
api_params={"model": "HuggingFaceH4/zephyr-7b-beta"},
generation_kwargs={"max_new_tokens":500})
generator.warm_up()
# define the prompt template
prompt_template = """
Using only the information contained in these documents, return a brief answer (max 50 words).
If the answer cannot be inferred from the documents, respond \"I don't know\".
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{question}}
Answer:
"""
query_pipeline = Pipeline()
# FastembedTextEmbedder is used to embed the query
query_pipeline.add_component("text_embedder", FastembedTextEmbedder(model="BAAI/bge-small-en-v1.5", parallel = 0, prefix="query:"))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
query_pipeline.add_component("generator", generator)
# connect the components
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")
Try the pipeline
question = "Who is Dave Grohl?"
results = query_pipeline.run(
{ "text_embedder": {"text": question},
"prompt_builder": {"question": question},
}
)
Calculating embeddings: 100%|ββββββββββ| 1/1 [00:00<00:00, 24.62it/s]
for d in results['generator']['replies']:
pprint(d)
(' Dave Grohl is the founder and lead vocalist of the American rock band Foo '
'Fighters, which he formed in 1994 after the breakup of Nirvana, in which he '
'was the drummer.')