Integration: OpenSearch
A Document Store for storing documents in and retrieving them from OpenSearch
Haystack 2.0
Installation
Use pip to install the OpenSearch integration:
pip install opensearch-haystack
Usage
Once the package is installed and an OpenSearch instance is running, initialize an OpenSearchDocumentStore to use OpenSearch with Haystack 2.0:
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
document_store = OpenSearchDocumentStore()
Writing Documents to OpenSearchDocumentStore
To write documents to the OpenSearchDocumentStore, create an indexing pipeline:
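from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter
file_paths = ["filename.txt"]  # replace with the text files you want to index
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
# The converter's "documents" output feeds the writer's "documents" input
indexing.connect("converter", "writer")
indexing.run({"converter": {"sources": file_paths}})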
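To make the data flow of this pipeline concrete, here is a minimal, stdlib-only sketch of what the converter and writer stages do conceptually. The converter reads each text file into a document record that carries its content and source path. The writer then stores those records keyed by an ID. This is not Haystack's implementation. The `make_document`, `text_files_to_documents` and `write_documents` helpers, the dict-based store, and the SHA-256-of-text ID are all hypothetical, chosen for illustration.

```python
import hashlib
from pathlib import Path


def make_document(content, file_path):
    """Hypothetical document record: text plus the file it came from."""
    return {
        # Assumption: the ID is a SHA-256 hash of the text alone
        "id": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "content": content,
        "meta": {"file_path": file_path},
    }


def text_files_to_documents(paths):
    """Hypothetical converter step: one document per UTF-8 text file."""
    return [make_document(Path(p).read_text(encoding="utf-8"), str(p)) for p in paths]


def write_documents(store, documents):
    """Hypothetical writer step: add documents to a dict keyed by ID.

    Exact duplicates (same ID) are skipped; returns how many documents were new.
    """
    written = 0
    for doc in documents:
        if doc["id"] not in store:
            store[doc["id"]] = doc
            written += 1
    return written
```

Here `write_documents(store, text_files_to_documents(file_paths))` plays the role of `indexing.run(...)`, with the `store` dict standing in for the OpenSearch index. The real writer's handling of duplicate documents may differ from this skip-if-present rule.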
License
opensearch-haystack is distributed under the terms of the Apache-2.0 license.
Haystack 1.x
You can use OpenSearch in your Haystack pipelines with the OpenSearchDocumentStore.
For a detailed overview of all the available methods and settings for the OpenSearchDocumentStore, visit the Haystack API Reference.
Installation (1.x)
pip install farm-haystack[opensearch]
Usage (1.x)
Once OpenSearch is installed and running, you can start using it with Haystack by initializing an OpenSearchDocumentStore:
from haystack.document_stores import OpenSearchDocumentStore
document_store = OpenSearchDocumentStore()
Writing Documents to OpenSearchDocumentStore
To write documents to your OpenSearchDocumentStore, create an indexing pipeline or use the write_documents() method.
For this step, you can use the available FileConverters and PreProcessors, as well as other Integrations that help you fetch data from other sources.
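write_documents() is an alternative to a pipeline: it takes documents directly. A store like this typically sends documents to OpenSearch in batches through OpenSearch's `_bulk` endpoint, which expects newline-delimited JSON. Each document contributes an action line followed by a line holding the document's source fields. The sketch below only builds such a request body in memory. The `build_bulk_body` helper, the `document` index name, and the choice to flatten `meta` fields to the top level are assumptions. It is not a copy of what OpenSearchDocumentStore actually sends.

```python
import json


def build_bulk_body(documents, index="document"):
    """Build an OpenSearch _bulk request body (NDJSON) that indexes each document.

    `documents` is a list of dicts with "id", "content" and optional "meta" keys
    (a hypothetical layout, loosely modeled on Haystack 1.x dict documents).
    `index` defaults to an assumed index name.
    """
    lines = []
    for doc in documents:
        # Action line: index the following source into `index` under this _id
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        # Source line: assumption that meta fields sit next to "content"
        source = {"content": doc["content"], **doc.get("meta", {})}
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

You could send such a body with any HTTP client as a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header. Batching many documents per request is what makes bulk indexing faster than one request per document.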
Indexing Pipeline
from haystack import Pipeline
from haystack.document_stores import OpenSearchDocumentStore
from haystack.nodes import PDFToTextConverter, PreProcessor
document_store = OpenSearchDocumentStore()
converter = PDFToTextConverter()
preprocessor = PreProcessor()
indexing_pipeline = Pipeline()
indexing_pipeline.add_node(component=converter, name="PDFConverter", inputs=["File"])
indexing_pipeline.add_node(component=preprocessor, name="PreProcessor", inputs=["PDFConverter"])
indexing_pipeline.add_node(component=document_store, name="DocumentStore", inputs=["PreProcessor"])
indexing_pipeline.run(file_paths=["filename.pdf"])
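The PreProcessor in the pipeline above breaks long converted texts into smaller documents before they are written to OpenSearch. The standalone sketch below illustrates word-based splitting with overlap, the idea behind PreProcessor's `split_by="word"`, `split_length` and `split_overlap` settings. It ignores sentence boundaries and text cleaning, which the real PreProcessor can also handle, and the `split_words` helper is hypothetical.

```python
def split_words(text, split_length=200, split_overlap=0):
    """Split text into chunks of at most `split_length` words.

    Consecutive chunks share `split_overlap` words, so context that straddles a
    boundary appears in both chunks.
    """
    if split_length <= 0 or not 0 <= split_overlap < split_length:
        raise ValueError("need split_length > 0 and 0 <= split_overlap < split_length")
    words = text.split()
    step = split_length - split_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once this window reaches the end, to avoid a redundant tail chunk
        if start + split_length >= len(words):
            break
    return chunks
```

For instance, `split_words("a b c d e f g", split_length=3, split_overlap=1)` returns `["a b c", "c d e", "e f g"]`. Overlap trades some index size for a lower chance that a relevant passage is cut in half.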
Using OpenSearch in a Query Pipeline
Once you have documents in your OpenSearchDocumentStore, it’s ready to be used in any Haystack pipeline. For example, the pipeline below uses the “deepset/question-generation” prompt, which is designed to generate questions for the retrieved documents. If your OpenSearchDocumentStore contained documents about food, you could generate questions about “Pizzas” like this:
from haystack import Pipeline
from haystack.document_stores import OpenSearchDocumentStore
from haystack.nodes import BM25Retriever, PromptNode
document_store = OpenSearchDocumentStore()
retriever = BM25Retriever(document_store=document_store)
# Generates questions from the retrieved documents (requires an OpenAI API key)
prompt_node = PromptNode(model_name_or_path="gpt-4",
                         api_key="YOUR_OPENAI_KEY",
                         default_prompt_template="deepset/question-generation")
query_pipeline = Pipeline()
query_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
query_pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])
query_pipeline.run(query="Pizzas")
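The BM25Retriever leaves the actual scoring to OpenSearch, which ranks text matches with BM25 by default. To show roughly how that ranking works, here is a small self-contained sketch of Okapi BM25 over an in-memory list of texts. It uses Lucene-style IDF and the common defaults `k1=1.2` and `b=0.75`. It tokenizes naively (lowercase, split on whitespace) instead of using OpenSearch's analyzers, so its scores will not match the server's exactly. The `bm25_scores` helper is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25.

    Uses Lucene-style IDF, log(1 + (N - n + 0.5) / (n + 0.5)), which never goes negative.
    """
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more matches to score high
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Calling `bm25_scores("pizzas oven", docs)` on a short list of food texts returns one score per text, with 0.0 for texts that share no terms with the query. The retriever would return the top-scoring documents to the PromptNode.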