Web QA with Mixtral-8x7B-Instruct-v0.1

_{Last Updated:
June 6, 2024}

Colab by Tuana Celik - ( LI & Twitter)

Quick guide to building with the new Mixtral-8x7B-Instruct-v0.1 model by Mistral AI. This Colab uses Haystack 2.0 (you can read the announcement here).

Why is Mixtral different?

This is quite a cool new model. It’s the first of its kind as an open-source model in that it is a mixture (get Mix tral) of 8 7B parameter models. This article on Hugging Face describes this phenomenon way better and in way more detail than I can. The idea is that it’s a mixture of 8 different “experts”, with a router that routes a query to a given expert (not entirely accurate, but a good way to simplify). This means not all 8 models are inferenced, making it remarkably fast too!

Use the HuggingFaceAPIGenerator to query the model on its own
Add the generator to a full RAG Pipeline (on the web)

Screenshot 2023-12-13 at 17.46.33.png

Install dependencies

!pip install haystack-ai trafilatura sentence_transformers "huggingface_hub>=0.22.0"

Prompt the Model - Standalone

We are using Hugging Face Serverless Inference API.

This requires an API key: https://huggingface.co/settings/tokens
You should also accept Mistral conditions here: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

import os
from getpass import getpass

os.environ["HF_API_TOKEN"] = getpass("Enter Hugging Face token: ")

from haystack.components.generators import HuggingFaceAPIGenerator

generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "mistralai/Mixtral-8x7B-Instruct-v0.1"})

result = generator.run("What's Natural Language Processing?", generation_kwargs={"max_new_tokens": 350})
print(result["replies"][0])

Use the Model in a full RAG pipeline (on the web)

Here, we will be using the same generator component as the above, in a full RAG pipeline. You can change this pipeline to use your own data source (such as a vector databse, Notion, documentation) instead of the LinkContentFetcher we are using here.

from haystack.components.fetchers.link_content import LinkContentFetcher
from haystack.components.converters import HTMLToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack import Pipeline

fetcher = LinkContentFetcher()
converter = HTMLToDocument()
document_splitter = DocumentSplitter(split_by="word", split_length=50)
similarity_ranker = TransformersSimilarityRanker(top_k=3)

prompt_template = """
According to these documents:

{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Answer the given question: {{question}}
Answer:
"""
prompt_builder = PromptBuilder(template=prompt_template)

pipeline = Pipeline()
pipeline.add_component("fetcher", fetcher)
pipeline.add_component("converter", converter)
pipeline.add_component("splitter", document_splitter)
pipeline.add_component("ranker", similarity_ranker)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", generator)

pipeline.connect("fetcher.streams", "converter.sources")
pipeline.connect("converter.documents", "splitter.documents")
pipeline.connect("splitter.documents", "ranker.documents")
pipeline.connect("ranker.documents", "prompt_builder.documents")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

question = "What do graphs have to do with Haystack?"
result = pipeline.run({"prompt_builder": {"question": question},
                   "ranker": {"query": question},
                   "fetcher": {"urls": ["https://haystack.deepset.ai/blog/introducing-haystack-2-beta-and-advent"]},
                    "llm":{"generation_kwargs": {"max_new_tokens": 350}}})

print(result['llm']['replies'][0])