Hacker News Summaries with Custom Components
Last Updated: September 19, 2024
by Tuana Celik: Twitter, LinkedIn
Check out the article "Customizing RAG Pipelines to Summarize Latest Hacker News Posts with Haystack 2.0 Preview" for a detailed walkthrough of this example.
Install dependencies
!pip install newspaper3k
!pip install haystack-ai
Create a Custom Haystack 2.0 Component
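This HackernewsNewestFetcher component fetches the last_k newest posts on Hacker News and returns their contents as a list of Haystack Document objects. Posts that have no external URL, or whose article cannot be downloaded, are skipped, so the list may contain fewer than last_k documents.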
from typing import List

import requests
from haystack import component, Document
from newspaper import Article


@component
class HackernewsNewestFetcher:
    @component.output_types(articles=List[Document])
    def run(self, last_k: int):
        # Get the IDs of the newest stories and keep the first last_k of them.
        newest_list = requests.get(url='https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty')
        urls = []
        for item_id in newest_list.json()[0:last_k]:
            item = requests.get(url=f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json?print=pretty").json()
            # Only posts that link to an external page have a 'url' field.
            if 'url' in item:
                urls.append(item['url'])

        docs = []
        for url in urls:
            try:
                # Download and parse the linked page with newspaper3k.
                article = Article(url)
                article.download()
                article.parse()
                docs.append(Document(content=article.text, meta={'title': article.title, 'url': url}))
            except Exception:
                # Skip pages that fail to download or parse.
                print(f"Couldn't download {url}, skipped")
        return {'articles': docs}
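To see the URL-filtering step of the fetcher in isolation, without touching the network, here is a minimal sketch. The helper collect_story_urls and the fake_items data are hypothetical stand-ins and are not part of Haystack or the Hacker News API; fetch_item plays the role of the requests.get(...).json() call in the component above.

```python
from typing import Callable, Dict, List


def collect_story_urls(ids: List[int], fetch_item: Callable[[int], Dict], last_k: int) -> List[str]:
    """Return the external URLs of the first last_k items, skipping items without one."""
    urls = []
    for item_id in ids[:last_k]:
        item = fetch_item(item_id)
        # Mirrors the component's check: keep only items that carry a 'url' field.
        if 'url' in item:
            urls.append(item['url'])
    return urls


# Hypothetical sample data standing in for Hacker News item responses.
fake_items = {
    101: {"title": "Show HN: A tool", "url": "https://example.com/tool"},
    102: {"title": "Ask HN: A question"},  # text-only post, no 'url'
    103: {"title": "A blog post", "url": "https://example.com/post"},
    104: {"title": "Another post", "url": "https://example.com/other"},
}

urls = collect_story_urls([101, 102, 103, 104], fake_items.__getitem__, last_k=3)
print(urls)  # ['https://example.com/tool', 'https://example.com/post']
```

Note that last_k limits how many IDs are examined, not how many URLs come back: with last_k=3, the text-only post uses up one of the three slots, so only two URLs are returned.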
Create a Haystack 2.0 RAG Pipeline
This pipeline uses the components available in the Haystack 2.0 preview package at the time of writing (22 September 2023), as well as the custom component we’ve created above.
The end result is a RAG pipeline that produces a summary of each of the last_k newest posts on Hacker News, each followed by its source URL.
from getpass import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass("OpenAI Key: ")
from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
prompt_template = """
You will be provided a few of the latest posts in HackerNews, followed by their URL.
For each post, provide a brief summary followed by the URL the full post can be found in.
Posts:
{% for article in articles %}
{{article.content}}
URL: {{article.meta['url']}}
{% endfor %}
"""
# Create the three components: fetcher, prompt builder, and LLM.
prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator(model="gpt-4")
fetcher = HackernewsNewestFetcher()

pipe = Pipeline()
pipe.add_component("hackernews_fetcher", fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

# Feed the fetched documents into the template, then the rendered prompt into the LLM.
pipe.connect("hackernews_fetcher.articles", "prompt_builder.articles")
pipe.connect("prompt_builder.prompt", "llm.prompt")

# Summarize the 3 newest posts and print the first generated reply.
result = pipe.run(data={"hackernews_fetcher": {"last_k": 3}})
print(result['llm']['replies'][0])
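To make it concrete what kind of text reaches the LLM, the following sketch builds a prompt of the same shape as the template using plain string formatting. It is an illustration only, not how PromptBuilder renders Jinja templates internally, and the exact whitespace of the real output may differ. SimpleDoc, render_prompt, and the sample articles are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Haystack Document with content and meta."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def render_prompt(articles: List[SimpleDoc]) -> str:
    """Imitate the template: instructions first, then each article followed by its URL."""
    header = (
        "You will be provided a few of the latest posts in HackerNews, followed by their URL.\n"
        "For each post, provide a brief summary followed by the URL the full post can be found in.\n"
        "Posts:\n"
    )
    body = "".join(f"{doc.content}\nURL: {doc.meta['url']}\n" for doc in articles)
    return header + body


# Made-up articles for illustration.
sample = [
    SimpleDoc("First article text.", {"url": "https://example.com/a"}),
    SimpleDoc("Second article text.", {"url": "https://example.com/b"}),
]

prompt = render_prompt(sample)
print(prompt)
```

Because the full text of every article is placed in a single prompt, the prompt grows with last_k and with the length of each article.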