Hacker News Summaries with Custom Components


by Tuana Celik: Twitter, LinkedIn

📚 Check out the Customizing RAG Pipelines to Summarize Latest Hacker News Posts with Haystack 2.0 Preview article for a detailed walkthrough of this example.

Install dependencies

!pip install newspaper3k
!pip install haystack-ai

Create a Custom Haystack 2.0 Component

This HackernewsNewestFetcher fetches the last_k newest posts on Hacker News, downloads the article each post links to, and returns the contents as a List of Haystack Document objects.

from typing import List
from haystack import component, Document
from newspaper import Article
import requests

@component
class HackernewsNewestFetcher():

  @component.output_types(articles=List[Document])
  def run(self, last_k: int):
    # Fetch the IDs of the newest stories (newest first)
    newest_list = requests.get(url='https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty')
    articles = []
    for story_id in newest_list.json()[0:last_k]:
      # Look up each story; the API can return null, and text-only posts have no 'url'
      item = requests.get(url=f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json?print=pretty").json()
      if item and 'url' in item:
        articles.append(item['url'])

    docs = []
    for url in articles:
      try:
        article = Article(url)
        article.download()
        article.parse()
        docs.append(Document(content=article.text, meta={'title': article.title, 'url': url}))
      except Exception:
        # Skip pages that fail to download or parse instead of aborting the run
        print(f"Couldn't download {url}, skipped")
    return {'articles': docs}
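
One consequence of the 'url' check in the component above is that the fetcher can return fewer than last_k documents: text-only posts (typically Ask HN submissions) carry no url field and are dropped. The sketch below isolates that filtering step on made-up data so it can be tried without network access. select_urls and sample_items are illustrative names, not part of the component or of Haystack.

```python
from typing import Dict, List


def select_urls(items: List[Dict], last_k: int) -> List[str]:
    """Return the 'url' of each of the first last_k items that has one."""
    return [item["url"] for item in items[:last_k] if "url" in item]


# Hypothetical item payloads, shaped loosely like the JSON of a Hacker News item.
sample_items = [
    {"id": 1, "title": "Show HN: A tool", "url": "https://example.com/tool"},
    {"id": 2, "title": "Ask HN: A question"},  # text post, no external link
    {"id": 3, "title": "A blog post", "url": "https://example.com/post"},
    {"id": 4, "title": "Beyond last_k", "url": "https://example.com/late"},
]

urls = select_urls(sample_items, last_k=3)
print(urls)
```

With last_k=3, only two of the three inspected items have a link, so two URLs come back. If exactly last_k documents are needed, one option is to keep reading IDs until enough links have been collected.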

Create a Haystack 2.0 RAG Pipeline

This pipeline uses the components available in the Haystack 2.0 preview package at time of writing (22 September 2023) as well as the custom component we've created above.

The end result is a RAG pipeline designed to provide a summary for each of the last_k posts on Hacker News, followed by its source URL.

from getpass import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass("OpenAI Key: ")

from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_template = """
You will be provided a few of the latest posts in HackerNews, followed by their URL.
For each post, provide a brief summary followed by the URL the full post can be found in.

Posts:
{% for article in articles %}
  {{article.content}}
  URL: {{article.meta['url']}}
{% endfor %}
"""

prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator(model="gpt-4")
fetcher = HackernewsNewestFetcher()

pipe = Pipeline()
pipe.add_component("hackernews_fetcher", fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("hackernews_fetcher.articles", "prompt_builder.articles")
pipe.connect("prompt_builder.prompt", "llm.prompt")
result = pipe.run(data={"hackernews_fetcher": {"last_k": 3}})
print(result['llm']['replies'][0])
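
To get a feel for what the model actually receives, the sketch below approximates the text that the loop in prompt_template produces for two documents. It deliberately does not use PromptBuilder or Jinja: FakeDocument, HEADER, and render_prompt are hypothetical stand-ins written with plain string formatting, and the exact whitespace will differ from the real rendering.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a Haystack Document (content and meta only)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


# The instruction text from the top of prompt_template.
HEADER = (
    "You will be provided a few of the latest posts in HackerNews, followed by their URL.\n"
    "For each post, provide a brief summary followed by the URL the full post can be found in.\n"
)


def render_prompt(articles: List[FakeDocument]) -> str:
    """Approximate the template's for-loop: one content line and one URL line per article."""
    lines = [HEADER, "Posts:"]
    for article in articles:
        lines.append(f"  {article.content}")
        lines.append(f"  URL: {article.meta['url']}")
    return "\n".join(lines)


# Made-up sample data; real documents would hold full article text.
sample = [
    FakeDocument("First article text.", {"url": "https://example.com/a"}),
    FakeDocument("Second article text.", {"url": "https://example.com/b"}),
]
prompt = render_prompt(sample)
print(prompt)
```

Because each article's full text is placed in the prompt, its length grows with last_k and with article size, which is worth keeping in mind when choosing last_k.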