Analyze Your Instagram Comments’ Vibe with Apify and Haystack
Last Updated: October 3, 2024
Author: Jiri Spilka (Apify)
Idea: Bilge Yücel (deepset.ai)
Ever wondered if your Instagram posts are truly vibrating with your audience? In this cookbook, we’ll show you how to use the Instagram Comment Scraper Actor to download comments from your Instagram post and analyze them using a large language model, all within the Haystack ecosystem using the apify-haystack integration.
We’ll start by using the Actor to download the comments, clean the data with the DocumentCleaner and then use the OpenAIGenerator to discover the vibe of the Instagram posts.
Install dependencies
!pip install apify-haystack==0.1.4 haystack-ai
Set up the API keys
You need an Apify account to obtain an APIFY_API_TOKEN.
You also need an OpenAI account and an OPENAI_API_KEY.
import os
from getpass import getpass
os.environ["APIFY_API_TOKEN"] = getpass("Enter YOUR APIFY_API_TOKEN")
os.environ["OPENAI_API_KEY"] = getpass("Enter YOUR OPENAI_API_KEY")
Enter YOUR APIFY_API_TOKEN··········
Enter YOUR OPENAI_API_KEY··········
Use the Haystack Pipeline to Orchestrate Instagram Comments Scraper, Comments Cleanup, and Analysis Using LLM
Now, let’s decide which post to analyze. We can start with these two posts that might reveal some interesting insights:
- @tiffintech on How to easily keep up with tech?
- @kamalaharris on Affordable Care Act
We’ll download the comments using the Instagram Comment Scraper Actor. But first, we need to understand the output format of the Actor.
The output is in the following format:
[
    {
        "text": "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
        "timestamp": "2024-09-02T16:27:09.000Z",
        "ownerUsername": "codingmermaid.ai",
        "ownerProfilePicUrl": "....",
        "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"
    },
    {
        "text": "Will check it out🙌",
        "timestamp": "2024-09-02T16:29:28.000Z",
        "ownerUsername": "author.parijat",
        "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"
    }
]
We will convert this JSON to a Haystack Document using the dataset_mapping_function as follows:
from haystack import Document

def dataset_mapping_function(dataset_item: dict) -> Document:
    # Keep the comment text as the document content and the commenter's username as metadata
    return Document(content=dataset_item.get("text"), meta={"ownerUsername": dataset_item.get("ownerUsername")})
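To see which fields the mapping keeps, here is a minimal sketch that uses plain dictionaries in place of Haystack Document objects, so it runs without any dependencies. The helper name to_plain_record is hypothetical and only mirrors the shape of dataset_mapping_function. The second sample item is made up to show what happens when a comment has no text.

```python
import json

# Sample items in the Actor output format. The first is taken from the example
# above; the second is a hypothetical item without a "text" field.
sample_output = json.loads("""
[
  {"text": "Will check it out🙌",
   "timestamp": "2024-09-02T16:29:28.000Z",
   "ownerUsername": "author.parijat",
   "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"},
  {"timestamp": "2024-09-02T16:30:00.000Z",
   "ownerUsername": "hypothetical.user"}
]
""")


def to_plain_record(dataset_item: dict) -> dict:
    """Hypothetical stand-in for dataset_mapping_function that returns a dict."""
    return {
        "content": dataset_item.get("text"),
        "meta": {"ownerUsername": dataset_item.get("ownerUsername")},
    }


records = [to_plain_record(item) for item in sample_output]
for record in records:
    print(record)
```

Because dict.get returns None for a missing key, an item without "text" ends up with content set to None, which may be worth filtering out before the analysis.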
Once we understand the Actor output format and have the dataset_mapping_function, we can set up the Haystack component that enables interaction between Haystack and Apify.
First, we need to provide the actor_id and dataset_mapping_function, along with the input parameters in run_input.
We can define the run_input in several ways:
- i) when creating the ApifyDatasetFromActorCall class
- ii) as arguments in a pipeline
- iii) as arguments to the run() function when calling ApifyDatasetFromActorCall.run()
- iv) as a combination of i) and ii), as shown in this cookbook
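The combination in option iv) can be pictured as a dictionary merge. The sketch below is an illustration only: merge_run_input is a hypothetical helper, and it assumes that values supplied at run time are layered on top of the values given at construction time. The integration's actual merge logic may differ, so check the apify-haystack source before relying on this behavior.

```python
from typing import Optional


def merge_run_input(init_run_input: Optional[dict], runtime_run_input: Optional[dict]) -> dict:
    """Hypothetical helper: combine constructor-time and run-time Actor input.

    Assumption: run-time keys override constructor-time keys with the same name.
    """
    merged = {**(init_run_input or {}), **(runtime_run_input or {})}
    if not merged:
        raise ValueError("run_input must be provided at construction time or at run time")
    return merged


# Constructor-time input, as used when creating the component in this cookbook
init_input = {"resultsLimit": 50}
# Run-time input, as passed through the pipeline in this cookbook
runtime_input = {"directUrls": ["https://www.instagram.com/p/C_a9jcRuJZZ/"]}

effective_input = merge_run_input(init_input, runtime_input)
print(effective_input)
```

Splitting the input this way keeps stable settings such as resultsLimit in one place, while the post URL can change on every run.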
For a detailed description of the input parameters, visit the Instagram Comments Scraper page.
Let’s set up the ApifyDatasetFromActorCall component:
from apify_haystack import ApifyDatasetFromActorCall

document_loader = ApifyDatasetFromActorCall(
    actor_id="apify/instagram-comment-scraper",
    run_input={"resultsLimit": 50},
    dataset_mapping_function=dataset_mapping_function,
)
Next, we’ll define a prompt for the LLM and connect all the components in the Pipeline.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.preprocessors import DocumentCleaner
prompt = """
Analyze these Instagram comments to determine if the post is generating positive energy, excitement,
or high engagement. Focus on sentiment, emotional tone, and engagement patterns to conclude if
the post is 'vibrating' with high energy. Be concise.
Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}
Analysis:
"""
cleaner = DocumentCleaner(remove_empty_lines=True, remove_extra_whitespaces=True, remove_repeated_substrings=True)
prompt_builder = PromptBuilder(template=prompt)
generator = OpenAIGenerator(model="gpt-4o-mini")
pipe = Pipeline()
pipe.add_component("loader", document_loader)
pipe.add_component("cleaner", cleaner)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", generator)
pipe.connect("loader", "cleaner")
pipe.connect("cleaner", "prompt_builder")
pipe.connect("prompt_builder", "llm")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7b45ef117be0>
🚅 Components
- loader: ApifyDatasetFromActorCall
- cleaner: DocumentCleaner
- prompt_builder: PromptBuilder
- llm: OpenAIGenerator
🛤️ Connections
- loader.documents -> cleaner.documents (list[Document])
- cleaner.documents -> prompt_builder.documents (List[Document])
- prompt_builder.prompt -> llm.prompt (str)
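To get a feel for what the LLM receives, the sketch below approximates the rendered prompt with plain string formatting instead of the Jinja template used by PromptBuilder. The function render_prompt is hypothetical, the instruction text is shortened for readability, and the exact whitespace of the real rendering will differ.

```python
def render_prompt(instructions: str, comments: list[str]) -> str:
    """Approximate the template: instructions, one comment per line, then 'Analysis:'."""
    context = "\n".join(comments)
    return f"{instructions}\nContext:\n{context}\nAnalysis:\n"


# Shortened stand-in for the instructions in the prompt template
instructions = "Analyze these Instagram comments and conclude if the post is 'vibrating' with high energy. Be concise."

# The two comments from the sample Actor output
comments = [
    "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
    "Will check it out🙌",
]

prompt_text = render_prompt(instructions, comments)
print(prompt_text)
```

Every comment is placed in the context, so the prompt grows with the number of comments, and resultsLimit indirectly bounds the prompt size.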
After that, we can run the pipeline. The execution and analysis will take approximately 30-60 seconds.
# @tiffintech on How to easily keep up with tech?
url = "https://www.instagram.com/p/C_a9jcRuJZZ/"
res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
res.get("llm", {}).get("replies", ["No response"])[0]
'Overall, the Instagram comments on the post reflect positive energy, excitement, and high engagement. The use of emojis such as 😂, 😍, 🙌, ❤️, and 🔥 indicate enthusiasm and excitement. Many comments express gratitude, appreciation, and eagerness to explore the resources mentioned in the post. There are also interactions between users tagging each other and discussing their interest in the topic, further increasing engagement. Overall, the post seems to be generating high energy and positive vibes from the audience.'
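The lookup above falls back to "No response" only when the "replies" key is missing. If the key is present but the list is empty, indexing with [0] raises an IndexError. A slightly more defensive variant might look like the sketch below; first_reply is a hypothetical helper, not part of Haystack.

```python
def first_reply(result: dict, default: str = "No response") -> str:
    """Return the first LLM reply, or the default if there is none."""
    # "or []" also covers the case where "replies" is present but None
    replies = result.get("llm", {}).get("replies") or []
    return replies[0] if replies else default


# Result shapes that the pipeline output could plausibly take
print(first_reply({"llm": {"replies": ["The post is vibrating."]}}))
print(first_reply({"llm": {"replies": []}}))
print(first_reply({}))
```

With a real run, the call would be first_reply(res), where res is the dictionary returned by pipe.run.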
Now, let’s run the same analysis, this time with the @kamalaharris post.
# @kamalaharris on Affordable Care Act
url = "https://www.instagram.com/p/C_RgBzogufK/"
res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
res.get("llm", {}).get("replies", ["No response"])[0]
'The comments on this post are highly polarized, with strong opinions expressed on both sides of the political spectrum. There is a mix of negative and positive sentiment, with some users expressing excitement and support for the current administration (e.g., emojis like 💙💙💙💙, Kamala 👏👏) while others criticize past policies and individuals associated with them (e.g., Trump 2024, lack of education). Overall, the engagement on this post is high, with users actively debating and defending their viewpoints. Despite the divisive nature of the comments, the post is generating a high level of energy and engagement.'
The analysis shows that the first post about How to easily keep up with tech? is vibrating with high energy:
The Instagram comments reveal a strong level of engagement and positive energy. Emojis like 😍, 😂, ❤️, 🙌, and 🔥 are frequently used, indicating excitement and enthusiasm. Commenters express gratitude, excitement, and appreciation for the content. The tone is overwhelmingly positive, supportive, and encouraging, with many users tagging others to share the content. Overall, this post is generating a vibrant and highly engaged response.
However, the post by @kamalaharris on the Affordable Care Act is (not surprisingly) sparking a lot of controversy with negative comments.
The comments on this post are generating negative energy but with high engagement. There’s a strong focus on political opinions, particularly concerning insurance companies, the Affordable Care Act, Trump, and Biden. Many comments express frustration, criticism, and disagreement, with some users discussing party affiliations or support for specific politicians. There are also mentions of misinformation and conspiracy theories. Engagement is high, with numerous comment threads delving into various political issues. Overall, this post is vibrating with intense energy, driven by political opinions, disagreements, and active discussions.
💡 You might receive slightly different results, as the comments may have changed since the last run.
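Since both runs differ only in the post URL, the two calls can be folded into a loop. The sketch below assumes a callable with the same input and output shape as pipe.run. The names build_pipeline_input, analyze_posts, and fake_run are hypothetical, and fake_run is a stand-in so the example runs without API keys.

```python
from typing import Callable


def build_pipeline_input(post_url: str) -> dict:
    """Build the run data used in this cookbook: the URL goes to the loader's run_input."""
    return {"loader": {"run_input": {"directUrls": [post_url]}}}


def analyze_posts(post_urls: list[str], run_pipeline: Callable[[dict], dict]) -> dict:
    """Run the pipeline once per post and collect the first reply for each URL."""
    analyses = {}
    for url in post_urls:
        result = run_pipeline(build_pipeline_input(url))
        analyses[url] = result.get("llm", {}).get("replies", ["No response"])[0]
    return analyses


def fake_run(data: dict) -> dict:
    """Stand-in for pipe.run that returns a placeholder instead of calling Apify and OpenAI."""
    url = data["loader"]["run_input"]["directUrls"][0]
    return {"llm": {"replies": [f"(analysis placeholder for {url})"]}}


post_urls = [
    "https://www.instagram.com/p/C_a9jcRuJZZ/",  # @tiffintech
    "https://www.instagram.com/p/C_RgBzogufK/",  # @kamalaharris
]
analyses = analyze_posts(post_urls, fake_run)
for url, analysis in analyses.items():
    print(url, "->", analysis)
```

For a real analysis, pass pipe.run in place of fake_run. Each post triggers a separate Actor run and LLM call, so the total time scales with the number of posts.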