Analyzing Netflix viewership data with ChatGPT

In case you missed it, Netflix yesterday departed from its usual practice of keeping all but its most successful viewership numbers under wraps and published a public dataset containing every title with more than 100,000 viewership hours in the six months from January 2023 through June 2023.

“In total, this report covers more than 18,000 titles — representing 99% of all viewing on Netflix — and nearly 100 billion hours viewed,” the company wrote in a blog post announcing the new report, entitled “What We Watched: A Netflix Engagement Report,” which it also committed to updating and releasing biannually.

Netflix counts “viewership hours” rather than viewers or households (though it likely has that information as well) because some people watch a title more than once.

While the streamer highlighted some of the findings itself, I decided to download the report, which is available on Netflix’s blog as an .xlsx (Excel spreadsheet) file, and run it through OpenAI’s ChatGPT (using GPT-4 on a personal ChatGPT Plus subscription) to test its current data analysis capabilities.

Spoiler alert: ChatGPT did a decent job of providing a clear, straightforward, if brief, analysis of the data. It did suffer some hiccups, though: it returned an error when asked to generate a chart, and it struggled to create what I asked for in my prompts.

Take a look at my process below. I started with a simple request: “Can you please perform a data analysis of this data?” ChatGPT dutifully complied and provided a clear description of what the file contained.

ChatGPT also highlighted “key points” and “key insights” for me, including something that I probably would not have caught if I were looking at the data with my own (untrained) human eyes: “The ‘Release Date’ column has a significant number of missing values (13,359), which may limit certain types of time-based analyses.”
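For readers who want to check a figure like that themselves, a missing-value count is simple to reproduce. The following is a minimal sketch, not the code ChatGPT ran: the rows are invented placeholders rather than Netflix figures, the helper `count_missing` is hypothetical, and every column name other than “Release Date” is an assumption.

```python
# Placeholder rows standing in for the spreadsheet; titles and numbers are
# invented for illustration and are NOT taken from the Netflix report.
rows = [
    {"Title": "Title A", "Release Date": "2023-01-15", "Hours Viewed": 500_000},
    {"Title": "Title B", "Release Date": None, "Hours Viewed": 300_000},
    {"Title": "Title C", "Release Date": "", "Hours Viewed": 200_000},
    {"Title": "Title D", "Release Date": "2023-04-02", "Hours Viewed": 150_000},
]


def count_missing(rows, column):
    """Count rows where `column` is absent, None, or an empty string."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


missing_release_dates = count_missing(rows, "Release Date")
print(f"Missing release dates: {missing_release_dates} of {len(rows)} rows")
```

Running the same count on every column before any analysis shows which questions the data can and cannot answer.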

Intriguingly, though the first section of ChatGPT’s “key insights” was titled “The Top 10 Most-Watched Titles (Jan-Jun 2023),” it did not actually list those titles. I had to ask for them separately.

I also asked for the lowest viewed titles during this period by viewership hours, the median viewed title, and the average hours viewed and the title that was closest to this value. ChatGPT provided them all for me.
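These summary statistics are easy to verify by hand. Here is a rough sketch of the same questions in Python, using made-up placeholder titles and hours rather than anything from the report; it is not the code ChatGPT generated.

```python
import statistics

# (title, hours viewed) pairs; invented placeholder values, not Netflix data.
titles = [
    ("Title A", 900_000),
    ("Title B", 100_000),
    ("Title C", 400_000),
    ("Title D", 250_000),
    ("Title E", 600_000),
]

# Rank from most to least watched.
ranked = sorted(titles, key=lambda item: item[1], reverse=True)
top_three = ranked[:3]   # the article asked for a top 10; 3 suits this tiny sample
lowest = ranked[-1]

hours = [h for _, h in titles]
median_hours = statistics.median(hours)
mean_hours = statistics.mean(hours)

# The title whose hours are nearest to the mean.
closest_to_mean = min(titles, key=lambda item: abs(item[1] - mean_hours))

print("Top three:", top_three)
print("Lowest:", lowest)
print("Median hours:", median_hours)
print("Mean hours:", mean_hours)
print("Closest to mean:", closest_to_mean)
```

In a dataset where a few hits dominate, the mean can sit far above the median, which is why it is worth asking for both.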

However, ChatGPT struggled when I asked it to generate a line plot of viewership hours by month. (Note: the dataset did not include monthly data to begin with; it lists only each title’s total viewership hours for the entire six-month period.)

It initially generated an almost illegible plot that included dates going back to 2010 on its x-axis, which represented the earliest release dates among the titles in the set.

When I asked it to correct the error and focus only on the six-month span covered by the dataset, it provided a more legible, if still ultimately misleading, plot.

Because Netflix’s data did not include a breakdown of how many viewership hours each title accrued per month, let alone a compiled total of viewership hours per month, that chart actually represents only the cumulative six-month viewership hours of the new titles released in each month.

The hours viewed for a title released in January, for example, actually represent all the hours it was watched during the entire January-June period.

Without prompting, ChatGPT did not work out how to label this chart so that a human reader could tell what data it presents. The y-axis label reads as total hours viewed, but each point is really the total hours viewed over the January-June 2023 period for all titles released in that month. That is ultimately not a very helpful chart, unfortunately.
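To make the distinction concrete, here is a minimal sketch of the aggregation such a chart is really built on: grouping titles by release month, summing their six-month totals, and attaching a label that says so. The rows are invented placeholders, and the names `WINDOW_START`, `WINDOW_END` and `Y_AXIS_LABEL` are my own, not anything ChatGPT or Netflix used.

```python
from collections import defaultdict
from datetime import date

# (title, release date, hours viewed over the whole Jan-Jun window).
# Invented placeholder rows, not Netflix data.
rows = [
    ("Title A", date(2023, 1, 10), 900_000),
    ("Title B", date(2023, 1, 25), 100_000),
    ("Title C", date(2023, 3, 5), 400_000),
    ("Title D", None, 250_000),              # missing release date
    ("Title E", date(2019, 6, 1), 600_000),  # released before the window
]

WINDOW_START = date(2023, 1, 1)
WINDOW_END = date(2023, 6, 30)

# Sum each title's six-month total into the month it was released.
# Titles with no release date, or released outside the window, are skipped.
totals = defaultdict(int)
for title, released, hours in rows:
    if released is None or not (WINDOW_START <= released <= WINDOW_END):
        continue
    totals[released.strftime("%Y-%m")] += hours

# A label that describes what the numbers are, not monthly viewing.
Y_AXIS_LABEL = "Jan-Jun 2023 hours viewed, summed over titles released that month"

for month in sorted(totals):
    print(month, totals[month])
print("y-axis:", Y_AXIS_LABEL)
```

Note how much falls out of the picture: any title with a missing or older release date contributes nothing, which is another reason a chart like this cannot stand in for viewing per month.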

It took me multiple attempts to get ChatGPT to create a useful and correctly labeled chart, going back and forth with it as it created version after version that was not quite what I asked, until I finally got something decent.

So, while ChatGPT may be a helpful analysis partner for a casual user like me, it still has a long way to go before it is a trustworthy, reliable and intrinsically helpful data analyst.
