Warning: potential spoilers
Game of Thrones Season 7 was too short and left us wondering who is going to die in the last season...in two years🙁. Wouldn't it be cool to be able to speculate a little while we're waiting??
As scientists, we naturally started with a scientific approach. Have other machine learners already tackled this burning question? A group of students from the Technical University of Munich did a great job with data from the books. We also found that Milán Janosov created a social network of the TV show characters, and was able to calculate a Dying Score for the most important characters. Great, so predicting who will die in Game of Thrones is feasible.
Then, we looked for available data. There is plenty of information about the characters in this wiki entry. But structured data about the TV show, which is what we needed, is scarce. Fresh data that includes the Season 7 episodes is also quite rare. Thankfully, Jeffrey Lancaster has been brave enough to transcribe every episode from all of the seasons, including Season 7. 👍
We should explain our methodology a little here. Predictive analytics is neither magic nor intuition. It is a scientific prediction, based on phenomena that have already been observed. We want to predict who is most likely to die in Season 8. So we begin by building a model of who died in the distant past. This model should then allow us to predict an imminent death. We also need to check it against a more recent past to see whether the model still applies. For example, being a Stark is a sure way to die in Season 3. But this rule no longer applies in later seasons, since almost every Stark is dead by then… This behaviour is called “data drift”. To guard against it, we build a model on the early seasons and then apply it to a later season that has already been broadcast, where we know the real outcome. All this is only possible if the data records exactly when each dead character passed away.
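For the curious, here is a minimal sketch of that "train on the distant past, check on the recent past" idea. It is not what PredicSis.ai does internally: the "model" is just a per-house death rate, and the characters, houses and seasons below are entirely made-up toy values, chosen only to show how a rule learned on early seasons can stop holding later on.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Character:
    name: str                    # hypothetical identifier, not a real character
    house: str                   # hypothetical house label
    death_season: Optional[int]  # None means still alive at the end of the data


def alive_at_start(c: Character, season: int) -> bool:
    """A character is 'at risk' in a season if they have not died before it."""
    return c.death_season is None or c.death_season >= season


def death_rate_by_house(chars: List[Character],
                        seasons: Iterable[int]) -> Dict[str, float]:
    """Fraction of at-risk character-seasons that end in death, per house."""
    exposed: Dict[str, int] = {}
    died: Dict[str, int] = {}
    for s in seasons:
        for c in chars:
            if alive_at_start(c, s):
                exposed[c.house] = exposed.get(c.house, 0) + 1
                if c.death_season == s:
                    died[c.house] = died.get(c.house, 0) + 1
    return {h: died.get(h, 0) / n for h, n in exposed.items()}


# Toy data (assumption): house "A" dies early, house "B" dies late.
chars = [
    Character("a1", "A", 1), Character("a2", "A", 2),
    Character("a3", "A", 3), Character("a4", "A", None),
    Character("b1", "B", None), Character("b2", "B", 5),
    Character("b3", "B", 5), Character("b4", "B", None),
]

train_rates = death_rate_by_house(chars, seasons=[1, 2, 3])  # distant past
holdout_rates = death_rate_by_house(chars, seasons=[5])      # recent past

# A large gap between the two rates is a sign of drift for that house.
drift = {h: holdout_rates[h] - train_rates[h]
         for h in train_rates if h in holdout_rates}
print(drift)
```

In this toy setup, house "A" looks dangerous in the training seasons and safe in the held-out one, while house "B" does the opposite. That is exactly the kind of rule that would mislead us if we never checked it against a later season.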
This is how we discovered that every character is mourned to a degree (really! GoT fans are true fans). Indexing deaths can be a very moving task, depending on the importance the indexer gives to the character. We found that the number of deaths recorded across the 7 seasons varies, depending on the indexer, from about 200 to 1,243! That last list included the death of every living thing, including pigeons killed for food! We had to sift through this information and choose what was most relevant.
Trials and failures
Once the data was collected, we were able to prepare several scenarios, which we call orchestrations. We built several models, one for each season, and checked whether each was still accurate for the following season. That was when we had to face our biggest disappointment ever 😱
The same hopeless message, in a loop: No informative feature. Over and over. For each season, no matter how deep we asked PredicSis.ai to dig into the transcript of the previous episode. Not even the smallest crumb of a predictive indicator.
Other orchestrations led to the same disappointing conclusion: no two deaths are the same. Looking for identical explanations, even in the previous episode, is a hopeless task.
Time for a little humility: just because we did not succeed does not mean it is impossible. We have not yet explored every approach, nor integrated every potentially available data source. For instance, an additional data source could be the one created by Milán Janosov, where he clusters groups of characters based on their simultaneous appearances during the show. And we could use Natural Language Processing tools on the scripts to detect the key words spoken by the soon-to-be-killed characters. Or we could ask George R.R. Martin directly 😉, but besides the fact that it would be cheating, we might not get an answer because he probably hasn’t decided yet. Or has he...?
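To give an idea of what a "simultaneous appearances" feature could look like, here is a small sketch that counts how often two characters share a scene. This is our own illustration, not Milán Janosov's actual method, and the scene lists are hypothetical placeholders; real ones would have to be extracted from the transcripts. Clustering would be a further step on top of these counts.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_appearances(scenes: Iterable[List[str]]) -> Counter:
    """Count, for every pair of characters, the scenes they share."""
    counts: Counter = Counter()
    for cast in scenes:
        # Deduplicate and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(cast)), 2):
            counts[pair] += 1
    return counts


# Hypothetical scenes (assumption): each inner list is the cast of one scene.
scenes = [
    ["x", "y", "z"],
    ["x", "y"],
    ["z", "w"],
]

pairs = co_appearances(scenes)
strongest: List[Tuple[Tuple[str, str], int]] = pairs.most_common(1)
print(strongest)
```

The resulting pair counts can be read as edge weights in a character network, which could then be fed to a model as extra features per character.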
Do any of you, readers of this article, want to try these improvements by yourselves? We would be more than happy to receive a little help with this quest! We will of course lend you an instance of our predictive tool for free so that you can test your own ideas (and different questions too!). Just contact us!
Maybe you’ve already gone further down this road than we have, in which case please share your recipe for success, or your advice, in the comments below.
“Winter is coming”
We will be publishing a second post focusing on the book data. We have some interesting findings to share with you even though we will not be able to reveal the names of the future kills. If you really feel the need to read machine learning stuff, try Zackarey Thoutt’s masterpiece.
Intrigued? To get a sense of what we do at PredicSis.ai, please visit our Free Trial page.