<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Dinesh Vatvani</title><link href="https://dvatvani.github.io/" rel="alternate"></link><link href="https://dvatvani.github.io/feeds/all.atom.xml" rel="self"></link><id>https://dvatvani.github.io/</id><updated>2020-10-25T16:00:00+00:00</updated><entry><title>Analysis of a WhatsApp chat log</title><link href="https://dvatvani.github.io/whatsapp-analysis.html" rel="alternate"></link><updated>2020-10-25T16:00:00+00:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2020-10-25:whatsapp-analysis.html</id><summary type="html">&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;With the ever-increasing role of technology and software services in our modern lives, we&amp;#8217;re passively creating an increasingly large digital footprint. Our browsers store our history by default. Our favourite map apps store our location history by default. Our banks keep records of our transactions. Our phones and smart-watches keep track of the number of footsteps we take every day along with heart rate, and various other bits of biometric data. YouTube, Netflix, Spotify, Amazon Video and similar services all keep track of our media consumption history. There are also many other apps or services that keep track of a myriad of other interesting habits and behaviours. I personally enjoy downloading copies of my digital footprint and analysing them. I think there&amp;#8217;s a lot that can be learned from examining my own behaviours and patterns to help me reflect on some unconscious choices I&amp;#8217;m making and help me obtain a better and more objective understanding of myself. It also serves as a useful digital diary since I can automatically collate data from several sources nightly to give me a fairly good data-driven summary of what I was doing on any given day. To that end, I&amp;#8217;ve written several scripts across the past 6 years to download and analyse my digital footprint from a range of different&amp;nbsp;sources.&lt;/p&gt;
&lt;p&gt;In this blog post, I&amp;#8217;ll be talking specifically about the output from a script I wrote to analyse WhatsApp chat logs. You can run this analysis on your own chat logs by running the Python Notebook which can be found &lt;a href="https://github.com/dvatvani/Whatsapp-analyzer"&gt;here&lt;/a&gt;. If all that looks or sounds too technical for you, I created a &lt;a href="https://whatsapp-analysis.herokuapp.com/"&gt;WebApp&lt;/a&gt; where you can drag and drop a WhatsApp chat log to run a slightly more limited version of this analysis (I&amp;#8217;ve had to remove some of the more memory-intensive parts due to memory constraints in the cloud platform I&amp;#8217;m using to host this). There are more detailed instructions on how to replicate this work on your own chat logs &lt;a href="#replicate"&gt;towards the end of this blog post&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Sample analysis&amp;nbsp;results&lt;/h1&gt;
&lt;p&gt;Here are the results from an analysis of an anonymised chat log I&amp;#8217;m part of (with permission from the relevant group chat members) to showcase some interesting things that could be done with WhatsApp chat log data and the types of insights that could be gained relatively easily with some basic Natural Language Processing (&lt;span class="caps"&gt;NLP&lt;/span&gt;).&lt;/p&gt;
&lt;h3&gt;Summary&amp;nbsp;table&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;4,955&lt;/code&gt; total messages from 4 people, from &lt;code&gt;2016-12-07&lt;/code&gt; to &lt;code&gt;2020-10-20&lt;/code&gt; &lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align="center"&gt;&lt;/th&gt;
&lt;th align="center"&gt;Isambard&lt;/th&gt;
&lt;th align="center"&gt;Lysander&lt;/th&gt;
&lt;th align="center"&gt;Perseus&lt;/th&gt;
&lt;th align="center"&gt;Seraphina&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;strong&gt;Contribution&lt;/strong&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Total N messages&lt;/td&gt;
&lt;td align="center"&gt;1,921&lt;/td&gt;
&lt;td align="center"&gt;942&lt;/td&gt;
&lt;td align="center"&gt;1,035&lt;/td&gt;
&lt;td align="center"&gt;1,057&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Total N words&lt;/td&gt;
&lt;td align="center"&gt;19,979&lt;/td&gt;
&lt;td align="center"&gt;12,325&lt;/td&gt;
&lt;td align="center"&gt;14,062&lt;/td&gt;
&lt;td align="center"&gt;11,390&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Total N characters&lt;/td&gt;
&lt;td align="center"&gt;101,607&lt;/td&gt;
&lt;td align="center"&gt;63,472&lt;/td&gt;
&lt;td align="center"&gt;72,766&lt;/td&gt;
&lt;td align="center"&gt;57,491&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;strong&gt;Message type&lt;/strong&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Text&lt;/td&gt;
&lt;td align="center"&gt;94.4%&lt;/td&gt;
&lt;td align="center"&gt;96.3%&lt;/td&gt;
&lt;td align="center"&gt;92.2%&lt;/td&gt;
&lt;td align="center"&gt;94.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Media&lt;/td&gt;
&lt;td align="center"&gt;2.2%&lt;/td&gt;
&lt;td align="center"&gt;2.7%&lt;/td&gt;
&lt;td align="center"&gt;5.1%&lt;/td&gt;
&lt;td align="center"&gt;3.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Link&lt;/td&gt;
&lt;td align="center"&gt;3.4%&lt;/td&gt;
&lt;td align="center"&gt;1.0%&lt;/td&gt;
&lt;td align="center"&gt;2.7%&lt;/td&gt;
&lt;td align="center"&gt;2.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;strong&gt;Message  content&lt;/strong&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;td align="center"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Sentences per message&lt;/td&gt;
&lt;td align="center"&gt;1.40&lt;/td&gt;
&lt;td align="center"&gt;1.57&lt;/td&gt;
&lt;td align="center"&gt;1.38&lt;/td&gt;
&lt;td align="center"&gt;1.41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Words per message&lt;/td&gt;
&lt;td align="center"&gt;11.1&lt;/td&gt;
&lt;td align="center"&gt;13.6&lt;/td&gt;
&lt;td align="center"&gt;14.8&lt;/td&gt;
&lt;td align="center"&gt;11.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Characters per message&lt;/td&gt;
&lt;td align="center"&gt;54.1&lt;/td&gt;
&lt;td align="center"&gt;69.2&lt;/td&gt;
&lt;td align="center"&gt;74.1&lt;/td&gt;
&lt;td align="center"&gt;56.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Messages containing emoji&lt;/td&gt;
&lt;td align="center"&gt;2.9%&lt;/td&gt;
&lt;td align="center"&gt;0.2%&lt;/td&gt;
&lt;td align="center"&gt;0.2%&lt;/td&gt;
&lt;td align="center"&gt;2.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;Messages containing profanity&lt;/td&gt;
&lt;td align="center"&gt;1.9%&lt;/td&gt;
&lt;td align="center"&gt;0.0%&lt;/td&gt;
&lt;td align="center"&gt;5.6%&lt;/td&gt;
&lt;td align="center"&gt;0.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;The summary table is already a pretty useful overview of the chat log, but we can visualise the data and delve slightly deeper into the patterns in the chat&amp;nbsp;log.&lt;/p&gt;
&lt;h3&gt;Message&amp;nbsp;types&lt;/h3&gt;
&lt;p&gt;We can start with some plots on the types of messages sent to visualise who favours media messages (audio, video, images, or gifs) and who tends to share external links in the&amp;nbsp;chat.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/message_types.svg" width="700px"&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;h3&gt;Message&amp;nbsp;contribution&lt;/h3&gt;
&lt;p&gt;We can plot the overall message contribution from each&amp;nbsp;person.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/n_messages_and_avg_length.svg" width="700px"&gt;
&lt;br&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/sentences_per_message.svg" width="550px"&gt;
&lt;br&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/word_counts.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;The last plot above contains a few notable literary works to put the word counts in context. I&amp;#8217;ve included a longer list of reference literary works in the code that generates this plot, so if you re-run this analysis on your own chat logs you will likely see a different set of reference works, appropriate to the magnitude of word counts in the chat log being&amp;nbsp;analysed.&lt;/p&gt;
&lt;h3&gt;Contribution over&amp;nbsp;time&lt;/h3&gt;
&lt;p&gt;We can also look at when each person has contributed to the conversation. This can be done as either a calendar&amp;nbsp;view&amp;#8230;&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/calmap-Isambard.svg" width="550px"&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/calmap-Lysander.svg" width="550px"&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/calmap-Perseus.svg" width="550px"&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/calmap-Seraphina.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;&amp;#8230;or can also be viewed as a timeseries
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/activity_timeseries.svg" width="550px"&gt;
&lt;br&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/activity_timerseries_stacked_area.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;The last plot can be represented as a relative plot rather than an absolute plot if we want to see who has been contributing more/less relative to everyone else over time.
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/relative_activity_timerseries.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Another thing we can do with the timeseries data that could be of interest is group the activity by day of week and time of day to look at daily and weekly patterns.
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/activity_by_day_of_week.svg" width="550px"&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/activity_by_time_of_day.svg" width="550px"&gt;
&lt;/center&gt;
Note that the curves in the &lt;code&gt;Activity by time of day&lt;/code&gt; plot are measured every minute across the 24 hours and smoothed with a Gaussian convolution. The smooth curves are not an artifact of&amp;nbsp;interpolation.&lt;/p&gt;
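&lt;p&gt;The smoothing described above can be sketched roughly as follows. This is an illustration rather than the code from the notebook: the function names, the 30-minute kernel width and the wrap-around at midnight are all assumptions made for the example, and the message counts are made up.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def gaussian_kernel(sigma_minutes, radius):
    """Return a normalised Gaussian kernel covering offsets -radius..radius."""
    weights = [math.exp(-0.5 * (offset / sigma_minutes) ** 2)
               for offset in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_minute_counts(counts, sigma_minutes=30.0):
    """Circularly convolve per-minute message counts with a Gaussian kernel.

    Wrapping around midnight is an assumption made for this sketch.
    """
    n = len(counts)
    radius = int(3 * sigma_minutes)  # truncate the kernel at three sigma
    kernel = gaussian_kernel(sigma_minutes, radius)
    smoothed = []
    for minute in range(n):
        value = 0.0
        for k, weight in enumerate(kernel):
            value += weight * counts[(minute + k - radius) % n]
        smoothed.append(value)
    return smoothed


# Made-up data: ten messages at 08:00 and a single message at 23:59.
counts = [0] * 1440
counts[480] = 10
counts[1439] = 1
smoothed = smooth_minute_counts(counts)
```
&lt;/pre&gt;
&lt;p&gt;Because the kernel is normalised and the convolution wraps around, the smoothed curve keeps the same total as the raw counts, and a message sent just before midnight still contributes to the first minutes of the day.&lt;/p&gt;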
&lt;h3&gt;Conversation&amp;nbsp;Dynamics&lt;/h3&gt;
&lt;p&gt;We can gain some insight into the group conversation dynamics by looking at who tends to respond to each user. This doesn&amp;#8217;t take into account the content of the message to infer which message is being responded to. It is based purely on who the previous message was from every time a message is sent. As such, it can include replies to oneself. Contiguous messages from one person have been excluded if they are posted within 3 minutes of the previous message, as this is deemed to effectively be a single message split across multiple messages. 
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/response_matrix.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
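&lt;p&gt;The counting rule behind the response matrix can be sketched as below. This is a minimal illustration, not the notebook code: the function name and the tuple layout are assumptions, the timestamps are made up, and treating a gap of exactly 3 minutes as a continuation is a choice made for the example.&lt;/p&gt;
&lt;pre&gt;
```python
from collections import Counter
from datetime import datetime, timedelta


def response_counts(messages, merge_window=timedelta(minutes=3)):
    """Count (previous_sender, responder) pairs in a chronological message list.

    A message from the same sender as the previous one is skipped when it
    arrives within merge_window, since it is treated as a continuation.
    """
    counts = Counter()
    previous = None
    for sender, sent_at in messages:
        if previous is not None:
            prev_sender, prev_time = previous
            gap = sent_at - prev_time
            if sender != prev_sender or gap > merge_window:
                counts[(prev_sender, sender)] += 1
        previous = (sender, sent_at)
    return counts


# Made-up timestamps; the names are the anonymised ones used in this post.
t0 = datetime(2020, 10, 20, 9, 0)
messages = [
    ("Isambard", t0),
    ("Isambard", t0 + timedelta(minutes=1)),    # continuation, skipped
    ("Perseus", t0 + timedelta(minutes=2)),     # Perseus responds to Isambard
    ("Perseus", t0 + timedelta(minutes=30)),    # late follow-up, a self-reply
    ("Seraphina", t0 + timedelta(minutes=31)),  # Seraphina responds to Perseus
]
matrix = response_counts(messages)
```
&lt;/pre&gt;
&lt;p&gt;Dividing each count by the total for its responder (or its previous sender) would turn these raw counts into the kind of proportions a matrix plot can show.&lt;/p&gt;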
&lt;p&gt;In a similar vein, we can also examine the response times for each&amp;nbsp;user.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/response_time.svg" width="550px"&gt;
&lt;/center&gt; &lt;/p&gt;
&lt;h3&gt;Grammatical and linguistic&amp;nbsp;preferences&lt;/h3&gt;
&lt;p&gt;Most of the analysis above looks at patterns in when people message and a very rough overview of the nature of the messages. It can also be insightful to examine the content of the messages more closely to identify people&amp;#8217;s grammatical and linguistic preferences. Below are a few examples of the types of insights that can be obtained by parsing and analysing the content of the&amp;nbsp;messages.&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s start by looking at the use of punctuation, emoticons (Emoji) and profanity.
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/punctuation_use.svg" width="550px"&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/emoticon_use.svg" width="550px"&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/profanity_use.svg" width="700px"&gt;
&lt;p style="font-size: 12px;"&gt;Note: Profanity detection is done using the &lt;a href="https://github.com/vzhou842/profanity-check"&gt;profanity-check&lt;/a&gt;&amp;nbsp;library&lt;/p&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;We can also look at the distribution of word lengths from each person.
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/word_length_distribution.svg" width="700px"&gt;
&lt;/center&gt;
In this instance it turns out not to vary all that much across users, largely due to the composition of the group, but that is not always the&amp;nbsp;case. &lt;/p&gt;
&lt;p&gt;An interesting analysis we can perform is to compare the relative usage frequency of different words from each user against the natural prevalence rate of those words in the English language (based on occurrence in the web on websites in English) to find the words that each user uses disproportionately often.
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/characteristic_words.svg"&gt;
&lt;/center&gt;
Using the same concept of words&amp;#8217; natural prevalence rate in English-language websites above, we can determine the average log(natural prevalence frequency) of all words used by each person. This is a measure for how obscure/niche the words used by each person are, with a lower average log(frequency) indicating more frequent use of rare words.
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/avg_word_prevalence.svg" width="550px"&gt;
&lt;/center&gt;
The plot above can be interpreted as some proxy for vocabulary complexity/specificity. An interesting measure to accompany that is vocabulary breadth. To do that, we can plot the cumulative unique word count against cumulative total word count for each&amp;nbsp;user. &lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/word_count_curves.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Early in the conversation, we expect most words used to be new. As the conversation continues, we expect the number of unique words to slowly tail-off as many of the words used in the conversation will be repeated words. Theoretically, if the conversation goes on infinitely long and the topics of conversation covered in the chat are exhaustive, we expect this curve to asymptote towards a value that represents some approximation of the scope of each person&amp;#8217;s total vocabulary size. The plot above displays the early part of that curve. In practice, most WhatsApp conversations will be far too small to reach the vocabulary size asymptote, and will often have limited topics of conversation covered. Moreover, many people will likely modulate the tone and complexity of the language they choose to use in casual WhatsApp conversations in ways that make it unrepresentative of their true vocabulary breadth. These curves have been left in the analysis because I believe they are interesting, but it is important to emphasize that they are only broadly indicative of each person&amp;#8217;s vocabulary scope specifically as observed in the particular chat log which, for many reasons, will not be representative of their true overall vocabulary&amp;nbsp;size.&lt;/p&gt;
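&lt;p&gt;A curve of this kind could be computed along the following lines for one person&amp;#8217;s messages. This is a sketch under assumptions: the function name and the simple lower-case, letters-and-apostrophes tokenisation are choices made for the example and may differ from the notebook, and the sample messages are made up.&lt;/p&gt;
&lt;pre&gt;
```python
import re


def vocabulary_growth(messages):
    """Return (cumulative total words, cumulative unique words) per message."""
    seen = set()
    total = 0
    curve = []
    for text in messages:
        # Assumed tokenisation: lower-case runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        total += len(words)
        seen.update(words)
        curve.append((total, len(seen)))
    return curve


# Made-up messages from a single person, in chronological order.
curve = vocabulary_growth(["See you at the pub", "The pub at eight", "See you there"])
```
&lt;/pre&gt;
&lt;p&gt;Plotting the second value against the first for each person gives one curve per person; the flattening described above shows up as the unique count growing more slowly than the total.&lt;/p&gt;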
&lt;p&gt;Finally, we can look into how similar the observed vocabulary is between the participants of the group. This can be done by taking all unique words used by each person and comparing them to the unique words used by every other person. The vocabulary similarity between 2 people can be defined as the number of words they use in common divided by the total number of distinct words used by either person, and is known as the Jaccard index (in set notation, it&amp;#8217;s defined by |A∩B|/|A∪B|, where A and B are the sets of words used by each person).
&lt;center&gt;
&lt;img alt="" src="https://dvatvani.github.io/static/whatsapp-analysis/vocabulary_similarity_matrix.svg" width="550px"&gt;
&lt;/center&gt;&lt;/p&gt;
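&lt;p&gt;As a small worked example of the Jaccard index, here is a sketch with made-up vocabularies. The function name and the convention of returning 0.0 for two empty vocabularies are assumptions made for the example.&lt;/p&gt;
&lt;pre&gt;
```python
def jaccard_index(words_a, words_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of words."""
    set_a, set_b = set(words_a), set(words_b)
    union = set_a.union(set_b)
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(set_a.intersection(set_b)) / len(union)


# Made-up vocabularies for two of the anonymised participants.
vocab = {
    "Isambard": {"pub", "train", "bridge", "tonight"},
    "Lysander": {"pub", "tonight", "poem"},
}
similarity = jaccard_index(vocab["Isambard"], vocab["Lysander"])
```
&lt;/pre&gt;
&lt;p&gt;Here the two people share 2 words out of 5 distinct words overall, so the similarity is 2/5 = 0.4. Repeating this for every pair of participants fills in a similarity matrix.&lt;/p&gt;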
&lt;h1&gt;Potential areas for future&amp;nbsp;development&lt;/h1&gt;
&lt;p&gt;&lt;span class="caps"&gt;NLP&lt;/span&gt; is a rich and expansive field and there is plenty more that could be done with this dataset using &lt;span class="caps"&gt;NLP&lt;/span&gt; tools and techniques. Some ideas for potential future developments could&amp;nbsp;be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automatic topic of conversation detection, to pick out who tends to favour discussions on particular topics and mapping when in time different topics get&amp;nbsp;discussed&lt;/li&gt;
&lt;li&gt;The comparison of vocabulary similarity between people could be improved (currently using Jaccard Similarity based on the unique words used by each&amp;nbsp;person)&lt;/li&gt;
&lt;li&gt;Linguistic style profiles could be added by assessing vocabulary similarity or prose style similarity to external text datasets e.g. Gossip magazines, tabloid newspapers, broad-sheet newspapers, tech magazines, Victorian novels, scientific journal publications, legal documents,&amp;nbsp;etc. &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I may come back to implement these in the future, but if anyone would like to try to train any of these models, please let me know in the comments or &lt;span class="caps"&gt;DM&lt;/span&gt; me on Twitter. I&amp;#8217;d be more than happy to collaborate or review pull requests on&amp;nbsp;GitHub.&lt;/p&gt;
&lt;h1 id='replicate'&gt;Generate your own chat log&amp;nbsp;reports&lt;/h1&gt;

&lt;p&gt;If you&amp;#8217;d like to analyse your own chat logs and create plots like the ones above, then follow the instructions&amp;nbsp;below.&lt;/p&gt;
&lt;h3&gt;Extracting a WhatsApp chat&amp;nbsp;log&lt;/h3&gt;
&lt;p&gt;In order to analyse your own chat logs, you&amp;#8217;ll first need to export your own chat log from WhatsApp. These instructions are for exporting a chat log from an Android device. Doing it from an iOS device should be similar, but if in doubt, I&amp;#8217;m sure there will be other instructions online telling you how to export a WhatsApp chat log from&amp;nbsp;iOS. &lt;/p&gt;
&lt;p&gt;Open any conversation on WhatsApp, click on the kebab icon (3 vertical dots icon on the top right of the conversation screen), then &lt;code&gt;More...&lt;/code&gt; and &lt;code&gt;Export chat&lt;/code&gt;. This should bring up a prompt on whether you want to export with or without media. Export the chat log without media. When the export is ready, it should bring up another selection of which app to use to deal with the files. Either select a file browser to save the files or any email client app and send the exported chat log to yourself. The export may contain multiple files. The one of interest will be named something like &amp;#8220;WhatsApp chat with {group_name}.txt&amp;#8221;. We&amp;#8217;ll be using this in the next&amp;nbsp;step.&lt;/p&gt;
&lt;h3&gt;Analysing the chat&amp;nbsp;log&lt;/h3&gt;
&lt;p&gt;If you&amp;#8217;re comfortable with running Python scripts and notebooks, then the Jupyter (Python) notebook that I used for the analysis above can be found &lt;a href="https://github.com/dvatvani/Whatsapp-analyzer"&gt;here&lt;/a&gt;. Clone the repository, install the dependencies (listed in the requirements.txt file), then place your chat log in the &lt;code&gt;chat_logs&lt;/code&gt; folder and run the notebook. Remember to update the chat log file name in the notebook to the one you just added there. Once the script has finished running, the plots will be visible in-line in the notebook, but all plots are also saved in the outputs folder under a subfolder name that corresponds to the chat log&amp;nbsp;name.&lt;/p&gt;
&lt;p&gt;If you&amp;#8217;re &lt;span class="caps"&gt;NOT&lt;/span&gt; comfortable with running Python scripts on your own, then I&amp;#8217;ve set up a &lt;a href="https://whatsapp-analysis.herokuapp.com/"&gt;minimalist WebApp&lt;/a&gt; where you can upload or drag and drop your WhatsApp chat log and have it processed for you. It&amp;#8217;s hosted on the free-tier of a cloud platform and shared among all readers, so may be quite slow, depending on how many people are using it at any given time. I&amp;#8217;ve had to remove a couple of the plots for this WebApp due to memory constraints in the cloud platform. None of your data will be stored, so save a local copy of the output if you&amp;#8217;d like to keep hold of it or want to share it with anyone. Attempting to share a link to the results by copying the &lt;span class="caps"&gt;URL&lt;/span&gt; will not work (since they would require a copy of the output to be&amp;nbsp;saved).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks&amp;nbsp;to:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;[&lt;span class="caps"&gt;REDACTED&lt;/span&gt;]&lt;/em&gt; : For letting me use an anonymised version of our group chat in this blog&amp;nbsp;post.&lt;/li&gt;
&lt;/ul&gt;</summary><category term="Qantified-Self"></category><category term="WhatsApp"></category></entry><entry><title>An analysis of board games: Part III - Mapping the board game landscape</title><link href="https://dvatvani.github.io/BGG-Analysis-Part-3.html" rel="alternate"></link><updated>2020-09-04T00:10:00+01:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2020-09-04:BGG-Analysis-Part-3.html</id><summary type="html">&lt;p&gt;This is part &lt;span class="caps"&gt;III&lt;/span&gt; in my series on analysing BoardGameGeek data. Other parts can be found&amp;nbsp;here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="./BGG-Analysis-Part-1.html"&gt;Part I: Introduction and general&amp;nbsp;trends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="./BGG-Analysis-Part-2.html"&gt;Part &lt;span class="caps"&gt;II&lt;/span&gt;: Complexity bias in &lt;span class="caps"&gt;BGG&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part &lt;span class="caps"&gt;III&lt;/span&gt;: Mapping the board game&amp;nbsp;landscape&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Previous posts in this series cover how we generated a dataset from BoardGameGeek, explored general trends in the tabletop games landscape over time and looked at complexity bias inherent in the &lt;span class="caps"&gt;BGG&lt;/span&gt; dataset. This post explores a comparison of game ratings at an individual user level to determine which games are similar and use that to create a map of the board games&amp;nbsp;landscape. &lt;/p&gt;
&lt;h1&gt;Data&amp;nbsp;collection&lt;/h1&gt;
&lt;p&gt;Before we are able to perform any user-level ratings analysis, we need to collect a dataset that contains game ratings at an individual user level since the previous dataset in parts I and &lt;span class="caps"&gt;II&lt;/span&gt; used an average rating for each game. Extracting individual-account-level information from Board Game Geek is possible using their &lt;span class="caps"&gt;XML&lt;/span&gt; &lt;span class="caps"&gt;API&lt;/span&gt;, but is more challenging and time-consuming than extracting game-level aggregates due to some constraints in the &lt;span class="caps"&gt;API&lt;/span&gt; (e.g. limited to 100 user-level ratings per request). As such, obtaining a comprehensive list of all game ratings by each user for all games in the &lt;span class="caps"&gt;BGG&lt;/span&gt; database was not considered a viable approach. Instead, the individual user level ratings were obtained for &lt;code&gt;500&lt;/code&gt; of the most popular (by Ownership) games on &lt;span class="caps"&gt;BGG&lt;/span&gt;, with an additional &lt;code&gt;53&lt;/code&gt; hand-picked to sample some of the more recent successful titles, including Wingspan, Res Arcana, etc. Those criteria bring down the total number of user-level ratings to be collected considerably, but it still amounts to &lt;code&gt;7.5 million&lt;/code&gt; individual game ratings at a user level. Those &lt;code&gt;7.5 million&lt;/code&gt; user-level game ratings covering &lt;code&gt;553&lt;/code&gt; successful games were collected and were found to contain ratings from &lt;code&gt;265,374&lt;/code&gt; unique &lt;span class="caps"&gt;BGG&lt;/span&gt;&amp;nbsp;accounts. &lt;/p&gt;
&lt;p&gt;The dataset is currently in a SQLite database. If anyone would like a copy of the data, please let me&amp;nbsp;know.&lt;/p&gt;
&lt;h1&gt;User-Driven&amp;nbsp;Similarity&lt;/h1&gt;
&lt;p&gt;Having collected individual game ratings per &lt;span class="caps"&gt;BGG&lt;/span&gt; user, we can take any pair of games, find the users that have provided ratings for both of these games and see how the ratings across games are related. There are a few examples below showing that &lt;span class="caps"&gt;BGG&lt;/span&gt; users who tend to like &lt;code&gt;Monopoly&lt;/code&gt; tend to also like &lt;code&gt;Risk&lt;/code&gt;. Similarly, users who like &lt;code&gt;Yahtzee&lt;/code&gt; tend to also like &lt;code&gt;UNO&lt;/code&gt;. On the other hand, users that like &lt;code&gt;Monopoly&lt;/code&gt; aren&amp;#8217;t any more likely to enjoy &lt;code&gt;Twilight Struggle&lt;/code&gt;, and users liking &lt;code&gt;UNO&lt;/code&gt; tells us nothing about their affinity for &lt;code&gt;Scythe&lt;/code&gt;. The extent to which ratings of games are correlated indicates how likely it is that users will like one game if they like the other. It&amp;#8217;s important to highlight that when we say a user &amp;#8220;likes&amp;#8221; a game here, we are always talking in relative terms. It means that users that like game A &lt;em&gt;more than average&lt;/em&gt; are likely to enjoy game B &lt;em&gt;more than average&lt;/em&gt; too if their ratings are positively&amp;nbsp;correlated. &lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Sample rating correlations" src="https://dvatvani.github.io/static/BGG-analysis/sample_rating_correlations.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;The correlation of user-level ratings between games can be interpreted as some form of similarity between games. After all, if the users who tend to like one tend to also like the other, there will presumably be something similar between the games. However, the similarity between the games may not be obvious based on a traditional board games classification taxonomy. What the correlation captures is essentially games that &amp;#8220;scratch the same itch&amp;#8221; or tap into a similar core appeal. This could be the feeling of solving an abstract puzzle, a thematic appeal, the social-component, the rewarding feeling of building an elegant engine, the feeling of cooperating with friends, or any other. The games may have very different mechanics, themes, complexity levels or even overall average ratings, but likely tap into a similar core appeal, and that core appeal will resonate with some groups of &lt;span class="caps"&gt;BGG&lt;/span&gt; users more than&amp;nbsp;others.&lt;/p&gt;
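&lt;p&gt;To make the idea concrete, here is a minimal sketch of correlating two games&amp;#8217; ratings over the users who rated both. The post does not say which correlation measure was used, so the Pearson correlation here is an assumption, as are the function name, the dictionary layout and the made-up ratings.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def rating_correlation(ratings_a, ratings_b):
    """Pearson correlation of two games' ratings over users who rated both.

    Each argument maps a user name to that user's rating of one game.
    Returns None when fewer than two shared users exist or a variance is zero.
    """
    shared = sorted(set(ratings_a).intersection(ratings_b))
    if 2 > len(shared):
        return None
    xs = [ratings_a[user] for user in shared]
    ys = [ratings_b[user] for user in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Made-up ratings: only u1, u2 and u3 have rated both games.
game_a = {"u1": 8, "u2": 6, "u3": 4, "u4": 9}
game_b = {"u1": 7, "u2": 5, "u3": 3, "u5": 10}
correlation = rating_correlation(game_a, game_b)
```
&lt;/pre&gt;
&lt;p&gt;Because the correlation works on deviations from each game&amp;#8217;s mean rating among the shared users, it captures the relative sense of &amp;#8220;likes&amp;#8221; described above: in this toy example the shared users rate the second game one point lower across the board, yet the two games are perfectly correlated.&lt;/p&gt;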
&lt;h1&gt;Scaling up the&amp;nbsp;comparisons&lt;/h1&gt;
&lt;p&gt;Now that we&amp;#8217;ve introduced the concept of game similarity based on user-rating correlations, we can calculate the pairwise correlations for all &lt;code&gt;152,628&lt;/code&gt; unique pairs of games in our dataset. Despite the user-level correlation approach to assessing how similar games are knowing nothing about the games&amp;#8217; type, genre, mechanics, complexity level, rating, designers, or anything tangible about the game, the similarity approach is able to identify that remakes or alternate versions of the same game are very similar (e.g. &lt;code&gt;Codenames&lt;/code&gt;, &lt;code&gt;Codenames: Pictures&lt;/code&gt; and &lt;code&gt;Codenames: Duet&lt;/code&gt;, or &lt;code&gt;Brass: Lancashire&lt;/code&gt; and &lt;code&gt;Brass: Birmingham&lt;/code&gt;). This approach also finds, rather reassuringly, that games that we would intuitively class as being broadly similar tend to have high user-level rating correlations as well. For example, &lt;code&gt;One Night Ultimate Werewolf&lt;/code&gt;, &lt;code&gt;Secret Hitler&lt;/code&gt;, &lt;code&gt;Coup&lt;/code&gt;, and &lt;code&gt;The Resistance&lt;/code&gt; are all light party games based on communication and deception. They all end up with high correlations with one another. Similarly, word-games like &lt;code&gt;Boggle&lt;/code&gt;, &lt;code&gt;Scrabble&lt;/code&gt;, &lt;code&gt;Taboo&lt;/code&gt;, &lt;code&gt;Scattergories&lt;/code&gt;, &lt;code&gt;Pictionary&lt;/code&gt;, &lt;code&gt;Bananagrams&lt;/code&gt; also group together in the same way. Another example is the &amp;#8220;Easy to learn. Hard to master&amp;#8221; strategy cluster of &lt;code&gt;Chess&lt;/code&gt;, &lt;code&gt;Go&lt;/code&gt; and &lt;code&gt;Diplomacy&lt;/code&gt;. 
These correlations and their general alignment with games that we&amp;#8217;d intuitively consider similar allow us to build a simple recommendation system that displays the most similar games to any other game (refer to Dashboard below for an implementation of&amp;nbsp;this).&lt;/p&gt;
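&lt;p&gt;The pair count and the &amp;#8220;most similar games&amp;#8221; lookup can be sketched as follows. This is not the dashboard&amp;#8217;s implementation: the function name and the data layout are assumptions, and the correlation values are made up purely for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
import math


def most_similar(game, correlations, top_n=3):
    """Rank other games by their correlation with the given game.

    correlations maps a frozenset of two game names to a correlation value.
    """
    scored = []
    for pair, value in correlations.items():
        if game in pair:
            (other,) = pair - {game}
            scored.append((value, other))
    scored.sort(reverse=True)
    return [other for value, other in scored[:top_n]]


# Number of unordered pairs among 553 games: 553 * 552 / 2.
n_pairs = math.comb(553, 2)

# Made-up correlation values, for illustration only.
correlations = {
    frozenset({"Codenames", "Codenames: Duet"}): 0.6,
    frozenset({"Codenames", "Chess"}): 0.1,
    frozenset({"Chess", "Go"}): 0.5,
}
recommended = most_similar("Codenames", correlations, top_n=2)
```
&lt;/pre&gt;
&lt;p&gt;Keying the correlations by an unordered pair means each of the 152,628 values only needs to be stored once, whichever of the two games is being looked up.&lt;/p&gt;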
&lt;h1&gt;Mapping the board game&amp;nbsp;landscape&lt;/h1&gt;
&lt;p&gt;The full grid of &lt;code&gt;152,628&lt;/code&gt; game similarities is non-trivial to visualise in its native form. To accurately display the similarities between all games in that matrix as distances between points, we would need a 552-dimensional (N-1) graph. Obviously, that&amp;#8217;s not really a tractable solution. Fortunately for us, there are machine learning techniques that provide us with an adequate solution to this problem. A technique known as t-Distributed Stochastic Neighbour Embedding (commonly abbreviated as t-&lt;span class="caps"&gt;SNE&lt;/span&gt;) allows us to create a lower-dimensionality projection, or more correctly, a manifold, of the 553 x 553 correlation matrix that attempts to keep points that are close together in the high-dimensionality space close together in the reduced-dimensionality space too. What this means is that we can obtain a set of points in 2D that best preserves adjacency between points close together in high-dimensional space, therefore keeping similar games together. Below is an interactive visualisation of the results using this approach. You can hover over any point to get more information on it and a list of its most similar games. There is also an interactive dashboard (see next section) where you can search for individual games to highlight them in the&amp;nbsp;plot.&lt;/p&gt;
&lt;script src="https://dvatvani.github.io/static/BGG-analysis/test.js" id="5aba6f58-365f-4605-977d-ac1ca5aee322"&gt;&lt;/script&gt;

&lt;p&gt;We can see that the games that we mentioned as being similar above are close to each other in this visualisation. This visualisation also shows that games perceived to be good &amp;#8220;Gateway games&amp;#8221; such as &lt;code&gt;Catan&lt;/code&gt;, &lt;code&gt;Carcassonne&lt;/code&gt; and &lt;code&gt;Ticket to Ride&lt;/code&gt; are also in close proximity to each other (bottom of the light blue group), despite not having many common themes or mechanics between them. Similarly, many pre-1960s traditional family games such as &lt;code&gt;Monopoly&lt;/code&gt;, &lt;code&gt;Risk&lt;/code&gt;, &lt;code&gt;Battleship&lt;/code&gt; or &lt;code&gt;Clue&lt;/code&gt; cluster together as well (bottom of the red group). Navigating the plot reveals several groups of games that are intuitively grouped together e.g. Economic Games, Visual party games, communication-based party games, hidden information card games. Interestingly, I found 2 game designers whose games tend to cluster together: Vlaada Chvátil (near the top right of the orange area) and Reiner Knizia (top left of dark blue area). It&amp;#8217;s also interesting that in both of these cases, despite there being a very distinct cluster for their games, they each have games that do not belong in their own cluster e.g. &lt;code&gt;Codenames&lt;/code&gt; does not appear to belong with the other Vlaada Chvátil games. Similarly, &lt;code&gt;The Quest for El Dorado&lt;/code&gt; does not belong with the other Reiner Knizia games. There are many other interesting features in the plot, but they are best left for the readers to explore and&amp;nbsp;discover.&lt;/p&gt;
&lt;h1&gt;Interactive&amp;nbsp;Dashboard&lt;/h1&gt;
&lt;p&gt;I&amp;#8217;ve built a basic interactive dashboard with more control over the visualisation of the &lt;span class="caps"&gt;BG&lt;/span&gt; landscape seen above, as well as a basic recommendation system that lists the most similar games to any game of interest. It can be found using the link&amp;nbsp;below:&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;a href="https://bgg-similarity-dashboard.herokuapp.com/"&gt;Link to dashboard&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;a href="https://bgg-similarity-dashboard.herokuapp.com/"&gt;&lt;img alt="Dashboard Image" src="https://dvatvani.github.io/static/BGG-analysis/dashboard_image.png" /&gt;&lt;/a&gt;&lt;/center&gt;&lt;/p&gt;
&lt;h1&gt;Closing&amp;nbsp;remarks&lt;/h1&gt;
&lt;p&gt;I hope that the framework presented here helps nudge the discussion around tabletop games and their classification towards a consideration of the games&amp;#8217; core appeal rather than a classification based on some of the games&amp;#8217; trappings and mechanics e.g. &amp;#8220;Wargame&amp;#8221;, &amp;#8220;Thematic game&amp;#8221;, &amp;#8220;Hex and Counter game&amp;#8221;, etc. This analysis also had a useful byproduct of allowing us to create a rudimentary game recommender system based on user-level ratings correlations (recommender available in the dashboard) that will hopefully be useful to some people, despite the limited scope of &lt;code&gt;553&lt;/code&gt; games.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Thanks to&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://twitter.com/elizhargrave"&gt;Elizabeth Hargrave&lt;/a&gt;: Elizabeth Hargrave suggested that it might be interesting to do a gender-level analysis on the &lt;span class="caps"&gt;BGG&lt;/span&gt; dataset following my previous analysis on board games. That motivated the collection of a user-rating-level dataset, which eventually sparked this&amp;nbsp;idea.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://twitter.com/ColmSeeley"&gt;Colm Seeley&lt;/a&gt; for introducing me to the world of modern board games, countless discussions and ideas on interesting things to do with the dataset, and for helping me identify and name many of the clusters in the mapped board game&amp;nbsp;landscape.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://twitter.com/PresstoFan"&gt;Yihui Fan&lt;/a&gt; for suggesting some interesting neural-network-based analysis ideas that could be performed on this&amp;nbsp;data.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;</summary><category term="Board games"></category></entry><entry><title>Making aesthetically pleasing dot density Venn diagrams</title><link href="https://dvatvani.github.io/dot-density-venn-diagrams.html" rel="alternate"></link><updated>2019-04-14T20:00:00+01:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2019-04-14:dot-density-venn-diagrams.html</id><summary type="html">&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Venn diagrams are a very common and intuitive way to visualise sets and relative population sizes of different cuts of data. From a data visualisation perspective, Venn diagrams are used in several different ways to present&amp;nbsp;data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Euler diagrams: A qualitative overview of which sets overlap with others, and which sets are subsets of others (Euler diagrams are technically not Venn diagrams, but I have included them here because these types of diagrams are colloquially still referred to by many as Venn&amp;nbsp;diagrams)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/British_Isles_Euler_diagram.png" width="500" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Source: &lt;a href="https://en.wiktionary.org/wiki/Euler_diagram"&gt;Wikipedia&lt;/a&gt;&lt;/em&gt;
  &lt;/center&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Labelled population sizes in the diagram: These are a straightforward way to present the data, but from a perceptual standpoint, our brains aren&amp;#8217;t very good at intuitively processing this. It&amp;#8217;s only marginally better than presenting the data in the form of a&amp;nbsp;table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/population-size-labelled-venn-diagram.png" width="500" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;source: &lt;a href="https://www.geckoboard.com/learn/data-literacy/data-science-glossary/venn-diagram/"&gt;Geckoboard&lt;/a&gt;&lt;/em&gt;
  &lt;/center&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Area-proportional or scaled Venn diagram: These aim to scale the area of different regions of a Venn diagram so that they are proportional to the population of that segment. This can be quite a useful way to convey relative population sizes of the regions of the Venn or Euler diagrams, but geometric restrictions mean that this can&amp;#8217;t be accurately done with circles for cases with more than 2 overlapping sets (the number of degrees of freedom from altering relative size and distance between circles will be lower than the number of distinct regions in the Venn diagram for all cases with n&amp;gt;2). There are ways around this problem using triangles or irregular shapes for the 3-set or higher case, but it is likely that you will run into geometric limitations when presenting information in this&amp;nbsp;way&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/Area_proportional_Venn.png" width="500" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;source: &lt;a href="https://stackoverflow.com/questions/8713994/venn-diagram-proportional-and-color-shading-with-semi-transparency"&gt;StackOverflow post&lt;/a&gt;&lt;/em&gt;
  &lt;/center&gt;&lt;/p&gt;
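&lt;p&gt;The degrees-of-freedom argument for area-proportional diagrams can be sketched with a simple count. This is one plausible way of counting, assumed here for illustration: each circle has a centre and a radius, and moving or rotating the whole diagram does not change any area. The function names are hypothetical.&lt;/p&gt;

```python
def circle_degrees_of_freedom(n_sets):
    # Centre (x, y) and radius for each circle, minus the 3 rigid motions
    # (2 translations, 1 rotation) that leave every region area unchanged.
    return 3 * n_sets - 3


def venn_regions(n_sets):
    # Every non-empty combination of sets is its own region in a Venn diagram.
    return 2 ** n_sets - 1


for n in (2, 3, 4):
    print(n, circle_degrees_of_freedom(n), venn_regions(n))
```

&lt;p&gt;Under this count, 2 sets give 3 adjustable quantities for 3 regions, whereas 3 sets give only 6 adjustable quantities for 7 regions, so the areas cannot all be matched in general.&lt;/p&gt;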
&lt;ul&gt;
&lt;li&gt;Dot density Venn diagram: Another way to present more quantitative information is by populating the regions of the Venn diagram with icons or dots that represent the relative population of the region of the Venn diagram. This is a flexible way to present quantitative information that is also perceptually easy to&amp;nbsp;process.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/dot-density-example.PNG" width="500" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;source: &lt;a href="http://robslink.com/SAS/democd59/venn_density.htm"&gt;Robert Allison&amp;#8217;s website&lt;/a&gt;&lt;/em&gt;
  &lt;/center&gt;&lt;/p&gt;
&lt;p&gt;I generally like the latter as a visualisation approach because of its flexibility and perceptual interpretability. However, the way it is done is typically with randomly sampled points for each region or manually placed points in arbitrary locations within a region. I have always thought that these could look nicer if the points distribution within a region were approximately evenly spaced, so this blog post is my attempt at solving that&amp;nbsp;problem.&lt;/p&gt;
&lt;h1&gt;Lloyd&amp;#8217;s algorithm for pseudo-random&amp;nbsp;sampling&lt;/h1&gt;
&lt;p&gt;Lloyd&amp;#8217;s algorithm is designed to generate roughly evenly spaced points in space, so I&amp;#8217;ll be using this as the key process for the pseudo-random sampling to create evenly distributed points. The way it works is heavily reliant on Voronoi tessellation. If you want to learn more about Voronoi tessellation, I can recommend &lt;a href="http://datagenetics.com/blog/may12017/index.html"&gt;this DataGenetics post&lt;/a&gt; introducing the&amp;nbsp;concept.&lt;/p&gt;
&lt;p&gt;Lloyd&amp;#8217;s algorithm starts with a set of randomly distributed points, and then repeatedly generates the Voronoi cells for that set of points and moves the points to the centroids of the Voronoi cells. Each iteration of this process increases the uniformity of the spacing between points. Each step is visualised&amp;nbsp;below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Start with a set of random points
  &lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/random_points.png" /&gt;
  &lt;/center&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Determine the Voronoi tessellation for that set of points
  &lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/voronoi.png" /&gt;
  &lt;/center&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Move each point (orange) to the centroid (blue) of its Voronoi cell
  &lt;center&gt;
    &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/lloyd_iteration.png" /&gt;
  &lt;/center&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can see that this process increases the distance between points that are close&amp;nbsp;together.&lt;/p&gt;
&lt;p&gt;This process can be repeated to keep increasing the distance between points that are closest together until the system reaches an equilibrium point, thereby generating an approximately uniformly distributed set of points. The animation below shows the effect of cycling through &lt;code&gt;30&lt;/code&gt; iterations of Lloyd&amp;#8217;s&amp;nbsp;algorithm.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
  &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/animation.gif" width="500" /&gt;
&lt;/center&gt;&lt;/p&gt;
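&lt;p&gt;The iteration described above can be sketched in plain Python. This is a minimal illustration and not the code from the notebook linked at the end of this post. It assumes the region is the unit square and approximates each Voronoi cell by assigning the points of a dense sample grid to their nearest generator, instead of computing exact cell polygons. The names &lt;code&gt;lloyd_relax&lt;/code&gt; and &lt;code&gt;min_spacing&lt;/code&gt; are made up for this example.&lt;/p&gt;

```python
import random


def lloyd_relax(points, iterations=30, grid=40):
    """Approximate Lloyd relaxation of points inside the unit square."""
    # Dense grid of sample locations standing in for the continuous region.
    samples = [((i + 0.5) / grid, (j + 0.5) / grid)
               for i in range(grid) for j in range(grid)]
    pts = list(points)
    for _ in range(iterations):
        # Running sum of x, sum of y and sample count for each approximate cell.
        sums = [[0.0, 0.0, 0] for _ in pts]
        for sx, sy in samples:
            # The nearest generator identifies the Voronoi cell of this sample.
            k = min(range(len(pts)),
                    key=lambda i: (pts[i][0] - sx) ** 2 + (pts[i][1] - sy) ** 2)
            sums[k][0] += sx
            sums[k][1] += sy
            sums[k][2] += 1
        # Move each point to the centroid of its cell (unchanged if the cell is empty).
        pts = [(s[0] / s[2], s[1] / s[2]) if s[2] else p
               for p, s in zip(pts, sums)]
    return pts


def min_spacing(pts):
    """Smallest distance between any two points."""
    return min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for i, (ax, ay) in enumerate(pts)
               for (bx, by) in pts[i + 1:])


random.seed(0)
start = [(random.random(), random.random()) for _ in range(20)]
relaxed = lloyd_relax(start)
print(min_spacing(start), min_spacing(relaxed))
```

&lt;p&gt;Comparing the smallest gap between points before and after relaxation gives a rough measure of how much more evenly spaced the points have become. For a region of a Venn diagram, the sample grid would be restricted to the locations that fall inside that region.&lt;/p&gt;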
&lt;p&gt;This approach can be applied to all regions in a dot density Venn diagram to turn the figure on the left into the figure on the&amp;nbsp;right.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
  &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/unrelaxed_Venn.png" width="500" /&gt;
  &lt;img src="https://dvatvani.github.io/static/Venn-diagrams/Venn.png" width="500" /&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;That looks much nicer to me and it doesn&amp;#8217;t lose any perceptual accuracy. I think this might become my default choice for visualising population sizes in sets in the&amp;nbsp;future.&lt;/p&gt;
&lt;p&gt;If you&amp;#8217;re interested in generating similar graphs, the code I wrote to generate the Lloyd-relaxed dot density Venn diagram can be found &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/dvatvani.github.io/blob/master/static/Venn-diagrams/Lloyd_relaxation_on_dot_density_diagrams.ipynb"&gt;here&lt;/a&gt; in the form of a Jupyter Notebook&amp;nbsp;(Python).&lt;/p&gt;</summary><category term="Python"></category></entry><entry><title>An analysis of board games: Part II - Complexity bias in BGG</title><link href="https://dvatvani.github.io/BGG-Analysis-Part-2.html" rel="alternate"></link><updated>2018-12-09T02:30:00+00:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2018-12-08:BGG-Analysis-Part-2.html</id><summary type="html">&lt;p&gt;This is part &lt;span class="caps"&gt;II&lt;/span&gt; in my series on analysing BoardGameGeek data. Other parts can be found&amp;nbsp;here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="./BGG-Analysis-Part-1.html"&gt;Part I: Introduction and general&amp;nbsp;trends&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part &lt;span class="caps"&gt;II&lt;/span&gt;: Complexity bias in&amp;nbsp;BoardGameGeek&lt;/li&gt;
&lt;li&gt;&lt;a href="./BGG-Analysis-Part-3.html"&gt;Part &lt;span class="caps"&gt;III&lt;/span&gt;: Mapping the board game&amp;nbsp;landscape&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;In &lt;a href="./BGG-Analysis-Part-1.html"&gt;Part I&lt;/a&gt;, I described how I generated a dataset from BoardGameGeek and explored general trends in the rate of release, ratings and complexity. I also looked at the prevalence of different mechanics and themes throughout the hobby and how this has changed in the past 30 years. In this post, we&amp;#8217;ll investigate complexity bias in &lt;span class="caps"&gt;BGG&lt;/span&gt;&amp;nbsp;ratings.&lt;/p&gt;
&lt;h1&gt;Complexity bias in&amp;nbsp;ratings&lt;/h1&gt;
&lt;p&gt;BoardGameGeek&amp;#8217;s top 100 list is a very visible &amp;#8220;beacon&amp;#8221; for the hobby and many players will use this list to make decisions about which games to try or buy. It is comparable to the IMDb top 250 in the role it plays in shaping what the community perceives as the apex of Board Game experiences. However, one of the problems with the &lt;span class="caps"&gt;BGG&lt;/span&gt; top 100 is that it is disproportionately dominated by big and complex games. This makes it less useful for a sizeable majority of board game players looking for good new games to play, since many of the games on that list will look inaccessible and daunting. The relationship between a game&amp;#8217;s complexity and how highly rated it is on &lt;span class="caps"&gt;BGG&lt;/span&gt; is not just limited to the top 100. In fact, there is a pretty clear correlation between how complex a game is and how highly rated it is on BoardGameGeek, as shown&amp;nbsp;below.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Rating vs weight" src="https://dvatvani.github.io/static/BGG-analysis/complexity-bias.png" /&gt;&lt;/center&gt;
&lt;center&gt;&lt;em&gt;Note: The above graph only includes games with &amp;gt; 100 votes for game weight&lt;/em&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;The existence of this correlation in the &lt;span class="caps"&gt;BGG&lt;/span&gt; dataset makes it easier to understand why the top 100 is disproportionately populated with big, complex&amp;nbsp;games.&lt;/p&gt;
&lt;p&gt;It is worth making a couple of comments based on the graph&amp;nbsp;above:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;This graph does not necessarily mean that more complex board games are inherently better. While the graph above does show a clear (and statistically significant) relationship between perceived complexity and overall rating, we need to appreciate that there is a strong sampling bias present in our dataset that leads to this result, i.e. complex board games disproportionately appeal to the &lt;span class="caps"&gt;BGG&lt;/span&gt; user&amp;nbsp;base.&lt;/li&gt;
&lt;li&gt;A curious feature of the graph above is the tail of games of low complexity and low ratings at the bottom left of the plot. This &amp;#8220;tail of spite&amp;#8221; consists of relatively old mass-appeal games. Every single game in the tail of spite was released pre-1980, with many being considerably older than that. The games that form the tail of spite are shown in the table&amp;nbsp;below:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;center&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align="left"&gt;Name&lt;/th&gt;
&lt;th align="center"&gt;Avg. rating&lt;/th&gt;
&lt;th align="center"&gt;Avg. weight&lt;/th&gt;
&lt;th align="center"&gt;Year published&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Tic-Tac-Toe&lt;/td&gt;
&lt;td align="center"&gt;2.6&lt;/td&gt;
&lt;td align="center"&gt;1.11&lt;/td&gt;
&lt;td align="center"&gt;-1300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Monopoly&lt;/td&gt;
&lt;td align="center"&gt;4.4&lt;/td&gt;
&lt;td align="center"&gt;1.68&lt;/td&gt;
&lt;td align="center"&gt;1933&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Trouble&lt;/td&gt;
&lt;td align="center"&gt;3.7&lt;/td&gt;
&lt;td align="center"&gt;1.05&lt;/td&gt;
&lt;td align="center"&gt;1965&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Pay Day&lt;/td&gt;
&lt;td align="center"&gt;4.7&lt;/td&gt;
&lt;td align="center"&gt;1.23&lt;/td&gt;
&lt;td align="center"&gt;1975&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Checkers&lt;/td&gt;
&lt;td align="center"&gt;4.9&lt;/td&gt;
&lt;td align="center"&gt;1.79&lt;/td&gt;
&lt;td align="center"&gt;1150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Pachisi&lt;/td&gt;
&lt;td align="center"&gt;4.5&lt;/td&gt;
&lt;td align="center"&gt;1.21&lt;/td&gt;
&lt;td align="center"&gt;400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Sorry!&lt;/td&gt;
&lt;td align="center"&gt;4.5&lt;/td&gt;
&lt;td align="center"&gt;1.17&lt;/td&gt;
&lt;td align="center"&gt;1929&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Battleship&lt;/td&gt;
&lt;td align="center"&gt;4.5&lt;/td&gt;
&lt;td align="center"&gt;1.23&lt;/td&gt;
&lt;td align="center"&gt;1931&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Mouse Trap&lt;/td&gt;
&lt;td align="center"&gt;4.1&lt;/td&gt;
&lt;td align="center"&gt;1.12&lt;/td&gt;
&lt;td align="center"&gt;1963&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Connect Four&lt;/td&gt;
&lt;td align="center"&gt;4.8&lt;/td&gt;
&lt;td align="center"&gt;1.20&lt;/td&gt;
&lt;td align="center"&gt;1974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;The Game of Life&lt;/td&gt;
&lt;td align="center"&gt;4.1&lt;/td&gt;
&lt;td align="center"&gt;1.19&lt;/td&gt;
&lt;td align="center"&gt;1960&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Operation&lt;/td&gt;
&lt;td align="center"&gt;4.0&lt;/td&gt;
&lt;td align="center"&gt;1.08&lt;/td&gt;
&lt;td align="center"&gt;1965&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Guess Who?&lt;/td&gt;
&lt;td align="center"&gt;4.8&lt;/td&gt;
&lt;td align="center"&gt;1.12&lt;/td&gt;
&lt;td align="center"&gt;1979&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Candy Land&lt;/td&gt;
&lt;td align="center"&gt;3.2&lt;/td&gt;
&lt;td align="center"&gt;1.05&lt;/td&gt;
&lt;td align="center"&gt;1949&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Snakes and Ladders&lt;/td&gt;
&lt;td align="center"&gt;2.8&lt;/td&gt;
&lt;td align="center"&gt;1.00&lt;/td&gt;
&lt;td align="center"&gt;-200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Twister&lt;/td&gt;
&lt;td align="center"&gt;4.6&lt;/td&gt;
&lt;td align="center"&gt;1.09&lt;/td&gt;
&lt;td align="center"&gt;1966&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Pick Up Sticks&lt;/td&gt;
&lt;td align="center"&gt;4.2&lt;/td&gt;
&lt;td align="center"&gt;1.05&lt;/td&gt;
&lt;td align="center"&gt;1850&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Bingo&lt;/td&gt;
&lt;td align="center"&gt;2.7&lt;/td&gt;
&lt;td align="center"&gt;1.02&lt;/td&gt;
&lt;td align="center"&gt;1530&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="left"&gt;Memory&lt;/td&gt;
&lt;td align="center"&gt;4.7&lt;/td&gt;
&lt;td align="center"&gt;1.16&lt;/td&gt;
&lt;td align="center"&gt;1959&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;/center&gt;&lt;/p&gt;
&lt;h1&gt;Correcting for the complexity&amp;nbsp;bias&lt;/h1&gt;
&lt;p&gt;Since the regression in the graph above reveals how games&amp;#8217; ratings are related to complexity within the &lt;span class="caps"&gt;BGG&lt;/span&gt; dataset, we can artificially correct for the correlation by adjusting the game ratings to penalize complex games and reward simpler games. For the more mathematically inclined among you, I&amp;#8217;m referring to the residuals of the regression between rating and&amp;nbsp;complexity.&lt;/p&gt;
&lt;p&gt;A short illustration goes a long way to intuitively explain what the process&amp;nbsp;does.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;img alt="Alt Text" src="https://dvatvani.github.io/static/BGG-analysis/manual_animation.gif" /&gt;
&lt;/center&gt;&lt;/p&gt;
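&lt;p&gt;The correction can also be sketched in a few lines of Python. This is an illustration under stated assumptions and not the code from the notebook linked at the end of this post: it fits an ordinary least-squares line of rating against weight, takes the residuals, and adds the overall mean rating back so that scores stay on the familiar scale (that last step is an assumption made here). The five weights and ratings are invented toy values, and the function names are hypothetical.&lt;/p&gt;

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def complexity_corrected(weights, ratings):
    """Residual of each rating about the fitted line, shifted by the mean rating."""
    slope, intercept = fit_line(weights, ratings)
    mean_rating = sum(ratings) / len(ratings)
    return [r - (slope * w + intercept) + mean_rating
            for w, r in zip(weights, ratings)]


# Invented toy values for illustration only (not taken from the BGG dataset).
weights = [1.0, 1.5, 2.0, 3.0, 4.0]
ratings = [5.0, 6.0, 6.2, 7.1, 7.9]
corrected = complexity_corrected(weights, ratings)
print([round(c, 2) for c in corrected])
```

&lt;p&gt;By construction, the corrected scores have no remaining linear trend with weight, so a light game and a heavy game that sit equally far above the fitted line end up with the same corrected rating.&lt;/p&gt;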
&lt;p&gt;Applying that artificial correction gives us a &amp;#8220;complexity-agnostic&amp;#8221; rating for all games. Below is an interactive plot showing the rating vs complexity after the rating correction. Hover over any point to see the name of the game and the game&amp;#8217;s new &lt;span class="caps"&gt;BGG&lt;/span&gt; rank and&amp;nbsp;rating.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;
&lt;iframe width="820" height="470" src="https://dvatvani.github.io/static/BGG-analysis/toolbar.html" frameborder="0" allowfullscreen&gt;
&lt;/iframe&gt;&lt;br&gt;
&lt;strong&gt;&lt;em&gt;Hover your mouse over (or tap if you&amp;#8217;re on mobile) any point for more information about the game&lt;/em&gt;&lt;/strong&gt;
&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;We can use these corrected ratings to re-rank all of the games and obtain a complexity-agnostic top 100 list. Note that &lt;span class="caps"&gt;BGG&lt;/span&gt; uses something called a Bayesian mean to rank their games instead of taking the raw average ratings. What this does is effectively give each game a certain number of additional &amp;#8220;average&amp;#8221; rating votes. This is designed to push games with a very low number of ratings towards the average to prevent the top games list being dominated by games with only a couple of perfect score ratings. I&amp;#8217;ve used a similar approach, using the same Bayesian prior as &lt;span class="caps"&gt;BGG&lt;/span&gt; (Bayesian prior of about &lt;code&gt;5.5&lt;/code&gt; with a weight of around &lt;code&gt;1,000&lt;/code&gt; ratings). As a result, there may be some cases where a game with a higher average rating ends up having a lower rank than a game with a slightly lower average rating if the second has significantly more rating votes. The re-ranked &lt;span class="caps"&gt;BGG&lt;/span&gt; list using these corrected ratings has the complex games evenly spread throughout the ranked list of games rather than disproportionately skewed towards the top, thereby allowing some of the great, but less complex, games to shine through to the top&amp;nbsp;100.&lt;/p&gt;
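&lt;p&gt;The Bayesian mean described above amounts to a weighted average of the prior and the observed ratings. The sketch below uses the approximate prior quoted in the text (&lt;code&gt;5.5&lt;/code&gt; with a weight of &lt;code&gt;1,000&lt;/code&gt; votes). The function name and the two example games are hypothetical, and the exact formula &lt;span class="caps"&gt;BGG&lt;/span&gt; uses may differ.&lt;/p&gt;

```python
def bayesian_mean(avg_rating, n_votes, prior_mean=5.5, prior_votes=1000):
    """Average rating after adding prior_votes dummy votes at prior_mean."""
    total = prior_mean * prior_votes + avg_rating * n_votes
    return total / (prior_votes + n_votes)


# Two made-up games: a niche title with few votes and a popular one with many.
niche = bayesian_mean(8.5, 50)
popular = bayesian_mean(8.0, 5000)
print(round(niche, 2), round(popular, 2))
```

&lt;p&gt;Here the niche game has the higher raw average but only 50 votes, so the prior pulls it to about 5.64, while the popular game stays at about 7.58 and ranks higher.&lt;/p&gt;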
&lt;p&gt;I have applied the complexity-bias correction to all games with over 30 rating votes. Below is an interactive table that allows you to navigate the full list. It also includes a search function to find the impact of the complexity-bias correction on specific&amp;nbsp;games.&lt;/p&gt;
&lt;p&gt;&lt;object width="100%" height="700" type="text/html" data="https://dvatvani.github.io/static/BGG-analysis/data.html"&gt;&lt;/object&gt;
 &lt;center&gt;&lt;em&gt;Note: This table only includes games with &amp;gt;= 30 rating votes&lt;/em&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Some of the games experienced a fairly substantial push up/down the rankings ladder as a result of the complexity bias correction. Some of the games that benefitted the most from this rating correction and have risen to the top 100 are &lt;code&gt;Skull&lt;/code&gt;, &lt;code&gt;BANG! The Dice Game&lt;/code&gt;, &lt;code&gt;Love Letter: Batman&lt;/code&gt;, &lt;code&gt;No Thanks!&lt;/code&gt;, &lt;code&gt;Time's Up!&lt;/code&gt;, &lt;code&gt;Spyfall&lt;/code&gt; and &lt;code&gt;Sushi Go!&lt;/code&gt;. Conversely, some of the games that have been penalized the most are &lt;code&gt;Twilight Imperium (Third Edition)&lt;/code&gt;, &lt;code&gt;Alchemists&lt;/code&gt;, &lt;code&gt;War of the Ring (first edition)&lt;/code&gt;, &lt;code&gt;A Game of Thrones: The Board Game (Second Edition)&lt;/code&gt;, &lt;code&gt;Through the Ages: A Story of Civilization&lt;/code&gt; and &lt;code&gt;Caverna: The Cave Farmers&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Looking at the revised top 100 from the list above, I still have some reservations about it, but it looks much more reasonable to me than the original &lt;span class="caps"&gt;BGG&lt;/span&gt; top 100 list. I suspect that for most board game players looking to try out new good games, this list would look far more approachable, while still being filled with excellent&amp;nbsp;games.&lt;/p&gt;
&lt;p&gt;I hope that you&amp;#8217;ve enjoyed learning about the complexity bias inherent in the BoardGameGeek dataset and how we can correct for it. The discussion on whether or not complex games really are better is far from over, but hopefully people looking for some of the lighter great games to play will find this more welcoming take on the &lt;span class="caps"&gt;BGG&lt;/span&gt; top 100&amp;nbsp;useful.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The code I wrote for this analysis can be found &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/dvatvani.github.io/blob/master/static/BGG-analysis/BGG_analysis_-_complexity_bias_correction.ipynb"&gt;here&lt;/a&gt; in the form of a Jupyter Notebook&amp;nbsp;(Python).&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Thanks to&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://twitter.com/ColmSeeley"&gt;Colm Seeley&lt;/a&gt; for co-authoring this work with&amp;nbsp;me&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Catherine Maddox for great feedback on the writing and presentation of the&amp;nbsp;post&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Quintin Smith (Quinns) from Shut Up &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; Sit Down for allowing me to use material from one of his talks in a presentation of this&amp;nbsp;analysis&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;GitHub user &lt;code&gt;ThaWeatherman&lt;/code&gt; for creating the &lt;a href="https://github.com/ThaWeatherman"&gt;&lt;span class="caps"&gt;BGG&lt;/span&gt; scraper&lt;/a&gt; that I modified to collect the data used for this&amp;nbsp;analysis.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;GitHub user &lt;code&gt;vividvilla&lt;/code&gt; for building the useful &lt;a href="https://github.com/vividvilla/csvtotable"&gt;CSVtoTable&lt;/a&gt;&amp;nbsp;tool&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;If you enjoyed reading this, you may also&amp;nbsp;enjoy:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.shutupandsitdown.com/bgg100-100-81/"&gt;&lt;em&gt;Shut Up &lt;span class="amp"&gt;&amp;amp;&lt;/span&gt; Sit Down&amp;#8217;s take on the &lt;span class="caps"&gt;BGG&lt;/span&gt; top&amp;nbsp;100&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://opinionatedgamers.com/2015/08/17/are-boardgames-getting-better-an-empirical-analysis/"&gt;&lt;em&gt;Are Boardgames Getting Better? An Empirical Analysis&lt;/em&gt;&lt;/a&gt; by &lt;a href="https://opinionatedgamers.com/"&gt;&lt;em&gt;Opinionated&amp;nbsp;Gamers&lt;/em&gt;&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://boardgamegeek.com/blogpost/11991/numbers-bgg-rank-data-analysis"&gt;&lt;em&gt;By the Numbers - &lt;span class="caps"&gt;BGG&lt;/span&gt; Rank Data + Analysis&lt;/em&gt;&lt;/a&gt; by Oliver&amp;nbsp;Kiley&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;</summary><category term="Board games"></category></entry><entry><title>An analysis of board games: Part I - Introduction and general trends</title><link href="https://dvatvani.github.io/BGG-Analysis-Part-1.html" rel="alternate"></link><updated>2018-12-08T03:30:00+00:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2018-03-05:BGG-Analysis-Part-1.html</id><summary type="html">&lt;p&gt;This is part I in my series on analysing BoardGameGeek data. Other parts can be found&amp;nbsp;here:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Part I: Introduction and general&amp;nbsp;trends&lt;/li&gt;
&lt;li&gt;&lt;a href="./BGG-Analysis-Part-2.html"&gt;Part &lt;span class="caps"&gt;II&lt;/span&gt;: Complexity bias in &lt;span class="caps"&gt;BGG&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="./BGG-Analysis-Part-3.html"&gt;Part &lt;span class="caps"&gt;III&lt;/span&gt;: Mapping the board game&amp;nbsp;landscape&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Over the last few years, board games have become one of my favoured pastimes. My journey of discovery in this space has been very enjoyable, but the deeper I delve down the rabbit hole, the more it makes me wonder about the board game landscape as a whole, particularly about the genres I haven&amp;#8217;t tried, the different types of mechanics I&amp;#8217;ve not been exposed to, games that have an unusual pairing of mechanics and how the board game landscape as a whole has evolved over time. I found a few different bits of analysis on Kaggle, in forums and blogs that scratched the surface of these topics, but not enough to relieve the itch of my curiosity, so I decided to get my hands dirty and rummage through the data mine&amp;nbsp;myself.&lt;/p&gt;
&lt;h1&gt;Data collection and&amp;nbsp;description.&lt;/h1&gt;
&lt;p&gt;BoardGameGeek is a fantastic database for board game information, so it seemed like a no-brainer to me to use this as the main source of the data for my analysis. There are pre-scraped and ready to use &lt;span class="caps"&gt;BGG&lt;/span&gt; datasets on &lt;a href="https://www.kaggle.com/mrpantherson/board-game-data/data"&gt;Kaggle&lt;/a&gt; and &lt;a href="https://github.com/ThaWeatherman/scrapers/tree/master/boardgamegeek"&gt;GitHub&lt;/a&gt;, but neither of those suited my purpose since the Kaggle dataset is limited to the top 5000 board games on &lt;span class="caps"&gt;BGG&lt;/span&gt; and the GitHub dataset is 2 years old and is also missing some data fields that I was interested in such as a list of mechanics for each board game. I decided to re-run a modified version of the scraper I found on &lt;a href="https://github.com/ThaWeatherman/scrapers/tree/master/boardgamegeek"&gt;GitHub&lt;/a&gt; to allow me to fetch additional fields such as a list of mechanics, categories and designers that were not collected by the original scraper and obtain a slightly richer and more up-to-date board games dataset. This generated a dataset containing &lt;code&gt;76,597&lt;/code&gt; board games and &lt;code&gt;13,675&lt;/code&gt; board game expansions. The modified scraper and the scraped dataset can both be found &lt;a href="https://github.com/dvatvani/dvatvani.github.io/tree/master/static/BGG-analysis/scraper_and_data"&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;For the analysis in this post, we&amp;#8217;ll be focusing on base board games only, not&amp;nbsp;expansions.&lt;/p&gt;
&lt;h1&gt;A note on sampling bias in the&amp;nbsp;dataset&lt;/h1&gt;
&lt;p&gt;Before we delve into any serious analysis, we should highlight that any patterns or observations found here reflect patterns observed within the boardgamegeek dataset. Depending on the context in the analysis, these observations may or may not be reflective of the board game industry as a whole, since the perspective and behaviour of boardgamegeek users will not always accurately represent all board game players. People who have a boardgamegeek account and actively log plays and rate games are very likely to be more invested and informed on board games compared to the average board game player. A good demonstration of this bias can be seen in the list of most owned games on&amp;nbsp;boardgamegeek.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Most popular games" src="/static/BGG-analysis/Most_popular_board_games.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Whilst the exact sales figures are hard to come by, it is generally agreed that the most popular (by ownership, not rating) board games include Chess, Monopoly, Risk, Scrabble, Pictionary, Cluedo, Trivial Pursuit, etc. (sources: &lt;a href="http://minnesotasnewcountry.com/top-selling-board-games-of-all-time/"&gt;1&lt;/a&gt;, &lt;a href="https://www.insidermonkey.com/blog/11-most-sold-board-games-ever-373692/11/"&gt;2&lt;/a&gt;, &lt;a href="https://www.therichest.com/rich-list/most-popular/the-top-10-most-sold-board-games-ever/"&gt;3&lt;/a&gt;, &lt;a href="https://hobbylark.com/board-games/The-Top-Ten-Board-Games-Of-All-Time"&gt;4&lt;/a&gt;). All of these games are under-represented in the &lt;span class="caps"&gt;BGG&lt;/span&gt; dataset due to the aforementioned bias. There will be many other cases in the analysis where this bias is likely having an effect, but I&amp;#8217;ll address them as they&amp;nbsp;come.&lt;/p&gt;
&lt;h1&gt;A golden age of board&amp;nbsp;games&lt;/h1&gt;
&lt;p&gt;There has been a lot of discussion suggesting that we are currently in a board game golden age (sources: &lt;a href="https://www.theguardian.com/technology/2016/sep/25/board-games-back-tabletop-gaming-boom-pandemic-flash-point"&gt;1&lt;/a&gt;, &lt;a href="https://boardgamegeek.com/thread/1679807/golden-age-board-gaming"&gt;2&lt;/a&gt;, &lt;a href="https://www.shutupandsitdown.com/videos/board-game-golden-age-talk/"&gt;3&lt;/a&gt;). I thought it would be interesting to see if the data supports this&amp;nbsp;view.&lt;/p&gt;
&lt;h3&gt;Board game publication rate over&amp;nbsp;time&lt;/h3&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Growth in number of games published each year" src="https://dvatvani.github.io/static/BGG-analysis/game-release-rate-growth.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Historically, there has been a broadly exponential increase in board games coming out each&amp;nbsp;year.&lt;/li&gt;
&lt;li&gt;Based on the exponential growth of board game publications observed so far, we expect the number of board games published over the course of a year to double every 12.6 years. This is the board games analogue of &lt;a href="https://en.wikipedia.org/wiki/Moore%27s_law"&gt;Moore&amp;#8217;s&amp;nbsp;law&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;We are currently observing the release of around &lt;code&gt;3,500&lt;/code&gt; new board games every year, and that number is increasing by around &lt;code&gt;5.7%&lt;/code&gt; each&amp;nbsp;year.&lt;/li&gt;
&lt;li&gt;The growth of the industry appears to have stagnated between the mid 1980s and late 1990s. There was, however, a disproportionate surge in growth of number of board games released per year from 1999 to 2005 that made up for the stagnation observed in the previous years. It&amp;#8217;s not entirely clear to me why the stagnation or surge occurred during those years, but given that the transition between the stagnation and surge aligns with the release of the boardgamegeek website (first available in 2000), it&amp;#8217;s possible that these changes in new games published per year are an artefact in the data due to the availability of boardgamegeek (i.e. obscure board games before the existence of boardgamegeek may have been lost to the sands of time, whereas after the existence of boardgamegeek, it&amp;#8217;s more likely that obscure games will still make it to the&amp;nbsp;database).&lt;/li&gt;
&lt;li&gt;It is worth remembering that this is just referring to new games, and it doesn&amp;#8217;t even include&amp;nbsp;expansions!&lt;/li&gt;
&lt;li&gt;Overall, whilst the number of new board games released per year currently appears to be very high, it is currently in near perfect alignment with what&amp;#8217;s expected given the historic growth trends of the board game industry. There is nothing particularly unique or different about the &lt;em&gt;rate of release&lt;/em&gt; of new board games to support the view that we&amp;#8217;re currently in a board game golden age. However, the rate of release of board games is just one of many aspects that could lead people to believe that we&amp;#8217;re in a board game golden age. We can have a look at some other factors&amp;nbsp;too.&lt;/li&gt;
&lt;/ul&gt;
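&lt;p&gt;The doubling time and the annual growth rate quoted in the list above are two views of the same exponential fit. The snippet below is a minimal sketch of the conversion between them, assuming a constant annual growth rate; the function names are illustrative and are not taken from the analysis notebook. With the rounded figure of 5.7% it gives a doubling time of about 12.5 years, and a doubling time of 12.6 years corresponds to growth of roughly 5.66% per year, so the two quoted numbers agree to within&amp;nbsp;rounding.&lt;/p&gt;

```python
import math


def doubling_time(annual_growth):
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def growth_rate_for_doubling(years):
    """Constant annual growth rate that doubles a quantity in the given number of years."""
    return 2 ** (1 / years) - 1


# 0.057 is the rounded growth rate quoted in the list above
print(round(doubling_time(0.057), 1))

# 12.6 is the doubling time quoted in the list above
print(round(100 * growth_rate_for_doubling(12.6), 2))
```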
&lt;h3&gt;Board game ratings by publication&amp;nbsp;year&lt;/h3&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Evolution of average game ratings by year" src="https://dvatvani.github.io/static/BGG-analysis/rating-by-year.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;This data suggests that board games have been getting steadily better since around 2002, but that there&amp;#8217;s been a disproportionate improvement in game ratings in the last couple of years. While the last couple of years certainly appear to have seen the release of great games such as Pandemic Legacy, Gloomhaven, Scythe, etc., it&amp;#8217;s not clear to me what caused the games to get disproportionately better in that period. Perhaps the consumer market for board games increased in size noticeably, leading to more resources poured into game development, but unfortunately, I don&amp;#8217;t have the data to test&amp;nbsp;this.&lt;/p&gt;
&lt;p&gt;It&amp;#8217;s also worth noting that there may be an element of ratings inflation present in this data i.e. the baseline rating for an average game has increased, because people might have perceived a rating of 5 to be average a decade ago, but now they might perceive a rating of 7 to be average, so average ratings could be increasing over time despite games not necessarily getting&amp;nbsp;better.&lt;/p&gt;
&lt;h1&gt;Other industry trends over&amp;nbsp;time&lt;/h1&gt;
&lt;h3&gt;Complexity&lt;/h3&gt;
&lt;p&gt;The &amp;#8220;complexity&amp;#8221; of a board game is a relatively loosely defined term, since it encompasses different types of learning and decision making characteristics involved in learning how to play as well as playing a game. To give a quick example, Chess/Go are relatively simple games in terms of their rules. In both cases, the rules can all be concisely explained and understood in a few minutes. However, getting a full grasp of all the strategies and tactics made possible by these simple rules can take a very long time as there is a considerable amount of complexity born from the number of different moves possible each turn as well as the fact that every move affects the available possible moves in future turns (i.e. turns are not independent). Boardgamegeek contains a &amp;#8220;weight&amp;#8221; score for board games (rated by users) that provides a reduced, all-encompassing sense of the complexity of a game, based on users&amp;#8217; perception. We can look at how the complexity scores of board games have evolved based on when games were released. I&amp;#8217;ve focussed on games post 1995 since the set of games that have enough weight ratings starts to get a bit thin before&amp;nbsp;then.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Evolution of complexity over time" src="https://dvatvani.github.io/static/BGG-analysis/complexity-by-year.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;It appears that board games have not only been getting better in terms of ratings, but also more complex since the mid 1990s. The trends in the complexity mirror the trends we see in overall ratings, with games appearing to have gotten steadily more complex since the early 2000s, and the last couple of years exhibiting a disproportionate growth in&amp;nbsp;complexity.&lt;/p&gt;
&lt;p&gt;The parallels between the trends in overall ratings and complexity beg for a more direct comparison between them, but that&amp;#8217;s a fairly substantial topic in itself, so I&amp;#8217;ll address that in a future post of this series where the analysis has more room to&amp;nbsp;breathe.&lt;/p&gt;
&lt;h3&gt;Mechanics&lt;/h3&gt;
&lt;p&gt;Mechanics are the basic constructs of rules or methods that allow you to interact with a board game to allow gameplay. These can be simple things such as dice rolling or drawing (e.g. Pictionary), to slightly more involved cases such as Card drafting or Route/Network building. I&amp;#8217;ve listed the mechanics on &lt;span class="caps"&gt;BGG&lt;/span&gt; below by how often they&amp;#8217;re found on board games. Many of the mechanics&amp;#8217; names are intuitive, but some require more explanation. Descriptions and examples of all mechanics can be found &lt;a href="https://boardgamegeek.com/wiki/page/mechanism"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most popular&amp;nbsp;mechanics&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Most popular mechanics" src="/static/BGG-analysis/Most_popular_mechanics.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Unsurprisingly, the top 2 are related to dice, which are often seen as an iconic staple of board&amp;nbsp;games.&lt;/p&gt;
&lt;p&gt;We can also look at how mechanics prevalence has changed over the last couple of&amp;nbsp;decades.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Comparison of mechanics prevalence in the 1990s and 2010s" src="https://dvatvani.github.io/static/BGG-analysis/Mechanics_in_90s_and_10s.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;There are several mechanics that have become more popular over the last couple of decades, but the most notable is &lt;code&gt;Hand management&lt;/code&gt; (choosing which set of cards to keep in your hand and which to play/discard, since some card combinations will be better than others and there may be a limit on the number of cards you can keep in your hand. Examples of popular board games with the &lt;code&gt;Hand management&lt;/code&gt; mechanic are &lt;code&gt;Settlers of Catan&lt;/code&gt; and &lt;code&gt;Pandemic&lt;/code&gt;). I don&amp;#8217;t have any robust insight into why this might be the case, but it&amp;#8217;s perhaps because it can incorporate interesting decision making elements into games without requiring much explanation or complicated rules. Once the function of the cards/sets of cards in your hand is understood, the intricacies of the trade-offs, risks and benefits of playing/discarding particular cards and managing your hand often become self evident. The theme of mechanics that force you to think and make decisions is also present in other mechanics on the rise such as &lt;code&gt;Variable player powers&lt;/code&gt;, &lt;code&gt;Area control/Area influence&lt;/code&gt;, &lt;code&gt;Card drafting&lt;/code&gt;, &lt;code&gt;Deck building&lt;/code&gt;, &lt;code&gt;Worker placement&lt;/code&gt;, etc. On the other end of the spectrum, the &lt;code&gt;Roll/Spin and move&lt;/code&gt; mechanic (the mechanic present in &lt;code&gt;Monopoly&lt;/code&gt; or &lt;code&gt;Snakes and ladders&lt;/code&gt;, in which the movement of a player&amp;#8217;s token is decided by a dice roll or the spin of a wheel) is experiencing a huge decline. It&amp;#8217;s perhaps unsurprising to see its decline since this mechanic requires little to no consideration or decision making at all, and is therefore a less engaging way to interact with a board&amp;nbsp;game.&lt;/p&gt;
&lt;p&gt;It is also worth noting that there are many more mechanics on the rise than on the fall. This is because modern games are beginning to include more mechanics. The inclusion of more mechanics over time is consistent with the analysis earlier suggesting that games are becoming more complex with&amp;nbsp;time.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Increase in number of mechanics over time" src="https://dvatvani.github.io/static/BGG-analysis/mechanics_inflation.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;h3&gt;Themes&lt;/h3&gt;
&lt;p&gt;The &lt;span class="caps"&gt;BGG&lt;/span&gt; classification taxonomy for games is a little odd. At the highest level, they contain a game &lt;code&gt;Type&lt;/code&gt; classification (e.g. &lt;code&gt;Strategy&lt;/code&gt;, &lt;code&gt;Thematic&lt;/code&gt;, &lt;code&gt;Wargame&lt;/code&gt;, &lt;code&gt;Party&lt;/code&gt;, etc.). There aren&amp;#8217;t very many classification types, and more than 75% of games on &lt;span class="caps"&gt;BGG&lt;/span&gt; don&amp;#8217;t even contain a &lt;code&gt;Type&lt;/code&gt; classification, so I won&amp;#8217;t be analysing it in this post, but if you&amp;#8217;re interested in looking at the most prevalent game types, etc., they can be found in the raw analysis Jupyter notebook file. The next level in the taxonomy is &lt;code&gt;Category&lt;/code&gt;. A cursory look at the values in this field will show that it&amp;#8217;s a disorganised mix of themes, game &amp;#8220;types&amp;#8221; and mechanics (as an example, &lt;code&gt;Codenames&lt;/code&gt; contains the Categories: &lt;code&gt;Party Game&lt;/code&gt;, &lt;code&gt;Card Game&lt;/code&gt;, &lt;code&gt;Word Game&lt;/code&gt;, &lt;code&gt;Deduction&lt;/code&gt;, &lt;code&gt;Spies/Secret Agents&lt;/code&gt;). I&amp;#8217;ve manually filtered the list of tags in the &lt;code&gt;Category&lt;/code&gt; classification level down to only include themes, since they were the elements that appeared most consistently under the &lt;code&gt;Category&lt;/code&gt; classification. &lt;span class="caps"&gt;BGG&lt;/span&gt; also contains further levels of classification taxonomy, but I&amp;#8217;ll only be looking into themes derived from tags in the &lt;code&gt;Category&lt;/code&gt; field for this&amp;nbsp;analysis.&lt;/p&gt;
&lt;p&gt;The most popular themes in &lt;span class="caps"&gt;BGG&lt;/span&gt;&amp;nbsp;are:&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Most popular themes" src="https://dvatvani.github.io/static/BGG-analysis/Most_popular_themes.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Similar to the mechanics, we can also see which themes have become more/less popular over the last couple of&amp;nbsp;decades.&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;img alt="Comparison of themes prevalence in the 1990s and 2010s" src="https://dvatvani.github.io/static/BGG-analysis/Themes_in_90s_and_10s.png" /&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;Looking at the most popular themes, there is a big shift occurring in the dominant themes present in games. In fact, 8 out of the top 10 most popular themes are either in the top 5 most rapidly rising or top 5 most rapidly declining lists, suggesting a strong and clear shift in the themes that engage the current generation of board game players. The themes that are on the rise include &lt;code&gt;Fantasy&lt;/code&gt;, &lt;code&gt;Science fiction&lt;/code&gt; and &lt;code&gt;Fighting&lt;/code&gt;, whereas &lt;code&gt;Trivia&lt;/code&gt;, &lt;code&gt;Movies / TV / Radio theme&lt;/code&gt;, &lt;code&gt;Sports&lt;/code&gt;, &lt;code&gt;Racing&lt;/code&gt; and &lt;code&gt;Economic&lt;/code&gt; are all on the decline. This seems to suggest that the themes that capture our imagination today are slightly less grounded in reality (at least compared to 20 years&amp;nbsp;ago).&lt;/p&gt;
&lt;p&gt;More analysis on the &lt;span class="caps"&gt;BGG&lt;/span&gt; dataset can be found in &lt;code&gt;part II&lt;/code&gt; (Coming soon) of this series of analysis posts, in which we explore the relationship between mechanics, themes and game&amp;nbsp;ratings.&lt;/p&gt;
&lt;p&gt;I&amp;#8217;ve left a lot of smaller bits of analysis that didn&amp;#8217;t fit into the structure of this write-up, but if you&amp;#8217;re interested in learning more about the board game dataset, you can find all of my analysis, including the code &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/dvatvani.github.io/blob/master/static/BGG-analysis/BGG_analysis.ipynb"&gt;here&lt;/a&gt; in the form of a Jupyter Notebook&amp;nbsp;(Python).&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Thanks to&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://twitter.com/ColmSeeley"&gt;Colm Seeley&lt;/a&gt; for introducing me to the world of modern board games, countless discussions and ideas on interesting things to do with the dataset, helping me structure the analysis and for providing some manually collected data. I look forward to co-presenting this analysis with him in the&amp;nbsp;future.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Catherine Maddox for great feedback on the writing and presentation of the&amp;nbsp;post.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Yihui Fan and Hugh Thompson for helpful feedback on the clarity and aesthetics of the&amp;nbsp;graphs.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;GitHub user &lt;code&gt;ThaWeatherman&lt;/code&gt; for creating the &lt;a href="https://github.com/ThaWeatherman"&gt;&lt;span class="caps"&gt;BGG&lt;/span&gt; scraper&lt;/a&gt; that I modified for this&amp;nbsp;analysis.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;If you enjoyed reading this, you may also&amp;nbsp;enjoy:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://www.kaggle.com/mrpantherson/board-game-data#"&gt;&lt;em&gt;Boardgamegeek dataset on Kaggle&lt;/em&gt;&lt;/a&gt;, with multiple users&amp;#8217; analysis on the&amp;nbsp;dataset&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://opinionatedgamers.com/2015/08/17/are-boardgames-getting-better-an-empirical-analysis/"&gt;&lt;em&gt;Are Boardgames Getting Better? An Empirical Analysis&lt;/em&gt;&lt;/a&gt; by &lt;a href="https://opinionatedgamers.com/"&gt;&lt;em&gt;Opinionated&amp;nbsp;Gamers&lt;/em&gt;&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://boardgamegeek.com/blogpost/11991/numbers-bgg-rank-data-analysis"&gt;&lt;em&gt;By the Numbers - &lt;span class="caps"&gt;BGG&lt;/span&gt; Rank Data + Analysis&lt;/em&gt;&lt;/a&gt; by Oliver&amp;nbsp;Kiley&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;</summary><category term="Board games"></category></entry><entry><title>TV show episode ratings</title><link href="https://dvatvani.github.io/tv-show-episode-ratings.html" rel="alternate"></link><updated>2016-05-28T22:00:00+01:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2016-05-28:tv-show-episode-ratings.html</id><summary type="html">&lt;p&gt;This post is about a simple visualisation of the episode ratings of &lt;span class="caps"&gt;TV&lt;/span&gt; shows. The idea behind this is heavily borrowed from &lt;a href="http://graphtv.kevinformatics.com/"&gt;Graph &lt;span class="caps"&gt;TV&lt;/span&gt;&lt;/a&gt;. I use that site often and really like it, but the plots it generates are based on &lt;a href="http://www.imdb.com/"&gt;IMDb&lt;/a&gt; rating data. I&amp;#8217;ve always wanted something similar but using &lt;a href="https://trakt.tv/"&gt;Trakt.tv&lt;/a&gt; rating data instead, so I decided to write a script to do just&amp;nbsp;that. &lt;/p&gt;
&lt;p&gt;Below are the episode ratings for the top 10 most popular shows, according to Trakt.tv. The plots are interactive. You can hover over a point to get more information on the episode or pan/zoom on the data using the tools on the bottom left of each&amp;nbsp;plot. &lt;/p&gt;
&lt;p&gt;I will likely create a small web app to make it easier to generate the plots online for any tv show at some stage in the future, but if anyone is interested in generating similar plots for other shows now, the Python code to generate the plots is available &lt;a href="https://github.com/dvatvani/trakt-ratings-trends"&gt;here on GitHub&lt;/a&gt;. A Jupyter notebook with the code can also be found &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/trakt-ratings-trends/blob/master/trakt-ratings-trends.ipynb"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;style&gt;

table
{
  border-collapse: collapse;
  width: 300px;
}
th
{
  color: #ffffff;
  background-color: #000000;
}
td
{
  background-color: #cccccc;
}
table, th, td
{
  font-family:Arial, Helvetica, sans-serif;
  border: 1px solid black;
  text-align: right;
}

&lt;/style&gt;

&lt;div id="fig_el43921586921842571185746"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/game_of_thrones.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43921951669525390596189"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/breaking_bad.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43921573256326407222326"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/the_big_bang_theory.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43922826681681423509412"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/the_walking_dead.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43921932024407192895674"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/dexter.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43921916642569335910854"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/how_i_met_your_mother.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43922452610884251329649"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/sherlock.js"&gt;&lt;/script&gt;

&lt;div id="fig_el439222331763277133446"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/arrow.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43921909720324799833593"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/homeland.js"&gt;&lt;/script&gt;

&lt;div id="fig_el43921571023769423801765"&gt;&lt;/div&gt;

&lt;script type="text/javascript" src="https://dvatvani.github.io/static/tv-show-episode-ratings/friends.js"&gt;&lt;/script&gt;</summary><category term="Python"></category><category term="TV"></category></entry><entry><title>A Song of Ice and Fire : Chapter ratings</title><link href="https://dvatvani.github.io/ASOIAF-Chapter-ratings.html" rel="alternate"></link><updated>2016-04-10T23:12:00+01:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2016-04-10:ASOIAF-Chapter-ratings.html</id><summary type="html">&lt;p&gt;This post relates to Game of Thrones, or more specifically to the series of books the show is based on: A Song of Ice and&amp;nbsp;Fire. &lt;/p&gt;
&lt;p&gt;The website &lt;a href="http://towerofthehand.com/"&gt;Tower of the Hand&lt;/a&gt; contains ratings for each chapter in the series of books. Chapters&amp;#8217; ratings are generated by users. Each chapter has ratings from typically around 150 people so there will still be a reasonable amount of uncertainty around each chapter rating, but there is still enough information in here to give us broad ideas about the overall progression in how interesting the books are, the most interesting books and the most interesting &lt;span class="caps"&gt;POV&lt;/span&gt;&amp;nbsp;characters.&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s start by having a quick overview of the progression of chapter ratings across the entire&amp;nbsp;series.&lt;/p&gt;
&lt;h1&gt;Chapter ratings in the entire&amp;nbsp;series&lt;/h1&gt;
&lt;p&gt;&lt;img alt="Chapter ratings in all books" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/ASOIAF_-_all_books_-_chapter_ratings.png" /&gt;&lt;/p&gt;
&lt;p&gt;This overview suggests&amp;nbsp;that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A Game of Thrones is fairly consistent in its chapter&amp;nbsp;ratings&lt;/li&gt;
&lt;li&gt;The final quarter of A Clash of Kings is comparatively&amp;nbsp;dull&lt;/li&gt;
&lt;li&gt;A Storm of Swords gets better as the book&amp;nbsp;progresses&lt;/li&gt;
&lt;li&gt;A Feast for Crows is not as good as the other books, but it gets better as the book&amp;nbsp;progresses&lt;/li&gt;
&lt;li&gt;A Dance with Dragons is the most inconsistent in terms of chapter&amp;nbsp;ratings&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Having read the books, I&amp;#8217;m inclined to agree with the overview provided by the ratings so&amp;nbsp;far.&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s break the chapters down by book to have a slightly closer look at the&amp;nbsp;ratings. &lt;/p&gt;
&lt;h1&gt;Chapter ratings by&amp;nbsp;book&lt;/h1&gt;
&lt;p&gt;&lt;img alt="Chapter ratings in AGOT" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/ASOIAF_-_AGOT_-_chapter_ratings.png" /&gt;
&lt;img alt="Chapter ratings in ACOK" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/ASOIAF_-_ACOK_-_chapter_ratings.png" /&gt;
&lt;img alt="Chapter ratings in ASOS" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/ASOIAF_-_ASOS_-_chapter_ratings.png" /&gt;
&lt;img alt="Chapter ratings in AFFC" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/ASOIAF_-_AFFC_-_chapter_ratings.png" /&gt;
&lt;img alt="Chapter ratings in ADWD" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/ASOIAF_-_ADWD_-_chapter_ratings.png" /&gt;&lt;/p&gt;
&lt;h1&gt;Chapter ratings by &lt;span class="caps"&gt;POV&lt;/span&gt;&amp;nbsp;character&lt;/h1&gt;
&lt;p&gt;It can be interesting to break down the chapter ratings by the point of view characters to see how the various plot lines progress in terms of maintaining reader&amp;nbsp;interest.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Chapter ratings by POV character" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/Chapter_ratings_by_POV_character_-_all_books.png" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Daenerys&lt;/strong&gt; : With the exception of one strong chapter, Daenerys&amp;#8217; chapters in the final book are not very&amp;nbsp;good.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ned&lt;/strong&gt; : As we all know, he was a short lived character, but his chapters were consistently&amp;nbsp;great&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Brienne&lt;/strong&gt; : Many people complain about Brienne&amp;#8217;s chapters in &lt;code&gt;AFFC&lt;/code&gt;. It&amp;#8217;s interesting to see that Brienne&amp;#8217;s chapters start out being dull, but appear to get more interesting as the book&amp;nbsp;progresses.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tyrion&lt;/strong&gt; : Goes from having several strong chapters in the previous books to having a weak showing in &lt;code&gt;ADWD&lt;/code&gt;. The drop in the quality of his storyline is particularly jarring considering the strength of his chapters at the end of &lt;code&gt;ASOS&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Theon/Reek&lt;/strong&gt; : One of the few consistently solid &lt;span class="caps"&gt;POV&lt;/span&gt; characters in &lt;code&gt;ADWD&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Chapter rating distributions by&amp;nbsp;book&lt;/h1&gt;
&lt;p&gt;&lt;img alt="Chapter ratings distribution by book - violin" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/Chapter_rating_distributions_by_book_-_violin.png" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Chapter ratings distribution by book - box" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/Chapter_rating_distributions_by_book_-_box.png" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;center&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th align="center"&gt;Book&lt;/th&gt;
&lt;th align="center"&gt;mean&lt;/th&gt;
&lt;th align="center"&gt;std. dev.&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;span class="caps"&gt;AGOT&lt;/span&gt;&lt;/td&gt;
&lt;td align="center"&gt;8.21&lt;/td&gt;
&lt;td align="center"&gt;0.56&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;span class="caps"&gt;ACOK&lt;/span&gt;&lt;/td&gt;
&lt;td align="center"&gt;7.75&lt;/td&gt;
&lt;td align="center"&gt;0.70&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;span class="caps"&gt;ASOS&lt;/span&gt;&lt;/td&gt;
&lt;td align="center"&gt;7.99&lt;/td&gt;
&lt;td align="center"&gt;0.63&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;span class="caps"&gt;AFFC&lt;/span&gt;&lt;/td&gt;
&lt;td align="center"&gt;7.55&lt;/td&gt;
&lt;td align="center"&gt;0.52&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td align="center"&gt;&lt;span class="caps"&gt;ADWD&lt;/span&gt;&lt;/td&gt;
&lt;td align="center"&gt;8.03&lt;/td&gt;
&lt;td align="center"&gt;0.69&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;/center&gt;&lt;/p&gt;
&lt;p&gt;If we rank the books by the average ratings of the chapters in each book, they rank in the order &lt;code&gt;AGOT&lt;/code&gt; &amp;gt; &lt;code&gt;ADWD&lt;/code&gt; ≈ &lt;code&gt;ASOS&lt;/code&gt; &amp;gt; &lt;code&gt;ACOK&lt;/code&gt; &amp;gt; &lt;code&gt;AFFC&lt;/code&gt;. 
The overall book ratings on &lt;a href="https://www.goodreads.com/series/43790-a-song-of-ice-and-fire"&gt;Goodreads&lt;/a&gt;, however, suggest that &lt;code&gt;ASOS&lt;/code&gt; &amp;gt; &lt;code&gt;AGOT&lt;/code&gt; &amp;gt; &lt;code&gt;ACOK&lt;/code&gt; &amp;gt; &lt;code&gt;ADWD&lt;/code&gt; &amp;gt; &lt;code&gt;AFFC&lt;/code&gt;. 
Personally, my views on the quality of the books are more aligned with the Goodreads ratings, but it&amp;#8217;s likely because the overall experience of a book is not well represented by the average of its&amp;nbsp;chapters.&lt;/p&gt;
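&lt;p&gt;The chapter-average ordering can be reproduced directly from the means in the table above. This is only a small sketch for checking the ranking; the variable names are illustrative and do not come from the original&amp;nbsp;notebook.&lt;/p&gt;

```python
# Mean chapter rating per book, copied from the table above
mean_rating = {"AGOT": 8.21, "ACOK": 7.75, "ASOS": 7.99, "AFFC": 7.55, "ADWD": 8.03}

# Sort the book codes from highest to lowest mean chapter rating
ranking = sorted(mean_rating, key=mean_rating.get, reverse=True)

print(" > ".join(ranking))
```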
&lt;h1&gt;Chapter rating distributions by &lt;span class="caps"&gt;POV&lt;/span&gt;&amp;nbsp;character&lt;/h1&gt;
&lt;p&gt;We can also have a look at the distributions of chapter ratings for each &lt;span class="caps"&gt;POV&lt;/span&gt; character to see which characters have the better&amp;nbsp;chapters.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Chapter ratings distribution by POV - violin" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/Chapter_rating_distributions_by_POV_character_-_violin.png" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="Chapter ratings distribution by POV - box" src="https://dvatvani.github.io/static/ASOIAF-Chapter-ratings/Chapter_rating_distributions_by_POV_character_-_box.png" /&gt;&lt;/p&gt;
&lt;p&gt;The distributions are ranked by average chapter rating, with the highest average on the left. The top few &lt;span class="caps"&gt;POV&lt;/span&gt; characters are all characters with a single &lt;span class="caps"&gt;POV&lt;/span&gt; chapter so far. Of the characters that have multiple &lt;span class="caps"&gt;POV&lt;/span&gt; chapters, Ned Stark has the most interesting chapters. This goes some way to explaining why he&amp;#8217;s such a fan favourite&amp;nbsp;character.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;If you have any ideas about what might be interesting to do with this dataset, let me know in the comments. The Jupyter notebook that was used to generate all the plots in this blog post can be found &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/dvatvani.github.io/blob/master/static/ASOIAF-Chapter-ratings/ASOIAF_chapter_ratings.ipynb"&gt;here&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</summary><category term="Python"></category><category term="Game of Thrones"></category></entry><entry><title>Solving the 8 Queens problem with python</title><link href="https://dvatvani.github.io/8-Queens.html" rel="alternate"></link><updated>2016-03-28T00:45:00+01:00</updated><author><name>Dinesh Vatvani</name></author><id>tag:dvatvani.github.io,2016-03-28:8-Queens.html</id><summary type="html">&lt;p&gt;This is my approach to solving the 8 Queens puzzle with&amp;nbsp;Python. &lt;/p&gt;
&lt;p&gt;For anyone unfamiliar with the 8 Queens puzzle, it is the problem of placing eight queens on a standard (8x8) chessboard such that no queen is in a position that can attack any other. This post will have the solutions to the puzzle, so if you&amp;#8217;d like to attempt to solve it on your own, now would be a good time to stop reading this&amp;nbsp;post.&lt;/p&gt;
&lt;p&gt;I was first made aware of this puzzle in a pub one evening with some friends. One of my friends started trying to solve the puzzle manually and found a solution in about 10 minutes. This inspired me to tackle the problem with Python to see if I could have found a solution faster. It took me around 15 minutes to solve the puzzle using Python, but I found all 92 solutions (there are 12 if you eliminate symmetrically related&amp;nbsp;solutions). &lt;/p&gt;
&lt;p&gt;The original code I wrote to solve the problem looked like&amp;nbsp;this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;itertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;permutations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;combinations&lt;/span&gt;

&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;How big is your chess board?&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_diagonal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;point1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;point2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;point1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;y1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;point1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;point2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;point2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;gradient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;gradient&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;gradient&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;list_of_permutations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;permutation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;permutations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;permutation&lt;/span&gt;
    &lt;span class="n"&gt;all_permutations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;list_of_permutations&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_permutations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;possible_solution&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;list_of_permutations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;solutions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;piece1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;piece2&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;combinations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;possible_solution&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;solutions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_diagonal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;piece1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;piece2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;solutions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;possible_solution&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;I&amp;#8217;ve since expanded it to make it easier to understand, abstracted some useful functions, and added code to remove equivalent solutions and help visualise the solutions, but the code above contains the main logic at the heart of the approach I took. The expanded version of the code can be found &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/dvatvani.github.io/blob/master/static/8-Queens/8_Queens_problem.ipynb"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let&amp;#8217;s break it down a little bit to explain what&amp;#8217;s&amp;nbsp;happening. &lt;/p&gt;
&lt;p&gt;We know that no two queens can attack each other. Since a queen attacks along rows and columns, no two queens can share a row or a column, so with 8 queens on an 8x8 board there must be exactly one queen per row and one queen per column. In this approach, we take 8 queens, assign them to rows 1-8 and determine which column each must be in for the puzzle criteria to be satisfied. Since there are 8 queens and 8 column positions, there are 8! = 40,320 (nPr with n=r=8) ways to arrange 8 queens on a chessboard such that there is one queen per row and one queen per column. Since we already know that none of the queens will be attacking any other horizontally or vertically, all we need to do is check each of the 40,320 arrangements to see if any queen is diagonally threatening any other, i.e. whether the line between any two queens has a gradient of 1 or -1. This takes about a second to run in total (1.06 seconds on my mid-range 5-year-old desktop computer) for all 40,320 possible queen arrangements and returns 92 solutions in which no queen attacks any&amp;nbsp;other.&lt;/p&gt;
&lt;p&gt;Some of these solutions are symmetrically related. For example, here are 8 solutions from the set of 92 that are related to each other through 90 or 180 degree rotations, or through mirror planes (i.e. they are horizontal, vertical or diagonal mirror images of each&amp;nbsp;other).&lt;/p&gt;
&lt;p&gt;&lt;img alt="symmetry_equivalent_solutions_example" src="/static/8-Queens/symmetry_equivalent_solutions.png" /&gt;&lt;/p&gt;
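&lt;p&gt;As a rough illustration of how such a family of related solutions arises, the sketch below generates every rotation and mirror image of a single solution. It assumes a solution is stored as a tuple of 1-based column numbers, one per row; the function names are illustrative and are not taken from the&amp;nbsp;notebook.&lt;/p&gt;

```python
def rotate90(solution):
    """Rotate a board by a quarter turn.

    solution[i] is the 1-based column of the queen in row i + 1.
    The square (row, col) moves to (col, n + 1 - row).
    """
    n = len(solution)
    rotated = [0] * n
    for row, col in enumerate(solution, start=1):
        rotated[col - 1] = n + 1 - row
    return tuple(rotated)


def mirror(solution):
    """Reflect a board left-to-right: column c becomes n + 1 - c."""
    n = len(solution)
    return tuple(n + 1 - col for col in solution)


def symmetric_variants(solution):
    """Return the set of boards reachable by rotations and reflections."""
    variants = set()
    current = tuple(solution)
    for _ in range(4):
        # Each of the four rotations, plus its mirror image,
        # covers all 8 symmetries of the square.
        variants.add(current)
        variants.add(mirror(current))
        current = rotate90(current)
    return variants


# One of the 92 solutions, written as the column of the queen in each row.
example = (1, 5, 8, 6, 3, 7, 2, 4)
print(len(symmetric_variants(example)))  # 8
```

&lt;p&gt;A solution with no symmetry of its own, like the one used here, yields 8 distinct boards; a solution that maps onto itself under some rotation would yield&amp;nbsp;fewer.&lt;/p&gt;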
&lt;p&gt;When we remove the solutions that are related, we are left with the 12 unique solutions for the 8x8 board case, shown&amp;nbsp;below:&lt;/p&gt;
&lt;p&gt;&lt;img alt="unique_solutions" src="/static/8-Queens/unique_solutions.png" /&gt;&lt;/p&gt;
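&lt;p&gt;The whole procedure described above (enumerate the 40,320 arrangements, keep those with no diagonal attack, then collapse the symmetry-related boards) can be sketched compactly as follows. This is a minimal sketch rather than the notebook&amp;#8217;s code: the helper names are hypothetical, and choosing the smallest variant as the representative of each family is just one convenient&amp;nbsp;assumption.&lt;/p&gt;

```python
from itertools import combinations, permutations


def has_diagonal_attack(columns):
    """True if any two queens share a diagonal.

    columns[i] is the 1-based column of the queen in row i + 1. Two queens
    share a diagonal when their row distance equals their column distance.
    """
    placed = list(enumerate(columns, start=1))
    return any(abs(r1 - r2) == abs(c1 - c2)
               for (r1, c1), (r2, c2) in combinations(placed, 2))


def canonical_form(columns):
    """Smallest of the 8 rotations/reflections, used as a representative."""
    n = len(columns)
    variants = []
    current = tuple(columns)
    for _ in range(4):
        variants.append(current)
        # Left-to-right mirror image of the current rotation.
        variants.append(tuple(n + 1 - col for col in current))
        # Quarter turn: the square (row, col) moves to (col, n + 1 - row).
        rotated = [0] * n
        for row, col in enumerate(current, start=1):
            rotated[col - 1] = n + 1 - row
        current = tuple(rotated)
    return min(variants)


n = 8
# One queen per row and per column: every permutation of the columns.
candidates = list(permutations(range(1, n + 1)))
solutions = [p for p in candidates if not has_diagonal_attack(p)]
unique = {canonical_form(s) for s in solutions}
print(len(candidates), len(solutions), len(unique))  # 40320 92 12
```

&lt;p&gt;Comparing row and column distances avoids the division used in the gradient check, but it tests the same&amp;nbsp;condition.&lt;/p&gt;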
&lt;p&gt;The Jupyter notebook containing the current version of the code is available &lt;a href="http://nbviewer.jupyter.org/github/dvatvani/dvatvani.github.io/blob/master/static/8-Queens/8_Queens_problem.ipynb"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to my&amp;nbsp;friends:&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Daniele Tomerini for introducing me to this&amp;nbsp;puzzle&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Hugh Thompson, whose attempts at solving this puzzle manually inspired me to tackle it using&amp;nbsp;Python&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;</summary><category term="Python"></category><category term="Puzzles"></category></entry></feed>