Gamer Genotypes (A Principal Components Analysis of User Ratings)

Introduction

So, after a brief hiatus due to work getting busy, I'm back and ready for some more analysis of board gaming data.  During this time, I was able to download individual user ratings for each game in the BoardGameGeek database.  This incredibly rich dataset allows us to do more sophisticated analysis that wasn't possible with just the games data.

For this post, we're going to run a principal components analysis (PCA) on individual users' ratings data.  Essentially, what PCA does is take a high-dimensional dataset and break it down into a few "principal components" that together explain most of the variation in the data.  A principal component is a vector that points in a particular direction in the data: a weighted combination of the original variables along which users differ the most.  As an example, if most BGG users either love wargames and hate party games, or hate wargames and love party games, then one of our principal components will point in the direction of high wargame ratings and low party game ratings.  Users who love wargames and hate party games will have a high value for that component, and users who hate wargames and love party games will have a low value.

By breaking ratings data down into principal components, we can learn about what the most common groups of preferences are among BGG users.
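To make the wargame vs. party game example concrete, here is a minimal sketch on made-up data (not the BGG dataset).  It assumes two hypothetical features per user, an average wargame rating and an average party game rating, driven in opposite directions by a hidden "taste" value, and finds the first principal component with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 users, two features per user.
# A hidden "taste" value pushes the two ratings in opposite directions.
taste = rng.normal(size=200)
wargame = 6.5 + 1.5 * taste + rng.normal(scale=0.3, size=200)
party = 6.5 - 1.5 * taste + rng.normal(scale=0.3, size=200)
X = np.column_stack([wargame, party])

# PCA as an eigendecomposition of the covariance matrix of centered data.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# eigh returns eigenvalues in ascending order, so the last column is PC1.
pc1 = eigvecs[:, -1]
if pc1[0] < 0:
    pc1 = -pc1  # the sign is arbitrary; orient PC1 toward wargames

# Each user's value for PC1 is the projection of their centered ratings.
scores = Xc @ pc1

print("PC1 direction (wargame, party):", pc1.round(2))
print("share of variance explained:", round(eigvals[-1] / eigvals.sum(), 3))
```

In this toy setup, PC1 has a positive weight on wargames and a negative weight on party games, so wargame fans get high scores and party game fans get low ones.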

Data and Methodology

The data comes from BGG's database on individual user ratings for each game.  All users and all games are used, including expansions, for a total of over 7 million ratings.  For each user, I compute the average rating they gave to games of different mechanics, categories, and types.  Specifically, I compute the average rating given to:

  • Each mechanic and category in BGG's official list of mechanics and categories
  • Playing time between 1-30 minutes, 31-60 minutes, 61-120 minutes, 121+ minutes
  • Game weight between 1-2 / 2-3 / 3-4 / 4-5
  • Max players=2, Max players=3-4, Max players=5+

Of course, not every user gave a rating to a game in each of these groups.  To deal with this, I seed each user with a single 5.5 rating for each of the above categories.  This pulls a category's average toward 5.5 when the user has rated only a few games in it.  For a user whose ratings in a category are above 5.5, it has the added benefit of pushing up the average as they rate more games in that category, but with decreasing marginal returns (going from 1 to 5 ratings in a category has a big effect, but going from 100 to 105 has a small effect).  This is also consistent with BGG's practice of computing their Bayesian average rating by seeding games with a number of 5.5 ratings in addition to actual user ratings.
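The seeding step can be sketched in a few lines.  This is an illustration only: the function name `seeded_average` and the example ratings are hypothetical, and the only detail taken from the text is the single 5.5 pseudo-rating per category.

```python
def seeded_average(ratings, seed=5.5):
    """Average of a user's ratings in one category plus one pseudo-rating of `seed`."""
    return (sum(ratings) + seed) / (len(ratings) + 1)


# A user with no ratings in a category simply gets the seed value.
print(seeded_average([]))

# A hypothetical user who rates every game in the category an 8:
for n in (1, 5, 100, 105):
    print(n, round(seeded_average([8.0] * n), 3))
```

Going from 1 to 5 ratings moves the seeded average much more than going from 100 to 105, which is the decreasing marginal effect described above.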

The final dataset contains the average ratings by game category for 151,000 users. Because the ratings categories are all in the same units, I did not rescale them before running the PCA.
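As a rough illustration of PCA without rescaling, the sketch below centers each column but does not divide by its standard deviation, then takes a singular value decomposition.  The post does not say which software was used, so the function `pca_unscaled` and the random stand-in matrix are assumptions for illustration, not the actual pipeline.

```python
import numpy as np


def pca_unscaled(X):
    """PCA on mean-centered, but not variance-scaled, data via SVD.

    Returns the components (one per row) and the share of variance
    each component explains, in descending order.
    """
    Xc = X - X.mean(axis=0)  # center each column; no division by std
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return Vt, ratio


# Stand-in data: 1000 hypothetical users by 6 hypothetical rating categories.
rng = np.random.default_rng(1)
spread = np.array([2.0, 1.5, 1.0, 0.5, 0.5, 0.5])
X = 5.5 + rng.normal(size=(1000, 6)) * spread

components, ratio = pca_unscaled(X)
print(ratio.round(3))
```

The `ratio` values are the quantities a scree plot displays: the share of total variance explained by each successive component.  Skipping the rescaling step means categories with more spread in their average ratings carry more weight in the components.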

Results

I will focus on the first four principal components.  The scree plot below shows that by the time we reach the fourth component, the additional explanatory power gained by additional components starts to become small and fairly constant.

[Figure: scree plot of the principal components]

The first four components will also have the most natural interpretation, as we shall see below.  For each principal component, I will show the top positive factor loadings (high ratings) and the top negative factor loadings (low ratings).
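Picking out the top positive and negative loadings of a component is a matter of sorting its weights.  The sketch below assumes a component is available as a vector of weights with a matching list of feature names; the helper `top_loadings` and the numbers in the example are made up for illustration.

```python
import numpy as np


def top_loadings(component, names, k=3):
    """Return the k largest positive and k most negative loadings as (name, value) pairs."""
    order = np.argsort(component)  # indices from most negative to most positive
    positive = [(names[i], float(component[i])) for i in order[::-1][:k] if component[i] > 0]
    negative = [(names[i], float(component[i])) for i in order[:k] if component[i] < 0]
    return positive, negative


# Made-up loadings, for illustration only.
names = [
    "Category: Wargame",
    "Category: Party Game",
    "Mechanic: Hex-and-Counter",
    "Game Weight: 1-2",
    "Category: Economic",
]
component = np.array([0.62, -0.35, 0.48, -0.50, 0.05])

positive, negative = top_loadings(component, names, k=2)
print("positive:", positive)
print("negative:", negative)
```

If a component has no negative weights at all, the negative list comes back empty.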

Principal Component 1:  The "General Gamer Gene"

#  | Top Positive Factor Loadings             | Top Negative Factor Loadings
1  | Game Weight: 3-4                         | None
2  | Playing Time: 121+ Minutes               |
3  | Category: Economic                       |
4  | Category: Civilization                   |
5  | Mechanic: Card Drafting                  |
6  | Mechanic: Simultaneous Action Selection  |
7  | Mechanic: Action Point Allowance System  |
8  | Mechanic: Area Control / Area Influence  |
9  | Mechanic: Worker Placement               |
10 | Category: Science Fiction                |

PC1 is interesting because it loads positively on all game categories, with an emphasis on longer, weightier games, broadly popular mechanics like card drafting and worker placement, and categories like "civilization" and "economic".  PC1 does not load negatively on any of the factors, indicating that the predominant driver of PC1 is enjoying a broad range of games and (importantly) having rated many diverse types of games.  For this reason, I call PC1 the "general gamer gene".

Principal Component 2: The "Thematic vs. Strategic Gene"

#  | Top Positive Factor Loadings      | Top Negative Factor Loadings
1  | Category: Fighting                | Category: Farming
2  | Mechanic: Variable Player Powers  | Category: Economic
3  | Category: Horror                  | Mechanic: Worker Placement
4  | Category: Adventure               | Category: Industry/Manufacturing
5  | Mechanic: Cooperative Play        | Category: City Building
6  | Category: Role Playing            | Mechanic: Area Enclosure

PC2 clearly delineates a preference for thematic vs. strategic games.  It loads positively on categories and mechanics that generally foster immersive, thematic experiences, and loads negatively on categories and mechanics that tend to have a weaker theme and are thus more focused on purely strategic interactions.  For this reason, I call PC2 the "thematic vs. strategic gene".

Principal Component 3: The "Wargamer Gene"

#  | Top Positive Factor Loadings            | Top Negative Factor Loadings
1  | Category: Wargame                       | Game Weight: 1-2
2  | Mechanic: Campaign / Battle Card Driven | Mechanic: Set Collection
3  | Mechanic: Hex-and-Counter               | Playing Time: 1-30 Minutes
4  | Category: World War I                   | Category: Card Game
5  | Category: World War II                  | Mechanic: Card Drafting
6  | Category: Political                     | Category: Party Game

PC3 is basically the wargamer gene.  As you can see, users with a high PC3 love wargames, and they tend to dislike short, casual games.

Principal Component 4: The "Social Gamer Gene"

#  | Top Positive Factor Loadings   | Top Negative Factor Loadings
1  | Category: Party Game           | Game Weight: 3-4
2  | Category: Deduction            | Playing Time: 121+ Minutes
3  | Category: Spies/Secret Agents  | Mechanic: Hand Management
4  | Mechanic: Voting               | Mechanic: Dice Rolling
5  | Mechanic: Memory               | Category: Economic

PC4 is the "social gamer gene".  These users prefer party games, and especially games where there is some kind of social deduction involved.  They appear to dislike games that take too long, or have mechanics that might be considered fiddly, like hand management and dice rolling.

Disclaimer: These are not gamer categories, but gamer genes

One thing I should mention is that these aren't meant to be buckets that we can place gamers into.  The point here isn't to put a user into the "wargamer" bucket and another user into the "social gamer" bucket.  Rather, these are gamer genes that we all share, in either positive or negative quantities.  So if I'm someone who likes social games and wargames, but I don't like thematic games, I'm going to have low values for PC2, and high values for PC3 and PC4.  If I additionally play and rate lots of games, I'll also have a high value of PC1.

My personal gamer genotype

An interesting exercise you can do with PCA is to calculate the value of each principal component for any given user, as long as you have their game ratings.  Since I have the ratings for all users, I can additionally rank users along each component.  Here are my personal values along each component, expressed as a percentile of all users:

User  | General Gamer | Thematic vs. Strategic | Wargamer | Social Gamer
ekung | 55            | 71                     | 52       | 45

Basically, what this says about me is that I'm a fairly average BGG user, except that I have a relatively stronger preference for thematic games than most users.  I already know why this is, as I've given high ratings to a number of cooperative, fantasy-themed games, like Pathfinder ACG, Lord of the Rings LCG, and Shadows Over Camelot.  So I think it's a fairly accurate depiction of who I am as a gamer.  I tend to think that I'm more of a social gamer than the 45th percentile, but in fairness this isn't reflected in my ratings, since I think the only social games I've rated are Dixit and The Resistance.
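Percentiles like these can be computed by projecting every user onto each component and counting how many users score lower.  The sketch below is one way to do it under stated assumptions: the function `component_percentiles` is hypothetical, the data is random, and the percentile is defined as the share of users with a strictly lower score, which may differ from the definition used for the numbers above.

```python
import numpy as np


def component_percentiles(X, components, user_index):
    """Percentile of one user's score on each component, relative to all users.

    X: (n_users, n_features) matrix of average ratings.
    components: (n_components, n_features), one component per row.
    """
    Xc = X - X.mean(axis=0)
    scores = Xc @ components.T  # (n_users, n_components)
    user_scores = scores[user_index]
    # Share of users whose score is strictly below this user's score.
    return 100.0 * (scores < user_scores).mean(axis=0)


# Random stand-in data: 500 hypothetical users, 8 hypothetical categories.
rng = np.random.default_rng(2)
X = 5.5 + rng.normal(size=(500, 8))
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
components = Vt[:4]  # keep the first four components

pct = component_percentiles(X, components, user_index=0)
print(pct.round(0))
```

Because the sign of a principal component is arbitrary, each component should be oriented (flipped if needed) before its percentiles are read as "more" or "less" of a gene.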

Conclusion

I was fairly surprised at how clearly some common stereotypes for gamer personalities emerged from the principal components analysis. It tells us that there's some truth to those stereotypes on average, even if every individual is different.  What do you think?  If you'd like your own gamer genotypes analyzed, drop a comment with your BGG username and I'll post it.

4 thoughts on “Gamer Genotypes (A Principal Components Analysis of User Ratings)”

    • r0t1prata
      PC1: 76
      PC2: 64
      PC3: 49
      PC4: 96

      You're definitely higher than average for Gene 2, but looks like you have especially high Gene 4!
