I successfully defended my thesis in the Fall of 2021, which completed my Master’s of Science in Computer Science at the University of Kentucky advised by Dr. Brent Harrison. During my undergraduate studies, I worked on an independent study project where I studied the benefits of using machine learning to procedurally generate platforms in an infinite runner game. In doing that project I found research to be very enjoyable and interesting which motivated me to continue similar work in graduate school.


The Problem

PCG is used in Minecraft to generate large worlds.

Procedural content generation (PCG) is a means for creating a large amount of content for a video game by random generation while following a defined set of rules. This is a popular technique in games with high replayability such as Minecraft, Borderlands, and most roguelike games such as The Binding of Isaac. The problem with PCG is that the content is pseudo random, meaning there is no telling what will be presented to the player and there is no guarantee that the player’s experience will be a good one. For example, imagine playing a dungeon crawler for the first time and has to fight a monster before presented with an opportunity to obtain a weapon and is killed as a result. This is a frustrating player experience and may cause the player to quit the game. Sure you could write rules in your dungeon generation that ensures a player finds a sword in the first room and enemies can’t spawn until later, but what if you really wanted to jump right into the action and have the player fight right away? We would need to control the player’s experienced game events on a room level, which is the focus of my research and was successful using a genetic algorithm.

Previous Work

Others researchers in the field of machine learning applied to PCG have studied similar ideas by involving a game designer in the process of level creation. Such methods are called mixed initiative tools. Previously, these tools have been used to create variants based on a designer’s level with varying complexity, asking for designer feedback and applying it over many iterations of a level, and generic level grammars that inform the generator of the rules of the level (ex: level must have an accessible key if there is a lock). While these approaches can utilize the game designer’s input to inform the level generation, they still fail to ensure a series of game events will occur in a particular order. One study has achieved the goal of using a game designer’s sequence of desired game events and ensure the player experiences them, however this was achieved in a linear sidescrolling platformer game in which the player had no freedom to experience the game events in any other order than in which they are presented. When the game environment has more than one axis on which the player can move, the problem becomes much more difficult if you don’t force the player into a hallway where the game events line up. It is much more difficult to control the game events a player experiences when they have the freedom to explore, however I found it to be an interesting problem that I was keen on solving.

Comparison between game events that can only be experienced linearly and game events that have no predetermined sequence.

Environment

To those familiar with the Unity Engine and the official tutorials, the game environment used for my experiment is surely recognizable. The game is a simple roguelike called “Scavenger” where the player must navigate across 8x8 grid levels reaching the exit while avoiding enemies and collecting food. The player is allowed to explore the level and break impassable walls to create new paths, however each step taken removes one point from the player’s food count and being attacked by an enemy removes 20. When the food counter reaches 0, the player starves to death. This game environment is great because it is a roguelike which traditionally heavily utilize PCG, is explorable with a clear objective, the actions are all deterministic, includes a few game events but not too many to overcomplicate this study, and is played in discrete time meaning time advances in integer values and only once the player has moved which is easier to work with when simulating gameplay programmatically.

Additionally the game is 2D which makes it easy to represent in a text which makes it straightforward to use as input for a genetic algorithm.

Testing the Environment

Game events are an abstract feature of a level. You can’t look at a level and instantly know what events the player will experience. So to test a level to obtain experienced game evets I needed a gameplay agent to test each level. I created an A* agent because this behavior best represents how a human player would play this particular game. The goal of the game is to stay alive and navigate the level efficiently for minimum food loss. A* agents navigate environments maximizing reward, so if we consider the food count as reward, the A* agent will complete levels with the maximum food remaining. As the agent navigates the level, events are recorded and upon completion those events are returned and associated with the level.

Genetic Algorithms

From the previous works I read, genetic algorithms showed to be a viable method for generating levels to contain specific criteria or features. I considered a game event to be a feature of a level, making this an attractive approach. A genetic algorithm consists of three parts: a fitness function, crossover, and mutation. These three processes are repeated until a number of individuals are generated to fill the next generation. For my experiment each generation contained 25 levels and ran for 25 generations. These numbers were selected because they allow for good variance in the levels in a single generation and is a sufficient amount of generations to reach an equilibrium in average fitness.

Fitness function

This describes how well a datum fits a defined desirable model and determines its likelihood to be used for reproduction. In this experiment, this is the how closely the designer’s defined sequence of events fits those experienced by an A* agent testing the individual.

Example of crossover between two Scavenger levels.

Crossover

After all individuals in the current generation have been scored by the fitness function, we create the next generation using a method called crossover. This is achieved by picking 2 parents at random with replacement, weighted by their fitness values, splitting each at an arbitrary point, and combining those split parts into 2 new individuals. The idea being that the parents that best fit the fitness function will be used more often to create a generation that fits the fitness function even better. Survival of the fittest.

Example of mutation where an enemy is removed and a new tile receives food.



Mutation

If we used the stopped at crossover, there is a possibility that no individuals in the previous generation fit the fitness function at all therefore all future generations would never fit either. To remedy this we subject each point in the data to mutation. In the context of this game environment, tiles in the level can be randomly changed to an empty floor tile or wall tile, or gain food or an enemy. What the tile mutates into is randomly selected, but weighted by what events the designer defined in their sequence of events relative to how often those events are being seen in the generated levels. After each generation, the probability of a tile being mutated is decreased until it is nearly 0 to prevent levels with high fitness from being mutated into a lower fitness.


Experiments

To prove the effectiveness of my program, I ran experiments to observe the average fitness of all levels in each generation over time. The average fitness for the 25 levels of the current generation were compared to the average fitness of a set of 1000 purely randomly generated levels (containing 0-4 enemies, 1-5 food pickups, and 5-9 walls). This gave me a baseline to ensure my genetic algorithm was fitting the series of events dictated by a designer better than a purely random generator. The initial 25 levels for the first generation were similarly generated, but not included in the set of 1000 levels.

I gave the genetic algorithm several different permutations of event sequences of varying length and complexity and recorded the average fitness to those sequences after each generation of levels.

Results and Discussion

The following graphs represent the average fitness over time for the generation of levels by the solid lines and represent the baseline by the horizontal dashed line. Each sequence is given a unique color. Each character of the event sequence represent a unique event (‘F’ = eat food, ‘E’ = attacked by enemy, ‘X’ = wall broken, ‘W’ = win). Each sequence must end with a ‘W’ indicating that the level is able to be completed successfully.

Single Event Sequences

The first experiment was a sanity check to ensure the algorithm could control the player to experience a single event before completing the level. The tested sequences were FW, and EW.

The graph clearly shows that the average fitness of the generated levels is clearly greater than that of the baseline, which is a good indicator that the program is out performing the purely random generator. Notice that the average fitness does not reach a value of exactly 1. This is fine because it is the average over 25 levels, all of which are subject to some amount of mutation which may cause some of the final level to not perfectly match the desired event sequences. However, in our final generations, the levels with the top 3 fitness values all returned the exact target sequence of events.

The difference in baselines between the two event sequences is curious as well, but easily explained. The “eat food” event should be experienced very frequently and the “attacked by enemy” should be experienced very infrequently because the agent testing the level is A* meaning it is maximizing reward (food), and minimizing cost (losing health to enemies). With this in mind, it is even more impressive that the levels generated are triggering the event to be attacked by the enemy EXACTLY once. If we wanted to have the player take damage to an enemy, the simple design choice is to fill the level with enemies, but that would likely trigger several damage events. The genetic algorithm here created levels that were carefully designed to calculate a single damage event even on an optimal path.

Double Event Sequences

The next experiment was the true test of the effectiveness of my system because it would evaluate its ability to ensure a sequence of events occurring in a particular order, even if it is merely two events. Again we see that both sequences approach an average generational fitness of 1, which is a clear indicator of the top 3 fitness values of the final generations exactly returning the desired sequence of events.

This is an interesting graph because the average fitness of the random set of levels to the FFW event sequence is very high. As a result, our system cannot do much better than the random levels, but it still does. We expect this is because, again, the food pickup is the most desirable by the agent, so it will generate those events more often.

Longer Event Sequences

This graph shows the limitation of our system. In a sequence of 3 events, the average generational fitness is at most 0.6, which is not nearly as good as it was in the previous experiments. However, keep in mind from our fitness function that even one mistake in the sequence can have a large negative impact on the fitness value. It should also be noted that the output levels for this test (3 highest) had fitness values in the 0.7 to 0.8 range, which is still quite good on an individual basis. Though not perfect, the player will still experience most of what the designer intended.

In a sequence of 4 events, the system is incapable of generating a level which produces the desired sequence.

As the desired event sequence grows in length, the more concentrated with items the level will be. Since our level is quite small (8x8) managing more than 2 events is reasonably difficult to achieve. We notice this especially in the sequences of length 4 because many of those levels are impossible to be completed in a reasonable amount of time by the agent, which is likely why the fitness values are so low.

Perhaps we would see better results if we extended the cutoff times when the agent is simulating, however, when originally making these experiments, we did not want to vary the cutoff times so that we had a fair comparison to the other tests.

Generated level for events FW

This level is interesting because there are enemies guarding each food. If the generator were taking the easy way out, then there would never be enemies when we want just the food event.

Generated level for events XFW

An interesting level design where the player must choose if it is worthwhile to dig out the food that is completely walled in, or take the longer route with no obstacles

Conclusion

We have seen that prior to our system, there had been no research attempts to implement a procedural level generator which can ensure a predefined sequence of game events is experienced by players in a freely-explorable, 2D environment. And I have shown that using agent simulation and a genetic algorithm is a successful method to do so.

Some limitations to the system are that it currently takes this system roughly 3 minutes to fully test a generation of 25 levels. Since we need about 10 generations for successful results, we are looking at about 30 minutes to generate a valid level. Arguably, a designer could hand design a level in that amount of time for this particular environment. This is somewhere where I could improve. 

I would also like to compare other methods to this one to see if perhaps an LSTM could produce better or more timely results.