Advent Of Code and Algorithms
By Tyler
Authors note
I’ve had a hard time thinking of a topic to write on for this month. I knew I wanted something to do with Christmas, but couldn’t come up with something I found suitable. However at work this month I helped give two trainings where we solved one of the Advent of Code Puzzles for this year, and discussed the algorithmic complexity of two different solutions. Coming up on the end of the month, I decided that a discussion of the training, being tied to the advent of code, might be as much of a link to Christmas as I was going to get. I hope you all had a thoughtful and loving Christmas and have a happy New Year.
Introduction
Many years ago I came across Skiena’s Algorithm Design Manual. It was when I was first learning about computer science, and I found the topic enthralling. Skiena’s stories were intriguing and the methods taught were approachable. However when it came to the analysis of algorithms, I felt a little out of my element. The problem was that I really didn’t have enough experience to understand what the purpose of it all was. While I could grasp the ideas behind it, I failed to understand them.
Fast forward a few years, and I had the opportunity to provide a training at work. Through the course of this training we were able to learn and talk about these same concepts, such as algorithmic complexity and Big-O notation. The exercise was so exciting, and fun. Whats more is that now with experience under my belt, I felt I understood the concepts. My hope with this post is that at least a little of that excitment comes accross, and maybe some of the understanding too.
Problem Statement
For the training we decided that we would solve Day 6 of the 2021 Advent of Code. If you haven’t ever had the pleasure of learning about the Advent of Code, it is yearly fixture wherein every day leading up to Christmas a new programming challenge or puzzle is released. In this post I would like to walk down the same problem, with an eye to algorithmic analysis of the two solutions we came up with.
For a full treatment of the problem, I invite you to read the statement from the above link. Otherwise here is the wrap up. We are searching the sea floor in a yuletide submarine, searching for Santa’s sleigh keys. While there we notice a sizeable population of lanternfish and decide to model their population growth. Every 7 days a lanternfish will spawn a new fish, with newly spawned fish taking 9 days to spawn their first child fish, and then 7 days for all following children.
When developing and analyzing algorithms, the first thing we want to do is specify the task formally, describing both our inputs and our outputs. This will give us some common ideas to work with and reference in our solutions. Lets do that:
Given an array
Now that we have formally specified our task at hand we can see that we are dealing with two inputs:
Solution 1: Simulation
The code
The most straightforward solution would be to simulate the population growth by modeling each fish, and adding to the initial array as we go. This is the process that is described by the problem statement. Following the instructions is a surefire way to get the correct answer, so lets try it out. Below is some pseudo-code describing such a method:
|
|
That is fairly simple, with each line corresponding to a step in the above
defined process. We start the procedure by initializing a loop over each day. Then
for each day we loop over the existing values in the array. If a value in the
array is equal to 0, we change it to be 6, and push a new value of 8 to the
array. Otherwise we simple decrement the value in question by one. Once we
have gone through all the days, we return the resulting length of the array
How do we analyze algorithms?
The solution is simple and for now we will assume it is correct too. But how can we know how efficient it is before we do the work of programming it? This is where we need to start looking at a way to model our code in a theoretical computer. This will allow us to ignore strange details like the hardware being used, and the efficiency of languages used to implement our code. Using such a model would also allow us compare two or more procedures and make judgments on their relative efficiency.
The most common model when doing such an analysis is the Random-access Machine (RAM) model. With the RAM we assume we have a computer which can perform basic instructions (add, subtract, multiply, divide, assign variables, access variables, etc.) each in some constant (though not necessarily equal for each instruction) amount of time. Also the machine acts sequentially, processing each instruction in order.
Using this model, we can analyze pseudo-code of an algorithm and get an idea of
how fast it can execute, before even programming it into a real machine. We
simply count how many times each step will be executed, multiplying it by the
cost of that step. With algorithms that only use basic instructions we will assume
each line has some independent constant cost
Analyzing the simulation method
So let’s start counting the number of times each statement is executed, in our proposed solution.
|
|
Lines 1, and 8 are easy. Line 1 is executed A.length()
will grow as we
progress. Not only that but because the
Because of this we are going to make 3 simplifying assumptions:
- We are going to assume our initial array starts with all
. - We will not account for the initial “cooldown” for each new fish. In other
words, whenever a new
is added to it will have a value of 6, not 8. - Finally we assume that
is divisible by 7.
We can justify these assumptions, since both Assumptions 1 and 2 have the effect
of overestimating the time our procedure will take. This gives us an upper bound
on the worst case possible. Assumption 3 doesn’t affect the final outcome on a
worst case level, as it is trivial to pick an
Armed with those three assumptions, we can begin to estimate how many times we
will run lines 2-7. In order to get a feel for it lets begin looking at the first few
days individually. In our calculations we will let A
, and
Then on day 0:
We run the loop
And so on until
Following this pattern:
So then the total number of times line 2 is run would be:
Which we can rewrite as
This is a nasty sum, but it is solvable, reducing to
We have now calculated how many times line 2 is run. Using similar logic, we can
see that Line 3 is run
Lines 4, 5 are both run
We now have a good estimate on how long our algorithm will run, given any size
This is where Big-O notation
steps in. Even with all the assumptions we have made, we are getting more detail
than we really care for. What Big-O does is shows us how our algorithm grows in
the limit. As
In doing this we can say that our Simulation algorithm runs in
Solution 2: Counting with a dictionary
The Code
Because we know that all fish advance days at the same intervals (i.e every fish experiences a new day at the same rate), any two fish on the same cycle are indistinguishable. What we could do, rather than simulate each individual fish, is keep track of how many fish are on what day in their countdown towards spawning a new fish. We could use a dictionary that maps each day (0-8), with the number of fish on that day in their reproduction cycle. Each day, we advance the values at each key to the next key, with key 0 being added to day 6, and day 8, representing the spawning of new fish. See below for our pseudo-code for this process:
|
|
Analysis
Because we are not dealing with the exponential growth of our array, analyzing the pseudo-code is far simpler:
|
|
Lets look at Each Line:
- Line 1 has a cost of 0 because it is a comment and not executed
- Line 2 has a cost of 0 because it is a comment and not executed
- Line 3 starts off our procedure, by creating a new dictionary with the keys 0 through 8, with each key having a 0 value. Because the dictionary is fixed in size we assume a constant cost to this step.
- Line 4 starts a loop to seed our
fish_dict
with the counts of fish at each day in the reproduction cycle, this means it will loop over all items, plus 1 other time to close the loop. - Line 5 is run
times. - Line 6 is run
times. - Line 7 has a cost of 0 because it is a comment and not executed
- Line 8 is run
times - Line 9 is run
times. - Line 10 is run
times because it is run 9 times (once for every number 1-8, plus the time to close the loop), for every iteration of the outer loop. - Line 11 is run
times, once for every number 1-8, for every iteration of the outer loop. - Line 12 is run
times. - Line 13 is run
times.
Our resulting equation is:
Once again, we only care about the highest order polynomial, which in this case
is
Does it work?
All of this math and assumptions seems like an awful lot like armchair
programming. Does any of this pan out in reality? To answer that question I
implemented both versions of the pseduocode into a single python script (see
appendix). I then timed the execution for
So what?
There are a lot of things I enjoy about working with algorithms, and analyzing them. The math can be fun, and coming up with an efficient solution is rewarding. However there is a deeper insight I have learned from algorithm design, than merely getting computers to run faster. It is to “Work Smarter not Harder”. Christmas time and the New Year are natural times for us to pause and reflect on new goals and habits that we want to build into our lives. Sometimes the list of things we want to accomplish can feel overwhelming. Like our first solution, if we rely on our own sheer willpower we will quickly use up that limited resource. I know that in my own life, struggles that I have had were overcome, not by myself alone, but with the help of faith, family and friends. Christmas time is when we can recognize The Gift God has given us, with Christ showing us a more excellent way to live our lives in happiness. Rely on these resources and work smarter this coming year. Merry Christmas and Happy New Year!
Appendix
Summation
In order to solve:
we first remember that
Using the linearity of summations we can expand this to:
Which simplifies to:
Now that we have it simplified we can see that our main summation is a Geometric Series. Luckily some big brained mathematicians have figured out how to derive a formula from a geometric series, which means we can write this all as
Simulate.py
#!/usr/bin/env python3
"""Simulate the Population Growth of Lanternfish"""
import argparse
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"-m", "--method", choices=["simulate", "mapping"], default="simulate"
)
parser.add_argument("days", type=int)
args = parser.parse_args()
if args.method == "simulate":
simulate_lanternfish(args.days)
else:
mapping_lanternfish(args.days)
DATA = [
1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
1, 1, 4, 1, 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 4, 1, 5, 1, 3, 1, 1, 1, 1, 1, 5,
1, 1, 1, 1, 1, 5, 5, 2, 5, 1, 1, 2, 1, 1, 1, 1, 3, 4, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 5, 4, 1, 1, 1, 1, 1, 5, 1, 2, 4, 1, 1, 1, 1,
1, 3, 3, 2, 1, 1, 4, 1, 1, 5, 5, 1, 1, 1, 1, 1, 2, 5, 1, 4, 1, 1, 1, 1, 1,
1, 2, 1, 1, 5, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 4, 3, 1, 1, 3, 1, 3, 1, 4, 1, 5, 4, 1, 1, 2, 1, 1,
5, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 4, 1, 1, 1, 1, 1,
1, 1, 5, 4, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 4, 1, 1, 1, 2, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 2, 1, 2, 1, 1,
4, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 4, 1, 5, 1, 1, 1,
4, 5, 1, 1, 1, 1, 1, 1, 5, 1, 1, 5, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 5, 5, 3,
]
def advance_day(fish_dict):
repro = fish_dict[0]
for i in range(1, 9):
fish_dict[i - 1] = fish_dict[i]
fish_dict[6] += repro
fish_dict[8] = repro
def mapping_lanternfish(n: int):
"""Simulates the Growth of a lanternfish population
This was our second algorithm written because our original became too slow at even moderate
amount of days. It Takes the fish, makes an initial counter of all the fish located at each
day, and then iterates over the days, moving and adding to the counter to keep track of
the number of lantern fish.
Args:
n (int): Number of days to simulate
"""
fish = {
# Day: Count
0: 0,
1: 0,
2: 0,
3: 0,
4: 0,
5: 0,
6: 0,
7: 0,
8: 0,
}
# Inital Seeding of fish:
for f in DATA:
fish[f] += 1
for _ in range(0, n):
advance_day(fish)
print(sum(fish.values()))
class LanternFish:
def __init__(self, time_to_reproduce=8):
self._time_to_reproduce = time_to_reproduce
def __str__(self):
return str(self._time_to_reproduce)
def advance_age(self):
"""Advances age, and produces a New Fish if time has come"""
baby_fish = None
self._time_to_reproduce -= 1
# self._time_to_reproduce = self._time_to_reproduce - 1
if self._time_to_reproduce < 0:
baby_fish = self.reproduce()
self._time_to_reproduce = 6
return baby_fish
def reproduce(self):
return LanternFish()
def simulate_lanternfish(n: int):
"""Simulates the Growth of the lanternfish population
This is our initial 'naive' attempt at solving a simulation for the growth of the lanternfish
population. The algorithim is straightforward, as it starts with an initial group of lantern
fish and iterates over it for every day, adding fish to the arrray as required.
Args:
n (int): Number of days to simulate
"""
initial_fish = [LanternFish(x) for x in DATA]
for day in range(0, n):
new_fish = []
for fish in initial_fish:
baby = fish.advance_age()
if baby:
new_fish.append(baby)
initial_fish.extend(new_fish)
if day % 10 == 0:
print(f"Day {day}: Total {len(initial_fish)}")
print(f"final total {len(initial_fish)}")
if __name__ == "__main__":
main()