
My very good friend and PhD student, David Thorpe, is doing very important research about UK-based autistic employees. Please participate if you fit the criteria!

As with the previous days, I will be starting with getting my test and real input into a good format before even taking a pass at the problem…
real_run = False
file_name = "day4-input.txt" if real_run else "day4-test.txt"
# create a list from the file, removing any '\n' characters
data = [line.rstrip('\n') for line in open(file_name)]
# print data to check it's what we want it to be
print(data)
['ecl:gry pid:860033327 eyr:2020 hcl:#fffffd', 'byr:1937 iyr:2017 cid:147 hgt:183cm', '', 'iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884', 'hcl:#cfa07d byr:1929', '', 'hcl:#ae17e1 iyr:2013', 'eyr:2024', 'ecl:brn pid:760753108 byr:1931', 'hgt:179cm', '', 'hcl:#cfa07d eyr:2025 pid:166559648', 'iyr:2011 ecl:brn hgt:59in']
This time, each “passport” is spread out across multiple lines, so we have more data prep to do. A dictionary is the best fit, as each passport is a series of keys with values.
Since passports are separated by a blank line, a blank line is our cue to add the current passport to the list (and we mustn’t forget to add the last one, which has no blank line after it…)
list_passports = []
passport_dict = {}
for line in data:
    if not line:
        list_passports.append(passport_dict)
        # clear out the passport each time, else it will "remember" the previous passport
        passport_dict = {}
        # make sure to continue here, otherwise the rest of the code will execute and throw errors
        # (comment definitely not from experience and 5 mins of bug hunting...)
        continue
    space_splits = line.split(' ')
    for item in space_splits:
        key, value = item.split(':')
        passport_dict[key] = value
# append the final passport dict
list_passports.append(passport_dict)
print(list_passports)
[{'ecl': 'gry', 'pid': '860033327', 'eyr': '2020', 'hcl': '#fffffd', 'byr': '1937', 'iyr': '2017', 'cid': '147', 'hgt': '183cm'}, {'iyr': '2013', 'ecl': 'amb', 'cid': '350', 'eyr': '2023', 'pid': '028048884', 'hcl': '#cfa07d', 'byr': '1929'}, {'hcl': '#ae17e1', 'iyr': '2013', 'eyr': '2024', 'ecl': 'brn', 'pid': '760753108', 'byr': '1931', 'hgt': '179cm'}, {'hcl': '#cfa07d', 'eyr': '2025', 'pid': '166559648', 'iyr': '2011', 'ecl': 'brn', 'hgt': '59in'}]
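As an alternative to looping line by line, here is a minimal sketch that splits on blank lines first and then on any whitespace. It assumes the whole file has been read into one string; `raw_text` and `parse_passports` are hypothetical names used only for this illustration, and the sample is just the first two test passports.

```python
# Hypothetical sample standing in for the contents of day4-test.txt
raw_text = """ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884
hcl:#cfa07d byr:1929
"""


def parse_passports(text):
    passports = []
    # a blank line ("\n\n") separates one passport from the next
    for block in text.strip().split("\n\n"):
        # str.split() with no arguments splits on spaces and newlines alike
        fields = dict(item.split(":") for item in block.split())
        passports.append(fields)
    return passports


passports = parse_passports(raw_text)
print(len(passports))
print(passports[1]["byr"])
```

Because the split happens on blank lines, there is no "final passport" to remember to append at the end.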
We need to find out which passports are valid and which ones are not. There are 8 possible keys and we need all of them, apart from CID. Since we have our passports represented as dictionaries, we can get the keys and check that the required keys are a subset of the keys in each dictionary. Our required keys won’t contain CID as we don’t care if it is there or not!
To check if one list contains another list, I like to use sets and the “issubset()” method, but there are more ways of doing this in this geeksforgeeks article: geeksforgeeks.org/python-check-if-one-list-is-subset-of-other/
Keys:
req_keys = set(["byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"])
valid_passports = 0
for passport in list_passports:
    passport_keys = set(passport.keys())
    print(passport_keys)
    if req_keys.issubset(passport_keys):
        valid_passports += 1
{'hgt', 'pid', 'ecl', 'cid', 'hcl', 'iyr', 'eyr', 'byr'}
{'cid', 'ecl', 'pid', 'hcl', 'iyr', 'eyr', 'byr'}
{'hgt', 'pid', 'ecl', 'hcl', 'iyr', 'eyr', 'byr'}
{'hgt', 'pid', 'ecl', 'hcl', 'iyr', 'eyr'}
valid_passports
2
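The same subset check can also be written without converting the keys to a set first. This is only a sketch: `REQUIRED` and `has_required_keys` are hypothetical names, and the two sample passports are made up from the first test passport.

```python
REQUIRED = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}


def has_required_keys(passport):
    # dict.keys() returns a set-like view, so it can be compared to a set directly;
    # ">=" here means "is a superset of"
    return passport.keys() >= REQUIRED


complete = {"byr": "1937", "iyr": "2017", "eyr": "2020", "hgt": "183cm",
            "hcl": "#fffffd", "ecl": "gry", "pid": "860033327", "cid": "147"}
# the same passport with the height removed
missing_hgt = {k: v for k, v in complete.items() if k != "hgt"}

print(has_required_keys(complete))
print(has_required_keys(missing_hgt))
```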
Now the regulations on the passports get a little tighter and we need to check the passports further. We also have more data now, given a valid list and an invalid list, and that will teach me for not putting my data prep into a function!
So we’re going to give data prep another go and add the valid, invalid and the combination of the two as a file named mixed. I’ll put all the data prep in one function so I can be transparent about how many times I run through the code!
def data_prep(real_run=False, file_name=""):
    file_name = "day4-input.txt" if real_run else file_name
    # create a list from the file, removing any '\n' characters
    data = [line.rstrip('\n') for line in open(file_name)]
    list_passports = []
    passport_dict = {}
    for line in data:
        if not line:
            list_passports.append(passport_dict)
            # clear out the passport each time, else it will "remember" the previous passport
            passport_dict = {}
            # make sure to continue here, otherwise the rest of the code will execute and throw errors
            # (comment definitely not from experience and 5 mins of bug hunting...)
            continue
        space_splits = line.split(' ')
        for item in space_splits:
            key, value = item.split(':')
            passport_dict[key] = value
    # append the final passport dict
    list_passports.append(passport_dict)
    return list_passports
invalid_filename = "day4-invalid.txt"
invalid_passports = data_prep(file_name=invalid_filename)
valid_filename = "day4-valid.txt"
valid_passports = data_prep(file_name=valid_filename)
mixed_filename = "day4-mixed.txt"
mixed_passports = data_prep(file_name=mixed_filename)
# byr (Birth Year) - four digits; at least 1920 and at most 2002.
# iyr (Issue Year) - four digits; at least 2010 and at most 2020.
# eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
def year_check(year, key):
    # check year is numeric
    try:
        year_int = int(year)
    except ValueError:
        # not numeric
        return False
    # check year is made up of 4 digits
    if len(year) != 4:
        return False
    # if key is byr check between 1920 and 2002
    if key == "byr" and (1920 <= year_int <= 2002):
        return True
    # else if key iyr check between 2010 and 2020
    elif key == "iyr" and (2010 <= year_int <= 2020):
        return True
    # else if key eyr check between 2020 and 2030
    elif key == "eyr" and (2020 <= year_int <= 2030):
        return True
    return False
# Test year_check func does what we want...
print(year_check("2012", "byr")) # Expected: False
print(year_check("2012", "iyr")) # Expected: True
print(year_check("2012", "eyr")) # Expected: False
print(year_check("2022", "byr")) # Expected: False
print(year_check("2022", "iyr")) # Expected: False
print(year_check("2022", "eyr")) # Expected: True
print(year_check("7072", "byr")) # Expected: False
print(year_check("2022a", "iyr")) # Expected: False
print(year_check("202", "eyr")) # Expected: False
False
True
False
False
False
True
False
False
False
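Since the three year rules differ only in their limits, one possible variation is to keep the limits in a dictionary and look them up by key. This is a sketch rather than a replacement for year_check; `YEAR_RANGES` and `year_in_range` are hypothetical names, and it assumes the key is always one of byr, iyr or eyr.

```python
# inclusive (lowest, highest) year allowed for each field
YEAR_RANGES = {"byr": (1920, 2002), "iyr": (2010, 2020), "eyr": (2020, 2030)}


def year_in_range(year, key):
    # require exactly four ASCII digits before converting to an int
    if len(year) != 4 or not (year.isascii() and year.isdigit()):
        return False
    low, high = YEAR_RANGES[key]
    return low <= int(year) <= high


print(year_in_range("2012", "iyr"))
print(year_in_range("2022a", "iyr"))
```

Adding another year field would then only need a new dictionary entry rather than another elif branch.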
#hgt (Height) - a number followed by either cm or in:
# - If cm, the number must be at least 150 and at most 193.
# - If in, the number must be at least 59 and at most 76.
def height_check(height):
    # split value and units
    value = height[:-2]
    unit = height[-2:]
    try:
        value_int = int(value)
    except ValueError:
        # not numeric
        return False
    if unit == "in" and (59 <= value_int <= 76):
        return True
    if unit == "cm" and (150 <= value_int <= 193):
        return True
    return False
print(height_check("152cm")) # Expected True
print(height_check("194cm")) # Expected False
print(height_check("152in")) # Expected False
print(height_check("60in")) # Expected True
print(height_check("58in")) # Expected False
print(height_check("ello")) # Expected False
True
False
False
True
False
False
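The value and unit can also be pulled apart with a regular expression instead of slicing. A minimal sketch, assuming the height is always digits followed by a unit; `HEIGHT_LIMITS` and `height_ok` are hypothetical names for this illustration.

```python
import re

# inclusive (lowest, highest) value allowed for each unit
HEIGHT_LIMITS = {"cm": (150, 193), "in": (59, 76)}


def height_ok(height):
    # fullmatch only succeeds if the whole string is digits followed by cm or in
    match = re.fullmatch(r"([0-9]+)(cm|in)", height)
    if match is None:
        return False
    low, high = HEIGHT_LIMITS[match.group(2)]
    return low <= int(match.group(1)) <= high


print(height_ok("152cm"))
print(height_ok("ello"))
```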
import re
# hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f
def hair_check(hair_colour):
    # check starts with a '#'
    first_char = hair_colour[0]
    following_chars = hair_colour[1:]
    if first_char != '#':
        return False
    # using regex validate that the following characters are 0-9 or a-f and there are 6 of them
    reg = "^[0-9a-f]{6}$"
    match = re.search(reg, following_chars)
    if match:
        return True
    return False
print(hair_check("#abcdef")) # Expected: True
print(hair_check("#a0123f")) # Expected: True
print(hair_check("#ghijkl")) # Expected: False
print(hair_check("#ghi012")) # Expected: False
True
True
False
False
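The leading '#' can be folded into the pattern as well, so the whole rule lives in one expression. This is a sketch under that assumption; `hair_ok` is a hypothetical name.

```python
import re


def hair_ok(hair_colour):
    # fullmatch anchors the pattern at both ends, so no ^ or $ is needed
    return re.fullmatch(r"#[0-9a-f]{6}", hair_colour) is not None


print(hair_ok("#a0123f"))
print(hair_ok("#ghi012"))
```

A side effect of this version is that an empty string simply returns False rather than failing on the first-character lookup.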
# ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth
def eye_check(col):
    valid_cols = ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]
    if col in valid_cols:
        return True
    return False
print(eye_check("amb")) # Expected: True
print(eye_check("oth")) # Expected: True
print(eye_check("AMB")) # Expected: False
print(eye_check("amber")) # Expected: False
True
True
False
False
# pid (Passport ID) - a nine-digit number, including leading zeroes.
def id_check(pid):
    try:
        int_id = int(pid)
    except ValueError:
        # not numeric
        return False
    if len(pid) == 9:
        return True
    return False
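One thing to be aware of with the int() approach is that int() also accepts a leading sign, so a value like "+12345678" converts without error and is nine characters long. A stricter sketch using a regular expression is shown here; `pid_ok` is a hypothetical name, not part of the solution above.

```python
import re


def pid_ok(pid):
    # exactly nine characters, each of them a digit 0-9 (leading zeroes allowed)
    return re.fullmatch(r"[0-9]{9}", pid) is not None


print(int("+12345678"))     # int() is happy with a leading sign
print(pid_ok("+12345678"))  # the regex version is not
print(pid_ok("087499704"))
```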
def key_val_check(key, value):
    if key in ["byr", "iyr", "eyr"]:
        return year_check(value, key)
    if key == "hgt":
        return height_check(value)
    if key == "hcl":
        return hair_check(value)
    if key == "ecl":
        return eye_check(value)
    if key == "pid":
        return id_check(value)
    if key == "cid":
        # We don't care, we can return True here always
        return True
    return False
def check_passports(passports):
    valid_pports = []
    for pport in passports:
        pport_keys = set(pport.keys())
        if not req_keys.issubset(pport_keys):
            # Passport doesn't pass criteria from first part
            continue
        valid_pport = True
        for key in pport:
            if not key_val_check(key, pport[key]):
                valid_pport = False
                break
        if valid_pport:
            valid_pports.append(pport)
    return valid_pports
return_valid = check_passports(valid_passports) # expecting 4 valid passports
if(len(return_valid) == 4):
    print("Returned as expected")
else:
    print(valid_passports)
    print("Returned unexpected:")
    print(return_valid)
Returned as expected
return_invalid = check_passports(invalid_passports) # expecting 0 valid passports
if(len(return_invalid) == 0):
    print("Returned as expected")
else:
    print(invalid_passports)
    print("Returned unexpected:")
    print(return_invalid)
Returned as expected
return_mixed = check_passports(mixed_passports) # expecting 4 valid passports
if(len(return_mixed) == 4):
    print("Returned as expected")
else:
    print(mixed_passports)
    print("Returned unexpected:")
    print(return_mixed)
Returned as expected
I’m sure there is probably a more concise way of doing this but I’m fairly happy with it.
I think there should be a pandas way of doing this, reducing the rows using loc, which I suspect would be more efficient too. Let’s try it! Make sure you have run ‘pip install pandas’ in your terminal to install pandas before importing it here.
import pandas as pd
df = pd.DataFrame(mixed_passports)
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
4 186cm 170 amb 2018 1972 1926 #18171d 100
5 012533040 170cm grn 2019 1967 1946 #602927 NaN
6 021572410 182cm brn 2012 2020 1992 dab227 277
7 3556412378 59cm zzz 2023 2038 2007 74454a NaN
# you can use map to apply a function to every value in a column, and put the result inside the square brackets to return only the valid rows
df = df[df['pid'].map(id_check)]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
5 012533040 170cm grn 2019 1967 1946 #602927 NaN
6 021572410 182cm brn 2012 2020 1992 dab227 277
# we can apply all the functions that take one parameter like this but not the year checks
# (unless we create new funcs for them 😉 )
df = df.loc[df['pid'].map(id_check) &
df['hgt'].map(height_check) &
df['ecl'].map(eye_check) &
df['hcl'].map(hair_check)
]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
5 012533040 170cm grn 2019 1967 1946 #602927 NaN
By using apply and lambda we can narrow down the data using the functions we made earlier and pass in multiple values
ref: https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7
df = df[df.apply(lambda x: year_check(x['iyr'],'iyr') and year_check(x['eyr'],'eyr') and year_check(x['byr'],'byr'),axis=1)]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
To make the above code a little neater… we could create a check_all_years function which takes in the 3 different years and checks them all… Like this:
def check_all_years(iyr, eyr, byr):
    return year_check(iyr,'iyr') and year_check(eyr,'eyr') and year_check(byr,'byr')
df = df[df.apply(lambda x: check_all_years(x['iyr'], x['eyr'], x['byr']),axis=1)]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
I learnt a few new pandas tricks and enjoyed the complexity of this challenge. I didn’t find it particularly strenuous myself, but there was a lot of code to write! The hardest part was making sure we hit every part of the specification!
As with the previous days, I will be starting with getting my input into a good format before I’ve even taken a real pass at the problem…
real_run = False
file_name = "day3-input.txt" if real_run else "day3-test.txt"
# create a list from the file, removing any '\n' characters
data = [line.rstrip('\n') for line in open(file_name)]
# print data to check it's what we want it to be
print(data)
['..##.......', '#...#...#..', '.#....#..#.', '..#.#...#.#', '.#...##..#.', '..#.##.....', '.#.#.#....#', '.#........#', '#.##...#...', '#...##....#', '.#..#...#.#']
Each line of our data is a layer in a toboggan slope… It repeats infinitely out to either side; ‘#’ are trees and ‘.’ are open spaces.
We have a route given to us (down 1, right 3) and we want to return the number of trees we hit. If we start from (0,0), we will go down hitting (1,3) (2,6) (3,9) etc.
length = len(data)
for row in range(length):
    col = row * 3
    item = data[row][col]
    print(item)
.
.
#
.
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-2-cfdbe1fab67a> in <module>
4 col = row * 3
5
----> 6 item = data[row][col]
7
8 print(item)
IndexError: string index out of range
Since each row is shorter than 3x the number of rows, we get a string index out of range exception. Since we start at index 0 and we know the length of each row, we can use modulus to wrap the column back around to the start.
length = len(data)
row_length = len(data[0])
for row in range(length):
    col = row * 3 % row_length
    item = data[row][col]
    print(item)
.
.
#
.
#
#
.
#
#
#
#
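To see what the modulus is doing, it can help to print the column indexes on their own. A small sketch, assuming the test input's 11 rows of 11 characters each; `columns` is a hypothetical name.

```python
row_length = 11  # width of each row in the test input
number_of_rows = 11

# column visited on each row when moving right 3 per row, wrapping at the edge
columns = [(row * 3) % row_length for row in range(number_of_rows)]
print(columns)
# [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8]
```

After column 9 the next step would be column 12, which wraps around to column 1, and so on.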
Now we make sure to keep track of the count as it increases.
length = len(data)
row_length = len(data[0])
tree_count = 0
for row in range(length):
    col = (row * 3) % row_length
    item = data[row][col]
    if item == '#':
        tree_count += 1
print(tree_count)
7
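Now, we do the same for a group of slopes: right 1, down 1; right 3, down 1; right 5, down 1; right 7, down 1; and right 1, down 2.
So we can generalise and pass in a parameter for how far right we move on each row. We can address 4 of the 5 requirements by just making this one change!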
def traverse_path(path_data, right):
    length = len(path_data)
    row_length = len(path_data[0])
    tree_count = 0
    for row in range(length):
        col = (row * right) % row_length
        # index into the path_data parameter rather than the global data
        item = path_data[row][col]
        if item == '#':
            tree_count += 1
    return tree_count
# Test with the one we already know...
r3_d1 = traverse_path(data, 3)
print(r3_d1)
7
And to handle the “down” motion of right 1, down 2, we can multiply the row number by down and divide the number of rows by it, so we don’t go down beyond the last row.
import math
def traverse_path(path_data, right, down=1):
    # Use ceil as to make sure we get the final rows! (due to range taking us up to strictly less than the total)
    length = math.ceil(len(path_data) / down)
    row_length = len(path_data[0])
    tree_count = 0
    for row in range(length):
        col = (row * right) % row_length
        row_num = row * down
        # index into the path_data parameter rather than the global data
        item = path_data[row_num][col]
        if item == '#':
            tree_count += 1
    return tree_count
# Test with the one we already know...
r3_d1 = traverse_path(data, 3)
print(r3_d1)
# and do all the others:
r1_d1 = traverse_path(data, 1)
r5_d1 = traverse_path(data, 5)
r7_d1 = traverse_path(data, 7)
r1_d2 = traverse_path(data, 1, 2)
product = r3_d1 * r1_d1 * r5_d1 * r7_d1 * r1_d2
print(product)
7
336
But perhaps we can go one step further, reduce that repeated code and give the instructions as a list of dicts…
instructions = [
    {'right':1, 'down':1},
    {'right':3, 'down':1},
    {'right':5, 'down':1},
    {'right':7, 'down':1},
    {'right':1, 'down':2}
]
def prod_trav_paths(instrs, path_data):
    results = []
    for instr in instrs:
        res = traverse_path(path_data, instr['right'], instr['down'])
        results.append(res)
    return math.prod(results)
prod_trav_paths(instructions, data)
336
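Another way to handle the “down” step is to slice the grid so that only the rows we actually land on are visited. This is a sketch of that idea rather than the approach used above; `grid`, `slopes` and `count_trees` are hypothetical names, and the grid is the test input printed earlier.

```python
import math

grid = ['..##.......', '#...#...#..', '.#....#..#.', '..#.#...#.#',
        '.#...##..#.', '..#.##.....', '.#.#.#....#', '.#........#',
        '#.##...#...', '#...##....#', '.#..#...#.#']


def count_trees(path_data, right, down=1):
    # path_data[::down] keeps every down-th row, starting from the first
    visited_rows = path_data[::down]
    # True counts as 1 when summed, so this counts the '#' squares we land on
    return sum(row[(step * right) % len(row)] == '#'
               for step, row in enumerate(visited_rows))


slopes = [(1, 1), (3, 1), (5, 1), (7, 1), (1, 2)]
print(count_trees(grid, 3))
print(math.prod(count_trees(grid, right, down) for right, down in slopes))
```

Slicing removes the need for math.ceil, since the slice stops at the last available row on its own.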
Made utilising jupyter notebook
As with day 1, I will be starting with getting my input into a good format before I’ve even taken a real pass at the problem…
real_run = False
file_name = "day2-input.txt" if real_run else "day2-test.txt"
# create a list from the file, removing any '\n' characters
data = [line.rstrip('\n') for line in open(file_name)]
# print data to check it's what we want it to be
print(data)
['1-3 a: abcde', '1-3 b: cdefg', '2-9 c: ccccccccc']
Each line of our data is NUMBER1-NUMBER2 CHAR: STRING and is a valid password when there are between NUMBER1 and NUMBER2 (inclusive) instances of CHAR in STRING. We then need to return the number of acceptable passwords, so we’ll be keeping track of that.
First I want to make sure my line splitting function is up to scratch so I’ll be testing that out on just the first item in data and bearing in mind that NUMBER1 and NUMBER2 may be more than a single digit… re.split() is a great way to split on multiple characters at once and that is what we’ll be using. This post explains more: https://www.geeksforgeeks.org/python-split-multiple-characters-from-string/
import re
number_one, number_two, char, blank, string = re.split('-| |:', data[0])
print(number_one, number_two, char, blank, string)
1 3 a abcde
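If the empty “blank” item bothers you, one alternative is a pattern with named groups, which also fails loudly on a line that doesn’t fit the expected shape. A minimal sketch, assuming every line looks like the test input; `LINE_PATTERN` and `parse_line` are hypothetical names.

```python
import re

# low-high, a space, one character, a colon and space, then the password
LINE_PATTERN = re.compile(r"(?P<low>\d+)-(?P<high>\d+) (?P<char>\w): (?P<password>\w+)")


def parse_line(line):
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    # convert the two numbers to ints while we are here
    return int(match["low"]), int(match["high"]), match["char"], match["password"]


print(parse_line('1-3 a: abcde'))
```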
Now we have a line of code we are confident will split our string how we expect it to every time, we can use these values to verify whether or not it is a valid string. We can do this in a function too, returning true if the string is valid, and then we can use it in our actual loop!
A useful built-in string method for this is count, as seen here: https://www.tutorialspoint.com/python3/string_count.htm
def valid_string(to_validate):
    number_one, number_two, char, blank, string = re.split('-| |:', to_validate)
    # turn our numbers into int types, rather than strings, as they are now!
    num_one = int(number_one)
    num_two = int(number_two)
    char_count = string.count(char)
    if char_count >= num_one and char_count <= num_two:
        return True
    else:
        return False
# Test one that we know works
print(valid_string(data[0]))
# Test one we know that doesn't
# We'd add some print lines around our variables if we don't get the answer we expect!
print(valid_string(data[1]))
True
False
Taking this to the final point of this puzzle we can use the logic we’ve just created in a loop and since we created it as a function, we can just call valid_string for every line in the data!
valid_count = 0
for line in data:
    # Since python is extremely clever and True = 1 and False = 0,
    # we can add to valid count with the boolean response from the valid_string function
    valid_count += valid_string(line)
print(valid_count)
2
Sorted, some neat code and the answer we were hoping for.
With some list comprehension, we can represent the above cell, in just one line:
count = len([line for line in data if valid_string(line)])
print(count)
2
And then submit our solutions to Advent of Code and select our star!
This part means that little of our part one code is useful! It changes the verification system entirely and so we can use what we’ve learnt and the same splitting system, but we’ll have to be checking different parameters.
The lines now mean that exactly one (but not both) of the characters at positions NUMBER1 and NUMBER2 must be CHAR – but we need to remember that python uses a 0 index and this system starts at 1.
We’ll create a new function which implements this system and then test it.
def valid_string_pt2(to_validate):
    number_one, number_two, char, blank, string = re.split('-| |:', to_validate)
    # minus one from our ints so they now match python indexing
    num_one = int(number_one) - 1
    num_two = int(number_two) - 1
    # find out chars by accessing the string in the same way we would a list
    char_one = string[num_one]
    char_two = string[num_two]
    # an exclusive or is the same as this matching chars == 1, since if both matched, it would be 2
    matching_chars = int(char_one == char) + int(char_two == char)
    if matching_chars == 1:
        return True
    else:
        return False
# Test one that we know works
print(valid_string_pt2(data[0]))
# Test one we know that doesn't
print(valid_string_pt2(data[1]))
True
False
Since we got what we want, we can move on and put our list comprehension strategy to work
count_pt2 = len([line for line in data if valid_string_pt2(line)])
print(count_pt2)
1
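The exclusive or can also be written directly by comparing the two booleans with !=. A small sketch of just that step, taking already-parsed values; `exactly_one_match` is a hypothetical name.

```python
def exactly_one_match(password, char, pos_one, pos_two):
    # positions are 1-based in the puzzle, so subtract 1 for python indexing
    first = password[pos_one - 1] == char
    second = password[pos_two - 1] == char
    # two booleans are "not equal" only when exactly one of them is True
    return first != second


print(exactly_one_match("abcde", "a", 1, 3))
print(exactly_one_match("ccccccccc", "c", 2, 9))
```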
We got what we were hoping for so we can now run with real_run being true and hope for the best when we submit!
Before even looking at the problem I like to make sure that my data is prepared nicely. I paste the “test” input given on the description page into a file named day1-test.txt and the full puzzle input into day1-input.txt.
We then want to read in this data, and I like to start with it in a list, where each line of the file is an item in the list.
I like to start with a variable which sets us into what mode we want to run it – I will set real_run to False to start with, to test with and get to grips with the problem. I will set it to True when I am ready to run the code for real and produce my output
real_run = False
# Set file name based on if we are in real or test mode.
file_name = "day1-input.txt" if real_run else "day1-test.txt"
I like the following method, it is neat and all on one line. I don’t want to fuss about opening the file and doing any more than I have to here. It uses the file open function, rstrip method and list comprehension.
Resources:
# create a list from the file, removing any '\n' characters
data = [line.rstrip('\n') for line in open(file_name)]
# print data to check it's what we want it to be
print(data)
['1721', '979', '366', '299', '675', '1456']
Now we have the test input in a useable format we are able to start looking at solving the problem…
The problem can be summarised as:
# make a note of our given sum value to find
sum_value = 2020
# it is worth doing some more data prep now, since we know all the items in our list should be whole numbers - aka ints
data = [int(val) for val in data]
# sorting may also be useful in this case
data.sort()
print(data)
[299, 366, 675, 979, 1456, 1721]
In this method we are able to use our fixed value sum_value and operate for each value in our list – until we find our pair.
For each value, we can see if the difference between it and sum_value is contained in the list.
for val in data:
    diff = sum_value - val
    if diff in data:
        pair = (val, diff)
        break
result = pair[0] * pair[1]
print(f"Found the pair! {pair}. Result: {result}")
Found the pair! (299, 1721). Result: 514579
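Checking `diff in data` on a list scans the list each time, and it would also pair a value of 1010 with itself. One possible variation, sketched here with the unsorted test input, keeps a set of the values seen so far; `find_pair` and `seen` are hypothetical names.

```python
def find_pair(values, target):
    seen = set()
    for val in values:
        diff = target - val
        # only values from earlier in the list are in seen, so val can't pair with itself
        if diff in seen:
            return (diff, val)
        seen.add(val)
    # no pair adds up to the target
    return None


pair = find_pair([1721, 979, 366, 299, 675, 1456], 2020)
print(pair, pair[0] * pair[1])
```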
Incredibly similar to Part One but instead of only two values we need to find three that add up to the sum_value.
# define inner loop which will return the set of three values, or None if there isn't one
def inner_loop(outer_val, new_list):
    for inner_val in new_list:
        diff = sum_value - outer_val - inner_val
        if diff in new_list:
            return (outer_val, inner_val, diff)
    return None
remaining_data = data.copy()
for val in data:
    # removing the values in the top level data as once we have parsed this
    # we'll have checked it with every combination of the list so may as well get rid
    remaining_data.remove(val)
    trips = inner_loop(val, remaining_data)
    if trips:
        break
result = trips[0] * trips[1] * trips[2]
print(f"Found the trips! {trips}. Result: {result}")
Found the trips! (366, 675, 979). Result: 241861950
And like that, we have our results to both parts. We swap our file into real_run mode and can get our final results.
I’m going to try and go one step further than the Advent Of Code 2020 Day 1 goes and try to create a function which takes in the set size that we want to sum to our given value.
This time we want to use sets, not lists and we can use the sum and combinations methods.
Useful methods:
import itertools
def find_combination(data_list, set_size, sum_val):
    # use the data_list parameter rather than the global data
    set_data = set(data_list)
    list_of_sets = [set(i) for i in itertools.combinations(set_data, set_size)]
    for current_set in list_of_sets:
        set_total = sum(current_set)
        if set_total == sum_val:
            return current_set
    # if combination has not been found by now... there isn't one
    return False
# Verify we get the same value as part one
find_combination(data, 2, 2020)
{299, 1721}
# Verify we get the same value as part two
find_combination(data, 3, 2020)
{366, 675, 979}
# We don't expect this to return anything sensible
find_combination(data, 4, 2020)
False
# Added up the first 4 values in the list
find_combination(data, 4, 2319)
{299, 366, 675, 979}
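Since itertools.combinations produces its results one at a time, the combinations don’t have to be collected into a list first. A sketch of that variation, using the unsorted test input; `first_combination` and `values` are hypothetical names, and it returns a tuple (or None) rather than a set.

```python
import itertools
import math


def first_combination(values, size, target):
    # combinations are generated lazily, so we stop as soon as one matches
    for combo in itertools.combinations(values, size):
        if sum(combo) == target:
            return combo
    # no combination of this size adds up to the target
    return None


values = [1721, 979, 366, 299, 675, 1456]
trio = first_combination(values, 3, 2020)
print(sorted(trio), math.prod(trio))
print(first_combination(values, 4, 2020))
```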
Made using Jupyter notebook
Hi!
Welcome to my brand new site. I love to solve problems and have found myself using python to solve a number of problems and puzzles recently. I really want to share my thought process and solutions – hopefully to help others solve these problems and understand some of the maths behind them.
I am a Software Engineer and I studied Discrete Mathematics at university so hopefully have a good grasp on programming and mathematical practices.
I hope we can build something cool and helpful here.
Elle 🙂