Reference: https://adventofcode.com/2020/day/4
Data Preparation
As with the previous days, I’ll start by getting my test and real input into a good format before even taking a pass at the problem…
real_run = False
file_name = "day4-input.txt" if real_run else "day4-test.txt"
# create a list from the file, removing any '\n' characters
data = [line.rstrip('\n') for line in open(file_name)]
# print data to check it's what we want it to be
print(data)
['ecl:gry pid:860033327 eyr:2020 hcl:#fffffd', 'byr:1937 iyr:2017 cid:147 hgt:183cm', '', 'iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884', 'hcl:#cfa07d byr:1929', '', 'hcl:#ae17e1 iyr:2013', 'eyr:2024', 'ecl:brn pid:760753108 byr:1931', 'hgt:179cm', '', 'hcl:#cfa07d eyr:2025 pid:166559648', 'iyr:2011 ecl:brn hgt:59in']
This time, each “passport” is spread across multiple lines, so there is more data prep to do. A dictionary is the best fit, because each passport is a series of keys with values.
Since passports are separated by a blank line, a blank line is our cue to add the current passport dictionary to the list (and then we shouldn’t forget the last one, which has no blank line after it…)
list_passports = []
passport_dict = {}
for line in data:
    if not line:
        list_passports.append(passport_dict)
        # clear out the passport each time, else it will "remember" the previous passport
        passport_dict = {}
        # make sure to continue here, otherwise the rest of the code will execute and throw errors
        # (comment definitely not from experience and 5 mins of bug hunting...)
        continue
    space_splits = line.split(' ')
    for item in space_splits:
        key, value = item.split(':')
        passport_dict[key] = value
# append the final passport dict
list_passports.append(passport_dict)
print(list_passports)
[{'ecl': 'gry', 'pid': '860033327', 'eyr': '2020', 'hcl': '#fffffd', 'byr': '1937', 'iyr': '2017', 'cid': '147', 'hgt': '183cm'}, {'iyr': '2013', 'ecl': 'amb', 'cid': '350', 'eyr': '2023', 'pid': '028048884', 'hcl': '#cfa07d', 'byr': '1929'}, {'hcl': '#ae17e1', 'iyr': '2013', 'eyr': '2024', 'ecl': 'brn', 'pid': '760753108', 'byr': '1931', 'hgt': '179cm'}, {'hcl': '#cfa07d', 'eyr': '2025', 'pid': '166559648', 'iyr': '2011', 'ecl': 'brn', 'hgt': '59in'}]
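A possible alternative, sketched here under the assumption that the whole file fits comfortably in memory: since passports are separated by blank lines, the raw text can be split on "\n\n" first and each chunk then split on any whitespace. The helper name parse_passports and the inline sample are illustrative only, not part of the solution above.

```python
def parse_passports(text):
    """Split raw puzzle text into one dict per passport (hypothetical helper)."""
    passports = []
    # a blank line separates passports, so split on double newlines
    for chunk in text.strip().split("\n\n"):
        # str.split() with no argument splits on spaces and newlines alike
        fields = chunk.split()
        passports.append(dict(field.split(":", 1) for field in fields))
    return passports


# first two passports from the test input, written inline for the example
sample = (
    "ecl:gry pid:860033327 eyr:2020 hcl:#fffffd\n"
    "byr:1937 iyr:2017 cid:147 hgt:183cm\n"
    "\n"
    "iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884\n"
    "hcl:#cfa07d byr:1929\n"
)

parsed = parse_passports(sample)
print(len(parsed))
print(parsed[1]["byr"])
```

Because the last passport is just the final chunk, there is no separate "append the final passport" step to forget.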
Part One
We need to find out which passports are valid and which are not. There are 8 possible keys and we need all of them apart from cid. Since we have our passports represented as dictionaries, we can get the keys and check that the required keys are a subset of each dictionary’s keys. Our required keys won’t contain cid, as we don’t care whether it is there or not!
To check if one list contains another list, I like to use sets and the “issubset()” method, but there are more ways of doing this in this geeksforgeeks article: geeksforgeeks.org/python-check-if-one-list-is-subset-of-other/
Keys:
- byr (Birth Year)
- iyr (Issue Year)
- eyr (Expiration Year)
- hgt (Height)
- hcl (Hair Color)
- ecl (Eye Color)
- pid (Passport ID)
- cid (Country ID)
req_keys = set(["byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"])
valid_passports = 0
for passport in list_passports:
    passport_keys = set(passport.keys())
    print(passport_keys)
    if req_keys.issubset(passport_keys):
        valid_passports += 1
{'hgt', 'pid', 'ecl', 'cid', 'hcl', 'iyr', 'eyr', 'byr'}
{'cid', 'ecl', 'pid', 'hcl', 'iyr', 'eyr', 'byr'}
{'hgt', 'pid', 'ecl', 'hcl', 'iyr', 'eyr', 'byr'}
{'hgt', 'pid', 'ecl', 'hcl', 'iyr', 'eyr'}
valid_passports
2
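The same subset test can also be written with a comparison operator on the dictionary's keys view, with the count collapsed into a sum over a generator. This is only a sketch: the two sample passports below are trimmed-down stand-ins written inline, and the names REQUIRED and sample_passports are made up for the example.

```python
REQUIRED = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}

# illustrative data: the second passport is missing 'hgt'
sample_passports = [
    {"byr": "1937", "iyr": "2017", "eyr": "2020", "hgt": "183cm",
     "hcl": "#fffffd", "ecl": "gry", "pid": "860033327", "cid": "147"},
    {"byr": "1929", "iyr": "2013", "eyr": "2023",
     "hcl": "#cfa07d", "ecl": "amb", "pid": "028048884", "cid": "350"},
]

# dict.keys() returns a set-like view; keys >= REQUIRED is True when every
# required key is present, which is the same test as REQUIRED.issubset(keys)
valid_count = sum(passport.keys() >= REQUIRED for passport in sample_passports)
print(valid_count)
```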
Part Two
Now the regulations on the passports get a little tighter and we need to check the passports further. We also have more data now, since we are given a valid list and an invalid list, and that will teach me for not putting my data prep into a function!
So we’re going to give data prep another go and add the valid, invalid and the combination of the two as a file named mixed. I’ll put all the data prep in one function so I can be transparent about how many times I run through the code!
def data_prep(real_run=False, file_name=""):
    file_name = "day4-input.txt" if real_run else file_name
    # create a list from the file, removing any '\n' characters
    data = [line.rstrip('\n') for line in open(file_name)]
    list_passports = []
    passport_dict = {}
    for line in data:
        if not line:
            list_passports.append(passport_dict)
            # clear out the passport each time, else it will "remember" the previous passport
            passport_dict = {}
            # make sure to continue here, otherwise the rest of the code will execute and throw errors
            # (comment definitely not from experience and 5 mins of bug hunting...)
            continue
        space_splits = line.split(' ')
        for item in space_splits:
            key, value = item.split(':')
            passport_dict[key] = value
    # append the final passport dict
    list_passports.append(passport_dict)
    return list_passports
invalid_filename = "day4-invalid.txt"
invalid_passports = data_prep(file_name=invalid_filename)
valid_filename = "day4-valid.txt"
valid_passports = data_prep(file_name=valid_filename)
mixed_filename = "day4-mixed.txt"
mixed_passports = data_prep(file_name=mixed_filename)
# byr (Birth Year) - four digits; at least 1920 and at most 2002.
# iyr (Issue Year) - four digits; at least 2010 and at most 2020.
# eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
def year_check(year, key):
    # check year is numeric
    try:
        year_int = int(year)
    except ValueError:
        # not numeric
        return False
    # check year is made up of 4 digits
    if len(year) != 4:
        return False
    # if key is byr check between 1920 and 2002
    if key == "byr" and (1920 <= year_int <= 2002):
        return True
    # else if key iyr check between 2010 and 2020
    elif key == "iyr" and (2010 <= year_int <= 2020):
        return True
    # else if key eyr check between 2020 and 2030
    elif key == "eyr" and (2020 <= year_int <= 2030):
        return True
    return False
# Test year_check func does what we want...
print(year_check("2012", "byr")) # Expected: False
print(year_check("2012", "iyr")) # Expected: True
print(year_check("2012", "eyr")) # Expected: False
print(year_check("2022", "byr")) # Expected: False
print(year_check("2022", "iyr")) # Expected: False
print(year_check("2022", "eyr")) # Expected: True
print(year_check("7072", "byr")) # Expected: False
print(year_check("2022a", "iyr")) # Expected: False
print(year_check("202", "eyr")) # Expected: False
False
True
False
False
False
True
False
False
False
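Since the three year rules differ only in their bounds, one option is to keep the bounds in a dictionary and look them up by key. A minimal sketch: YEAR_RANGES and year_in_range are names made up for this example, and str.isdecimal is used in place of the try/except around int().

```python
# hypothetical lookup table: key -> (minimum, maximum), both inclusive
YEAR_RANGES = {
    "byr": (1920, 2002),
    "iyr": (2010, 2020),
    "eyr": (2020, 2030),
}


def year_in_range(year, key):
    # keys without an entry in the table have no year rule, so reject them
    if key not in YEAR_RANGES:
        return False
    # must be a string of exactly four decimal digits
    if not (isinstance(year, str) and len(year) == 4 and year.isdecimal()):
        return False
    low, high = YEAR_RANGES[key]
    return low <= int(year) <= high


print(year_in_range("2012", "iyr"))
print(year_in_range("2022a", "iyr"))
```

Adding or changing a rule then means editing one line of the table rather than another elif branch.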
#hgt (Height) - a number followed by either cm or in:
# - If cm, the number must be at least 150 and at most 193.
# - If in, the number must be at least 59 and at most 76.
def height_check(height):
    # split value and units
    value = height[:-2]
    unit = height[-2:]
    try:
        value_int = int(value)
    except ValueError:
        # not numeric
        return False
    if unit == "in" and (59 <= value_int <= 76):
        return True
    if unit == "cm" and (150 <= value_int <= 193):
        return True
    return False
print(height_check("152cm")) # Expected True
print(height_check("194cm")) # Expected False
print(height_check("152in")) # Expected False
print(height_check("60in")) # Expected True
print(height_check("58in")) # Expected False
print(height_check("ello")) # Expected False
True
False
False
True
False
False
import re
# hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f
def hair_check(hair_colour):
    # check starts with a '#'
    first_char = hair_colour[0]
    following_chars = hair_colour[1:]
    if first_char != '#':
        return False
    # using regex validate that the following characters are 0-9 or a-f and there are 6 of them
    reg = "^[0-9a-f]{6}$"
    match = re.search(reg, following_chars)
    if match:
        return True
    return False
print(hair_check("#abcdef")) # Expected: True
print(hair_check("#a0123f")) # Expected: True
print(hair_check("#ghijkl")) # Expected: False
print(hair_check("#ghi012")) # Expected: False
True
True
False
False
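As a possible simplification, the '#' can be folded into the pattern and re.fullmatch used, so that neither the ^ and $ anchors nor the slicing are needed. The function name hair_check_fullmatch and the constant HAIR_PATTERN are made up for this sketch.

```python
import re

# '#' followed by exactly six characters from 0-9 or a-f
HAIR_PATTERN = re.compile(r"#[0-9a-f]{6}")


def hair_check_fullmatch(hair_colour):
    # fullmatch only succeeds if the whole string matches the pattern
    return HAIR_PATTERN.fullmatch(hair_colour) is not None


print(hair_check_fullmatch("#a0123f"))
print(hair_check_fullmatch("a0123f"))
print(hair_check_fullmatch("#a0123f7"))
```

This version also returns False for an empty string, where indexing hair_colour[0] would raise an IndexError.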
# ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth
def eye_check(col):
    valid_cols = ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]
    if col in valid_cols:
        return True
    return False
print(eye_check("amb")) # Expected: True
print(eye_check("oth")) # Expected: True
print(eye_check("AMB")) # Expected: False
print(eye_check("amber")) # Expected: False
True
True
False
False
# pid (Passport ID) - a nine-digit number, including leading zeroes.
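def id_check(pid):
    # must be a string: missing values (e.g. NaN in a DataFrame) are invalid
    if not isinstance(pid, str):
        return False
    # exactly nine digits, leading zeroes allowed; unlike int(), str.isdigit()
    # rejects signs and underscores such as "+12345678" or "1_2_3_4_5"
    return len(pid) == 9 and pid.isdigit()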
def key_val_check(key, value):
    if key in ["byr", "iyr", "eyr"]:
        return year_check(value, key)
    if key == "hgt":
        return height_check(value)
    if key == "hcl":
        return hair_check(value)
    if key == "ecl":
        return eye_check(value)
    if key == "pid":
        return id_check(value)
    if key == "cid":
        # We don't care, we can return True here always
        return True
    return False
def check_passports(passports):
    valid_pports = []
    for pport in passports:
        pport_keys = set(pport.keys())
        if not req_keys.issubset(pport_keys):
            # Passport doesn't pass criteria from first part
            continue
        valid_pport = True
        for key in pport:
            if not key_val_check(key, pport[key]):
                valid_pport = False
                break
        if valid_pport:
            valid_pports.append(pport)
    return valid_pports
return_valid = check_passports(valid_passports) # expecting 4 valid passports
if len(return_valid) == 4:
    print("Returned as expected")
else:
    print(valid_passports)
    print("Returned unexpected:")
    print(return_valid)
Returned as expected
return_invalid = check_passports(invalid_passports) # expecting 0 valid passports
if len(return_invalid) == 0:
    print("Returned as expected")
else:
    print(invalid_passports)
    print("Returned unexpected:")
    print(return_invalid)
Returned as expected
return_mixed = check_passports(mixed_passports) # expecting 4 valid passports
if len(return_mixed) == 4:
    print("Returned as expected")
else:
    print(mixed_passports)
    print("Returned unexpected:")
    print(return_mixed)
Returned as expected
Repeat with the real data set and we should be all good!
I’m sure there is probably a more concise way of doing this but I’m fairly happy with it.
I think there should be a pandas way of doing this, reducing the rows using loc, and I think that’d be far more efficient too. Let’s try it! Make sure you have run ‘pip install pandas’ in your terminal to install pandas before importing it here.
import pandas as pd
df = pd.DataFrame(mixed_passports)
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
4 186cm 170 amb 2018 1972 1926 #18171d 100
5 012533040 170cm grn 2019 1967 1946 #602927 NaN
6 021572410 182cm brn 2012 2020 1992 dab227 277
7 3556412378 59cm zzz 2023 2038 2007 74454a NaN
# you can use map to apply a function to every value in a column, and put the result inside the square brackets to return only the valid rows
df = df[df['pid'].map(id_check)]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
5 012533040 170cm grn 2019 1967 1946 #602927 NaN
6 021572410 182cm brn 2012 2020 1992 dab227 277
# we can apply all the functions that take one parameter like this but not the year checks
# (unless we create new funcs for them 😉 )
df = df.loc[
    df['pid'].map(id_check)
    & df['hgt'].map(height_check)
    & df['ecl'].map(eye_check)
    & df['hcl'].map(hair_check)
]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
5 012533040 170cm grn 2019 1967 1946 #602927 NaN
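One way to create those single-parameter functions for the year checks might be functools.partial, which fixes the key argument ahead of time. The sketch below is plain Python so that it runs without pandas; simple_year_check is a cut-down stand-in for year_check written just for this example, and the resulting callables could be passed to a column's map in the same way as id_check.

```python
from functools import partial


def simple_year_check(year, key):
    # cut-down stand-in for year_check, just for this example
    ranges = {"byr": (1920, 2002), "iyr": (2010, 2020), "eyr": (2020, 2030)}
    if not (isinstance(year, str) and len(year) == 4 and year.isdecimal()):
        return False
    low, high = ranges[key]
    return low <= int(year) <= high


# each partial takes a single argument (the year), with key already filled in
iyr_check = partial(simple_year_check, key="iyr")
eyr_check = partial(simple_year_check, key="eyr")

# a few iyr values taken from the mixed data
issue_years = ["2012", "2014", "2023"]
print(list(map(iyr_check, issue_years)))
```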
By using apply and lambda we can narrow down the data using the functions we made earlier and pass in multiple values
ref: https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7
df = df[df.apply(lambda x: year_check(x['iyr'],'iyr') and year_check(x['eyr'],'eyr') and year_check(x['byr'],'byr'),axis=1)]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
To make the above code a little neater… we could create a check_all_years function which takes in the 3 different years and checks them all… Like this:
def check_all_years(iyr, eyr, byr):
    return year_check(iyr,'iyr') and year_check(eyr,'eyr') and year_check(byr,'byr')
df = df[df.apply(lambda x: check_all_years(x['iyr'], x['eyr'], x['byr']),axis=1)]
print(df)
pid hgt ecl iyr eyr byr hcl cid
0 087499704 74in grn 2012 2030 1980 #623a2f NaN
1 896056539 165cm blu 2014 2029 1989 #a97842 129
2 545766238 164cm hzl 2015 2022 2001 #888785 88
3 093154719 158cm blu 2010 2021 1944 #b6652a NaN
And there we have it…
I learnt a few new pandas tricks and enjoyed the complexity of this challenge. I didn’t find it particularly strenuous myself, but there was a lot of code to write! The hardest part was making sure we hit every part of the specification!