Day 4 – Advent of Code 2020

Reference: https://adventofcode.com/2020/day/4

Data Preparation

As with the previous days, I’ll start by getting my test and real input into a good format before even taking a pass at the problem…

real_run = False
file_name = "day4-input.txt" if real_run else "day4-test.txt"

# create a list from the file, removing any '\n' characters
# (the with block makes sure the file is closed afterwards)
with open(file_name) as f:
    data = [line.rstrip('\n') for line in f]

# print data to check it's what we want it to be
print(data)
['ecl:gry pid:860033327 eyr:2020 hcl:#fffffd', 'byr:1937 iyr:2017 cid:147 hgt:183cm', '', 'iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884', 'hcl:#cfa07d byr:1929', '', 'hcl:#ae17e1 iyr:2013', 'eyr:2024', 'ecl:brn pid:760753108 byr:1931', 'hgt:179cm', '', 'hcl:#cfa07d eyr:2025 pid:166559648', 'iyr:2011 ecl:brn hgt:59in']

This time, each “passport” is spread out across multiple lines, so we have more data prep to do. A dictionary is the best fit, as each passport is a series of keys with values.

Since passports are separated by a blank line, hitting a blank line is our cue to add the current passport dictionary to the list (and then we shouldn’t forget the last one…)

list_passports = []

passport_dict = {}

for line in data:
    if not line:
        list_passports.append(passport_dict)
        # clear out the passport each time, else it will "remember" the previous passport
        passport_dict = {}
        # make sure to continue here, otherwise the rest of the code will execute and throw errors
        # (comment definitely not from experience and 5 mins of bug hunting...)
        continue

    space_splits = line.split(' ')

    for item in space_splits:
        key, value = item.split(':')
        passport_dict[key] = value

# append the final passport dict
list_passports.append(passport_dict)

print(list_passports)
[{'ecl': 'gry', 'pid': '860033327', 'eyr': '2020', 'hcl': '#fffffd', 'byr': '1937', 'iyr': '2017', 'cid': '147', 'hgt': '183cm'}, {'iyr': '2013', 'ecl': 'amb', 'cid': '350', 'eyr': '2023', 'pid': '028048884', 'hcl': '#cfa07d', 'byr': '1929'}, {'hcl': '#ae17e1', 'iyr': '2013', 'eyr': '2024', 'ecl': 'brn', 'pid': '760753108', 'byr': '1931', 'hgt': '179cm'}, {'hcl': '#cfa07d', 'eyr': '2025', 'pid': '166559648', 'iyr': '2011', 'ecl': 'brn', 'hgt': '59in'}]
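
As an aside, the same structure can be built by splitting the whole text on blank lines first. This is only a sketch of an alternative, not the approach used above: `sample_text` and `parse_passports` are hypothetical names, and the sample is a shortened copy of the test input.

```python
# Hypothetical sample text standing in for the contents of day4-test.txt
sample_text = """ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884
hcl:#cfa07d byr:1929
"""


def parse_passports(text):
    """Split on blank lines, then on any whitespace, to build one dict per passport."""
    passports = []
    for block in text.strip().split("\n\n"):
        # str.split() with no argument splits on spaces and newlines alike
        fields = dict(item.split(":", 1) for item in block.split())
        passports.append(fields)
    return passports


parsed = parse_passports(sample_text)
print(len(parsed))       # 2
print(parsed[1]["byr"])  # 1929
```

Because each block is handled as a whole, there is no "last passport" to remember to append.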

Part One

We need to find out which passports are valid and which ones are not. There are 8 possible keys and we need all of them, apart from CID. Since we have our passports represented as dictionaries, we can get the keys and check that the required keys are a subset of the keys in each dictionary. Our required keys won’t contain CID as we don’t care if it is there or not!

To check if one list contains another list, I like to use sets and the “issubset()” method, but there are some more ways of doing this in this geeksforgeeks article: geeksforgeeks.org/python-check-if-one-list-is-subset-of-other/

Keys:

  • byr (Birth Year)
  • iyr (Issue Year)
  • eyr (Expiration Year)
  • hgt (Height)
  • hcl (Hair Color)
  • ecl (Eye Color)
  • pid (Passport ID)
  • cid (Country ID)

req_keys = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}
valid_passports = 0

for passport in list_passports:
    passport_keys = set(passport.keys())
    print(passport_keys)
    if req_keys.issubset(passport_keys):
        valid_passports += 1
{'hgt', 'pid', 'ecl', 'cid', 'hcl', 'iyr', 'eyr', 'byr'}
{'cid', 'ecl', 'pid', 'hcl', 'iyr', 'eyr', 'byr'}
{'hgt', 'pid', 'ecl', 'hcl', 'iyr', 'eyr', 'byr'}
{'hgt', 'pid', 'ecl', 'hcl', 'iyr', 'eyr'}
valid_passports
2
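
A shorter variant of the same subset check is possible because `dict.keys()` returns a set-like view. The sketch below assumes two hand-written passports (`sample_passports` is a hypothetical name) and also shows how a set difference can report which required keys are missing.

```python
req_keys = {"byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"}

# Two hypothetical passports: the second one is missing 'hgt'
sample_passports = [
    {"ecl": "gry", "pid": "860033327", "eyr": "2020", "hcl": "#fffffd",
     "byr": "1937", "iyr": "2017", "cid": "147", "hgt": "183cm"},
    {"iyr": "2013", "ecl": "amb", "cid": "350", "eyr": "2023",
     "pid": "028048884", "hcl": "#cfa07d", "byr": "1929"},
]

# <= is the operator form of issubset(); True counts as 1 when summed
valid_count = sum(req_keys <= p.keys() for p in sample_passports)
print(valid_count)  # 1

# difference() accepts any iterable, and iterating a dict yields its keys
missing = [req_keys.difference(p) for p in sample_passports]
print(missing)  # [set(), {'hgt'}]
```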

Part Two

Now the regulations on the passports get a little tighter and we need to check the passports further. We also have more test data now, a list of valid passports and a list of invalid ones, and that will teach me for not putting my data prep into a function!
So we’re going to give data prep another go and add the valid list, the invalid list and a combination of the two in a file named mixed. I’ll put all the data prep in one function so I can be transparent about how many times I run through the code!

def data_prep(real_run=False, file_name=""):
    file_name = "day4-input.txt" if real_run else file_name
    # create a list from the file, removing any '\n' characters
    # (the with block makes sure the file is closed afterwards)
    with open(file_name) as f:
        data = [line.rstrip('\n') for line in f]

    list_passports = []
    passport_dict = {}

    for line in data:
        if not line:
            list_passports.append(passport_dict)
            # clear out the passport each time, else it will "remember" the previous passport
            passport_dict = {}
            # make sure to continue here, otherwise the rest of the code will execute and throw errors
            # (comment definitely not from experience and 5 mins of bug hunting...)
            continue

        space_splits = line.split(' ')

        for item in space_splits:
            key, value = item.split(':')
            passport_dict[key] = value

    # append the final passport dict
    list_passports.append(passport_dict)
    return list_passports
invalid_filename = "day4-invalid.txt"
invalid_passports = data_prep(file_name=invalid_filename)
valid_filename = "day4-valid.txt"
valid_passports = data_prep(file_name=valid_filename)
mixed_filename = "day4-mixed.txt"
mixed_passports = data_prep(file_name=mixed_filename)
# byr (Birth Year) - four digits; at least 1920 and at most 2002.
# iyr (Issue Year) - four digits; at least 2010 and at most 2020.
# eyr (Expiration Year) - four digits; at least 2020 and at most 2030.

def year_check(year, key):
    # check year is numeric
    try:
        year_int = int(year)
    except ValueError:
        # not numeric
        return False

    # check year is made up of 4 digits
    if len(year) != 4:
        return False

    # if key is byr check between 1920 and 2002
    if key == "byr" and (1920 <= year_int <= 2002):
        return True
    # else if key iyr check between 2010 and 2020
    elif key == "iyr" and (2010 <= year_int <= 2020):
        return True
    # else if key eyr check between 2020 and 2030
    elif key == "eyr" and (2020 <= year_int <= 2030):
        return True

    return False
# Test year_check func does what we want...
print(year_check("2012", "byr")) # Expected: False
print(year_check("2012", "iyr")) # Expected: True
print(year_check("2012", "eyr")) # Expected: False

print(year_check("2022", "byr")) # Expected: False
print(year_check("2022", "iyr")) # Expected: False
print(year_check("2022", "eyr")) # Expected: True

print(year_check("7072", "byr")) # Expected: False
print(year_check("2022a", "iyr")) # Expected: False
print(year_check("202", "eyr")) # Expected: False
False
True
False
False
False
True
False
False
False
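
The three year rules differ only in their bounds, so they could also be driven by a lookup table. This is a sketch under that assumption; `YEAR_RANGES` and `year_check_table` are hypothetical names, and the bounds are the ones listed in the comments above.

```python
# Inclusive (min, max) range for each year field, taken from the puzzle rules
YEAR_RANGES = {
    "byr": (1920, 2002),
    "iyr": (2010, 2020),
    "eyr": (2020, 2030),
}


def year_check_table(year, key):
    """Hypothetical variant of year_check driven by a lookup table."""
    if key not in YEAR_RANGES:
        return False
    # the value must be exactly four ASCII digits
    if len(year) != 4 or not (year.isascii() and year.isdigit()):
        return False
    low, high = YEAR_RANGES[key]
    return low <= int(year) <= high


print(year_check_table("2012", "iyr"))   # True
print(year_check_table("2022a", "iyr"))  # False
```

Adding another year field would then only need a new entry in the table.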
# hgt (Height) - a number followed by either cm or in:
# - If cm, the number must be at least 150 and at most 193.
# - If in, the number must be at least 59 and at most 76.
def height_check(height):
    # split value and units
    value = height[:-2]
    unit = height[-2:]

    # the value must be plain digits; int() alone would also accept e.g. "+170"
    if not (value.isascii() and value.isdigit()):
        return False
    value_int = int(value)

    if unit == "in" and (59 <= value_int <= 76):
        return True

    if unit == "cm" and (150 <= value_int <= 193):
        return True

    return False
print(height_check("152cm")) # Expected True
print(height_check("194cm")) # Expected False
print(height_check("152in")) # Expected False
print(height_check("60in")) # Expected True
print(height_check("58in")) # Expected False
print(height_check("ello")) # Expected False
True
False
False
True
False
False
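
Another way to split the number from the unit is a regular expression that has to match the whole string. The following is a sketch only; `HEIGHT_PATTERN`, `HEIGHT_RANGES` and `height_check_regex` are hypothetical names, and the ranges come from the rules in the comments above.

```python
import re

# One or more digits followed by a unit; fullmatch() requires the whole string to fit
HEIGHT_PATTERN = re.compile(r"([0-9]+)(cm|in)")

# Inclusive (min, max) range per unit
HEIGHT_RANGES = {"cm": (150, 193), "in": (59, 76)}


def height_check_regex(height):
    """Hypothetical regex-based variant of height_check."""
    match = HEIGHT_PATTERN.fullmatch(height)
    if match is None:
        return False
    value, unit = int(match.group(1)), match.group(2)
    low, high = HEIGHT_RANGES[unit]
    return low <= value <= high


print(height_check_regex("152cm"))  # True
print(height_check_regex("58in"))   # False
print(height_check_regex("170"))    # False (no unit)
```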
import re
# hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f
def hair_check(hair_colour):
    # check starts with a '#'
    first_char = hair_colour[0]
    following_chars = hair_colour[1:]

    if first_char != '#':
        return False

    # using regex validate that the following characters are 0-9 or a-f and there are 6 of them
    reg = "^[0-9a-f]{6}$"
    match = re.search(reg, following_chars)

    if match:
        return True

    return False
print(hair_check("#abcdef")) # Expected: True
print(hair_check("#a0123f")) # Expected: True
print(hair_check("#ghijkl")) # Expected: False
print(hair_check("#ghi012")) # Expected: False
True
True
False
False
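
The '#' test and the six-character test could also be folded into a single pattern. This sketch uses `re.fullmatch`, which only succeeds when the entire string matches; `hair_check_fullmatch` is a hypothetical name. One side effect is that an empty string simply returns False instead of failing on the `[0]` lookup.

```python
import re


def hair_check_fullmatch(hair_colour):
    """Hypothetical one-pattern variant: '#' followed by exactly six of 0-9 or a-f."""
    return re.fullmatch(r"#[0-9a-f]{6}", hair_colour) is not None


print(hair_check_fullmatch("#a0123f"))  # True
print(hair_check_fullmatch("#ghi012"))  # False
print(hair_check_fullmatch(""))         # False
```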
# ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth
def eye_check(col):
    valid_cols = ["amb", "blu", "brn", "gry", "grn", "hzl", "oth"]

    if col in valid_cols:
        return True

    return False
print(eye_check("amb")) # Expected: True
print(eye_check("oth")) # Expected: True
print(eye_check("AMB")) # Expected: False
print(eye_check("amber")) # Expected: False
True
True
False
False
# pid (Passport ID) - a nine-digit number, including leading zeroes.
def id_check(pid):
    # must be exactly nine ASCII digits; int() alone would also accept
    # values such as "-12345678" or "1_234_567"
    if len(pid) == 9 and pid.isascii() and pid.isdigit():
        return True
    return False
def key_val_check(key, value):
    if key in ["byr", "iyr", "eyr"]:
        return year_check(value, key)
    if key == "hgt":
        return height_check(value)
    if key == "hcl":
        return hair_check(value)
    if key == "ecl":
        return eye_check(value)
    if key == "pid":
        return id_check(value)
    if key == "cid":
        # We don't care, we can return True here always
        return True

    return False
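
The chain of `if` statements maps each key to one check, which is the kind of thing a dictionary of functions can express too. The sketch below is an illustration rather than a drop-in replacement: `eye_ok`, `pid_ok`, `VALIDATORS` and `key_val_check_table` are hypothetical names, and only three keys are wired up to keep it short.

```python
# Hypothetical stand-ins for the real checks, kept tiny so the sketch runs on its own
def eye_ok(value):
    return value in {"amb", "blu", "brn", "gry", "grn", "hzl", "oth"}


def pid_ok(value):
    return len(value) == 9 and value.isascii() and value.isdigit()


# Each key points at the function that validates its value
VALIDATORS = {
    "ecl": eye_ok,
    "pid": pid_ok,
    "cid": lambda value: True,  # cid is never checked
}


def key_val_check_table(key, value):
    validator = VALIDATORS.get(key)
    # unknown keys are rejected, like the final 'return False' in key_val_check
    return validator(value) if validator else False


print(key_val_check_table("ecl", "amb"))        # True
print(key_val_check_table("pid", "012345678"))  # True
print(key_val_check_table("xyz", "anything"))   # False
```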
def check_passports(passports):
    valid_pports = []
    for pport in passports:
        pport_keys = set(pport.keys())
        if not req_keys.issubset(pport_keys):
            # Passport doesn't pass criteria from first part
            continue
        valid_pport = True
        for key in pport:
            if not key_val_check(key, pport[key]):
                valid_pport = False
                break

        if valid_pport:
            valid_pports.append(pport)
    return valid_pports
return_valid = check_passports(valid_passports) # expecting 4 valid passports
if(len(return_valid) == 4):
    print("Returned as expected")
else:
    print(valid_passports)
    print("Returned unexpected:")
    print(return_valid)
Returned as expected
return_invalid = check_passports(invalid_passports) # expecting 0 valid passports
if(len(return_invalid) == 0):
    print("Returned as expected")
else:
    print(invalid_passports)
    print("Returned unexpected:")
    print(return_invalid)
Returned as expected
return_mixed = check_passports(mixed_passports) # expecting 4 valid passports
if(len(return_mixed) == 4):
    print("Returned as expected")
else:
    print(mixed_passports)
    print("Returned unexpected:")
    print(return_mixed)
Returned as expected

Repeat with the real data set and we should be all good!

I’m sure there is probably a more concise way of doing this but I’m fairly happy with it.

I think there should be a pandas way of doing this, reducing the rows using loc, and I suspect that’d be more efficient too. Let’s try it! Make sure you have run ‘pip install pandas’ in your terminal to install pandas before importing it here.

import pandas as pd
df = pd.DataFrame(mixed_passports)
print(df)
          pid    hgt  ecl   iyr   eyr   byr      hcl  cid
0   087499704   74in  grn  2012  2030  1980  #623a2f  NaN
1   896056539  165cm  blu  2014  2029  1989  #a97842  129
2   545766238  164cm  hzl  2015  2022  2001  #888785   88
3   093154719  158cm  blu  2010  2021  1944  #b6652a  NaN
4       186cm    170  amb  2018  1972  1926  #18171d  100
5   012533040  170cm  grn  2019  1967  1946  #602927  NaN
6   021572410  182cm  brn  2012  2020  1992   dab227  277
7  3556412378   59cm  zzz  2023  2038  2007   74454a  NaN
# you can use map to apply a function to every value in a column, and put the result inside the square brackets to keep only the valid rows
df = df[df['pid'].map(id_check)]
print(df)
         pid    hgt  ecl   iyr   eyr   byr      hcl  cid
0  087499704   74in  grn  2012  2030  1980  #623a2f  NaN
1  896056539  165cm  blu  2014  2029  1989  #a97842  129
2  545766238  164cm  hzl  2015  2022  2001  #888785   88
3  093154719  158cm  blu  2010  2021  1944  #b6652a  NaN
5  012533040  170cm  grn  2019  1967  1946  #602927  NaN
6  021572410  182cm  brn  2012  2020  1992   dab227  277
# we can apply all the functions that take one parameter like this but not the year checks 
# (unless we create new funcs for them 😉 )
df = df.loc[df['pid'].map(id_check) &
    df['hgt'].map(height_check) &
    df['ecl'].map(eye_check) &
    df['hcl'].map(hair_check)
]

print(df)
         pid    hgt  ecl   iyr   eyr   byr      hcl  cid
0  087499704   74in  grn  2012  2030  1980  #623a2f  NaN
1  896056539  165cm  blu  2014  2029  1989  #a97842  129
2  545766238  164cm  hzl  2015  2022  2001  #888785   88
3  093154719  158cm  blu  2010  2021  1944  #b6652a  NaN
5  012533040  170cm  grn  2019  1967  1946  #602927  NaN
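
The comment above hints that the year checks could go through map as well if they took a single argument. One possible way to get there is `functools.partial`, sketched here on a plain Python list rather than a DataFrame. The `year_check` below is a compact restatement so the block runs on its own, and `iyr_check`, `eyr_check` and `eyr_values` are hypothetical names.

```python
from functools import partial


def year_check(year, key):
    """Compact restatement of the year_check logic from earlier in the post."""
    ranges = {"byr": (1920, 2002), "iyr": (2010, 2020), "eyr": (2020, 2030)}
    if key not in ranges or len(year) != 4 or not year.isdigit():
        return False
    low, high = ranges[key]
    return low <= int(year) <= high


# partial fixes the 'key' argument, leaving a function of one value
iyr_check = partial(year_check, key="iyr")
eyr_check = partial(year_check, key="eyr")

# the eyr column from the remaining rows of the mixed data
eyr_values = ["2030", "2029", "2022", "2021", "1967"]
print(list(map(eyr_check, eyr_values)))
# [True, True, True, True, False]
```

Since these are ordinary one-argument callables, they should work with a column's map in the same way as id_check, though that is not run here.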

By using apply and lambda we can narrow down the data using the functions we made earlier and pass in multiple values.
ref: https://towardsdatascience.com/apply-and-lambda-usage-in-pandas-b13a1ea037f7

df = df[df.apply(lambda x: year_check(x['iyr'],'iyr') and year_check(x['eyr'],'eyr') and year_check(x['byr'],'byr'),axis=1)]
print(df)
         pid    hgt  ecl   iyr   eyr   byr      hcl  cid
0  087499704   74in  grn  2012  2030  1980  #623a2f  NaN
1  896056539  165cm  blu  2014  2029  1989  #a97842  129
2  545766238  164cm  hzl  2015  2022  2001  #888785   88
3  093154719  158cm  blu  2010  2021  1944  #b6652a  NaN

To make the above code a little neater… we could create a check_all_years function which takes in the 3 different years and checks them all… Like this:

def check_all_years(iyr, eyr, byr):
    return year_check(iyr,'iyr') and year_check(eyr,'eyr') and year_check(byr,'byr')
df = df[df.apply(lambda x: check_all_years(x['iyr'], x['eyr'], x['byr']),axis=1)]
print(df)
         pid    hgt  ecl   iyr   eyr   byr      hcl  cid
0  087499704   74in  grn  2012  2030  1980  #623a2f  NaN
1  896056539  165cm  blu  2014  2029  1989  #a97842  129
2  545766238  164cm  hzl  2015  2022  2001  #888785   88
3  093154719  158cm  blu  2010  2021  1944  #b6652a  NaN

And there we have it…

I learnt a few new pandas tricks and enjoyed the complexity of this challenge. I didn’t find it particularly strenuous myself, but there was a lot of code to write! The hardest part was making sure we hit every part of the specification!
