Off-By-One Horrors

Humans count from 1, and yet arrays are indexed from 0 (unless you have the misfortune of using MATLAB). There's no wonder we suffer from the infamous off-by-one errors so often.

Whilst solving a coding problem recently, I experienced the worst form of the off-by-one error I've ever seen. An off by one error that seemed to only occur on random test cases with absolutely no discernable pattern.

The solution to this problem confounded me for three weeks, before I eventually came up with a kind of ugly solution for it.

Don't give me five! Really!

To set the scene, one day I was working through some Codewars problems, and I had just solved a nice simple one, Don't give me five!

The goal of this problem was to count the numbers between two bounds that did not contain a 5.

Since this only deals with small numbers, you can use the naive method of looping between the start and end number, and infact you can easily solve this with a one-liner.

def dont_give_me_five(start,end):
    return len([i for i in range(start, end + 1) if not "5" in str(i)])

This problem has a big brother, though. Don't give me a five! Really!

This one has numbers up to 19 digits long, spanning both positive and negative numbers, and there's so many test cases to pass that there's no for looping your way out of it.

Counting Fives

The first course of action towards solving this is to find how many 5s are in a certain range. We can do this by making some simple observations:

In the range 1-10 there is 1 5
In the range 1-100 there are 19 5s
In the range 1-1000 there are 271 5s

Conversely, this means:

In the range 1-10 there are 9 not 5s
In the range 1-100 there are 81 not 5s
In the range 1-1000 there are 729 not 5s

Generalising, we can say:

In the range 1-10^n there are 9^n numbers that do not contain a 5.

The Plan

It seemed simple enough. We count the number of not 5s up to and inclusive of the upper bound, then subtract the number of not 5s up to and inclusive of the lower bound.

To do this, we will iterate over each digit of the bounds, left to right.

For example, considering 123456:

Count how many not 5s in the range 1-100000
Count how many not 5s in the range 1-10000 and double it
Count how many not 5s in the range 1-1000 and triple it
Count how many not 5s in the range 1-100 and quadruple it

The next digit is a 5, so we need only count the not 5s in the range 0-49. This is calculated by the number of not 5s in 1-50, minus 1 so as not to include the number 50.

Let’s also consider the example of the number of not 5s in 70.

In that case we count the number in the range 1-100 and subtract 3x the number in the range 1-10 to arrive at 54.

What about the number of not 5s in any number under 10? We can either fall back to the 'naive' method from the easier version of this problem, or use a ternery operator.

return n if n < 5 else n - 1

That’s already a fair handful of considerations, but let's not forget edge cases.

Edge Cases Considerations

There's a few edge cases to think about:

Both lower and upper bounds are negative - in this case we change the sign and switch the lower and upper bounds
"Crossing the line" - if the lower bound is negative and the upper bound is positive then we add 1 to the final count to account for 0

Also, our plan of subtracting the number of not 5s up to the lower bound from the number of not 5s up to the upper bounds will not work if the lower bound is negative. In that case we need to sum them.

The Code

It's time to start writing code. To save a small amount of processing time, I precalculated how many numbers were in each range and stored it in a dictionary.

# Method for calculating the digits
# Note - CUMULATIVE_COUNTS dictionary needs seeding with CUMULATIVE_COUNTS = {1: 9} for this to work
def cumulative_sum_counts(digits):
    for i in range(digits):
        try:
            # Check if the entry already exists before writing a new entry
            CUMULATIVE_COUNTS[i + 1]
        except:
            CUMULATIVE_COUNTS[i + 1] = CUMULATIVE_COUNTS[i] * 9


# Precalculated number of digits that aren't 5 over various intervals
CUMULATIVE_COUNTS = {
    1: 9,
    2: 81,
    3: 729,
    4: 6561,
    5: 59049,
    6: 531441,
    7: 4782969,
    8: 43046721,
    9: 387420489,
    10: 3486784401,
    11: 31381059609,
    12: 282429536481,
    13: 2541865828329,
    14: 22876792454961,
    15: 205891132094649,
    16: 1853020188851841,
    17: 16677181699666569,
    18: 150094635296999121,
    19: 1350851717672992089,
    20: 12157665459056928801
    }

The dictionary keys represent the powers of 10 that we are counting the number of not 5s in, i.e. dictionary key n represents the count in the range 1-10^n.

Now let's deal with the edge case where both upper and lower bounds are negative. We simple swap the signs so both numbers are positive, and swap the lower and upper bounds:

if start < 0 and end < 0:
    start, end = abs(end), abs(start)

Now it’s time to actually count the number of digits.

I defined a subfunction called count_not_fives designed specifically to count how many numbers in the range aren't fives. Within this we will loop over each digit as outlined in 'the plan'.

def count_not_fives(num):
    # 'Naive' method
    num = abs(num)
    if num < 5:
        return num
    elif num == 5:
        return 4
    elif num < 10:
        return num - 1

    # Otherwise we will need to iterate over each digit
    # E.g. 257 - we count the 200s, then we add 5 * 10 - 1 because the next digit is 5
    # E.g. 638 - We count 1000, then minus 4 * 100 off that (because 6 > 5), count 3 * 10, then count num not 5s in 8
    num = str(num)
    significance = len(num) - 1
    sequence_length = 0

    # We don't care about the last number since that will be a single digit counted by the 'naive' method
    for i in range(len(num) - 1):
        digit = int(num[i])
        if digit < 5:
            sequence_length += digit * CUMULATIVE_COUNTS[significance]
        elif digit == 5:
            sequence_length += digit * CUMULATIVE_COUNTS[significance] - 1
            # Every trailing digit no longer needs to be counted
            return sequence_length
        else:
            sequence_length += CUMULATIVE_COUNTS[significance + 1]
            sequence_length -= (10 - digit) * CUMULATIVE_COUNTS[significance]

        significance -= 1

    # Add on the last little bit
    sequence_length += count_not_fives(int(num[-1]))
    return sequence_length

Finally, if the starting digit is below zero, we sum our two counts and add one to include the number 0. Otherwise we subtract one count from the other.

start_sequence = count_not_fives(start)
end_sequence = count_not_fives(end)
    
if start <= 0:
    return end_sequence + start_sequence + 1
return end_sequence - start_sequence

Great.

Except it doesn't work.

The Off-By-One Error

Now comes the part where I hit a brick wall. I spent hours working on this, the longest I’ve ever spent on a problem, and I got nowhere. I eventually had to give up.

Leaving this problem unsolved consumed me. I felt like making something that could efficiently count the numbers not including 5 was the hard part, but the real difficulty came from solving this off-by-one error. I felt like I’d done all the hard work, only to be left with something stupid and trivial to trip me up.

To give you an example of what was going wrong and how baffled I was, here are some failed test cases:

Over the range (40076, 2151514229639903569) I calculated 326131553237897712 instead of 326131553237897713
Over the range (486794202706742299,3717393997447866413) I calculated 472657344146365324 instead of 472657344146365325
Over the range (-5611425327137314270,-2268077268131749338) I calculated 406206174884255254 instead of 406206174884255255

Keep in mind, I was also passing heaps of test cases.

So just what was going on here?

Well actually, even some of the really small test cases were failing. The range (1, 9) returned 7 instead of 8.

This was one of the key puzzle pieces in solving this. Decomposing how the program works out the sum for the range (1,9), we know that it counts 8 digits for the upper bound, then 1 digit from the lower bound. So it was returning 8 - 1: 7.

So perhaps the issue is that we're accidentally not including the lower bound in the count, so let's reduce start by 1. But only if start is positive, because this function seems to work perfectly from a negative lower bound and positive upper bound.

if start > 0:
    start -= 1

Great.

Except for that previously mentioned range of 1-9, because now the function is returning 9 instead of 8.

Every other test case - literally thousands of them - are passing. This single one is going wrong.

The Ugly Fix

I solved this problem a while ago now, so the exact toil I went through I can't remember. But getting even to this point took a long, long time of futile trial and error and experimentation, trying to see patterns amongst noise.

Every time I tried to make a fix to correct the off-by-one errors, new off-by-one errors arose in their place.

However, through observation we can see what the problem is preventing the range (1, 9) from passing.

Remember a while back, where I said dealing with negative numbers meant 'crossing the line' and thus we had to add 1 to include 0?

if start <= 0:
    return end_sequence + start_sequence + 1

The range (1, 9) is now being calculated correctly, but we are unwittingly adding 1 at the very end of the function.

So let's change that start <= 0 to start < 0.

Now the program fails one of the edge cases. The range (0, 1). This clearly should count two digits, but we’re getting 1 returned. This is because when we run count_not_fives(0) we get 0, so the whole function returns 1 instead of 2.

So finally, here comes the ugly, hacky fix. Save start to a new variable ss. At the end, we check if ss <= 0 instead of it start <= 0.

# Reduce start by 1 to be inclusive of starting digit. Save the start value to ss
ss = start
if start > 0:
    start -= 1

start_sequence = count_not_fives(start)
end_sequence = count_not_fives(end)
    
if ss <= 0:
    return end_sequence + start_sequence + 1
return end_sequence - start_sequence

Problem finally solved. But damn, is it ugly.

When you submit your correct solution, you immediately get access to everyone else's solutions, which of course makes you realise that your solution is massively over-engineered and hideous, whilst the top solutions are elegant and beautiful.

My code may well be ugly, but it's well commented, it works and it's fast.

Conclusion

Writing this in retrospect make me realise that I was perhaps so lost in the frustration of not being able to solve the problem, that I was unable to rationally work out what was going wrong.

Although I say this took 3 weeks to solve, I only spent a few days of back and forth on it. I decided I needed to get it out of my head and finally came back to it several weeks later, only to solve it within a very short space of time.

When we are amidst a difficult problem, it's easy to throw everything we have at it. Trying random fixes without proper reasoning behind it. Adjust values up and down to see what changes.

This was an off by one error, so surely changing a value by one will fix it, but no, because that has far reaching consequences leading other things to break and new off-by-one errors to arise in their place.

We really need to look to the root cause of the problem to fix it properly.

Sometimes the hardest thing of all is stepping away and leaving something alone, and yet, that's often the best thing to do. We, of course, know this, and yet in the moment it is so difficult to do.

Full Solution in Python

Link to Kata.

This kata is the performance version of Don't give me five by user5036852.
Your mission, should you accept it, is to return the count of all integers in a given range which do not contain the digit 5 (in base 10 representation).
You are given the start and the end of the integer range. The start and the end are both inclusive.
Examples:
1,9 -> 1,2,3,4,6,7,8,9 -> return 8
4,17 -> 4,6,7,8,9,10,11,12,13,14,16,17 -> return 12
The result may contain fives. ;-)
The start will always be smaller than the end. Both numbers can be also negative.
The regions can be very large and there will be a large number of test cases. So brute force solutions will certainly time out!
Good luck, warrior!

def dont_give_me_five(start, end):
    # Precalculated number of digits that aren't 5 over various intervals
    # Dictionary key represents the number of trailing 0's
    CUMULATIVE_COUNTS = {1: 9, 2: 81, 3: 729, 4: 6561, 5: 59049, 6: 531441, 7: 4782969, 8: 43046721, 9: 387420489, 10: 3486784401, 11: 31381059609, 12: 282429536481, 13: 2541865828329, 14: 22876792454961, 15: 205891132094649, 16: 1853020188851841, 17: 16677181699666569, 18: 150094635296999121, 19: 1350851717672992089, 20: 12157665459056928801}
    
    # Observations:
    # 9 numbers without 5 between 0-10
    # -> 90 between 0-100 but we need to remove 9 for 50-59 -> 81
    # -> 810 between 0-1000 but we need to remove 81 for 500-599, therefor be 729
    # So we can iteratively calculate these
    """
    # Just a method for pre-calculating the digits
    def cumulative_sum_counts(digits):
        for i in range(digits):
            try:
                CUMULATIVE_COUNTS[i + 1]
            except:
                CUMULATIVE_COUNTS[i + 1] = CUMULATIVE_COUNTS[i] * 9
    
    cumulative_sum_counts(19)
    print(CUMULATIVE_COUNTS)
    """
    
    # Edge cases - numbers are < 10, just use a naive solution
    # "Crossing the line" - we cross 0 in the range, add one extra digit
    # Start and end are negative - make them positive and flip them    
    if start < 0 and end < 0:
        start, end = abs(end), abs(start)
    
    # This is required to be inclusive of start
    ss = start
    if start > 0:
        start -= 1
    
    def count_not_fives(num):
        num = abs(num)
        if num < 5:
            return num
        elif num == 5:
            return 4
        elif num < 10:
            return num - 1

        # Otherwise we will need to iterate over each digit
        # E.g. 257 - we count the 200s, then we add 5 * 10 - 1 because the next digit is 5
        # E.g. 638 - We count 1000, then minus 4 * 100 off that (because 6 > 5), count 3 * 10, then count num not 5s in 8
        num = str(num)
        significance = len(num) - 1
        sequence_length = 0
        # We don't care about the last number
        for i in range(len(num) - 1):
            digit = int(num[i])
            if digit < 5:
                sequence_length += digit * CUMULATIVE_COUNTS[significance]
            elif digit == 5:
                sequence_length += digit * CUMULATIVE_COUNTS[significance] - 1
                # Every trailing digit no longer needs to be counted
                return sequence_length
            else:
                sequence_length += CUMULATIVE_COUNTS[significance + 1]
                sequence_length -= (10 - digit) * CUMULATIVE_COUNTS[significance]

            significance -= 1

        # Add on the last little bit
        sequence_length += count_not_fives(int(num[-1]))
        return sequence_length
    
    # So now we have a good method
    start_sequence = count_not_fives(start)
    end_sequence = count_not_fives(end)
    
    if ss <= 0:
        return end_sequence + start_sequence + 1
    return end_sequence - start_sequence