Tuomas Siipola

Comparing NumPy arrays

Let's say we're given some numbers which we want to recompute using NumPy. Checking that our results match the original ones sounds trivial but, depending on the data type, there are a couple of pitfalls to avoid.

NumPy arrays

Let's start with the simple case: arrays of integers. They can be compared using np.array_equal:

import numpy as np

A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
assert np.array_equal(A, B)
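As a small illustration of how np.array_equal treats its inputs, the sketch below (the variable names are ours, not part of the original example) shows that mismatched shapes give False instead of an error, and that values are compared regardless of dtype:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
shorter = np.array([1, 2])

# Values are compared, not dtypes: integer 1 equals float 1.0.
same_values = np.array_equal(ints, floats)

# A shape mismatch is reported as False rather than raised as an error.
same_shape = np.array_equal(ints, shorter)

print(same_values, same_shape)
```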

We have to be more careful with floating-point numbers. For example:

A = np.array([0.1 + 0.2])
B = np.array([0.3])
assert np.array_equal(A, B)  # AssertionError

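fails because of the inherent inaccuracy of floating-point arithmetic: 0.1 + 0.2 evaluates to 0.30000000000000004, not 0.3. Instead, np.allclose should be used for comparing floating-point arrays. One thing to note is that np.allclose broadcasts its arguments: it raises a ValueError if the shapes are incompatible, and it silently compares the broadcast values if the shapes differ but are compatible. We can write a wrapper that treats any shape mismatch as inequality: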

def myallclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    if a.shape != b.shape:
        return False
    return np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan)

A = np.array([0.1 + 0.2])
B = np.array([0.3])
assert myallclose(A, B)  # OK
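To make the motivation for the shape check concrete, here is a minimal sketch of how plain np.allclose behaves with differently shaped inputs. The arrays are made up for illustration:

```python
import numpy as np

# Shapes (1,) and (3,) are broadcast-compatible, so np.allclose compares
# the single element against every element and reports a match.
a = np.array([1.0])
b = np.array([1.0, 1.0, 1.0])
broadcast_result = np.allclose(a, b)

# Shapes (2,) and (3,) cannot be broadcast, so NumPy raises ValueError.
try:
    np.allclose(np.array([1.0, 2.0]), np.array([1.0, 2.0, 3.0]))
    raised = False
except ValueError:
    raised = True

print(broadcast_result, raised)
```

The first case is arguably the more dangerous one, since a one-element result would quietly "match" a longer array of identical values.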

Additionally, the equal_nan argument is False by default, which means NaN values never compare equal, not even to themselves. When checking recomputed results you most likely want to set it to True:

A = np.array([np.nan])
B = np.array([np.nan])
assert myallclose(A, B)                  # AssertionError
assert myallclose(A, B, equal_nan=True)  # OK
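It may also help to know what "close" means here. According to the NumPy documentation, np.allclose checks |a - b| <= atol + rtol * |b| elementwise, so the default atol of 1e-08 can dominate when the values themselves are tiny. A minimal sketch with made-up numbers:

```python
import numpy as np

a = np.array([1e-10])
b = np.array([2e-10])

# With the defaults the absolute tolerance (1e-8) dominates for tiny values,
# so numbers that differ by a factor of two still count as close.
default_close = np.allclose(a, b)

# Setting atol=0 leaves only the relative check: |a - b| <= rtol * |b|.
relative_close = np.allclose(a, b, atol=0.0)

print(default_close, relative_close)
```

If your data lives near zero, it is probably worth choosing atol deliberately instead of relying on the default.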

Masked arrays

Similar to np.array_equal, there's ma.allequal for masked arrays, but its behaviour can be surprising. Logically, two masked values should compare equal, but a masked value and a non-masked value should not. The fill_value argument cannot satisfy both requirements at once: with fill_value=True every position that is masked in either array counts as equal, and with fill_value=False the result is False as soon as anything is masked:

import numpy.ma as ma

A = ma.array([1, 2, 3], mask=[0, 1, 0])  # [1, --, 3]
B = ma.array([1, 4, 5], mask=[0, 0, 1])  # [1, 4, --]
assert ma.allequal(A, A, fill_value=True) == True    # OK
assert ma.allequal(A, B, fill_value=True) == False   # AssertionError
assert ma.allequal(A, A, fill_value=False) == True   # AssertionError
assert ma.allequal(A, B, fill_value=False) == False  # OK
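Looking at the raw return values may make the problem easier to see. The sketch below collects them in a dictionary (the results name and its layout are ours); the behaviour shown is what we understand current NumPy releases to do:

```python
import numpy.ma as ma

A = ma.array([1, 2, 3], mask=[0, 1, 0])  # [1, --, 3]
B = ma.array([1, 4, 5], mask=[0, 0, 1])  # [1, 4, --]

# Collect the raw result for every combination of operand and fill_value.
results = {
    (name, fill): bool(ma.allequal(A, other, fill_value=fill))
    for name, other in (("A", A), ("B", B))
    for fill in (True, False)
}

for key, value in results.items():
    print(key, value)
```

For these inputs, fill_value=True gives True in both comparisons and fill_value=False gives False in both, so neither setting distinguishes A from B.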

We can get the desired result by writing our own function:

def myallequal(a, b):
    if not np.array_equal(ma.getmaskarray(a), ma.getmaskarray(b)):
        return False
    return ma.allequal(a, b)

A = ma.array([1, 2, 3], mask=[0, 1, 0])  # [1, --, 3]
B = ma.array([1, 4, 5], mask=[0, 0, 1])  # [1, 4, --]
assert myallequal(A, A) == True   # OK
assert myallequal(A, B) == False  # OK
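A few more cases, sketched with made-up arrays, show what this function does and does not compare. The data hidden under a mask is ignored, plain ndarrays are treated as having nothing masked, and a shape mismatch comes out as False because the masks themselves differ in shape:

```python
import numpy as np
import numpy.ma as ma

def myallequal(a, b):
    # Same definition as in the article, repeated to keep the sketch runnable.
    if not np.array_equal(ma.getmaskarray(a), ma.getmaskarray(b)):
        return False
    return ma.allequal(a, b)

# Different data under the mask, identical masks: considered equal.
hidden_a = ma.array([1, 2, 3], mask=[0, 1, 0])
hidden_b = ma.array([1, 99, 3], mask=[0, 1, 0])
ignores_hidden_data = bool(myallequal(hidden_a, hidden_b))

# A plain ndarray behaves like a masked array with nothing masked.
plain_vs_unmasked = bool(myallequal(np.array([1, 2, 3]), ma.array([1, 2, 3])))
plain_vs_masked = bool(myallequal(np.array([1, 2, 3]), hidden_a))

# Mismatched shapes give False instead of raising.
shape_mismatch = bool(myallequal(ma.array([1, 2, 3]), ma.array([1, 2, 3, 4])))

print(ignores_hidden_data, plain_vs_unmasked, plain_vs_masked, shape_mismatch)
```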

Similarly to np.allclose, there's also ma.allclose, but it suffers from the same issue as ma.allequal and doesn't provide an equal_nan argument. Here's a function that works for our purposes:

def myallclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    if not np.array_equal(ma.getmaskarray(a), ma.getmaskarray(b)):
        return False
    res = np.all(np.isclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan))
    if res is ma.masked:
        return True
    return res

A = ma.array([0.3,       0.4, 0.5], mask=[0, 1, 0])  # [0.3, --, 0.5]
B = ma.array([0.1 + 0.2, 0.4, 0.5], mask=[0, 1, 0])  # [0.3, --, 0.5]
C = ma.array([0.1 + 0.2, 0.3, 0.4], mask=[0, 0, 1])  # [0.3, 0.4, --]
assert myallclose(A, B) == True    # OK
assert myallclose(A, C) == False   # OK
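As an alternative sketch, and not the approach used above, the same check can be written by comparing only the unmasked data with plain np.allclose. The function name masked_allclose_compressed is hypothetical. This variant sidesteps the ma.masked special case, because an all-masked input simply leaves two empty arrays to compare:

```python
import numpy as np
import numpy.ma as ma

def masked_allclose_compressed(a, b, rtol=1e-05, atol=1e-08, equal_nan=False):
    # Hypothetical helper: masks must match exactly, then only the
    # unmasked values are compared as ordinary ndarrays.
    mask_a = ma.getmaskarray(a)
    mask_b = ma.getmaskarray(b)
    if not np.array_equal(mask_a, mask_b):
        return False
    visible = ~mask_a
    return np.allclose(ma.getdata(a)[visible], ma.getdata(b)[visible],
                       rtol=rtol, atol=atol, equal_nan=equal_nan)

A = ma.array([0.3,       0.4, 0.5], mask=[0, 1, 0])  # [0.3, --, 0.5]
B = ma.array([0.1 + 0.2, 9.9, 0.5], mask=[0, 1, 0])  # [0.3, --, 0.5]
C = ma.array([0.1 + 0.2, 0.3, 0.4], mask=[0, 0, 1])  # [0.3, 0.3, --]
same_masks = masked_allclose_compressed(A, B)
different_masks = masked_allclose_compressed(A, C)

# Everything masked: nothing is left to compare, so the arrays match.
all_masked = masked_allclose_compressed(ma.array([1.0, 2.0], mask=[1, 1]),
                                        ma.array([3.0, 4.0], mask=[1, 1]))

# Unmasked NaNs only match when equal_nan=True is requested.
N = ma.array([np.nan, 1.0], mask=[0, 1])
nan_default = masked_allclose_compressed(N, N)
nan_equal = masked_allclose_compressed(N, N, equal_nan=True)

print(same_masks, different_masks, all_masked, nan_default, nan_equal)
```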