Filling in missing samples by interpolating.

Ryan_Neve1 · September 2, 2009, 6:33pm

Hello,

I’ve got many 1-D arrays of data that contain occasional NaNs where there weren’t any samples in that depth bin. Something like this…

array([np.nan,1,2,3,np.nan,5,6,7,8,np.nan,np.nan,11,12,np.nan,np.nan,np.nan])

But much bigger, and I have hundreds of them. Most NaNs are isolated between two valid values, but they still make my contour plots look terrible.

Rather than just mask them, I want to interpolate so my plot doesn’t have holes in it where it doesn’t need to.

I want to change any NaN which is preceded and followed by a valid value to the average of those two values.
If it only has one valid neighbor, I want to change it to the value of its neighbor.

Here’s a simplified version of my code:

from copy import copy
import numpy as np

sample_array = np.array([np.nan,1,2,3,np.nan,5,6,7,8,np.nan,np.nan,11,12,np.nan,np.nan,np.nan])

# Make a copy so we aren't working on the original
cast = copy(sample_array)

# Now iterate over the copy
for j, sample in enumerate(cast):
    # If this sample is a NaN, let's try to interpolate
    if np.isnan(sample):
        # Get the neighboring values, but make sure we don't index out of bounds
        prev_val = cast[max(j-1, 0)]
        next_val = cast[min(j+1, cast.size-1)]
        print("Trying to fix", prev_val, "->", sample, "<-", next_val)

        # First try an average of the neighbors
        inter_val = 0.5 * (prev_val + next_val)
        if np.isnan(inter_val):
            # There must have been a neighboring NaN, so just use the only valid neighbor
            inter_val = np.nanmax([prev_val, next_val])

        if np.isnan(inter_val):
            print("   No changes made")
        else:
            print("   Fixed to", prev_val, "->", inter_val, "<-", next_val)
            # Now fix the value in the original array
            sample_array[j] = inter_val


After this is run, we have:
sample_array = array([1,1,2,3,4,5,6,7,8,8,11,11,12,12,np.nan,np.nan])

This works, but is very slow for something that will be on the back end of a web page.
Perhaps something that uses masked arrays and some of the numpy.ma methods?
I keep thinking there must be a much more clever way of doing this.

-Ryan
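
On the question of a more compact approach: if plain linear interpolation across each gap is acceptable, np.interp can fill every NaN in one call. This is a different technique from the rule described above, not an equivalent of it. A run of two or more NaNs gets a straight line between its valid neighbors instead of a copy of the nearest one, and leading or trailing NaNs are all set to the nearest valid value. The function name fill_nans_linear is made up for this sketch.

```python
import numpy as np

def fill_nans_linear(values):
    """Return a copy of `values` with NaNs filled by linear interpolation over the index."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    missing = np.isnan(values)
    if missing.all():
        # Nothing valid to interpolate from; return the copy unchanged
        return filled
    idx = np.arange(values.size)
    # np.interp holds the first/last valid value constant outside the valid range
    filled[missing] = np.interp(idx[missing], idx[~missing], values[~missing])
    return filled

sample_array = np.array([np.nan, 1, 2, 3, np.nan, 5, 6, 7, 8,
                         np.nan, np.nan, 11, 12, np.nan, np.nan, np.nan])
print(fill_nans_linear(sample_array))
```

For the sample array this fills the two-NaN gap with 9 and 10 (not 8 and 11) and sets all three trailing NaNs to 12, so it only suits cases where that behavior is wanted.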

Joao_Luis_Silva1 · September 3, 2009, 2:00pm

Ryan Neve wrote:

Hello,

[...]
This works, but is very slow for something that will be on the back end of a web page.

Iterating in Python is usually slow, so you should use NumPy array methods where possible. I've made a faster version. It gives the same result for your test case, but you should test it further to see if it treats all cases properly.

Regards,
João Silva

------------------------------------------------

import numpy as np

sample_array = np.array(([np.nan,1,2,3,np.nan,5,6,7,8,np.nan,np.nan,11,12,np.nan,np.nan,np.nan]))

# Replace a single NaN with the average of its two neighbours
sample_array[1:-1] = np.where(np.isnan(sample_array[1:-1]),
                              (sample_array[:-2] + sample_array[2:]) / 2.0,
                              sample_array[1:-1])

# Fix "... number NaN ...": copy the valid value on the left into the NaN
sample_array[1:] = np.where(np.logical_and(np.logical_not(np.isnan(sample_array[:-1])),
                                           np.isnan(sample_array[1:])),
                            sample_array[:-1],
                            sample_array[1:])

# Fix "... NaN number ...": copy the valid value on the right into the NaN
sample_array[:-1] = np.where(np.logical_and(np.logical_not(np.isnan(sample_array[1:])),
                                            np.isnan(sample_array[:-1])),
                             sample_array[1:],
                             sample_array[:-1])

print(sample_array)

------------------------------------------------
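
One way to follow the advice to test it further is to compare the vectorized passes against a plain loop on many random arrays. The sketch below assumes the rule as stated in the question (average of two valid neighbors, otherwise the single valid neighbor, otherwise leave the NaN). The names fill_loop and fill_vectorized are made up here, and both functions work on copies.

```python
import numpy as np

def fill_loop(values):
    """Reference: apply the rule from the question one element at a time."""
    src = np.asarray(values, dtype=float)
    out = src.copy()
    n = src.size
    for j in range(n):
        if not np.isnan(src[j]):
            continue
        # Neighbors are read from the untouched source; off-the-end counts as NaN
        prev_val = src[j - 1] if j > 0 else np.nan
        next_val = src[j + 1] if j < n - 1 else np.nan
        if not np.isnan(prev_val) and not np.isnan(next_val):
            out[j] = 0.5 * (prev_val + next_val)
        elif not np.isnan(prev_val):
            out[j] = prev_val
        elif not np.isnan(next_val):
            out[j] = next_val
    return out

def fill_vectorized(values):
    """The three np.where passes from the reply, applied to a copy."""
    a = np.array(values, dtype=float)
    a[1:-1] = np.where(np.isnan(a[1:-1]), (a[:-2] + a[2:]) / 2.0, a[1:-1])
    a[1:] = np.where(~np.isnan(a[:-1]) & np.isnan(a[1:]), a[:-1], a[1:])
    a[:-1] = np.where(~np.isnan(a[1:]) & np.isnan(a[:-1]), a[1:], a[:-1])
    return a

rng = np.random.default_rng(0)
for _ in range(1000):
    n = int(rng.integers(1, 30))
    data = rng.normal(size=n)
    data[rng.random(n) < 0.4] = np.nan  # knock out roughly 40% of the samples
    assert np.allclose(fill_loop(data), fill_vectorized(data), equal_nan=True)
print("loop and vectorized versions agree on 1000 random arrays")
```

A random check like this is not a proof, but it exercises leading, trailing, and multi-NaN runs as well as arrays of length one and two.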