Filling in missing samples by interpolating.


I’ve got many 1d arrays of data which contain occasional NaNs where there weren’t any samples at that depth bin. Something like this…


But much bigger, and I have hundreds of them. Most NaNs are isolated between two valid values, but they still make my contour plots look terrible.

Rather than just mask them, I want to interpolate so my plot doesn’t have holes in it where there need not be any.

I want to change any NaN which is preceded and followed by a value to the average of those two values.
If it has only one valid neighbor, I want to change it to the value of its neighbor.

Here’s a simplified version of my code:

from copy import copy
import numpy as np

sample_array = np.array([np.nan,1,2,3,np.nan,5,6,7,8,np.nan,np.nan,11,12,np.nan,np.nan,np.nan])

# Make a copy so we aren't working on the original
cast = copy(sample_array)

# Now iterate over the copy
for j, sample in enumerate(cast):
    # If this sample is a NaN, let's try to interpolate
    if np.isnan(sample):
        # Get the neighboring values, but make sure we don't index out of bounds
        prev_val = cast[max(j-1, 0)]
        next_val = cast[min(j+1, cast.size-1)]
        print("Trying to fix", prev_val, "->", sample, "<-", next_val)

        # First try an average of the neighbors
        inter_val = 0.5 * (prev_val + next_val)
        if np.isnan(inter_val):
            # There must have been a neighboring NaN, so just use the only valid neighbor
            inter_val = np.nanmax([prev_val, next_val])

        if np.isnan(inter_val):
            # Both neighbors were NaN, so leave this sample alone
            print("   No changes made")
        else:
            print("   Fixed to", prev_val, "->", inter_val, "<-", next_val)
            # Now fix the value in the original array
            sample_array[j] = inter_val

After this is run, we have:
sample_array = array([1,1,2,3,4,5,6,7,8,8,11,11,12,12,np.nan,np.nan])

This works, but is very slow for something that will be on the back end of a web page.
Perhaps something that uses masked arrays and some of the methods?
I keep thinking there must be some much more clever way of doing this.


Ryan Neve wrote:


This works, but is very slow for something that will be on the back end of a web page.

Iterating in python is usually slow, so you should use numpy array methods if possible. I've made a faster version. It gives the same result for your test case, but you should test it further to see if it treats all cases properly.

João Silva



import numpy as np

sample_array = np.array([np.nan,1,2,3,np.nan,5,6,7,8,np.nan,np.nan,11,12,np.nan,np.nan,np.nan])

# Replace a single NaN with the average of its neighbours
# (the result stays NaN if either neighbour is itself NaN)
sample_array[1:-1] = np.where(np.isnan(sample_array[1:-1]),
                              (sample_array[:-2] + sample_array[2:]) / 2.0,
                              sample_array[1:-1])

# Fix "... number NaN ...": copy the valid left neighbour into the NaN
sample_array[1:] = np.where(np.logical_and(np.logical_not(np.isnan(sample_array[:-1])),
                                           np.isnan(sample_array[1:])),
                            sample_array[:-1],
                            sample_array[1:])

# Fix "... NaN number ...": copy the valid right neighbour into the NaN
sample_array[:-1] = np.where(np.logical_and(np.logical_not(np.isnan(sample_array[1:])),
                                            np.isnan(sample_array[:-1])),
                             sample_array[1:],
                             sample_array[:-1])

print(sample_array)
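
Since the vectorized version should be tested further, one way to do that is to compare it against the loop on many random arrays. The following is only a sketch: the function names fill_nans_loop and fill_nans_vectorized are made up for illustration, the loop is a condensed restatement of the rule from the original post (not the poster's exact code), and the random-test parameters are arbitrary. It assumes a NumPy recent enough to provide np.random.default_rng.

```python
import numpy as np

def fill_nans_loop(a):
    """Reference version (hypothetical name): element-by-element rule."""
    out = a.copy()
    for j in np.flatnonzero(np.isnan(a)):
        # Read neighbours from the unmodified input, clamped at the ends
        prev_val = a[max(j - 1, 0)]
        next_val = a[min(j + 1, a.size - 1)]
        if not np.isnan(prev_val) and not np.isnan(next_val):
            out[j] = 0.5 * (prev_val + next_val)
        elif not np.isnan(prev_val):
            out[j] = prev_val
        elif not np.isnan(next_val):
            out[j] = next_val
        # otherwise both neighbours are NaN: leave the sample as NaN
    return out

def fill_nans_vectorized(a):
    """Vectorized version (hypothetical name): the same three np.where steps."""
    out = a.copy()
    # Step 1: NaN with two valid neighbours -> their average
    nan = np.isnan(out)
    out[1:-1] = np.where(nan[1:-1], 0.5 * (out[:-2] + out[2:]), out[1:-1])
    # Step 2: "number NaN" -> copy the left neighbour
    nan = np.isnan(out)
    out[1:] = np.where(nan[1:] & ~nan[:-1], out[:-1], out[1:])
    # Step 3: "NaN number" -> copy the right neighbour
    nan = np.isnan(out)
    out[:-1] = np.where(nan[:-1] & ~nan[1:], out[1:], out[:-1])
    return out

# Compare both versions on random arrays with a random scattering of NaNs.
rng = np.random.default_rng(0)
for _ in range(200):
    a = rng.normal(size=rng.integers(1, 30))
    a[rng.random(a.size) < 0.4] = np.nan
    # assert_array_equal treats NaNs in matching positions as equal
    np.testing.assert_array_equal(fill_nans_vectorized(a), fill_nans_loop(a))
```

Both functions work on a copy and return it, so the input array is left untouched. If the assertion ever fails, the offending array gives a concrete case to look at.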