how to express statistical data in colors

You're translating a histogram of your data into a colormap, yes?

The matplotlib histogram returns bins and patches, which you could translate into color intensities; but I bet scipy.stats.histogram would be easier. Then the bin centers are the segment boundaries of the colormap, and the weight in each bin is the respective color intensity.
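A minimal sketch of that idea, under some assumptions: it uses numpy.histogram rather than scipy.stats.histogram (which, as far as I know, is no longer part of current SciPy releases), and the function name histogram_to_cmap, the white-to-blue choice, and the default of 30 bins are illustrative, not from this thread. Bin centers, rescaled to 0-1, become the segment anchors, and each bin's weight (count divided by the largest count) sets how far red and green drop below 1.

```python
import numpy as np
from matplotlib.colors import LinearSegmentedColormap


def histogram_to_cmap(values, lo, hi, bins=30, name="density_blue"):
    """Build a white-to-blue colormap whose intensity follows a histogram.

    Hypothetical helper: returns the colormap and the per-bin weights.
    """
    counts, edges = np.histogram(values, bins=bins, range=(lo, hi))
    peak = counts.max()
    weights = counts / peak if peak > 0 else np.zeros(len(counts))

    # Bin centers, rescaled to the 0-1 domain the colormap expects.
    centers = 0.5 * (edges[:-1] + edges[1:])
    x = (centers - lo) / (hi - lo)

    # Segment data must start at 0.0 and end at 1.0, so pad both ends
    # with the weight of the nearest bin.
    x = np.concatenate(([0.0], x, [1.0]))
    w = np.concatenate(([weights[0]], weights, [weights[-1]]))

    # Red and green fade from 1 (white) to 0 (pure blue) as weight grows;
    # blue stays at 1 throughout.
    fade = [(float(xi), 1.0 - float(wi), 1.0 - float(wi)) for xi, wi in zip(x, w)]
    solid = [(float(xi), 1.0, 1.0) for xi in x]
    cdict = {"red": fade, "green": fade, "blue": solid}
    return LinearSegmentedColormap(name, cdict, N=256), weights
```

The resulting colormap can be passed to imshow as cmap= on a 0-1 ramp, the same way my_cmap is used further down the thread.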

Also, color has a finite extent but the bin weight might not. You'll need to choose a nominal max value to norm the colors to, and decide whether to use the same max value all the time (so early plots might all be light, late plots all dark) or calculate it from the data each time you plot (in which case the colorbar this month might not mean the same thing as the color bar last month).
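The two normalization choices can be compared with matplotlib.colors.Normalize. This is only a sketch: the counts and the nominal maximum of 50 are made-up numbers for illustration.

```python
import numpy as np
from matplotlib.colors import Normalize

# Made-up bin counts for an early and a late plot (illustrative only).
counts_early = np.array([0, 1, 2, 1])
counts_late = np.array([0, 10, 40, 20])

# Option 1: one nominal max for every plot. Colors stay comparable over
# time, but early plots only use the light end of the scale.
fixed = Normalize(vmin=0, vmax=50, clip=True)
print(fixed(counts_early))
print(fixed(counts_late))

# Option 2: rescale to each plot's own max. Every plot uses the full
# scale, but the same color means a different count from plot to plot.
early_norm = Normalize(vmin=0, vmax=counts_early.max())
late_norm = Normalize(vmin=0, vmax=counts_late.max())
print(early_norm(counts_early))
print(late_norm(counts_late))
```

With clip=True, counts above the nominal max are drawn in the darkest color instead of falling outside the scale.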

I think using all three of RGB is too confusing -- do it bluescale or grayscale.

&C

···

On Nov 5, 2012, at 7:13 AM, rand0m@...4228... wrote:

Hi Chloe

Thank you for answering.

I agree with the approach you suggest. So far I have done this:

import matplotlib
import matplotlib.pyplot as plt

# http://matplotlib.org/examples/api/colorbar_only.html
# http://matplotlib.org/api/colors_api.html#matplotlib.colors.LinearSegmentedColormap

# The lookup table is generated using linear interpolation for each
# primary color, with the 0-1 domain divided into any number of segments.
# x, y0, y1
cdict = {'red': [(0.0, 0.0, 0.0),
                  (0.5, 1.0, 1.0),
                  (1.0, 1.0, 1.0)],

        'green': [(0.0, 0.0, 0.0),
                  (0.25, 0.0, 0.0),
                  (0.75, 1.0, 1.0),
                  (1.0, 1.0, 1.0)],

        'blue': [(0.0, 0.0, 0.0),
                  (0.5, 0.0, 0.0),
                  (1.0, 1.0, 1.0)]}

# create colormap
my_cmap = matplotlib.colors.LinearSegmentedColormap("my_colormap", cdict,
                                                    N=256, gamma=1.0)

# optional: register colormap
#plt.register_cmap(name='my_colormap', data=cdict)

fig = plt.figure(figsize=(5,1))
fig.subplots_adjust(top=0.99, bottom=0.01, left=0.2, right=0.99)
plt.axis("off")
import numpy as np
a = np.linspace(0, 1, 256).reshape(1,-1)
a = np.vstack((a,a))
plt.imshow(a, aspect='auto', cmap=my_cmap, origin='lower')

plt.show()

The tricky part still remains. I have a varying number (about 500,
and increasing) of values between 60 and 90. Those values must be
represented in the colorbar: white where there is no value, shading from
blue towards black the more values fall in the same area.
For this, I guess, I have to set an x for each value (three times over,
since the color is built from R, G and B). The closer a value is to the
previous one, the further the color should move from blue towards black.

Or do you suggest another way to implement this?
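One alternative, sketched under assumptions rather than taken from the thread: bin the values first and let a white-blue-black colormap color the counts, instead of placing one x per value. The helper name density_strip and the default of 60 bins are hypothetical; LinearSegmentedColormap.from_list, numpy.histogram and imshow are existing calls.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, Normalize


def density_strip(values, lo=60, hi=90, bins=60, vmax=None):
    """Draw a one-row strip: white = no values, blue to black = many values."""
    counts, _ = np.histogram(values, bins=bins, range=(lo, hi))

    # Evenly spaced white -> blue -> black scale.
    cmap = LinearSegmentedColormap.from_list(
        "white_blue_black", ["white", "blue", "black"])

    # Use the caller's nominal max if given, else this data set's own max.
    top = vmax if vmax is not None else max(int(counts.max()), 1)
    norm = Normalize(vmin=0, vmax=top, clip=True)

    fig, ax = plt.subplots(figsize=(5, 1))
    ax.imshow(counts[np.newaxis, :], aspect="auto", cmap=cmap, norm=norm,
              extent=(lo, hi, 0, 1), interpolation="nearest")
    ax.set_yticks([])
    return fig, ax, counts
```

Calling density_strip(values) and then plt.show() should display the strip. Passing a fixed vmax keeps the scale comparable as more values arrive.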

I do not know of any other software in which this has been implemented.

cheers!

On 10/26/2012 07:47 PM, Chloe Lewis wrote:

You'll be doing something like the second color bar, but making the
boundary and color definitions a lot more flexible. Where the discrete
color bar uses

cmap = mpl.colors.ListedColormap(['r', 'g', 'b', 'c'])
bounds = [1, 2, 4, 7, 8]

you'll be making a whole LinearSegmentedColormap, see

http://matplotlib.org/api/colors_api.html#matplotlib.colors.LinearSegmentedColormap

and check out specifically the ascii-art explanation of interpolation between row[i] and row[i+1]. Red, green, blue will break based on your data density and how you want to express 'intensity'. And depending on whether you'll make it red-green-colorblindness neutral!
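A small sketch of that interpolation rule, with made-up segment values: each row is (x, y0, y1), and between row[i] and row[i+1] the channel runs from row[i]'s y1 to row[i+1]'s y0. Giving a row different y0 and y1 therefore produces a jump at that x.

```python
from matplotlib.colors import LinearSegmentedColormap

# Illustrative segment data (not from the thread).
cdict = {
    # Red falls from 1.0 to 0.5 over the first half, then jumps to 0.0
    # at x = 0.5 (y0 = 0.5, y1 = 0.0) and stays there.
    "red": [(0.0, 1.0, 1.0), (0.5, 0.5, 0.0), (1.0, 0.0, 0.0)],
    # Green falls smoothly from 1.0 to 0.0 over the whole range.
    "green": [(0.0, 1.0, 1.0), (1.0, 0.0, 0.0)],
    # Blue is constant.
    "blue": [(0.0, 1.0, 1.0), (1.0, 1.0, 1.0)],
}
demo_cmap = LinearSegmentedColormap("demo", cdict, N=256)

low = demo_cmap(0.25)   # RGBA tuple sampled below the jump
high = demo_cmap(0.75)  # RGBA tuple sampled above the jump
print(low, high)
```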

Interesting problem. Has it been implemented in some other software?

Chloe Lewis
PhD candidate, Harte Lab
Division of Ecosystem Sciences, ESPM
University of California, Berkeley
137 Mulford Hall
Berkeley, CA 94720
chlewis@...1016... <mailto:chlewis@…1016…>

Hi!

I think a histogram isn't what I need, because it does not matter
when (at what time) the values between 60 and 90 were "created". Only
the values and the number of values are important.

Also, I'm not sure a colormap is required. In the end I want only one
color (blue) in a rectangle whose appearance changes with the density.
So I guess bluescale is the right way, as you suggested.

And as you said, the min density value would be 60 and the max density
value would be 90. I think I will keep those values fixed.

As you said, the "colorbar" might not mean the same thing at a later
time. This is no problem; it is basically the goal.

Cool, I guess this is the concept to be implemented. I'm searching for
ways to bluescale with matplotlib.

I have hacked together a little code snippet. I think it does what I
want, except for one thing. Some of the little "elements" I draw
overlap and I don't know why. I print the values plotted on the x-axis
to the console. As you can see, the x-coordinates do not overlap. Does
anyone know what the problem is?

from matplotlib.ticker import MultipleLocator
import numpy as np
import matplotlib.pyplot as plt
import random
import array
from pylab import gca

# Source:

#CONSTANTS
NPOINTS = 100
COLOR='blue'
RESFACT=10
MAP='winter' # choose carefully, or color transitions will not appear smooth
FIGRES=111.0 # figure size: must be float!

# create random data
np.random.seed()
x = []
tmp = 0
while tmp < NPOINTS:
    x.append(random.randrange(60, 98, 1))
    tmp = tmp+1

fake_y_array = np.array([0])
a = 0
while a < NPOINTS-1:
    fake_y_array = np.append(fake_y_array, 0)
    a = a+1
y = fake_y_array
x = sorted(x)

#print x

fig = plt.figure()
ax4 = fig.add_subplot(int(FIGRES)) # high resolution alpha

npointsHiRes = len(x)

stats = dict()
for index in x:
    #stats.insert(index, stats[index] + 1)
    try:
        stats[index] = stats[index] + 1
    except:
        stats[index] = 1

print(stats)

# alpha is the transparency parameter
# based on the more values we have, the smaller is the
# difference between transparency per element in the graph
# we multiply this alpha with a given factor to make the elements
# appear visible enough for the human eye
alpha_steps = (1.0/len(stats))*1

for i in range(npointsHiRes):
    #print 'x: ' + str(x[i])
    #print 'stats: ' + str(stats[x[i]]) + ' (a) ' + str(alpha_steps*stats[x[i]])

    if i > 0 and x[i] == x[i-1]:
        # skip this round because we already
        # have drawn one element
        # and based on its transparency
        # it is expressed how many times this
        # value exists
        continue

    ytmp = y[i:i+2]
    xtmp = x[i:i+2]
    try:
        if xtmp[0] == xtmp[1]:
            # ok if they equal, we cannot draw a visible line
            # +1 to draw it..
            xtmp[1] = xtmp[1]+1
    except IndexError:
        # last element is sometimes "alone"
        # so we add an effectively last one
        # to draw it with eye-visibility
        xtmp.append(xtmp[0]+1)
        a = np.array([0])
        ytmp = np.hstack((ytmp, a))

    ax4.plot(xtmp,ytmp,
             alpha=alpha_steps*stats[x[i]],
             color=COLOR, lw=100)
    # drawstyle: [ 'default' | 'steps' | 'steps-pre' | 'steps-mid' | 'steps-post' ]

ax4.set_xlim(min(x),max(x)+1)

gca().xaxis.set_major_locator(MultipleLocator((round(len(x)/FIGRES))*2))
gca().yaxis.set_major_locator(MultipleLocator())

plt.grid(True)

#fig.savefig('gradColorLine.png')
plt.show()
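One possible explanation for the overlap, offered as an assumption and not confirmed anywhere in this thread: lw is measured in points, not in data units, and with matplotlib's default projecting cap style a 100-point-wide line extends past its end points, so neighbouring elements can cover each other even when their x-coordinates do not. A sketch that avoids thick lines by drawing one rectangle per distinct value with axvspan follows. The helper name draw_density_spans and the width of 1.0 are hypothetical, and it assumes a non-empty list of values.

```python
from collections import Counter

import matplotlib.pyplot as plt


def draw_density_spans(ax, values, color="blue", width=1.0):
    """Draw one full-height span per distinct value; alpha encodes its count."""
    counts = Counter(values)
    peak = max(counts.values())
    for value, n in sorted(counts.items()):
        # Span edges are in data coordinates, so adjacent spans touch
        # but do not overlap.
        ax.axvspan(value, value + width, alpha=n / peak,
                   color=color, linewidth=0)
    ax.set_xlim(min(counts), max(counts) + width)
    ax.set_yticks([])
    return counts
```

For example, fig, ax = plt.subplots(); draw_density_spans(ax, x); plt.show() would draw the sorted list x from the script above.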

···

On 11/06/2012 01:25 AM, Chloe Lewis wrote:

You're translating a histogram of your data into a colormap, yes?

The matplotlib histogram returns bins and patches, which you could translate into color intensities; but I bet scipy.stats.histogram would be easier. Then the bin centers are the segment boundaries of the colormap, and the weight in each bin is the respective color intensity.

Also, color has a finite extent but the bin weight might not. You'll need to choose a nominal max value to norm the colors to, and decide whether to use the same max value all the time (so early plots might all be light, late plots all dark) or calculate it from the data each time you plot (in which case the colorbar this month might not mean the same thing as the color bar last month).

I think using all three of RGB is too confusing -- do it bluescale or grayscale.

&C

I think a histogram isn't what I need, because it does not matter
when (at what time) the values between 60 and 90 were "created". Only
the values and the number of values are important.

You can make the values the independent axis of the histogram.
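A short sketch of what that means, with made-up sample values: the bins run along the value range (60 to 90), so time never enters the histogram.

```python
import numpy as np

# Made-up sample values (illustrative only).
values = np.array([61, 61, 75, 75, 75, 90])

# One bin per integer value: [60, 61), [61, 62), ..., [90, 91].
edges = np.arange(60, 92)
counts, edges = np.histogram(values, bins=edges)

# counts[k] is the number of samples equal to 60 + k.
for value, n in zip(edges[:-1], counts):
    if n:
        print(value, n)
```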

Also, I'm not sure a colormap is required. In the end I want only one
color (blue) in a rectangle whose appearance changes with the density.
So I guess bluescale is the right way, as you suggested.

I think a custom blue scale can only be done as a custom colormap, but R and G will be constant throughout.
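A minimal sketch of such a colormap, with R and G held constant and only B varying. The function name blue_only_cmap and its defaults are hypothetical.

```python
from matplotlib.colors import LinearSegmentedColormap


def blue_only_cmap(red=0.0, green=0.0, name="blue_only"):
    """Colormap in which only the blue channel changes."""
    cdict = {
        # Red and green keep the same value over the whole 0-1 domain.
        "red": [(0.0, red, red), (1.0, red, red)],
        "green": [(0.0, green, green), (1.0, green, green)],
        # Blue ramps linearly from 0 to 1.
        "blue": [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    }
    return LinearSegmentedColormap(name, cdict, N=256)
```

With the defaults this runs from black to blue. Starting from white would need R and G to fall from 1 to 0 as well, since white is (1, 1, 1) and pure blue is (0, 0, 1). The built-in "Blues" colormap, available through plt.get_cmap("Blues"), may also be worth a look.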

&C