[ANN, x-post] Creating a space for scientific open source at Berkeley (with UW and NYU)

Hi folks,

Forgive me for the x-post to a few lists and the semi off-topic nature of this post, but I think it’s worth mentioning this to our broader community. To keep the SNR of each list high, I’d prefer any replies to happen on the numfocus list.

Yesterday, during an event at the White House OSTP, an announcement was made about a 5-year, $37.8M initiative funded by the Moore and Sloan foundations to create a collaboration between UC Berkeley, the University of Washington and NYU on Data Science environments:

We worked in private on this for a year, so it’s great to be able to finally engage the community in an open fashion. I’ve provided some additional detail in my blog:


At Berkeley, we are using this as an opportunity to create the new Berkeley Institute for Data Science (BIDS):


and from the very start, open source and the scientific Python ecosystem have been at the center of our thinking. In the team of co-PIs we have, in addition to me, a bunch of Python supporters:

  • Josh Bloom leads our Python bootcamps and graduate seminar.

  • Cathryn Carson founded the DLab (dlab.berkeley.edu), which runs python.berkeley.edu.

  • Philip Stark: Stats Chair, teaches reproducible research with Python tools.

  • Kimmen Sjolander: comp. biologist whose tools are all open source Python.

  • Mike Franklin and Ion Stoica: co-directors of AMPLab, whose Spark framework has Python support.

  • Dave Culler: chair of CS, which now uses Python for its undergraduate intro courses.

We will be working very hard to make BIDS “a place for people like us” (and by that I mean open source scientific computing, not just Python: Julia, R, etc. are equally welcome). This is a community that has a significant portion of academic scientists who struggle with all the issues I list in my post, and solving that problem is an explicit goal of this initiative (in fact, it was the key point identified by the foundations when they announced the competition for this grant).

Beyond that, we want to create a space where the best of academia, the power of a university like Berkeley, and the best of our open source communities can come together. We are just barely getting off the ground, deep in more mundane issues like building renovations, but over the next few months we’ll be clarifying our scientific programs, starting to have open positions, etc.

Very importantly, I want to thank everyone who, for the last decade+, has been working like mad to make all of this possible. It’s absolutely clear to me that the often unrewarded work of many of you was essential in this process, shaping the very existence of “data science” and the recognition that it should be done in an open, collaborative, reproducible fashion. Consider this event an important victory along the way, and hopefully a starting point for much more work in slightly better conditions.

Here are some additional resources for anyone interested:



Fernando Perez (@fperez_org; http://fperez.org)

fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail