Fukushima from Afar

Mike Miller

April 02, 2011

In the past month I had the all-too-rare opportunity to do something that was both terrifically exciting and, I believe, deeply meaningful. On Friday, March 11, I awoke in Seattle to the news of the devastating earthquake and tsunami that had struck Japan. As the day went on, the true magnitude of the devastation emerged, and it was clear that sobering stories would continue to unfold for quite a while.

I was also surprised by the emerging media coverage of the Fukushima reactor complex. The Japanese are absolutely top-tier in both science and engineering, and the very fact that Tokyo suffered so little damage from a 9.0 earthquake (imagine if that happened here!) suggested that they were equally prepared for the aftermath. To a large extent, they were, just not for a series of events so powerful that within days it was clear the Fukushima situation had changed dramatically.

In this era of real-time data, it occurred to me just how hard it was for the educated public to gain access to the right data. What was the level of radiation being released by Fukushima? Was it enough to be dangerous to the people in Japan? Was it enough to impact our sensitive (and expensive!) physics experiments here in Seattle? Was it enough to be a public health concern in the US?

These questions, plus a little prodding from senior faculty at UW, set the wheels in motion. If information was going to remain so hard to find, then maybe we could do something about it. We are physicists, after all. Emails were sent, grad students were diverted from thesis writing and qualifying exam preparation, and by Monday we had converted my lab at UW from Dark Matter detector R&D into an air radioactivity monitoring lab, complete with some very expensive, one-of-a-kind equipment and the requisite quantities of duct tape and zip-ties holding everything together. This was no small feat, and it never would’ve happened without a superb team and the right tools.

We knew that any released radiation would be heavily diluted in transit to the West Coast, so we had to (i) find a way to sample enormous quantities of air (nearly 150,000 cubic meters per day) and (ii) shield our radiation counter and air samples from the copious amounts of normal background activity. A little-known (and little-appreciated) fact is that natural radiation is part of daily life, from radon decay products to bananas, and it would likely drown out any faint signal from Fukushima.

By Wednesday we had started recording data, and then it suddenly hit me: we were going to produce a lot of data, and it was already coming in. The data had to be catalogued, processed, condensed, and distributed; data models had to be modified and the data reprocessed and redistributed; and we were learning as we went along. Not only was the data important to us and our experiments, but the public was hungry for real-time results (that had to be correct!), so we didn’t have the luxury of writing a spec, reviewing it, prototyping, building it, or following any of the typical processes. We were already running TV crews through the lab daily and time was short, so I turned to the tool I happened to know best: open source BigCouch, specifically the free hosted version on Cloudant.
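Why sample such enormous volumes of air? The relationship between airborne activity and detector counts can be sketched in a few lines of Python. This is a simplified model, not the UW group's analysis. It assumes that every radioactive atom in the sampled air is trapped on the filter, that decay during sampling and counting can be ignored, and that detector efficiency and gamma-ray emission probability are single numbers (the values used below are hypothetical). Under these assumptions, expected signal counts scale linearly with sampled volume, so 100 times more air gives 100 times more signal for the same concentration.

```python
def expected_counts(concentration_bq_m3: float, air_volume_m3: float,
                    efficiency: float, gamma_intensity: float,
                    live_time_s: float) -> float:
    """Expected full-energy peak counts from a filter sample (simplified model).

    Assumes all activity in the sampled air is trapped on the filter and
    ignores radioactive decay during sampling and counting.
    """
    activity_bq = concentration_bq_m3 * air_volume_m3  # decays per second on the filter
    return activity_bq * efficiency * gamma_intensity * live_time_s


def activity_concentration(net_counts: float, efficiency: float,
                           gamma_intensity: float, live_time_s: float,
                           air_volume_m3: float) -> float:
    """Invert expected_counts: airborne concentration in Bq/m^3 from net peak counts."""
    activity_bq = net_counts / (efficiency * gamma_intensity * live_time_s)
    return activity_bq / air_volume_m3


if __name__ == "__main__":
    # Hypothetical detector parameters, for illustration only.
    eff, intensity, live_time = 0.05, 0.8, 3600.0
    for volume in (1_500.0, 150_000.0):
        n = expected_counts(1e-3, volume, eff, intensity, live_time)
        print(f"{volume:>9.0f} m^3 -> {n:8.1f} counts")
```

Shielding (point ii) attacks the other side of the same ratio: it lowers the background counts that the signal must stand out against.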

We began putting together a real-time BigCouch workflow that ultimately enabled complex processing (data transformation and aggregation, signal processing, statistical analyses and the like), real-time data monitoring and visualization, and collaborative analyses, and that let us serve the results directly to the public from the database. At the end of this post I will briefly review my BigCouch experiences, both good and bad.
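The incremental MapReduce idea behind such a workflow can be illustrated with a small Python emulation. This is not BigCouch's implementation: real CouchDB views are usually written in JavaScript and cache reductions inside B-tree nodes. It only mimics the contract. A map function emits key/value rows per document, only new or changed documents are re-mapped, and a reduce function must give the same answer whether it combines raw values or earlier partial results (`rereduce`). The document fields `type`, `detector`, `timestamp`, and `roi_counts` are a hypothetical schema.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Tuple[List[Any], Any]  # (key, value) pair emitted by a map function


def map_spectrum(doc: Dict[str, Any]) -> Iterator[Row]:
    """Emit ([detector, hour], counts) for each spectrum document (hypothetical schema)."""
    if doc.get("type") == "spectrum":
        yield [doc["detector"], doc["timestamp"][:13]], doc["roi_counts"]


def reduce_sum(keys: Optional[list], values: List[Any], rereduce: bool) -> Any:
    """Summing works the same on raw values and on partial sums, so it is rereduce-safe."""
    return sum(values)


class IncrementalView:
    """Toy emulation of an incremental map/reduce view (not CouchDB's B-tree)."""

    def __init__(self, map_fn: Callable[[Dict[str, Any]], Iterator[Row]],
                 reduce_fn: Callable[[Optional[list], List[Any], bool], Any]) -> None:
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.rows_by_doc: Dict[str, List[Row]] = {}  # doc id -> rows it emitted

    def update(self, doc: Dict[str, Any]) -> None:
        """Re-run map for one new or changed document; all other rows are reused."""
        self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))

    def query_grouped(self) -> Dict[Tuple[Any, ...], Any]:
        """Group by full key: reduce each document's rows, then rereduce the partials."""
        partials: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
        for rows in self.rows_by_doc.values():
            per_key: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
            for key, value in rows:
                per_key[tuple(key)].append(value)
            for key, values in per_key.items():
                # Simplification: CouchDB passes [key, doc_id] pairs as `keys`.
                partials[key].append(self.reduce_fn([key] * len(values), values, False))
        return {key: self.reduce_fn(None, vals, True) for key, vals in partials.items()}
```

Because only the changed document is re-mapped, adding one new spectrum does not require re-reading every earlier one. CouchDB goes further and also caches the reductions themselves, which this toy recomputes on each query.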

On Friday, March 18, I had just placed the overnight air filter sample into our monitoring station and was adding some signal processing to our workflow. After 20 minutes of counting I happened to look at the computer screen to check the real-time data acquisition histogram, and there it was, clear as day: the first arrival of 131-I in the Seattle air. I spent about 10 minutes double-checking, looking for other characteristic fission products in the spectra, and then realized we had something unique.
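Checking whether a peak stands out at a characteristic energy is commonly done by counting a region of interest (ROI) and subtracting a background estimated from sidebands on either side. The Python sketch below shows that generic technique; it is not necessarily the signal processing used in this experiment. It assumes a linear energy calibration with hypothetical offset and gain, a locally flat background, and Poisson counting statistics. The 364.5 keV value is the principal gamma-ray line of 131-I.

```python
import math
from typing import Sequence, Tuple

Range = Tuple[int, int]  # half-open channel interval [start, stop)


def energy_to_channel(energy_kev: float, offset_kev: float, gain_kev_per_ch: float) -> int:
    """Map an energy to a channel using a linear calibration E = offset + gain * channel."""
    return round((energy_kev - offset_kev) / gain_kev_per_ch)


def _width(r: Range) -> int:
    return r[1] - r[0]


def net_peak_counts(spectrum: Sequence[int], peak: Range,
                    left: Range, right: Range) -> Tuple[float, float]:
    """Return (net counts, 1-sigma uncertainty) in `peak` after subtracting a
    flat background estimated from the `left` and `right` sidebands."""
    gross = sum(spectrum[peak[0]:peak[1]])
    side = sum(spectrum[left[0]:left[1]]) + sum(spectrum[right[0]:right[1]])
    scale = _width(peak) / (_width(left) + _width(right))  # sideband -> peak-width scaling
    net = gross - scale * side
    sigma = math.sqrt(gross + scale**2 * side)  # Poisson variances added
    return net, sigma


def is_significant(net: float, sigma: float, n_sigma: float = 3.0) -> bool:
    """Flag a peak whose net counts exceed n_sigma standard deviations."""
    return net > n_sigma * sigma


if __name__ == "__main__":
    # Hypothetical calibration: 0.5 keV per channel, zero offset.
    print(energy_to_channel(364.5, 0.0, 0.5))
```

With that hypothetical calibration the 131-I line falls near channel 729, and the peak window and sidebands would be placed around it. Confirming other fission products, as described above, amounts to repeating the same test at their characteristic energies.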

Racing down the hall, I gathered a team, and the following week was an absolute blur. That Friday night, around 10, we knew that we had unambiguously detected fission fragments from Fukushima, and we had a good handle on the total radioactivity levels. While the levels were far below any local health concern (roughly 1000 times below the EPA limits), they were higher than I had personally expected, and we suddenly knew that the situation in Japan was indeed serious. Our thoughts go out to those who live near Fukushima, as well as to the heroic workers at the plant as they fight to contain the incident.

As the week progressed, we saw the radiation levels peak, slowly decline, and peak again, as we detected events roughly 7 days after they happened in Japan. Within a day of our initial sighting we had posted the numbers and plots online, and they continued to update in real time as the situation unfolded. By the middle of the week we had posted the first publication draft online, and we are now watching the radiation levels drop below the limit of detectability of our experiment. If you are interested in the details, the pieces in the Washington Post, Nature News, and the New York Times were particularly well written, in my opinion.

Before closing, I want to briefly review the pros and cons of using BigCouch/Cloudant for this critical application.

Pros

  • Flexibility. The ability to smoothly handle both structured and unstructured data was critical. The document philosophy allowed us to be agile with data types (metadata + binary data side-by-side) and data models (sometimes changing as often as 20 times per day).

  • Cloud. Not having to install and maintain my own database server for this purpose was a huge win.

  • Scalability. This application isn’t the world’s biggest data producer, but it’s no slouch either. We were able to easily store and process terabytes of data without breaking a sweat, and it was nice to know that we wouldn’t be limited by scaling concerns in the future.

  • Real-time Analytics. We were able to easily insert our domain-specific libraries/applications into an incremental MapReduce workflow to provide up-to-the-minute results.

  • Collaborative. The database is a web server that speaks JSON over HTTP, so we were immediately able to collaborate as a team, and it integrates simply with other standard tools for client-side visualization, etc.

  • Thin client applications. I was able to write ultra-lightweight client-side JavaScript applications for data monitoring and presentation. These applications were typically around 100 lines of code and were served directly from the cloud to the browser, which meant I didn’t have to set up my own web-tier stack, nor learn Rails, Django, etc. This has been on my list of “things to learn” for a year, and this project pushed me to finally do it.
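Because the database speaks JSON over HTTP, a thin client mostly needs to build documents and request URLs. The Python sketch below illustrates both halves with standard CouchDB conventions: an inline attachment (base64 `data` under `_attachments`) keeps metadata and a binary spectrum side by side in one document, and a view is read from `/{db}/_design/{ddoc}/_view/{view}` with JSON-encoded key parameters. The host, database, design document, view, and field names are hypothetical, and the clients described above were JavaScript running in the browser; this sketch only builds URLs and sends no requests.

```python
import base64
import json
import struct
from typing import Any, Dict, Sequence
from urllib.parse import quote, urlencode

# CouchDB expects these view query parameters as JSON values.
JSON_PARAMS = {"key", "keys", "startkey", "endkey", "start_key", "end_key"}


def make_spectrum_doc(doc_id: str, detector: str, timestamp: str,
                      counts: Sequence[int]) -> Dict[str, Any]:
    """Build a document with metadata fields plus the raw spectrum as an inline attachment."""
    raw = struct.pack(f"<{len(counts)}I", *counts)  # little-endian uint32 per channel
    return {
        "_id": doc_id,
        "type": "spectrum",
        "detector": detector,
        "timestamp": timestamp,
        "_attachments": {
            "spectrum.bin": {
                "content_type": "application/octet-stream",
                "data": base64.b64encode(raw).decode("ascii"),
            }
        },
    }


def view_url(base: str, db: str, ddoc: str, view: str, **params: Any) -> str:
    """URL for GET-ing a view; key-like and boolean parameters are JSON-encoded."""
    path = "/".join(quote(part, safe="") for part in (db, "_design", ddoc, "_view", view))
    query = {name: json.dumps(value) if name in JSON_PARAMS or isinstance(value, bool) else value
             for name, value in params.items()}
    url = f"{base.rstrip('/')}/{path}"
    return f"{url}?{urlencode(query)}" if query else url


if __name__ == "__main__":
    print(view_url("https://example.cloudant.com", "airmon", "spectra", "by_hour",
                   group_level=2, startkey=["ge1", "2011-03-18"], endkey=["ge1", "2011-03-19"]))
```

A browser client would issue a GET on such a URL and plot the returned JSON rows directly, which is what keeps these applications small.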

Now, the things that can definitely be improved:

  • Analytics integration. As with Hadoop, building a BigCouch analytics workflow is a bit raw and has a substantial learning curve. This experience made me realize that there is tremendous low-hanging fruit in tools that bridge that gap for the average developer and/or analyst.

  • CouchApp. The client-side JS model is a huge win here, and the couchapp tool was a big help in turning my JS and HTML files immediately into an application in the cloud. However, I had to strip out a significant amount of unused baggage to make sense of it all. Further, there is still some work to be done to support rich applications that securely allow not just data reads from the client (which currently work well) but also writes.

In summary, BigCouch was a key enabling technology, and I will definitely use it for future large scale experiments. In this case, it helped maximize the value in a very time-sensitive big data problem.

Our thoughts go to the people nearest Fukushima. We are fortunate that the radioactivity in the US is so low. I feel both proud and humbled to have been able to contribute in some way, by bringing scientific information to the public in a time of great uncertainty.
