Muck and Mystery
   Loitering With Intent
blog - at - crumbtrail.org
January 12, 2010
Tuple Space

The advantages of interdisciplinary science are getting more attention.

Some of the most important research of the last quarter-century, the authors argue, has resulted from "synthetic science" —an approach which combines concepts, tools, and data from multiple disciplines to produce new insights or discoveries.

They cite the work of J. John Sepkoski Jr., who over a 20-year period compiled a database of more than 37,000 entries tracking the first and last appearance of different organisms in the fossil record. The entries, they write, "cut across taxa, time, and geography to reveal emergent patterns over more than 500 million years of life that could not be extracted from the component data in isolation."

"That database led to previously undetermined knowledge of five separate mass extinctions through time, understanding of how major geologic events can increase or reduce biodiversity, the realization that near-shore environments produce a disproportionately large share of evolutionary novelty, and other findings," Sidlauskas said. "It also spawned a new field of synthetic paleobiology."

Increasing specialization within disciplines brings opportunities for "interdisciplinary" science within each discipline as well as across disciplines. But there are impediments.

There are a number of cultural barriers to pursuing this kind of science, the researchers say. For one, it is difficult for young scientists to find appropriate training. In addition, peer review and journal publication tend to emphasize the analysis of new data rather than old, they argue. Funding from state and federal agencies is more frequently directed toward more conventional approaches, not to mention the institutional challenges with job searches, promotion and tenure - all of which are geared toward more traditional science.

The technological barriers also are daunting, but offer tantalizing potential, Sidlauskas said.

"When you're looking to synthesize data from several hundred individual studies, data formatting, storage, and accessibility become huge issues," he said. "There has been a growing movement by funding agencies and journals to permanently archive all raw data and materials in some kind of standardized format so they are not lost over time and can be used by researchers of the future."

"It's kind of an open-source approach to science," he added. "Data archives may require some kind of proprietary protection for a few months or years, but after a certain amount of time, they should become public domain. Only by saving the data that underlie today's science will we allow future scientists to use those data in ways that may far exceed what the original researchers envisioned."

I would argue that this is the wrong approach. It is too data dependent. You don't need a single database or a single data format. You shouldn't need or even want to know the location or format of the data. What you want is a set of methods that export APIs with defined outputs. Each method knows how to find its data and serve it in the defined format. You end up with a virtual database that does not actually exist in any one place but that you can access as if it were on your personal workstation or local network.
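
A minimal sketch of the idea in Python, with everything in it assumed for illustration: the records are invented, the two "sources" are just strings standing in for data held in different places and formats, and the names csv_method, json_method and VirtualDatabase are hypothetical. The only thing the caller relies on is the defined output, a row with genus, first_ma and last_ma fields.

```python
import csv
import io
import json

# Invented records in two unrelated formats. In practice these would live in
# different places (a file, a web service, somebody's spreadsheet).
CSV_DATA = "genus,first_ma,last_ma\nAlpha,500,440\nBeta,300,250\n"
JSON_DATA = '[{"name": "Gamma", "range": [120, 90]}]'


def csv_method():
    """Find the CSV-backed data and serve it in the defined output format."""
    for rec in csv.DictReader(io.StringIO(CSV_DATA)):
        yield {
            "genus": rec["genus"],
            "first_ma": float(rec["first_ma"]),
            "last_ma": float(rec["last_ma"]),
        }


def json_method():
    """Same defined output, completely different underlying layout."""
    for rec in json.loads(JSON_DATA):
        first, last = rec["range"]
        yield {"genus": rec["name"], "first_ma": float(first), "last_ma": float(last)}


class VirtualDatabase:
    """Holds no data itself; rows exist only while the methods serve them."""

    def __init__(self, *methods):
        self._methods = list(methods)

    def rows(self):
        for method in self._methods:
            yield from method()


db = VirtualDatabase(csv_method, json_method)
genera = sorted(row["genus"] for row in db.rows())
print(genera)
```

The caller iterates over db.rows() and never learns that one source was CSV and the other JSON, or where either one lives.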

Conceptually, the user of such a database sees a simple relational database, though it can also be seen as an associative memory. It isn't restricted to accessing data from a variety of locations and formats. It also holds data that doesn't exist anywhere: values computed on the fly and served as if they were just another row/column element. The computation can be a simple rollup, a sum or total for example, but it can also be a more sophisticated computation or inference.
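
Roughly, and again as a sketch with invented records and hypothetical names (STORED, COMPUTED, match, total), a computed column and an associative lookup could look like this. The duration_ma value is stored nowhere; it is worked out each time a row is served, yet it can be matched on and rolled up like any other column.

```python
# Invented stored records: first and last appearance, in millions of years ago.
STORED = [
    {"genus": "Alpha", "first_ma": 500.0, "last_ma": 440.0},
    {"genus": "Beta", "first_ma": 300.0, "last_ma": 250.0},
]


def duration_ma(row):
    """Computed on the fly: how long the genus persisted."""
    return row["first_ma"] - row["last_ma"]


# Columns that exist only as computations, keyed by column name.
COMPUTED = {"duration_ma": duration_ma}


def rows():
    """Serve stored and computed fields together as ordinary rows."""
    for stored in STORED:
        row = dict(stored)
        for name, compute in COMPUTED.items():
            row[name] = compute(stored)
        yield row


def match(**pattern):
    """Associative lookup: rows whose fields equal every given value."""
    return [r for r in rows() if all(r.get(k) == v for k, v in pattern.items())]


def total(column):
    """A simple rollup over any column, stored or computed."""
    return sum(r[column] for r in rows())


print(match(duration_ma=50.0))
print(total("duration_ma"))
```

Here match(duration_ma=50.0) finds the Beta row by a value that was never stored, which is the associative memory view of the same table.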

Bringing a new database into such a system would involve writing methods rather than reformatting data. If the underlying data format changes, the method changes, rather than every application that thought it knew the data format and location.
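
To make that concrete, here is a small sketch under assumed conditions: one invented source that migrates from CSV to JSON, and one hypothetical application function, longest_lived, that depends only on the served fields. The names serve_v1 and serve_v2 are made up for the example.

```python
import csv
import io
import json


def longest_lived(serve):
    """Application code: knows the served fields, not the storage format."""
    best = max(serve(), key=lambda r: r["first_ma"] - r["last_ma"])
    return best["genus"]


# The source as originally stored (invented data).
OLD_STORE = "genus,first_ma,last_ma\nAlpha,500,440\nBeta,300,250\n"


def serve_v1():
    for rec in csv.DictReader(io.StringIO(OLD_STORE)):
        yield {
            "genus": rec["genus"],
            "first_ma": float(rec["first_ma"]),
            "last_ma": float(rec["last_ma"]),
        }


# The same source after its keepers switch to a different layout.
NEW_STORE = '{"Alpha": {"range": [500, 440]}, "Beta": {"range": [300, 250]}}'


def serve_v2():
    for genus, rec in json.loads(NEW_STORE).items():
        first, last = rec["range"]
        yield {"genus": genus, "first_ma": float(first), "last_ma": float(last)}


before = longest_lived(serve_v1)
after = longest_lived(serve_v2)
print(before, after)
```

Only the serving method was rewritten; longest_lived is untouched and returns the same answer before and after the change.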

Ho hum, decades-old ideas, but when we hear talk of Heroes of Science who spend 20 years compiling a database, one wonders if this is not another opportunity for synthetic science. Those biologists need to work with some computer scientists, or at least a few teenage hackers.

Posted by back40 at 04:53 PM | Tools
