From Glass to Gigabytes

In Building D at the Harvard College Observatory, there is a record of the universe. This particular version – reasonably complete – is made of glass and it weighs about 300 tons. Less a collection than a coalescence, the Harvard plate stacks contain roughly 525,000 photographs in all. These images of the night sky were taken from observatories as far-flung as New Zealand, Peru, and South Africa.

Storage of astronomical plates at the Harvard College Observatory. Image source.

This data collection represents the congealed labor of hundreds of astronomers working over several decades and millions of hours of travel and work. The oldest images are daguerreotypes dating to before the American Civil War; the most recent photographs were taken at the end of the Cold War.

Halley’s comet taken on April 21, 1910 from Arequipa, Peru with the 8-inch Bache Doublet, Voigtlander. The exposure was 30 minutes. Source.

The photographic emulsion on each of these plates gives information about the brightness and location of tens of thousands of different objects. Additional inspection of the plates provided more information and analysis. For example, consider the image below. This is a photo negative of the Large Magellanic Cloud. It was taken in January 1897 by an astronomer working at a Harvard-operated telescope in Arequipa, Peru. After the plate was developed, it circulated back to Cambridge for analysis. Each of the notations on the plate was made by one of the “women computers” that observatory director Edward C. Pickering employed. Each marking on the plate signals a star or other object of interest, some of which would be explored further.

This image of the Large Magellanic Cloud was taken in January 1897 by a Harvard astronomer working in Arequipa, Peru. Source.

Stars, planets, galaxies, along with the occasional comet or asteroid, were all captured on glass. Occasionally, non-astronomical oddities were recorded too.

Praying mantis recorded on January 10, 1925 in an image made at Bloemfontein, South Africa.

The sum total offers an analog record of the universe unmatched in terms of sky coverage and time span. The exposed plates – most are eight by ten inches in size – were shipped back to the HCO for preservation and storage. Only on rare occasions would one of the fragile plates circulate out again, perhaps traveling from HCO to another observatory. But, most of the time, the plates remained in Cambridge, archived in sturdy olive green cabinets. Astronomers wanting to use the collection had to travel to Cambridge. The collection’s librarian annotated the brown paper envelope each plate was kept in with additional details, creating “metadata” – where a plate was taken and by whom, which telescope was used, and perhaps who had found it of especial scientific value.

Envelope notations for one of HCO’s 525,000+ plates; this indicates when the image was made (1949) and what region of the sky was observed.

Logbooks maintained by observers recorded other important metadata. Here’s an example:

Logbook page from 1888. All of these journals are in the process of being digitized, thanks to the efforts of volunteer George Champine, who passed away in 2013.

Just as assembling the Harvard plate collection was time consuming and laborious, so was working with the items in it. Once a desired plate was located, a researcher would pore over it for hours with a high-magnification eyepiece to extract useful information from the data the plate recorded. As the data generated by astronomical instruments from the mid-1970s onward were increasingly “born digital” (and used as such), the analog photographic plates came to seem a wasting and unwieldy asset to many scientists.

For the Harvard plate collection, however, these issues of access, usefulness, and circulation are changing. Over the last decade, a group of professional and amateur astronomers have constructed and begun operating the Digital Access to a Sky Century @ Harvard (DASCH) project.

DASCH was conceived by astronomer Jonathan Grindlay and is carried out by a team of staff members, students, and volunteers. Its goal is to distill and condense – via a custom-built scanning machine and an automatic data-processing and calibration pipeline – the astronomical information contained in all of those glass plates into digital data.

To get to the heart of DASCH, you descend one of Building D’s tightly wound spiral staircases. Eventually, you get to a small climate-controlled room dominated by a specially designed digital scanning machine.

The DASCH machine; it can scan two 8″x10″ plates at a time.

The entire apparatus rests on a one-ton granite table to minimize vibration errors. A custom camera above the scanner bed stitches overlapping frames made of the photographic plate – some collected just a few decades after Charles Babbage produced a prototype “difference engine” to help process astronomical data – into a composite digital image.

The DASCH project’s final product will be a database, an archive of astronomical information publicly accessible on-line, containing the brightness and position of all the stars on all the HCO plates. When running at full capacity, the machine can process two plates simultaneously in less than two minutes, generating data equivalent to a DVD containing a typical Hollywood film. Eventually, the astronomical information contained in those 300 tons of glass will be refashioned into about 1500 gigabytes of processed, searchable, and available digital data.
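To get a feel for the scale of those numbers, here is a back-of-envelope sketch in Python. It is not part of the DASCH pipeline. The plate count and scan rate come from the figures above; the DVD size (4.7 GB, a single-layer disc) and the reading that one DVD's worth of data corresponds to one two-plate pass are my assumptions, and the function names are made up for illustration.

```python
import math

PLATES = 525_000        # plates in the collection (figure from the text)
PLATES_PER_PASS = 2     # the scanner takes two plates at a time
MINUTES_PER_PASS = 2.0  # upper bound: "less than two minutes" per pass
DVD_GB = 4.7            # ASSUMPTION: one single-layer DVD of data per pass


def scan_passes(plates: int = PLATES, per_pass: int = PLATES_PER_PASS) -> int:
    """Number of scanner passes needed to cover every plate."""
    return math.ceil(plates / per_pass)


def scan_days(passes: int, minutes_per_pass: float = MINUTES_PER_PASS) -> float:
    """Days of uninterrupted scanning, ignoring plate cleaning and handling."""
    return passes * minutes_per_pass / (60 * 24)


def raw_terabytes(passes: int, gb_per_pass: float = DVD_GB) -> float:
    """Raw image volume in terabytes (1 TB = 1000 GB)."""
    return passes * gb_per_pass / 1000


passes = scan_passes()
print(f"{passes:,} passes")
print(f"about {scan_days(passes):.0f} days of nonstop scanning")
print(f"about {raw_terabytes(passes):,.0f} TB of raw scans")
```

Under these assumptions the scanner alone would need on the order of a year of round-the-clock operation. In practice, cleaning and handling each plate presumably adds to that, which the sketch ignores.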

When I visited DASCH this past summer, I was reminded of two things: First, DASCH encourages us to keep in mind that astrophysics – like geology, paleontology and so forth – is a historical as well as observational science. Digital data archives for astronomy, besides rejuvenating “old” data, offer a “Janus-faced perspective” for scientists to look into the past while creating new data for the future.

Second, DASCH highlights the fact that sharing and circulation of data are the central activities in science. Without these, in fact, there is no science. However, sharing and circulation of data demands an increasing fraction of researchers’ time, money, and expertise. There is what Paul Edwards and others call “science friction,” an obstacle to overcome in order for data to move and do useful work. DASCH’s conversion of analog data into a digital format is one example of how this data friction is made less sticky.

The importance of sharing data is a concern that transcends specific institutions, individual research questions, and national boundaries. For all astronomers, it is, in both senses of the phrase, a universal concern.

Scientists as Customers?

Would Karl Marx smile and nod sagely if he observed how scientists do their work today? What would a business efficiency expert say to a scientist today? I had these thoughts while recently thumbing through a new issue of the pop science magazine Nautilus. Because, right on page 3, there’s this:

[Image: ESO advertisement in Nautilus]

The text at the bottom is hard to read so here’s a detail:

[Image: detail of the text at the bottom of the advertisement]

At first I thought nothing of it and just kept reading. But this announcement kept coming back to me, raising all sorts of questions. For example: at whom is this message aimed? Presumably not many readers of Nautilus will be jetting off to Chile to use the Very Large Telescope or any of the other science facilities the European Southern Observatory operates.

OK then, so this isn’t an advertisement to drum up visitors to Cerro Paranal or solicit proposals for telescope time.

No, something else is going on here. ESO’s advertisement must be read as a boast – it’s trumpeting the efficiency and effectiveness of its scientific facilities. Its observatories are, ESO claims, the “most productive” in the world. This is not the same as proclaiming that they produce the “best science,” which is a much harder claim to make.

This focus on productivity, and its close cousin, efficiency, got me thinking about Frederick Winslow Taylor. In 1911, Taylor published his book The Principles of Scientific Management.

Although little remembered today, it’s one of the 20th century’s most influential books. In it, Taylor laid out a philosophy of managing workers and work flow with the aim of solving some of that era’s labor problems (and making business more profitable). In short, he wanted to get manual laborers to do more work in the same amount of time. Workers, to put it mildly, objected to Taylor’s intrusion into their workplace. Moreover, in some cases, they proved that Taylor’s methods were anything but scientific. When you read today about managers monitoring the workplace, keeping track of key strokes, and recording service calls – thank Taylor.

Shift from scientific management to managing science. Until the 1990s, telescopes were most often operated in what’s called “classical mode.” You can picture the scene – astronomer at the telescope, late at night, alone, cold, heroically working to unravel the mysteries of the Universe. Something like this, although maybe without the coat and tie:

1936 image by Russell Porter of astronomer using the 200-inch at Palomar.

Fast forward 40 years… astronomers’ nightly work now looked very much like this. As I’ve written, computers changed everything about how astronomy was done.

Astronomer Caty Pilachowski, c. 1988, using 4-meter telescope at Kitt Peak.

Along with computers came the introduction in the 1990s of what’s known as queue observing. In fact, computers and computer models made this possible. We might think of this new way of doing science as an application to science of Taylor’s general goal of maximizing efficiency. Successful proposals for telescope time are put into an observatory’s queue and executed by staff astronomers when observing conditions are suitable. ESO operates its big facilities in Chile in this fashion, as do many other major observatories.1

Advocates of queue observing stress that it enables science facilities to be used more efficiently. This isn’t trivial when a night of observing time can cost upwards of $1 per second. Opponents of queue scheduling argued that this mode of doing science might produce a generation of researchers who were, as Karl Marx might have said, alienated from the means of production. As one scientist remarked in 1996, “I am really worried about the Nintendo mentality in astronomy.”

Decades earlier, physicists accepted arguments about cost-effective use. At a 1966 meeting at the Stanford Linear Accelerator, for example, Berkeley’s Luis Alvarez encouraged colleagues to think in terms of the number of interesting “events per dollar” produced by ever-more expensive Big Science machines.

By the late 1990s, queue scheduling had prevailed at places like the Very Large Telescope and the international Gemini Observatory. Coincident with this was a shift in language about the effective use of science facilities. Look at the questions posed at a meeting in the mid-1990s to discuss telescope use:

The choice of language here is striking. Astronomers are referred to as “customers” seeking a product. So, what’s the product? As Matt Mountain, currently director of the Space Telescope Science Institute, told me in an interview several years ago, “We produce high quality, corrected beams of light pointed at the right direction at good instruments and detectors and collect the data.”

Queue scheduling allows on-site observers to select observing programs that are best suited for prevailing weather conditions. Moreover, telescopes have been designed to increase the rapidity with which this “high quality” stream of photons can be switched from one instrument to another. (Observatories typically have several highly complex instruments clustered underneath or nearby the actual telescope.)

Queue scheduling at places like Gemini and the VLT was set up to maximize efficiency and productivity. We might think of this emphasis on flexibility, efficiency, and productivity as resembling the famous “just in time” manufacturing techniques pushed by Japanese car makers in the 1950s (and widely admired by executives in the U.S.).

It’s this shift in telescope use – where efficiency is paramount – that is reflected in the advertisement ESO placed in Nautilus.

Did the quest for better science drive the shift toward emphasizing productivity and efficiency? Yes, but that’s only part of the story. In the United States, these concerns followed larger trends. In 1993, for example, Congress passed the Government Performance and Results Act requiring each federal agency, including the NSF, to devise yardsticks to measure performance and progress. This was not just an American trend. European astronomers did similar studies evaluating telescope productivity. As ESO’s advertisement indicates, this way of thinking is still very much alive.

The need to demonstrate greater efficiency and productivity encouraged scientists to accept models and metaphors from the business world to describe observatory management and telescope operation. Astronomy in the 1990s, like particle physics in the 1950s and 60s, became a “big business” or, at the least, a very expensive one. The next generation of giant telescopes will drive this trend forward even more. Astronomers started describing observatories as “data factories.” So, perhaps it’s not a surprise that some observatory directors and their staff started to see the researchers who came to their facilities as customers.

None of this addresses the question of what one means by “productive” though. Is the proper metric of productivity the number of times a publication was cited? Perhaps it could be the number of scientific problems “solved”? Or prizes won by a paper published using data from a particular facility?

Could a time come when observatories and other science facilities take a cue from the Golden Arches and simply tout the number of customers served? Let’s hope not.

 

  1. To be fair, I’m talking here largely about ground-based optical astronomers. Radio astronomers had long been accustomed to receiving data collected by others. And, of course, all space-based observations are done in queue mode. If you’re unclear why, watch this.

DNA…From Blueprint to Brick

In 2005, Caltech researcher Paul W. K. Rothemund made a smiley face. In fact, he made about 50 billion of them. Other than the sheer number, what was remarkable about the accomplishment was what he made his smiley faces from – DNA.

Some of the 50 billion smiley faces made out of DNA; each is about 100 nanometers in size.

Rothemund’s tour de force lab accomplishment was one of the most highly visible milestones in the emergence of a new scientific community. Rather than seeing DNA as primarily an information-containing molecule – a blueprint – DNA nanotechnologists treat the iconic molecule as something to build with – a brick.

For much of the late 20th century, scientists, writers, and the general public imagined DNA as information. It was code in the form of a chemical, a molecule that directed our development and determined our destiny. This discourse served to organize, guide, and inform the research agenda of scientists for decades.

Starting in the late 1970s, an interdisciplinary group of chemists, crystallographers, molecular biologists and computer scientists began to reconceptualize what DNA was and what people might be able to do with it. The main person – for years, really the only person – at the vanguard of this effort was Nadrian “Ned” Seeman. In the late 1970s, Seeman, a biochemist whose main field of expertise was crystallography, was an assistant professor languishing in the biology department at SUNY-Albany.

Seeman, 1978. (image courtesy of Seeman)

Seeman worked with complex organic molecules. These are notoriously difficult to crystallize. As a result, Seeman, about to come up for tenure, faced a dilemma captured nicely in the image below. Basically, “no crystals, no crystallography, no crystallographer.”

At the same time, Seeman was thinking about what it might be possible to build with DNA. Why DNA? First of all, it’s well studied – what historians of science call a model system. DNA’s structure is predictable, made of four different types of nucleotide subunits: adenine, cytosine, guanine, and thymine. The two strands of the double helix are held together by what scientists call complementary base pairing: adenine always pairs with thymine; guanine connects with cytosine.

This predictability allows scientists to synthesize strands of artificial DNA—a technique perfected and automated in the 1980s—which, when properly treated in the lab, can link up to form a desired structure.
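The pairing rule is simple enough to write down in a few lines of Python. This is only an illustrative sketch, not anything Seeman used; the function names are my own. It relies on one fact beyond the A–T and G–C rule stated above: the two strands of a double helix run in opposite directions, so a strand's partner is its complement read backwards.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Swap each base for its pairing partner, position by position."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand written 5'->3' (the strands are antiparallel)."""
    return complement(strand)[::-1]


def can_pair(strand_a: str, strand_b: str) -> bool:
    """True if the two strands (both written 5'->3') form a full duplex."""
    return strand_b.upper() == reverse_complement(strand_a)


print(reverse_complement("ATGC"))
print(can_pair("ATGC", "GCAT"))
```

Given any sequence, the sequence that will bind to it is fully determined. That is the sense in which DNA is "predictable" as a building material.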

Seeman was thinking about a particular form of DNA that occurs during a process known as genetic recombination. This involves the breaking and rejoining of two homologous DNA double helices as shown below. If the two DNA molecules have regions of similar nucleotide sequences, they can “cross over” and form a novel nucleotide sequence. A crucial intermediate stage in this recombination of DNA is a structure known as a Holliday junction (named after the British molecular biologist who first proposed it in 1964).

In 1979, Seeman started doing computer modeling of these Holliday junctions. His goal was to try to better understand their motion during recombination. Seeman soon recognized that in principle one could – using synthetic DNA – create junctions that didn’t move.

To do this, Seeman would have to take advantage of another feature of DNA, the “sticky end.” A sticky end occurs when one strand of the double helix extends several unpaired bases beyond the other strand. The presence of sticky ends meant, Seeman reasoned, that one could – using junctions as well as sticky-ended cohesion – build DNA structures that formed lattices and networks.
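Sticky-ended cohesion can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not a model of Seeman's actual junctions: each duplex is just a long strand and a shorter strand that pairs with the long strand's 5' portion, leaving a single-stranded 3' overhang. The sequences and function names are invented for the example.

```python
# Translation table for Watson-Crick partners: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Partner of seq, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sticky_end(long_strand: str, short_strand: str) -> str:
    """Return the unpaired 3' overhang of long_strand.

    short_strand must pair with the first len(short_strand) bases of
    long_strand; whatever is left over dangles as the sticky end.
    """
    paired = long_strand[: len(short_strand)]
    if short_strand != reverse_complement(paired):
        raise ValueError("short strand does not pair with the long strand")
    return long_strand[len(short_strand):]


def coheres(overhang_a: str, overhang_b: str) -> bool:
    """Two sticky ends join if one is the reverse complement of the other."""
    return overhang_b == reverse_complement(overhang_a)


# Two hypothetical duplexes with matching overhangs.
end_1 = sticky_end("CCTGAGTC", "CAGG")
end_2 = sticky_end("TTGCGACT", "GCAA")
print(end_1, end_2, coheres(end_1, end_2))
```

Because the designer chooses the overhang sequences, the designer also chooses which ends will join to which. That programmability is what makes lattices and networks conceivable.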

In Seeman’s telling, his epiphany came while sitting in a bar in Albany. Crystallographers, not surprisingly, are quite familiar with, even fond of, works by M.C. Escher given the Dutch artist’s focus on periodicity, symmetry, and the overall representation of objects in space. Seeman recalled Escher’s 1955 woodcut Depth:

M.C. Escher, Depth, 1955 woodcut and its DNA analog in form of 6-armed junction.

Seeman realized that the fish in the Escher picture were just like a 6-arm junction arrayed in periodic fashion. Wondering if he might be able to make a similar structure from branched DNA molecules, Seeman proposed his ideas in an article that came out in the Journal of Theoretical Biology in early 1982.

Header and key passage in Seeman’s 1982 article, with subsequent citation count.

It’s important to note where Seeman’s paper – now cited nearly 900 times – appeared. It was in a journal of theoretical biology. Seeman was noting that it was possible to make these DNA lattices…but he had yet to demonstrate this could actually be done in the lab.

Doing this required overcoming one key obstacle – getting the right amount and right sequence of DNA. In the early 1980s, one could buy synthetic DNA but it was very expensive. A strand of DNA with the desired order of bases, maybe only 10-12 nucleotides long, might cost around $6000.

Synthetic DNA, which today can be made for pennies a base pair, was an expensive raw material c. 1980. (image courtesy of George Church).

So, Seeman opted for the cheaper but more time consuming option. Over the next several years, he learned the arcane craft of making DNA with custom sequences of nucleotide bases. A DNA synthesizer paid for by the National Institutes of Health became his lab’s most important piece of equipment.

In late 1983, Seeman finally published a paper in Nature that gave experimental proof of the theoretical idea he had published 18 months earlier. Working with Neville Kallenbach, then at the University of Pennsylvania, he demonstrated that it was possible to construct an immobile DNA junction. Kallenbach left Penn for New York University and, in 1988, recruited Seeman to NYU’s Chemistry department.

Two years later, working with his graduate student Junghuei Chen, Seeman synthesized a DNA molecule that had ten strands. When combined with similar molecules, the result was a macromolecule with the “connectivity of a cube” – it was the first lab demonstration that one could make 3-D structures with DNA.

Seeman and Junghuei Chen, 1990, with model of their DNA cube (image courtesy Seeman)

Seeman would later tell me that he always “regarded DNA as a four letter word” – referring to its four nucleotide bases – and as “not very interesting when it’s linear.” It was in the 3-D realm, building things with it, that he found excitement.

Today, DNA nanotechnology is one part of the growing field of synthetic biology. More than 60 labs – including Paul Rothemund’s at Caltech – are doing various forms of nanotechnology with DNA and RNA. To date, successes with DNA nanotechnology have included constructing increasingly complex three-dimensional shapes, carrying out massively parallel computations, and building “DNA walkers” that can traverse a substrate and deliver “cargoes” of nanoscale particles.

For a historian of science, what is fascinating about this evolving field is this new interpretation of DNA. DNA made the transition from genes to machines. In the end, it all comes back to a fundamental shift in how researchers saw DNA: from a code to a construction material.