How will astronomy archives survive the data tsunami


Astronomers are collecting more data than ever. What practices can keep them ahead of the flood? The availability of this data has already transformed research in astronomy, and the Space Telescope Science Institute (STScI) now reports that more papers are published with archived data sets than with newly acquired data.

These new projects will use much larger arrays of telescopes and detectors, or much higher data acquisition rates, than are now used. Projections indicate that more than 60 PB of archived data will be accessible to astronomers. It is going through a period of exceptional growth in its science holdings, as shown in figure 1, because it is assuming responsibility for the curation of data sets released by the Spitzer Space Telescope and the Wide-field Infrared Survey Explorer (WISE) mission.

The volume of these two data sets alone exceeds the total volume of the missions and projects already archived. The availability of the data, together with rapid growth in program-based queries, has driven up usage of the archive, as shown by the annual growth in downloaded data volume and queries in figure 2.

Usage is expected to accelerate as new data sets are released through the archive, yet the response times to queries have already suffered, primarily because of a growth in requests for large volumes of data. The degradation in performance cannot be corrected simply by adding infrastructure as usage increases, as is common in commercial enterprises, because astronomy archives generally operate on limited budgets that are fixed for several years.

Without intervention, the current data-access and computing model used in astronomy, in which data downloaded from archives is analyzed on local machines, will break down rapidly.


Moreover, data discovery, access, and processing are likely to be distributed across several archives, given that the maximum science return will involve federation of data from several archives, usually over a broad wavelength range, and in some cases will involve confrontation with large and complex simulations.

Managing the impact of PB-scale data sets on archives and the community was recognized as an important infrastructure issue in the report of the Decadal Survey of Astronomy and Astrophysics,5 commissioned by the National Academy of Sciences to recommend national priorities in astronomy for the coming decade.

Figure 3 illustrates the impact of the growth of archive holdings. As holdings grow, so does the demand for data, for more sophisticated types of queries, and for new areas of support, such as analysis of massive new data sets to understand how astronomical objects vary with time, described in the Decadal Survey as the "last frontier in astronomy."

Given that archives are likely to operate on shoestring budgets for the foreseeable future, the rest of this article looks at strategies and techniques for managing the data tsunami.

How to Keep the Tsunami from Engulfing Us

At the Innovations in Data-intensive Astronomy workshop held earlier this year in Green Bank, West Virginia, in May, participants recognized that the problems of managing and serving massive data sets will require a community effort and partnerships with national cyber-infrastructure programs.

The solutions will require rigorous investigation of emerging technologies and innovative approaches to discovering and serving data, especially as archives are likely to continue to operate on limited budgets.

How can archives develop new and efficient ways of discovering data? When should, for example, an archive adopt technologies such as GPUs (graphics processing units) or cloud computing?

What kinds of technologies are needed to manage the distribution of data, time- and computation-intensive data-access jobs, and end-user processing jobs? This article emphasizes the issues that we believe archives need to address to support their end users in the coming decade, as well as the issues that affect end users in their interactions with archives.

Innovations in Serving and Discovering Data

The discipline of astronomy needs new data-discovery techniques that respond to the anticipated growth in the size of data sets and that support efficient discovery of large data sets across distributed archives.

These techniques must aim to offer data discovery and access across PB-sized data sets. The Virtual Astronomical Observatory (VAO),18 part of a worldwide effort to offer seamless international astronomical data-discovery services, is exploring such techniques.

It is developing an R-tree-based indexing scheme that supports fast, scalable access to massive databases of astronomical sources and imaging data sets.

R-trees are tree data structures for indexing multidimensional information; they are commonly used to index database records and thereby speed up access times. In the current implementation, the indices are stored outside the database, in memory-mapped files that reside on a dedicated Linux cluster.
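To make the idea concrete, here is a minimal sketch of a bulk-loaded R-tree over a toy source catalog. It is not the VAO implementation, whose details the article does not give. The names `build_rtree`, `search`, and `fanout`, the Sort-Tile-Recursive packing, and the 10 x 10 grid of made-up sources are all assumptions chosen for illustration, and the index lives in ordinary Python memory rather than in memory-mapped files.

```python
import math
from dataclasses import dataclass, field
from typing import Tuple

# A bounding box is (ra_min, dec_min, ra_max, dec_max); a point source is a
# degenerate box whose min and max coincide.
Box = Tuple[float, float, float, float]


def union(boxes):
    """Smallest box that encloses every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True if boxes a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    box: Box
    children: list = field(default_factory=list)  # child Nodes (internal node)
    records: list = field(default_factory=list)   # (box, record_id) pairs (leaf)


def _center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def _tiles(items, get_box, fanout):
    """Sort-Tile-Recursive grouping: slice by x, then pack each slice by y."""
    n_tiles = math.ceil(len(items) / fanout)
    slice_len = math.ceil(math.sqrt(n_tiles)) * fanout
    by_x = sorted(items, key=lambda it: _center(get_box(it))[0])
    for i in range(0, len(by_x), slice_len):
        by_y = sorted(by_x[i:i + slice_len],
                      key=lambda it: _center(get_box(it))[1])
        for j in range(0, len(by_y), fanout):
            yield by_y[j:j + fanout]


def build_rtree(entries, fanout=4):
    """Build the tree bottom-up from (box, record_id) entries."""
    if not entries or fanout < 2:
        raise ValueError("need at least one entry and fanout >= 2")
    level = [Node(box=union([b for b, _ in tile]), records=tile)
             for tile in _tiles(entries, lambda e: e[0], fanout)]
    while len(level) > 1:  # pack nodes into parents until one root remains
        level = [Node(box=union([n.box for n in tile]), children=tile)
                 for tile in _tiles(level, lambda n: n.box, fanout)]
    return level[0]


def search(node, query):
    """Return ids of records whose box overlaps the query box."""
    if not intersects(node.box, query):
        return []  # prune: nothing under this node can match
    hits = [rid for box, rid in node.records if intersects(box, query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits


# Hypothetical catalog: 100 point sources on a 10 x 10 grid of (ra, dec).
sources = [((float(ra), float(dec), float(ra), float(dec)), ra * 10 + dec)
           for ra in range(10) for dec in range(10)]
tree = build_rtree(sources)
window = (2.5, 2.5, 4.5, 4.5)
print(sorted(search(tree, window)))  # → [33, 34, 43, 44]
```

The query visits only the subtrees whose bounding boxes overlap the search window, which is why an index of this kind can avoid scanning every record in a table.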


It offers speed-ups of up to 1, times over database table scans and has been implemented on databases containing 2 billion records and TB-scale image sets. Expanding techniques such as this to PB-scale data is an important next step.

Authors: G. Bruce Berriman (IPAC) and Steven L. Groom (IPAC). Published in: Communications of the ACM, Volume 54, Issue 12, December.

Abstract: The field of astronomy is starting to generate more data than can be managed, served and processed by current techniques.

This paper has outlined practices for developing next-generation tools and techniques for surviving this data tsunami, including rigorous evaluation of new technologies, partnerships between astronomers and ...