What is, and is not, climate “data”?

The word “data” is misused a lot in conversations and publications about climate change. How many times have you heard or read the phrase “future climate data”?

Data, according to the Merriam-Webster dictionary, is factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation. It is something that was measured. Numbers, words, sounds, emotional responses can all be data, so long as they were observed and recorded. Without a time machine, there is no “data” about the future.

Future projections from climate models are not, strictly speaking, data. A numerical model, whether a complicated computer model of the atmosphere or a Newtonian physics equation from high school, may rely on actual data as initial inputs. The numbers produced by that model are not data. They were not directly measured. They are predictions derived from inputting the measurements into a simplified representation of the system being studied.

To clearly distinguish between actual data and numbers produced by models, climate scientists usually refer to the numbers produced by a model as “model output”, or sometimes “model results”.

The line between data and output can sometimes be blurry. For example, the ocean temperatures from a remote sensing group like NOAA Coral Reef Watch are the result of an algorithm integrating a series of point satellite observations over a region or “grid cell”, and controlling for factors like clouds. It is therefore the output of a simple model, which is why it is common to refer to those values as “satellite-derived data” rather than “satellite data”.
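To make the idea concrete, here is a minimal sketch of how point observations might become a single grid-cell value. It is not the NOAA Coral Reef Watch algorithm; the function name, the cloud-fraction threshold and the sample numbers are all hypothetical, chosen only to show that even a simple average with a cloud filter is a small model sitting between the measurements and the reported value.

```python
from statistics import mean


def grid_cell_temperature(observations, max_cloud_fraction=0.2):
    """Average the usable point observations that fall inside one grid cell.

    observations: list of (temperature_celsius, cloud_fraction) pairs.
    max_cloud_fraction: hypothetical cutoff; cloudier observations are dropped.
    Returns None when every observation is too cloudy to use.
    """
    usable = [temp for temp, cloud in observations if cloud <= max_cloud_fraction]
    return mean(usable) if usable else None


# Hypothetical point observations: (temperature in deg C, cloud fraction 0-1)
points = [(28.0, 0.0), (29.0, 0.1), (20.0, 0.9), (30.0, 0.05)]

# The cloudy 20.0 reading is excluded, so the cell value reflects a modelling choice.
cell_value = grid_cell_temperature(points)
print(cell_value)
```

Changing the assumed threshold changes the reported temperature even though the underlying measurements are identical, which is the sense in which such values are “derived”.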

And if you really decompose the way much “data” is physically measured, you will find that the line between data and output disappears entirely. Often the instrument used to make measurements, whether a simple thermometer or a complicated fluorometer, itself relies on some embedded empirical relationship that translates a raw independent observation into the desired variable.

Regardless, the rhetorical confusion between climate “data” and climate model “output” is not just semantics. In some settings, people’s choice of wording reflects a real confusion about what climate science can do for the world.

One of the central challenges in helping the developing world adapt to climate change is the gap between the desire for precise information about the future and the uncertain, probabilistic projections that science can deliver. During workshops, research interviews and private conversations in the Pacific Islands, I often hear laments that “we need the data”. The use of the word data to describe future climate projections exemplifies that gap. People want precise, down-scaled climate predictions for their island or village, not a probabilistic range for an entire region. People want science to deliver something – precise answers – that is not possible.

Explaining and demonstrating the difference between data and output can go a long way to bridging the gap between the ability and the demands of climate science.

6 thoughts on “What is, and is not, climate “data”?”

  1. I am glad people agree that being careful with language is one way to bridge the gap. My solution of dividing data and output comes from experience in the Pacific Islands, but I am open to others. The key, as David Lewis wrote, is that people speaking publicly think about these issues in depth.

  2. We already have the words: observation, measurement, retrieval, reanalysis data, model output for the cases where we need to be more specific. I would say that data encompass all of these categories.

    Even for a thermometer, we have a conceptual model of how it works, that converts the abstract variable air temperature into a reading on the instrument. Observation and model/theory are interdependent.

    To emphasize its importance, a colleague of mine likes to talk about holy measurements data. Sounds better to me than redefining the word.

  3. Here’s a definition from another dictionary, i.e. the Cambridge Dictionaries Online, the American English sub page, definition of “data”. (http://dictionary.cambridge.org/us/dictionary/american-english/data):

    “information collected for use”.

    This definition was illustrated with an example of use: “They had data on health, education and economic development”.

    Isn’t what you are getting at that you wish people who debate climate science in public had studied the topic in more depth?

  4. Our disagreement, in that case, is simply over this being a “redefinition” of data; I am suggesting that people think data means something that was physically measured.

    Here’s the full definition from a dictionary:
    1. factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation
    2. information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful
    3. information in numerical form that can be digitally transmitted or processed

  5. I completely agree that it’s important to understand the differences between predictions and observations, between models and reality and so on. As you point out, the critical problem is “bridging the gap between the ability and the demands of climate science”. That comes down to understanding the limitations of all aspects of the problem: observations, reanalyses, models of varying complexity, projections, initialised seasonal and decadal predictions, weather forecasts, impacts and so on.

    The distinction between “data” and “output” is blurred (often heavily), so it might not be the best place to make a definitive cut. Redefining “data” in this way could also confuse people, because a common use of the word, which is also in Merriam-Webster, is “information in numerical form that can be digitally transmitted or processed”.
