Chapter 11A. Where do 'numbers' come from?

Nearly everything we work with involves numbers of something, or counts – the most common type of data.  In some cases the sources of these numbers are straightforward; in other cases they involve many layers of extrapolation and inference.  And numbers in the media are often simply made up.

Four ways to obtain numbers

News articles often involve counts of something – a budget deficit, unemployment rates, university enrollment, body counts, beer consumption on college campuses, days below freezing, or days above 100°F.  It may appear that all numbers have the same validity or authenticity (the same types of error), but that assumption is generally false.  A count may be obtained in at least four different ways.  Thus, a count may be

1)    an actual count – a total

2)    extrapolation from one group to another or to the entire group

3)    data conversions and data translations

4)    fictitious, or made-up

Each of these categories is prone to different types of error, as will be explained below.

 

(1) Actual counts (totals)

examples:  athletic stats, actuarial stats, census

The simplest type of data to understand is the actual count, in which the numbers reported represent all the events.  Statistics for athletic teams and athletes are usually of this type, whether for a team's record or an individual's record (e.g., batting average).  Stock exchange statistics are also likely based on total counts.  Various medical and vital statistics are also total counts (e.g., number of traffic deaths, some cancer statistics).

In many cases, actual counts are accurate.  Our home team's win/loss record is probably not subject to error.  The performance of an individual stock can be documented through recorded transactions.  However, actual counts may often be wrong due to under- or over-reporting.  In some cases, we may simply not know the full data.  For example, soldiers missing in action may have died but are not counted as dead because their bodies have not been found.  The U.S. census is an attempt to count all residents of the United States every 10 years.  It is simply not possible to count every individual, and there is enough flux across borders just from travelers that, even if we could count everyone in the U.S. at a particular moment in time, we would not know how to convert that number into residents.  Thus, actual counts may or may not be accurate.

Errors in actual counts may be generally attributed to

human and technical error (including inadequate protocol). 

Human and technical error merely means that the counts get mis-recorded in some fashion.  In contrast, an inadequate protocol means that the procedure for gathering data is simply not good enough to get an exact count; fixing the protocol may not be easy, however.  Indeed, it will in practice be impossible to develop a protocol that avoids all possibility of error, and it may even be impossible to develop a protocol that gets rid of important errors.

(2) Extrapolations from one group to another (or to the entire group)

examples:  polls, the future predicted from the past, samples extrapolated to the entire population, extrapolations across groups, virtually all experiments on a subset of the population

It is often impractical to conduct total counts, and a total count may be unnecessary if we merely want to know the approximate magnitude.  What is then commonly done is to count a small subset and extrapolate to the total.  For example, if we were interested in the fraction of city inhabitants willing to support higher property taxes, we might survey your class for opinions.  If 63% of you were supportive of higher taxes, we might then 'report' that 63% of people support higher taxes, even though we did not survey the entire city.  Or we might survey some residents by phone or by asking pedestrians to fill out a survey.  The principle is the same: extrapolating from a subset of the population to the total.
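The arithmetic behind this kind of extrapolation is simply applying the sample fraction to the larger group.  Below is a minimal sketch in Python; the class size, the number of supporters, and the city population are made-up numbers used only to illustrate the calculation, and nothing in it addresses whether the sample is representative.

# A minimal sketch of extrapolating from a sample to a whole population.
# The class size, number of supporters, and city population are hypothetical.

class_size = 200           # students surveyed (hypothetical)
supporters = 126           # students favoring higher taxes (hypothetical)
city_population = 950_000  # residents of the city (hypothetical)

fraction = supporters / class_size                 # 0.63, i.e., 63%
estimated_supporters = fraction * city_population  # about 598,500 residents

print(f"Sample fraction: {fraction:.0%}")
print(f"Extrapolated supporters citywide: {estimated_supporters:,.0f}")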

The same type of extrapolation is often applied to future events based on past events.  Attempting to predict the future necessarily involves some type of extrapolation – we cannot directly gather data from the future.  In the simplest case, we may project the future according to numbers obtained from the present and past, adjusting for any trends.  Budget projections necessarily do this; the budget of a corporation would not be a simple extrapolation, but our own personal budgets probably are.  We often plan annual events in our lives (vacations, trips, activities) based on past experiences.
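A minimal sketch of such a projection, using hypothetical dollar amounts for a personal budget, is given below; it simply extends the average year-to-year change one more year.

# A minimal sketch of projecting next year's spending from past years,
# adjusting for a trend.  All dollar amounts are hypothetical.

past_spending = [18_000, 18_900, 19_800, 20_700]  # last four years (hypothetical)

# Use the average year-to-year change as a crude estimate of the trend.
changes = [b - a for a, b in zip(past_spending, past_spending[1:])]
trend = sum(changes) / len(changes)

projection = past_spending[-1] + trend
print(f"Projected spending next year: ${projection:,.0f}")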

There are two types of errors that can creep into simple extrapolations in addition to those that apply to total counts: 

bias and sampling error

Bias is obvious.  In the case of using the opinion of students in a class to reflect the opinions of city residents about taxes, students are younger and have different financial constraints than do most other residents; their collective opinions might well differ from those of residents not attending the university.  Virtually any survey or poll faces the same problem, because it is nearly impossible to obtain a truly representative sample on many issues. 

Sampling error should also be obvious.  Any time a subset of the whole is used to represent the whole, the subset may, by chance, differ from the whole.  However, it is easy to calculate the effect of sampling error, and surveys reported in the news frequently refer to a 'margin of error,' which is typically the amount of (sampling) variation expected 95% of the time for a survey of that size.  Sampling error is not likely to be much of a problem when the survey sample exceeds a hundred or so individuals.  Bias is more serious because it is difficult to quantify, and it can be large.
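For a simple random sample, the margin of error reported with a poll is usually close to the standard 95% formula used in the sketch below.  Real polling organizations may apply more elaborate adjustments, so treat this only as an approximation that shows how sampling error shrinks as the sample grows.

# A minimal sketch of the standard 95% margin of error for a simple random
# sample; actual polls may use more elaborate methods.

import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for an observed proportion p from a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Example: 63% support observed in samples of different sizes (hypothetical).
for n in (100, 400, 1000):
    print(f"n = {n:5d}: 63% +/- {margin_of_error(0.63, n):.1%}")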

Polls are a familiar type of extrapolation.  Election polls are used by candidates to develop and hone campaign strategies that increase voter approval, and the media use polls to forecast election outcomes.  It is well appreciated that election polls often fail to predict election outcomes.  The reasons for these 'errors' include bias and changes in public attitudes during the weeks leading up to the election (although exit polls should not be affected by the latter problem).  Great effort goes into eliminating errors in polls.  Furthermore, a poll can be no better than the truthfulness of the people answering it.  If the answer provided on the poll does not reflect the person's true attitude, the standard fixes for obtaining ideal data will not overcome the error.

 

(3) Data conversions and translations (does not exclude the other 3 categories)

Examples:

breathalyzer, satellite data, communications data (voice, images), probably all data that use electronics

disease diagnoses based on symptoms

alcohol consumption rates based on sales; effort at work measured as time on computer

conversions can be simple or convoluted

It is often not possible to measure directly what we want.  For example, public health agencies attempt to monitor influenza (flu) cases and deaths.  There are accurate tests to determine whether someone has influenza, but they are expensive and time consuming.  Instead, physicians typically rely on symptoms to diagnose whether patients have influenza, and they report cases according to those symptoms.  As there are many other respiratory diseases with symptoms similar to flu, these diagnoses may be in error because they are not direct measures of what is wanted – influenza status.  As another example, alcohol consumption rates of a population may be calculated from sales rather than from actual consumption.  A third example is the use of sensors to detect 'events,' as commonly employed in the intelligence community and military.  For example, the launch of a rocket is inferred from infrared (heat) sensors on satellites.  Detection of nuclear detonations by other countries is inferred from satellite heat and light sensors and from seismic signal detectors (earthquake detectors).

In these cases, the reported data are converted or translated from the true data because the true data have no meaning to most of the target audience.  Thus, virtually none of us would care about or understand the infrared signal data from a satellite, but we would definitely be interested in whether rockets were being launched in an attack or a rocket test that violated a treaty.  The reported data reflect the interests and knowledge base of the audience, not the raw data.

The types of errors that may affect conversions are

bias (if humans are doing the conversions)

human and technical error (inadequate protocol) in converting the raw data into the reported data

sampling error (if the sample is small) 

Furthermore, bias and inadequate protocol may work together to increase the error.  Thus, a physician may be more likely to report a respiratory illness as flu if she/he is aware that a flu epidemic is underway.  In this case, the inadequate protocol for diagnosing influenza combines with the physician's awareness that flu is circulating, possibly leading to more flu cases being reported than are actually encountered.

In 1964, a case of bias and inadequate protocol had profound implications for young men in the U.S. and for the country of North Vietnam.  On August 2, in the Gulf of Tonkin, a brief exchange of fire occurred between three North Vietnamese torpedo boats and a U.S. destroyer.  Two days later, a second exchange was reported between two U.S. ships and North Vietnamese torpedo boats.  The U.S. ships reported torpedo attacks, but no North Vietnamese boats were sighted, nor were any torpedoes.  The actual data were sonar recordings interpreted as torpedoes.  On the basis of both attacks, President Lyndon Johnson received Congressional authorization (in what is known as the Tonkin Gulf Resolution) to use whatever conventional military force he deemed necessary in Southeast Asia.

The Tonkin Gulf Resolution led directly to what became known as the Vietnam War.  That war had implications for U.S. politics and economics that last to this day.  College students participated in demonstrations and even riots.  The military escalation resulted in a huge build-up of U.S. troops in Vietnam (ultimately 58,236 U.S. deaths) and extensive bombing of North Vietnam, with as many as 1.1 million North Vietnamese military deaths and 2 million North Vietnamese civilian deaths (some estimates are much lower).

It is now known that the second exchange in the Gulf never happened.  There is no evidence that the torpedo data were fabricated; rather, it seems that the sonar data were merely misinterpreted or mis-translated.  With heightened U.S. naval awareness, the ships' crews could not afford to take chances, leading to a biased interpretation of data under a protocol that was inadequate to determine whether torpedoes were actually present.  In the absence of this second exchange, it would have been more difficult to get Congressional support for the Tonkin Gulf Resolution.

(4) Fabrication

A recent book, Sex, Drugs and Body Counts (2010, edited by P. Andreas and K. Greenhill), highlights many cases in which the media has simply invented numbers for a story.  The choice of a number is not 'random' or arbitrary but seems to be driven by two factors.  First, it must be plausible at face value.  Second, it must be of a sufficient magnitude to get the reader's attention.  At least for some media agencies, the number reported need not be based on data.

As described in a recent NPR story (linked on our web site), a common but fabricated number satisfying both criteria is 50,000.  (For entertainment, you might do a web search on that number.)  One story in the 1980s reported the annual number of Satanic cult sacrifices in the U.S. to be 50,000.  Given that the annual murder count from all causes in the U.S. at the time was only 20,000 to 23,000, cult sacrifices – a subset of all murders – could not possibly have exceeded that total, let alone been more than double it.  Indeed, the true number was apparently a few hundred.

There is no particular formula for deciding whether a number reported in the media is correct.  Furthermore, numbers reported in one story often get repeated in subsequent stories, so merely looking for consensus in different stories on the same topic does not indicate whether the number might be in error.  The only solution is to go to what we call the primary literature in which the original data are reported along with the methods used to obtain and calculate the data.

Types of error are not relevant here, since there are no actual data.

Uncertainty

If some or all of the limitations of a number are understood, its uncertainty can be indicated, as in 45% ± 3%.  This means that the actual, true number likely lies between 42% and 48%, although there is a small chance that the true number lies outside those extremes.  If the uncertainty is large, either extreme can be emphasized.  Thus, if the estimated value was 25% but the uncertainty spanned 10% to 40%, it would be fair for someone to say that the number might be as low as 10% and for someone else to say that it might be as high as 40%.  Both values, as well as anything in between, are considered legitimate.  A news article might well use whichever number suited the story.

The uncertainty can be large both because of possible bias and because the data come from a small sample.  Even so, news articles rarely present the uncertainty.  Rather, the concept is most likely to appear in a scientific article that presents the original data and the methods used to gather them.

What you can do to check

Understanding where a number comes from and just how wrong it may be will not be easy, and indeed, may be impossible when reading or listening to a story from the news media.  There are some simple steps you can take to at least make yourself aware of the possible gross inaccuracy of a number.  In general, these steps amount to thinking about where the number comes from. 

 

 

1)    Plausibility.  Rather than merely accepting a number, ask yourself if the number seems reasonable, given what you know.  For example, a total of 50,000 for the entire U.S. works out to an average of 1,000 per state.  Scaled by population instead, it would correspond to roughly 100 per year in a city the size of Austin.  If 100 such events were happening per year in Austin (about 8 per month), you would likely hear about them if they were newsworthy.  (A sketch of this kind of scaling appears after this list.)

 

2)    Related numbers.  Is the number based on a different number, or is it a subset of a different number that you can verify?  In the example of 50,000 annual human sacrifices mentioned above, the number of human sacrifices must be far smaller than the total number of murders per year, and you can easily look up the latter.  Or, in the summer of 2010, when the BP oil spill occurred and the well was finally plugged, it was claimed on the news in early August that 75% of the oil was 'gone,' either from cleanup or from bacterial degradation.  However, there were no good estimates of just how much oil escaped the well in the first place, so it should not have been possible to know what fraction remained.

 

3)    Internal consistency in the report.  Reports of marijuana plant raids may give both the number of plants and the size of the plot.  You can at least calculate how close together the plants would need to be to fit in the area given (see the sketch after this list).

 

4)    Details of data properties.  Does the article explain how the data were gathered or give the uncertainty?  In general, the more information given about the nature of the data, the more you can probably trust the number.

 

5)    Independent sources.  In some cases, you might be able to identify the source of the report and find out how the data were gathered and calculated.  As a general practice, you can identify and use trusted sources of information.  Snopes (snopes.com) is a reliable source of information about possible urban legends.  A source about numbers is http://www.carlbialik.com/ (the 'numbers guy'), although this site does not attempt to cover numbers issues comprehensively.  Be aware, however, that it may be difficult to know whether a different report is independent – numbers are commonly recycled in the media, so that when one source reports a number, others use that report as their source.
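Two of the checks above, plausibility scaling (item 1) and internal consistency (item 3), come down to simple arithmetic.  The Python sketch below works through both; the population figures and the raid report are rough or hypothetical and merely show the kind of back-of-the-envelope calculation involved.

# Back-of-the-envelope checks from items 1 and 3 above.
# All inputs are rough or hypothetical and only illustrate the arithmetic.

# Item 1: plausibility by population scaling
claimed_us_total = 50_000      # claimed annual count for the whole U.S.
us_population = 310_000_000    # rough U.S. population
austin_population = 800_000    # rough Austin population

expected_in_austin = claimed_us_total * austin_population / us_population
# With these rough inputs, the result is on the order of 100 per year.
print(f"Scaled to Austin: about {expected_in_austin:.0f} per year "
      f"({expected_in_austin / 12:.0f} per month)")

# Item 3: internal consistency of a raid report
reported_plants = 10_000   # number of plants claimed (hypothetical)
plot_area_m2 = 2_000       # reported plot size in square meters (hypothetical)

area_per_plant = plot_area_m2 / reported_plants
spacing_m = area_per_plant ** 0.5  # side of the square each plant would occupy
print(f"Each plant would get {area_per_plant:.2f} square meters, "
      f"about {spacing_m:.2f} meters between plants")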

 

Realize that there may be no source of reliable data on a specific topic.  Thus, you may find that a number reported somewhere cannot be supported.  However, lack of support does not mean the number is necessarily wrong.  It just means the number cannot be trusted.


Copyright 2010 Craig M. Pease & James J. Bull