Let me ask you an important question: How do you feel when you see “the data are” or “the data show” in print? Do you accept it as correct usage? Or do you notice that no matter how many times you hear that the plural usage of “data” is correct, somehow it always looks wrong?
Any grammar Nazi will bleat about “the data is...” or “the data shows...” But given what the word “data” actually means today, the plural usage no longer makes any sense. In fact, we ought to complain whenever anybody does use “data” as a plural; it only shows they care more for propriety than logic. The argument behind the plural usage is an etymological one: In Latin, “data” literally means “things given,” and nobody would ever write “the things given is...” Thus, “the data is...” must be incorrect.
But the Romans did not use “data” the same way we do. What we call “data” today is usually produced by computers or scientific machinery, things I understand the average Roman did not have. Now, we use “data” to mean something like “a heap of results or figures produced by some mechanical process,” and that is a concept for which there was never a Latin word. So while “data” may have Latin origins, the word as we use it exists only within the English language. It is our word to use as we please.
And how do we use it? Even the grammar Nazi who insists on saying “the data are...” would never ask a lab partner, “How many data did we collect?” or expect the answer to be something like “1,476 data.” Nor would anybody ever say, “We have too many data, and we need fewer.” We use “data” as a singular mass noun, just as we use “sand” or “wheat” or “water,” except when we catch ourselves doing it. We also always talk about data in units. If “data” really meant “many datums,” why would we say things like “a gigabyte of data” or “two days’ worth of data”?
We have allowed other Latinate words that were originally plurals to become singular. You would never ask at a club meeting, “Are there any agenda left?” Likewise, you would never boast that you won a quiz bowl because you “know many trivia,” or tell a running buddy that “stamina are needed” for your favorite run. There is no reason why “data” can’t go the same way.
And it really should, if only because every time you read “the data are” you have to stop and think about it for a while. If you want to write well—or even if you just want to convey something important in writing—you want your reader to be paying attention to what you are saying and not to how you are saying it. “The data are...” is awkward and forced, but worst of all, it is a distraction.
Until “data” settles into the singular, though, perhaps the best advice is just not to use the word or, if you have to, to use it only where you can leave its number ambiguous. Garner’s Modern American Usage lists “data” as a “skunked term,” meaning that no matter how you use it, you are bound to upset somebody. Yet the entry suggests that things might soon change in favor of “the data is.” Spectator style prefers the plural construction, as far as I can tell, but maybe you should send our copy staff a letter?
Sinclair Target is a Columbia College junior majoring in computer science. On Target runs alternate Fridays.