Development of Statistics During the 19th Century
By Townsend Rowland
This paper’s purpose is to respond to Theodore Porter’s claim in The Rise of Statistical Thinking, 1820-1900 that
The history of this curve, now known as the Gaussian or normal distribution, is practically coextensive with the history of statistical mathematics during the nineteenth century, and its reinterpretation as a law of genuine variation, rather than of mere error, was the central achievement of nineteenth-century statistical thought. (Porter 91)
Porter claims that this reinterpretation ran from Laplace through figures like Quetelet and then to Maxwell and Galton, and that it was driven by changes in social thought during this period. I disagree with Porter’s analysis and hope to show in this paper that 1) the development of mathematical statistics over the 19th century was more diverse than the reinterpretation of the normal distribution alone (and it isn’t clear that there really was a reinterpretation of the kind Porter describes); and 2) the social reasons that Porter outlines for this development don’t tell the full story of why statistics changed the way it did during this period. This paper will first go over Porter’s arguments for why the reinterpretation of the normal distribution was the key event in the development of 19th-century statistics. Then, I will explain why Porter’s claims provide an incomplete story, given mathematical developments unrelated to the normal distribution such as the method of least squares and Galton’s notion of correlation. Finally, I will argue that the social causes that Porter gives for the development of statistics during the 19th century are insufficient to explain it, and that we have to look at a wider range of evidence to get the full picture. Namely, we must see the influence that empire had on the development of statistics at the time.
Porter’s argument for how the normal distribution was reinterpreted starts, logically, with how it was originally interpreted around the start of the 19th century. He argues that Laplace “developed the general method of error analysis, and applied it to several problems in a wide variety of fields” (Porter 94). Laplace argued that, in probability calculations, the more events you were able to observe, the more precise your result became. That is, the range of possible errors grew smaller and smaller. Eventually, if you were able to observe infinitely many events, you would be able to eliminate the error altogether (Porter 95). Laplace used the method of error analysis to calculate things like the probability that variation in atmospheric pressure was subject to a regular cause, and eventually extended it to fields such as astronomy (Porter 95). Astronomers, like Laplace, took error analysis and developed ways to “reduce great numbers of observations to a single value or curve, and to estimate the accuracy of this final result based on some hypothesis as to the occurrence of single errors” (Porter 95). Through work done by Legendre, Gauss, and others on the method of least squares, error analysis could, by the 1830s, produce the error curve (later known as the normal distribution) (Porter 96). The method of least squares becomes important later because of how it was produced, but for now, this, I believe, is a fair summary of Porter’s argument.
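Laplace’s point that more observations narrow the range of error can be illustrated with a small simulation. This is only a sketch under assumptions that are mine, not Porter’s or Laplace’s: each observation is a hypothetical true value plus an independent, normally distributed error, and the function name and parameters (spread_of_means, true_value, error_sd) are invented for the illustration.

```python
import random
import statistics

def spread_of_means(n_obs, n_trials=2000, true_value=10.0, error_sd=1.0, seed=0):
    """Return how widely the average of n_obs noisy observations varies.

    Assumption for illustration: every observation equals true_value plus
    an independent normal error with standard deviation error_sd.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        observations = [true_value + rng.gauss(0.0, error_sd) for _ in range(n_obs)]
        means.append(statistics.fmean(observations))
    # Standard deviation of the averages across all simulated trials.
    return statistics.pstdev(means)

# Quadrupling the number of observations roughly halves the spread of the average.
spreads = {n: spread_of_means(n) for n in (4, 16, 64)}
for n, spread in spreads.items():
    print(n, round(spread, 3))
```

Under these assumptions the spread of the average shrinks in proportion to one over the square root of the number of observations, so it approaches zero only in the limit of infinitely many observations.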
Out of the mathematical developments of the 1830s, Porter argues, came the social reinterpretation of the normal distribution from describing error to describing variation. In 1844, Quetelet announced that the error law, typically applied only to astronomy, also applied to human features such as height and girth. Porter also argues that this is where the “domain of probability function[s]… began to break down its exclusive association with error.” Quetelet saw that the error curve had applicability everywhere in human affairs: in height and girth, in “suicide by hanging in Paris, in marriages between sexagenarian women and young men in their twenties in Belgium,” and in crime rates (Porter 105).
Porter gives two reasons that, for Quetelet, the error curve now described variation rather than mere error. The first was his philosophy of L'homme moyen (the average man). Quetelet’s L'homme moyen came from Victor Cousin, who argued for a juste milieu, a state of moderation between monarchy and radical democracy, which took hold in Belgium during and after the 1830s (Porter 101). The moderation espoused in Cousin’s philosophy inspired Quetelet not just to argue that moderation was a virtue, but also to hold up as an ideal “an individual who epitomized in himself, at a given time, all the qualities of the average man, who represent[s] at once all the greatness, beauty and goodness of that being” (Porter 102). That is to say, Quetelet was arguing that the most virtuous person possible was someone who was exactly average. Here, Quetelet also drew on Aristotle and Archimedes, who, he said, also espoused the qualities of moderation and the virtues of the mean value (Porter 103).
The second factor that influenced Quetelet not to see the normal distribution as a measure of error, but rather of variation, was his project of social physics. Quetelet believed that humanity could be described in mathematical terms with the same rigor that astronomers were able to describe the cosmos. He wanted to be “the Newton ‘of this other celestial mechanics,’ to find the laws that assured equilibrium in the social domain” (Porter 104).
The two factors (L'homme moyen and social physics) came together, Porter argues, to lead Quetelet to reinterpret the normal distribution not as showing error, but as showing variation in whatever it was applied to (Porter 105). On this Quetelet writes, “one will find that what was regarded as the effect of chance, is subjected to fixed principles, and that nothing escapes the laws imposed by the all-powerful onto organized beings… All is foreseen, all is lawlike: only our ignorance leads us to suppose that all is subject to the whims of chance” (Porter 105). This is where Porter argues that “Error was thus banished from the universe, and Quetelet accordingly resolved that the expression ‘law of errors’ was inconsistent with the achievements of modern science” (Porter 105). Here is where I think Porter’s argument breaks down, even on its own terms: it isn’t clear that the supposed variation was seen as anything other than error, either in Quetelet’s uses of the normal distribution or in later natural philosophers’ uses.
The first way that Porter’s argument breaks down is that it isn’t clear that Quetelet and others saw the error curve as measuring anything other than error, or that it actually did. Quetelet distinguished two types of causes: constant causes and perturbing causes (Porter 108). A constant cause was a thing’s type or average, and perturbing causes were whatever made something deviate from the average value of that type (Porter 108). Thus, if you were doing a study of Scottish soldiers’ sizes (as Quetelet did), then any variation in size was due to perturbing causes such as nutrition, climate, etc. If not for these perturbing causes, Scottish soldiers would all be the average size (Porter 108). Since this was the case, soldiers’ sizes could be seen in the same way as drawing white and black balls from an urn. The constant cause would dictate that you draw the balls in whatever ratio they were originally put into the urn. However, due to errors in drawing them (perturbing causes), the balls would be drawn out in a ratio different from the one that existed in the urn. Therefore, since the normal distribution applied to drawing balls from an urn, and the same kinds of causes governed balls drawn from an urn as governed Scottish soldiers, it also applied to Scottish soldiers (Porter 108). Porter then says that “Quetelet was enormously proud of his discovery that the error law governed human variability” (Porter 108). I take Porter’s implication here to be that, whether consciously or not on Quetelet’s part, this is where the error curve started to be reinterpreted as showing variation instead of just error. But if we look back at Quetelet’s philosophy of L'homme moyen, it isn’t clear that the error curve concerned variation instead of error just because it could be applied to human affairs.
If we take into account Quetelet’s upholding of the mean value as virtuous, then it seems that, to Quetelet, any deviation from the mean could be seen as an error. The same reasoning holds in light of Quetelet’s social physics. If human behavior could be mathematized in the same way that error was modeled in astronomy, then there is no reason to think that the normal distribution, when applied to human behavior, meant variability rather than the error it had measured in astronomy. Therefore, instead of saying that Quetelet reinterpreted the normal distribution, it seems more accurate to say that he reapplied it.
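The urn analogy that Quetelet leaned on can be made concrete with a short simulation. This is a sketch under assumptions of my own rather than a reconstruction of Quetelet’s procedure: balls are drawn with replacement from a hypothetical urn that is half white, and the names and numbers (urn_draw_counts, p_white, 100 draws per sample) are chosen only for illustration.

```python
import random
from collections import Counter

def urn_draw_counts(p_white=0.5, draws_per_sample=100, n_samples=5000, seed=1):
    """Tally how many white balls turn up in each of n_samples repeated samples.

    Assumption for illustration: draws are made with replacement, so every
    draw is white with the same fixed probability p_white (the "constant cause").
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        whites = sum(1 for _ in range(draws_per_sample) if rng.random() < p_white)
        counts[whites] += 1
    return counts

counts = urn_draw_counts()
total = sum(counts.values())
# The average count sits near the urn's true ratio (50 whites per 100 draws).
mean_whites = sum(k * v for k, v in counts.items()) / total
# Most samples deviate from that ratio only slightly, as the error curve predicts.
share_within_10 = sum(v for k, v in counts.items() if abs(k - 50) <= 10) / total
print(round(mean_whites, 2), round(share_within_10, 3))
```

The tallies cluster symmetrically around the urn’s true ratio in a bell shape. The simulation itself cannot say whether the scatter around that center should be called error or variation, which is exactly the interpretive question at issue.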
Porter rightly points out that Quetelet didn’t see his work as reinterpreting the normal distribution from error to variation, but argues that this is nonetheless what happened (Porter 106). From what I have argued above, however, this seems not to be the case, or at least not the whole of it.
At this point, I hope to have laid out Porter’s argument that, through Quetelet, the normal distribution changed from measuring error to showing variation, and to have shown why this interpretation doesn’t fully hold up. Now I will go over other major events in 19th-century statistics that did not involve the normal distribution, to push back against Porter’s claims that “the normal distribution, is practically coextensive with the history of statistical mathematics during the nineteenth century,” and that the reinterpretation of the normal curve “was the central achievement of nineteenth-century statistical thought.”
One major event of the 19th century that was not part of the reinterpretation of the normal distribution was the development of the method of least squares. The method was first published by Legendre in 1805 as a way to minimize error in calculating cometary orbits (Stigler 57). Gauss then claimed he had been using the same method since 1795 (Porter 95). Eventually, as I said earlier, this method was applied to error analysis to produce the error curve. Why, then, should the reinterpretation of the normal curve count as the crowning achievement of 19th-century statistics? The development of the method of least squares seems just as important, because without it there would have been no error curve to reinterpret in the first place.
Another development of 19th-century statistics that had nothing to do with the normal distribution was the concept of correlation, which Francis Galton developed during this time. Galton came from an upper-class English family and used statistics to look at different levels of society. In 1869 Galton published Hereditary Genius, a book on biological heredity that “intended to establish that exceptional ability in a variety of pursuits from music, justice, and statesmanship to wrestling and rowing was inherited” (Porter 141). Galton argued that talents like the ones listed above were inherited because “the most distinguished judges, statesmen, and wrestlers were far more likely to be near relatives of other eminences in these activities than was an ordinary member of the population” (Porter 141). Thoughts like these led Galton to advocate for eugenics and, in trying to figure out how everything fit together, to develop “correlation diagram[s]” (Porter 143). Galton’s first attempt at this was to plot head size against weight, but the “data proved too unreliable to be handled by rigorous techniques” (Porter 143).
The concept of correlation cannot be untangled from a contemporary understanding of statistics, so its development through Galton during the 19th century, which was unrelated to the reinterpretation of the normal curve, seems very important looking back. I don’t mean to be doing Whig history here, at least not any more than Porter does. But his claim that 19th-century statistical thought had a “central achievement” calls for it. Porter gives no criteria for what a central achievement in statistical thought looks like, so to push back against that claim, I must show achievement by other means. Therefore, I bring up correlation to show that the reinterpretation of the normal curve was not the only thing during the 19th century that moved us closer to modern statistics.
Now I will discuss why the reasons that Porter gives for the development of statistics during the 19th century are insufficient when you consider the bigger picture. I argue that you must look to the concept of empire. This can be seen through the figure of John Herschel.
The general story that Porter gives is that Quetelet applied the normal distribution not just to cases of astronomical error, but also to variations in population. This led people studying physics, inspired by Herschel, to apply the error law to molecular theory (Porter 118). Quetelet also paved the way for Galton to apply the error curve to populations in the service of things like eugenics. Porter points out that all the social science figures in this story believed in a progressive society. Quetelet, through his L'homme moyen, believed that man could progress by becoming more like the average (Porter 104). Herschel, as I will discuss more below, believed in the progress of society (Musselman 420). Galton believed that society could be made better through eugenics (Porter 140). I think, though, that something more fundamental than a progressive view drove the development of statistics during the 19th century: empire.
Porter argues that the normal distribution came to be applied to physics because of an essay written by Herschel. Herschel was the preeminent natural philosopher in Britain at the time, so when he wrote an essay about the ways that Quetelet’s methods could be applied to a wide range of things (such as the kinetic gas theory), Maxwell, a natural philosopher working on the kinetic gas theory, began to use Quetelet’s methods in his investigations. The force of Herschel’s arguments came from his reassertion that the normal distribution applied to phenomena universally and from his place in British science (Porter 122). However, I think it is likely that the time Herschel spent in Africa was a large factor in his belief that the normal distribution could be applied everywhere. In her paper “Swords into Ploughshares: John Herschel’s Progressive View of Astronomical and Imperial Governance,” Elizabeth Musselman argues that Herschel’s time in South Africa shaped the way that he did science and the way he saw social progress (Musselman 425). In particular, she argues that he came to think that “savage” civilizations like those he encountered in South Africa could progress, through settling and farming, to a more advanced civilization. During the same time in Africa, Herschel’s view of science evolved: he realized that it could not be a fast hunt, and instead had to be a steady push toward more orderly knowledge (Musselman 434). Herschel’s development in Africa, in both his social thought and his thinking about science, points toward the idea that some things could be universal. It seems likely that this universalizing outlook fed Herschel’s belief that the normal distribution could also be applied universally. Moreover, these realizations would not have happened without the British Empire at his back, both as the reason he was in Africa and as an example of how civilizations could progress.
Beyond Herschel, though, empire seems to lurk in the background of each of Porter’s examples. Quetelet would not have been able to get his data on French heights if not for measurements taken from men who were conscripted into the French army (ostensibly for the glory of the French empire) (Porter 106). Empire was also present in other examples to which Quetelet applied the error curve. These included Native Americans and Black people passing through Belgium because they were displaced during the civil war (Porter 108). Furthermore, Galton cannot be seen as anything but a product of empire. Porter writes that a large influence on Galton and his beliefs was that Christianity seemed to fail, and where it failed science seemed like a new path forward (Porter 132). Galton, Porter writes, would not have developed these beliefs if not for his extensive travels in the Middle East and Africa. These travels would have been significantly harder (and unlikely to have occurred at all) if the British Empire did not exist. Therefore, it seems that progressive beliefs cannot fully account for the development of statistics during the 19th century. Instead, there is another social factor that must be explored: the pervasiveness and ubiquity of European empires during the 19th century.
Throughout this paper, I hope to have effectively argued for two main claims. The first is that it is not clear that the normal distribution was reinterpreted in the 19th century to account for variation instead of error. I showed this through Quetelet’s project of social physics and his philosophy of L'homme moyen. I then argued that even if a reinterpretation of the normal distribution did occur during the 19th century, it cannot be called the “central achievement” of statistics during this time because there were other developments in statistics unrelated to the error curve. The two examples I highlighted were the method of least squares and the concept of correlation. I also briefly pushed back on the idea that something like a “central achievement” can be identified historically at all, because it is hard to decide why an achievement is important or not, and even harder to rank one achievement over another. The second claim is that the social influences which Porter outlines as having shaped statistics during this time (mainly progressive beliefs) are insufficient to tell the whole story of how statistics developed. The other factor I pointed to was empire. I argued that the influence of empire could be seen through John Herschel, through the data it provided to Quetelet, and through Galton’s travel opportunities. It seems to me that for progressive ideas to exist in the first place, an uneven hierarchy (such as empire) is needed to demonstrate exactly what peoples or countries lower in that hierarchy should aspire to.
Works Cited
Green Musselman, Elizabeth. “Swords into Ploughshares: John Herschel’s Progressive View of Astronomical and Imperial Governance.” The British Journal for the History of Science, vol. 31, no. 4, 1998, pp. 419-435.
Porter, Theodore M. The Rise of Statistical Thinking, 1820-1900. Princeton University Press, 1986.
Stigler, Stephen M. The History of Statistics: The Measurement of Uncertainty before 1900. Belknap Press of Harvard University Press, 1986.