Benford’s law says that in a large set of numbers spread over several orders of magnitude (roughly uniformly on a logarithmic scale), numbers that start with a 1 or a 2 should be more common than numbers that start with an 8 or a 9. In fact, Benford’s law says that the distribution of leading digits should follow this probability model:
| Leading Digit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Proportion | 0.301 | 0.176 | 0.125 | 0.097 | 0.079 | 0.067 | 0.058 | 0.051 | 0.046 |
The predicted proportions in Benford’s law come from the formula \[P(\text{Leading digit} = k) = \log_{10}(k+1) - \log_{10}(k)\] for each \(k\) from 1 to 9. Here is a graph showing the trend.
p <- log10(2:10) - log10(1:9)
barplot(p, names.arg = 1:9, main="Benford's Law", xlab = "Leading Digit", ylab = "Predicted Proportion")
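As a quick sanity check, the nine proportions from the formula should add up to 1 and should round to the values in the table above. Here is a minimal sketch; the name `table.values` is introduced only for this comparison.

```r
# Proportions predicted by Benford's law for leading digits 1 through 9
p <- log10(2:10) - log10(1:9)

# The differences telescope to log10(10) - log10(1) = 1
sum(p)

# Values copied from the table above (table.values is an illustrative name)
table.values <- c(0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046)

# TRUE if every proportion, rounded to 3 decimals, matches the table
all(abs(round(p, 3) - table.values) < 1e-9)
```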
Below, the data frame df contains the 2025 populations of
all the countries in the world. The
leading.digits variable contains a table with the counts
for each leading digit. Use this data to answer the questions below.
df <- read.csv("https://bclins.github.io/spring26/math222/Examples/WorldPopulations2025.csv")
leading.digits <- table(df$Leading.Digit)
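The file already provides a Leading.Digit column, so nothing extra needs to be computed here. For anyone curious how such a column could be built from raw numbers, here is one possible sketch. The helper name `leading.digit` and the sample values are made up for illustration and are not taken from the data file, and the approach assumes positive whole numbers.

```r
# Hypothetical helper: first digit of each positive whole number
leading.digit <- function(x) {
  # Write each number out in full (no scientific notation, no padding)
  digits <- format(x, scientific = FALSE, trim = TRUE)
  as.integer(substr(digits, 1, 1))
}

# Made-up example values, not real populations
x <- c(7, 42, 1500, 98765, 120000000)
leading.digit(x)

# Use all nine digits as levels so digits with zero counts still appear
table(factor(leading.digit(x), levels = 1:9))
```

Keeping all nine levels matters for the goodness of fit test, since the table of counts must have one entry for each of the nine predicted proportions.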
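The chi-squared goodness of fit test can be used to check whether the observed counts for a single categorical variable differ significantly from the proportions predicted by a model. We always test the same two hypotheses: the null hypothesis \(H_0\) that the model’s proportions are correct, and the alternative hypothesis \(H_A\) that at least one proportion differs from the model’s prediction.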
Test Goodness of Fit. Use R to carry out a chi-squared goodness of fit test. For the test, you should use the following command:
chisq.test(leading.digits, p = log10(2:10) - log10(1:9))
What is the result? Does the p-value mean that world populations do or don’t appear to follow Benford’s law?
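To see what `chisq.test` is doing, the statistic and p-value can be reproduced by hand. The sketch below uses made-up counts (`obs` is a placeholder, not the world population data), so it does not answer the question above. Substituting `leading.digits` for `obs` would apply the same steps to the real table.

```r
# Made-up counts for leading digits 1 through 9 (placeholder, not the real data)
obs <- c(60, 36, 24, 20, 16, 14, 12, 10, 8)
p <- log10(2:10) - log10(1:9)

# Expected counts if the model's proportions were exactly right
expected <- sum(obs) * p

# Chi-squared statistic: sum of (observed - expected)^2 / expected
stat <- sum((obs - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
dof <- length(obs) - 1

# P-value: upper-tail area of the chi-squared distribution
p.value <- pchisq(stat, df = dof, lower.tail = FALSE)

# Both pairs should agree with the built-in test
result <- chisq.test(obs, p = p)
c(by.hand = stat, built.in = unname(result$statistic))
c(by.hand = p.value, built.in = result$p.value)
```

A small p-value (conventionally below 0.05) is evidence against the model. A large p-value means the counts are consistent with the model, which is not the same as proving the model is correct.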