Decimal (base 10) numbers are nothing special in this context. Benford's law also applies to other bases: it is enough to replace the $10$ with the base $B$ in the logarithms, so that the predicted probability of leading digit $d$ becomes $P(d) = \log_B(1 + 1/d)$ for $d = 1, \dots, B-1$. The exact form of Benford's law can be explained by assuming that the logarithms of the numbers are uniformly distributed. This means, for example, that a number is just as likely to lie between 100 and 1,000 (logarithm between 2 and 3) as between 10,000 and 100,000 (logarithm between 4 and 5). For many kinds of numbers, especially those that grow exponentially, such as incomes and stock prices, this is a reasonable assumption.

Here is a table of percentages. The Prediction BL column is the percentage that Benford's law predicts for each digit. (These figures are derived in the next section, which explains the law in full.)

In this article, I have given an overview of Benford's law, with some context and history. I explained normal distributions and logarithms as groundwork for understanding lognormal distributions. With a few hypothetical dice rolls, I showed how multiple independent variables can lead to normal distributions (by addition) and lognormal distributions (by multiplication). I then showed how a lognormal distribution can make some data sets BL-compliant. Finally, I reviewed three examples of real-world data sets (city/community populations, accounts payable, and river lengths) to show how data sets with a lognormal distribution tend to adhere to BL.
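The following is a minimal Python sketch of the base-B formula and the uniform-logarithm assumption described above. It assumes Python 3.9 or later, and all function names are illustrative rather than taken from any library. The code computes the Benford prediction log_B(1 + 1/d) for each leading digit. It then checks that prediction against numbers whose base-B logarithms are drawn uniformly over a whole number of orders of magnitude.

```python
import math
import random
from collections import Counter


def benford_probs(base: int = 10) -> dict[int, float]:
    """Benford prediction P(d) = log_B(1 + 1/d) for leading digits d = 1 .. B-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}


def leading_digit(x: float, base: int = 10) -> int:
    """Leading (most significant) digit of a positive number in the given base."""
    exponent = math.floor(math.log(x, base))
    mantissa = x / base ** exponent  # scaled into [1, base), up to rounding
    # math.log can be off by a tiny amount near exact powers of the base
    # (e.g. log10(1000) may come out as 2.9999...), so nudge back into range.
    if mantissa >= base:
        mantissa /= base
    elif mantissa < 1:
        mantissa *= base
    return int(mantissa)


def simulate_log_uniform(n: int, decades: int, base: int = 10, seed: int = 0) -> dict[int, float]:
    """Draw n numbers whose base-B logarithm is uniform on [0, decades)
    and return the observed share of each leading digit."""
    rng = random.Random(seed)
    counts = Counter(leading_digit(base ** rng.uniform(0, decades), base) for _ in range(n))
    return {d: counts[d] / n for d in range(1, base)}


if __name__ == "__main__":
    predicted = benford_probs(10)
    observed = simulate_log_uniform(100_000, decades=6)
    print("digit  Benford  simulated")
    for d in range(1, 10):
        print(f"{d:>5}  {predicted[d]:7.1%}  {observed[d]:9.1%}")
```

The sampled logarithms span exactly six whole decades, so the simulated shares should match the predicted 30.1%, 17.6%, ..., 4.6% to within sampling noise. Passing base=2 reduces everything to the trivial binary case, where every number starts with 1.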
A few days ago, I was explaining the content of my previous article on Benford's law to a friend, one of the few people who humor my enthusiasm for mathematics. Being the brilliant person she is, she asked how we could see Benford's law in action. It's one thing to create graphs and Python programs that show a correlation, but it's another to see how the model plays out and why it behaves the way it does.

Benford's law (also known as the first-digit law) states that the leading digits in many real-world collections of numbers are likely to be small. For example, in such collections about 30% of the numbers have 1 as their leading digit, even though a uniform distribution would give each digit 11.1% (one of the nine possible leading digits). The digit 2 comes next, leading about 17.6% of the numbers. This is an unexpected phenomenon: if all leading digits (1 to 9) were equally likely, each would occur in 11.1% of cases. Put simply, Benford's law is a probability distribution for the leading digit in a set of numbers (Frunza, 2015).

This discussion is not a complete explanation of Benford's law. It does not explain why we so often encounter data sets whose logarithms, viewed as a probability distribution, are relatively uniform over several orders of magnitude. [12]

It is difficult for humans to construct, by hand, distributions that conform to Benford's law. Fraudulent numerical data can therefore often be identified simply by looking at the frequency of the first digits. In practice, though, more than one digit is often examined for more accurate verification. In particular, Benford's law has been applied to entries in tax forms, election results, economic figures and accounting figures.
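The first-digit test mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions. It uses a plain chi-square goodness-of-fit check against Benford's proportions (8 degrees of freedom, 5% significance level). It inspects only the first digit, whereas practical checks often go further, as noted above. The data are made up for illustration and are not real records, and the function names are hypothetical.

```python
import random
from collections import Counter
from math import log10

# Benford's expected first-digit proportions, P(d) = log10(1 + 1/d)
BENFORD = {d: log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom (9 digits - 1) at the 5% level
CHI2_CRIT_8DF_05 = 15.507


def first_digit(x: float) -> int:
    """Leading significant digit of a nonzero number."""
    # Scientific notation such as '7.31000000000000e+02' puts that digit first.
    return int(f"{abs(x):.14e}"[0])


def first_digit_test(values) -> tuple[dict[int, float], float]:
    """Compare observed first-digit counts with Benford's law.

    Returns the observed share of each digit 1-9 and the chi-square statistic.
    Zeros are skipped because they have no leading significant digit.
    """
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    observed = {d: counts[d] / n for d in range(1, 10)}
    chi2 = sum((counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d]) for d in range(1, 10))
    return observed, chi2


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up example data: a quantity compounding at 7% per period...
    growth = [1.07 ** k for k in range(1000)]
    # ...versus "invented" amounts spread evenly between 100 and 999.
    invented = [rng.randint(100, 999) for _ in range(1000)]
    for name, data in [("growth", growth), ("invented", invented)]:
        observed, chi2 = first_digit_test(data)
        verdict = "consistent with" if chi2 < CHI2_CRIT_8DF_05 else "deviates from"
        print(f"{name:>8}: leading 1s = {observed[1]:.1%}, chi2 = {chi2:.1f} ({verdict} Benford)")
```

The compounding series grows exponentially, so its leading digits behave like Benford's prediction. The evenly spread "invented" amounts give each digit roughly 11%, which the chi-square statistic flags immediately.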
If you haven't seen it yet, check out the Netflix series Connected. It's a good show. The host, Latif Nasser, explores various popular-science topics. Netflix advertises it as a series that “examines the surprising and complicated ways in which we are connected to each other, to the world, and to the universe.”[1]

For number systems with base b = 2 or b = 1 (binary and unary), Benford's law is true but trivial: all binary and unary numbers (except 0 or the empty set) begin with the digit 1. (On the other hand, the generalization of Benford's law to second and later digits is not trivial, even for binary numbers.)

Both graphs show the same data on different horizontal scales. The top chart shows the data points on a linear axis, and the bottom chart shows them on a log axis. Note that the data points are more evenly distributed across the log scale. Also note in the bottom chart that the intervals between numbers with a leading digit of 1 are much larger than the other intervals. I'll say a little more about these intervals in the next section.

If we look at the powers of two that have a leading digit of 1:

(Short side note: it is common for things to be named after someone who did not discover them first. In fact, there is a name for this: Stigler's law of eponymy. The American statistics professor Stephen Stigler proposed it in 1980, when he wrote that no scientific discovery is named after its original discoverer [7].
Ironically, Stigler acknowledged that the American sociologist Robert Merton had discovered “Stigler's law” before him.)

In summary, it should seem intuitive that there will be more leading 1s than any other leading digit, because 1 is where we start counting.

[13] Durtschi, C., Hillison, W., and Pacini, C. “The Effective Use of Benford's Law to Assist in Detecting Fraud in Accounting Data.” Journal of Forensic Accounting, 2004. www.agacgfm.org/AGA/FraudToolkit/documents/BenfordsLaw.pdf
Benford, F. “The Law of Anomalous Numbers.” Proceedings of the American Philosophical Society, 78, 551–572, 1938.
Diaconis, P., and Freedman, D. “On Rounding Percentages.” Journal of the American Statistical Association, 74, 359–364, 1979. MR 81d:62014
Frunza, M. Solving Modern Crime in Financial Markets: Analytics and Case Studies. Academic Press, 2015.
Hill, T. P. “The First-Digit Phenomenon.” American Scientist, 86, 358–363, 1998.
Newcomb, S. “Note on the Frequency of Use of the Different Digits in Natural Numbers.” American Journal of Mathematics, 4, 39–40, 1881.
Nigrini, M. “I've Got Your Number.” Journal of Accountancy, 1999. Retrieved November 8, 2017, from www.journalofaccountancy.com/issues/1999/may/nigrini.html
Poincaré, H. “Distribution of decimal places in a numerical table.” In Calcul des Probabilités, pp. 313–320. Gauthier-Villars, Paris.
Rauch, B., et al. “Fact and Fiction in EU-Governmental Economic Data.” German Economic Review.
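A short Python sketch can list the powers of two by order of magnitude and pick out those with a leading digit of 1. The function names are illustrative. Python's integers are exact, so the leading digit is read directly from the decimal string without any floating-point rounding concerns.

```python
from collections import defaultdict


def powers_of_two_by_decade(count: int) -> dict[int, list[int]]:
    """Group 2**0 .. 2**(count - 1) by order of magnitude (number of decimal digits minus 1)."""
    groups: dict[int, list[int]] = defaultdict(list)
    for k in range(count):
        power = 2 ** k  # exact integer, so no floating-point rounding
        groups[len(str(power)) - 1].append(power)
    return dict(groups)


def share_with_leading_one(count: int) -> float:
    """Fraction of 2**0 .. 2**(count - 1) whose leading decimal digit is 1."""
    return sum(str(2 ** k)[0] == "1" for k in range(count)) / count


if __name__ == "__main__":
    for decade, powers in powers_of_two_by_decade(20).items():
        ones = [p for p in powers if str(p)[0] == "1"]
        print(f"10^{decade}: {powers} -> leading 1: {ones}")
    print(f"first 1000 powers of two: {share_with_leading_one(1000):.1%} start with 1")
```

Every order of magnitude contains exactly one power of two that starts with 1. The count among 2^0 through 2^999 therefore equals the number of orders of magnitude they span. That number is 301, because 2^999 has 301 digits, which gives the 30.1% printed on the last line.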
In other words, each order of magnitude contains exactly one power of two with a leading digit of 1. Each order of magnitude also contains 3 or 4 powers of two, so we can expect 25% to 33% of the powers of two to have a leading digit of 1 (and Benford's 30.1% falls in that range, by the way).

[12] State of Oklahoma, “Oklahoma's Open Data,” data.ok.gov, 2019.
[3] Berger, A., and Hill, T. P., “A Basic Theory of Benford's Law,” Probability Surveys, 2011, projecteuclid.org/download/pdfview_1/euclid.ps/1311860830

Logarithms are useful for examining data where values are clustered near zero but higher values are more widely spread out. Consider the following two diagrams.

Data sets consisting of numbers that are the product of several independent factors tend to follow Benford's law. The square roots and reciprocals of successive natural numbers do not obey this law. [42] Telephone directories violate Benford's law because (local) numbers generally have a fixed length and do not begin with the long-distance prefix (the digit 1 in the North American Numbering Plan). [43] The populations of all places with at least 2,500 residents in five U.S. states, according to the 1960 and 1970 censuses, also violate Benford's law. Only 19% of them began with the digit 1 but 20% began with the digit 2, for the simple reason that truncating the data at 2,500 introduces a statistical bias. [42] The last digits of pathology reports violate Benford's law because of rounding. [44]

You can see that the large green strip extends over about 30% of the segment, and the next 8 intervals between tick marks become smaller from left to right until the next green strip begins.
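Here is a small Python simulation of the idea that products of independent factors tend toward Benford's law while sums do not. It is a sketch in the spirit of the dice-roll thought experiment, not a reconstruction of it. It rolls ten hypothetical fair six-sided dice many times and compares the leading digits of their sums with those of their products. The number of dice, the number of trials, the seed and the function names are all arbitrary, illustrative choices.

```python
import math
import random
from collections import Counter
from typing import Callable, Iterable

BENFORD_ONE = math.log10(2)  # Benford's predicted share of leading 1s (about 30.1%)


def first_digit(x: float) -> int:
    """Leading significant digit of a positive number, read from scientific notation."""
    return int(f"{x:.14e}"[0])


def leading_digit_shares(values: Iterable[float]) -> dict[int, float]:
    """Share of each leading digit 1-9 among the given positive values."""
    counts = Counter(first_digit(v) for v in values)
    n = sum(counts.values())
    return {d: counts[d] / n for d in range(1, 10)}


def roll_dice(rng: random.Random, n_dice: int, combine: Callable[[Iterable[int]], int]) -> int:
    """Roll n_dice fair six-sided dice and combine the faces (e.g. with sum or math.prod)."""
    return combine(rng.randint(1, 6) for _ in range(n_dice))


if __name__ == "__main__":
    rng = random.Random(1)
    trials, n_dice = 100_000, 10
    sums = leading_digit_shares(roll_dice(rng, n_dice, sum) for _ in range(trials))
    products = leading_digit_shares(roll_dice(rng, n_dice, math.prod) for _ in range(trials))
    print("digit  Benford   sums  products")
    for d in range(1, 10):
        benford = math.log10(1 + 1 / d)
        print(f"{d:>5}  {benford:7.1%}  {sums[d]:5.1%}  {products[d]:8.1%}")
```

Sums of ten dice stay between 10 and 60 and cluster around 35, so almost none of them start with 1. Products range from 1 up to 6^10 (about 60 million) and span several orders of magnitude. As a result, their leading digits come out very close to Benford's prediction.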