{"id":32,"date":"2025-04-21T03:23:59","date_gmt":"2025-04-21T03:23:59","guid":{"rendered":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/?post_type=chapter&#038;p=32"},"modified":"2026-06-07T14:29:03","modified_gmt":"2026-06-07T14:29:03","slug":"monte-carlo-simulation","status":"publish","type":"chapter","link":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/chapter\/monte-carlo-simulation\/","title":{"raw":"Monte Carlo Simulation","rendered":"Monte Carlo Simulation"},"content":{"raw":"<div class=\"textbox\">\r\n<p class=\"import-epf\">Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.<\/p>\r\n<p class=\"import-ept\" style=\"text-align: right\">John von Neumann<\/p>\r\n\r\n<\/div>\r\n<h2 class=\"import-ahaft\">3.1 Free Throw Shooting with MCSim<\/h2>\r\n<p class=\"import-pf\">You are the new kid on the block, and it is time to choose teams at the rec center. You think you are pretty good, so you say, \u201cI\u2019m a 90% free throw shooter.\u201d This is quite impressive. Someone hands you a basketball and says, \u201cProve it.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You shoot 100 free throws, but it does not go as well as you hoped. You make only 75. Someone says, \u201cYou are not a 90% free throw shooter.\u201d You insist, however, that you really are. \u201cIt was just bad luck. Honestly,\u201d you say, \u201cI really am a 90% free throw shooter.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The question is, Should we believe you? Anyone who has ever shot free throws knows there is luck involved. We would not expect you to make 9 out of every 10 shots like clockwork. So it could be that by chance, you missed a few more than expected. Another way to ask the question is, Can randomness explain this poor outcome? 
Or, put yet another way, how uncommon is missing this many free throws for a 90% shooter?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We would not be having this conversation if you had made 89 or even 88 out of 100. Then it would be easy to believe you are actually a 90% shooter and it was just bad luck. But how do we handle the fact that you missed 15 more than expected? That seems like a lot, but how rare is that?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is a way to answer this question analytically\u2014that is, with mathematics. We will not go that route. Instead, we will use the method of simulation.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation simply means the repeated running of a chance process and then direct examination of the results. It can be used in frontier research work, but we will use it just like we used numerical methods to solve optimization problems\u2014simulation enables us to understand complicated concepts without advanced mathematics.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is based on brute force\u2014repeat the chance process and examine the results. It requires no imagination or mathematics at all. It will be our go-to method for understanding randomness and answering questions like, \u201cDo we believe you are a 90% free throw shooter if you make only 75 out of 100?\u201d<\/p>\r\n\r\n<h3 class=\"import-bh\">Gauss and Two Approaches<\/h3>\r\n<p class=\"import-paft\">Carl Friedrich Gauss (1777\u20131855) was perhaps the greatest mathematician of all time. Before the euro, Germany\u2019s 10 deutsche mark note featured him along with a graph of the normal curve (which he made famous and which is also called the Gaussian distribution). 
Look carefully in Figure 3.1 and you can see that it even displays the equation of the normal distribution.<\/p>\r\n\r\n\r\n[caption id=\"attachment_284\" align=\"aligncenter\" width=\"960\"]<img class=\"wp-image-284 size-full\" src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse.jpg\" alt=\"\" width=\"960\" height=\"480\" \/> <strong>Figure 3.1: German currency featuring Gauss.<\/strong> <br \/>Source: <a href=\"https:\/\/commons.wikimedia.org\/w\/index.php?curid=71505000\">YavarPS on Wikimedia<\/a> \/ <a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\">CC BY-SA 4.0<\/a>.[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is a story, probably apocryphal, of how he amazed his kindergarten teacher. Apparently, the children were especially unruly one day, so the teacher assigned a dreary problem as punishment. He told them to add all the numbers from 1 to 1,000. This starts easily but gets tedious and painful pretty quickly. 1 + 2 = 3, 3 + 3 = 6, 4 + 6 = 10, and 5 + 10 = 15. It will take a long time to get to 1,000.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Gauss waited a minute, then stood up and announced the answer: 500,500. The stunned teacher asked him where he got that number, which is correct, and Gauss said he noticed a pattern. Remember, he was five years old.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you make a list of the numbers, then create a second list, but flipped, the pairs always add up to 1,001: 1 goes with 1,000, 2 with 999, 3 with 998, and so on until the end, when 998 is with 3, 999 with 2, and 1,000 with 1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The rest is easy (well, maybe not for the usual five-year-old, but this is Gauss). Multiply 1,001 by 1,000 (since there are 1,000 pairs) and divide by 2 to get 500,500. 
As they say, QED.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This is clever, remarkable, and beautiful. It is like Michelangelo and the Sistine Chapel. Is Monte Carlo simulation like this? No.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is a different approach to problems that uses little creativity or subtlety. It is a direct attack on a question.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is like solving the teacher\u2019s tedious problem by using a spreadsheet to add the numbers.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Make a list from 1 to 1,000 in cells A1:A1000 (using fill down, of course), and then, in cell B1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(A1:A1000)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel displays 500,500. This is nowhere near as magnificent as what Gauss did, but it does give the answer.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation was developed during World War II by physicists working on the Manhattan Project. Nicholas Metropolis coined the term because his colleague, Stanislaw Ulam, was an avid poker player. They were simulating how radiation propagates and incorporating randomness. The connection to chance and gambling is why Metropolis named the method after the famous Monte Carlo casino in Monaco.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">So Monte Carlo simulation, or \u201csimulation\u201d for short, is an alternative to the analytical approach. 
Instead of using equations and algebraic manipulations, simulation uses computers to repeat the chance process many times and then directly observe the outcomes.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You can think of simulation as the much younger sibling of analytical methods. Let\u2019s apply it to free throw shooting to show how it works.<\/p>\r\n\r\n<h3 class=\"import-bh\">Are You Really a 90% Shooter?<\/h3>\r\n<p class=\"import-paft\">Our solution strategy will be simulation, but make no mistake: Gauss would not have needed simulation. He would have immediately rejected your claim. He knows a formula can be used, [latex]\\sqrt{n}\\sigma[\/latex], that answers the question quickly. The formula is the product of sophisticated mathematics and can be called beautiful, but most people find it extremely difficult to understand and cannot use it to answer the question.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">All Monte Carlo simulations use a random number generator (RNG). Excel\u2019s RNG function is RAND(). This draws uniformly distributed random numbers in the interval from zero to one.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Insert a sheet in your Excel workbook and, in cell A1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=RAND()<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You see a number with several decimal places displayed that is between zero and one. The number is actually much longer.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Widen the column and add decimal places to see this. 
Keep adding decimal places (widening column A as needed) until you start seeing zeroes.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you learned when we explained Solver\u2019s false precision, most modern spreadsheets use 64-bit double-precision floating point format. If you count carefully, you will see that RAND() has a zero, then a decimal point, and then 15 decimal places with values from zero to nine. After that, they are all zero, so we have reached the maximum precision. It is important to understand that our spreadsheet\u2019s random number is finite but also that it has many more decimal digits than what was originally displayed.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Repeatedly press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> (you may have to hold down the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">f<\/em><em class=\"import-i\">n<\/em><\/span> key on your keyboard). <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> is the keyboard shortcut to recalculate the sheet.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The number in cell A1 changes each time you recalculate the sheet. This is the beating heart of the simulation.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The bouncing numbers show that although RAND() is finite, it has a massive set of numbers to choose from. If it had only one decimal place, RAND() would have 10 possible numbers (from 0.0, 0.1, 0.2, and so on until 0.9). Six decimal places would give it 1 million different numbers. Twelve gives a trillion numbers. 
Fifteen is a quadrillion!<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">So you can think of RAND() as plucking a number from a humongous box with a quadrillion numbers in it.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Full disclosure: This is not exactly right because RAND() is using an algorithm to produce the next number. This is why computer-generated random numbers are called <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">pseudorandom<\/em><\/span>, where the prefix <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">pseudo<\/em><\/span> means \u201cfalse.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">To model a 90% free throw shooter, we use an IF statement.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell B1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(A1&lt;0.9, 1, 0)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The IF function has three arguments (or inputs) separated by commas. The first argument is the test, the second is what happens if the test is true (or yes), and the third is what happens if it is false (or no). If the random number in cell A1 is less than 0.9, then cell B1 shows a one, which means the free throw was made; otherwise, it shows a zero, which means it was missed.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Some students (usually really smart, careful ones) obsess about whether A1 should be less than (&lt;) or less than or equal to (\u2264) 0.9. 
This does not matter because RAND() has so many random numbers available to it. The chances of drawing exactly 0.900000000000000 are ridiculously small.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">When the IF function evaluates to 1, the free throw is made, and when it is 0, it is missed. This is a Bernoulli random variable (a binary outcome), since it can only take on two values, 0 or 1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We do not need to actually see the random number generated, so we can embed RAND() directly in the IF statement.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell B2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()&lt;0.9,1,0)<\/em><\/span>. Fill down this formula to cell B100. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span> a few times to see the 0 and 1 values bouncing around.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We have implemented the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">data generation process<\/em><\/span> (DGP) in Excel. 
The DGP tells us how our data are produced.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Rename the sheet (double-click on the sheet tab) <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> and save the workbook as <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">FreeThrowSim<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you scroll back up to the top row, you will see many ones and a few zeroes. With a 90% success rate, roughly 1 in 10 cells will have a random number greater than 0.9 and, therefore, show a 0.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The fact that each cell in column B stands alone and does not depend on or influence other cells means we are assuming <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">independence<\/em><\/span>. In our model, a miss or make does not affect the chances of hitting the next shot.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you believe in the hot hand (Cohen, 2020), this implementation of the chance process is wrong. If making the previous shot increases the chances of making the current shot, there is <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">autocorrelatio<\/em><em class=\"import-i\">n<\/em><\/span>, and we cannot use 0.9 as the threshold value for every shot. 
We assume independence from one shot to the next.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">How many shots out of 100 will a 90% shooter make?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(B1:B100<\/em><em class=\"import-i\">)<\/em><\/span> in cell C1 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">made<\/em><\/span> in cell D1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You will see a number around 90 in cell C1. Each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> gives you the result of a new outcome from 100 attempted free throws.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The number of made free throws from the virtual shooter you have constructed in Excel is not always exactly 90 because you incorporated RAND() in each shot.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Use your keyboard shortcut, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>, to recalculate the sheet a few times to get a sense of the variability in the number of made shots from 100 free throws.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is no doubt that the number of made shots is a random variable, since it is bouncing around when you recalculate the sheet. 
It makes common sense that adding 100 bouncing numbers will produce a random outcome.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">A <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">statisti<\/em><em class=\"import-i\">c<\/em><\/span> is a recipe for the data. Cell C1 is a sample statistic because the recipe is to add up the results from a sample of 100 shots. We are interested in the distribution of the sum of 100 free throws from a 90% free throw shooter, including its central tendency, dispersion, and shape of a histogram of outcomes. With this, we can decide if a result of 75 made shots is merely unlikely or so rare that we reject the claim that you are a 90% shooter.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation is simply repeating the experiment many times so we can approximate the center, dispersion, and distribution of the outcomes. Since we are working with a sample statistic, the distribution of the sum of 100 free throws is called a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">samplin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">distribution<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">To process the many outcomes, we need software. A free Excel add-in that does Monte Carlo simulation is available here: <a href=\"http:\/\/dub.sh\/addins\">dub.sh\/addins<\/a>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Download the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSim.xl<\/em><em class=\"import-i\">a<\/em><\/span> file from the link above and use the Add-ins Manager (File \u2192 Options \u2192 Add-ins \u2192 Go) to install it. 
Click the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-ins<\/em><\/span> tab and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSim<\/em><\/span>.<\/p>\r\n\r\n<div class=\"textbox\">\r\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP<\/em> <\/strong><\/span>The keyboard shortcut to call the Add-ins Manager is <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Al<\/em><em class=\"import-i\">t<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">t<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">i<\/em><\/span> (press these keys in order without holding any of them down).<\/p>\r\n\r\n<\/div>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.2 shows the MCSim add-in dialog box. On the left are three required choices. You must select a cell to track (C1 in our example), the number of repetitions (the default is 1,000), and the random number generator to use. The MCSim add-in comes with its own RNG, RANDOM. Selecting it will replace all RAND in the sheet with RANDOM. The default is no changes.<\/p>\r\n&nbsp;\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"822\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p41-1.png\" alt=\"Screenshot of a Monte Carlo Simulation dialog box. The dialog is divided into two sections: Required (left) and Optional (right). Required section (left): &quot;Select a cell&quot; field contains the cell reference $A$1, with a collapse button. &quot;Enter the Number of Repetitions&quot; field is set to 1000. &quot;Choose RNG&quot; group box has &quot;No Changes&quot; selected. Optional section (right) shows nothing selected in any of the fields. 
Two buttons are at the bottom: &quot;Proceed&quot; (left, with a bold border indicating it is the default action button) and &quot;Cancel&quot; (right).\" width=\"822\" height=\"516\" \/> <strong>Figure 3.2: The Monte Carlo simulation Excel add-in.<\/strong><br \/>Source: Screenshot of Excel interface, \u00a9 Microsoft Corporation. Add-in by H. Barreto.[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">On the right are some advanced options. Some of these will be used in future work. The <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> option forces the RNG to begin from the same initial position, which allows for replication of results.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> box, clear it, and click cell A1 (displaying RAND()). Click in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> box and enter 123. Click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">A new sheet is inserted in the workbook. It shows the first 100 outcomes in column B, summary statistics, and a histogram. It is roughly, but not exactly, a rectangle. 
If you ran more repetitions, it would be less jagged.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet and repeat the simulation of cell A1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The results are exactly the same as before because the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> option started the RNG from the same initial value.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet and click the <span class=\"import-ccust1\">MCSi<\/span><span class=\"import-ccust1\">m<\/span> button in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab. Select cell C1 (the sum of made free throws) and clear the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> box. Click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.3 shows the results. Yours will be different because we cleared the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> box. 
However, your results will be quite close in the sense that your average is near 90 and the standard deviation (SD) is around 3.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"789\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.3%3A%20Screenshot,ending%20near%2098.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p42-1.png\" alt=\"Screenshot of a Monte Carlo simulation, consisting of a summary statistics table and a histogram. Long description linked from image.\" width=\"789\" height=\"664\" \/><\/a> <strong>Figure 3.3: Simulation results for sum of 100 attempts from a 90% shooter.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The average and SD are approximations of their exact analogues: the expected value (EV), which is exactly 90, and the standard error (SE), which is exactly 3. The EV is the center of the sampling distribution, while the SE is the typical deviation, or bounce, in the statistic.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We would say that we expect a 90% free throw shooter to make 90 out of 100 attempts, plus or minus 3 free throws. The plus or minus is critical because it tells us the variability in the number of made free throws.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The sim also shows the maximum and minimum shots made from 100 free throws in 1,000 repetitions. In Figure 3.3, the max is 98 and the min is 78.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This gives an answer to our question. In 1,000 repetitions, the worst a 90% shooter did was 78. 
It certainly looks like you are not a 90% free throw shooter.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">What if we did more repetitions?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation of 10,000 sets of 100 free throws.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Once again, you are unlikely to see 75 or fewer. It seems the bad luck defense is not going to work. While it is possible that you really are a 90% free throw shooter and had an incredibly unlikely run of bad luck, such an outcome is incredibly rare\u2014so rare, in fact, that we do not believe your claim to be a 90% free throw shooter.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The average and SD values changed a little with the second simulation. This shows that simulation always gives an approximate answer with some variability. Simulation can never give us an exact answer because we cannot run an infinity of repetitions. As the number of repetitions increases, the approximation gets better, but it is never exact.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">By the way, as mentioned earlier, Gauss and statisticians using his work would have answered this question differently. A simple formula would lead immediately to the rejection of your claim.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The procedure begins by computing the SE with the formula [latex]\\sqrt{n}\\sigma=\\sqrt{100}\\times0.3=3[\/latex], where [latex]\\sigma=\\sqrt{0.9\\times0.1}=0.3[\/latex] is the SD of a single shot. Next, express the difference between the observed and expected values in standard units: [latex]\\frac{75-90}{3}=-5[\/latex]. 
This is so far in the tail of the normal (Gaussian) curve that the claim is rejected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">In other words, 75 out of 100 when 90 was claimed is 5 standard units away from what we expected to see, and this is ridiculously unlikely, so, sorry, we do not believe that you are a 90% free throw shooter who had some bad luck.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">In fact, neither analytical methods nor simulation can ever give a definitive, guaranteed answer. Both agree that, given the evidence, 75 out of 100 means we do not believe the claim that you are a 90% shooter. Since chance is involved, it is possible that you are a 90% shooter and missed every shot. We are not interested in what is possible. We want to know how to use the evidence to decide whether or not we believe a claim.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you have taken a statistics course, you might recognize that we are doing hypothesis testing without explicitly saying so. The null is that you are a 90% shooter, and the alternative hypothesis is that you are not. Seventy-five out of 100 produces a test statistic far from the expected 90, so the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">p<\/em><\/span>-value is really small. Thus, we reject the null.<\/p>\r\n\r\n<h3 class=\"import-bh\">Max Streak<\/h3>\r\n<p class=\"import-paft\">A second example of simulation involves streaks, also known as runs. A streak in this case is a consecutive set of made shots.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet. Starting from cell B1, find the first 1 (it could be cell B1), and then count how many 1s in a row you see before you encounter a miss. 
Write that number down and count the next streak. Continue until you reach the 100th shot attempt. The longest streak is the max streak.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The question is, What is the length of the typical max streak in a set of 100 free throws from a 90% free throw shooter?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This is an exceedingly difficult question. It is asking not to count the streaks (also a hard question) but to find the biggest streak in 100 shots. You do not simply add up the number made; you have to find the length of all the streaks and then identify the longest one.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Unlike how many free throws in 100 attempts a 90% shooter will make, you have no easy way to guess the typical max streak. It could be 20, 40, or maybe 50. Who knows? How can we answer this question?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The analytical approach is a bit of a dead end. There are formulas that approximate a solution (Feller, 1968, p. 325), but the math is somewhat complicated. No exact analytical solution has been found.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation can be used if we can figure out a way to ask the question in Excel so that a cell displays the answer. This means simulation requires some ingenuity. We need a cell that computes the max streak so we can use the MCSim add-in. We do this in two steps: first we figure out how to report the current streak, then we use the MAX function to find the longest streak.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell E1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=B1<\/em><\/span>. In cell E2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(B2=1,E1+1,0)<\/em><\/span>. 
Fill it down a few cells.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now you can see what the formula is doing. If the shot is made, we add it to the previous running sum, but if it is missed, it resets the running sum to 0. The B2=1 part of the formula tests if the current shot is made, and E1 + 1 increases the current streak length by one. The zero means you missed and the streak is now zero.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Fill the formula down to E100 and look at the values as you scroll back up.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You should see several streaks in a set of 100 free throws. We want the longest streak. That is the second step in our implementation of the question in Excel, and it is easy.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell F1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MAX(E1:E100<\/em><\/span>) and enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">ma<\/em><em class=\"import-i\">x <\/em><em class=\"import-i\">streak<\/em><\/span> in cell G1. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell F1 displays the max streak from each set of 100 free throws. Max streak is a statistic, just like the sum, because it is a recipe\u2014albeit much more complicated than the sum.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">It has an expected value, standard error, and sampling distribution. 
We can approximate all of these with simulation.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation, with 10,000 repetitions, of cell F1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.4 shows the results. Yours will be a little different. The average is an approximate answer to our question: The max streak is about 27 or so. The exact answer is the expected value, but we have no way of computing it.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"803\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p46-1.png\" alt=\"Screenshot of a Monte Carlo simulation with a summary statistics table and a histogram. Summary Statistics table (top) has two columns labeled &quot;Summary Statistics&quot; (blue header) and &quot;Notes&quot; (red header). The Notes column is empty. The statistics reported are: Average = 26.858, SD = 9.5715, Max = 85.000, Min = 9.000. Histogram (bottom) is titled &quot;Histogram of $F$1.&quot; The chart displays the frequency distribution of simulation results as a blocky curve. The horizontal axis runs from 8 to 78. The vertical axis is unlabeled. The bars rise quickly from the left tail starting near 8, reach their peak height around values of 20, then taper off quickly toward the right tail ending near 78. \" width=\"803\" height=\"675\" \/> <strong>Figure 3.4: Sim results for a max streak in 100 attempts from a 90% shooter.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The histogram is an approximation to the exact sampling distribution (which no one has figured out how to exactly derive). 
The graph tells us which values are unlikely: roughly 10 or fewer and 50 or more.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice that the sampling distribution of the max streak statistic, unlike the sum, does not appear to follow the normal curve. The max streak has a long right tail and is not symmetric.<\/p>\r\n\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Takeaways<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"import-paft\">Sometimes a function or problem is deterministic, but other times we are faced with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">stochasti<\/em><em class=\"import-i\">c <\/em><em class=\"import-i\">data<\/em><\/span>\u2014the numbers depend on chance, luck, and randomness. The values we observe are produced by a DGP, and they are volatile.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is a brute-force approach to answering questions involving stochastic data. A much older alternative, the analytical approach, relies on brainpower to derive formulas.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">To run a simulation, the problem must be implemented in Excel (or some software that can generate random numbers). Of course, one can manually flip a coin many times in the real world, but this is tedious. Simulation did not become a powerful tool until modern computers were invented and enabled a great many repetitions in a short period of time.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation is always only an approximation. By running more repetitions, the approximation improves, but it can never give an exact answer because it would have to run forever.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Often, we are searching for the sampling distribution of a statistic. 
This tells us the chances of each outcome, the typical result (called the expected value), and the dispersion in values (called the standard error, or SE).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The MCSim add-in always produces summary statistics and a histogram. If the cell that is tracked is a statistic, then the average is the approximate expected value and the SD is the approximate SE.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">In case you think streaks are a waste of time, look at this headline to an article in <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Wal<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Stree<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">Journa<\/em><em class=\"import-i\">l<\/em><\/span> on July 27, 2023, on page B1:<\/p>\r\n<p class=\"import-wls\" style=\"margin-left: 36pt;margin-right: 36pt\">Dow Sets Longest Winning Streak Since 87<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The Dow\u2019s streak of 13 consecutive sessions with gains ended the next day. The longest streak ever (as of this writing) is 14, back in 1897.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">References<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"hanging-indent\">The epigraph is a famous quote from 1951, when computer science was taking off. 
In \u201cVarious Techniques Used in Connection with Random Digits\u201d (freely available at <span style=\"border: none windowtext 0pt;padding: 0\"><a class=\"rId48\" href=\"https:\/\/mcnp.lanl.gov\/pdf_files\/InBook_Computing_1961_Neumann_JohnVonNeumannCollectedWorks_VariousTechniquesUsedinConnectionwithRandomDigits.pdf\"><span class=\"import-url\">https:\/\/mcnp.lanl.gov\/pdf_files\/InBook_Computing_1961_Neumann_JohnVonNeumannCollectedWorks_VariousTechniquesUsedinConnectionwithRandomDigits.pdf<\/span><\/a><\/span>), von Neumann supported the use of pseudorandom number generation but warned against misinterpreting what these numbers meant. There are many algorithms for random number generation; some are better than others. Excel\u2019s RAND() is not great.<\/p>\r\n<p class=\"hanging-indent\">Cohen, B. (2020). <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Ho<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">Hand<\/em><em class=\"import-i\">: <\/em><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Myster<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">an<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Scienc<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Streaks<\/em><\/span> (Custom House). Russ Roberts interviews Cohen in an August 10, 2020, episode of EconTalk, available at <a href=\"http:\/\/www.econtalk.org\/ben-cohen-on-the-hot-hand\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.econtalk.org\/ben-cohen-on-the-hot-hand<\/span><\/span><\/a>.<\/p>\r\n<p class=\"hanging-indent\">Feller, W. (1968, 3rd ed.). 
<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">A<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">Introductio<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">t<\/em><em class=\"import-i\">o <\/em><em class=\"import-i\">Probabilit<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">Theor<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">an<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">It<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Applications<\/em><\/span> (John Wiley &amp; Sons), <a href=\"http:\/\/archive.org\/details\/introductiontopr0001fell\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">archive.org\/details\/introductiontopr0001fell<\/span><\/span><\/a>.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<h2 class=\"import-ah\">3.2 Simulating Parrondo\u2019s Paradox<\/h2>\r\n<p class=\"import-paft\">A paradox is something (such as a situation) with opposing elements that seems impossible but is actually true.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">An optical illusion is related to a paradox in that you see something that is not easily explained or can seem impossible. Figure 3.5 is an example. Do you see the old woman, or the young lady, or both? 
Your age affects what you see in this drawing\u2014older people are more likely to see the old lady (Nicholls et al., 2018).<\/p>\r\n&nbsp;\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"586\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p48-1.png\" alt=\"A drawing that produces an optical illusion where you see an old lady or young girl depending on how you look at it.\" width=\"586\" height=\"818\" \/> <strong>Figure 3.5: Old woman or young lady?<\/strong><br \/>Source: <a href=\"https:\/\/www.nature.com\/articles\/s41598-018-31129-7\/figures\/1\"><i>My Wife and My Mother-In-Law<\/i>, by the cartoonist W. E. Hill, 1915<\/a> \/ <a href=\"https:\/\/creativecommons.org\/public-domain\/pdm\/\">Public domain.<\/a>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Juan Parrondo is a physicist who discovered the paradox named after him in 1996. Parrondo\u2019s Paradox occurs when <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">tw<\/em><em class=\"import-i\">o <\/em><em class=\"import-i\">losin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">game<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">ar<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">combine<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">an<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">the<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">produc<\/em><em class=\"import-i\">e a <\/em><em class=\"import-i\">winnin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">game<\/em><\/span>. That is puzzling and counterintuitive.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Almost always, game outcomes are additive, so a losing game plus a losing game equals a losing game just like adding two negative numbers gives an even more negative number. 
Parrondo found what can be described as a black hole in the parameter space where adding two losers yields a winner.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The paradoxical nature of the result will become apparent when we implement the games in Excel and directly examine the outcomes. Our goal is to show how simulation makes Parrondo\u2019s Paradox crystal clear.<\/p>\r\n\r\n<h3 class=\"import-bh\">Losing Game A<\/h3>\r\n<p class=\"import-paft\">Game A is a flip of a coin that is slightly biased against heads. Heads earns you +1 monetary unit (M), and tails earns you \u22121. The coin is flipped 100 times, and we keep a running sum after each flip. On average, at the end of the game, the result is negative, so we say Game A is a losing game.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Gam<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">A<\/em><\/span>. In cell A3, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">epsilon<\/em><\/span>; and in cell B3, enter the number 0.005. In cell A4, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">p(H)<\/em><\/span>; this is the probability of flipping a head. In cell B4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.5-B3<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">With a probability of heads less than 50%, we will flip heads less often than tails. 
This is why this game is a loser.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Fli<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">#<\/em><\/span> and create a series from 1 to 100 in cells A7:A106. In cell B6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Result<\/em><\/span>, and in cell B7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">B<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">4,1,0)<\/em><\/span>. Fill it down to cell B106.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Column B has our simulated coin flips. We know RAND() is uniformly distributed on the interval [0,1]. It will produce tails slightly more frequently than heads because cell B4<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">&lt;<\/em><\/span>0.5.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We track the money in column C with an IF statement.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">En<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">M<\/em><\/span>, and in cell C7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(B7=1,1,-1)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The formula for the next cell is different because we have to track how much money we had at the end of the previous flip. 
Thus, we add the cell above.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(B8=1,1<\/em><em class=\"import-i\">,<\/em><em class=\"import-i\">-1)+C7<\/em><\/span>. Fill it down to cell C106. Select C6:C106 and make a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scatte<\/em><em class=\"import-i\">r<\/em><\/span> chart. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The chart shows the entire game, and cell C106 tells us the outcome of Game A. If it is positive, the game was won; if negative, it was lost. The number tells us how much we won or lost.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can use the MCSim Excel add-in to examine the sampling distribution of C106. If needed, download and install MCSim from <a href=\"http:\/\/dub.sh\/addins\">dub.sh\/addins<\/a>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range C7:C106 and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab). 
Check the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Recor<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Al<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Selecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Cell<\/em><em class=\"import-i\">s<\/em><\/span> option and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>. The simulation is fast, but Excel may take some time to display the results.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel inserts two sheets in the workbook. The <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> sheet has the result for C7 (the first coin flip), but the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet has 1,000 rows and 100 columns of numbers (which is why Excel took so long to display the results).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can process these numbers to understand Game A. Each row is a game with 100 flips running left to right. 
Each flip number (column) is called an <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">ensembl<\/em><em class=\"import-i\">e<\/em><\/span>, and the average of each flip number is called the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">ensembl<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">average<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A1003 of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=AVERAGE(A2:A1001)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This value will agree exactly with the average in J5 of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> sheet. This value is the average M after one flip. It is slightly negative.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell A1003 in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><\/span> sheet and fill it right to the 100th column (CV). With these cells selected, make a chart by clicking <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Inser<\/em><em class=\"import-i\">t<\/em><\/span> and choosing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scatte<\/em><em class=\"import-i\">r<\/em><\/span> with lines and no markers. 
Title the chart \u201cFlip #\u201d (since the title is right above the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">x<\/em><\/span>-axis, it labels the axis).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your chart is an approximation to the exact ensemble average in Figure 3.6. Simulation produces random deviation from the exact object. We could improve the approximation by increasing the number of repetitions. The squiggly graph in the simulation would converge to the line in Figure 3.6.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"870\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p51-1.png\" alt=\"Plot of the exact evolution of losing Game A in Parrondo's Paradox. The y axis is labeled as money from 0.0 to negative 1.2. The x axis is the flip number from 0-120. There is a straight diagonal line running from (0,0) down and to the right, ending at (100, negative 1.0).\" width=\"870\" height=\"503\" \/> <strong>Figure 3.6: The exact evolution of Game A.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your simulation and Figure 3.6 make clear that Game A is a loser. From the first flip, it gets steadily worse, and by the last flip, you can expect to lose 1 monetary unit.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Of course, not every single game is a loser. Column CV in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet shows many cells that are positive. On average, however, we can expect to lose playing Game A.<\/p>\r\n\r\n<h3 class=\"import-bh\">Losing Game B<\/h3>\r\n<p class=\"import-paft\">Game B is also a loser, but it is more complicated than Game A. Game B is based on two coins, and you use Coin 1 if your current monetary holding is divisible by 3; otherwise, you use Coin 2. 
The MOD function enables us to determine which coin to use.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the sheet with Game A, and in cell E1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MOD(13,12)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The cell displays 1 because it is doing modulo arithmetic. Converting military time to a.m.\/p.m. time uses the modulo operator: 1300 is 1 p.m. because you divide by 12 and the remainder is the answer.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Game B follows the flow chart in Figure 3.7. We will use the MOD function to divide any number by 3, and if the result is 0, we know it is evenly divisible, and we flip Coin 1. If not, we flip Coin 2.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"893\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p52-1.png\" alt=\"Display of how Game B works in Parrondo's Paradox using a flow chart. Is M divisible by 3? Choose yes or no. From yes, the option is Coin 1: Prob Heads = 10% - epsilon. From no, the option is Coin 2: Prob Heads = 75% - epsilon.\" width=\"893\" height=\"392\" \/> <strong>Figure 3.7: The rules of Game B.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell G1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Gam<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">B<\/em><\/span>. In cells G3 and G4, enter the labels \u201cCoin 1\u201d and \u201cp(H).\u201d In cell H4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.1-B3<\/em><\/span>. 
In cells J3 and J4, enter the labels \u201cCoin 2\u201d and \u201cp(H).\u201d In cell K4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.75-B3<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Obviously, we would rather flip Coin 2. It comes up heads almost 75% of the time, and each head wins us 1 monetary unit. Coin 1 is the opposite. It is strongly biased against heads, so we lose often with Coin 1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Copy cells A6:A106 and paste in cell G6. In cell H6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Star<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">M<\/em><\/span>, and in cell H7, enter a 0. In cell I6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MOD(M,3)<\/em><\/span>, and in cell I7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MOD(H7,3)<\/em><\/span>. In cell J6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Coin<\/em><\/span>, and in cell J7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(I7=0,1,2)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You start with 0 monetary units. That is evenly divisible by 3, so we will use Coin 1, but we need a more general formula to determine what happens if it is Coin 1 or Coin 2. 
An IF statement can handle this.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell K6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Result<\/em><\/span>, and in cell K7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(J7=1,IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">H<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">4,1,0),IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">K<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">4,1,0))<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we report our money position.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell L6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">En<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">M<\/em><\/span>, and in cell L7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(K7=1,1,-1)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">In the next row, we walk through the cells in order.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=L<\/em><em class=\"import-i\">7<\/em><\/span> and fill it down. Select cell range I7:K7 and fill it down. In cell L8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(K8=1,L7+1,L7<\/em><em class=\"import-i\">-<\/em><em class=\"import-i\">1)<\/em><\/span>. 
Fill it down. Select L6:L106 and make a Scatter chart.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This completes Game B. Cell L106 gives the final result of the game, but as we did before, we can track every flip of the game to better understand it.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range L7:L106 and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab). Check (if needed) <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Recor<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Al<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Selecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Cell<\/em><em class=\"import-i\">s<\/em><\/span> and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As before, two sheets are inserted, and we will process the data in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet to show how Game B works.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">3<\/em><\/span> sheet and copy row 1003. Go to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">5<\/em><\/span> sheet, select cell A1003, and paste. 
Make a chart of row 1003.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results are surprising. In the first few flips of the ensemble average, it jaggedly oscillates and then settles down to a downward-sloping relationship.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The exact ensemble average is given by Figure 3.8. The simulation is correct in that there is an oscillation in the expected value in the first few flips before convergence to a single line that heads downward.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"865\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p53-1.png\" alt=\"Line chart showing the cumulative change in money (vertical axis, labeled &quot;Money,&quot; ranging from -1.6 to 0.0) over a series of coin flips (horizontal axis, labeled &quot;Flip #,&quot; ranging from 0 to 120). The line begins near 0 at flip 0 and immediately drops into negative territory. In the early flips (approximately 0 to 30), the line oscillates with noticeable up-and-down fluctuations, suggesting alternating wins and losses, but with a generally downward trend. The amplitude of the oscillations decreases over time. From approximately flip 30 onward, the oscillations dampen and the line transitions into a smooth, steadily declining curve with little variation, trending consistently downward. By flip 100, the cumulative money value reaches a labeled endpoint of -1.392320, indicating a net loss of approximately $1.39 over 100 flips.\" width=\"865\" height=\"509\" \/> <strong>Figure 3.8: The exact evolution of Game B.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your simulation results and Figure 3.8 show that Game B is a loser and a bigger loser than Game A. 
We can expect to lose about 1.4 monetary units playing Game B.<\/p>\r\n\r\n<h3 class=\"import-bh\">Mixing Two Losing Games<\/h3>\r\n<p class=\"import-paft\">Having set up and run Games A and B separately, we are now ready to mix these two losing games. This will demonstrate Parrondo\u2019s Paradox because, somehow, mixing the losing games results in a winning game. In fact, there is an optimal mixing strategy, but we will randomly mix the two games by flipping a fair coin.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Rando<\/em><em class=\"import-i\">m <\/em><em class=\"import-i\">Mixing<\/em><\/span>. Copy cell range A6:A106, select cell N6, and paste. In cell O6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Game<\/em><\/span>, and in cell O7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">0.5,\"A<\/em><em class=\"import-i\">\",<\/em><em class=\"import-i\">\"B\")<\/em><\/span>. Fill it down.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Column O tells us which game we will play at each coin flip. 
It is easy to see by pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> repeatedly that the letters A and B are bouncing around, indicating that we are mixing the games randomly.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We need to input Game B again (we cannot just use Game B in columns G:L) because it depends on the value of M to decide which coin to play.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell P5, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">I<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Gam<\/em><em class=\"import-i\">e B <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">chosen<\/em><\/span>. Copy cell range H6:K7, select cell P6, and paste.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We need an IF statement to display the actual outcome of this game based on whether we play Game A or Game B. 
We take Game A from column B, since it does not depend on the amount of M we have, but we take Game B from column S.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell T6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Actua<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Result<\/em><\/span>, and in cell T7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(O7=\u201cA\u201d,B7,S7)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we determine our monetary position.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell U6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">En<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">M<\/em><\/span>, and in cell U7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(T7=1,1,-1)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We process the second flip and fill down to complete the implementation.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell P8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=U7<\/em><\/span>. Fill it down. Select cell range Q7:T7 and fill it down. In cell U8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(T8=1,U7+1,U7-1)<\/em><\/span>. Fill it down. 
Select cell range U6:U106 and make a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scatte<\/em><em class=\"import-i\">r<\/em><\/span> chart.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This completes the random mixing of two losing games. Cell U106 gives the final result of the game. Pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> does not reveal much. We need to run a simulation.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range U7:U106 and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab). Confirm that the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Recor<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Al<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Selecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Cell<\/em><em class=\"import-i\">s<\/em><\/span> option is still checked and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We have the data to demonstrate Parrondo\u2019s Paradox, but we need to create an ensemble average chart.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">5<\/em><\/span> sheet and copy row 1003. 
Go to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">7<\/em><\/span> sheet, select cell A1003, and paste. Make a chart of row 1003.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The results are absolutely stunning. Unlike our two previous charts, this one points upward, and the final value is positive! This is a winning game! Figure 3.9 shows the exact evolution of the randomly mixed games. The expected value of playing a random combination of Games A and B keeps rising the more you play. That is mind-boggling.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"842\"]<img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p55-1.png\" alt=\"Line chart showing cumulative money (vertical axis, labeled &quot;Money,&quot; ranging from -0.6 to 1.4) over a series of coin flips (horizontal axis, labeled &quot;Flip #,&quot; ranging from 0 to 120) showing positive and rising expected value. The line begins at 0 and drops sharply into negative territory in the first few flips, reaching a trough of approximately -0.4 around flip 5, with notable early volatility shown by small oscillations. From approximately flip 10 onward, the line transitions into a smooth, steadily and nearly linearly increasing trajectory, crossing into positive territory around flip 15\u201320 and continuing upward without significant fluctuation. By flip 100, the cumulative money value reaches a labeled endpoint of 1.287327. \" width=\"842\" height=\"511\" \/> <strong>Figure 3.9: The exact evolution of randomly mixing A and B.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">By randomly mixing individually losing Games A and B, we can expect to win about 1.3 monetary units playing 100 times. 
This is Parrondo\u2019s Paradox.<\/p>\r\n\r\n<h3 class=\"import-bh\">What Is Going on Here?<\/h3>\r\n<p class=\"import-paft\">Does this work show that you can walk into a casino and take turns playing blackjack and roulette and come out a winner? No.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Does it mean that you can combine two losing stocks and somehow make money? No.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Does it mean that I can take something poisonous and then drink another poison and the two will combine to heal me? No.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Parrondo\u2019s Paradox does not say that mixing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">an<\/em><em class=\"import-i\">y<\/em><\/span> two losing games produces a winner. The two games and epsilon value were chosen carefully. Parrondo found a parameter value that generated the anomalous result. Think of a Cartesian plane with a coordinate that is like a black hole\u2014all the other coordinates behave as expected, but this particular point is really weird. Parrondo found such a point by carefully picking the bias (epsilon) in Games A and B.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Applying this paradox to the real world is a challenge. Explaining the inspiration for Parrondo\u2019s discovery of the paradox will help us understand how the paradox emerges.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Parrondo is a physicist, and his discovery of an epsilon value that produced the paradox was influenced by something called the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">fl<\/em><em class=\"import-i\">ashin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">Brownia<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">ratchet<\/em><\/span>. 
This is a process that alternates between two regimes in a sawtooth fashion.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Watch this two-minute video to see the ratchet in action and how it applies to Parrondo\u2019s Paradox: <a href=\"http:\/\/vimeo.com\/econexcel\/parrondo\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">vimeo.com\/<\/span><span class=\"import-url\">econexcel<\/span><span class=\"import-url\">\/<\/span><span class=\"import-url\">parrondo<\/span><\/span><\/a>. You can control the ratchet yourself here: <a href=\"http:\/\/dub.sh\/ratchet\">dub.sh\/ratchet<\/a>.<\/p>\r\n[embed]http:\/\/vimeo.com\/econexcel\/parrondo[\/embed]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is true that Parrondo\u2019s Paradox requires a specific, and we might add rare, type of losing game to be mixed. The paradox would never emerge if the two losing games were like Game A. Mixing two Game As would produce a bigger negative outcome.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Game B, with its two coins, one of which is biased in our favor, holds the key to the paradox. Figure 3.8 tells us that for the first few flips, the expected value of Game B alternates. Mixing takes advantage of the positive parts of Game B in those first few flips.<\/p>\r\n\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Takeaways<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"import-paft\">Optical illusions and paradoxes are mind-bending. 
They violate what we expect to happen and force us to deal with something unbelievable.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #b00000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Watch a classic two-minute video of an optical illusion with an explanation of how it works: <a href=\"http:\/\/dub.sh\/faceillusion\">dub.sh\/faceillusion<\/a>.<\/p>\r\n[embed]https:\/\/www.youtube.com\/watch?v=QbKw0_v2clo[\/embed]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Like an optical illusion, Parrondo\u2019s Paradox produces a shocking result: Loser plus loser equals winner. That should not happen.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">At a magic show, we know a trick is involved, so the person was not really cut in half or made to disappear. There is a logical reason\u2014we just do not know what it is unless it is explained to us.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Similarly, Parrondo\u2019s Paradox can be explained. The key lies in the ratchet, which trends down but in a sawtooth, herky-jerky motion. The video (<a href=\"http:\/\/dub.sh\/ratchet\">dub.sh\/ratchet<\/a>) shows how it catches the ball at just the right time and pushes it upward, even as it is heading downward, producing an overall upward movement.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Parrondo\u2019s Paradox requires specific values for epsilon, the bias in coins being flipped, and Game B is actually a combination of two coins that are used based on whether the player\u2019s total amount of money is evenly divisible by 3.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is Game B that has the ratchet that explains the paradox. 
Figure 3.8 shows that it is a losing game (heading downward), but look carefully at the beginning\u2014the oscillations during the first few flips contain the explanation to the paradox.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #b00000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Listen to this 10-minute podcast at <a href=\"http:\/\/dub.sh\/parrondo\">dub.sh\/parrondo<\/a>.<\/p>\r\n[embed]https:\/\/barretoh.github.io\/GitHuBratchet\/ParrondoParadox.mp3[\/embed]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This podcast was generated by NotebookLM (a Google AI experiment in October of 2024 freely available at <span style=\"border: none windowtext 0pt;padding: 0\"><a class=\"rId59\" href=\"https:\/\/notebooklm.google.com\/\"><span class=\"import-url\">https:\/\/notebooklm.google.com\/<\/span><\/a><\/span>). In October of 2024, I was shocked by what this AI could do, and 18 months later, I remain quite impressed by NotebookLM!<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We used simulation to explain Parrondo\u2019s Paradox, but it is not an integral part of the paradox. An analytical solution using Markov chains provides exact results. The analytical solution was used to create the exact evolution charts (Barreto, 2009).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Usually, we use simulation to represent a real-world process. We do not have to make it a perfect representation, but it must capture the essential elements.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can go, however, in the other direction, from an artificial environment to the real world. Parrondo found something paradoxical, and now we are asking, \u201cIs there something like this in reality?\u201d Could it ever make sense to combine stocks or medicines or anything else in a way that reverses the negative result? 
The search is on.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Finally, random mixing produces a winning game with an expected value of about 1.3 monetary units at the 100th flip, but there is an optimal mix. Playing AB, then ABBABBABB repeatedly, produces an expected value of a little over 6 monetary units. Barreto (2009) explains the analytical solution and optimal mixing with Excel.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">References<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"hanging-indent\">Barreto, H. (2009). \u201cA Microsoft Excel Version of Parrondo\u2019s Paradox.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">SSR<\/em><em class=\"import-i\">N <\/em><em class=\"import-i\">Workin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">Paper. <\/em><\/span>Available at SSRN: <a href=\"https:\/\/ssrn.com\/abstract=1431958\" target=\"_blank\" rel=\"noopener\">https:\/\/ssrn.com\/abstract=1431958<\/a>\u00a0or\u00a0<a class=\"textlink\" href=\"https:\/\/dx.doi.org\/10.2139\/ssrn.1431958\" target=\"_blank\" rel=\"noopener\">http:\/\/dx.doi.org\/10.2139\/ssrn.1431958<\/a><\/p>\r\n<p class=\"hanging-indent\">Barreto, H. (2018, Sept. 28). <em>Ratchet effect<\/em>. [Video]. Vimeo. <a href=\"http:\/\/vimeo.com\/econexcel\/parrondo\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">vimeo.com\/<\/span><span class=\"import-url\">econexcel<\/span><span class=\"import-url\">\/<\/span><span class=\"import-url\">parrondo<\/span><\/span><\/a><\/p>\r\n<p class=\"hanging-indent\">Nicholls, E., Churches, O., and Loetscher, T. (2018). 
\u201cPerception of an Ambiguous Figure Is Affected by Own-Age Social Biases.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scienti<\/em><em class=\"import-i\">fi<\/em><em class=\"import-i\">c <\/em><em class=\"import-i\">Report<\/em><em class=\"import-i\">s<\/em><\/span> 8, no. 12661. Open access: <a href=\"http:\/\/www.nature.com\/articles\/s41598-018-31129-7\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.nature.com\/articles\/s41598-018-31129-7<\/span><\/span><\/a>.<\/p>\r\n<p class=\"hanging-indent\">RayOman. (2009, August 17). <em data-start=\"104\" data-end=\"136\">Charlie Chaplin Optic Illusion<\/em> [Video]. YouTube. <a class=\"\" href=\"https:\/\/www.youtube.com\/watch?v=QbKw0_v2clo\" target=\"_new\" rel=\"noopener\" data-start=\"155\" data-end=\"198\">https:\/\/www.youtube.com\/watch?v=QbKw0_v2clo<\/a>.<\/p>\r\n<p class=\"hanging-indent\">A Mathematica version: <a href=\"http:\/\/demonstrations.wolfram.com\/TheParrondoParadox\/\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">demonstrations.wolfram.com\/<\/span><span class=\"import-url\">TheParrondoParadox<\/span><span class=\"import-url\">\/<\/span><\/span><\/a>.<\/p>\r\n<p class=\"hanging-indent\">A YouTube demonstration: <a href=\"http:\/\/www.youtube.com\/watch?v=PpvboBJEozM\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.youtube.com\/watch?v=PpvboBJEozM<\/span><\/span><\/a>.<\/p>\r\n<p class=\"hanging-indent\">For an entertaining read on paradoxes, try <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Perplexing Paradoxes<\/em><\/span> by George Szpiro (2024).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<h2 class=\"import-ah\">3.3 Pooled Testing via Simulation<\/h2>\r\n<p class=\"import-paft\">Those who lived through the COVID-19 pandemic are certain to remember it for a long time: masks, mandates, social distancing, vaccines, 
virtual meetings, and for many of us, losing loved ones. We will also remember testing for coronavirus with nasal swabs and at-home kits.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">It never caught on in the United States during the COVID-19 pandemic, but pooled testing was considered because it could reduce the number of tests needed and save time (Mandavilli, 2020). Pooled testing was used successfully by the United States military during World War II to test men for syphilis (Dorfman, 1943).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The logic of pooled testing is straightforward. A university, for example, could take saliva or nasal swab samples from each student and test them individually, or it could combine a part of each sample from several people into a single group and test the pooled sample. If it is negative, then all the individuals in the combined pool are negative, and we have saved on testing every person in that group. If the pooled sample is positive, then individual tests would be performed on the reserved parts of each individual\u2019s sample to determine exactly who is infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This leads to a crucial question: <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Wha<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">optima<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">size<\/em><em class=\"import-i\">?<\/em><\/span> The bigger the group, the lower the number of groups tested but the higher the chances a group is positive, and then everyone in the group has to be tested (we ignore the possibility of subgroup testing, false positives or negatives, and other complications).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 
36pt\">We solve this optimization problem by constructing an Excel spreadsheet and using Monte Carlo simulation. We proceed step by step and reveal Excel functions and tools as we create our model of pooled testing.<\/p>\r\n\r\n<h3 class=\"import-bh\">The Data Generation Process<\/h3>\r\n<p class=\"import-paft\">The first thing we need to do is implement the random process by which some people get infected and others do not. We do this by drawing a random number and comparing it to a threshold value, so we get either a 0 (not infected) or a 1 (infected).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We make the simplifying assumption that everyone has the same likelihood of catching the virus\u2014say, 5%. This is an <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">exogenou<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">variabl<\/em><em class=\"import-i\">e<\/em><\/span> (also called a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">parameter<\/em><\/span>) in our model and it will serve as our threshold value for determining whether someone is infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter 5% in cell A1 of a blank spreadsheet and label it as \u201cinfection rate\u201d in cell B1. Save the Excel file (<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">PooledTesting.xls<\/em><em class=\"import-i\">x<\/em><\/span> is a good name).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell A1 displays 5%, which is the same as 0.05 in decimal notation. The number 0.05 is what the spreadsheet stores in its internal memory. 
It is worth remembering that what is displayed may be different from what is stored.<\/p>\r\n\r\n<div class=\"textbox\">\r\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP <\/em><\/strong><\/span>Name cells, especially if you have many or complicated formulas. We have been using cell addresses, and we can, of course, refer to cell A1 in a formula, but cell addresses can be difficult to read. It is good practice to name a cell or a cell range so that formulas can use natural language to reference cells.<\/p>\r\n\r\n<\/div>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Name cell A1 <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">InfectionRat<\/em><em class=\"import-i\">e<\/em><\/span> because this will make our future formulas easier to understand. If needed, search Excel\u2019s Help for \u201cnames in formulas.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we incorporate randomness. As you know, Excel draws uniformly distributed random numbers in the interval from 0 to 1 with the RAND() function.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=RAND()<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Just like the free throw shooting and coin-flipping examples, we use Excel\u2019s random number generator to determine whether or not a person gets COVID-19. 
We use an IF statement to group the RAND-generated values into two categories, 0 and 1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(A3<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">InfectionRate<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">1<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">0)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You probably see a 0 displayed in cell A4; if not, press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> (you may have to use the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">f<\/em><em class=\"import-i\">n<\/em><\/span> key). Zero means the person is not infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> repeatedly to recalculate the sheet until you see a 1 in cell A4. The chances are only 1 in 20, so be patient.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you recalculated, cell A3 constantly changed, but cell A4 changed only if A3 switched from being above or below the infection rate. 
Cell A4 is a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">binomia<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">rando<\/em><em class=\"import-i\">m <\/em><em class=\"import-i\">variabl<\/em><em class=\"import-i\">e<\/em><\/span> (with a single trial, also called a Bernoulli random variable) because it can take only the values 0 or 1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now that we know how to implement a random process that outputs whether or not an individual is infected, we can create an entire population of people, some who get infected with the virus and others who do not.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">InfectionRate,1,0)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice how we directly embedded the RAND() function in the cell formula. We do not know which random number was drawn, but we do know if it was less than 5% because then cell C1 would display \u201c1.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Fill down this formula all the way to cell C1000.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you scroll back up to the top row, you will see a sprinkling of ones among many zeroes. With a 5% infection rate, roughly 1 in 20 cells will have random number draws less than 5% and therefore show the number one.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The fact that each cell in column C stands alone and does not depend on or influence other cells means we are assuming <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">independence<\/em><\/span>. 
In our model, one person with the virus does not affect the chances of anyone else being infected. This condition is surely violated in the real world. To improve our analysis, we should make each person\u2019s chance of infection depend on whether the people with whom they come in contact have the virus.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">However, since our focus here is on showing how pooled testing works, we will not model infection as dependent on people nearby. Modeling that dependence would be a fun project: you would create clusters of cells, and if one got sick, the nearby cells would have a much higher chance of infection.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">How many people in our population of 1,000 are infected?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(C1:C1000)<\/em><\/span> in cell D1 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">infected<\/em><\/span> in cell E1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You will see a number around 50 in cell D1. The number of infected people is not always exactly 50 because chance is involved in who gets infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Recalculate by pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times to get a sense of the variability in the number of infected people.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The total number of infected people can be fewer than 40 or more than 60, but that is not common. Usually, there are around 45 to 55 infected. 
There is no doubt that the number of infected people is a random number, since it is bouncing around when you recalculate the sheet. It makes common sense that adding binomial random variables will produce a random outcome.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can make it easier to identify who is infected with a spreadsheet\u2019s conditional formatting capability. This offers the viewer visual cues that make data easier to understand.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select the entire column C and apply a formatting rule that highlights, with color, cells with a value of 1. Choose font and fill colors that you think emphasize being infected. If needed, search Excel\u2019s Help for \u201cconditional formatting.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now when you scroll down, it is easy to see who is infected. Recalculation changes who is infected\u2014it is as if we rewound and replayed the world with each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Having implemented the chance process for being infected or not, we turn to pooled testing. Instead of testing each person, we can group individuals and test their combined sample. If the pooled sample tests positive, then we know at least one person is infected; if not, we know no one is infected, and we do not have to test each individual in the group.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Instead of directly choosing the number of groups, it is more convenient to make group size the choice variable. 
Choosing group size determines how many groups we have, since:<\/p>\r\n&nbsp;\r\n<p style=\"text-align: center\">[latex]\\text{Number of Groups} = \\frac{\\text{Population}}{\\text{Group Size}}[\/latex]<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">With our population of 1,000 people, a group size of 100 means we will have 10 groups. Intuitively, with an infection rate of 5%, a group of 100 people will contain about 5 infected people on average, so the group is almost certain to test positive. We can make our intuition more convincing by computing the exact chances.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Begin by entering 100 in cell D3 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">Size<\/em><\/span> next to it in cell E3. Enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=1000\/D<\/em><em class=\"import-i\">3<\/em><\/span> in cell D4 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Groups<\/em><\/span> in cell E4.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">An infection rate of 5% means each person has a 95% chance of not being infected. If there are 2 people (assuming the chances of infection are independent), then there is a 0.95 \u00d7 0.95 = 0.95<span style=\"border: none windowtext 0pt;padding: 0\"><sup class=\"import-sup\">2<\/sup><\/span> = 0.9025, or 90.25%, chance that neither is infected. 
This means there is a 100% \u2212 90.25% = 9.75% chance that at least 1 of the 2 people is infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">What are the chances that at least 1 person is infected in a group of 100 people? Remember, if even 1 person is infected in a group, we have to test everyone in the group to find out who is infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell D6, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=(1-InfectionRate)^D3<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">prob no one in the group infected<\/em><\/span> in cell E6. 
Format D6 as a percentage so that it displays 0.59%.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we compute 100% minus the probability that no one in the group is infected to find the probability that at least 1 person is infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell D7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=1<\/em><em class=\"import-i\">-<\/em><em class=\"import-i\">D<\/em><em class=\"import-i\">6<\/em><\/span> and label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">pro<\/em><em class=\"import-i\">b <\/em><em class=\"import-i\">a<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">leas<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">on<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">infected<\/em><\/span> in cell E7. Format D7 as a percentage (if needed).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">With an infection rate of 5%, doing pooled testing with a group size of 100 is wasteful. After all, it seems overwhelmingly likely (over 99%) that we will have to test everyone in each of the 10 groups, so we would end up doing 1,010 tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Can we make our spreadsheet show how many people are infected in each group and confirm the computations we just made? We can, but the approach we adopt uses a function that may be unfamiliar and advanced\u2014the OFFSET reference function. 
Thus, we proceed slowly.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell G1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=OFFSET(E1,3,0)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell G1 displays the contents of cell E4, the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Number of Groups<\/em><\/span>, because the OFFSET function started at cell E1 (the first argument in the function) and then moved three rows down (the second argument). The third argument is 0, so it stayed in column E. Positive movement arguments move us down or right; negative ones move us up or left.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(D1,0,0,3,1))<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Why does G1 display D1 plus 100? The two zeroes mean it did not move from the reference cell D1, but the fourth and fifth arguments control the height and width, respectively, of the cell range. Therefore, the formula says to add up the values in cells D1, D2 (which is blank), and D3 (100).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We want to add up the values in column C in 10 separate groups of 100 each. We can modify our OFFSET function to sum the first group of 100.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(C1,0,0,100,1))<\/em><\/span>. 
To be clear, change the D1 to C1 and the 3 to 100.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The value reported in cell G1 is the sum of the first 100 people in the population. How can we get the second group of 100 people?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">C<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">1,0,0,100,1)<\/em><em class=\"import-i\">)<\/em><\/span> and fill it down to cell G2.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Adding the dollar signs made C1 an absolute reference, so we kept our C1 starting point in cell G2, but we need to change the formula so it adds up the number of infected people in the second set of 100. We do that by changing the second argument because it controls how many rows to move from the reference cell.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">C<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">1,100,0,100,1))<\/em><\/span>.<\/p>\r\n\r\n<div class=\"textbox\">\r\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP\u00a0<\/em><\/strong><\/span>Cell G2 reports how many people are infected in the second group of 100. We could fill down eight more cells and then change the second argument manually to 200, 300, and so on, but this is poor spreadsheet practice. 
<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Neve<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">manuall<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">repea<\/em><em class=\"import-i\">t<\/em><\/span> the same entry or an ordered sequence (e.g., numbers or dates). In addition, you want to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">maximiz<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">fl<\/em><em class=\"import-i\">exibility<\/em><\/span>. Hard-coding numbers, like 100, in formulas is poor practice because you might want to change that number in the future.<\/p>\r\n\r\n<\/div>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">In this case, we want our groupings to respond to changes in cell D3. If, for example, we have a group size of 50, we would then have 20 groups. We want the spreadsheet to automatically show how many people are infected in each of the 20 groups.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This task requires that we modify the second and fourth arguments. The fourth argument is the group size, which is simply cell D3. The second argument is more complicated. It is 0 for the first group, then increases by D3 for each group. One way to do this is to use the ROW function, which returns the row number of a cell.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Replace the formula in cell G2 with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=ROW(D6)<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell G2 displays 6, the row number of cell D6. 
What happens if the ROW function does not have an argument?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=ROW(<\/em><em class=\"import-i\">)<\/em><\/span> and fill it down to G10.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Without an argument, the ROW function returns the row number of the cell that contains ROW() in the formula. We can use this to create a series that starts at 0 and increases by the amount in cell D3.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=(ROW()<\/em><em class=\"import-i\">-<\/em><em class=\"import-i\">1)*<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">D<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">3<\/em><\/span> and fill it down to G10.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can use our ROW function strategy in the OFFSET function\u2019s second argument to create a formula that gives us the number of infected people for any group size from 2 to 500 entered in D3. A \u201cgroup\u201d of 1 is simply individual testing, and with 1,000 people, a group size of 2 yields 500 groups. 
Choosing a group size of 500 gives us 2 groups.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We start with G1 (notice that ROW()-1 is zero for G1 so the second argument evaluates to zero) and fill down to G500 (since 500 is the maximum number of groups we can have).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">C<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">1,(ROW()<\/em><em class=\"import-i\">-<\/em><em class=\"import-i\">1)*<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">D<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">3,0,<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">D<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">3,1)<\/em><em class=\"import-i\">)<\/em><\/span> and fill it down to G500.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cells G1 to G10 now display the number of infected people in each of the 10 groups of 100 people.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click the letter C in column C to select the entire column, and then click the <span class=\"import-ccust1\">Forma<\/span><span class=\"import-ccust1\">t <\/span><span class=\"import-ccust1\">Painte<\/span><span class=\"import-ccust1\">r<\/span> button (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Hom<\/em><em class=\"import-i\">e<\/em><\/span> tab in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ribbon<\/em><\/span>, or top menu). 
Now click the letter G in column G.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You applied the formatting in column C, including your conditional formatting to highlight the infected people, to column G. It (probably) shows all the groups highlighted, but it will soon come in handy when we lower the group size so that some groups have no infected people.<\/p>\r\n\r\n<div class=\"textbox\">\r\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP <\/em><\/strong><\/span>It is good practice to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">includ<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">check<\/em><em class=\"import-i\">s<\/em><\/span> in your spreadsheets. In this case, an easy check is to see if the sum of infected people in the 10 groups equals the total number of infected people in the population in column C.<\/p>\r\n\r\n<\/div>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(G1:G500<\/em><em class=\"import-i\">)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">check<\/em><\/span> in cell I1, then recalculate the sheet a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is easy to see that cells D1 and H1 are the same. 
If not, something is wrong, and you will have to go back to each step to find and fix the mistake.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change cell D3 to 200 and recalculate the sheet a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now only five cells in column G have nonzero values, representing the number of infected people in each of the five groups.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Group sizes of 100 and 200 are way too big to be the optimal size because we are extremely unlikely to get a group where everyone tests negative, so we almost always have to test everyone in the group. We need to try much smaller group sizes.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change cell D3 to 20 and recalculate the sheet a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now we are really getting somewhere. Column G is showing the number of infected people in each of 50 groups. You can see values of 0, 1, 2, 3, and less frequently, higher numbers. We love to see zeroes because they mean we do not have to test anyone in that group, so we saved 20 tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">How many tests will we have to run in total? 
The COUNTIF function allows us to count the number of cells in a range that meet a specific condition.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=COUNTIF(G1:G500,\"&gt;0\")<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">group<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">t<\/em><em class=\"import-i\">o <\/em><em class=\"import-i\">test<\/em><\/span> in I2.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The COUNTIF function reports the number of cells in the range G1:G500 that have a value greater than zero. If we multiply this by the group size, we know how many individual tests we have to run. This is added to the number of group tests to give us our total number of tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=H2*D<\/em><em class=\"import-i\">3<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">test<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">fro<\/em><em class=\"import-i\">m <\/em><em class=\"import-i\">infecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">groups<\/em><\/span> in I3. 
In cell H4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=D4+H<\/em><em class=\"import-i\">3<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">tests<\/em><\/span> in I4.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice that once again, we did not hard-code numbers (like 20 for group size) into the formula. We want our spreadsheet to respond to changes in group size (cell D3) automatically.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell H4 is certainly giving us good news. \u201cTotal tests\u201d is a random variable that is almost certainly less than 1,000. You are likely to see numbers around 690 tests, give or take 70 or so. This is about a 30% decrease in the number of tests from the 1,000 required by individual testing.<\/p>\r\n\r\n<h3 class=\"import-bh\">Finding the Optimal Group Size<\/h3>\r\n<p class=\"import-paft\">In our spreadsheet, we have implemented a stochastic (or chance) process of getting infected and demonstrated the power of pooled testing. Grouping allows us to save on testing because when groups have no infected people, we do not have to test those individual samples.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our spreadsheet shows that a group size of 20 is better than individual testing, but we do not want to do merely better than 1,000 tests. We want to perform the fewest tests possible. 
Our fundamental question is, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Wha<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">optima<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">size<\/em><em class=\"import-i\">?<\/em><\/span><\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is a complication that we have to confront to answer our question: \u201cTotal tests\u201d is a random variable. We cannot just look at a single outcome because we know there is chance involved. Suppose two dice are on a table, each showing 1, and I asked you to guess the sum of the next roll. You would not guess 2 because you know that is really unlikely.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We deal with the fact that \u201cTotal tests\u201d is a random variable by focusing on the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">expecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">valu<\/em><em class=\"import-i\">e<\/em><\/span> of total tests. This is what we would typically observe. The best guess for the sum of two dice rolls is 7, the expected value. We need to find the expected value of total tests for a given group size so we can figure out which group size minimizes it.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There are mathematical rules for computing the expected value, but we will use Monte Carlo simulation. This approach is based on the idea that we can simply run the chance process (throwing two dice or hitting <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>) many times and directly examine the results. 
We can compute the average of many repetitions (like rolling dice many times) to give us an approximation to the expected value.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">So we seek the group size that minimizes the expected value of total tests, which we will approximate by simulation. We will run many repetitions (recalculating the sheet repeatedly) and keep track of the total number of tests to see how many total tests we can expect to run as we vary the group size.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">While there are many simulation add-ins available for Excel, we can easily run a simulation using Excel\u2019s <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Tabl<\/em><em class=\"import-i\">e<\/em><\/span> tool. It was designed not to run a simulation but to display multiple outcomes as inputs vary. To do this, it recalculates the sheet, which enables us to perform Monte Carlo analysis.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell L1, enter the number 1, and enter 2 in cell L2. 
Select both cells and fill down to row 400 so that you have a series from 1 to 400 in column L.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we provide the cell that we wish to track: total tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell M1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=H4<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We are now ready to create the Data Table.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select the cell range L1:M400, click the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a<\/em><\/span> tab in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ribbon<\/em><\/span>, click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">What-I<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Analysi<\/em><em class=\"import-i\">s<\/em><\/span> in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Forecas<\/em><em class=\"import-i\">t<\/em><\/span> group, and select <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Table<\/em><em class=\"import-i\">\u00a0.\u00a0.\u00a0. 
<\/em><\/span>A keyboard shortcut is <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Alt<\/em><em class=\"import-i\">-a-w-t<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel pops up the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Tabl<\/em><em class=\"import-i\">e<\/em><\/span> input box.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">colum<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">inpu<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> field, click on cell K1, and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">OK<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Clicking on an empty cell would be meaningless if we were using the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Tabl<\/em><em class=\"import-i\">e<\/em><\/span> tool for its intended purpose, which is to show how an input cell affects a formula in another cell. All we want, however, is for Excel to recalculate the sheet and show us the total tests for that newly recalculated population in column C.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The display in column M shows 400 repetitions of hitting <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> and keeping track of total tests. 
This is exactly what we want because now we can take the average of the total tests\u2019 values to approximate the expected number of total tests when the group size is 20.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">But before we do this, let\u2019s be clear about what a Data Table is actually doing. Be aware in the next step, however, that if you double-click on a cell in column M or click in the formula bar, you might get trapped in a cell. <span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-b\">I<\/strong><strong class=\"import-b\">f <\/strong><strong class=\"import-b\">yo<\/strong><strong class=\"import-b\">u <\/strong><strong class=\"import-b\">ge<\/strong><strong class=\"import-b\">t <\/strong><strong class=\"import-b\">stuck<\/strong><strong class=\"import-b\">, <\/strong><strong class=\"import-b\">pres<\/strong><strong class=\"import-b\">s <\/strong><strong class=\"import-b\">th<\/strong><strong class=\"import-b\">e <\/strong><\/span><span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-bi\"><em>Esc<\/em><\/strong><\/span> <span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-b\">(escape<\/strong><strong class=\"import-b\">) <\/strong><strong class=\"import-b\">ke<\/strong><strong class=\"import-b\">y <\/strong><\/span><span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-b\">t<\/strong><strong class=\"import-b\">o <\/strong><strong class=\"import-b\">ge<\/strong><strong class=\"import-b\">t <\/strong><strong class=\"import-b\">out<\/strong><strong class=\"import-b\">.<\/strong><\/span><\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click on a few cells from M2 to M400 to see that they have an <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">arra<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">formula<\/em><\/span>: { 
<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=TABLE(,K1)<\/em><\/span>}.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel has a friendly front end via <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Data<\/em><em class=\"import-i\">: <\/em><em class=\"import-i\">What-I<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Analysis<\/em><em class=\"import-i\">: <\/em><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Table<\/em><em class=\"import-i\">\u00a0.\u00a0.\u00a0. <\/em><\/span>to create an array formula (indicated by the curly brackets, {}) that can display multiple outputs. You cannot change or delete an individual cell in the range M2:M400. They are, in a sense, a single unit sharing the same formula.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You might also notice that the sheet is much slower as we enter formulas or press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>. This is due to the Data Table. Excel now has many more cells to recalculate and evaluate. We could do many more repetitions (usually simulations have tens of thousands of repetitions), but the delay in recalculation is not worth it. 
With 400 repetitions, the approximation is good enough for our purposes.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=AVERAGE(M1:M400<\/em><em class=\"import-i\">)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">approximat<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">expecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">valu<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">tests<\/em><\/span> in cell O1.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell N1 is our simulation\u2019s approximation to what we want to minimize. It gives us a handle on the center of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">samplin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">distributio<\/em><em class=\"import-i\">n<\/em><\/span> of the statistic \u201cTotal tests.\u201d A statistic is a recipe for what to do with observations (in this case, given by the formula in cell H4). If we make a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">histogra<\/em><em class=\"import-i\">m<\/em><\/span> of the data in column M, we get an approximation to the sampling distribution of total tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel 2016 or later is needed to make the histogram chart. This is not the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Histogra<\/em><em class=\"import-i\">m<\/em><\/span> option in the Data Analysis add-in (from the Analysis ToolPak). 
The histogram chart allows for dynamic updating and is a marked improvement over the histogram in the Data Analysis add-in.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range M1:M400, click the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Inser<\/em><em class=\"import-i\">t<\/em><\/span> tab in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ribbon<\/em><\/span>, and select <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Histogra<\/em><em class=\"import-i\">m<\/em><\/span> from the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Chart<\/em><em class=\"import-i\">s<\/em><\/span> group. It is in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Statisti<\/em><em class=\"import-i\">c<\/em><\/span> chart group and, of course, in the collection of all charts (available by clicking the bottom-right corner square in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Chart<\/em><em class=\"import-i\">s<\/em><\/span> group).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The default bin widths are a little too big, but they are easy to adjust.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Double-click the chart\u2019s <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">x<\/em><\/span>-axis, and in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Axi<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Options<\/em><\/span>, set the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Bi<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">Widt<\/em><em class=\"import-i\">h<\/em><\/span> to 20. 
Make the title \u201cApproximate Sampling Distribution of Total Tests.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The chart is an approximation because it is based on only 400 repetitions. The exact sampling distribution of total tests would require an infinite number of repetitions. We can never get the exact sampling distribution or the exact expected value via simulation, but the more repetitions we do, the better the approximation.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Even with just 400 realizations of total tests, the graph looks a lot like the classic bell-shaped distribution of the normal (or Gaussian) curve. The center is the expected value of the total tests we will have with a group size of 20, and the dispersion in total tests is measured by its <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">standar<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">error<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=STDEV.P(M1:M400<\/em><em class=\"import-i\">)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">approximat<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">standar<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">erro<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">tests<\/em><\/span> in cell O2.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The standard error of total tests tells us the variability in total tests. 
It is a measure of the size of the typical bounce in total tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span> a few times and watch cell H4.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell H4 is bouncing. It is centered around 690 and jumps by roughly plus or minus 70 total tests as you hit <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>. You can also scroll up and down column M to see that the \u201cTotal tests\u201d numbers are around 690 \u00b1 70.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation cannot give us the exact standard error, but the standard deviation of our 400 realizations of total tests is a good approximation of the standard error.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There are many ways to be confused here. One of them is to fixate on the computation of the standard deviation. We used STDEV.P (for population) instead of STDEV.S (for sample) because we are not using the standard deviation to estimate the population standard deviation, so we do not need to make a correction for degrees of freedom. Although the population standard deviation is correct, this makes almost no difference with 400 numbers.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=STDEV.S(M1:M400<\/em><em class=\"import-i\">)<\/em><\/span> and compare the result to cell N2.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The emphasis on population versus sample standard deviation in many Statistics courses is only relevant for small sample sizes\u2014say, fewer than 30 observations. 
As the number of observations rises, the two grow ever closer.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">To summarize, cells N1 and N2 tell us that we can expect to perform about 690 total tests, give or take roughly 70 tests. These are the numbers reported at the end of the previous section. This is for a group size of 20. Can we do better? We get to choose the group size, so we should explore how the expected number of total tests responds as we vary the group size.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the group size (in cell D3) to 10 and press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times. Which specific cell should you focus on, and what do you conclude?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The cell we care about the most is cell N1 because it tells us (approximately) the expected number of tests we will have to run. Cell N1 is reporting good news. We can expect to perform about 500 \u00b1 50 total tests. That beats the group size of 20 by almost 200 tests, on average, and cuts the number of tests roughly in half compared with 1,000 individual tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Why does a group size of 10 do better than 20? In various cells of the spreadsheet, there is evidence of what is happening. Lowering the group size from 20 to 10 increased the number of group tests from 50 to 100 (see cell D4), but the number of infected groups only went up a little bit (from roughly 32 to 40), and the groups are now much smaller. 
This is where the big savings are\u2014instead of 32 \u00d7 20 = 640 tests, we only have to run, on average, 40 \u00d7 10 = 400 tests with a group size of 10.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Confirm the claims about group size in the paragraph above by switching back and forth from 10 to 20 in cell D3. Notice how the other cells (especially H4) and the chart react to D3.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Spreadsheets are powerful because they can display a lot of information and dynamically update when you make changes. Your job is to make comparisons and process the information.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Can we do even better than a group size of 10?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the group size (in cell D3) to 5.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Amazing! The number in cell N1 fell again. The expected number of total tests is now about 425 (426.22 is a more exact answer, found by analytical methods), give or take roughly 30 tests. That is a gain of almost 60% versus individual testing. Pooled testing saves a lot of tests compared to individual testing.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As before, the number of groups we have to test has risen (this time to 200), but many groups are found to be uninfected. Cell D6 reports a 77.4% chance that no one in a 5-person group will be infected. Thus, even though we test more groups, we more than make up for this because many groups test negative, saving us the need to test 5 people in the group.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The group size of 5 is, in fact, the optimal solution and answer to our question. 
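<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">For readers who want to check these numbers outside Excel, here is a minimal Python sketch. It assumes the same simplified world as the spreadsheet: 1,000 people, independent infections at a 5% rate, and perfectly accurate tests. The function name and the closed-form expressions are our own illustrative assumptions, not the contents of the workbook. Group sizes that do not divide 1,000 evenly are treated as if fractional groups were possible. Under these assumptions, the expected total at a group size of 5 works out to about 426.22, the analytical figure quoted above, and the give-or-take values come out close to the 70, 50, and 30 seen in the simulation.<\/p>

```python
# Sketch: expected total tests for pooled testing under the chapter's
# simplifying assumptions (independent infections, perfectly accurate tests).
# The function name and formulas are illustrative assumptions.
from math import sqrt


def pooled_test_stats(group_size, infection_rate=0.05, population=1000):
    '''Return (expected total tests, standard error of total tests).'''
    n_groups = population / group_size
    # Chance that at least one person in a group is infected.
    p_positive = 1 - (1 - infection_rate) ** group_size
    # One test per group, plus group_size retests for each positive group.
    expected = n_groups + population * p_positive
    # The number of positive groups is treated as binomial(n_groups, p_positive),
    # and each positive group adds group_size individual tests.
    std_error = group_size * sqrt(n_groups * p_positive * (1 - p_positive))
    return expected, std_error


for g in (20, 10, 5, 4):
    expected, se = pooled_test_stats(g)
    print(f'group size {g:2d}: about {expected:6.1f} tests, give or take {se:4.1f}')
```

<p class=\"import-p\" style=\"text-indent: 36pt\">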
Figure 3.10 reveals that we traveled down the expected number of total tests curve as we changed the group size from 20 to 10 and finally 5.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.10 makes it easy to see that a group size of 5 is the minimum for the expected number of total tests curve, but it also reveals the trade-off involved. The two curves are added up to produce the top, total curve. At 10, we test 100 groups (the bottom curve) and we add that to 400 (the expected number of tests from positive groups), and this gives us 500 (the top curve).<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"856\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.10%3A%20Line,5.0%25%20infection%20rate.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p73-1.png\" alt=\"Plot of the optimal solution for the pooled testing optimization problem. Long description linked from image.\" width=\"856\" height=\"515\" \/><\/a> <strong>Figure 3.10: The expected number of tests as a function of group size.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">When we moved from 10 to 5, we added 100 tests (the bottom curve) but saved about 175 tests (the middle curve), lowering our expected total tests from 500 to 425. We cannot do any better than 425 total tests. Further reductions in group size will increase total tests.<\/p>\r\n\r\n<h3 class=\"import-bh\">Comparative Statics Analysis<\/h3>\r\n<p class=\"import-paft\">We can ask another question that again shows off the power of spreadsheets: What happens if the infection rate changes\u2014say, to 1%? 
What would be the optimal group size?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">This kind of question is called <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">comparativ<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">static<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">analysi<\/em><em class=\"import-i\">s<\/em><\/span> because we want to know how our solution responds to a shock. We want to compare our initial optimal group size of 5, found when the infection rate was 5%, with the new solution when the infection rate is 1%. This comparison reveals how the shock (changing the infection rate) affects the optimal response (group size).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change cell A1 to 1%, then use the spreadsheet to find the optimal group size. What group size would you recommend? Why?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You may have struggled with this because it turns out that the total tests curve is rather flat at its minimum. Thus, a simulation with 400 repetitions does not have the resolution to distinguish between group sizes in the range from 8 to 14 or so. Figure 3.11 makes this clear.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The exact answer for the optimal group size is, in fact, 11. It has an expected number of total tests of 195.57 (again, using analytical methods). Choosing group sizes of 10 or 12 leads to a slightly higher number of total tests\u2014although it is impossible to see this in Figure 3.11.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The 1% infection rate case shows that simulation may not be an effective solution strategy for every problem. 
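<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">A rough back-of-the-envelope check, sketched in Python, suggests why. It assumes the chapter's simplified setting (1,000 people, independent infections at 1%, perfectly accurate tests). The function names are hypothetical, and the rule of thumb for the number of repetitions is our own assumption: the standard error of the simulated average should be no more than half the gap we are trying to detect. Under these assumptions, the gap between group sizes of 10 and 11 is only about 0.05 tests, and resolving it would take well over a million repetitions.<\/p>

```python
# Sketch under the chapter's simplifying assumptions (independent infections,
# perfectly accurate tests). Function names and the rule of thumb for the
# number of repetitions are illustrative assumptions, not part of Excel or MCSim.
from math import sqrt

INFECTION_RATE = 0.01
POPULATION = 1000


def expected_total_tests(group_size):
    '''Expected group tests plus expected retests of positive groups.'''
    p_positive = 1 - (1 - INFECTION_RATE) ** group_size
    return POPULATION / group_size + POPULATION * p_positive


def std_dev_total_tests(group_size):
    '''Spread of total tests in one repetition (binomial count of positive groups).'''
    p_positive = 1 - (1 - INFECTION_RATE) ** group_size
    n_groups = POPULATION / group_size
    return group_size * sqrt(n_groups * p_positive * (1 - p_positive))


for g in range(8, 15):
    print(f'group size {g:2d}: expected total tests {expected_total_tests(g):.2f}')

# Gap between the two closest contenders, group sizes 10 and 11.
gap = expected_total_tests(10) - expected_total_tests(11)

# Assumed rule of thumb: the standard error of the simulated average,
# std_dev / sqrt(reps), should be no more than half the gap.
reps_needed = (2 * std_dev_total_tests(10) / gap) ** 2
print(f'gap between 10 and 11: {gap:.3f} tests')
print(f'repetitions needed (rough): {reps_needed:,.0f}')
```

<p class=\"import-p\" style=\"text-indent: 36pt\">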
Of course, you could create a Data Table with more repetitions, but using simulation to distinguish between group sizes of 10 and 11 requires a Data Table so large that Excel would be unresponsive.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"856\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.11%3A%20Line,1.0%25%20infection%20rate.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p74-1.png\" alt=\"Plot of the optimal solution with lower infection rate. Long description linked from image.\" width=\"856\" height=\"513\" \/><\/a> <strong>Figure 3.11: Optimal group size with a 1% infection rate.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">As mentioned earlier, there are many Excel Monte Carlo simulation add-ins, and they can do millions of repetitions. We used MCSim in earlier work in section 3.2. Even if we ran enough repetitions to see that 11 is the optimal solution, you should remember that simulation will never give you an exact result because it can never do an infinite number of repetitions.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The good news is that any group size around 10 yields an expected total of a little under 200 tests, which is an 80% improvement over individual testing. 
There is no doubt about it\u2014pooled testing can be a smart, effective way to reduce the number of total tests performed.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our comparative statics analysis tells us that the lower the infection rate (from 5% to 1%), the bigger the optimal group size (from 5 to 11) and the greater the savings from pooled testing versus individual testing (from about 575 to 800 tests).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Finally, if you carefully compare Figures 3.10 and 3.11, you will see that the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">#Group<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Teste<\/em><em class=\"import-i\">d<\/em><\/span> curve (a rectangular hyperbola, since the numerator is constant at 1,000) stays the same in both graphs. Changing the infection rate shifts down the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">E[#Po<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">Tests<\/em><em class=\"import-i\">]<\/em><\/span> relationship, and this brings down and alters the shape of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">E[#Tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Tests<\/em><em class=\"import-i\">]<\/em><\/span> curve.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Comparative statics analysis shows that pooled testing is more effective when the infection rate falls. A lower infection rate means we can have bigger groups, yet they may still have no infected individuals in them.<\/p>\r\n\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Takeaways<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"import-paft\">Pooled testing means you combine individual samples. 
A negative test of the pooled sample saves on testing because you know that none of the individuals in the group are infected.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is an optimization problem here: Too big a group size means someone in the group is likely to be infected, so you have to test everyone in the group, but too small a group size means too many group tests. The sweet spot minimizes the total number of tests.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The optimal group size depends on the infection rate. The smaller the rate, the bigger the optimal group size.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">By creating this spreadsheet, you have improved your Excel skills and confidence in using spreadsheets. You have added to your stock of knowledge that will help you the next time you work with a spreadsheet.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The OFFSET function is really powerful, but it is difficult to understand and apply.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The Data Table is meant for what-if analysis, but it can be used as a simple Monte Carlo simulation tool. Each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> recalculates the sheet and the Data Table.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You also learned or reinforced many statistical and economic concepts. Economics has a toolkit that gets used over and over again\u2014look for similar concepts in future models and courses. Try to spot the patterns and repeated logic. Although it may not be explicitly stated, getting you to think like an economist is a fundamental goal of almost every Econ course.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Reference was made several times to the analytical solution. This was not shown because the math is somewhat advanced. 
To see it, download the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">PooledTesting.xlsx<\/em><\/span> file from <a href=\"http:\/\/dub.sh\/gbae\">dub.sh\/gbae<\/a> and go to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Analytical<\/em><\/span> sheet.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">One methodology issue that is easy to forget but crucial is that we made many simplifying assumptions in our implementation of the data generation process. There may be other factors at play in the spread of COVID-19 or how tests actually work that affect the efficacy of pooling. Spatial connection was mentioned as something that would violate the independence assumed in our implementation. Another complication is that \u201ca positive specimen can only get diluted so much before the coronavirus becomes undetectable. That means pooling will miss some people who harbor very low amounts of the virus\u201d (Wu, 2020).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our results apply to an imaginary, perfect world, not the real world. We need to be careful in moving from theory to reality. This requires both art and science.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The introduction cited Robert Dorfman as writing a paper on pooled testing back in 1943. It is a clever idea that you now understand can be used to greatly reduce the number of tests, which saves a lot of resources. Perhaps you will not be surprised to hear that Robert Dorfman was an economist.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">References<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"hanging-indent\">Dorfman, R. (1943). 
\u201cThe Detection of Defective Members of Large Populations.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ann<\/em><em class=\"import-i\">. <\/em><em class=\"import-i\">Math<\/em><em class=\"import-i\">. <\/em><em class=\"import-i\">Statist<\/em><em class=\"import-i\">.<\/em><\/span> 14, no. 4, pp. 436\u2013440, <a href=\"http:\/\/projecteuclid.org\/euclid.aoms\/1177731363\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">projecteuc<\/span><span class=\"import-url\">lid.org\/<\/span><span class=\"import-url\">euclid.aoms<\/span><span class=\"import-url\">\/1177731363<\/span><\/span><\/a>.<\/p>\r\n<p class=\"hanging-indent\">Mandavilli, A. (2020). \u201cFederal Officials Turn to a New Testing Strategy as Infections Surge.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Ne<\/em><em class=\"import-i\">w <\/em><em class=\"import-i\">Yor<\/em><em class=\"import-i\">k <\/em><em class=\"import-i\">Times<\/em><\/span>, July 1, 2020, <a href=\"http:\/\/www.nytimes.com\/2020\/07\/01\/health\/coronavirus-pooled-testing.html\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.nytimes.com\/2020\/07\/01\/health\/<\/span><span class=\"import-url\">coronavirus-pooled-testing.html<\/span><\/span><\/a>.<\/p>\r\n<p class=\"hanging-indent\">Wu, K. (2020). 
\u201cWhy Pooled Testing for the Coronavirus Isn\u2019t Working in America.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Ne<\/em><em class=\"import-i\">w <\/em><em class=\"import-i\">Yor<\/em><em class=\"import-i\">k <\/em><em class=\"import-i\">Times<\/em><\/span>, August 18, 2020, <a href=\"http:\/\/www.nytimes.com\/2020\/08\/18\/health\/coronavirus-pool-testing.html\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.nytimes.com\/2020\/08\/18\/health\/coronavirus-pool-<\/span><span class=\"import-url\">testing.html<\/span><\/span><\/a>.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<h2 class=\"import-ah\">3.4 Search Theory Simulation<\/h2>\r\n<p class=\"import-paft\">You want to buy something that many stores sell, but they charge different prices. Suppose that you cannot just google it to find the lowest price. Maybe you are at a huge farmers market, and there are lots of vendors selling green beans. The beans are all the same, but the prices are different. How do you decide where to buy?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Believe it or not, this problem has been extensively studied. It is part of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">searc<\/em><em class=\"import-i\">h <\/em><em class=\"import-i\">theor<\/em><em class=\"import-i\">y<\/em><\/span> and has produced several Nobel Prize winners in Economics. It also has a long history in mathematics, where it is known as an optimal stopping problem.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">There are many different search scenarios and models. For example, you could be deciding which job to take. 
Once you pass on an offer, you cannot go back (this is called <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">sequentia<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">search<\/em><\/span>). Or you could be involved in some complicated game with asymmetric information, where one agent\u2014say, the seller of a house\u2014has more knowledge about the house than the potential buyers.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Fortunately, your green bean search problem is straightforward. The green beans are <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">homogeneou<\/em><em class=\"import-i\">s<\/em><\/span> (exactly alike), and you can gather as many prices as you want, then choose the cheapest one. The catch is that it is costly to search\u2014<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">searc<\/em><em class=\"import-i\">h<\/em><\/span> is just another way of saying \u201cgather prices,\u201d but there are search costs.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">If searches were costless, then the problem would be trivial\u2014simply get all the prices and buy the cheapest one. The problem becomes interesting when collecting price information takes effort and time. In that case, you can search too little (so you would have found a much lower price with more searching) or search too much (so the slightly lower price you found was not worth it). You are facing an optimization problem!<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We will set up and solve this optimization problem in Excel. We will use the Monte Carlo simulation add-in to explore how our total cost changes as we vary the amount we search. 
We will then do comparative statics analysis to see how the optimal solution responds when we shock the model.<\/p>\r\n\r\n<h3 class=\"import-bh\">Setting Up the Problem<\/h3>\r\n<p class=\"import-paft\">First, we create a population of prices.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Open a blank Excel workbook and name it <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Search.xlsx<\/em><\/span>. Name the sheet <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> (for data generation process). In cell A1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Price<\/em><\/span>. In cell A2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=RAND()<\/em><\/span>. Fill down to cell A101.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You now have 100 prices on your spreadsheet ranging from zero to one. Our target is the lowest price. We can easily find it with the MIN function.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Minimum<\/em><\/span> in cell B1 and the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A101<\/em><em class=\"import-i\">)<\/em><\/span> in cell B2. Scroll down until you find that lowest price. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> to get a new set of prices and a new minimum. 
Each time you press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>, it is like a new day at the farmers market and the vendors have all changed their prices.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The cheapest price is close to zero (since RAND goes from zero to one), and it can be anywhere in the list of 100 numbers. As a buyer, you will not, however, do what you just did and simply enter a formula that yields the minimum price, because we assume that you cannot see the prices until you visit the store. You have to search to reveal each price.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Let\u2019s suppose that each search will cost you 0.04 monetary units. This is an exogenous variable. The 100 prices are also outside of your control. Your endogenous, or choice, variable is how many prices to reveal.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your goal is to minimize the total cost of purchase, composed of the price you pay plus the search costs. The more you search, the lower the price you pay the vendor but the higher the costs of the search. You have to balance these two opposing forces.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your spreadsheet is like a card game. Pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> is like shuffling 100 cards. You want the lowest-numbered card in the deck. A search is like flipping a card over, but it costs you 0.04 for each card you reveal. What is the best number of cards to flip over? To answer this key question, we proceed slowly.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Suppose you decide to search just once. This has the advantage of the lowest search costs possible but the disadvantage that you will only get one price. 
How will you do if you adopt this strategy?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C1, enter a 1 (this represents how many prices you gather), and in cell C2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A2<\/em><\/span>. In cell C3, enter 0.04 (this is the cost of your single search). In cell C4, we add the two cells above it together, so enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=C2+C3<\/em><\/span>. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Each time you press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>, you get a new price in cell C2 (because the prices all change) and a new total cost in cell C4. Sometimes you do pretty well, close to 0, but sometimes you end up near 1, which is not good. But how can we know how you will usually do? How you do on average, not just in a single outcome, is how we evaluate the results of chance processes.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation can tell us the typical result. We will use the MCSim add-in to run our simulation in Excel. 
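<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The same chance process can also be sketched outside Excel. The Python code below is a minimal illustration, not a description of how MCSim works internally. The function name, its parameters, and the fixed seed are all hypothetical choices. It draws 100 uniform prices per repetition (like RAND), keeps the lowest of the first few, adds the search cost, and averages over many repetitions.<\/p>

```python
# Sketch of the search simulation described in the text. The function name,
# parameters, and seed are illustrative assumptions, not part of MCSim.
import random


def simulate_total_cost(n_searches, cost_per_search=0.04, n_vendors=100,
                        reps=10_000, seed=0):
    '''Average total cost (lowest price found plus search costs) over many repetitions.'''
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # A new day at the market: every vendor posts a fresh price in [0, 1).
        prices = [rng.random() for _ in range(n_vendors)]
        # Reveal the first n_searches prices and buy at the lowest one.
        best_price = min(prices[:n_searches])
        total += best_price + cost_per_search * n_searches
    return total / reps


print(f'average total cost with 1 search: {simulate_total_cost(1):.3f}')
```

<p class=\"import-p\" style=\"text-indent: 36pt\">Because the number of searches is a parameter, the same sketch can be reused to evaluate other search strategies.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">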
If needed, download and install MCSim from <a href=\"http:\/\/dub.sh\/addins\">dub.sh\/addins<\/a>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation that tracks cell C4 with 10,000 repetitions by clicking <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab, entering C4 in the <a id=\"_Hlk191054204\"><\/a><span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> input box, adding a 0 to the default 1,000 repetitions, and clicking <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results should show an average around 0.54. This is what you can expect to pay on average, in total, for your green beans. This makes sense, since the average of RAND is 0.5 and you have to pay 0.04 for one search. Notice that the simulation values do not follow a bell-shaped normal distribution. Instead, you are equally likely to do really well (low total cost), badly (around 1), or somewhere in the middle (around 0.5).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The expected value of 0.54 is the number we use to convey the performance of the search-and-buy-at-one-store strategy. Of course, it does not matter if you pick the first store (in cell A2). 
You could pick any one of the 100 stores and get the same simulation results because each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> puts up new random prices for all the stores, just like reshuffling a deck of cards.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>It is easy to confirm this by changing the formula in cell C2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A20<\/em><\/span> or <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A54<\/em><\/span> or any other cell from A3 to A101 and tracking cell C4 in a new simulation. Your results are substantially (but not exactly) the same as the simulation with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A<\/em><em class=\"import-i\">2<\/em><\/span> in cell C2.<\/p>\r\n\r\n<h3 class=\"import-bh\">Finding the Optimal Number of Searches<\/h3>\r\n<p class=\"import-paft\">What happens to your total cost of buying green beans if we search more than once? Let\u2019s try 5 searches.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell D1, enter a 5 (this represents gathering prices from 5 vendors), and in cell D2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A6)<\/em><\/span>. This formula shows the lowest price in your sample from 5 stores, which is the one we would buy. As mentioned, you could pick any set of 5 stores, and you would get the same result. In cell D3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.04*<\/em><em class=\"import-i\">5<\/em><\/span> (0.2 is the cost of 5 searches). 
In cell D4, we add the two cells above it together, so enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=D2+D3<\/em><\/span>. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> gives a single outcome, or realization, of the chance process. Sometimes you get lucky and get a low price, other times not. Notice that the total cost (in cell D4) is the sum of the lowest price and 0.2 (the cost of searching).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Do you think 5 searches are better than 1? We cannot answer this question by looking at cells C4 and D4 because they show just one realization. We need to compare the typical result of these two strategies. We know the expected value of the total cost of 1 search is 0.54. What is the typical result of 5 searches?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Use the MCSim add-in to track cell D4. What do your results show?<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"671\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.12%3A%20Monte,as%20values%20increase.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p80-1.png\" alt=\"Monte Carlo simulation output page including summary statistics, notes, and histogram. 
Long description linked from image.\" width=\"671\" height=\"566\" \/><\/a> <strong>Figure 3.12: Total costs when gathering five prices.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results should be similar to Figure 3.12. These simulation results tell us that <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em> <\/span>= 5 is better than <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em> <\/span>= 1 because the typical result for 5 searches (approximated by the average of 10,000 repetitions) is around 0.37, which is much less than 0.54 (a roughly 30% decrease).<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Maybe more searches are even better? How do 10 searches compare to 5?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation of 10 searches by setting up a 10-search scenario on the spreadsheet (in column E) and running a simulation. Try to figure it out first, but check the appendix, if needed, for more detailed help.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your work shows that <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 10 is much worse than <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 5. How about <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell F1, enter a 4, and in cell F2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A5)<\/em><\/span>. 
In cell F3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.04*<\/em><em class=\"import-i\">4<\/em><\/span> (0.16 is the cost of 4 searches). In cell F4, we add the two cells above it together, so enter <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=F2+F3<\/em><\/span>. Use the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> add-in to directly compare cells D4 and F4 by putting D4 in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> input box and F4 in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">secon<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> input box, then click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results should show a close race. In fact, it is so close that we need to improve the resolution of the sim by increasing the number of repetitions.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Track cells D4 and F4 again, but this time with 100,000 repetitions. 
This will take 10 times longer than the last sim.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"728\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.13%3A%20Monte,candidate%20group%20sizes.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p81-1.png\" alt=\"Histogram of $D$4 and DGP!$F$4. Long description linked from image.\" width=\"728\" height=\"568\" \/><\/a> <strong>Figure 3.13: Four searches are slightly better than five.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is still quite close, but as shown in Figure 3.13, you will get a slightly lower sim average of 0.361 or so with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4 than the sim average of about 0.367 with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 5. In fact, it can be shown with analytical methods that <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4 is the optimal solution. To see the math involved, download <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Search.xls<\/em><em class=\"import-i\">x<\/em><\/span> from <a href=\"http:\/\/dub.sh\/gbae\">dub.sh\/gbae<\/a>.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Let\u2019s step back and think about what you have done. It took some work, but you used simulation to explore the U-shaped curve in Figure 3.14. It plots the exact expected value of the total cost as the number of searches increases from 1 to 10. 
The minimum, the answer to what you should do, is found at <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"923\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.14%3A%20Combined,of%20c%3D0.04.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p82-1.png\" alt=\"Table and Chart of Expected Value of Total Cost. Long description linked from image.\" width=\"923\" height=\"317\" \/><\/a> <strong>Figure 3.14: Total costs are U-shaped with a minimum at n = 4.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice that 1 and 10 searches both yield high total costs, but for different reasons. With <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 1, you only pay 0.04 to search, but your usual purchase price is around 0.5. By searching 10 times, you lower the purchase price a lot (you are likely to find a seller with a low price, typically around 0.091), but you have to pay 0.4 in search costs.<\/p>\r\n\r\n<h3 class=\"import-bh\">Comparative Statics<\/h3>\r\n<p class=\"import-paft\">An interesting shock to this model involves the cost of searching. What if something happened, like the internet, that lowered search costs? Instead of having to visit each store to find out the price, you can go to their web page and see the price. This makes searching much easier and cheaper. How would your search behavior respond to this shock?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Suppose the per-unit cost of searching fell from 0.04 to 0.01. 
What effect would that have on the optimal number of searches?<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Copy the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet and rename it <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DGPLowerCost<\/em><\/span>. Change the search-cost formulas in row 3 of columns C to F to reflect the new <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0.01. Use the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> add-in to find the new optimal number of searches. You can check your work (or get a few hints) using the discussion that follows, but try to do it yourself first.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The first thing to realize when search costs fall from 0.04 to 0.01 is that total costs are going to be lower for every number of searches. Instead of 0.54 for one search, the expected value of total costs is 0.51 when <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0.01. For <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4, the expected total cost falls from 0.36 to 0.24. Notice that total costs fall by more the more you search, because the saving of 0.03 applies to every search.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you actually tried to run simulations for different values of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span>, you might be confused by how close the results ended up being. Because the results are so close, simulation is going to have trouble finding the exact answer. 
Figure 3.15 explains what is going on.<\/p>\r\n\r\n\r\n[caption id=\"\" align=\"aligncenter\" width=\"923\"]<a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.15%3A%20Combined,optimal%20search%20theory.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p83-1.png\" alt=\"Table and Chart of Expected Value of Total Cost with two different numbers of searches. Long description linked from image.\" width=\"923\" height=\"434\" \/><\/a> <strong>Figure 3.15: Comparative statics: Shocking the per-unit search cost.<\/strong>[\/caption]\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">With <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0.01, the expected value of the total cost curve has a minimum at <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 9, so this is the exactly correct answer, but notice how flat the curve is around that minimum. If your answer was 8 or 10, you missed by only 0.001.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation struggles to get an exact answer because the total cost function is so shallow. You would have to run millions of repetitions to identify the exact minimum solution at <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 9.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">However, simulation does give you the correct answer in the sense that the number of searches goes up as the cost of searching falls. This key result makes sense, since you will take advantage of cheaper search costs by searching more.<\/p>\r\n\r\n<h3 class=\"import-bh\">Simulation Versus Analytical Methods<\/h3>\r\n<p class=\"import-paft\">Figures 3.14 and 3.15 show the exact expected value of the total cost. 
As mentioned earlier, if you are interested, you can download <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Search.xls<\/em><em class=\"import-i\">x<\/em><\/span> from <a href=\"http:\/\/dub.sh\/gbae\">dub.sh\/gbae<\/a> to see how these analytical results were derived.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You might wonder why we used simulation when analytical methods give us an exact answer. There are two reasons. First, by implementing the problem in Excel, we get a deep, clear understanding of the role of randomness in this problem. It is one thing to say that prices are random, but seeing them bounce on the screen really conveys the data generation process.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation is often helpful in understanding a problem because it requires building a model that reflects core components of a real-world scenario. This often enables a richer, fuller grasp of the forces at play.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">The second reason for using simulation is that we have another independent method that is confirming the analytical solution. The averages in Figures 3.12 and 3.13 are very close to their expected values. We can be sure that we have found the right answer when both methods agree. And if they do not agree, we are alerted to a potential error in one of the methods.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Neither approach is foolproof. Simulation\u2019s main drawback is that it cannot give an exact answer. In addition, sometimes so many repetitions are needed to obtain a clear result that it is impractical.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">But analytical methods using equations, algebra, and calculus are not perfect either. Sometimes, there is no way to derive the answer, and simulation is all we can do. Other times, the analytical method fails disastrously and gives us an incorrect answer. 
Simulation helps us avoid that trap.<\/p>\r\n\r\n<div class=\"textbox textbox--key-takeaways\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Takeaways<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"import-paft\">Economists believe in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">la<\/em><em class=\"import-i\">w <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">on<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">price<\/em><\/span>, the idea that competition makes prices converge. But this only applies in a frictionless world of perfect information. In our model, if <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0, you simply get all the prices and pick the lowest one. In such a world, there would be no price dispersion, since everyone would go to the cheapest vendor, so all the sellers would have to match that price. The law of one price would hold.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">In the real world, there are all sorts of frictions. An important one is incomplete information, so buyers do not know all the prices (and qualities) of goods and services. The real world has many different prices (just think of all the prices you see at gas stations as you are driving down the road), and buyers have to search to find low prices. Economists say that search is price discovery, which emphasizes how searching is a productive activity.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Consumers face a search optimization problem. The more they search, the lower the price they are likely to pay, but they have to spend resources\u2014time and effort\u2014to search. You can definitely oversearch, which means that the gain from the lower price you found was not worth the extra cost of searching. 
On the other hand, you can also undersearch: you saved on search costs, but you did not take advantage of the lower prices you would have found by searching more.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Consumers optimize and search an optimal amount like Goldilocks: not too little and not too much but just right. The fact that buyers will not choose to find every price explains why price dispersion exists. This is a key result. As Stigler (1961) famously said, \u201cPrice dispersion is a manifestation\u2014and, indeed, it is the measure\u2014of ignorance in the market.\u201d<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">We also showed that lowering search costs would increase the optimal number of searches, and we can point out a few interesting real-world implications of this result. For example, not all consumers face the same search costs. If you are in a hurry (perhaps you have an important deadline at work), your search costs are high, and therefore it is optimal for you to search less. Different people in different situations have different optimal solutions.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our comparative statics result that lower <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> leads to higher optimal <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> points to the fact that lower search costs reduce price dispersion. If the internet allows you to quickly scan gas stations in an area and go to the cheapest one, prices are going to come closer together. They will not all be exactly the same (as the law of one price predicts) because search is not free, but they will not be as spread out.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Noneconomists sometimes demonize advertising. They see consumers as dupes, easily fooled and tricked by ads into buying things they do not need or want. 
But search theory shows advertising in a different light. It is a way to lower search costs. Sellers are trying to be noticed in a noisy, chaotic marketplace, so they provide consumers with information about prices and product characteristics.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">Since we introduced the internet as a shock that lowered search costs, we close by pointing out that new online technologies have radically affected search theory. You know that every click is tracked, and the prices you see are personalized just for you. Optimal online searching is the subject of intense research today. Both buyers and sellers are faced with complicated, intertwined optimization problems.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">References<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"hanging-indent\">Stigler, G. (1961). \u201cThe Economics of Information.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Journa<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Politica<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Economy<\/em><\/span> 69, no. 3, pp. 213\u2013225, <span style=\"border: none windowtext 0pt;padding: 0\"><a class=\"rId73\" href=\"http:\/\/www.jstor.org\/stable\/1829263\"><span class=\"import-url\">www.jstor.org\/stable\/1829263<\/span><\/a><\/span>. This paper is recognized as the beginning of the economics of search. 
Stigler was recognized as the founder of information economics when he was awarded the Nobel Prize in Economics in 1982.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Appendix<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p class=\"import-paft\">For 10 searches, repeat the same procedure as for 5 searches, slightly changing the formula for the minimum price and costs of searching to account for 10, instead of 5, searches. It goes like this: In cell E1, enter a 10 (this represents gathering prices from 10 vendors), and in cell E2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A11)<\/em><\/span>. This formula shows the lowest price in your sample from 10 stores, which is the one we would buy. In cell E3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.04*1<\/em><em class=\"import-i\">0<\/em><\/span> (0.4 is the cost of 10 searches). In cell E4, we add the two cells above it together, so enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=E2+E3<\/em><\/span>. You are now ready to track cell E4 in a simulation to see the typical result for this search strategy.<\/p>\r\n<p class=\"import-p\" style=\"text-indent: 36pt\">You should find that 10 searches have an approximate expected value of around 0.49. 
This is higher than 5 searches and therefore is clearly not an optimal solution.<\/p>\r\n\r\n<\/div>\r\n<\/div>","rendered":"<div class=\"textbox\">\n<p class=\"import-epf\">Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.<\/p>\n<p class=\"import-ept\" style=\"text-align: right\">John von Neumann<\/p>\n<\/div>\n<h2 class=\"import-ahaft\">3.1 Free Throw Shooting with MCSim<\/h2>\n<p class=\"import-pf\">You are the new kid on the block, and it is time to choose teams at the rec center. You think you are pretty good, so you say, \u201cI\u2019m a 90% free throw shooter.\u201d This is quite impressive. Someone hands you a basketball and says, \u201cProve it.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You shoot 100 free throws, but it does not go as well as you hoped. You make only 75. Someone says, \u201cYou are not a 90% free throw shooter.\u201d You insist, however, that you really are. \u201cIt was just bad luck. Honestly,\u201d you say, \u201cI really am a 90% free throw shooter.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The question is, Should we believe you? Anyone who has ever shot free throws knows there is luck involved. We would not expect you to make 9 out of every 10 shots like clockwork. So it could be that by chance, you missed a few more than expected. Another way to ask the question is, Can randomness explain this poor outcome? Or, in yet other words, how uncommon is missing this many free throws for a 90% shooter?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We would not be having this conversation if you had made 89 or even 88 out of 100. Then it would be easy to believe you are actually a 90% shooter and it was just bad luck. But how do we handle the fact that you missed 15 more than expected? 
That seems like a lot, but how rare is that?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is a way to answer this question analytically\u2014that is, with mathematics. We will not go that route. Instead, we will use the method of simulation.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation simply means the repeated running of a chance process and then direct examination of the results. It can be used in frontier research work, but we will use it just like we used numerical methods to solve optimization problems\u2014simulation enables us to understand complicated concepts without advanced mathematics.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is based on brute force\u2014repeat the chance process and examine the results. It requires no imagination or mathematics at all. It will be our go-to method for understanding randomness and answering questions like, \u201cDo we believe you are a 90% free throw shooter if you make only 75 out of 100?\u201d<\/p>\n<h3 class=\"import-bh\">Gauss and Two Approaches<\/h3>\n<p class=\"import-paft\">Carl Friedrich Gauss (1777\u20131855) was perhaps the greatest mathematician of all time. Before the euro, Germany\u2019s 10 deutsche mark note featured him along with a graph of the normal curve (which he made famous, called the Gaussian distribution). 
Look carefully in Figure 3.1 and you can see that it even displays the equation of the normal distribution.<\/p>\n<figure id=\"attachment_284\" aria-describedby=\"caption-attachment-284\" style=\"width: 960px\" class=\"wp-caption aligncenter\"><img class=\"wp-image-284 size-full\" src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse.jpg\" alt=\"\" width=\"960\" height=\"480\" srcset=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse.jpg 960w, https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse-300x150.jpg 300w, https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse-768x384.jpg 768w, https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse-65x33.jpg 65w, https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse-225x113.jpg 225w, https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/960px-10_Mark_Obverse-350x175.jpg 350w\" \/><figcaption id=\"caption-attachment-284\" class=\"wp-caption-text\"><strong>Figure 3.1: German currency featuring Gauss.<\/strong> <br \/>Source: <a href=\"https:\/\/commons.wikimedia.org\/w\/index.php?curid=71505000\">YavarPS on Wikimedia<\/a> \/ <a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\">CC BY-SA 4.0<\/a>.<\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is a story, probably apocryphal, of how he amazed his kindergarten teacher. Apparently, the children were especially unruly one day, so the teacher assigned a dreary problem as punishment. He told them to add all the numbers from 1 to 1,000. 
This starts easily but gets tedious and painful pretty quickly. 1 + 2 = 3, 3 + 3 = 6, 4 + 6 = 10, and 5 + 10 = 15. It will take a long time to get to 1,000.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Gauss waited a minute, then stood up and announced the answer: 500,500. The stunned teacher asked him where he got that number, which is correct, and Gauss said he noticed a pattern. Remember, he was five years old.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you make a list of the numbers, then create a second list, but flipped, the pairs always add up to 1,001: 1 goes with 1,000, 2 with 999, 3 with 998, and so on until the end, when 998 is with 3, 999 with 2, and 1 with 1,000.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The rest is easy (well, maybe not for the usual five-year-old, but this is Gauss). Multiply 1,001 by 1,000 (since there are 1,000 pairs) and divide by 2 to get 500,500. As they say, QED.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This is clever, remarkable, and beautiful. It is like Michelangelo and the Sistine Chapel. Is Monte Carlo simulation like this? No.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is a different approach to problems that uses little creativity or subtlety. It is a direct attack on a question.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is like solving the teacher\u2019s tedious problem by using a spreadsheet to add the numbers.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Make a list from 1 to 1000 in cells A1:A1000 (using fill down, of course), and then, in cell B1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(A1:A1000)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel displays 500,500. 
This is nowhere near as magnificent as what Gauss did, but it does give the answer.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation was developed during World War II by physicists working on the Manhattan Project. Nicholas Metropolis coined the term because his colleague, Stanislaw Ulam, was an avid poker player. They were simulating how radiation propagates and incorporating randomness. The connection to chance and gambling is why Metropolis named the method after the famous Monte Carlo casino in Monaco.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">So Monte Carlo simulation, or \u201csimulation\u201d for short, is an alternative to the analytical approach. Instead of using equations and algebraic manipulations, simulation uses computers to repeat the chance process many times and then directly observe the outcomes.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You can think of simulation as the much younger sibling of analytical methods. Let\u2019s apply it to free throw shooting to show how it works.<\/p>\n<h3 class=\"import-bh\">Are You Really a 90% Shooter?<\/h3>\n<p class=\"import-paft\">Our solution strategy will be simulation, but make no mistake: Gauss would not have needed simulation. He would have immediately rejected your claim. He would have known that a formula, [latex]\\sqrt{n}\\sigma[\/latex], answers the question quickly. The formula is the product of sophisticated mathematics and can be called beautiful, but most people find it extremely difficult to understand and cannot use it to answer the question.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">All Monte Carlo simulations use a random number generator (RNG). Excel\u2019s RNG function is RAND(). 
This draws uniformly distributed random numbers in the interval from zero to one.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Insert a sheet in your Excel workbook and, in cell A1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=RAND()<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You see a number with several decimal places displayed that is between zero and one. The number is actually much longer.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Widen the column and add decimal places to see this. Keep adding decimal places (widening column A as needed) until you start seeing zeroes.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you learned when we explained Solver\u2019s false precision, most modern spreadsheets use 64-bit double-precision floating point format. If you count carefully, you will see that RAND() has a zero, then a decimal point, and then 15 decimal places with values from zero to nine. After that, they are all zero, so we have reached the maximum precision. It is important to understand that our spreadsheet\u2019s random number is finite but also that it has many more decimal digits than what was originally displayed.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Repeatedly press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> (you may have to hold down the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">f<\/em><em class=\"import-i\">n<\/em><\/span> key on your keyboard). 
<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> is the keyboard shortcut to recalculate the sheet.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The number in cell A1 changes each time you recalculate the sheet. This is the beating heart of the simulation.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The bouncing numbers show that although RAND() is finite, it has a massive set of numbers to choose from. If it had only one decimal place, RAND() would have 10 possible numbers (from 0.0, 0.1, 0.2, and so on until 0.9). Six decimal places would give it 1 million different numbers. Twelve gives a trillion numbers. Fifteen is a quadrillion!<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">So you can think of RAND() as plucking a number from a humongous box with a quadrillion numbers in it.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Full disclosure: This is not exactly right because RAND() is using an algorithm to produce the next number. 
This is why computer-generated random numbers are called <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">pseudorandom<\/em><\/span>, where the prefix <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">pseud<\/em><em class=\"import-i\">o<\/em><\/span> means \u201cfalse.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">To model a 90% free throw shooter, we use an IF statement.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell B1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(A1<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">0.9<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">1<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">0)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The IF function has three arguments (or inputs) separated by commas. The first argument is the test, the second is what happens if the test is true (or yes), and the third is what happens if it is false (or no). If the random number in cell A1 is less than 0.9, then cell B1 shows a one, which means the free throw was made; otherwise, it shows a zero, which means it was missed.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Some students (usually really smart, careful ones) obsess about whether the test on A1 should be less than (&lt;) or less than or equal to (\u2264) 0.9. This does not matter because RAND() has so many random numbers available to it. The chance of drawing exactly 0.900000000000000 is ridiculously small.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">When the IF function evaluates to 1, the free throw is made, and when it is 0, it is missed. 
This is a Bernoulli random variable (a binomial random variable with a single trial), since it can only take on two values, 0 or 1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We do not need to actually see the random number generated, so we can embed RAND() directly in the IF statement.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell B2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">0.9,1,0)<\/em><\/span>. Fill this formula down to cell B100. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times to see the 0 and 1 values bouncing around.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We have implemented the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">generatio<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">proces<\/em><em class=\"import-i\">s<\/em><\/span> (DGP) in Excel. The DGP tells us how our data are produced.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Rename the sheet (double-click on the sheet tab) <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> and save the workbook as <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">FreeThrowSim<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you scroll back up to the top row, you will see many ones and a few zeroes. 
With a 90% success rate, roughly 1 in 10 cells will have a random number greater than 0.9 and, therefore, show a 0.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The fact that each cell in column B stands alone and does not depend on or influence other cells means we are assuming <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">independence<\/em><\/span>. In our model, a miss or make does not affect the chances of hitting the next shot.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you believe in the hot hand (Cohen, 2020), this implementation of the chance process is wrong. If making the previous shot increases the chances of making the current shot, there is <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">autocorrelatio<\/em><em class=\"import-i\">n<\/em><\/span>, and we cannot use 0.9 as the threshold value for every shot. We assume independence from one shot to the next.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">How many shots out of 100 will a 90% shooter make?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(B1:B100<\/em><em class=\"import-i\">)<\/em><\/span> in cell C1 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">made<\/em><\/span> in cell D1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You will see a number around 90 in cell C1. 
Each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> gives you the result of a new outcome from 100 attempted free throws.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The number of made free throws from the virtual shooter you have constructed in Excel is not always exactly 90 because you incorporated RAND() in each shot.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Use your keyboard shortcut, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>, to recalculate the sheet a few times to get a sense of the variability in the number of made shots from 100 free throws.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is no doubt that the number of made shots is a random variable, since it is bouncing around when you recalculate the sheet. It makes common sense that adding 100 bouncing numbers will produce a random outcome.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">A <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">statisti<\/em><em class=\"import-i\">c<\/em><\/span> is a recipe for the data. Cell C1 is a sample statistic because the recipe is to add up the results from a sample of 100 shots. We are interested in the distribution of the sum of 100 free throws from a 90% free throw shooter, including its central tendency, dispersion, and shape of a histogram of outcomes. With this, we can decide if a result of 75 made shots is merely unlikely or so rare that we reject the claim that you are a 90% shooter.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation is simply repeating the experiment many times so we can approximate the center, dispersion, and distribution of the outcomes. 
Since we are working with a sample statistic, the distribution of the sum of 100 free throws is called a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">samplin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">distribution<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">To process the many outcomes, we need software. A free Excel add-in that does Monte Carlo simulation is available here: <a href=\"http:\/\/dub.sh\/addins\">dub.sh\/addins<\/a>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Download the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSim.xl<\/em><em class=\"import-i\">a<\/em><\/span> file from the link above and use the Add-ins Manager (File \u2192 Options \u2192 Add-ins \u2192 Go) to install it. Click the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-ins<\/em><\/span> tab and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSim<\/em><\/span>.<\/p>\n<div class=\"textbox\">\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP<\/em> <\/strong><\/span>The keyboard shortcut to call the Add-ins Manager is <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Al<\/em><em class=\"import-i\">t<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">t<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">i<\/em><\/span> (press these keys in order without holding any of them down).<\/p>\n<\/div>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.2 shows the MCSim add-in dialog box. On the left are three required choices. 
You must select a cell to track (C1 in our example), the number of repetitions (the default is 1,000), and the random number generator to use. The MCSim add-in comes with its own RNG, RANDOM. Selecting it will replace all RAND in the sheet with RANDOM. The default is no changes.<\/p>\n<p>&nbsp;<\/p>\n<figure style=\"width: 822px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p41-1.png\" alt=\"Screenshot of a Monte Carlo Simulation dialog box. The dialog is divided into two sections: Required (left) and Optional (right). Required section (left): &quot;Select a cell&quot; field contains the cell reference $A$1, with a collapse button. &quot;Enter the Number of Repetitions&quot; field is set to 1000. &quot;Choose RNG&quot; group box has &quot;No Changes&quot; selected. Optional section (right) shows nothing selected in any of the fields. Two buttons are at the bottom: &quot;Proceed&quot; (left, with a bold border indicating it is the default action button) and &quot;Cancel&quot; (right).\" width=\"822\" height=\"516\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.2: The Monte Carlo simulation Excel add-in.<\/strong><br \/>Source: Screenshot of Excel interface, \u00a9 Microsoft Corporation. Add-in by H. Barreto.<\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">On the right are some advanced options. Some of these will be used in future work. 
The <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> option forces the RNG to begin from the same initial position, which allows for replication of results.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> box, clear it, and click cell A1 (displaying RAND()). Click in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> box and enter 123. Click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">A new sheet is inserted in the workbook. It shows the first 100 outcomes in column B, summary statistics, and a histogram. It is roughly, but not exactly, a rectangle. 
If you ran more repetitions, it would be less jagged.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet and repeat the simulation of cell A1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The results are exactly the same as before because the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> option started the RNG from the same initial value.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet and click the <span class=\"import-ccust1\">MCSi<\/span><span class=\"import-ccust1\">m<\/span> button in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab. Select cell C1 (the sum of made free throws) and clear the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> box. Click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.3 shows the results. Yours will be different because we cleared the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Se<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">See<\/em><em class=\"import-i\">d<\/em><\/span> box. 
However, your results will be quite close in the sense that your average is near 90 and the standard deviation (SD) is around 3.<\/p>\n<figure style=\"width: 789px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.3%3A%20Screenshot,ending%20near%2098.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p42-1.png\" alt=\"Screenshot of a Monte Carlo simulation, consisting of a summary statistics table and a histogram. Long description linked from image.\" width=\"789\" height=\"664\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.3: Simulation results for sum of 100 attempts from a 90% shooter.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The average and SD are approximations of their exact analogues: the expected value (EV), which is exactly 90, and the standard error (SE), which is exactly 3. The EV is the center of the sampling distribution, while the SE is the typical deviation, or bounce, in the statistic.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We would say that we expect a 90% free throw shooter to make 90 out of 100 attempts, plus or minus 3 free throws. The plus or minus is critical because it tells us the variability in the number of made free throws.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The sim also shows the maximum and minimum shots made from 100 free throws in 1,000 repetitions. In Figure 3.3, the max is 98 and the min is 78.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This gives an answer to our question. In 1,000 repetitions, the worst a 90% shooter did was 78. 
It certainly looks like you are not a 90% free throw shooter.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">What if we did more repetitions?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation of 10,000 sets of 100 free throws.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Once again, you are unlikely to see 75 or fewer. It seems the bad luck defense is not going to work. While it is possible that you really are a 90% free throw shooter who had a run of bad luck, such an outcome is so rare that we do not believe your claim to be a 90% free throw shooter.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The average and SD values changed a little with the second simulation. This shows that simulation always gives an approximate answer with some variability. Simulation can never give us an exact answer because we cannot run an infinite number of repetitions. As the number of repetitions increases, the approximation gets better, but it is never exact.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">By the way, as mentioned earlier, Gauss and statisticians using his work would have answered this question differently. A simple formula would lead immediately to the rejection of your claim.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The procedure begins by computing the SE with the formula [latex]\\sqrt{n}\\sigma=\\sqrt{100}\\times0.3=3[\/latex], where [latex]\\sigma=\\sqrt{0.9\\times0.1}=0.3[\/latex] is the SD of a single shot from a 90% shooter. Next, express the difference between the observed and expected values in standard units: [latex]\\frac{75-90}{3}=-5[\/latex]. 
This is so far in the tail of the normal (Gaussian) curve that the claim is rejected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">In other words, 75 out of 100 when 90 was claimed is 5 standard units away from what we expected to see, and this is ridiculously unlikely, so, sorry, we do not believe that you are a 90% free throw shooter who had some bad luck.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">In fact, neither analytical methods nor simulation can ever give a definitive, guaranteed answer. Both agree that, given the evidence, 75 out of 100 means we do not believe the claim that you are a 90% shooter. Since chance is involved, it is possible that you are a 90% shooter and missed every shot. We are not interested in what is possible. We want to know how to use the evidence to decide whether or not we believe a claim.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you have taken a statistics course, you might recognize that we are doing hypothesis testing without explicitly saying so. The null is that you are a 90% shooter, and the alternative hypothesis is that you are not. Seventy-five out of 100 produces a test statistic far from the expected 90, so the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">p<\/em><\/span>-value is really small. Thus, we reject the null.<\/p>\n<h3 class=\"import-bh\">Max Streak<\/h3>\n<p class=\"import-paft\">A second example of simulation involves streaks, also known as runs. A streak in this case is a consecutive set of made shots.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet. Starting from cell B1, find the first 1 (it could be cell B1), and then count how many 1s in a row you see before you encounter a miss. 
Write that number down and count the next streak. Continue until you reach the 100th shot attempt. The longest streak is the max streak.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The question is, What is the length of the typical max streak in a set of 100 free throws from a 90% free throw shooter?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This is an exceedingly difficult question. It asks not for the number of streaks (also a hard question) but for the longest streak in 100 shots. You do not simply add up the number made; you have to find the length of all the streaks and then identify the longest one.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Unlike the number of free throws a 90% shooter will make in 100 attempts, the typical max streak is not easy to guess. It could be 20, 40, or maybe 50. Who knows? How can we answer this question?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The analytical approach is a bit of a dead end. There are formulas that approximate a solution (Feller, 1968, p. 325), but the math is somewhat complicated. No exact analytical solution has been found.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation can be used if we can figure out a way to ask the question in Excel so that a cell displays the answer. This means simulation requires some ingenuity. We need a cell that computes the max streak so we can use the MCSim add-in. We do this in two steps: first we figure out how to report the current streak, then we use the MAX function to find the longest streak.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell E1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=B1<\/em><\/span>. In cell E2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(B2=1,E1+1,0)<\/em><\/span>. 
Fill it down a few cells.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now you can see what the formula is doing. If the shot is made, we add it to the previous running sum, but if it is missed, it resets the running sum to 0. The B2=1 part of the formula tests if the current shot is made, and E1 + 1 increases the current streak length by one. The zero means you missed and the streak is now zero.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Fill the formula down to E100 and look at the values as you scroll back up.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You should see several streaks in a set of 100 free throws. We want the longest streak. That is the second step in our implementation of the question in Excel, and it is easy.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell F1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MAX(E1:E100<\/em><\/span>) and enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">ma<\/em><em class=\"import-i\">x <\/em><em class=\"import-i\">streak<\/em><\/span> in cell G1. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell F1 displays the max streak from each set of 100 free throws. Max streak is a statistic, just like the sum, because it is a recipe\u2014albeit much more complicated than the sum.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">It has an expected value, standard error, and sampling distribution. 
We can approximate all of these with simulation.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation, with 10,000 repetitions, of cell F1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.4 shows the results. Yours will be a little different. The average is an approximate answer to our question: The max streak is about 27 or so. The exact answer is the expected value, but we have no way of computing it.<\/p>\n<figure style=\"width: 803px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p46-1.png\" alt=\"Screenshot of a Monte Carlo simulation with a summary statistics table and a histogram. Summary Statistics table (top) has two columns labeled &quot;Summary Statistics&quot; (blue header) and &quot;Notes&quot; (red header). The Notes column is empty. The statistics reported are: Average = 26.858, SD = 9.5715, Max = 85.000, Min = 9.000. Histogram (bottom) is titled &quot;Histogram of $F$1.&quot; The chart displays the frequency distribution of simulation results as a blocky curve. The horizontal axis runs from 8 to 78. The vertical axis is unlabeled. The bars rise quickly from the left tail starting near 8, reach their peak height around values of 20, then taper off quickly toward the right tail ending near 78.\" width=\"803\" height=\"675\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.4: Sim results for a max streak in 100 attempts from a 90% shooter.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The histogram is an approximation to the exact sampling distribution (which no one has figured out how to exactly derive). 
The graph tells us which values are unlikely: roughly 10 or fewer and 50 or more.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice that the sampling distribution of the max streak statistic, unlike the sum, does not appear to follow the normal curve. The max streak has a long right tail and is not symmetric.<\/p>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Takeaways<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"import-paft\">Sometimes a function or problem is deterministic, but other times we are faced with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">stochasti<\/em><em class=\"import-i\">c <\/em><em class=\"import-i\">data<\/em><\/span>\u2014the numbers depend on chance, luck, and randomness. The values we observe are produced by a DGP, and they are volatile.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation is a brute-force approach to answering questions involving stochastic data. A much older alternative, the analytical approach, relies on brainpower to derive formulas.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">To run a simulation, the problem must be implemented in Excel (or some software that can generate random numbers). Of course, one can manually flip a coin many times in the real world, but this is tedious. Simulation did not become a powerful tool until modern computers were invented and enabled a great many repetitions in a short period of time.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation is always only an approximation. By running more repetitions, the approximation improves, but it can never give an exact answer because it would have to run forever.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Often, we are searching for the sampling distribution of a statistic. 
This tells us the chances of each outcome, the typical result (called the expected value), and the dispersion in values (called the standard error, or SE).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The MCSim add-in always produces summary statistics and a histogram. If the cell that is tracked is a statistic, then the average is the approximate expected value and the SD is the approximate SE.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">In case you think streaks are a waste of time, look at this headline from an article in <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Wal<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Stree<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">Journa<\/em><em class=\"import-i\">l<\/em><\/span> on July 27, 2023, on page B1:<\/p>\n<p class=\"import-wls\" style=\"margin-left: 36pt;margin-right: 36pt\">Dow Sets Longest Winning Streak Since \u201987<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The Dow\u2019s streak of 13 consecutive sessions with gains ended the next day. The longest streak ever (as of this writing) is 14, back in 1897.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">References<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"hanging-indent\">The epigraph is a famous quote from 1951, when computer science was taking off. 
In \u201cVarious Techniques Used in Connection with Random Digits\u201d (freely available at <span style=\"border: none windowtext 0pt;padding: 0\"><a class=\"rId48\" href=\"https:\/\/mcnp.lanl.gov\/pdf_files\/InBook_Computing_1961_Neumann_JohnVonNeumannCollectedWorks_VariousTechniquesUsedinConnectionwithRandomDigits.pdf\"><span class=\"import-url\">https:\/\/mcnp.lanl.gov\/pdf_files\/InBook_Computing_1961_Neumann_JohnVonNeumannCollectedWorks_VariousTechniquesUsedinConnectionwithRandomDigits.pdf<\/span><\/a><\/span>), von Neumann supported the use of pseudorandom number generation but warned against misinterpreting what these numbers meant. There are many algorithms for random number generation; some are better than others. Excel\u2019s RAND() is not great.<\/p>\n<p class=\"hanging-indent\">Cohen, B. (2020). <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Ho<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">Hand<\/em><em class=\"import-i\">: <\/em><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Myster<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">an<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Scienc<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Streaks<\/em><\/span> (Custom House). Russ Roberts interviews Cohen in an August 10, 2020, episode of EconTalk, available at <a href=\"http:\/\/www.econtalk.org\/ben-cohen-on-the-hot-hand\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.econtalk.org\/ben-cohen-on-the-hot-hand<\/span><\/span><\/a>.<\/p>\n<p class=\"hanging-indent\">Feller, W. (1968, 3rd ed.). 
<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">A<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">Introductio<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">t<\/em><em class=\"import-i\">o <\/em><em class=\"import-i\">Probabilit<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">Theor<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">an<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">It<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Applications<\/em><\/span> (John Wiley &amp; Sons), <a href=\"http:\/\/archive.org\/details\/introductiontopr0001fell\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">archive.org\/details\/introductiontopr0001fell<\/span><\/span><\/a>.<\/p>\n<\/div>\n<\/div>\n<h2 class=\"import-ah\">3.2 Simulating Parrondo\u2019s Paradox<\/h2>\n<p class=\"import-paft\">A paradox is something (such as a situation) with opposing elements that seems impossible but is actually true.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">An optical illusion is related to a paradox in that you see something that is not easily explained or can seem impossible. Figure 3.5 is an example. Do you see the old woman, or the young lady, or both? 
Your age affects what you see in this drawing\u2014older people are more likely to see the old lady (Nicholls et al., 2018).<\/p>\n<p>&nbsp;<\/p>\n<figure style=\"width: 586px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p48-1.png\" alt=\"A drawing that produces an optical illusion where you see an old lady or young girl depending on how you look at it.\" width=\"586\" height=\"818\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.5: Old woman or young lady?<\/strong><br \/>Source: <a href=\"https:\/\/www.nature.com\/articles\/s41598-018-31129-7\/figures\/1\"><i>My Wife and My Mother-In-Law<\/i>, by the cartoonist W. E. Hill, 1915<\/a> \/ <a href=\"https:\/\/creativecommons.org\/public-domain\/pdm\/\">Public domain.<\/a><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Juan Parrondo is a physicist who discovered the paradox named after him in 1996. Parrondo\u2019s Paradox occurs when <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">tw<\/em><em class=\"import-i\">o <\/em><em class=\"import-i\">losin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">game<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">ar<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">combine<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">an<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">the<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">produc<\/em><em class=\"import-i\">e a <\/em><em class=\"import-i\">winnin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">game<\/em><\/span>. That is puzzling and counterintuitive.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Almost always, game outcomes are additive, so a losing game plus a losing game equals a losing game just like adding two negative numbers gives an even more negative number. 
Parrondo found what can be described as a black hole in the parameter space where adding two losers yields a winner.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The paradoxical nature of the result will become apparent when we implement the games in Excel and directly examine the outcomes. Our goal is to show how simulation makes Parrondo\u2019s Paradox crystal clear.<\/p>\n<h3 class=\"import-bh\">Losing Game A<\/h3>\n<p class=\"import-paft\">Game A is a coin flip with a slightly negatively biased coin. Heads earns you +1 monetary unit (M) and tails earns you \u22121. The coin is flipped 100 times, and we keep a running sum after each flip. On average, at the end of the game, the result is negative, so we say Game A is a losing game.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Gam<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">A<\/em><\/span>. In cell A3, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">epsilon<\/em><\/span>; and in cell B3, enter the number 0.005. In cell A4, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">p(H)<\/em><\/span>; this is the probability of flipping a head. In cell B4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.5-B3<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">With a probability of heads less than 50%, we will flip heads less often than tails. 
This is why this game is a loser.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Flip #<\/em><\/span> and create a series from 1 to 100 in cells A7:A106. In cell B6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Result<\/em><\/span>, and in cell B7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()&lt;$B$4,1,0)<\/em><\/span>. Fill it down to cell B106.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Column B has our simulated coin flips, with 1 for heads and 0 for tails. RAND() is uniformly distributed on the interval [0,1), so the formula returns 1 with probability p(H). Because the value in cell B4 is less than 0.5, tails comes up slightly more often than heads.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We track the money in column C with an IF statement.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">End M<\/em><\/span>, and in cell C7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(B7=1,1,-1)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The formula for the next cell is different because we have to track how much money we had at the end of the previous flip. 
Thus, we add the cell above.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(B8=1,1<\/em><em class=\"import-i\">,<\/em><em class=\"import-i\">-1)+C7<\/em><\/span>. Fill it down to cell C106. Select C6:C106 and make a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scatte<\/em><em class=\"import-i\">r<\/em><\/span> chart. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The chart shows the entire game, and cell C106 tells us the outcome of Game A. If it is positive, the game was won; if negative, it was lost. The number tells us how much we won or lost.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can use the MCSim Excel add-in to examine the sampling distribution of C106. If needed, download and install MCSim from <a href=\"http:\/\/dub.sh\/addins\">dub.sh\/addins<\/a>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range C7:C106 and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab). 
Check the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Recor<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Al<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Selecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Cell<\/em><em class=\"import-i\">s<\/em><\/span> option and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>. The simulation is fast, but Excel may take some time to display the results.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel inserts two sheets in the workbook. The <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> sheet has the result for C7 (the first coin flip), but the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet has 1,000 rows and 100 columns of numbers (which is why Excel took so long to display the results).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can process these numbers to understand Game A. Each row is a game with 100 flips running left to right. 
Each flip number (column) is called an <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">ensembl<\/em><em class=\"import-i\">e<\/em><\/span>, and the average of each flip number is called the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">ensembl<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">average<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A1003 of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=AVERAGE(A2:A1001)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This value will agree exactly with the average in J5 of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> sheet. This value is the average M after one flip. It is slightly negative.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell A1003 in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><\/span> sheet and fill it right to the 100th column (CV). With these cells selected, make a chart by clicking <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Inser<\/em><em class=\"import-i\">t<\/em><\/span> and choosing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scatte<\/em><em class=\"import-i\">r<\/em><\/span> with lines and no markers. 
Title the chart \u201cFlip #\u201d (since the title is right above the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">x<\/em><\/span>-axis, it labels the axis).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your chart is an approximation to the exact ensemble average in Figure 3.6. Simulation produces random deviation from the exact object. We could improve the approximation by increasing the number of repetitions. The squiggly graph in the simulation would converge to the line in Figure 3.6.<\/p>\n<figure style=\"width: 870px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p51-1.png\" alt=\"Plot of the exact evolution of losing Game A in Parrondo's Paradox. The y axis is labeled as money from 0.0 to negative 1.2. The x axis is the flip number from 0-120. There is a straight diagonally line starting at (0,0) down to the right, ending at (100, negative 1.0).\" width=\"870\" height=\"503\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.6: The exact evolution of Game A.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your simulation and Figure 3.6 make clear that Game A is a loser. From the first flip, it gets steadily worse, and by the last flip, you can expect to lose 1 monetary unit.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Of course, not every single game is a loser. Column CV in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet shows many cells that are positive. On average, however, we can expect to lose playing Game A.<\/p>\n<h3 class=\"import-bh\">Losing Game B<\/h3>\n<p class=\"import-paft\">Game B is also a loser, but it is more complicated than Game A. 
Game B is based on two coins, and you use Coin 1 if your current monetary holding is divisible by 3; otherwise, you use Coin 2. The MOD function enables us to determine which coin to use.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the sheet with Game A, and in cell E1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MOD(13,12)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The cell displays 1 because MOD returns the remainder after division: 13 divided by 12 leaves a remainder of 1. Converting military time to a.m.\/p.m. time works the same way: 1300 is 1 p.m. because you divide the hour by 12 and the remainder is the answer.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Game B follows the flow chart in Figure 3.7. We will use the MOD function to divide M by 3, and if the remainder is 0, we know M is evenly divisible by 3, and we flip Coin 1. If not, we flip Coin 2.<\/p>\n<figure style=\"width: 893px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p52-1.png\" alt=\"Display of how Game B works in Parrondo's Paradox using a flow chart. Is M divisible by 3? Choose yes or no. From yes, the option is Coin 1: Prob Heads = 10% - epsilon. From no, the option is Coin 2: Prob Heads = 75% - epsilon.\" width=\"893\" height=\"392\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.7: The rules of Game B.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell G1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Game B<\/em><\/span>. 
In cells G3 and G4, enter the labels \u201cCoin 1\u201d and \u201cp(H).\u201d In cell H4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.1-B3<\/em><\/span>. In cells J3 and J4, enter the labels \u201cCoin 2\u201d and \u201cp(H).\u201d In cell K4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.75-B3<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Obviously, we would rather flip Coin 2. It comes up heads almost 75% of the time, so we usually win 1 monetary unit. Coin 1 is the opposite. It is strongly biased against heads, so we lose often with Coin 1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Copy cells A6:A106 and paste in cell G6. In cell H6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Start M<\/em><\/span>, and in cell H7, enter a 0. In cell I6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MOD(M,3)<\/em><\/span>, and in cell I7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MOD(H7,3)<\/em><\/span>. In cell J6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Coin<\/em><\/span>, and in cell J7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(I7=0,1,2)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You start with 0 monetary units. That is evenly divisible by 3, so we will use Coin 1, but we need a more general formula to determine what happens if it is Coin 1 or Coin 2. 
An IF statement can handle this.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell K6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Result<\/em><\/span>, and in cell K7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(J7=1,IF(RAND()&lt;$H$4,1,0),IF(RAND()&lt;$K$4,1,0))<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we report our money position.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell L6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">End M<\/em><\/span>, and in cell L7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(K7=1,1,-1)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">In the next row, we walk through the cells in order.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=L7<\/em><\/span> and fill it down. Select cell range I7:K7 and fill it down. In cell L8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(K8=1,L7+1,L7-1)<\/em><\/span>. Fill it down. 
Select L6:L106 and make a Scatter chart.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This completes Game B. Cell L106 gives the final result of the game, but as we did before, we can track every flip of the game to better understand it.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range L7:L106 and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab). Check (if needed) <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Recor<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Al<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Selecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Cell<\/em><em class=\"import-i\">s<\/em><\/span> and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As before, two sheets are inserted, and we will process the data in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRa<\/em><em class=\"import-i\">w<\/em><\/span> sheet to show how Game B works.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">3<\/em><\/span> sheet and copy row 1003. Go to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">5<\/em><\/span> sheet, select cell A1003, and paste. 
Make a chart of row 1003.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results are surprising. In the first few flips of the ensemble average, it jaggedly oscillates and then settles down to a downward-sloping relationship.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The exact ensemble average is given by Figure 3.8. The simulation is correct in that there is an oscillation in the expected value in the first few flips before convergence to a single line that heads downward.<\/p>\n<figure style=\"width: 865px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p53-1.png\" alt=\"Line chart showing the cumulative change in money (vertical axis, labeled &quot;Money,&quot; ranging from -1.6 to 0.0) over a series of coin flips (horizontal axis, labeled &quot;Flip #,&quot; ranging from 0 to 120). The line begins near 0 at flip 0 and immediately drops into negative territory. In the early flips (approximately 0 to 30), the line oscillates with noticeable up-and-down fluctuations, suggesting alternating wins and losses, but with a generally downward trend. The amplitude of the oscillations decreases over time. From approximately flip 30 onward, the oscillations dampen and the line transitions into a smooth, steadily declining curve with little variation, trending consistently downward. By flip 100, the cumulative money value reaches a labeled endpoint of -1.392320, indicating a net loss of approximately $1.39 over 100 flips.\" width=\"865\" height=\"509\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.8: The exact evolution of Game B.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your simulation results and Figure 3.8 show that Game B is a loser and a bigger loser than Game A. 
We can expect to lose about 1.4 monetary units playing Game B.<\/p>\n<h3 class=\"import-bh\">Mixing Two Losing Games<\/h3>\n<p class=\"import-paft\">Having set up and run Games A and B separately, we are now ready to mix these two losing games. This will demonstrate Parrondo\u2019s Paradox because, somehow, mixing the losing games results in a winning game. In fact, there is an optimal mixing strategy, but we will randomly mix the two games by flipping a fair coin.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Random Mixing<\/em><\/span>. Copy cell range A6:A106, select cell N6, and paste. In cell O6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Game<\/em><\/span>, and in cell O7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()&lt;0.5,&quot;A&quot;,&quot;B&quot;)<\/em><\/span>. Fill it down.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Column O tells us which game we will play at each coin flip. 
It is easy to see by pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> repeatedly that the letters A and B are bouncing around, indicating that we are mixing the games randomly.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We need to input Game B again (we cannot just use Game B in columns G:L) because it depends on the value of M to decide which coin to play.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell P5, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">I<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Gam<\/em><em class=\"import-i\">e B <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">chosen<\/em><\/span>. Copy cell range H6:K7, select cell P6, and paste.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We need an IF statement to display the actual outcome of this game based on whether we play Game A or Game B. 
We take Game A from column B, since it does not depend on the amount of M we have, but we take Game B from column S.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell T6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Actual Result<\/em><\/span>, and in cell T7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(O7=&quot;A&quot;,B7,S7)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we determine our monetary position.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell U6, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">End M<\/em><\/span>, and in cell U7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(T7=1,1,-1)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We process the second flip and fill down to complete the implementation.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell P8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=U7<\/em><\/span>. Fill it down. Select cell range Q7:T7 and fill it down. In cell U8, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(T8=1,U7+1,U7-1)<\/em><\/span>. Fill it down. 
Select cell range U6:U106 and make a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scatte<\/em><em class=\"import-i\">r<\/em><\/span> chart.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This completes the random mixing of two losing games. Cell U106 gives the final result of the game. Pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> does not reveal much. We need to run a simulation.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range U7:U106 and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab). Confirm that the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Recor<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Al<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Selecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">Cell<\/em><em class=\"import-i\">s<\/em><\/span> option is still checked and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We have the data to demonstrate Parrondo\u2019s Paradox, but we need to create an ensemble average chart.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Return to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">5<\/em><\/span> sheet and copy row 1003. 
Go to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCRaw<\/em><em class=\"import-i\">7<\/em><\/span> sheet, select cell A1003, and paste. Make a chart of row 1003.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The results are absolutely stunning. Unlike our two previous charts, this one points upward, and the final value is positive! This is a winning game! Figure 3.9 shows the exact evolution of the randomly mixed games. The expected value of playing a random combination of Games A and B keeps rising the more you play. That is mind-boggling.<\/p>\n<figure style=\"width: 842px\" class=\"wp-caption aligncenter\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p55-1.png\" alt=\"Line chart showing cumulative money (vertical axis, labeled &quot;Money,&quot; ranging from -0.6 to 1.4) over a series of coin flips (horizontal axis, labeled &quot;Flip #,&quot; ranging from 0 to 120) showing positive and rising expected value. The line begins at 0 and drops sharply into negative territory in the first few flips, reaching a trough of approximately -0.4 around flip 5, with notable early volatility shown by small oscillations. From approximately flip 10 onward, the line transitions into a smooth, steadily and nearly linearly increasing trajectory, crossing into positive territory around flip 15\u201320 and continuing upward without significant fluctuation. By flip 100, the cumulative money value reaches a labeled endpoint of 1.287327.\" width=\"842\" height=\"511\" \/><figcaption class=\"wp-caption-text\"><strong>Figure 3.9: The exact evolution of randomly mixing A and B.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">By randomly mixing individually losing Games A and B, we can expect to win about 1.3 monetary units playing 100 times. 
This is Parrondo\u2019s Paradox.<\/p>\n<h3 class=\"import-bh\">What Is Going on Here?<\/h3>\n<p class=\"import-paft\">Does this work show that you can walk into a casino and take turns playing blackjack and roulette and come out a winner? No.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Does it mean that you can combine two losing stocks and somehow make money? No.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Does it mean that I can take something poisonous and then drink another poison and the two will combine to heal me? No.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Parrondo\u2019s Paradox does not say that mixing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">an<\/em><em class=\"import-i\">y<\/em><\/span> two losing games produces a winner. The two games and epsilon value were chosen carefully. Parrondo found a parameter value that generated the anomalous result. Think of a Cartesian plane with a coordinate that is like a black hole\u2014all the other coordinates behave as expected, but this particular point is really weird. Parrondo found such a point by carefully picking the bias (epsilon) in Games A and B.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Applying this paradox to the real world is a challenge. Explaining the inspiration for Parrondo\u2019s discovery of the paradox will help us understand how the paradox emerges.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Parrondo is a physicist, and his discovery of an epsilon value that produced the paradox was influenced by something called the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">fl<\/em><em class=\"import-i\">ashin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">Brownia<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">ratchet<\/em><\/span>. 
This is a process that alternates between two regimes in a sawtooth fashion.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Watch this two-minute video to see the ratchet in action and how it applies to Parrondo\u2019s Paradox: <a href=\"http:\/\/vimeo.com\/econexcel\/parrondo\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">vimeo.com\/<\/span><span class=\"import-url\">econexcel<\/span><span class=\"import-url\">\/<\/span><span class=\"import-url\">parrondo<\/span><\/span><\/a>. You can control the ratchet yourself here: <a href=\"http:\/\/dub.sh\/ratchet\">dub.sh\/ratchet<\/a>.<\/p>\n<p><iframe id=\"oembed-1\" title=\"Ratchet Effect\" src=\"https:\/\/player.vimeo.com\/video\/292312218?dnt=1&amp;app_id=122963\" width=\"500\" height=\"281\" frameborder=\"0\"><\/iframe><\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is true that Parrondo\u2019s Paradox requires a specific, and we might add rare, type of losing game to be mixed. The paradox would never emerge if the two losing games were like Game A. Mixing two Game As would produce a bigger negative outcome.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Game B, with its two coins, one of which is biased in our favor, holds the key to the paradox. Figure 3.8 tells us that for the first few flips, the expected value of Game B alternates. Mixing takes advantage of the positive parts of Game B in those first few flips.<\/p>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Takeaways<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"import-paft\">Optical illusions and paradoxes are mind-bending. 
They violate what we expect to happen and force us to deal with something unbelievable.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #b00000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Watch a classic two-minute video of an optical illusion with an explanation of how it works: <a href=\"http:\/\/dub.sh\/faceillusion\">dub.sh\/faceillusion<\/a>.<\/p>\n<p><iframe id=\"oembed-2\" title=\"Charlie Chaplin Optic Illusion\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/QbKw0_v2clo?feature=oembed&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Like an optical illusion, Parrondo\u2019s Paradox produces a shocking result: Loser plus loser equals winner. That should not happen.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">At a magic show, we know a trick is involved, so the person was not really cut in half or made to disappear. There is a logical reason\u2014we just do not know what it is unless it is explained to us.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Similarly, Parrondo\u2019s Paradox can be explained. The key lies in the ratchet, which trends down but in a sawtooth, herky-jerky motion. The video (<a href=\"http:\/\/dub.sh\/ratchet\">dub.sh\/ratchet<\/a>) shows how it catches the ball at just the right time and pushes it upward, even as it is heading downward, producing an overall upward movement.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Parrondo\u2019s Paradox requires specific values for epsilon, the bias in coins being flipped, and Game B is actually a combination of two coins that are used based on whether the player\u2019s total amount of money is evenly divisible by 3.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is Game B that has the ratchet that explains the paradox. 
Figure 3.8 shows that it is a losing game (heading downward), but look carefully at the beginning\u2014the oscillations during the first few flips contain the explanation to the paradox.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #b00000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Listen to this 10-minute podcast at <a href=\"http:\/\/dub.sh\/parrondo\">dub.sh\/parrondo<\/a>.<\/p>\n<p><!--[if lt IE 9]><script>document.createElement('audio');<\/script><![endif]--><br \/>\n<audio class=\"wp-audio-shortcode\" id=\"audio-32-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" src=\"https:\/\/barretoh.github.io\/GitHuBratchet\/ParrondoParadox.mp3?_=1\" \/><a href=\"https:\/\/barretoh.github.io\/GitHuBratchet\/ParrondoParadox.mp3\">https:\/\/barretoh.github.io\/GitHuBratchet\/ParrondoParadox.mp3<\/a><\/audio><\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This podcast was generated by NotebookLM (a Google AI experiment in October of 2024 freely available at <span style=\"border: none windowtext 0pt;padding: 0\"><a class=\"rId59\" href=\"https:\/\/notebooklm.google.com\/\"><span class=\"import-url\">https:\/\/notebooklm.google.com\/<\/span><\/a><\/span>). In October of 2024, I was shocked by what this AI could do, and 18 months later, I remain quite impressed by NotebookLM!<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We used simulation to explain Parrondo\u2019s Paradox, but it is not an integral part of the paradox. An analytical solution using Markov chains provides exact results. The analytical solution was used to create the exact evolution charts (Barreto, 2009).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Usually, we use simulation to represent a real-world process. 
We do not have to make it a perfect representation, but it must capture the essential elements.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can go, however, in the other direction, from an artificial environment to the real world. Parrondo found something paradoxical, and now we are asking, \u201cIs there something like this in reality?\u201d Could it ever make sense to combine stocks or medicines or anything else in a way that reverses the negative result? The search is on.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Finally, random mixing produces a winning game with an expected value of about 1.3 monetary units at the 100th flip, but there is an optimal mix. Playing AB, then ABBABBABB repeatedly, produces an expected value of a little over 6 monetary units. Barreto (2009) explains the analytical solution and optimal mixing with Excel.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">References<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"hanging-indent\">Barreto, H. (2009). \u201cA Microsoft Excel Version of Parrondo\u2019s Paradox.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">SSR<\/em><em class=\"import-i\">N <\/em><em class=\"import-i\">Workin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">Paper. <\/em><\/span>Available at SSRN: <a href=\"https:\/\/ssrn.com\/abstract=1431958\" target=\"_blank\" rel=\"noopener\">https:\/\/ssrn.com\/abstract=1431958<\/a>\u00a0or\u00a0<a class=\"textlink\" href=\"https:\/\/dx.doi.org\/10.2139\/ssrn.1431958\" target=\"_blank\" rel=\"noopener\">http:\/\/dx.doi.org\/10.2139\/ssrn.1431958<\/a><\/p>\n<p class=\"hanging-indent\">Barreto, H. (2018, Sept. 28). <em>Ratchet effect<\/em>. [Video]. Vimeo. 
<a href=\"http:\/\/vimeo.com\/econexcel\/parrondo\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">vimeo.com\/<\/span><span class=\"import-url\">econexcel<\/span><span class=\"import-url\">\/<\/span><span class=\"import-url\">parrondo<\/span><\/span><\/a><\/p>\n<p class=\"hanging-indent\">Nicholls, E., Churches, O., and Loetscher, T. (2018). \u201cPerception of an Ambiguous Figure Is Affected by Own-Age Social Biases.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Scienti<\/em><em class=\"import-i\">fi<\/em><em class=\"import-i\">c <\/em><em class=\"import-i\">Report<\/em><em class=\"import-i\">s<\/em><\/span> 8, no. 12661. Open access: <a href=\"http:\/\/www.nature.com\/articles\/s41598-018-31129-7\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.nature.com\/articles\/s41598-018-31129-7<\/span><\/span><\/a>.<\/p>\n<p class=\"hanging-indent\">RayOman. (2009, August 17). <em data-start=\"104\" data-end=\"136\">Charlie Chaplin Optic Illusion<\/em> [Video]. YouTube. 
<a class=\"\" href=\"https:\/\/www.youtube.com\/watch?v=QbKw0_v2clo\" target=\"_new\" rel=\"noopener\" data-start=\"155\" data-end=\"198\">https:\/\/www.youtube.com\/watch?v=QbKw0_v2clo<\/a>.<\/p>\n<p class=\"hanging-indent\">A Mathematica version: <a href=\"http:\/\/demonstrations.wolfram.com\/TheParrondoParadox\/\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">demonstrations.wolfram.com\/<\/span><span class=\"import-url\">TheParrondoParadox<\/span><span class=\"import-url\">\/<\/span><\/span><\/a>.<\/p>\n<p class=\"hanging-indent\">A YouTube demonstration: <a href=\"http:\/\/www.youtube.com\/watch?v=PpvboBJEozM\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.youtube.com\/watch?v=PpvboBJEozM<\/span><\/span><\/a>.<\/p>\n<p class=\"hanging-indent\">For an entertaining read on paradoxes, try <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Perplexing Paradoxes<\/em><\/span> by George Szpiro (2024).<\/p>\n<\/div>\n<\/div>\n<h2 class=\"import-ah\">3.3 Pooled Testing via Simulation<\/h2>\n<p class=\"import-paft\">Those who lived through the COVID-19 pandemic are certain to remember it for a long time: masks, mandates, social distancing, vaccines, virtual meetings, and for many of us, losing loved ones. We will also remember testing for coronavirus with nasal swabs and at-home kits.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">It never caught on in the United States during the COVID-19 pandemic, but pooled testing was considered because it could reduce the number of tests needed and save time (Mandavilli, 2020). Pooled testing was used successfully by the United States military during World War II to test men for syphilis (Dorfman, 1943).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The logic of pooled testing is straightforward. 
A university, for example, could take saliva or nasal swab samples from each student and test them individually, or it could combine a part of each sample from several people into a single group and test the pooled sample. If it is negative, then all the individuals in the combined pool are negative, and we have saved on testing every person in that group. If the pooled sample is positive, then individual tests would be performed on the reserved parts of each individual\u2019s sample to determine exactly who is infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This leads to a crucial question: <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Wha<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">optima<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">size<\/em><em class=\"import-i\">?<\/em><\/span> The bigger the group, the lower the number of groups tested but the higher the chances a group is positive, and then everyone in the group has to be tested (we ignore the possibility of subgroup testing, false positives or negatives, and other complications).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We solve this optimization problem by constructing an Excel spreadsheet and using Monte Carlo simulation. We proceed step by step and reveal Excel functions and tools as we create our model of pooled testing.<\/p>\n<h3 class=\"import-bh\">The Data Generation Process<\/h3>\n<p class=\"import-paft\">The first thing we need to do is implement the random process by which some people get infected and others do not. 
We do this by drawing a random number and comparing it to a threshold value, so we get either a 0 (not infected) or a 1 (infected).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We make the simplifying assumption that everyone has the same likelihood of catching the virus\u2014say, 5%. This is an <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">exogenou<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">variabl<\/em><em class=\"import-i\">e<\/em><\/span> (also called a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">parameter<\/em><\/span>) in our model and it will serve as our threshold value for determining whether someone is infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter 5% in cell A1 of a blank spreadsheet and label it as \u201cinfection rate\u201d in cell B1. Save the Excel file (<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">PooledTesting.xls<\/em><em class=\"import-i\">x<\/em><\/span> is a good name).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell A1 displays 5%, which is the same as 0.05 in decimal notation. The number 0.05 is what the spreadsheet stores in its internal memory. It is worth remembering that what is displayed may be different from what is stored.<\/p>\n<div class=\"textbox\">\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP <\/em><\/strong><\/span>Name cells, especially if you have many or complicated formulas. We have been using cell addresses, and we can, of course, refer to cell A1 in a formula, but cell addresses can be difficult to read. 
It is good practice to name a cell or a cell range so that formulas can use natural language to reference cells.<\/p>\n<\/div>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Name cell A1 <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">InfectionRat<\/em><em class=\"import-i\">e<\/em><\/span> because this will make our future formulas easier to understand. If needed, search Excel\u2019s Help for \u201cnames in formulas.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we incorporate randomness. As you know, Excel draws uniformly distributed random numbers in the interval from 0 to 1 with the RAND() function.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=RAND()<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Just like the free throw shooting and coin-flipping examples, we use Excel\u2019s random number generator to determine whether or not a person gets COVID-19. 
We use an IF statement to group the RAND-generated values into two categories, 0 and 1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell A4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(A3<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">InfectionRate<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">1<\/em><\/span>, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">0)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You probably see a 0 displayed in cell A4; if not, press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> (you may have to use the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">f<\/em><em class=\"import-i\">n<\/em><\/span> key). Zero means the person is not infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> repeatedly to recalculate the sheet until you see a 1 in cell A4. The chances are only 1 in 20, so be patient.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you recalculated, cell A3 changed every time, but cell A4 changed only when A3 crossed the infection rate, moving from above 5% to below it or vice versa. 
Because it can take only the values 0 or 1, cell A4 is a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">binomial random variable<\/em><\/span> with a single trial (also called a Bernoulli random variable).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now that we know how to implement a random process that outputs whether or not an individual is infected, we can create an entire population of people, some who get infected with the virus and others who do not.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=IF(RAND()<\/em><em class=\"import-i\">&lt;<\/em><em class=\"import-i\">InfectionRate,1,0)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice how we directly embedded the RAND() function in the cell formula. We do not know which random number was drawn, but we do know if it was less than 5% because then cell C1 would display \u201c1.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Fill down this formula all the way to cell C1000.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As you scroll back up to the top row, you will see a sprinkling of ones among many zeroes. With a 5% infection rate, roughly 1 in 20 cells will have random number draws less than 5% and therefore show the number one.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The fact that each cell in column C stands alone and does not depend on or influence other cells means we are assuming <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">independence<\/em><\/span>. 
In our model, one person with the virus does not affect the chances of anyone else being infected. This condition is surely violated in the real world. To improve our analysis, we should make the chances of infection depend on whether people with whom they come in contact have the virus.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">However, since our focus here is on showing how pooled testing works, we will not model infection as dependent on people nearby. This would be a fun project where you would create clusters of cells, and if one got sick, the nearby cells would have a much higher chance of infection.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">How many people in our population of 1,000 are infected?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(C1:C1000<\/em><\/span>) in cell D1 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">infected<\/em><\/span> in cell E1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You will see a number around 50 in cell D1. The number of infected people is not always exactly 50 because chance is involved in who gets infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Recalculate by pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times to get a sense of the variability in the number of infected people.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The total number of infected people can be less than 40 or more than 60, but that is not common. Usually, there are around 45 to 55 infected. 
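<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">For readers who want to see the same chance process outside Excel, here is a minimal Python sketch. It is an illustration rather than part of the workbook: the function names simulate_population and count_infected are made up for this example, and Python\u2019s random module stands in for RAND().<\/p>

```python
import random


def simulate_population(size=1000, infection_rate=0.05, seed=None):
    # One 0/1 draw per person, playing the role of the IF(RAND() ...) formula
    # filled down column C. A draw below the infection rate means infected (1).
    rng = random.Random(seed)
    return [1 if infection_rate > rng.random() else 0 for _ in range(size)]


def count_infected(population):
    # Plays the role of the SUM over column C in cell D1.
    return sum(population)


if __name__ == '__main__':
    # Each pass through the loop is like one press of F9.
    for trial in range(5):
        print(count_infected(simulate_population()))
```

<p class=\"import-p\" style=\"text-indent: 36pt\">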
There is no doubt that the number of infected people is a random number, since it is bouncing around when you recalculate the sheet. It makes common sense that adding binomial random variables will produce a random outcome.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can make it easier to identify who is infected with a spreadsheet\u2019s conditional formatting capability. This offers the viewer visual cues that make data easier to understand.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select the entire column C and apply a formatting rule that highlights, with color, cells with a value of 1. Choose font and fill colors that you think emphasize being infected. If needed, search Excel\u2019s Help for \u201cconditional formatting.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now when you scroll down, it is easy to see who is infected. Recalculation changes who is infected\u2014it is as if we rewound and replayed the world with each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Having implemented the chance process for being infected or not, we turn to pooled testing. Instead of testing each person, we can group individuals and test their combined sample. If the pooled sample tests positive, then we know at least one person is infected; if not, we know no one is infected, and we do not have to test each individual in the group.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Instead of directly choosing the number of groups, it is more convenient to make group size the choice variable. 
Choosing group size determines how many groups we have, since:<\/p>\n<p>&nbsp;<\/p>\n<p style=\"text-align: center\">[latex]\\text{Number of Groups} = \\frac{\\text{Population}}{\\text{Group Size}}[\/latex]<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">With our population of 1,000 people, a group size of 100 means we will have 10 groups. Intuitively, with an infection rate of 5%, we expect about 5 infected people in a group of 100, so the group is almost certain to test positive. We can make our intuition more convincing by computing the exact chances.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Begin by entering 100 in cell D3 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">Size<\/em><\/span> next to it in cell E3. Enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=1000\/D<\/em><em class=\"import-i\">3<\/em><\/span> in cell D4 and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Groups<\/em><\/span> in cell E4.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">An infection rate of 5% means each person has a 95% chance of not being infected. If there are 2 people (assuming the chances of infection are independent), then there is a 0.95 \u00d7 0.95 = 0.95<span style=\"border: none windowtext 0pt;padding: 0\"><sup class=\"import-sup\">2<\/sup><\/span> = 0.9025, or 90.25%, chance that neither is infected. 
This means there is a 100% \u2212 90.25% = 9.75% chance that at least 1 of the 2 people is infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">What are the chances that at least 1 person is infected in a group of 100 people? Remember, if even 1 person is infected in a group, we have to test everyone in the group to find out who is infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell D6, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=(1-InfectionRate)^D3<\/em><\/span> and label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">prob no one in the group infected<\/em><\/span> in cell E6. 
Format D6 as a percentage so that it displays 0.59%.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we compute 100% minus the probability that no one in the group is infected to find the probability that at least 1 person is infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell D7, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=1-D6<\/em><\/span> and label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">prob at least one in the group infected<\/em><\/span> in cell E7. Format D7 as a percentage (if needed).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">With an infection rate of 5%, doing pooled testing with a group size of 100 is wasteful. After all, each group is overwhelmingly likely (over 99%) to test positive. If all 10 groups test positive, we have to test everyone and end up doing 1,010 tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Can we make our spreadsheet show how many people are infected in each group and confirm the computations we just made? We can, but the approach we adopt uses a function that may be unfamiliar and advanced\u2014the OFFSET reference function. 
Thus, we proceed slowly.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell G1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=OFFSET(E1,3,0)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell G1 displays <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Number of Groups<\/em><\/span>, the label in cell E4, because the OFFSET function started at cell E1 (the first argument in the function), then moved three rows down (the second argument). The third argument is 0, so it stayed in column E. Positive movement arguments move us down or right; negative ones move us up or left.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(D1,0,0,3,1))<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Why does G1 display D1 plus 100? The two zeroes mean it did not move from the reference cell D1, but the fourth and fifth arguments control the height and width, respectively, of the cell range. Therefore, the formula says to add up the values in cells D1, D2 (which is blank), and D3 (100).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We want to add up the values in column C into 10 separate groups of 100 each. We can modify our OFFSET function to do the first group of 100.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(C1,0,0,100,1))<\/em><\/span>. To be clear, change the D1 to C1 and the 3 to 100.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The value reported in cell G1 is the number of infected people among the first 100 people in the population. 
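<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The way SUM and OFFSET work together in a single column can be mimicked with list slicing in Python. The sketch below is only an analogy: offset_sum is a hypothetical helper written for this example, its rows_down and height arguments correspond to OFFSET\u2019s second and fourth arguments, and the value 52 standing in for cell D1 is made up.<\/p>

```python
def offset_sum(column, rows_down, height):
    # Analogue of SUM(OFFSET(top_cell, rows_down, 0, height, 1)):
    # skip rows_down entries from the top of the column, then add up
    # the next height entries.
    return sum(column[rows_down:rows_down + height])


if __name__ == '__main__':
    # Stand-ins for D1 (an illustrative count of 52), blank D2, and D3 (100).
    column_d = [52, 0, 100]
    print(offset_sum(column_d, 0, 3))  # D1 + D2 + D3, which is 152 here
```

<p class=\"import-p\" style=\"text-indent: 36pt\">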
How can we get the second group of 100 people?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">C<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">1,0,0,100,1)<\/em><em class=\"import-i\">)<\/em><\/span> and fill it down to cell G2.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Adding the dollar signs made C1 an absolute reference, so we kept our C1 starting point in cell G2, but we need to change the formula so it adds up the number of infected people in the second set of 100. We do that by changing the second argument because it controls how many rows to move from the reference cell.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET(<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">C<\/em><em class=\"import-i\">$<\/em><em class=\"import-i\">1,100,0,100,1))<\/em><\/span>.<\/p>\n<div class=\"textbox\">\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP\u00a0<\/em><\/strong><\/span>Cell G2 reports how many people are infected in the second group of 100. We could fill down eight more cells and then change the second argument manually to 200, 300, and so on, but this is poor spreadsheet practice. <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Neve<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">manuall<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">repea<\/em><em class=\"import-i\">t<\/em><\/span> the same entry or an ordered sequence (e.g., numbers or dates). 
In addition, you want to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">maximiz<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">fl<\/em><em class=\"import-i\">exibility<\/em><\/span>. Hard-coding numbers, like 100, in formulas is poor practice because you might want to change that number in the future.<\/p>\n<\/div>\n<p class=\"import-p\" style=\"text-indent: 36pt\">In this case, we want our groupings to respond to changes in cell D3. If, for example, we have a group size of 50, we would then have 20 groups. We want the spreadsheet to automatically show how many people are infected in each of the 20 groups.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This task requires that we modify the second and fourth arguments. The fourth argument is the group size, which is simply cell D3. The second argument is more complicated. It is 0 for the first group, then increases by D3 for each group. One way to do this is to use the ROW function, which returns the row number of a cell.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Replace the formula in cell G2 with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=ROW(D6)<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell G2 displays 6, the row number of cell D6. What happens if the ROW function does not have an argument?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=ROW(<\/em><em class=\"import-i\">)<\/em><\/span> and fill it down to G10.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Without an argument, the ROW function returns the row number of the cell that contains ROW() in the formula. 
We can use this to create a series that starts at 0 and increases by the amount in cell D3.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=(ROW()-1)*$D$3<\/em><\/span> and fill it down to G10.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We can use our ROW function strategy in the OFFSET function\u2019s second argument to create a formula that gives us the number of infected people for any group size from 2 to 500 entered in D3. A \u201cgroup\u201d of 1 is simply individual testing, and with 1,000 people, a group size of 2 yields 500 groups. Choosing a group size of 500 gives us 2 groups.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We start with G1 (notice that ROW()-1 is zero for G1 so the second argument evaluates to zero) and fill down to G500 (since 500 is the maximum number of groups we can have).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the formula in cell G1 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(OFFSET($C$1,(ROW()-1)*$D$3,0,$D$3,1))<\/em><\/span> and fill it down to G500.<\/p>\n<p 
class=\"import-p\" style=\"text-indent: 36pt\">Cells G1 to G10 now display the number of infected people in each of the 10 groups of 100 people.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click the letter C in column C to select the entire column, and then click the <span class=\"import-ccust1\">Forma<\/span><span class=\"import-ccust1\">t <\/span><span class=\"import-ccust1\">Painte<\/span><span class=\"import-ccust1\">r<\/span> button (in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Hom<\/em><em class=\"import-i\">e<\/em><\/span> tab in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ribbon<\/em><\/span>, or top menu). Now click the letter G in column G.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You applied the formatting in column C, including your conditional formatting to highlight the infected people, to column G. It (probably) shows all the groups highlighted, but it will soon come in handy when we lower the group size so that some groups have no infected people.<\/p>\n<div class=\"textbox\">\n<p class=\"import-bxt\" style=\"padding-left: 40px\"><span style=\"color: #006838\"><strong><em>EXCEL TIP <\/em><\/strong><\/span>It is good practice to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">includ<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">check<\/em><em class=\"import-i\">s<\/em><\/span> in your spreadsheets. 
In this case, an easy check is to see if the sum of infected people in the 10 groups equals the total number of infected people in the population in column C.<\/p>\n<\/div>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=SUM(G1:G500<\/em><em class=\"import-i\">)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">check<\/em><\/span> in cell I1, then recalculate the sheet a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is easy to see that cells D1 and H1 are the same. If not, something is wrong, and you will have to go back to each step to find and fix the mistake.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change cell D3 to 200 and recalculate the sheet a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now only five cells in column G have nonzero values, representing the number of infected people in each of the five groups.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Group sizes of 100 and 200 are way too big to be the optimal size because we are extremely unlikely to get a group where everyone tests negative, so we almost always have to test everyone in the group. We need to try much smaller group sizes.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change cell D3 to 20 and recalculate the sheet a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Now we are really getting somewhere. Column G is showing the number of infected people in each of 50 groups. You can see values of 0, 1, 2, 3, and less frequently, higher numbers. 
We love to see zeroes because they mean we do not have to test anyone in that group individually, so we saved 20 tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">How many tests will we have to run in total? The COUNTIF function allows us to count the number of cells in a range that meet a specific condition.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=COUNTIF(G1:G500,&quot;&gt;0&quot;)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">numbe<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">group<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">t<\/em><em class=\"import-i\">o <\/em><em class=\"import-i\">test<\/em><\/span> in I2.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The COUNTIF function reports the number of cells in the range G1:G500 that have a value greater than zero. If we multiply this by the group size, we know how many individual tests we have to run. 
This is added to the number of group tests to give us our total number of tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell H3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=H2*D<\/em><em class=\"import-i\">3<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">test<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">fro<\/em><em class=\"import-i\">m <\/em><em class=\"import-i\">infecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">groups<\/em><\/span> in I3. In cell H4, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=D4+H<\/em><em class=\"import-i\">3<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">tests<\/em><\/span> in I4.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice that once again, we did not hard-code numbers (like 20 for group size) into the formula. We want our spreadsheet to respond to changes in group size (cell D3) automatically.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell H4 is certainly giving us good news. \u201cTotal tests\u201d is a random variable that is almost certainly less than 1,000. You are likely to see numbers around 690 tests, give or take 70 or so. This is about a 30% decrease in the number of tests from the 1,000 required by individual testing.<\/p>\n<h3 class=\"import-bh\">Finding the Optimal Group Size<\/h3>\n<p class=\"import-paft\">In our spreadsheet, we have implemented a stochastic (or chance) process of getting infected and demonstrated the power of pooled testing. 
Grouping allows us to save on testing because when groups have no infected people, we do not have to test those individual samples.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our spreadsheet shows that a group size of 20 is better than individual testing, but we do not want to do merely better than 1,000 tests. We want to perform the fewest number of tests. Our fundamental question is, <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Wha<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">i<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">optima<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">size<\/em><em class=\"import-i\">?<\/em><\/span><\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is a complication that we have to confront to answer our question: \u201cTotal tests\u201d is a random variable. We cannot just look at a single outcome because we know there is chance involved. Suppose two dice are on a table, each showing 1, and I asked you to guess the sum of the next roll. You would not guess 2 because you know that is really unlikely.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We deal with the fact that \u201cTotal tests\u201d is a random variable by focusing on the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">expecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">valu<\/em><em class=\"import-i\">e<\/em><\/span> of total tests. This is what we would typically observe. The best guess for the sum of two dice rolls is 7, the expected value. 
We need to find the expected value of total tests for a given group size so we can figure out which group size minimizes it.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There are mathematical rules for computing the expected value, but we will use Monte Carlo simulation. This approach is based on the idea that we can simply run the chance process (throwing two dice or hitting <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>) many times and directly examine the results. We can compute the average of many repetitions (like rolling dice many times) to give us an approximation to the expected value.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">So we seek the group size that minimizes the expected value of total tests, which we will approximate by simulation. We will run many repetitions (recalculating the sheet repeatedly) and keep track of the total number of tests to see how many total tests we can expect to run as we vary the group size.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">While there are many simulation add-ins available for Excel, we can easily run a simulation using Excel\u2019s <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Tabl<\/em><em class=\"import-i\">e<\/em><\/span> tool. It was designed not to run a simulation but to display multiple outcomes as inputs vary. To do this, it recalculates the sheet, which enables us to perform Monte Carlo analysis.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell L1, enter the number 1, and enter 2 in cell L2. 
Select both cells and fill down to row 400 so that you have a series from 1 to 400 in column L.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Next, we provide the cell that we wish to track: total tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell M1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=H4<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We are now ready to create the Data Table.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select the cell range L1:M400, click the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a<\/em><\/span> tab in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ribbon<\/em><\/span>, click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">What-I<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Analysi<\/em><em class=\"import-i\">s<\/em><\/span> in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Forecas<\/em><em class=\"import-i\">t<\/em><\/span> group, and select <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Table<\/em><em class=\"import-i\">\u00a0.\u00a0.\u00a0. 
<\/em><\/span>A keyboard shortcut is <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Alt<\/em><em class=\"import-i\">-a-w-t<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel pops up the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Tabl<\/em><em class=\"import-i\">e<\/em><\/span> input box.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">colum<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">inpu<\/em><em class=\"import-i\">t <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> field, click on cell K1, and click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">OK<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Clicking on an empty cell would be meaningless if we were using the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Tabl<\/em><em class=\"import-i\">e<\/em><\/span> tool for its intended purpose, which is to show how an input cell affects a formula in another cell. All we want, however, is for Excel to recalculate the sheet and show us the total tests for that newly recalculated population in column C.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The display in column M shows 400 repetitions of hitting <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> and keeping track of total tests. 
This is exactly what we want because now we can take the average of the total tests\u2019 values to approximate the expected number of total tests when the group size is 20.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">But before we do this, let\u2019s be clear about what a Data Table is actually doing. Be aware in the next step, however, that if you double-click on a cell in column M or click in the formula bar, you might get trapped in a cell. <span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-b\">I<\/strong><strong class=\"import-b\">f <\/strong><strong class=\"import-b\">yo<\/strong><strong class=\"import-b\">u <\/strong><strong class=\"import-b\">ge<\/strong><strong class=\"import-b\">t <\/strong><strong class=\"import-b\">stuck<\/strong><strong class=\"import-b\">, <\/strong><strong class=\"import-b\">pres<\/strong><strong class=\"import-b\">s <\/strong><strong class=\"import-b\">th<\/strong><strong class=\"import-b\">e <\/strong><\/span><span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-bi\"><em>Esc<\/em><\/strong><\/span> <span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-b\">(escape<\/strong><strong class=\"import-b\">) <\/strong><strong class=\"import-b\">ke<\/strong><strong class=\"import-b\">y <\/strong><\/span><span style=\"border: none windowtext 0pt;padding: 0\"><strong class=\"import-b\">t<\/strong><strong class=\"import-b\">o <\/strong><strong class=\"import-b\">ge<\/strong><strong class=\"import-b\">t <\/strong><strong class=\"import-b\">out<\/strong><strong class=\"import-b\">.<\/strong><\/span><\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Click on a few cells from M2 to M400 to see that they have an <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">arra<\/em><em class=\"import-i\">y <\/em><em class=\"import-i\">formula<\/em><\/span>: { 
<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=TABLE(,K1)<\/em><\/span>}.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel has a friendly front end via <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Data<\/em><em class=\"import-i\">: <\/em><em class=\"import-i\">What-I<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">Analysis<\/em><em class=\"import-i\">: <\/em><em class=\"import-i\">Dat<\/em><em class=\"import-i\">a <\/em><em class=\"import-i\">Table<\/em><em class=\"import-i\">\u00a0.\u00a0.\u00a0. <\/em><\/span>to create an array formula (indicated by the curly brackets, {}) that can display multiple outputs. You cannot change or delete an individual cell in the range M2:M400. They are, in a sense, a single unit sharing the same formula.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You might also notice that the sheet is much slower as we enter formulas or press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>. This is due to the Data Table. Excel now has many more cells to recalculate and evaluate. We could do many more repetitions (usually simulations have tens of thousands of repetitions), but the delay in recalculation is not worth it. 
With 400 repetitions, the approximation is good enough for our purposes.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N1, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=AVERAGE(M1:M400<\/em><em class=\"import-i\">)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">approximat<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">expecte<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">valu<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">tests<\/em><\/span> in cell O1.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell N1 is our simulation\u2019s approximation to what we want to minimize. It gives us a handle on the center of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">samplin<\/em><em class=\"import-i\">g <\/em><em class=\"import-i\">distributio<\/em><em class=\"import-i\">n<\/em><\/span> of the statistic \u201cTotal tests.\u201d A statistic is a recipe for what to do with observations (in this case, given by the formula in cell H4). If we make a <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">histogra<\/em><em class=\"import-i\">m<\/em><\/span> of the data in column M, we get an approximation to the sampling distribution of total tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Excel 2016 or greater is needed to make the histogram chart. This is not the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Histogra<\/em><em class=\"import-i\">m<\/em><\/span> option in the Data Analysis add-in (from the Analysis Tool-Pak). 
The histogram chart allows for dynamic updating and is a marked improvement over the histogram in the Data Analysis add-in.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Select cell range M1:M400, click the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Inser<\/em><em class=\"import-i\">t<\/em><\/span> tab in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ribbon<\/em><\/span>, and select <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Histogra<\/em><em class=\"import-i\">m<\/em><\/span> from the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Chart<\/em><em class=\"import-i\">s<\/em><\/span> group. It is in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Statisti<\/em><em class=\"import-i\">c<\/em><\/span> chart group and, of course, in the collection of all charts (available by clicking the bottom-right corner square in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Chart<\/em><em class=\"import-i\">s<\/em><\/span> group).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The default bin widths are a little too big, but they are easy to adjust.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Double-click the chart\u2019s <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">x<\/em><\/span>-axis, and in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Axi<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Options<\/em><\/span>, set the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Bi<\/em><em class=\"import-i\">n <\/em><em class=\"import-i\">Widt<\/em><em class=\"import-i\">h<\/em><\/span> to 20. 
Make the title \u201cApproximate Sampling Distribution of Total Tests.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The chart is an approximation because it is based on only 400 repetitions. The exact sampling distribution of total tests would require an infinite number of repetitions. We can never get the exact sampling distribution or the exact expected value via simulation, but the more repetitions we do, the better the approximation.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Even with just 400 realizations of total tests, the graph looks a lot like the classic bell-shaped distribution of the normal (or Gaussian) curve. The center is the expected value of the total tests we will have with a group size of 20, and the dispersion in total tests is measured by its <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">standar<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">error<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=STDEV.P(M1:M400<\/em><em class=\"import-i\">)<\/em><\/span> and the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">approximat<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">standar<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">erro<\/em><em class=\"import-i\">r <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">tests<\/em><\/span> in cell O2.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The standard error of total tests tells us the variability in total tests. 
It is a measure of the size of the typical bounce in total tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span> a few times and watch cell H4.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Cell H4 is bouncing. It is centered around 690 and jumps by roughly plus or minus 70 total tests as you hit <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>. You can also scroll up and down column M to see that the \u201cTotal tests\u201d numbers are around 690 \u00b1 70.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation cannot give us the exact standard error, but the standard deviation of our 400 realizations of total tests is a good approximation of the standard error.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There are many ways to be confused here. One of them is to fixate on the computation of the standard deviation. We used STDEV.P (for population) instead of STDEV.S (for sample) because we are not using the standard deviation to estimate the population standard deviation, so we do not need to make a correction for degrees of freedom. Although the population standard deviation is correct, this makes almost no difference with 400 numbers.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell N3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=STDEV.S(M1:M400<\/em><em class=\"import-i\">)<\/em><\/span> and compare the result to cell N2.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The emphasis on population versus sample standard deviation in many Statistics courses is only relevant for small sample sizes\u2014say, fewer than 30 observations. 
As the number of observations rises, the two grow ever closer.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">To summarize, cells N1 and N2 tell us that we can expect to perform about 690 total tests, give or take roughly 70 tests. These are the numbers reported at the end of the previous section. This is for a group size of 20. Can we do better? We get to choose the group size, so we should explore how the expected number of total tests responds as we vary the group size.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the group size (in cell D3) to 10 and press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times. Which specific cell should you focus on, and what do you conclude?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The cell we care about the most is cell N1 because it tells us (approximately) the expected number of tests we will have to run. Cell N1 is reporting good news. We can expect to perform about 500 \u00b1 50 total tests. That beats the group size of 20 by almost 200 tests, on average, and is a large savings of a half versus 1,000 individual tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Why does a group size of 10 do better than 20? In various cells of the spreadsheet, there is evidence of what is happening. Lowering the group size from 20 to 10 increased the number of group tests from 50 to 100 (see cell D4), but the number of infected groups only went up a little bit (from roughly 32 to 40), and the groups are now much smaller. 
This is where the big savings are\u2014instead of 32 \u00d7 20 = 640 tests, we only have to run, on average, 40 \u00d7 10 = 400 tests with a group size of 10.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Confirm the claims about group size in the paragraph above by switching back and forth from 10 to 20 in cell D3. Notice how the other cells (especially H4) and the chart react to D3.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Spreadsheets are powerful because they can display a lot of information and dynamically update when you make changes. Your job is to make comparisons and process the information.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Can we do even better than a group size of 10?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change the group size (in cell D3) to 5.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Amazing! The number in cell N1 fell again. The expected number of total tests is now about 425 (426.22 is a more exact answer, found by analytical methods), give or take roughly 30 tests. That is a gain of almost 60% versus individual testing. Pooled testing saves a lot of tests compared to individual testing.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As before, the number of groups we have to test has risen (this time to 200), but many groups are found to be uninfected. Cell D6 reports a 77.4% chance that no one in a 5-person group will be infected. Thus, even though we test more groups, we more than make up for this because many groups test negative, saving us the need to test 5 people in the group.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The group size of 5 is, in fact, the optimal solution and answer to our question. 
Figure 3.10 reveals that we traveled down the expected number of total tests curve as we changed the group size from 20 to 10 and finally 5.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Figure 3.10 makes it easy to see that a group size of 5 is the minimum for the expected number of total tests curve, but it also reveals the trade-off involved. The two curves are added up to produce the top, total curve. At 10, we test 100 groups (the bottom curve) and we add that to 400 (the expected number of tests from positive groups), and this gives us 500 (the top curve).<\/p>\n<figure style=\"width: 856px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.10%3A%20Line,5.0%25%20infection%20rate.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p73-1.png\" alt=\"Plot of the optimal solution for the pooled testing optimization problem. Long description linked from image.\" width=\"856\" height=\"515\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.10: The expected number of tests as a function of group size.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">When we moved from 10 to 5, we added 100 tests (the bottom curve) but saved about 175 tests (the middle curve), lowering our expected total tests from 500 to 425. We cannot do any better than 425 total tests. Further reductions in group size will increase total tests.<\/p>\n<h3 class=\"import-bh\">Comparative Statics Analysis<\/h3>\n<p class=\"import-paft\">We can ask another question that again shows off the power of spreadsheets: What happens if the infection rate changes\u2014say, to 1%? 
What would be the optimal group size?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">This kind of question is called <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">comparativ<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">static<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">analysi<\/em><em class=\"import-i\">s<\/em><\/span> because we want to know how our solution responds to a shock. We want to compare our initial optimal group size of 5, when the infection rate was 5%, with the new solution when the infection rate is 1%. This comparison reveals how the shock (changing the infection rate) affects the optimal response (group size).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Change cell A1 to 1%, then use the spreadsheet to find the optimal group size. What group size would you recommend? Why?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You may have struggled with this because it turns out that the total tests curve is rather flat at its minimum. Thus, a simulation with 400 repetitions does not have the resolution to distinguish between group sizes in the range from 8 to 14 or so. Figure 3.11 makes this clear.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The exact answer for the optimal group size is, in fact, 11 people per group. It has an expected number of total tests of 195.57 (again, using analytical methods). Choosing group sizes of 10 or 12 leads to a slightly higher number of total tests\u2014although it is impossible to see this in Figure 3.11.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">An infection rate of 1% shows simulation may not be an effective solution strategy for every problem. 
Of course, you could create a Data Table with more repetitions, but using simulation to distinguish between group sizes of 10 and 11 requires a Data Table so large that Excel would be unresponsive.<\/p>\n<figure style=\"width: 856px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.11%3A%20Line,1.0%25%20infection%20rate.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p74-1.png\" alt=\"Plot of the optimal solution with lower infection rate. Long description linked from image.\" width=\"856\" height=\"513\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.11: Optimal group size with a 1% infection rate.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">As mentioned earlier, there are many Excel Monte Carlo simulation add-ins, and they can do millions of repetitions. We used MCSim in earlier work in section 3.2. Even if we ran enough repetitions to see that 11 is the optimal solution, you should remember that simulation will never give you an exact result because it can never do an infinite number of repetitions.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The good news is that any group size around 10 requires a little under 200 total tests on average, which is about an 80% improvement over individual testing. 
There is no doubt about it\u2014pooled testing can be a smart, effective way to reduce the number of total tests performed.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our comparative statics analysis tells us that the lower the infection rate (from 5% to 1%), the bigger the optimal group size (from 5 to 11) and the greater the savings from pooled testing versus individual testing (from about 575 to 800 tests).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Finally, if you carefully compare Figures 3.10 and 3.11, you will see that the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">#Group<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Teste<\/em><em class=\"import-i\">d<\/em><\/span> curve (a rectangular hyperbola, since the numerator is constant at 1,000) stays the same in both graphs. Lowering the infection rate shifts down the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">E[#Po<\/em><em class=\"import-i\">s <\/em><em class=\"import-i\">Grou<\/em><em class=\"import-i\">p <\/em><em class=\"import-i\">Tests<\/em><em class=\"import-i\">]<\/em><\/span> relationship, and this brings down and alters the shape of the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">E[#Tota<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">Tests<\/em><em class=\"import-i\">]<\/em><\/span> curve.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Comparative statics analysis shows that pooled testing is more effective when the infection rate falls. A lower infection rate means we can have bigger groups, yet they may still have no infected individuals in them.<\/p>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Takeaways<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"import-paft\">Pooled testing means you combine individual samples. 
A negative test of the pooled sample saves on testing because you know all the individuals in the group are not infected.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There is an optimization problem here: Too big a group size means someone will be infected, so you have to test everyone in the group, but too small a group size means too many group tests. The sweet spot minimizes the total number of tests.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The optimal group size depends on the infection rate. The smaller the rate, the bigger the optimal group size.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">By creating this spreadsheet, you have improved your Excel skills and confidence in using spreadsheets. You have added to your stock of knowledge that will help you next time you work with a spreadsheet.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The OFFSET function is really powerful, but it is difficult to understand and apply.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The Data Table is meant for what-if analysis, but it can be used as a simple Monte Carlo simulation tool. Each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> recalculates the sheet and the Data Table.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You also learned or reinforced a great deal of statistical and economics concepts. Economics has a toolkit that gets used over and over again\u2014look for similar concepts in future models and courses. Try to spot the patterns and repeated logic. Although it may not be explicitly stated, getting you to think like an economist is a fundamental goal of almost every Econ course.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Reference was made several times to the analytical solution. This was not shown because the math is somewhat advanced. 
To see it, download the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">PooledTesting.xlsx<\/em><\/span> file from <a href=\"http:\/\/dub.sh\/gbae\">dub.sh\/gbae<\/a> and go to the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Analytical<\/em><\/span> sheet.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">One methodology issue that is easy to forget but crucial is that we made many simplifying assumptions in our implementation of the data generation process. There may be other factors at play in the spread of COVID-19 or how tests actually work that affect the efficacy of pooling. Spatial connection was mentioned as something that would violate the independence assumed in our implementation. Another complication is that \u201ca positive specimen can only get diluted so much before the coronavirus becomes undetectable. That means pooling will miss some people who harbor very low amounts of the virus\u201d (Wu, 2020).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our results apply to an imaginary, perfect world, not the real world. We need to be careful in moving from theory to reality. This requires both art and science.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The introduction cited Robert Dorfman as writing a paper on pooled testing back in 1943. It is a clever idea that you now understand can be used to greatly reduce the number of tests, which saves a lot of resources. Perhaps you will not be surprised to hear that Robert Dorfman was an economist.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">References<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"hanging-indent\">Dorfman, R. (1943). \u201cThe Detection of Defective Members of Large Populations.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Ann<\/em><em class=\"import-i\">. 
<\/em><em class=\"import-i\">Math<\/em><em class=\"import-i\">. <\/em><em class=\"import-i\">Statist<\/em><em class=\"import-i\">.<\/em><\/span> 14, no. 4, pp. 436\u2013440, <a href=\"http:\/\/projecteuclid.org\/euclid.aoms\/1177731363\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">projecteuc<\/span><span class=\"import-url\">lid.org\/<\/span><span class=\"import-url\">euclid.aoms<\/span><span class=\"import-url\">\/1177731363<\/span><\/span><\/a>.<\/p>\n<p class=\"hanging-indent\">Mandavilli, A. (2020). \u201cFederal Officials Turn to a New Testing Strategy as Infections Surge.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Ne<\/em><em class=\"import-i\">w <\/em><em class=\"import-i\">Yor<\/em><em class=\"import-i\">k <\/em><em class=\"import-i\">Times<\/em><\/span>, July 1, 2020, <a href=\"http:\/\/www.nytimes.com\/2020\/07\/01\/health\/coronavirus-pooled-testing.html\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.nytimes.com\/2020\/07\/01\/health\/<\/span><span class=\"import-url\">coronavirus-pooled-testing.html<\/span><\/span><\/a>.<\/p>\n<p class=\"hanging-indent\">Wu, K. (2020). 
\u201cWhy Pooled Testing for the Coronavirus Isn\u2019t Working in America.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Th<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">Ne<\/em><em class=\"import-i\">w <\/em><em class=\"import-i\">Yor<\/em><em class=\"import-i\">k <\/em><em class=\"import-i\">Times<\/em><\/span>, August 18, 2020, <a href=\"http:\/\/www.nytimes.com\/2020\/08\/18\/health\/coronavirus-pool-testing.html\"><span style=\"border: none windowtext 0pt;padding: 0\"><span class=\"import-url\">www.nytimes.com\/2020\/08\/18\/health\/coronavirus-pool-<\/span><span class=\"import-url\">testing.html<\/span><\/span><\/a>.<\/p>\n<\/div>\n<\/div>\n<h2 class=\"import-ah\">3.4 Search Theory Simulation<\/h2>\n<p class=\"import-paft\">You want to buy something that many stores sell, but they charge different prices. Suppose that you cannot just google it to find the lowest price. Maybe you are at a huge farmer&#8217;s market, and there are lots of vendors selling green beans. They are all the same, but the prices are different. How do you decide where to buy?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Believe it or not, this problem has been extensively studied. It is part of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">searc<\/em><em class=\"import-i\">h <\/em><em class=\"import-i\">theor<\/em><em class=\"import-i\">y<\/em><\/span> and has produced several Nobel Prize winners in Economics. It also has a long history in mathematics, where it is known as an optimal stopping problem.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">There are many different search scenarios and models. For example, you could be deciding which job to take. Once you pass on an offer, you cannot go back (this is called <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">sequentia<\/em><em class=\"import-i\">l <\/em><em class=\"import-i\">search<\/em><\/span>). 
Or you could be involved in some complicated game with asymmetric information, where one agent\u2014say, the seller of a house\u2014has more knowledge about the house than the potential buyers.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Fortunately, your green bean search problem is straightforward. The green beans are <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">homogeneou<\/em><em class=\"import-i\">s<\/em><\/span> (exactly alike), and you can gather as many prices as you want, then choose the cheapest one. The catch is that it is costly to search\u2014<span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">searc<\/em><em class=\"import-i\">h<\/em><\/span> is just another way of saying \u201cgather prices,\u201d but there are search costs.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">If searches were costless, then the problem would be trivial\u2014simply get all the prices and buy the cheapest one. The problem becomes interesting when collecting price information takes effort and time. In that case, you can search too little (so you would have found a much lower price with more searching) or search too much (so the slightly lower price you found was not worth it). You are facing an optimization problem!<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We will set up and solve this optimization problem in Excel. We will use the Monte Carlo simulation add-in to explore how our total cost changes as we vary the amount we search. 
We will then do comparative statics analysis to see how the optimal solution responds when we shock the model.<\/p>\n<h3 class=\"import-bh\">Setting Up the Problem<\/h3>\n<p class=\"import-paft\">First, we create a population of prices.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Open a blank Excel workbook and name it <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Search.xlsx<\/em><\/span>. Name the sheet <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> (for data generation process). In cell A1, enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Price<\/em><\/span>. In cell A2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=RAND()<\/em><\/span>. Fill down to cell A101.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You now have 100 prices on your spreadsheet ranging from zero to one. Our target is the lowest price. We can easily find it with the MIN function.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Enter the label <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Minimum<\/em><\/span> in cell B1 and the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A101<\/em><em class=\"import-i\">)<\/em><\/span> in cell B2. Scroll down until you find that lowest price. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> to get a new set of prices and a new minimum. 
Each time you press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>, it is like a new day at the farmer&#8217;s market and the vendors have all changed their prices.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The cheapest price is close to zero (since RAND goes from zero to one), and it can be anywhere in the list of 100 numbers. As a buyer, you will not, however, do what you just did and simply enter a formula that yields the minimum price, because we assume that you cannot see the prices until you visit the store. You have to search to reveal each price.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Let\u2019s suppose that each search will cost you 0.04 monetary units. This is an exogenous variable. The 100 prices are also outside of your control. Your endogenous, or choice, variable is how many prices to reveal.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your goal is to minimize the total cost of purchase, composed of the price you pay plus the search costs. The more you search, the lower the price you pay the vendor but the higher the costs of the search. You have to balance these two opposing forces.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your spreadsheet is like a card game. Pressing <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> is like shuffling 100 cards. You want the lowest-numbered card in the deck. A search is like flipping a card over, but it costs you 0.04 for each card you reveal. What is the best number of cards to flip over? To answer this key question, we proceed slowly.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Suppose you decide to search just once. This has the advantage of the lowest search costs possible but the disadvantage that you will only get one price. 
How will you do if you adopt this strategy?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell C1, enter a 1 (this represents how many prices you gather), and in cell C2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A2<\/em><\/span>. In cell C3, enter 0.04 (this is the cost of your single search). In cell C4, we add the two cells above it together, so enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=C2+C3<\/em><\/span>. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Each time you press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F9<\/em><\/span>, you get a new price in cell C2 (because the prices all change) and a new total cost in cell C4. Sometimes you do pretty well, close to 0, but sometimes you end up near 1, which is not good. But how can we know how you will usually do? How you do on average, not just in a single outcome, is how we evaluate the results of chance processes.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Monte Carlo simulation can tell us the typical result. We will use the MCSim add-in to run our simulation in Excel. 
If needed, download and install MCSim from <a href=\"http:\/\/dub.sh\/addins\">dub.sh\/addins<\/a>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation that tracks cell C4 with 10,000 repetitions by clicking <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Add-in<\/em><em class=\"import-i\">s<\/em><\/span> tab, entering C4 in the <a id=\"_Hlk191054204\"><\/a><span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> input box, adding a 0 to the default 1,000 repetitions, and clicking <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results show an average around 0.54. This is what you can expect to usually pay, in total, for your green beans. This makes sense, since the average of RAND is 0.5 and you have to pay 0.04 for one search. Notice that the simulation values are not normally distributed, with a bell shape. Instead, you are equally likely to do really well (low total cost), badly (around 1), or somewhere in the middle (around 0.5).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The expected value of 0.54 is the number we use to convey the performance of the search-and-buy-at-one-store strategy. Of course, it does not matter if you pick the first store (in cell A2). 
You could pick any one of the 100 stores and get the same simulation results because each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> puts up new random prices for all the stores, just like reshuffling a deck of cards.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>It is easy to confirm this by changing the formula in cell C2 to <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A20<\/em><\/span> or <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A54<\/em><\/span> or any other cell from A3 to A101 and tracking cell C4 in a new simulation. Your results are substantially (but not exactly) the same as the simulation with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=A<\/em><em class=\"import-i\">2<\/em><\/span> in cell C2.<\/p>\n<h3 class=\"import-bh\">Finding the Optimal Number of Searches<\/h3>\n<p class=\"import-paft\">What happens to your total cost of buying green beans if we search more than once? Let\u2019s try 5 searches.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell D1, enter a 5 (this represents gathering prices from 5 vendors), and in cell D2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A6)<\/em><\/span>. This formula shows the lowest price in your sample from 5 stores, which is the one we would buy. As mentioned, you could pick any set of 5 stores, and you would get the same result. In cell D3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.04*<\/em><em class=\"import-i\">5<\/em><\/span> (0.2 is the cost of 5 searches). 
In cell D4, we add the two cells above it together, so enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=D2+D3<\/em><\/span>. Press <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> a few times.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Each press of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">F<\/em><em class=\"import-i\">9<\/em><\/span> gives a single outcome, or realization, of the chance process. Sometimes you get lucky and get a low price, other times not. Notice that the total cost (in cell D4) is the sum of the lowest price and 0.2 (the cost of searching).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Do you think 5 searches are better than 1? We cannot answer this question by looking at cells C4 and D4 because they show just one realization. We need to compare the typical result of these two strategies. We know the expected value of the total cost of 1 search is 0.54. What is the typical result of 5 searches?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Use the MCSim add-in to track cell D4. What do your results show?<\/p>\n<figure style=\"width: 671px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.12%3A%20Monte,as%20values%20increase.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p80-1.png\" alt=\"Monte Carlo simulation output page including summary statistics, notes, and histogram. 
Long description linked from image.\" width=\"671\" height=\"566\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.12: Total costs when gathering five prices.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results should be similar to Figure 3.12. These simulation results tell us that <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em> <\/span>= 5 is better than <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em> <\/span>= 1 because the typical result for 5 searches (approximated by the average of 10,000 repetitions) is around 0.37, which is much less than 0.54 (a roughly 30% decrease).<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Maybe more searches are even better? How do 10 searches compare to 5?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Run a simulation of 10 searches by setting up a 10-search scenario on the spreadsheet (in column E) and running a simulation. Try to figure it out first, but check the appendix, if needed, for more detailed help.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your work shows that <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 10 is much worse than <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 5. How about <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>In cell F1, enter a 4, and in cell F2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A5)<\/em><\/span>. 
In cell F3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.04*<\/em><em class=\"import-i\">4<\/em><\/span> (0.16 is the cost of 4 searches). In cell F4, we add the two cells above it together, so enter <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=F2+F3<\/em><\/span>. Use the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> add-in to directly compare cells D4 and F4 by putting D4 in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> input box and F4 in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Selec<\/em><em class=\"import-i\">t a <\/em><em class=\"import-i\">secon<\/em><em class=\"import-i\">d <\/em><em class=\"import-i\">cel<\/em><em class=\"import-i\">l<\/em><\/span> input box, then click <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Proceed<\/em><\/span>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Your results should show a close race. In fact, it is so close that we need to improve the resolution of the sim by increasing the number of repetitions.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Track cells D4 and F4 again, but this time with 100,000 repetitions. 
This will take 10 times longer than the last sim.<\/p>\n<figure style=\"width: 728px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.13%3A%20Monte,candidate%20group%20sizes.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p81-1.png\" alt=\"Histogram of $D$4 and DGP!$F$4. Long description linked from image.\" width=\"728\" height=\"568\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.13: Four searches are slightly better than five.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">It is still quite close, but as shown in Figure 3.13, you will get a slightly lower sim average of 0.361 or so with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4 than the sim average of about 0.367 with <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 5. In fact, it can be shown with analytical methods that <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4 is the optimal solution. To see the math involved, download <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Search.xls<\/em><em class=\"import-i\">x<\/em><\/span> from <a href=\"http:\/\/dub.sh\/gbae\">dub.sh\/gbae<\/a>.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Let\u2019s step back and think about what you have done. It took some work, but you used simulation to explore the U-shaped curve in Figure 3.14. It plots the exact expected value of the total cost as the number of searches increases from 1 to 10. 
The minimum, the answer to what you should do, is found at <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4.<\/p>\n<figure style=\"width: 923px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.14%3A%20Combined,of%20c%3D0.04.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p82-1.png\" alt=\"Table and Chart of Expected Value of Total Cost. Long description linked from image.\" width=\"923\" height=\"317\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.14: Total costs are U-shaped with a minimum at n = 4.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Notice that 1 and 10 searches both yield high total costs, but for different reasons. With <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 1, you only pay 0.04 to search, but your usual purchase price is around 0.5. By searching 10 times, you lower the purchase price a lot (you are likely to find a seller with a low price, typically around 0.091), but you have to pay 0.4 in search costs.<\/p>\n<h3 class=\"import-bh\">Comparative Statics<\/h3>\n<p class=\"import-paft\">An interesting shock to this model involves the cost of searching. What if something happened, like the internet, that lowered search costs? Instead of having to visit each store to find out the price, you can go to their web page and see the price. This makes searching much easier and cheaper. How would your search behavior respond to this shock?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Suppose the per-unit cost of searching fell from 0.04 to 0.01. 
What effect would that have on the optimal number of searches?<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\"><span style=\"color: #e60000\"><em class=\"import-hemb-i\">STEP<\/em> <\/span>Copy the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DG<\/em><em class=\"import-i\">P<\/em><\/span> sheet and rename it <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">DGPLowerCost<\/em><\/span>. Change row 3 in columns C to F to reflect the new <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0.01. Use the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">MCSi<\/em><em class=\"import-i\">m<\/em><\/span> add-in to find the new optimal number of searches. You can check your work (or get a few hints) using the discussion that follows, but try to do it yourself first.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The first thing to realize when search costs fall from 0.04 to 0.01 is that total costs are going to be lower for all search values. Instead of 0.54 for one search, the expected value of total costs is 0.51 when <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0.01. For <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 4, the expected total cost falls from 0.36 to 0.24. Notice that costs fall by more the more you search.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">If you actually tried to run simulations for different values of <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span>, you might be confused by how close the results ended up being. Because of this, simulation is going to have trouble finding the exact answer. 
Figure 3.15 explains what is going on.<\/p>\n<figure style=\"width: 923px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/back-matter\/alt-text-long-description\/#:~:text=Figure%203.15%3A%20Combined,optimal%20search%20theory.\"><img src=\"http:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-content\/uploads\/sites\/73\/2025\/05\/GatewayBA-p83-1.png\" alt=\"Table and Chart of Expected Value of Total Cost with two different numbers of searches. Long description linked from image.\" width=\"923\" height=\"434\" \/><\/a><figcaption class=\"wp-caption-text\"><strong>Figure 3.15: Comparative statics: Shocking the per-unit search cost.<\/strong><\/figcaption><\/figure>\n<p class=\"import-p\" style=\"text-indent: 36pt\">With <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0.01, the expected value of the total cost curve has a minimum at <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 9, so this is the exactly correct answer, but notice how flat the curve is around that minimum. If your answer was 8 or 10, you missed by only 0.001.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation struggles to get an exact answer because the total cost function is so shallow. You would have to run millions of repetitions to identify the exact minimum solution at <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> = 9.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">However, simulation does give you the correct answer in the sense that the number of searches goes up as the cost of searching falls. This key result makes sense, since you will take advantage of cheaper search costs by searching more.<\/p>\n<h3 class=\"import-bh\">Simulation Versus Analytical Methods<\/h3>\n<p class=\"import-paft\">Figures 3.14 and 3.15 show the exact expected value of the total cost. 
As mentioned earlier, if you are interested, you can download <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Search.xls<\/em><em class=\"import-i\">x<\/em><\/span> from <a href=\"http:\/\/dub.sh\/gbae\">dub.sh\/gbae<\/a> to see how these analytical results were derived.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You might wonder why we used simulation when analytical methods give us an exact answer. There are two reasons. First, by implementing the problem in Excel, we get a deep, clear understanding of the role of randomness in this problem. It is one thing to say that prices are random, but seeing them bounce on the screen really conveys the data generation process.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Simulation is often helpful in understanding a problem because it requires building a model that reflects core components of a real-world scenario. This often enables a richer, fuller grasp of the forces at play.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">The second reason for using simulation is that it gives us an independent method that confirms the analytical solution. The averages in Figures 3.12 and 3.13 are very close to their expected values. We can be much more confident that we have found the right answer when both methods agree. And if they do not agree, we are alerted to a potential error in one of the methods.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Neither approach is foolproof. Simulation\u2019s main drawback is that it cannot give an exact answer. In addition, sometimes so many repetitions are needed to obtain a clear result that it is impractical.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">But analytical methods using equations, algebra, and calculus are not perfect either. Sometimes, there is no way to derive the answer, and simulation is all we can do. Other times, the analytical method fails disastrously and gives us an incorrect answer. 
Simulation helps us avoid that trap.<\/p>\n<div class=\"textbox textbox--key-takeaways\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Takeaways<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"import-paft\">Economists believe in the <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">la<\/em><em class=\"import-i\">w <\/em><em class=\"import-i\">o<\/em><em class=\"import-i\">f <\/em><em class=\"import-i\">on<\/em><em class=\"import-i\">e <\/em><em class=\"import-i\">price<\/em><\/span>, the idea that competition makes prices converge. But this only applies in a frictionless world of perfect information. In our model, if <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> = 0, you simply get all the prices and pick the lowest one. In such a world, there would be no price dispersion, since everyone would go to the cheapest vendor, so all the sellers would have to match that price. The law of one price would hold.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">In the real world, there are all sorts of frictions. An important one is incomplete information, so buyers do not know all the prices (and qualities) of goods and services. The real world has many different prices (just think of all the prices you see at gas stations as you are driving down the road), and buyers have to search to find low prices. Economists say that search is price discovery, which emphasizes how searching is a productive activity.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Consumers face a search optimization problem. The more they search, the lower the price they are likely to pay, but they have to spend resources\u2014time and effort\u2014to search. You can definitely oversearch, which means that the gain from the lower price you found was not worth the extra cost of searching. 
On the other hand, you can also undersearch, which means that you saved on search costs but did not take advantage of the lower prices you would have found by searching more.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Consumers optimize by searching like Goldilocks: not too little and not too much but just right. The fact that buyers will not choose to find every price explains why price dispersion exists. This is a key result. As Stigler (1961) famously said, \u201cPrice dispersion is a manifestation\u2014and, indeed, it is the measure\u2014of ignorance in the market.\u201d<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">We also showed that lowering search costs would increase the optimal number of searches, and we can point out a few interesting real-world implications of this result. For example, not all consumers face the same search costs. Suppose you are in a hurry (perhaps you have an important deadline at work). Then your search costs are high, and it is optimal for you to search less. Different people in different situations have different optimal solutions.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Our comparative statics result that lower <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">c<\/em><\/span> leads to higher optimal <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">n<\/em><\/span> points to the fact that lower search costs reduce price dispersion. If the internet allows you to quickly scan gas stations in an area and go to the cheapest one, prices are going to come closer together. They will not all be exactly the same (as the law of one price says) because search is not free, but they will not be as spread out.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Noneconomists sometimes demonize advertising. They see consumers as dupes, easily fooled and tricked by ads into buying things they do not need or want. 
But search theory shows advertising in a different light. It is a way to lower search costs. Sellers are trying to be noticed in a noisy, chaotic marketplace, so they provide consumers with information about prices and product characteristics.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">Since we introduced the internet as a shock that lowered search costs, we close by pointing out that new online technologies have radically affected search theory. You know that every click is tracked, and the prices you see are personalized just for you. Optimal online searching is the subject of intense research today. Both buyers and sellers are faced with complicated, intertwined optimization problems.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">References<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"hanging-indent\">Stigler, G. (1961). \u201cThe Economics of Information.\u201d <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">Journal of Political Economy<\/em><\/span> 69, no. 3, pp. 213\u2013225, <span style=\"border: none windowtext 0pt;padding: 0\"><a class=\"rId73\" href=\"http:\/\/www.jstor.org\/stable\/1829263\"><span class=\"import-url\">www.jstor.org\/stable\/1829263<\/span><\/a><\/span>. This paper is recognized as the beginning of the economics of search. 
Stigler was recognized as the founder of information economics when he was awarded the Nobel Prize in Economics in 1982.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Appendix<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<p class=\"import-paft\">For 10 searches, repeat the same procedure as for 5 searches, slightly changing the formulas for the minimum price and the cost of searching to account for 10 searches instead of 5. It goes like this: In cell E1, enter a 10 (this represents gathering prices from 10 vendors), and in cell E2, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=MIN(A2:A11)<\/em><\/span>. This formula shows the lowest price in your sample from 10 stores, which is the price we would pay. In cell E3, enter the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=0.04*10<\/em><\/span> (0.4 is the cost of 10 searches). In cell E4, add the two cells above it together by entering the formula <span style=\"border: none windowtext 0pt;padding: 0\"><em class=\"import-i\">=E2+E3<\/em><\/span>. You are now ready to track cell E4 in a simulation to see the typical result for this search strategy.<\/p>\n<p class=\"import-p\" style=\"text-indent: 36pt\">You should find that 10 searches have an expected value of around 0.49. 
This is higher than 5 searches and therefore is clearly not an optimal solution.<\/p>\n<\/div>\n<\/div>\n","protected":false},"author":13,"menu_order":3,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":3,"_links":{"self":[{"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/chapters\/32"}],"collection":[{"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/wp\/v2\/users\/13"}],"version-history":[{"count":35,"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/chapters\/32\/revisions"}],"predecessor-version":[{"id":611,"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/chapters\/32\/revisions\/611"}],"part":[{"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/parts\/3"}],"metadata":[{"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/chapters\/32\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/wp\/v2\/media?parent=32"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/pressbooks\/v2\/chapter-type?post=32"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/wp\/v2\/contributor?post=32"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.palni.org\/gatewaytobusinessanalytics\/wp-json\/wp\/v2\/license?post=32"}],"curies":[{"name":"wp","href":"http
s:\/\/api.w.org\/{rel}","templated":true}]}}