tag:blogger.com,1999:blog-15324198057018363862021-10-15T22:04:08.042-07:00Random ProblemsHere are solutions to some random problemstheboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.comBlogger143125tag:blogger.com,1999:blog-1532419805701836386.post-48949351483822666052021-10-14T22:23:00.009-07:002021-10-14T22:24:34.482-07:00How Do I Determine My Raise Given Inflation?If you get a 10% raise, and inflation is 6%, did you actually get a raise?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6jD5n5ZJkcI/YWkQNZhT6uI/AAAAAAAAIcg/9Lz2R7kzYVAOH_8yFS2vf-04Hu66X068QCLcBGAsYHQ/s896/raise%2Bequation.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="157" data-original-width="896" height="auto" src="https://1.bp.blogspot.com/-6jD5n5ZJkcI/YWkQNZhT6uI/AAAAAAAAIcg/9Lz2R7kzYVAOH_8yFS2vf-04Hu66X068QCLcBGAsYHQ/s1600/raise%2Bequation.PNG" width="0%" /></a></div><a name='more'></a><br /><p></p><p>To get it out of the way, your actual raise is given by:<br /><br /></p><script async="" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script><style>.MathJax { font-size: 1.3em !important; }</style><div style="font-size: 24px;"> \[{\text{actual raise = }\frac{\text{new salary}}{\text{old salary * (1 + inflation rate)}}} - 1\]</div><br /><div>Where does this come from? It's maybe easiest to think of this in terms of units. Say you make $50,000 now, and you made $40,000 last year. You make 25% more right? Not exactly. What the $ there really represents is some purchasing power. Inflation is a drop in purchasing power, so what you really need to do is convert the $ before and after to the same unit. To determine the value of $ in the current year in terms of the $ in the previous year, you just divide it by 1 + inflation rate. 
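As a quick sanity check, the conversion is a couple of lines of Python (a minimal sketch; the salary numbers are just illustrative):

```python
# Convert this year's salary into last year's dollars by dividing by
# (1 + inflation rate), then compare against the old salary.
def actual_raise(new_salary, old_salary, inflation_rate):
    return new_salary / (old_salary * (1 + inflation_rate)) - 1

# 10% nominal raise with 6% inflation works out to roughly 3.8% in real terms
real_raise = actual_raise(110_000, 100_000, 0.06)  # ~0.038
```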
That gives you the equation above.</div><div><br /></div><div>Plugging in the numbers in the initial question then, the actual raise is:<br /><br /><br /></div><div style="font-size: 24px;"> \[{\frac{\text{new salary}}{\text{old salary * (1 + inflation rate)}}} - 1\]</div><br /><div style="font-size: 24px;"> \[{\frac{\text{old salary * (1 + 0.10)}}{\text{old salary * (1 + 0.06)}}} - 1\]</div><br /><div>Which is just 0.038, so the actual raise is 3.8%.</div><div><br /></div><div>It is very important to understand your raise in terms of local inflation. If you get a 5% raise but your area gets 10% more expensive, you actually got a paycut (4.5% paycut given those numbers).</div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-4511167942799153352021-10-02T22:32:00.003-07:002021-10-04T22:01:50.933-07:00Can You Confirm Performance Improvements With Noisy Software Benchmarks?Say you run 20 tests before and after a code change meant to speed up the code, but there's a lot of noise in your benchmarks. 
Some simple statistical tests can help you determine if you actually have an improvement in that noise.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1346/benchmark%2Bsample%2Btimes.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="510" data-original-width="1346" height="auto" src="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1600/benchmark%2Bsample%2Btimes.PNG" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">Sample Data</h4><div style="text-align: left;">Imagine your 20 runs before and after look like this:</div><div style="text-align: left;"><br /></div><div><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>Before (ms)</th><th>After (ms)</th></tr><tr><td>241</td><td>272</td></tr><tr><td>224</td><td>211</td></tr><tr><td>202</td><td>226</td></tr><tr><td>243</td><td>234</td></tr><tr><td>246</td><td>205</td></tr><tr><td>229</td><td>279</td></tr><tr><td>209</td><td>208</td></tr><tr><td>231</td><td>212</td></tr><tr><td>258</td><td>218</td></tr><tr><td>287</td><td>198</td></tr><tr><td>270</td><td>215</td></tr><tr><td>262</td><td>244</td></tr><tr><td>227</td><td>215</td></tr><tr><td>200</td><td>175</td></tr><tr><td>291</td><td>220</td></tr><tr><td>290</td><td>218</td></tr><tr><td>184</td><td>218</td></tr><tr><td>319</td><td>247</td></tr><tr><td>250</td><td>245</td></tr><tr><td>229</td><td>199</td></tr></tbody></table><br />In case you prefer histograms:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1346/benchmark%2Bsample%2Btimes.PNG" style="margin-left: 1em; margin-right: 1em;"><img 
border="0" data-original-height="510" data-original-width="1346" height="auto" src="https://1.bp.blogspot.com/-KlEDPMOY9-Q/YVk-7jvvSwI/AAAAAAAAIbA/WfAFueWg66EO-mhoFzZ6ugYhfjLMmmeugCLcBGAsYHQ/s1600/benchmark%2Bsample%2Btimes.PNG" width="90%" /></a></div><br /><div><br /><br /></div><div>The 'after' numbers look like they're maybe smaller. If you take the average you get 245 ms before and 223 ms after. Is that really better though or are you just seeing noise?</div><div><br /></div><h4 style="text-align: left;">T-Test</h4><div>Assuming your benchmarking noise is roughly normally distributed, you can use a <a href="https://en.wikipedia.org/wiki/Student%27s_t-test" rel="" target="_blank">T-Test</a>. If you have never seen a T-test, a really rough description is that it will take two groups of numbers, and tell you if the means of the two groups are significantly different (i.e., the difference between them probably isn't just noise). </div><div><br /></div><div>What does 'probably' mean here? You get a p value out of T-Tests that is the probability that they're the same. E.g., a p value of 0.05 would mean roughly 'there's a 5% chance that the ~20 ms difference here is just noise'. </div><div><br /></div><div>You can do this in excel, google sheets, any of the many websites that do it, etc. 
I tend to use Python for this sort of stuff so a simple overview of how to do it in Python is:</div><div><ul style="text-align: left;"><li>import stats from scipy</li><li>call the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html" target="_blank">ttest_ind</a> method in it with the before numbers as the first arg and the after numbers as the second</li><li>the t value returned should be positive (since before should be higher than after) and the p value should be below 2x your target probability</li></ul><div>For the numbers in the example here, I get a p value of 0.03, which is less than the common target of 0.05. Recall from the list above that this is 2x the probability we actually care about, so it's effectively a probability of 1.5% (a p value of 0.015), which would generally mean 'significant difference'. Note that 'significant' here doesn't mean important...just unlikely to be noise. The difference in means is still the primary metric here. </div><div><br /></div><div>To summarize this then, you could say that the update significantly altered the benchmark time, and the difference in means is ~20 ms (or a ~10% performance improvement).</div><div><br /></div><div><i>Why divide by 2?</i></div><div><i><br /></i></div><div>This is an artefact of the method used. The method I gave tests both sides of the hypothesis (i.e., both before > after and before < after), but we only care about the before > after side. Current versions of scipy handle this for you (via the alternative argument), but I have an older version installed and wanted to present the more generic approach.</div><div><br /></div><div><i>Why ttest_ind?</i></div><div><i><br /></i></div><div>There are a lot of variants of T-Tests you can run. It's worth reading through them but I won't rewrite tons of info on them here. The ttest_ind I used is for independent samples of data. 
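With the sample data from the table, that looks like the following (a minimal sketch; assumes scipy is installed):

```python
from scipy import stats

before = [241, 224, 202, 243, 246, 229, 209, 231, 258, 287,
          270, 262, 227, 200, 291, 290, 184, 319, 250, 229]
after = [272, 211, 226, 234, 205, 279, 208, 212, 218, 198,
         215, 244, 215, 175, 220, 218, 218, 247, 245, 199]

# Independent two-sample T-test; t should come out positive since
# 'before' should be slower, and p is compared against 2x the target.
t, p = stats.ttest_ind(before, after)  # p comes out around 0.03 here
```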
You might argue that a paired test is better here, since 'making code faster' is changing one aspect of a thing and testing it again, but ttest_ind works well in general usage.</div><div><br /></div><h4 style="text-align: left;">Mann-Whitney</h4><div>What if you have outliers and/or do not have a normal distribution of noise in your benchmarks? For a concrete example, what if the first number in the 'after' column is 600 instead of 272? T-Tests are not valid in these situations. Running one blindly returns a p of 0.4, which would indicate 'not significantly different', all from that single bad outlier.</div><div><br /></div><div>You could auto-exclude the best and worst n times, or manually scrub the data. Both of those require judgment calls or manual work though, and we want to automate things. You can also use another type of test. One that's useful here is the <a href="https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test" target="_blank">Mann-Whitney U test</a>.</div><div><br /></div><div>The results are similar to a T-Test but the test itself is looking for something slightly different. Roughly, it tests whether a random value chosen from 'after' is just as likely to be greater than a random value chosen from 'before' as vice-versa. Since it only cares about the ordering of the values (not their magnitudes), it is fine for outliers and non-normally distributed data.</div><div><br /></div><div>Same basic flow in Python:</div><div><ul><li>import stats from scipy</li><li>call the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html" target="_blank">mannwhitneyu</a> method in it with the before numbers as the first arg and the after numbers as the second; also pass in 'two-sided' as the alternative to be consistent with the T-Test above</li><li>the p value should be below 2x your target probability, as before</li></ul><div>With the numbers here, I get a p value of 0.04, so dividing by 2, 0.02. 
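Here is that flow with the outlier scenario from above (the first 'after' run swapped to 600 ms; a sketch assuming scipy):

```python
from scipy import stats

before = [241, 224, 202, 243, 246, 229, 209, 231, 258, 287,
          270, 262, 227, 200, 291, 290, 184, 319, 250, 229]
# Same 'after' data as the table, but with the first run as a 600 ms outlier
after = [600, 211, 226, 234, 205, 279, 208, 212, 218, 198,
         215, 244, 215, 175, 220, 218, 218, 247, 245, 199]

# The T-test is swamped by the single outlier (large p value)...
_, p_ttest = stats.ttest_ind(before, after)

# ...while the rank-based Mann-Whitney U test still flags the difference
_, p_mw = stats.mannwhitneyu(before, after, alternative='two-sided')
```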
This test was not tripped up by the outlier.</div></div><div><br /></div></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-26210373466399322642021-09-13T22:18:00.014-07:002021-09-13T23:35:59.987-07:00Why Do We Multiply the Way We Do?We could just repeatedly add the numbers but we don't. Is the algorithm we use actually faster?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1353/multiplication%2Balgorithm%2Bcompare.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="772" data-original-width="1353" height="auto" src="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1600/multiplication%2Balgorithm%2Bcompare.PNG" width="0%" /></a></div><a name='more'></a><p>What I'm talking about here is multiplying 219 x 87 like the following:</p><p></p><ul style="text-align: left;"><li>7x9 to get 63</li><li>7x1 to get 7 and add a 0 to get 70</li><li>7x2 to get 14 and add two 0's to get 1400</li><li>8x9 to get 72 and add a 0 to get 720</li><li>8x1 to get 8 and add two 0's to get 800</li><li>8x2 to get 16 and add three 0's to get 16000</li><li>add all those together to get 19,053</li></ul><div>That's 6 simple multiplications and 6 additions. If we just added 219 to itself 87 times, that's 87 operations so clearly more steps with one big assumption: </div><div><br /></div><div style="text-align: center;"><i>you've memorized m x n for all integers m and n from 2 to 10. </i></div><div><br /></div><div>This is why we all had to learn times tables. 
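The digit-by-digit procedure above is easy to write down as an algorithm; here's a sketch in Python that also counts the 'memorized' single-digit multiplications:

```python
def long_multiply(a, b):
    """Grade-school multiplication: returns (product, single_digit_mults)."""
    a_digits = [int(d) for d in str(a)][::-1]  # least-significant digit first
    b_digits = [int(d) for d in str(b)][::-1]
    total, lookups = 0, 0
    for i, da in enumerate(a_digits):
        for j, db in enumerate(b_digits):
            # da * db is the memorized times-table lookup; the 10**(i+j)
            # factor is the 'add zeros' step
            total += da * db * 10 ** (i + j)
            lookups += 1
    return total, lookups

product, lookups = long_multiply(219, 87)  # (19053, 6)
```

For 219 x 87 this does exactly the 6 table lookups listed above.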
How does this generalize as an algorithm?</div><div><br /></div><div>Repeated addition takes roughly one operation per unit of the smaller of the two numbers, so it is just O(n) where n is the smaller number.</div><div><br /></div><div>The algorithm we actually use is a bit harder to analyze. It scales as a x b where a and b are the number of digits in the two numbers. How does 'number of digits' scale? That's O(logn) where n is the number. Since it scales as the product of those, that algorithm scales as O(logm * logn) where m and n are the two numbers. </div><div><br /></div><div>What about the memorized simple multiplications? I have no idea how our memory access scales, but I'm going to just guess it's a constant time operation for simple multiplication so O(1) which doesn't contribute.</div><div><br /></div><div>For an example with actual calculations, here is the cost of multiplying each number up to 99 by 99 using each algorithm:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1353/multiplication%2Balgorithm%2Bcompare.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="772" data-original-width="1353" height="auto" src="https://1.bp.blogspot.com/-QUPyNJPO7OU/YUAxugpCidI/AAAAAAAAIZM/sT2-efUMlckFqN9yOFuXvVwTOJZm_2MaQCLcBGAsYHQ/s1600/multiplication%2Balgorithm%2Bcompare.PNG" width="90%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div><br /></div>It might not be obvious that O(logm * logn) is faster than O(n) but with actual numbers in the plot there it becomes pretty clear.</div><div><br /></div><div>It's cool to me that a basic math thing we all learn when we're little kids effectively uses a <a href="https://en.wikipedia.org/wiki/Dynamic_programming" target="_blank">dynamic programming</a> algorithm (memorize all m x n for m and n up to 10; convert every 
multiplication problem into a combination of m x n problems that you already solved).<br /><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-68750193908089284782021-08-31T21:43:00.000-07:002021-08-31T21:43:01.891-07:00Exploring Senior Software Engineer Salary Data in levels.fyi<a href="https://www.levels.fyi/">levels.fyi</a> is a great resource for software salary info and it's easily mineable. I was curious how salaries in what are sometimes considered medium cost-of-living cities compare.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1074/senior%2Bsw%2Bby%2Bcity.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="673" data-original-width="1074" height="auto" src="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1600/senior%2Bsw%2Bby%2Bcity.png" width="0%" /></a></div><a name='more'></a><p>Software careers often have levels (hence the site name). Typically there's entry with 0-2.5 years, next at 2-5 years, then career level at 5-10 years. Some go above that (principal, chief, etc.). The one I'll play with here is the 5-10 year one. 
5-10 year is often called 'senior software engineer'.</p><p>Here are the rough pay distributions in levels for that experience range in mid-priced cities (this is total compensation and not base salary):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1074/senior%2Bsw%2Bby%2Bcity.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="673" data-original-width="1074" height="auto" src="https://1.bp.blogspot.com/-W4Egjoa1P0c/YS8Cf6kM-bI/AAAAAAAAIXY/Skc74bmHi14z4odUGE4Yul9nDtVVQStIQCLcBGAsYHQ/s1600/senior%2Bsw%2Bby%2Bcity.png" width="95%" /></a></div><p>A comparison is hard because it's not clear that each city represents the same thing. For example, many are state capitals so if 90% of the jobs are state ones then you'd expect them to be lower. Here's a list of the top-3 included employers for each city in that plot to hopefully provide more context:</p><p></p><ul style="text-align: left;"><li>Pittsburgh: Google, Uber, Argo AI<br /><br /></li><li>Chicago: Paypal, Expedia, Accenture<br /><br /></li><li>Denver: Amazon, Deloitte, Gusto<br /><br /></li><li>Austin: Apple, IBM, Amazon<br /><br /></li><li>Detroit: Amazon, General Motors, Quicken Loans<br /><br /></li><li>Atlanta: VMWare, Salesforce, Microsoft<br /><br /></li><li>Raleigh: IBM, Cisco, Microsoft<br /><br /></li><li>Nashville: Amazon, Asurion, HCA Healthcare<br /><br /></li><li>Phoenix: American Express, Intel, Amazon</li></ul><p></p><p>Amazon is everywhere apparently...</p><p>These numbers aren't perfect obviously. Many people do work for the state for example and they don't seem to be providing salaries here, so I'd wager that levels.fyi is biased towards higher-paying companies. 
Fun data though.</p><p><br /></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-72554932798682197152021-08-01T21:26:00.002-07:002021-08-01T21:26:53.364-07:00How to Add a Vertical Scrollbar to PlotlyPlotly doesn't have the built-in ability to scroll vertically with a fixed x axis unfortunately, but you can mimic that fairly easily...<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Cbc0MUCU-Tg/YQdzbFfYNGI/AAAAAAAAIRU/quNF02sAXak4YbGwwJk-nOVWE28ckUPUQCLcBGAsYHQ/s817/scrolling%2Bplotly.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="360" data-original-width="817" height="auto" src="https://1.bp.blogspot.com/-Cbc0MUCU-Tg/YQdzbFfYNGI/AAAAAAAAIRU/quNF02sAXak4YbGwwJk-nOVWE28ckUPUQCLcBGAsYHQ/s1600/scrolling%2Bplotly.PNG" width="0" /></a></div><a name='more'></a><p>First, here's the demo:</p><p class="codepen" data-height="450" data-slug-hash="dyWeQJp" data-user="rhamner" style="align-items: center; border: 2px solid; box-sizing: border-box; display: flex; height: 450px; justify-content: center; margin: 1em 0px; padding: 1em;"> <span>See the Pen <a href="https://codepen.io/rhamner/pen/dyWeQJp"> vertical scroll plotly</a> by Robert Hamner (<a href="https://codepen.io/rhamner">@rhamner</a>) on <a href="https://codepen.io">CodePen</a>.</span></p><script async="" src="https://cpwebassets.codepen.io/assets/embed/ei.js"></script><p><br /></p><p>The basic model here is to stack two plots directly on top of each other, where the top one is in a scrollable div and the bottom one is not.</p><p></p><ul style="text-align: left;"><li>Make two divs</li><ul><li>plot div</li><ul><li>scrollable</li><li>width = plot width + scroll width</li></ul><li>xaxis div</li><ul><li>not scrollable</li><li>width = plot width</li></ul></ul><li>Make two plots</li><ul><li>plot</li><ul><li>goes in plot div</li><li>y-axis zeroline is hidden</li><li>bottom 
margin is 0</li></ul><li>xaxis</li><ul><li>0 top margin</li><li>hide the modebar</li></ul></ul><li>Make the plot xaxis ranges equal</li></ul><div>You can then get as complicated as you need to here. I added really crude layout event linking to the demo...I'm hitting a weird double-click bug (should autoscale but isn't) but this works pretty easily/cleanly as the basic concept.</div><div><br /></div><br /><div><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-65129511760470397342021-06-30T21:52:00.004-07:002021-06-30T21:54:43.698-07:00If 10 Vaccinated and 10 Unvaccinated People Die, Can We Still Say Vaccines Work?You will almost certainly be seeing headlines about vaccinated people dying and might even see that more vaccinated than unvaccinated die. <a href="https://www.wsj.com/articles/covid-19-killed-26-indonesian-doctors-in-juneat-least-10-had-taken-chinas-sinovac-vaccine-11624769885">Here's one from the week that I wrote this post</a>. Why do we still say vaccines work if this is happening?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1258/distribution.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="663" data-original-width="1258" src="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1600/distribution.png" width="0%" /></a></div><a name='more'></a>Imagine as an example that you see '10 vaccinated and 10 unvaccinated doctors died from COVID-19 today'. Your brain probably thinks 'well...the vaccine didn't work I guess.' We see those numbers, then just assume that the populations were similar. They're all doctors right?<p>Digging more, say that it turns out that 90% of the doctors were vaccinated. 
To make it easy, assume that there are 1,000 total doctors. 90% vaccinated means there were 900 vaccinated and 100 unvaccinated. If 10 died from each group, that means:<br /></p><ul style="text-align: left;"><li>10 / 100, or 10% of unvaccinated doctors died</li><li>10 / 900, or 1.1% of vaccinated doctors died</li></ul><div>Unvaccinated doctors were 9 times as likely to die as vaccinated ones. Another way of phrasing that is that the vaccine's efficacy was:<br /><br /><div style="text-align: center;"><b>vaccine efficacy</b> = 1 - (vaccinated risk/unvaccinated risk) = 1 - (0.011/0.1) = <b>89%</b></div></div><div style="text-align: center;"><b><br /></b></div><div style="text-align: left;">This is how you have to think about things like this. Vaccines, masks, seat belts, helmets, etc. aren't 100% effective. Use the calculation above whenever you see headlines like this and want to know the actual story. </div><div style="text-align: left;"><br /></div><div style="text-align: left;">You can even have more vaccinated deaths than unvaccinated. Imagine for the 89% efficacy vaccine above, you have 99% of the population vaccinated. For 10,000 doctors in that example, you'd expect to have 10% of the 100 unvaccinated die and 1.1% of the vaccinated 9900 die, so that's 10 unvaccinated deaths and about 110 vaccinated deaths. 
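The arithmetic above is easy to script (a sketch; note that 10/900 of 9,900 doctors works out to 110):

```python
def vaccine_efficacy(vaccinated_risk, unvaccinated_risk):
    return 1 - vaccinated_risk / unvaccinated_risk

unvaccinated_risk = 10 / 100  # 10 of 100 unvaccinated doctors died
vaccinated_risk = 10 / 900    # 10 of 900 vaccinated doctors died
efficacy = vaccine_efficacy(vaccinated_risk, unvaccinated_risk)  # ~0.89

# Expected deaths with 99% of 10,000 doctors vaccinated, same per-group risks:
unvaccinated_deaths = unvaccinated_risk * 100  # 10
vaccinated_deaths = vaccinated_risk * 9900     # ~110
```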
<b>A highly effective vaccine can still have more vaccinated people die than unvaccinated ones.</b></div><div style="text-align: left;"><br /></div><div style="text-align: left;">In case a visual helps, here is the initial example's distribution as a colored grid (red = dead and green = alive):</div><div style="text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1258/distribution.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="663" data-original-width="1258" src="https://1.bp.blogspot.com/-nQcC4Laplpc/YN1JdXl-LpI/AAAAAAAAIKU/zJo261ZsHWIUQpsmKENkM-Nk5I2HQ5HZACLcBGAsYHQ/s1600/distribution.png" width="80%" /></a></div><br /><div style="text-align: left;"><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-71486582668205143522021-06-13T21:56:00.005-07:002021-06-14T07:51:13.203-07:00Wheel Options Strategy SimulationsThe 'Wheel' is an options strategy that combines cash-secured puts with covered calls. I sometimes have trouble really grasping options strategies in my head, so simulating some scenarios gives me a better feel.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s917/simulations.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="643" data-original-width="917" src="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s1600/simulations.png" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">Basic strategy</h4>To keep it simple, I will just deal with 'at the money' (ATM) options here. 
The basic strategy then is:<br /><ol style="text-align: left;"><li>Sell a cash-secured put to start<br /><br /></li><li>If the stock goes up, the put expires so you sell another put<br /><br /></li><li>If the stock goes down, the put is exercised, you're assigned the shares, so you can sell a covered call<br /><br /></li><li>If the stock goes down from there, the call expires so you sell another call<br /><br /></li><li>If the stock goes up from there, the call is exercised, you sell the shares, so you sell another put<br /><br /></li><li>repeat...</li></ol><div>You can see that when you are assigned shares, you just sell a call, and when a call is exercised, you just use the cash to sell a put. This repeats indefinitely. Isn't this just free money? Sort of...what you're trading off here is a bit hard to see immediately. This is where playing with some numbers can make it easier to understand what's happening.</div><div><br /></div><h4 style="text-align: left;">Simple examples</h4><div>To get a better idea of how this works, let's look at 4 simple examples:<br /><ol style="text-align: left;"><li>stock doesn't change much<br /><br /></li><li>stock drops ~15% in a year<br /><br /></li><li>stock gains ~15% in a year<br /><br /></li><li>stock crashes ~15% and rebounds in a year</li></ol><div>In each scenario, I'll add a bit of noise and assume that selling a put yields $1.25/month, selling a call yields $1/month, and these are all monthly expirations and ATM strikes with a starting value of $100.</div></div><div><br /></div><div>Imagine in the first one the price for the first 5 months is 100, 102, 101, 97, 101. 
What does the wheel strategy look like?<br /><ul style="text-align: left;"><li>sell put for $1.25 with a $100 strike; gain $1.25 from the sale and lose nothing in stock</li></ul><ul style="text-align: left;"><li>price hits $102; that's above the $100 strike so it expires; sell another $1.25 put with a $102 strike and lose nothing in stock</li></ul><ul style="text-align: left;"><li>price hits $101; that's below the $102 strike so pay $102 for the shares and sell a $1 call with a $101 strike</li></ul><ul style="text-align: left;"><li>price hits $97; that's below the $101 strike so it expires; sell another $1 call with a $97 strike</li></ul><ul style="text-align: left;"><li>price hits $101; that's above the $97 strike so sell at $97 and sell another $1.25 put with a $101 strike</li></ul><div>Overall, the wheel earned $5.75 from selling options, but lost $5 in the stock (bought at $102 and sold at $97), for a small net gain of $0.75. That stock loss is the most obvious loss here but there's another more subtle one. Look at the first put again. The gain from selling the option was $1.25, but the stock itself gained $2 then. The gain was effectively capped at $1.25. The same is not true for the loss. When the stock fell, the entire loss was absorbed. 
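Following those steps literally (assignment and sale happen at the strike), the bookkeeping can be sketched in a few lines, using the assumed $1.25 put / $1.00 call premiums:

```python
def run_wheel(prices, put_premium=1.25, call_premium=1.00):
    """Track option income and stock P&L for monthly ATM options."""
    premiums, stock_pnl = put_premium, 0.0  # start by selling an ATM put
    holding, strike, cost_basis = False, prices[0], 0.0
    for price in prices[1:]:
        if not holding and price < strike:  # put assigned: buy at strike...
            holding, cost_basis = True, strike
            premiums += call_premium        # ...and sell a covered call
        elif not holding:                   # put expired: sell another put
            premiums += put_premium
        elif price > strike:                # call exercised: sell at strike...
            stock_pnl += strike - cost_basis
            holding = False
            premiums += put_premium         # ...and sell another put
        else:                               # call expired: sell another call
            premiums += call_premium
        strike = price                      # next option is ATM at new price
    return premiums, stock_pnl

income, stock = run_wheel([100, 102, 101, 97, 101])  # (5.75, -5.0)
```

Running it on the 5-month example reproduces the $5.75 of option income from the walkthrough.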
Capping gains while having to absorb losses is a primary tradeoff here (some other ones are poor tax performance, low-liquidity, and potentially missing dividends).</div></div><div><br /></div><div>Now that that's understood, it's helpful to me to see this graphically, so here are sample runs of the examples from above:</div><div><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-_HgM81YeDRg/YMbeZli8AOI/AAAAAAAAIFE/883L9T_B780lbXOkj4mpUNRcsMne-fBDgCLcBGAsYHQ/s898/flat%2Bstock.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="458" data-original-width="898" src="https://1.bp.blogspot.com/-_HgM81YeDRg/YMbeZli8AOI/AAAAAAAAIFE/883L9T_B780lbXOkj4mpUNRcsMne-fBDgCLcBGAsYHQ/s1600/flat%2Bstock.PNG" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-LYLH2XYCFGs/YMbeesunGtI/AAAAAAAAIFI/5-HbWJB6-hMkiciIvkxB91bV45h5Q39egCLcBGAsYHQ/s899/declining%2Bstock.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="460" data-original-width="899" src="https://1.bp.blogspot.com/-LYLH2XYCFGs/YMbeesunGtI/AAAAAAAAIFI/5-HbWJB6-hMkiciIvkxB91bV45h5Q39egCLcBGAsYHQ/s1600/declining%2Bstock.PNG" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ZSlEpLiM5LA/YMbehokTh3I/AAAAAAAAIFM/R7angX0QZhEKE_5KgSJxESZa52nAluSSwCLcBGAsYHQ/s898/growing%2Bstock.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="461" data-original-width="898" src="https://1.bp.blogspot.com/-ZSlEpLiM5LA/YMbehokTh3I/AAAAAAAAIFM/R7angX0QZhEKE_5KgSJxESZa52nAluSSwCLcBGAsYHQ/s1600/growing%2Bstock.PNG" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a 
href="https://1.bp.blogspot.com/-rc92adMRhes/YMbekDgSbgI/AAAAAAAAIFQ/MZmy6tdr-TEBnSZqPllf6U4-wC9IxkiqgCLcBGAsYHQ/s897/crash%2Band%2Brebound.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="454" data-original-width="897" src="https://1.bp.blogspot.com/-rc92adMRhes/YMbekDgSbgI/AAAAAAAAIFQ/MZmy6tdr-TEBnSZqPllf6U4-wC9IxkiqgCLcBGAsYHQ/s1600/crash%2Band%2Brebound.PNG" width="75%" /></a></div><br /><div><br /></div><div>The general behavior here is that the wheel smooths out the plots a bit. Increases and decreases aren't quite as big. You can control how smooth it is by changing expiration dates and strike price offsets (e.g., selling calls with a strike price 10% above current price will allow for larger gains but give you less option premium, so the performance looks more like buy and hold). When a stock crashes, you'll probably do a bit better with the wheel. When a stock surges, you'll probably do a bit worse with the wheel.</div><div><br /></div><div>The above plots are single trials of the simulation. What does it look like if this is run thousands of times?</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s917/simulations.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="643" data-original-width="917" src="https://1.bp.blogspot.com/-ApkhSBA5hE8/YMbf7ow1-zI/AAAAAAAAIFc/3t4S6KCOg8U6DyQ5sHG_23htg3HJoc9QgCLcBGAsYHQ/s1600/simulations.png" width="75%" /></a></div><br /><div>That's much clearer to me. In down periods, the wheel just minimizes your losses a bit (loss is stock loss, but you gain option premium). In good periods, the wheel caps your gains so you get a flattened distribution (max gain is the option premium).</div><div><br /></div><h4 style="text-align: left;">Summary</h4><div>Should you use this strategy? 
There's no perfect answer for that. In an extremely long bull market (like current), it's likely going to underperform. It does give you a bit of protection against drops and can do better in neutral markets. I personally don't like the thought of capping gains while not capping losses so this isn't a favorite of mine (see the fourth image with the crash and rebound to understand why that can be bad), but it's definitely viable if you want to smooth out your returns a bit.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-10687219980270690432021-05-21T20:15:00.004-07:002021-05-21T20:18:12.028-07:00Negative values with a log axis in PlotlyAlthough log10(<any number less than or equal to 0>) is not defined, there are situations where you want to visualize data as if it were. How can you get plotly to do that? Another way of asking is 'how can you mimic symlog functionality in plotly?'<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Rkm-hJMngXg/YKh2_h6M6WI/AAAAAAAAH-8/UubKGrQfXTEOdmunCHlRPofa3tl_tBAqQCLcBGAsYHQ/s1862/symlog.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="348" data-original-width="1862" src="https://1.bp.blogspot.com/-Rkm-hJMngXg/YKh2_h6M6WI/AAAAAAAAH-8/UubKGrQfXTEOdmunCHlRPofa3tl_tBAqQCLcBGAsYHQ/s1600/symlog.PNG" width="0%" /></a></div><a name='more'></a>First...a real example of when you'd want this. 
Imagine you do the following:<div><ul style="text-align: left;"><li>generate a 1 GHz tone</li><li>measure amplitude at +/- 10 kHz, +/- 100 kHz, +/- 1 MHz, ...</li><li>generate a 2 GHz tone</li><li>measure amplitude at +/- 10 kHz, +/- 100 kHz, +/- 1 MHz, ...</li><li>want to overlay those offset amplitude curves</li></ul><div>You could just plot vs absolute frequency to see one, but to overlay you need to center around a tone, and it just makes sense to show 'offset from tone' as the x axis. However, those steps imply a log scale.</div></div><div><br /></div><div>Below is a working example of exactly this situation in plotly.js. I've included the ideal here with both positive and negative on a log scale, and the normal linear plot so that the difference in readability is obvious:</div><div><br /></div><p class="codepen" data-default-tab="js,result" data-height="500" data-pen-title="symlog approximation" data-slug-hash="KKWamLW" data-theme-id="light" data-user="rhamner" style="align-items: center; border: 2px solid; box-sizing: border-box; display: flex; height: 500px; justify-content: center; margin: 1em 0px; padding: 1em;"> <span>See the Pen <a href="https://codepen.io/rhamner/pen/KKWamLW"> symlog approximation</a> by Robert Hamner (<a href="https://codepen.io/rhamner">@rhamner</a>) on <a href="https://codepen.io">CodePen</a>.</span></p><script async="" src="https://cpwebassets.codepen.io/assets/embed/ei.js"></script><div><br /></div><div><br /></div><div>The basic algorithm is pretty simple:</div><div><ul style="text-align: left;"><li>Determine the max and min values and the value closest to zero; the largest of max and abs(min) is the upper bound, and the value closest to zero is the lower bound<br /><br /></li><li>Split all traces into positive and negative (x values here since I just did this for x in the demo)<br /><br /></li><li>Create two x-axes: one for positive and one for negative</li><ul><li>give both the same bounds</li><li>reverse the negative x-axis</li><li>assign ticks 
with positive values but negative labels to the negative x-axis</li><li>put a small buffer between them to represent that zero is undefined<br /><br /></li></ul><li>Plot positive traces vs positive x-axis and negative traces vs negative x-axis, but make the negative x values positive<br /></li></ul><div>In that demo above you can just step through the javascript code and it should all be pretty clear.</div></div><div><br /></div><div>If you want a slight variant of this that matches <a href="https://matplotlib.org/stable/gallery/scales/symlog_demo.html">'symlog' in matplotlib</a>, just add a third, linear axis to connect these two instead of leaving a gap. I personally prefer the gap for this situation.</div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-51085009306756570162021-05-02T21:45:00.008-07:002021-05-02T22:26:45.703-07:00Simple way to see code coverage in pythonSometimes you want to quickly see unit test coverage of your code. <a href="https://coverage.readthedocs.io/en/coverage-5.5/">Coverage.py</a> makes that really simple.<div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="264" data-original-width="599" height="auto" src="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" width="0%" /></a></div><a name='more'></a>First, what do I mean by test coverage? 
Below is an example of a really simple usage:<div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="264" data-original-width="599" height="auto" src="https://lh3.googleusercontent.com/-0BnmIqZgxr8/YI9-RMJp-rI/AAAAAAAAH7c/fXQHYip3X9Y9zKlDzrpK8lJYVBLbqyw1gCLcBGAsYHQ/image.png" /></a></div><br /><br /></div><div>That tells me how much of the code was executed when I ran my tests (the files starting with test_* here). For that example, I just have two files with two methods in each. The files are identical and look like this:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-MwqLmZPrhXY/YI993MTVABI/AAAAAAAAH7U/LVljKk55-zYJ6eclTe-kz69vC-egLQrUgCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="368" data-original-width="354" height="240" src="https://lh3.googleusercontent.com/-MwqLmZPrhXY/YI993MTVABI/AAAAAAAAH7U/LVljKk55-zYJ6eclTe-kz69vC-egLQrUgCLcBGAsYHQ/image.png" width="231" /></a></div><br /></div><div><br /></div><div>I have one file with unit tests. I'm using <a href="https://docs.pytest.org/en/6.2.x/">pytest</a> for this, but coverage works with other test frameworks. 
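Since the source files only appear as screenshots, here's a rough sketch of what each one might contain. The function names is_zero and is_zero_wrapper come from the coverage report discussed below; the bodies are my assumptions:

```python
# Hypothetical reconstruction of file1.py / file2.py from the screenshots:
# two small functions, one of which just wraps the other.

def is_zero(x):
    # Two branches: a test suite needs both a zero and a non-zero input
    # to fully cover this function.
    if x == 0:
        return True
    return False


def is_zero_wrapper(x):
    # If no test ever calls this, it shows up as uncovered.
    return is_zero(x)
```

A test file that only calls, say, is_zero(0) would then reproduce exactly the kind of gaps described below: the non-zero branch and the wrapper both go red in the report.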
Here is the unit test file:</div><div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-yzV3V3iteRY/YI-AH0q4akI/AAAAAAAAH7s/aCAv0S0ExkYQcLUYcPHmqHbNpGPTc1ctQCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="256" data-original-width="536" height="153" src="https://lh3.googleusercontent.com/-yzV3V3iteRY/YI-AH0q4akI/AAAAAAAAH7s/aCAv0S0ExkYQcLUYcPHmqHbNpGPTc1ctQCLcBGAsYHQ/image.png" width="320" /></a></div><br /></div><div><br /></div><div>With a file this simple, we can see some obvious test gaps like:<br /><ul style="text-align: left;"><li>it doesn't test all functions in all files</li><li>it doesn't test all paths in the functions (e.g., no 0 test in file1)</li></ul><div>You can imagine how hard it is to see that for any realistic code though. This is where coverage checks can help. Clicking into the coverage for file2 here we get:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-ZWBL3S-6njo/YI9-XW1XHYI/AAAAAAAAH7g/NJAA3aOraUIotlFzynQdeXYxPyw3aCtLgCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="404" data-original-width="505" height="240" src="https://lh3.googleusercontent.com/-ZWBL3S-6njo/YI9-XW1XHYI/AAAAAAAAH7g/NJAA3aOraUIotlFzynQdeXYxPyw3aCtLgCLcBGAsYHQ/image.png" width="300" /></a></div></div><div><br /></div><div>Really simple to see that we missed the non-zero case in is_zero and that we didn't call the is_zero_wrapper function at all in any of the tested paths.</div><div><br /></div><div>It is important to note that 100% coverage doesn't mean your code is perfectly tested and that less than 100% coverage doesn't mean your codebase is garbage. 
This is just one of many useful metrics for gauging test coverage and testing gaps.</div><div><br /></div><div>To set this up and run it:<br /><ol style="text-align: left;"><li>install coverage (e.g., 'pip install coverage')</li><li>install pytest (e.g., 'pip install pytest')</li><li>run 'coverage run -m pytest'</li><li>run 'coverage html'</li><li>open the index.html file in the htmlcov folder that it created</li></ol><div>That index.html file is what I have in the screenshot at the start.</div></div><div><br /></div><div>If you want to use my exact code to test this, it's available <a href="https://github.com/rhamner/coverage_example">here</a>.</div><div><br /></div></div></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-14912543741358950362021-04-06T21:11:00.009-07:002021-04-06T22:51:23.444-07:00Thinking in terms of probabilitiesWe suck at probability. A common trap we fall into is failing to realize this and thinking in terms of absolutes.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1058/wealth%2Bquintiles.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="1058" src="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1600/wealth%2Bquintiles.png" width="0%" /></a></div><a name='more'></a>What's an example of this? One I've encountered many times is something like 'if you work hard you can stop being poor' or 'everyone decides their own wealth'. Is this true? This is what I mean...there is no absolute yes or no answer. 
Consider the following plot (<a href="https://www.stlouisfed.org/publications/regional-economist/july-2016/which-persists-more-from-generation-to-generation-income-or-wealth" target="_blank">source data</a>):<div><br /></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1058/wealth%2Bquintiles.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="1058" src="https://1.bp.blogspot.com/-G03fGZ7v-c0/YG0wLxaqBGI/AAAAAAAAH24/dhlu6HZI7moplJnQxrdX-JePpu1Z7hUyQCLcBGAsYHQ/s1600/wealth%2Bquintiles.png" width="95%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div></div><div>The following statements are all true:</div><div><ul style="text-align: left;"><li>some kids from the poorest households end up wealthy</li><li>some kids from the wealthiest households end up poor</li><li>most poor kids stay poor</li><li>most rich kids stay rich</li><li>parental wealth is a good predictor of your wealth as an adult</li></ul><div>Many people will see someone claim that last bullet and jump to 'what about this guy that grew up poor and made it?' The plot clearly shows that's possible and doesn't negate the last bullet. Thinking in terms of what's likely is a better model for this.</div></div><div><br /></div><div>Another common example is the classic 'it's cold today so global warming isn't real'. If you don't think of the distributions of temperatures, this is an easy fallacy to fall victim to. 
Here are two plots of temperature distributions for Denver, Colorado summer highs in 1900 and 2000 (respectively):<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xwLFXXJ3ECw/YG0wPj8Ue8I/AAAAAAAAH28/Abiu1IvQZmgQD813a6ziBghSmPv6iSQVgCLcBGAsYHQ/s1072/1900%2Bdenver%2Bsummers.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="496" data-original-width="1072" src="https://1.bp.blogspot.com/-xwLFXXJ3ECw/YG0wPj8Ue8I/AAAAAAAAH28/Abiu1IvQZmgQD813a6ziBghSmPv6iSQVgCLcBGAsYHQ/s1600/1900%2Bdenver%2Bsummers.PNG" width="95%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-zhIHE602iZg/YG1IFFOujXI/AAAAAAAAH3M/gmEevi4eMFY8BiJYspVzPFTJHJz1mlVpwCLcBGAsYHQ/s1071/2000%2Bdenver%2Bsummers.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="498" data-original-width="1071" src="https://1.bp.blogspot.com/-zhIHE602iZg/YG1IFFOujXI/AAAAAAAAH3M/gmEevi4eMFY8BiJYspVzPFTJHJz1mlVpwCLcBGAsYHQ/s1600/2000%2Bdenver%2Bsummers.PNG" width="95%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div><br />There are clearly cold (for summer) days in both. There is also a clear shift towards higher temperatures in 2000 vs 1900. That 'the distribution has shifted towards higher temperatures' is the best mental model for global warming in my opinion. If you want to see more of these <a href="https://cityprojections.com/summerHighHistograms.html">I pulled them from this page.</a></div><div><br /></div><div>This could go on forever but I hope the general idea is clear. 
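To make the 'shifted distribution' idea concrete, here's a small simulation sketch. The normal distributions and their parameters below are invented for illustration — they are not the actual Denver data:

```python
import random

random.seed(0)

# Stand-in summer-high distributions (degrees F): same spread, with the
# later era's mean shifted up a few degrees. These numbers are assumptions.
highs_1900 = [random.gauss(85, 7) for _ in range(10_000)]
highs_2000 = [random.gauss(88, 7) for _ in range(10_000)]

def fraction(days, predicate):
    # Fraction of simulated days satisfying the predicate.
    return sum(predicate(d) for d in days) / len(days)

# Cold-for-summer days still happen in the shifted distribution...
cold_2000 = fraction(highs_2000, lambda d: d < 70)

# ...but very hot days become noticeably more common.
hot_1900 = fraction(highs_1900, lambda d: d > 95)
hot_2000 = fraction(highs_2000, lambda d: d > 95)

print(cold_2000, hot_1900, hot_2000)
```

So a single cold day says nothing by itself; the informative comparison is between the two distributions.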
Many things are distribution-based and can be understood much more easily if thought of in terms of 'how does this distribution shift/compare?'</div><div><br /></div><div>If you're interested in a great book on this general topic, <a href="https://www.amazon.com/gp/product/0307275175/ref=as_li_qf_asin_il_tl?ie=UTF8&tag=rhamner-20&creative=9325&linkCode=as2&creativeASIN=0307275175&linkId=b33c0c74e4c510ecadee926cad4195b0" target="_blank">I liked 'The Drunkard's Walk'.</a></div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-57591309907030021152021-02-07T22:11:00.002-08:002021-02-07T22:24:51.939-08:00If the square root of -1 is i, what is the cube root of -1?You probably learned at some point that the square root of -1 is i. What about the cubed root of it? There's the obvious answer of (-1)^3 = -1, but the answer isn't actually that simple.<div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1688" data-original-width="1200" height="449" src="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/w320-h449/image.png" width="0%" /></div><a name='more'></a><script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script async="" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"><style>.MathJax { font-size: 1.3em !important; }</style> </script>To answer this, we'll need Euler's identity which is:<div style="font-size: 60px;"> \[e^{i\pi}=-1\]</div>Just take the cubed root of each side:<div style="font-size: 60px;"> \[e^{i\pi*\frac{1}{3}}=-1^{\frac{1}{3}}\]</div><div style="font-size: 60px;"> 
\[e^{\frac{i\pi}{3}}=-1^{\frac{1}{3}}\]</div>Now we just need the following definition:<div style="font-size: 60px;"> \[e^{ix}=cos(x)+i*sin(x)\]</div>Plugging in our value:<div style="font-size: 60px;"> \[e^{\frac{i\pi}{3}}=-1^{\frac{1}{3}}\]</div><div style="font-size: 60px;"> \[cos(\frac{\pi}{3}) + i*sin(\frac{\pi}{3})=-1^{\frac{1}{3}}\]</div><div style="font-size: 60px;"> \[\frac{1}{2} + i*\frac{\sqrt{3}}{2}=-1^{\frac{1}{3}}\]</div>And that's it...there's another cube root of -1.<div><br />What does that actually mean? Consider this coordinate system: </div><div><br /></div></a><div><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"></a><div class="separator" style="clear: both; text-align: center;"><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"></a><a href="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/image.png" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="1688" data-original-width="1200" height="449" src="https://lh3.googleusercontent.com/-tWhXh8XZhws/YCDP3jeyusI/AAAAAAAAHqQ/PUHtey2GXk4_osFLoZWsmJC1BuYHSzsYwCLcBGAsYHQ/w320-h449/image.png" width="320" /></a></div><br /><br /></div><div>With real numbers on the horizontal axis and imaginary numbers on the vertical axis, you can draw complex numbers as vectors. This has a cool property. We got pi/3 radians as our angle there. That's equal to 60 degrees, or one-sixth of a full rotation. Looking at that coordinate system, if r = 1:<br /><br /><ul style="text-align: left;"><li>0 degrees = 1</li><li>90 degrees = i</li><li>180 degrees = -1</li><li>270 degrees = -i</li><li>360 degrees = 1</li><li>450 degrees = i</li><li>...</li></ul><div>It rotates around. 
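That rotation picture is easy to check numerically. Here's a quick sketch using Python's cmath module (not part of the original post):

```python
import cmath
import math

# r = 1 at 60 degrees, i.e. e^(i*pi/3).
z = cmath.exp(1j * math.pi / 3)

# Cubing rotates 3 * 60 = 180 degrees, which lands on -1
# (up to floating-point error).
print(z ** 3)

# The three cube roots of -1 sit at 60, 180, and 300 degrees.
roots = [cmath.exp(1j * math.radians(deg)) for deg in (60, 180, 300)]
for root in roots:
    # Each one cubes back to -1.
    assert abs(root ** 3 - (-1)) < 1e-9
```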
Since an angle of pi/3 represents 60 degrees, cubing the value with r = 1 and angle = 60 degrees gives you the same thing as r = 1 and angle = 180 degrees, which is -1.</div></div><div><br /></div><div>Thinking through it a bit more, that's not unique. What if we used 300 degrees instead? Rotating by 300 degrees 3 times gives you 900 degrees which is just 2 revolutions + 180 degrees. Will that give you -1 also?</div><div><br /></div><div>60 degree answer cubed:<div style="font-size: 60px;"> \[(\frac{1}{2} + i*\frac{\sqrt{3}}{2})*(\frac{1}{2} + i*\frac{\sqrt{3}}{2})*(\frac{1}{2} + i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[(\frac{1}{4} + i*\frac{\sqrt{3}}{2} - \frac{3}{4})*(\frac{1}{2} + i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[\frac{1}{8} + i*\frac{\sqrt{3}}{8} + i*\frac{\sqrt{3}}{4} - \frac{3}{4} - \frac{3}{8} - i*\frac{3*\sqrt{3}}{8}\]</div>That adds up to -1 which is what we wanted.<br /><br />300 degree answer cubed:<div style="font-size: 60px;"> \[(\frac{1}{2} - i*\frac{\sqrt{3}}{2})*(\frac{1}{2} - i*\frac{\sqrt{3}}{2})*(\frac{1}{2} - i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[(\frac{1}{4} - i*\frac{\sqrt{3}}{2} - \frac{3}{4})*(\frac{1}{2} - i*\frac{\sqrt{3}}{2})\]</div><div style="font-size: 60px;"> \[\frac{1}{8} - i*\frac{\sqrt{3}}{8} - i*\frac{\sqrt{3}}{4} - \frac{3}{4} - \frac{3}{8} + i*\frac{3*\sqrt{3}}{8}\]</div>That also adds up to -1 which is what we wanted. Finally, we have the (-1)^3 = -1 answer which is just the 180 degree one.<br /><br />Thus, we found three cube roots of -1: 0.5 + 0.866i, 0.5 - 0.866i, and -1.<br><br>For the one we all learned...'square root of -1 is i'...is that really the only answer? Doing a similar exercise, you want to end up at m*360 + 180 degrees after n rotations where n is the root and m is an integer. Here, n = 2. That means 2*rotation = m*360 + 180, or rotation = 180*m + 90. Start with m = 0. rotation = 90 which means i is an answer which we know. Try m = 1. 
rotation = 270 which means -i is an answer. Trying that out...-i * -i = i^2 = -1. That works. Try m = 2. rotation = 450 which is just 90 + 1 full cycle, so we're repeating now. i and -i are our square roots of -1.</div><div><br /></div><div><br /></div><div><br /></div><div><br /></div></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-79837416741239166202021-01-24T22:49:00.009-08:002021-01-25T15:48:05.898-08:00Regression Toward the Mean in the NFLI wanted to run some quick tests to see if <a href="https://en.wikipedia.org/wiki/Regression_toward_the_mean#:~:text=In%20statistics%2C%20regression%20toward%20the,or%20average%20on%20further%20measurements.&text=The%20answer%20was%20not%20%27on%20average%20directly%20above%27.">regression toward the mean</a> shows up clearly in NFL data.<a href="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1296/passing%2Byards%2Bscatter.png" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="632" data-original-width="1296" src="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1600/passing%2Byards%2Bscatter.png" width="0%" /></a><a name='more'></a><h4 style="text-align: left;"><br /></h4><h4 style="text-align: left;">Background</h4><div>In case you aren't familiar, 'regression toward the mean' roughly means that if a random variable is an outlier, a future instance is likely to be closer to the mean. For a really simple model to make this easy to understand for something like NFL player performance, imagine that each player's performance is X% skill and Y% luck. If X is 100 and Y is 0, then previous years will nearly perfectly predict future years. If Y is 100 and X is 0, then there will be no relationship between performance from one year to the next. 
If X and Y are both between 0 and 100, there will be some relationship between performance from year to year but it won't be perfect. <div><br /></div><div>There are two easy ways for me to look at this phenomenon:<br /><ol style="text-align: left;"><li>plot one year's performance against the previous year's along with a line with a slope of 1 (X = 100%) and a best-fitting line<br /><br /></li><li>bin the data by previous year's performance and look at how each bin shifted in the next year</li></ol><div>What might we see? There are many possibilities, but here are a few examples:</div><div><ul style="text-align: left;"><li>"Players that performed well perform even better the next season": plot 1 will show a slope greater than 1 and plot 2 will show the bottom bin doing worse and the top bin doing better<br /><br /></li><li>"Performance is driven by skill so it's the same year-to-year": plot 1 will show a slope of 1 and plot 2 will show all bins at roughly zero<br /><br /></li><li>"Performance is a mix of skill and luck so top performers will move back towards average and poor performers will move up towards average (<b>this is the regression toward the mean case</b>)": plot 1 will show a slope between 0 and 1, and plot 2 will show the bottom bin doing better and the top bin doing worse<br /><br /></li><li>"It's all random/luck": plot 1 will show a slope of ~0 and plot 2 will show all bins at roughly 0<br /><br /></li><li>"Poor performers overcompensate and end up better than average next season": plot 1 will show a slope less than 1 and plot 2 will show the bottom bin doing better and the top bin doing worse</li></ul></div><div><div>To test it out I ran with 5 different stats using data from all starters from 2000-2020. For example, for a 2010-2011 compare, year 1 is 2010 and year 2 is 2011. You would expect the best performers in 2010 to do a bit worse in 2011, and the worst in 2010 to do a bit better in 2011. 
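The skill-plus-luck model above is straightforward to simulate. This sketch assumes a 50/50 skill/luck split with normal noise (my choices for illustration, not anything fitted to NFL data):

```python
import random

random.seed(1)

# Each player has a fixed skill; each season adds fresh, independent luck.
n_players = 5_000
skills = [random.gauss(0, 1) for _ in range(n_players)]
year1 = [s + random.gauss(0, 1) for s in skills]
year2 = [s + random.gauss(0, 1) for s in skills]

# Bin players into thirds by their year-1 performance.
order = sorted(range(n_players), key=lambda i: year1[i])
third = n_players // 3
bottom, top = order[:third], order[-third:]

def mean_change(indices):
    # Average year-over-year change for a group of players.
    return sum(year2[i] - year1[i] for i in indices) / len(indices)

# The bottom third improves and the top third declines on average, even
# though nobody's underlying skill changed: regression toward the mean.
print(mean_change(bottom), mean_change(top))
```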
In the bar plots, the 'bottom third' means the 33% of players that were worst in season 1 from the plot above.</div></div></div><div><br /></div><h4 style="text-align: left;">Results</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1296/passing%2Byards%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1296" src="https://1.bp.blogspot.com/-QFQZegAeRdE/YA5n0yv7YAI/AAAAAAAAHnA/Fi2nHEFMGbMVHS6pVGppPb2hdL2xaE5mACLcBGAsYHQ/s1600/passing%2Byards%2Bscatter.png" width="100%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RQVgzFIad6M/YA5n0ix8dUI/AAAAAAAAHm8/YMQhfYd97Aon_HTnd6_9nOEbXOIU1j6FACLcBGAsYHQ/s966/passing%2Byards%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="966" src="https://1.bp.blogspot.com/-RQVgzFIad6M/YA5n0ix8dUI/AAAAAAAAHm8/YMQhfYd97Aon_HTnd6_9nOEbXOIU1j6FACLcBGAsYHQ/s1600/passing%2Byards%2Bbar.png" width="75%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TOa0h2kC5ew/YA5n0KTPbHI/AAAAAAAAHmw/cpSWeswqN4Evk_9iZ9nVyTvSdhONjSLKgCLcBGAsYHQ/s1298/interceptions%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1298" src="https://1.bp.blogspot.com/-TOa0h2kC5ew/YA5n0KTPbHI/AAAAAAAAHmw/cpSWeswqN4Evk_9iZ9nVyTvSdhONjSLKgCLcBGAsYHQ/s1600/interceptions%2Bscatter.png" width="100%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://1.bp.blogspot.com/-HzEaYLn2Vrc/YA5n0HAYRkI/AAAAAAAAHms/5Dezm_RoAAMBMeG8ze0s0iQoxcPVK8Z3wCLcBGAsYHQ/s964/interceptions%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="964" src="https://1.bp.blogspot.com/-HzEaYLn2Vrc/YA5n0HAYRkI/AAAAAAAAHms/5Dezm_RoAAMBMeG8ze0s0iQoxcPVK8Z3wCLcBGAsYHQ/s1600/interceptions%2Bbar.png" width="75%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-x3lwJWJJEi4/YA5n0pigllI/AAAAAAAAHm0/KqmfMB9PRBwVvykBO0Ol-Ub6buqv0a4UgCLcBGAsYHQ/s1289/passing%2Btds%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1289" src="https://1.bp.blogspot.com/-x3lwJWJJEi4/YA5n0pigllI/AAAAAAAAHm0/KqmfMB9PRBwVvykBO0Ol-Ub6buqv0a4UgCLcBGAsYHQ/s1600/passing%2Btds%2Bscatter.png" width="100%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-kWXIDfrzYFE/YA5n0OHxEiI/AAAAAAAAHm4/6LfT8cBXAWUZyDFgEyTLGKZOVDByGrS4wCLcBGAsYHQ/s957/passing%2Btds%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="957" src="https://1.bp.blogspot.com/-kWXIDfrzYFE/YA5n0OHxEiI/AAAAAAAAHm4/6LfT8cBXAWUZyDFgEyTLGKZOVDByGrS4wCLcBGAsYHQ/s1600/passing%2Btds%2Bbar.png" width="75%" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-8UE3rj7dOuw/YA5n04KRWUI/AAAAAAAAHnI/R3aHBXjn_HE_DsuHpn9DBL1nADkUe1fAQCLcBGAsYHQ/s1298/rushing%2Btds%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1298" 
src="https://1.bp.blogspot.com/-8UE3rj7dOuw/YA5n04KRWUI/AAAAAAAAHnI/R3aHBXjn_HE_DsuHpn9DBL1nADkUe1fAQCLcBGAsYHQ/s1600/rushing%2Btds%2Bscatter.png" width="100%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Y4OCkEsxh8k/YA5n04fwDAI/AAAAAAAAHnE/LnB3xk4WUC04XVslvAuOeXEACPygajVbwCLcBGAsYHQ/s1019/rushing%2Btds%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="1019" src="https://1.bp.blogspot.com/-Y4OCkEsxh8k/YA5n04fwDAI/AAAAAAAAHnE/LnB3xk4WUC04XVslvAuOeXEACPygajVbwCLcBGAsYHQ/s1600/rushing%2Btds%2Bbar.png" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-X2nbFmPJdQ4/YA5n00yoAPI/AAAAAAAAHnQ/IBW5Ly3Nl-sPbiKOuIKbVgyISLZEyuZAQCLcBGAsYHQ/s1293/rushing%2Byards%2Bscatter.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1293" src="https://1.bp.blogspot.com/-X2nbFmPJdQ4/YA5n00yoAPI/AAAAAAAAHnQ/IBW5Ly3Nl-sPbiKOuIKbVgyISLZEyuZAQCLcBGAsYHQ/s1600/rushing%2Byards%2Bscatter.png" width="100%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-BIInLBSZy2o/YA5n09Nkj8I/AAAAAAAAHnM/3mCuF1M-mXALmit_AlgiaLJ9xLvq3rviQCLcBGAsYHQ/s965/rushing%2Byards%2Bbar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="599" data-original-width="965" src="https://1.bp.blogspot.com/-BIInLBSZy2o/YA5n09Nkj8I/AAAAAAAAHnM/3mCuF1M-mXALmit_AlgiaLJ9xLvq3rviQCLcBGAsYHQ/s1600/rushing%2Byards%2Bbar.png" width="75%" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><br /></div><div>and the data show regression toward the mean. 
Every stat I've tried (with a luck component obviously) followed the pattern above.</div><div><br /></div><div><br /></div></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com2tag:blogger.com,1999:blog-1532419805701836386.post-35851241756957448582021-01-15T22:00:00.003-08:002021-01-15T22:00:17.004-08:00Fourier Series AnimationsIt always seemed magical to me that you can get a square wave from adding together sine waves, so I threw together some animations of Fourier series.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1329/fourier_square.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1600/fourier_square.gif" width="0%" /><a name='more'></a><h4 style="text-align: left;">Square Wave</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1329/fourier_square.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-vg3ZxH0-gw8/YAKAT-apG3I/AAAAAAAAHg0/1wofnIL257Y7HXxihcvjpCsUfIK_U-_MgCLcBGAsYHQ/s1600/fourier_square.gif" width="100%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Pulse</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vpJy7aKFa54/YAKAXowtT7I/AAAAAAAAHg4/FR6azL15p50R9V0q7YvUj_PL04oBZsVNgCLcBGAsYHQ/s1329/fourier_pulse.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" 
src="https://1.bp.blogspot.com/-vpJy7aKFa54/YAKAXowtT7I/AAAAAAAAHg4/FR6azL15p50R9V0q7YvUj_PL04oBZsVNgCLcBGAsYHQ/s1600/fourier_pulse.gif" width="100%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Parabola</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-RRBsnZi7UAE/YAKAankvQkI/AAAAAAAAHg8/zI6IjeXc_owngmcFi6MuzY9tTNSCbqnCQCLcBGAsYHQ/s1329/fourier_parabolas.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-RRBsnZi7UAE/YAKAankvQkI/AAAAAAAAHg8/zI6IjeXc_owngmcFi6MuzY9tTNSCbqnCQCLcBGAsYHQ/s1600/fourier_parabolas.gif" width="100%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Pulse Variation</h4><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TCn8OV2DgHE/YAKAfHfSf-I/AAAAAAAAHhA/9OO3-10UFUcK4rhwlk0NvMu031vKFaxRgCLcBGAsYHQ/s1329/fourier_sinpulse.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="542" data-original-width="1329" src="https://1.bp.blogspot.com/-TCn8OV2DgHE/YAKAfHfSf-I/AAAAAAAAHhA/9OO3-10UFUcK4rhwlk0NvMu031vKFaxRgCLcBGAsYHQ/s1600/fourier_sinpulse.gif" width="100%" /></a></div><br /><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-27595343247794166902021-01-02T22:12:00.003-08:002021-01-02T22:13:31.752-08:00How Should You Bet on a Biased Coin Toss?If you know a coin is biased to come up heads 75% of the time, what betting strategy should you use to bet on the outcome of a flip?<img src="https://1.bp.blogspot.com/-QXUfd2an15E/XOddy0mjsQI/AAAAAAAAFAk/GYcuc-m0xDUVMsaBpf4rmi3BufTWPZkpwCLcBGAs/s1600/United_States_Quarter.jpg" width="0%"><a name='more'></a></img><p>What might seem intuitive is to have a mixed strategy of 75% heads and 25% tails. 
Maybe something like 'flip an unbiased coin twice and bet tails if you get tails twice and heads for any other outcome'. What result will that give you?</p><p>There are four possibilities here:</p><p></p><ol style="text-align: left;"><li>Coin lands on heads and you bet heads (75%*75% = 56.25% of the time)</li><li>Coin lands on heads and you bet tails (75%*25% = 18.75% of the time)</li><li>Coin lands on tails and you bet heads (25%*75% = 18.75% of the time)</li><li>Coin lands on tails and you bet tails (25%*25% = 6.25% of the time)</li></ol><div>1 and 4 are winning situations, so you'll win 62.5% of the time this way (just sum the 1 and 4 win rates).</div><div><br /></div><div>You might immediately notice that 62.5% is less than 75%. What if you just always bet heads? Filling out the same list as above:</div><div></div><div><ol><li>Coin lands on heads and you bet heads (75%*100% = 75% of the time)</li><li>Coin lands on heads and you bet tails (75%*0% = 0% of the time)</li><li>Coin lands on tails and you bet heads (25%*100% = 25% of the time)</li><li>Coin lands on tails and you bet tails (25%*0% = 0% of the time)</li></ol><div>1 and 4 are winning situations, so you'll win 75% of the time this way. In this situation, the general win rate is:<br /><br />(coin bias*head bet percentage) + [(1 - coin bias)*(1 - head bet percentage)] = win rate</div></div><div><br /></div><div>We want to maximize this. Using b for 'coin bias' and h for 'head bet percentage':</div><div><br /></div><div>b*h + (1 - b)*(1 - h) = win rate</div><div><br /></div><div>b*h + 1 - h - b + b*h = win rate</div><div><br /></div><div>2*b*h + 1 - h - b = win rate</div><div><br /></div><div>h*(2*b - 1) + 1 - b = win rate</div><div><br /></div><div>At this point, we have an equation for a line. Win rate vs h is a line with a slope of 2*b - 1 and a y-intercept of 1 - b. Anytime 2*b - 1 is positive, this line will go up and to the right, so h = 1 is the best bet (heads 100% of the time). 
Anytime 2*b - 1 is negative, h = 0 is the best bet (tails 100% of the time). 2*b - 1 is positive whenever b is greater than 0.5. Thus, the optimal strategy here is to bet in the direction of the bias 100% of the time when you have a known, biased coin.</div><div><br /></div><div><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com5tag:blogger.com,1999:blog-1532419805701836386.post-55627131940206901142020-12-27T22:02:00.004-08:002020-12-27T22:04:38.093-08:00Making a CSS Flashlight Effect Using Conic-gradientsThis is just a quick tutorial on conic-gradients showing a flashlight effect with very little code.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-S8K75bL0ewc/X-l0zcdwY-I/AAAAAAAAHac/VHQ1ugh0T-IRHSjTrDNPLRThoh0MC-lpACLcBGAsYHQ/s1867/Capture.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="654" data-original-width="1867" src="https://1.bp.blogspot.com/-S8K75bL0ewc/X-l0zcdwY-I/AAAAAAAAHac/VHQ1ugh0T-IRHSjTrDNPLRThoh0MC-lpACLcBGAsYHQ/s1600/Capture.PNG" width="0%" /></a></div><a name='more'></a><p>The basic idea here is to use a <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/conic-gradient()" target="_blank">conic-gradient</a> and do the following:<br /></p><ul style="text-align: left;"><li>set it to be the flashlight color and fairly transparent for the bright area (yellow from -25 to 25 degrees in the example here)</li><li>set it to be dark and fairly opaque for the dark area (black with 95% opacity from 25 to 335 degrees in the example here)</li><li>make the flashlight layer(s) fixed position and sit on top of the page</li><li>to keep it from starting as a point, offset it (vertical location of 110% in the example here puts it 10% below the bottom of the page)</li></ul><div>And that's it...it's actually really simple. 
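Before the live demo, here's roughly what that gradient value looks like if you build it up in JavaScript (a sketch using the example values from the bullets above; the fixed-position overlay element it would be applied to is an assumption):

```javascript
// Build the conic-gradient value described above: a translucent yellow
// beam covering -25deg to 25deg, near-opaque black everywhere else.
function flashlightGradient(x, y) {
  // Starting 'from -25deg' lets the 50-degree beam straddle 0 degrees
  // without needing a wrap-around color stop.
  return 'conic-gradient(from -25deg at ' + x + ' ' + y + ', ' +
    'rgba(255, 255, 100, 0.2) 0deg 50deg, ' + // bright cone
    'rgba(0, 0, 0, 0.95) 50deg 360deg)';      // dark everywhere else
}

// Offset 10% below the bottom of the page so it doesn't start as a point:
console.log(flashlightGradient('50%', '110%'));

// In the browser you'd apply it to a fixed overlay div, e.g.:
// overlay.style.background = flashlightGradient('50%', '110%');
```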
Here is a working example on top of a dummy html page:</div><div><br /></div><iframe allowfullscreen="true" allowtransparency="true" frameborder="no" height="600" loading="lazy" scrolling="no" src="https://codepen.io/rhamner/embed/VwKzNLd?height=602&theme-id=light&default-tab=css,result" style="width: 100%;" title="Flashlight"> See the Pen <a href='https://codepen.io/rhamner/pen/VwKzNLd'>Flashlight</a> by Robert Hamner (<a href='https://codepen.io/rhamner'>@rhamner</a>) on <a href='https://codepen.io'>CodePen</a>. </iframe><div><br /></div><div><br /></div><div>It's really clean and requires no javascript. It's probably possible to make it cleaner. An obvious question you might have is 'can I make a flashlight that moves with the mouse?' and the answer is sure...simply set the gradient position to the cursor location (this requires javascript but is simple):</div><div><br /></div><iframe allowfullscreen="true" allowtransparency="true" frameborder="no" height="600" loading="lazy" scrolling="no" src="https://codepen.io/rhamner/embed/dypJGZZ?height=476&theme-id=light&default-tab=css,result" style="width: 100%;" title="Flashlight mouse"> See the Pen <a href='https://codepen.io/rhamner/pen/dypJGZZ'>Flashlight mouse</a> by Robert Hamner (<a href='https://codepen.io/rhamner'>@rhamner</a>) on <a href='https://codepen.io'>CodePen</a>. </iframe><div><br /></div><div>All that took was adding a listener to the page for mouse or touch movements, updating --X and --Y variables on those events, and setting the conic-gradient position to be var(--X) var(--Y). 
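That wiring might look something like the sketch below (not the exact CodePen code; the `flashlight` element id and the use of pointer events are assumptions for illustration):

```javascript
// Convert pointer coordinates to the CSS lengths stored in --X/--Y.
function toCssPosition(clientX, clientY) {
  return { x: clientX + 'px', y: clientY + 'px' };
}

// Browser-only wiring: the overlay's stylesheet would use
// background: conic-gradient(at var(--X) var(--Y), ...)
if (typeof document !== 'undefined') {
  const overlay = document.getElementById('flashlight');
  document.addEventListener('pointermove', (e) => {
    const pos = toCssPosition(e.clientX, e.clientY);
    overlay.style.setProperty('--X', pos.x);
    overlay.style.setProperty('--Y', pos.y);
  });
}
```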
Simple and looks pretty cool.</div><div><br /></div><div><br /></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-60114563780967415932020-12-11T22:03:00.011-08:002021-10-03T23:24:31.961-07:00What Are the Most Impressive NFL Combine Performances Ever?If you combine the major tests and adjust for weight and height, which NFL player had the most impressive combine performance?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-qOWLlKhwTP4/X9Rc1t1e7BI/AAAAAAAAHYc/FTVOcMZqvGMiKsAKwxybmCaTs-m27e5NQCLcBGAsYHQ/s472/40%2Btimes.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="472" data-original-width="403" height="auto" src="https://1.bp.blogspot.com/-qOWLlKhwTP4/X9Rc1t1e7BI/AAAAAAAAHYc/FTVOcMZqvGMiKsAKwxybmCaTs-m27e5NQCLcBGAsYHQ/s1600/40%2Btimes.PNG" width="0%" /></a></div><a name='more'></a><h4 style="text-align: left;">Data</h4>Unfortunately, the modern combine hasn't existed for that long, so I only have data back to the year <a href="https://www.pro-football-reference.com/draft/2000-combine.htm" target="_blank">2000</a>. 
Still, that gives us a good-sized data set (~5000 players with at least some data).<p>To try to find 'best performance ever', I wanted to do two things:<br /></p><ol style="text-align: left;"><li>Adjust for weight and height...a 200 lb guy running a 4.5 forty is way less impressive than a 250 lb guy doing it.</li><li>Try to combine all metrics...a 200 lb guy running a 4.5 forty and getting 4 bench reps is way less impressive than a 200 lb guy running a 4.5 forty and getting 24 bench reps.</li></ol><div>The metrics that seem to be available for most people are:<br /><ul style="text-align: left;"><li>40 yard dash time</li><li>bench press reps (number of times they bench press 225 lbs)</li><li>broad jump</li><li>vertical jump</li></ul><div>So I used those.</div></div><div><br /></div><h4 style="text-align: left;">Calculation</h4><div>To calculate this, I used a three-step process:</div><div><ol style="text-align: left;"><li>Perform linear regression for each metric using weight and height as inputs ('metric = C1*weight + C2*height + C3').</li><li>Divide actual value by value predicted from the regression for each metric to get a score. E.g., if a player ran a 4.5 40 and the model predicted a 4.7 one for his weight and height, he'd get 4.5/4.7, or 0.957 for that metric.</li><li>Calculate an overall score that's a weighted root-sum-square (rss) of the individual scores. 
The weights are 1, 1/5, 1/2, 1/2 for the four metrics in that order.</li></ol></div><div>It doesn't affect the calculation much, but throughout, I use weight as an input for everything but bench reps, and weight^2/3 as an input for bench reps.</div><div><br /></div><h4 style="text-align: left;">Results</h4><div>Using the calculation described above, these are the greatest combine performances (actual value to the left and predicted value in parentheses to the right):</div><div><br /></div><div><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>player</th><th>40 time (s)</th><th>bench (reps)</th><th>broad (inches)</th><th>vertical (inches)</th></tr><tr><td>Vernon Davis</td><td>4.38 (4.82)</td><td>33 (21)</td><td>128 (114)</td><td>42 (33)</td></tr><tr><td>Terna Nande</td><td>4.51 (4.70)</td><td>41 (20)</td><td>124 (115)</td><td>39 (34)</td></tr><tr><td>Vic Beasley</td><td>4.53 (4.78)</td><td>35 (20)</td><td>130 (115)</td><td>41 (33)</td></tr><tr><td>Mario Williams</td><td>4.70 (5.06)</td><td>35 (23)</td><td>120 (109)</td><td>40 (30)</td></tr><tr><td>Cornelius Washington</td><td>4.55 (4.89)</td><td>36 (22)</td><td>128 (112)</td><td>39 (32)</td></tr><tr><td>Myles Garrett</td><td>4.64 (4.93)</td><td>33 (23)</td><td>128 (111)</td><td>41 (31)</td></tr><tr><td>Nick Perry</td><td>4.55 (4.92)</td><td>35 (23)</td><td>124 (111)</td><td>38 (31)</td></tr><tr><td>Margus Hunt</td><td>4.62 (4.95)</td><td>38 (21)</td><td>121 (113)</td><td>34 (32)</td></tr><tr><td>D.K. 
Metcalf</td><td>4.33 (4.67)</td><td>27 (18)</td><td>134 (118)</td><td>40 (34)</td></tr><tr><td>Jerick McKinnon</td><td>4.41 (4.57)</td><td>32 (19)</td><td>132 (117)</td><td>40 (35)</td></tr><tr><td>Davis Tull</td><td>4.57 (4.78)</td><td>26 (21)</td><td>132 (114)</td><td>42 (33)</td></tr><tr><td>Jon Alston</td><td>4.50 (4.65)</td><td>30 (19)</td><td>132 (118)</td><td>40 (34)</td></tr><tr><td>Vernon Gholston</td><td>4.65 (4.89)</td><td>37 (23)</td><td>125 (112)</td><td>36 (32)</td></tr><tr><td>Sean Weatherspoon</td><td>4.62 (4.74)</td><td>34 (21)</td><td>123 (115)</td><td>40 (33)</td></tr><tr><td>Demario Davis</td><td>4.49 (4.71)</td><td>32 (19)</td><td>124 (116)</td><td>38 (34)</td></tr><tr><td>Scott Young</td><td>5.08 (5.15)</td><td>43 (27)</td><td>115 (104)</td><td>35 (29)</td></tr><tr><td>Michael Johnson</td><td>4.61 (4.89)</td><td>28 (20)</td><td>128 (114)</td><td>38 (32)</td></tr><tr><td>Alex Barnes</td><td>4.59 (4.66)</td><td>34 (20)</td><td>126 (117)</td><td>38 (34)</td></tr><tr><td>Benjamin Watson</td><td>4.50 (4.85)</td><td>34 (22)</td><td>123 (113)</td><td>36 (32)</td></tr><tr><td>Virgil Green</td><td>4.54 (4.79)</td><td>23 (20)</td><td>130 (115)</td><td>42 (33)</td></tr></tbody></table></div><div></div><div><br /></div><div>#1 there did not surprise me. Vernon Davis's 40 time is pretty well known as an insane combine performance.</div><div><br /></div><div>The first really odd one in that list is actually #2, Terna Nande. He had an extremely short NFL career with a single tackle in his entire career. However, at just 230 pounds he pulled off 41 reps on the bench, and all of his other performances were above average. No other non-lineman in history has gotten more than 40 reps. 
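For concreteness, the three-step scoring described under Calculation reduces to something like the sketch below. The weights are the ones quoted; inverting the 40-time ratio (so that larger is better for every metric) is my assumption, since the text doesn't spell out how that's handled:

```javascript
// Weighted root-sum-square of actual/predicted ratios
// (weights from the text: 1, 1/5, 1/2, 1/2).
const WEIGHTS = { forty: 1, bench: 1 / 5, broad: 1 / 2, vertical: 1 / 2 };

function combineScore(actual, predicted) {
  let sum = 0;
  for (const metric of Object.keys(WEIGHTS)) {
    // For the 40, a ratio below 1 is good, so flip it (an assumption here).
    const ratio = metric === 'forty'
      ? predicted[metric] / actual[metric]
      : actual[metric] / predicted[metric];
    sum += WEIGHTS[metric] * ratio * ratio;
  }
  return Math.sqrt(sum);
}

// Vernon Davis's line from the table: 4.38 (4.82), 33 (21), 128 (114), 42 (33)
console.log(combineScore(
  { forty: 4.38, bench: 33, broad: 128, vertical: 42 },
  { forty: 4.82, bench: 21, broad: 114, vertical: 33 }
));
```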
The rest of the top few had or are currently having pretty good NFL careers.</div><div><br /></div><div>Since the 40 time is the one that seems most discussed, here is the same analysis if you use only the 40 time to rank:<br /><br /></div><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>player</th><th>weight (lbs)</th><th>40 time (s)</th></tr><tr><td>Montez Sweat</td><td>260</td><td>4.41 (4.86)</td></tr><tr><td>Vernon Davis</td><td>254</td><td>4.38 (4.82)</td></tr><tr><td>Bryan Thomas</td><td>266</td><td>4.47 (4.89)</td></tr><tr><td>Dontari Poe</td><td>346</td><td>4.89 (5.35)</td></tr><tr><td>Dwight Freeney</td><td>266</td><td>4.48 (4.89)</td></tr><tr><td>Tank Johnson</td><td>304</td><td>4.69 (5.11)</td></tr><tr><td>Calvin Johnson</td><td>239</td><td>4.35 (4.74)</td></tr><tr><td>Dontay Moch</td><td>248</td><td>4.40 (4.79)</td></tr><tr><td>Matt Jones</td><td>242</td><td>4.37 (4.75)</td></tr><tr><td>Bruce Campbell</td><td>314</td><td>4.75 (5.16)</td></tr><tr><td>Taylor Mays</td><td>230</td><td>4.31 (4.69)</td></tr><tr><td>Terron Armstead</td><td>306</td><td>4.71 (5.12)</td></tr><tr><td>James Hanna</td><td>252</td><td>4.43 (4.81)</td></tr><tr><td>Martez Wilson</td><td>250</td><td>4.42 (4.80)</td></tr><tr><td>T.J. Duckett</td><td>254</td><td>4.45 (4.82)</td></tr><tr><td>Bruce Irvin</td><td>245</td><td>4.41 (4.77)</td></tr><tr><td>Rashan Gary</td><td>277</td><td>4.58 (4.95)</td></tr><tr><td>Connor Barwin</td><td>256</td><td>4.47 (4.83)</td></tr><tr><td>Nick Perry</td><td>271</td><td>4.55 (4.92)</td></tr><tr><td>Lane Johnson</td><td>303</td><td>4.72 (5.10)</td></tr></tbody></table><div><br /></div><div>It's interesting looking through both of these that the really legendary players aren't at the top. Many of them are good players, but Calvin Johnson and J.J Watt are the only ones near the top in either table that will definitely go down as all-time greats. Aaron Donald, Derrick Henry, etc. 
had above average combine performances but some that did clearly better went on to worse careers.</div><div><br /></div><div>I was curious about that and decided to go in the other direction. What great players had bad combine performances? To do that, I took all all-pro players and matched with names from the combine, and the worst were Max Unger, Tyrann Mathieu, and Tarik Cohen. All under-performed estimates in every metric here. The worst performance from an all-time great here was Adrian Peterson. He was roughly average, but I would have guessed his 40 time was way better (4.68 s).</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-76553892138031591622020-12-04T22:13:00.004-08:002020-12-04T22:14:41.646-08:00Split Violin Plots in plotly.jsSplit violins are a cool way to compare distributions, and plotly makes them simple.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/--AM97WeNiLs/X8skxgbD7eI/AAAAAAAAHXQ/Sv06F1LgK-wogyDaUYZ9AmYaqRDvTMJQwCLcBGAsYHQ/s1459/split%2Bviolins.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="373" data-original-width="1459" src="https://1.bp.blogspot.com/--AM97WeNiLs/X8skxgbD7eI/AAAAAAAAHXQ/Sv06F1LgK-wogyDaUYZ9AmYaqRDvTMJQwCLcBGAsYHQ/s1600/split%2Bviolins.PNG" width="0%" /></a></div><a name='more'></a>There isn't much to explain here. I've embedded an example below showing how to use it. 
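The core of it is just two traces (a sketch with made-up data; `side` and `violinmode` are standard plotly.js violin options):

```javascript
// Two violin traces sharing an x category: one drawn on the negative side,
// one on the positive side, which produces the split violin.
const groupA = { type: 'violin', name: 'A', side: 'negative',
                 x: ['g1', 'g1', 'g1', 'g1'], y: [1, 2, 2, 3] };
const groupB = { type: 'violin', name: 'B', side: 'positive',
                 x: ['g1', 'g1', 'g1', 'g1'], y: [2, 3, 3, 4] };

// 'overlay' makes the two halves share one slot on the axis.
const layout = { violinmode: 'overlay' };
console.log([groupA, groupB], layout);

// In the browser: Plotly.newPlot('plot', [groupA, groupB], layout);
```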
You just make a normal violin plot, but specify one trace as the negative side and another as the positive side, and plotly handles the rest.<br /><br /><iframe allowfullscreen="true" allowtransparency="true" frameborder="no" height="600px" loading="lazy" scrolling="no" src="https://codepen.io/rhamner/embed/qBaZdpe?height=265&theme-id=light&default-tab=js,result" style="width: 100%;" title="split violins in plotly js"> See the Pen <a href='https://codepen.io/rhamner/pen/qBaZdpe'>split violins in plotly js</a> by Robert Hamner (<a href='https://codepen.io/rhamner'>@rhamner</a>) on <a href='https://codepen.io'>CodePen</a>. </iframe><br /><br />If you've ever wanted to plot multiple distributions side-by-side, this is an easy option.<br />theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-20952557958848481262020-11-17T22:14:00.023-08:002020-11-18T21:13:43.569-08:00How Long Until My Investments Start Making Money?Say you invest some fixed amount of money every year. 
How long does it take for the investments to grow faster than the amount you're putting into them?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/--W4XtFaZb2U/X7S_I6ToifI/AAAAAAAAHSY/R6kUx5TLIWg7bfyxd9hy_7tPBI_-7jYsACLcBGAsYHQ/s1356/plot.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="449" data-original-width="1356" height="auto" src="https://1.bp.blogspot.com/--W4XtFaZb2U/X7S_I6ToifI/AAAAAAAAHSY/R6kUx5TLIWg7bfyxd9hy_7tPBI_-7jYsACLcBGAsYHQ/s16000/plot.png" width="0%" /></a></div><br /><a name='more'></a><h4 style="text-align: left;">Basic math problem</h4> <script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script> <script async="" id="MathJax-script" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script><style>.MathJax { font-size: 1.3em !important; }</style><div>You invest $X per year in an account that yields R in gains. How long does it take for the gain in a year to be greater than $X?</div><div><br /></div><div>Another way of asking this is 'when does R times the future value of investing X each year exceed X?'</div><div><br /></div><div>The future value of a regular yearly investment where N = number of years, X = yearly investment, and R = growth rate of the investment is:</div><div><br /></div><div><br /></div><div style="font-size: 40px;">\[FV = \frac{X*((1+R)^N - 1)}{R}\]</div><div style="font-size: 28px;"><br /></div><div>What we're looking for is the number of years it takes for R times that to exceed X. 
That is, we want to solve:</div><div><br /></div><div style="font-size: 40px;">\[\frac{R*X*((1+R)^N - 1)}{R} > X\]</div><div><br /></div><div>Noticing that the R's cancel in numerator and denominator and dividing both sides by X you get:</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[((1+R)^N - 1) > 1\]</div><div style="font-size: 28px;"><br /></div><div>Adding 1 to both sides:</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[((1+R)^N) > 2\]</div><div style="font-size: 28px;"><br /></div><div>Simplifying:</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[log((1+R)^N) > log(2)\]</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[N*log((1+R)) > log(2)\]</div><div style="font-size: 28px;"><br /></div><div style="font-size: 40px;">\[N > \frac{log(2)}{log(1+R)}\]</div><div style="font-size: 28px;"><br /></div><div>This is kind of cool. You might not recognize it right away, but that says '<a href="https://www.somesolvedproblems.com/2016/04/why-does-dividing-72-by-your-interest.html" target="_blank">N is greater than the doubling time of the investment</a>'. It's really cool that it works out that way. Because of some pretty good approximations that work out, that means that the investment growth takes over the new money moved in after roughly '72 divided by annual interest rate' years.</div><div><br /></div><div>For a quick concrete example of what this means...say you invest $10,000 per year into an account yielding 6%. 
The time it takes for the 6% yield each year to exceed $10,000 is log(2)/log(1.06) which is ~12 years.</div><div style="font-size: 28px;"><br /></div><h4 style="text-align: left;">Simple plot</h4><div>Here's a simple interactive plot showing the breakdown between money invested and money from gains for an annual $10,000 investment using the interest rate that you enter below:</div><div><br /></div><div><br /></div>Interest rate (%) <input id="rate" onchange="updatePlot(this.value)" value="6" /><div id="plot" style="height: 400px; width: 100%;"></div> <script src="https://cdn.plot.ly/plotly-latest.min.js"></script><script>function updatePlot(rate) { rate = parseFloat(rate); let time = []; let invested = []; let gain = []; let value = 0; for (let i = 0; i < 25; i++) { value = (value*(1 + (rate/100))) + 10000; time.push(i + 1); invested.push((i + 1)*10000); gain.push(value - ((i + 1)*10000)); } Plotly.react('plot', [ { x: time, y: invested, stackgroup: 'one', name: 'total invested' }, { x: time, y: gain, stackgroup: 'one', name: 'total gain' } ], { font: { size: 16 }, yaxis: { title: 'balance ($)' }, xaxis: { title: 'years' }, title: 'Comparing contributions from amount invested and investment yields' }, { responsive: true }); } updatePlot(6);</script>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-53313873600885337662020-11-01T22:30:00.012-08:002020-11-03T14:55:37.762-08:00How Do American Betting Odds Convert to Percent Chance?If you've looked at betting odds, you've probably seen something like +140 and -175. 
What % chance does that imply for each participant?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-sBxt3nM6of8/X5-nSHueBKI/AAAAAAAAHP0/Eqzun0EJKYIzEbZn-0Y4JaG0zuDU2Ee8QCLcBGAsYHQ/s800/800px-Las_Vegas_sportsbook.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="533" data-original-width="800" src="https://1.bp.blogspot.com/-sBxt3nM6of8/X5-nSHueBKI/AAAAAAAAHP0/Eqzun0EJKYIzEbZn-0Y4JaG0zuDU2Ee8QCLcBGAsYHQ/s1600/800px-Las_Vegas_sportsbook.jpg" width="0" /></a></div><a name='more'></a><h4 style="text-align: left;">Definition</h4>First, what do those numbers mean? A -175 means 'you win $100 for each $175 that you bet' and a +140 means 'you win $140 for each $100 that you bet'.<div><br /><h4 style="text-align: left;">Example</h4>Now, consider a matchup that's 60% chance for A and 40% chance for B. What does that convert to?<p>Assuming no cost to bet, if there were 10 matches and you bet $100 on A each time, you'd put in $1000 and expect to get out $1000. Since A wins 60% of the time, you'd get 6 payouts and they would sum to $1000 (since you lose the bet on the 40% where A loses). Each bet would pay out $167, and subtracting off the initial $100 means a profit of $67. Thus, a $100 bet on A yields a profit of $67 when A wins which means that to get a profit of $100 you'd bet 100/.67, or $150. From the definition above, that means that a 60% chance of winning is a line of -150.<br /></p><p>Doing the same with the 40% one, you'd get 4 payouts that sum to $1000, so $250 per payout and a profit of $150. You bet $100, and profit $150 on a win, so the line is +150.</p><p>Thus, a 60/40 matchup corresponds to a line of -150/+150. 
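That logic can be sanity-checked with two tiny conversion functions (a sketch; probabilities are expressed as fractions, with no bookie margin yet):

```javascript
// Fair-odds conversions matching the 60/40 example above.
function percentToAmerican(p) {
  // Favorites (p > 0.5) get a negative line, underdogs a positive one.
  return p > 0.5 ? -100 / (1 / p - 1) : 100 * (1 / p - 1);
}

function americanToPercent(line) {
  return line < 0 ? 1 / (100 / -line + 1) : 1 / (line / 100 + 1);
}

console.log(percentToAmerican(0.6));  // ≈ -150
console.log(percentToAmerican(0.4));  // ≈ 150
console.log(americanToPercent(-150)); // ≈ 0.6
```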
The general equation for the logic above is:<br /></p><ul style="text-align: left;"><li><b>favorite: </b>American line = - 100/[( 1/percent - 1)]</li><li><b>underdog: </b>American line = 100*[( 1/percent - 1)]</li></ul><div>Going the other direction:</div><p></p><ul><li><b>favorite: </b>percent = 1/[(100/-American) + 1]</li><li><b>underdog: </b>percent = 1/[(American/100) + 1]</li></ul><div></div><p></p><h4 style="text-align: left;">Real Life</h4><div>It's not quite this easy. The person offering the bets (bookie) needs to make money. Imagine in the above that the person offering the bet wants to make $10 for every $100 bet. How does that change things?</div><div><br /></div><div>Consider bets on A. You make 10 bets on A. A wins 60% of the time, so you should get $1000 back like before except that you pay $10 per bet so you get $900 back. 6 payouts that give $900 means $150 per payout and subtracting initial investment means $50 profit. That means you'd bet 100/0.5, or $200 for each $100 profit which means the line is -200.</div><div><br /></div><div>For B...4 payouts get $900, so $225 payout per win which is $125 profit per win after subtracting initial investment. $125 profit on a $100 bet means +125 is the line.</div><div><br /></div><div>How can you factor out this $10 cost (margin)?</div><div><br /></div><div>You get a line of -200/+125 to start and want to see what the margin is on this. It's actually easy from what we did earlier. Simply convert these lines to the percentage versions, and add them together. Taking these specific numbers:</div><div><ul style="text-align: left;"><li>-200 => 66.67%</li><li>+125 => 44.44%</li><li>sum = 111.11%</li></ul><div>That is, for every $100 that is bet, the bookie gets $11.11 (or in the earlier terms, for every $90 that is bet you pay an additional $10).</div></div><div><br /></div><div>Finally...how do you get the implied chance of each option winning from odds that have the margin factored in like these? 
Simply divide each percentage by the sum.</div><div><ul style="text-align: left;"><li>favorite: 66.67% / 1.1111 = 60%</li><li>underdog: 44.44% / 1.1111 = 40%</li></ul><div>And we recovered the original odds.</div></div><div><br /></div><div><a href="https://docs.google.com/spreadsheets/d/1heJ4vmXLJVEiGkFBJskJk6NlgSITJOwtzfb-G3PkVMo/edit?usp=sharing">You can play with this in a spreadsheet here if you want.</a></div><div><br /></div><div><br /></div><p></p></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com10tag:blogger.com,1999:blog-1532419805701836386.post-81017374302570149512020-10-17T22:58:00.011-07:002020-10-18T21:24:52.597-07:00How Does Deal or No Deal Determine the Offers?If you've ever watched 'Deal or No Deal', you've likely wondered how the 'banker' determines his offers.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1260/offer%2Bfrom%2Brandom%2Bforest.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1600/offer%2Bfrom%2Brandom%2Bforest.png" width="0%" /></a></div><a name='more'></a><div>To play with this, I got data for 1 season (2006) of the US show from a <a href="https://www.aeaweb.org/articles?id=10.1257/aer.98.1.38">paper</a> on player behavior. For some obvious questions you might have...<br /><br /></div><h4 style="text-align: left;">Are the offers consistent?</h4><div>An immediate question I wanted to answer is 'does a player with a better board get a better offer?' The answer is 'usually', but there were many instances where this was not the case. 
Some specific examples...</div><div><ul style="text-align: left;"><li>round 8, player 1 had $50, $200, and $1,000,000 cases remaining and was offered $267,000</li><li>round 8, player 2 had $400, $1,000, and $1,000,000 cases remaining and was offered $215,000</li></ul><div>Player 2 clearly had the better board but received a much lower offer.</div></div><div><ul style="text-align: left;"><li>round 8, player 1 had $0.01, $400,000, and $750,000 cases remaining and was offered $375,000</li><li>round 8, player 2 had $25, $500,000, and $750,000 cases remaining and was offered $359,000</li></ul><div>Again, player 2 clearly had the better board but received a lower offer.</div></div><div><br /></div><div>Interestingly, if I restrict it to offers made in rounds 7 or 8 that were worth more than $100,000, 9 of 26 fit this pattern (player received an offer better than someone who had a better board). I'll cover another topic briefly then come back to this for speculation.</div><div><br /></div><h4 style="text-align: left;">Does the bank ever offer more than the board is worth?</h4><div>What is the board 'worth'? The most obvious answer is just the expectation value of the cases on the board. If you have 3 cases with $100, $500, and $1000 in them, the board's expectation value is ($100 + $500 + $1000)/3, or ~$533. You'd think it never makes sense for the banker to offer more than $533 for that board right?</div><div><br /></div><div>Turns out, for this season, 16 of the 62 round 7 and round 8 offers were for more than the board was worth using the definition above. Why would this ever make sense?</div><div><br /></div><h4 style="text-align: left;">Speculating so far</h4><div>There could just be a random noise generator weighting each offer to keep things interesting. There are some legit things that might explain the two questions above though that I can't answer confidently without more data. 
An idea that came to mind is basically...people get sad watching someone lose horribly, and viewers might lose interest if that happens too often. </div><div><br /></div><div>In one instance of this pattern, a player had the following cases: $5, $75, and $400,000. That board is 'worth' ~$133,000, but the banker offered $137,000. It's sad if people in that situation commonly end up with $75, so the banker might offer more than it's worth just to keep that from happening.</div><div><br /></div><div>A sample safer board that got an offer below the board's worth had cases $200, $50,000, and $75,000 remaining. The offer was $35,000 for a board that's 'worth' ~$42,000. If the next case opened was $50,000, the player would receive an offer of $37,500 so there's no catastrophe. Thus, I think there's a random component added to each offer that can be tuned up when they want to encourage the player to accept the offer.</div><div><br /></div><div>This might not be what's happening, but I can't think of a better reason for why the banker sometimes offers more than the expectation value of the board.</div><div><br /></div><div><h4 style="text-align: left;">Modeling the offers</h4><div>Now to answer the title question...can we reverse-engineer the model? Because of the above, I think it's impossible to get exactly. Further, I have no idea what type of model they're using. It could be a giant decision tree. It could be some random multiplier on the expectation value. It could be regression on, say, 5 parameters. 
I can get pretty close to the results though using the available information.</div><div><br /></div><div>To get it out of the way, here's what you get if you simply use the board's 'worth' from above:<br /><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-oh2HbOuLfRY/X4vYJot0O3I/AAAAAAAAHMI/AdDUu8Obs-Y1pfDwl78M5IIpAYR2LJ19QCLcBGAsYHQ/s1260/offer%2Bis%2Bexpectation%2Bvalue.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-oh2HbOuLfRY/X4vYJot0O3I/AAAAAAAAHMI/AdDUu8Obs-Y1pfDwl78M5IIpAYR2LJ19QCLcBGAsYHQ/s1600/offer%2Bis%2Bexpectation%2Bvalue.PNG" width="90%" /></a></div><br /><div><br /></div><div>That's too simple to be exciting, so it's wrong.</div><div><br /></div><div>The most intuitive simple model to me thinking about this problem is:<br /><br />"Offer = constant * expectation value of board" where 'constant' is fixed based on the round</div><div><br /></div><div>Fitting the data to that, I get the following constants for each round (round # is list #):<br /><ol style="text-align: left;"><li>0.11</li><li>0.24</li><li>0.38</li><li>0.49</li><li>0.6</li><li>0.72</li><li>0.85</li><li>0.85</li><li>0.99</li></ol><div>This makes sense. In round 1, they don't want you to stop and there is a huge spread of outcomes left so the offer is so low that no one would accept it (11% of the board's value). By round 9, you have 2 cases left, so they just offer the average value of those two cases. 
In-between they build drama and steadily make the offers more attractive.</div></div><div><br /></div><div>How well does that predict the actual offers?</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KhPCh7hLI1s/X4vYWHmv_GI/AAAAAAAAHMM/PIevBLVJscMJinCOxSV46PEUcsL_tQKsACLcBGAsYHQ/s1260/offer%2Bis%2Bweighted%2Bby%2Bround.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-KhPCh7hLI1s/X4vYWHmv_GI/AAAAAAAAHMM/PIevBLVJscMJinCOxSV46PEUcsL_tQKsACLcBGAsYHQ/s1600/offer%2Bis%2Bweighted%2Bby%2Bround.PNG" width="90%" /></a></div><br /><div><br /></div><div>That actually has an r^2 of ~95%, so we likely won't do much better with any sort of linear regression.<br /><br />Another approach is to try something like a random forest regressor. Using that with that weighted average, standard deviation of remaining case values, largest remaining case, smallest remaining case, and round number as features, I get:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1260/offer%2Bfrom%2Brandom%2Bforest.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-ed-FUIx0__Q/X4vYb_FwxlI/AAAAAAAAHMU/t2QPK9lA0V002UhASIujXMLlra59SXlEgCLcBGAsYHQ/s1600/offer%2Bfrom%2Brandom%2Bforest.png" width="90%" /></a></div><br /><div><br />That's a bit better, but it's likely overfit and I don't have enough data to split into large training and test sets. 
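As a concrete sketch, the simple round-weighted model fitted above is just a lookup and a mean (the multipliers are the fitted constants listed earlier):

```javascript
// Offer = round constant * expectation value of the remaining cases.
const ROUND_MULTIPLIER = [0.11, 0.24, 0.38, 0.49, 0.6, 0.72, 0.85, 0.85, 0.99];

function predictOffer(remainingCases, round) {
  const mean = remainingCases.reduce((a, b) => a + b, 0) / remainingCases.length;
  return ROUND_MULTIPLIER[round - 1] * mean;
}

// Round 9, two cases left: the offer is essentially their average.
console.log(predictOffer([100, 1000000], 9)); // ≈ 495050
```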
The simple weighted expectation value above works much better than I'd expected, so that's a decent model I think for this.</div><div><br /></div><div>One cool thing about the random forest regression is that you can get the importance of each feature. Those importances are:<br /><ul style="text-align: left;"><li>round-weighted expectation value: 0.96</li><li>standard deviation of remaining cases: 0.02</li></ul><div>and all the rest are less than 0.01. Running linear regression with spread included (so model is 'offer = C1*round-weighted average + C2*standard deviation of remaining cases) yields basically the same as just the round-weighted average:</div></div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-dAtd6EOrhs0/X4vYkxV4dvI/AAAAAAAAHMY/5C1r3ibs85EYy5TONjFSj6N4JT3oKAyagCLcBGAsYHQ/s1260/offer%2Bis%2Bregression%2Busing%2Bweighted%2Bby%2Bround%2Band%2Bspread.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1260" src="https://1.bp.blogspot.com/-dAtd6EOrhs0/X4vYkxV4dvI/AAAAAAAAHMY/5C1r3ibs85EYy5TONjFSj6N4JT3oKAyagCLcBGAsYHQ/s1600/offer%2Bis%2Bregression%2Busing%2Bweighted%2Bby%2Bround%2Band%2Bspread.png" width="90%" /></a></div><br /><div><br /></div><h4 style="text-align: left;">Conclusion</h4><div>It looks like a simple model of 'offer some % of the average of the remaining cases where that % depends on current round' works well enough, and in reality they likely add some noise and probably alter it a bit as needed to keep interest/ratings up.</div><div><br /></div><div><br /></div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-17507847238905734292020-10-09T23:09:00.004-07:002020-10-10T22:07:36.753-07:00What Are the Most Common Scores in the NFL?My guess going into this is that both teams score around 30, a field goal is the most common 
separation, and it's some combination of 7 and 3. Taking a guess then, I think 27 - 24 is going to be most common.<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1206/most%2Bcommon%2Bgame%2Bscores.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="672" data-original-width="1206" src="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1600/most%2Bcommon%2Bgame%2Bscores.PNG" width="0%" /></a></div><a name='more'></a>For the initial analysis, I'll take what I think is the most literal definition of 'most common score' which is what score is most common for a single team across all games. For data, I'm using the 2000-2019 NFL seasons. Doing that, I get the following plot:<div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DR_S63GZcIQ/X4FKe6nnRiI/AAAAAAAAHJA/rJURcmzN_iY3e5I7NpyzpVuDqBIHtgUNwCLcBGAsYHQ/s1209/most%2Bcommon%2Bteam%2Bscores.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="632" data-original-width="1209" src="https://1.bp.blogspot.com/-DR_S63GZcIQ/X4FKe6nnRiI/AAAAAAAAHJA/rJURcmzN_iY3e5I7NpyzpVuDqBIHtgUNwCLcBGAsYHQ/s1600/most%2Bcommon%2Bteam%2Bscores.PNG" width="100%" /></a></div><br /><div>So there's some obvious results here. No one ever scores only 1 point in the NFL because you can't. Scoring only 2 is super-rare since that means just getting a safety. Scoring 4 is even less common since that's only possible with 2 safeties. Getting into what's common...you'd expect x*7 + y*3 to be most common since touchdown (td) + extra point is 7 and field goal (fg) is 3. 
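As a quick sanity check on which totals are even reachable, here's a small sketch over the scoring values (6, 7, or 8 for a touchdown with no conversion, an extra point, or a two-point try; 3 for a field goal; 2 for a safety):

```python
# Which totals are reachable from NFL scoring plays? A total is reachable
# if subtracting some scoring value lands on another reachable total.
values = (7, 6, 8, 3, 2)

reachable = {0}
for total in range(1, 61):
    if any(total - v in reachable for v in values):
        reachable.add(total)

unreachable = [t for t in range(61) if t not in reachable]
print(unreachable)  # [1] -- 1 is the only impossible team score
```

(This ignores the vanishingly rare one-point safety on a conversion attempt.)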
Looking at the most common ones, the top 5 are:<br /><br /><ul style="text-align: left;"><li>20 - can be done 2 common ways: 2 tds with extra points + 2 fgs, 3 tds with 1 missed extra point</li><li>17 - 2 tds with extra points + 1 fg</li><li>24 - 3 tds with extra points + 1 fg</li><li>27 - can be done 2 common ways: 3 tds with extra points + 2 fgs, 4 tds with 1 missed extra point</li><li>10 - 1 td with extra point + 1 fg</li></ul><div>Just scanning through, the oddest one to me is that 16 and 21 occur roughly equally often. 16 is likely mostly 3 fgs + 1 td with extra point, but I would have guessed 21 (3 tds with extra points) was much more common.</div></div><div><br /></div><div>For another definition of 'most common score', I'll do the most common scores factoring in both teams in a game. That is, the most common final scores for NFL games. Doing that:<br /><br /><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1206/most%2Bcommon%2Bgame%2Bscores.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="672" data-original-width="1206" src="https://1.bp.blogspot.com/-CKP9TsbOYgA/X4FMlR99e8I/AAAAAAAAHJQ/6gHuPOf4ce47HT_xRBTYmGc-A63zp7raQCLcBGAsYHQ/s1600/most%2Bcommon%2Bgame%2Bscores.PNG" width="100%" /></a></div><br /><div><br /></div><div>Given the first plot, this shouldn't be surprising. Each of the top 7 game scores here includes one of the top 5 team scores from the previous plot. What is a bit surprising to me here is that 1-score games (difference less than 9) seem most common.
Looking at that in more detail, here is the distribution of margin of victory for all games:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Ms7X4F263Ng/X4FNJPS5zZI/AAAAAAAAHJg/9Tc1y1DVadUTzMWEtYBmy8U1myCfLPVGwCLcBGAsYHQ/s1238/most%2Bcommon%2Bmargin%2Bof%2Bvictory.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="606" data-original-width="1238" src="https://1.bp.blogspot.com/-Ms7X4F263Ng/X4FNJPS5zZI/AAAAAAAAHJg/9Tc1y1DVadUTzMWEtYBmy8U1myCfLPVGwCLcBGAsYHQ/s1600/most%2Bcommon%2Bmargin%2Bof%2Bvictory.PNG" width="100%" /></a></div><br /><div>No surprises from what we know from above...3 point difference is most common, and 7 point is next most common. On the '1-score game' note from above, doing some quick math, I get that ~50% of all NFL games in this period were decided by 1 score only (8 points or less).<br /><br />Also...in that last plot, you can see that the values on the right are larger than on the left. Since the right here means 'home team wins', this means that the home team wins more often. Adding it up, it turns out that over this 20-year period, the home team won ~58% of the time. Further, simply averaging the scores over this period, the home team averages ~2.5 more points per game than the away team. Summarizing that, you could say that <b>the home field advantage in the NFL is roughly 2.5 points and leads to the home team winning ~58% of the time.</b></div><div><b><br /></b></div><div>That's it.
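If you want to reproduce the home-field numbers from a game list, the computation is just this (toy scores shown here; the ~58% and ~2.5-point figures above came from the full 2000-2019 data):

```python
# Home win rate and average home margin from (home_score, away_score)
# pairs. These five games are made up for illustration.
games = [(27, 24), (17, 20), (31, 17), (24, 14), (23, 20)]

wins = sum(h > a for h, a in games)
margin = sum(h - a for h, a in games) / len(games)
print(f"home win rate: {wins / len(games):.0%}, avg home margin: {margin:+.1f}")
```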
Let me know if you want to see anything else with this data set.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-60252965114003652042020-09-27T21:55:00.005-07:002020-09-27T21:56:02.655-07:00Austin, TX Growth Measured by Tall Building ConstructionAustin has grown a lot lately...<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1471/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1471" src="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1600/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" width="0%" /></a></div><a name='more'></a><p>There are cranes everywhere, and it feels like there are way more than in other mid-size cities. I thought a cool proxy for growth might be construction of tall buildings, so I put together a few quick visualizations. 'Tall building' here is >200 feet.</p><p>First, here's the number of 'tall buildings' in Austin vs time:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-z2jDrBfxkao/X3FoYYvJ8YI/AAAAAAAAHHE/rR7YDX0ioocQiLSu6DVh3_7LA6K-6woTwCLcBGAsYHQ/s1471/austin.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="636" data-original-width="1471" src="https://1.bp.blogspot.com/-z2jDrBfxkao/X3FoYYvJ8YI/AAAAAAAAHHE/rR7YDX0ioocQiLSu6DVh3_7LA6K-6woTwCLcBGAsYHQ/s1600/austin.PNG" width="100%" /></a></div><p>That by itself doesn't tell you much.
Here it is compared with a sort of comparable city...Kansas City:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DmlomEpxQK4/X3FodKk6qxI/AAAAAAAAHHI/eComfOgXYo0YrlhQjfF0wiam_fJCCIbGwCLcBGAsYHQ/s1472/austin%2Bvs%2Bkc.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="636" data-original-width="1472" src="https://1.bp.blogspot.com/-DmlomEpxQK4/X3FodKk6qxI/AAAAAAAAHHI/eComfOgXYo0YrlhQjfF0wiam_fJCCIbGwCLcBGAsYHQ/s1600/austin%2Bvs%2Bkc.PNG" width="100%" /></a></div><p>Austin's spike in the 2010s is really noticeable when plotted together. Just adding one more, here's the same with Birmingham, Alabama also included:<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1471/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="635" data-original-width="1471" src="https://1.bp.blogspot.com/-r1H1LKTRrBM/X3FogzygkfI/AAAAAAAAHHM/SMSTlRRVdsESH1SGpV0cS93huttEU9JWgCLcBGAsYHQ/s1600/austin%2Bvs%2Bkc%2Bvs%2Bbham.PNG" width="100%" /></a></div><br /><div>I wanted to include Shanghai to show a whole different scale of rapid growth, but I can't find any lists that have buildings under 500 feet tall for it.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-6279290909607070072020-09-23T20:22:00.009-07:002020-09-23T22:50:14.777-07:00Where Do Football Plays Occur?Simple question...where does each play start in the NFL?<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" style="margin-left: 1em;
margin-right: 1em;"><img border="0" data-original-height="1104" data-original-width="1666" src="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" width="0%" /></a></div><a name='more'></a>To determine this, I'm using <a href="https://github.com/ryurko/nflscrapR-data/tree/master/play_by_play_data/regular_season">this</a> play-by-play data set for 2018 and 2019. From there, I just counted how often each position on the field was the position of an offensive play (e.g., kickoffs are not included). The height of the bar is just the % of plays that started at that position (read right to left here...starting on your own 25 means starting on the rightmost 25 in the plot here):<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1104" data-original-width="1666" src="https://1.bp.blogspot.com/-hITjSt1YYl0/X2wPpGgJ4XI/AAAAAAAAHGc/Dh0TLM__Wh89lbDJavosd-Ndq4U5o1axQCLcBGAsYHQ/s1666/yards%2Bto%2Bgoal%2Bat%2Bstart%2Bof%2Beach%2Blpay.png" width="100%" /></a></div><br /><div>No surprises here...most common is to start after a touchback (own 25), and it's really rare to start inside your own 10 (since those often become touchbacks). There's also a big spike in plays at the goalline because goalline stands happen slightly more often than plays from, say, the 6 yard line.<br /><br />It is a bit interesting that there is a slight bump for most multiples of 5 yards. I'm not really sure why. It could be an impact from many penalties being multiples of 5 yards. It could be something like ref spots just being slightly biased to 5 yard increments. 
It could be that first downs come in 10-yard increments and drives start at the 25 more often than anywhere else, so the play positions are biased toward multiples of 5 from there. If someone has a better idea, please post in the comments because I'm curious now.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0tag:blogger.com,1999:blog-1532419805701836386.post-57844424074759915812020-09-05T21:11:00.003-07:002020-09-05T21:15:39.882-07:00Interesting Statistical Paradoxes You Might See With COVID-19Some trends might appear baffling but have really simple explanations. Here's my guess on two that we might see. They are both examples of <a href="https://en.wikipedia.org/wiki/Selection_bias" target="_blank">sampling bias</a>.<div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/7/76/Novel_Coronavirus_SARS-CoV-2.jpg/1280px-Novel_Coronavirus_SARS-CoV-2.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="625" data-original-width="800" src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/76/Novel_Coronavirus_SARS-CoV-2.jpg/1280px-Novel_Coronavirus_SARS-CoV-2.jpg" width="0%" /></a></div><br /><a name='more'></a><h4 style="text-align: left;">Wages might increase as the economy suffers</h4><div>Depending on how it's calculated/reported, you might see wages increase even though the economy is suffering. For an example of why this might happen, consider the following simplified workforce:<br /><br />100 people making $100,000/year, and 100 people making $20,000/year</div><div><br /></div><div>If you just take the average wage for each worker, you'll find that it's $60,000/year. Now, imagine that due to Covid-19, 5% of the high-earners and 25% of the low-earners lose their jobs.
Now the workforce is:<br /><br />95 people making $100,000/year, and 75 people making $20,000/year</div><div><br /></div><div>The average wage for each worker is now ~$65,000/year. Average wages increased by ~8%. </div><div><br /></div><div>The real wage distribution is obviously not that simple, but the basic principle holds. If more low-earners than high-earners are laid off (which is the case...restaurants, basic travel services, etc. have had way more layoffs than software companies), you will paradoxically see wages increase with a poorer economy. To get a better feel for this, we'd need to report something like 'median monthly pay per working age adult'.</div><div><br /></div><h4 style="text-align: left;">Case fatality rates might fall without the virus becoming less deadly</h4><div>We locked down early on, it's been summer, and children don't work. Thus, children have not really been exposed to the virus as much as adults have. Imagine as a simple example that the case fatality rate by age is:<br /><ul style="text-align: left;"><li>5% for 70 and up</li><li>0.01% for 10 and down</li><li>1% in-between those two</li></ul><div>Imagine then that from June through August, the cases were:<br /><ul style="text-align: left;"><li>10,000 70 and up</li><li>10,000 10 and down</li><li>100,000 for all others</li></ul><div>You would have expected 1501 deaths out of 120,000 cases, so the case fatality rate would be ~1.3%. Now, imagine that schools open and children are all exposed to the virus. Everyone else keeps getting it, so imagine that September through November results in the following cases:<br /><ul><li>10,000 70 and up</li><li>100,000 10 and down</li><li>100,000 for all others</li></ul><div>Now, you would expect 1510 deaths out of 210,000 cases, so the case fatality rate would be ~0.7%.
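In code, the case-fatality arithmetic above is:

```python
# Case fatality rate under the hypothetical age-specific rates above.
rates = {"70_and_up": 0.05, "10_and_down": 0.0001, "in_between": 0.01}

def cfr(cases):
    deaths = sum(n * rates[group] for group, n in cases.items())
    return deaths / sum(cases.values())

summer = {"70_and_up": 10_000, "10_and_down": 10_000, "in_between": 100_000}
fall = {"70_and_up": 10_000, "10_and_down": 100_000, "in_between": 100_000}

print(f"Jun-Aug CFR: {cfr(summer):.2%}")  # 1.25%
print(f"Sep-Nov CFR: {cfr(fall):.2%}")    # 0.72%
```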
The virus didn't get any less deadly and the infection rate for adults didn't slow down, but the exposed population changed so the fatality rate went down.</div></div></div></div><div><br /></div><div>There is a similar paradox with death reporting. New York reports Covid-19 deaths more accurately than many other states (Texas and Florida for example) and was hit hard early on. If New York reports 90% of Covid-19 deaths accurately and Texas reports 50% of them accurately, then even if the virus is just as deadly in Texas as New York, it will appear that the virus became much less deadly while really it is just being reported less accurately. We saw this early on in Europe also with Belgium reporting more accurately than other countries and showing an apparently higher death rate. The best way to handle this will likely be to look back at excess deaths after a year.</div><div><br /></div><div><br /></div>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com1tag:blogger.com,1999:blog-1532419805701836386.post-14639142306624723782020-08-16T21:08:00.020-07:002021-03-03T08:13:51.983-08:00Why Are Tax Rates So Low for Rich People?This is an enormous topic, so I'll just focus on investments, cover some basics here, and provide simple examples. A major reason that the tax rates are so low for rich people is that investments are often taxed lower than normal income and rich people make a lot of money from their investments.<img src="https://upload.wikimedia.org/wikipedia/commons/6/6f/IE_Real_SandP_Prices%2C_Earnings%2C_and_Dividends_1871-2006.png" width="0%" /><a name='more'></a><p>There are many classes of investments. There are also many investment account types. To keep this simple, I will cover the basics of bank accounts, bonds, and stocks, and also cover how 401ks and IRAs work.
</p><p>Overly simplified, investments pay you in two ways:<br /></p><ol style="text-align: left;"><li>You get some yield from them (interest, dividends, etc.)</li><li>They increase in value (capital gains)</li></ol><div>You typically owe taxes on any of #1, and any of #2 that you sold. For #2, the taxes are typically on the amount you gained in the sale...that is, if you bought for $25 and sold for $35, you're taxed on the $10 gain (there are weird exceptions to this and I discuss one around inheritance towards the end).</div><div><br /></div><h4 style="text-align: left;">Tax types</h4><div>These are generally taxed in a few different ways. Considering only federal taxes here to keep it simple:<br /><ol style="text-align: left;"><li>'Marginal rate' which means at your income rate; if you're in the 32% bracket, they're taxed at 32%</li><li>'Ordinary dividends'; same as 'Marginal rate'; these are typically dividends on stocks you've held for only a short time before the dividend was paid out</li><li>'Short-term capital gains'; same as 1 and 2; these are capital gains on things that you sold less than a year after purchasing</li><li>'Long-term capital gains'; less than or equal to marginal rate; vary from 0 - 20%; these are capital gains that aren't short-term</li><li>'Qualified dividends'; dividends that don't fit in #2; these are taxed like long-term capital gains</li><li>'Exempt-interest dividends'; dividends that are not subject to taxes...typically municipal bonds</li><li>Others I won't discuss (e.g., taxes around discounted options from an employer would need their own full post).</li></ol><div>Summarizing then, you basically have two tax rates: your normal income one that covers interest income, ordinary dividends, short-term capital gains, and a few others, and the investment rate that covers long-term capital gains and qualified dividends. Those last three are why you'll often hear about billionaires having low tax rates. 
Once you're wealthy enough that your income is mostly from investments, you primarily pay the (lower) investment tax rates.</div></div><div><br /></div><h4 style="text-align: left;">Income example</h4><div>How does this work with a real example?</div><div><ol style="text-align: left;"><li>You had $10,000 in a savings account yielding 1% for 1 year.</li><li>You had $10,000 in a municipal bond fund yielding 1% for 1 year.</li><li>You had $10,000 in a treasury bond fund yielding 1% for 1 year.</li><li>You had $10,000 in stocks that paid out a 1% (qualified) dividend for 1 year.</li><li>You are in the 24% tax bracket.</li></ol><div>What was the net gain on each of those after subtracting your taxes?</div></div><div><ol style="text-align: left;"><li>You earned $100. You owe $24 in taxes on it. Net yield was <b>0.76%.</b></li><li>You earned $100. You owe $0 in taxes on it. Net yield was<b> 1%.</b></li><li>You earned $100. You owe $24 in taxes on it. Net yield was <b>0.76%.</b></li><li>You earned $100. You owe $15 in taxes on it (qualified dividends are 15% rate for 24% tax bracket). Net yield was <b>0.85%.</b></li></ol><div>You can see that municipal bonds dominate here. For equivalent yields, they pay out 1/0.76 - 1, or 32% more than the savings account. In table form, for a base yield of 1% in the 24% tax bracket:</div><div><br /><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>Investment</th><th>Real Yield</th></tr><tr><td>Municipal Bond</td><td>1%</td></tr><tr><td>Stock (Qualified Dividend)</td><td>0.85%</td></tr><tr><td>Savings Account (Interest)</td><td>0.76%</td></tr><tr><td>Treasury</td><td>0.76%</td></tr></tbody></table></div></div><div><br /></div><div><br /></div><div>We covered a lot so far. 
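The four net-yield calculations reduce to one small computation (a sketch, assuming the 24% marginal rate, the 15% qualified-dividend rate, and untaxed municipal interest as above):

```python
# Net yield after federal tax on $10,000 yielding 1% for a year.
principal, gross_yield = 10_000, 0.01

def net_yield(tax_rate):
    earned = principal * gross_yield  # $100 in every case
    return earned * (1 - tax_rate) / principal

for name, rate in [("savings interest", 0.24), ("municipal bond", 0.00),
                   ("treasury fund", 0.24), ("qualified dividend", 0.15)]:
    print(f"{name}: {net_yield(rate):.2%}")  # 0.76%, 1.00%, 0.76%, 0.85%
```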
To recap highlights...</div><div><ul style="text-align: left;"><li>Municipal bond yields are often not subject to federal tax at all.</li><li>Most dividends are taxed at a much lower rate than ordinary income.</li></ul><h4 style="text-align: left;">Capital gains example</h4></div><div style="text-align: left;">Now let's try some capital gains examples:</div><div><ol style="text-align: left;"><li>You buy $100 of a stock on 01-Jan-2019 and sell for $200 on 02-Jan-2019.</li><li>You buy $100 of a stock on 01-Jan-2019 and sell for $200 on 02-Jan-2020.</li><li>You buy $100 of a stock on 01-Jan-2019 and sell for $50 on 02-Jan-2019, then use that $50 to buy a different stock on 02-Feb-2019 and sell it for $200 on 02-Jan-2020.</li></ol><div>What do you owe in each case?</div></div><div style="text-align: left;"><ol style="text-align: left;"><li><b>$24. </b>You sold for a gain after holding for only 1 day...that's a short-term capital gain of $100, so you owe $24 (ordinary income rate).</li><li><b>$15. </b>You sold for a gain after holding for more than 1 year...that's a long-term capital gain of $100, so you owe $15 (long-term capital gains are 15% rate for 24% tax bracket).</li><li><b>$10.50. </b>You took a $50 loss in 2019 and a long-term capital gain of $150 in 2020. Writing the $50 loss off against ordinary income saved $12 in taxes (24%*$50), and the $150 gain results in $22.50 of tax, so you owe $22.50 in 2020 and paid a net of $10.50 in taxes overall. </li></ol><div>Note that #3 is better than #2 even though both represent a $100 gain over the same time period. You can write off the loss in 2019 as ordinary income while the gain is taxed at the lower long-term capital gains rate. This is called 'tax loss harvesting'. Some brokers like <a href="https://wlth.fr/2mAgujo" target="_blank">Wealthfront</a> do this automatically for you.
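Those three cases reduce to the following (using the same assumed 24% ordinary and 15% long-term rates):

```python
# Tax owed in each of the three capital-gains cases above.
SHORT, LONG = 0.24, 0.15  # ordinary-income and long-term rates assumed above

case1 = 100 * SHORT               # short-term: $100 gain held 1 day
case2 = 100 * LONG                # long-term: $100 gain held > 1 year
case3 = -50 * SHORT + 150 * LONG  # $50 loss vs ordinary income, then $150 long-term gain

print(round(case1, 2), round(case2, 2), round(case3, 2))  # 24.0 15.0 10.5
```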
In table form, if you're in the 24% tax bracket and make $100 total in capital gains, you owe the following tax depending on the sale details:<br /><br /><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>Investment</th><th>Effective Tax Rate</th></tr><tr><td>Short-term gain</td><td>24%</td></tr><tr><td>Long-term gain</td><td>15%</td></tr><tr><td>Tax-loss harvested</td><td>10.5%</td></tr></tbody></table></div></div><div><br /></div><h4 style="text-align: left;">Account types</h4><div>You might have heard of 401ks, Roth IRAs, 529s, etc. What are these? The two most common (I think) are 401ks and Roth IRAs, so briefly:<br /><ul style="text-align: left;"><li>a 401k lets you invest money without paying taxes on it; you pay taxes when you withdraw; this is very helpful if you make more money now than you will in retirement (assuming tax brackets don't change)</li><li>a Roth IRA lets you invest taxed money and then you pay no taxes on the gains (dividends, capital gains, etc.); i.e., every tax in the examples above would be 0 like the municipal bond one</li></ul><div>Paying no taxes either on the invested income or on the gains can be hugely beneficial. It's generally ideal to max those two account types out each year if you can. For example, compare the 401k with a taxable account for $5,000 of income invested that doubles over 10 years and is cashed out then. Assume you were in the 24% bracket when you earned it and are in the 22% bracket when you withdraw it:</div></div><div><ul style="text-align: left;"><li><b>taxable account</b>: $5,000 pre-tax = (1 - 0.24)*$5,000 = $3,800 invested; doubled means gain is $3,800; that's taxed at 15%, so you end up with $7,030</li><li><b>401k</b>: $5,000 pre-tax; 401k is invested pre-tax so all $5,000 goes in; doubled means gain is $5,000; that + original $5,000 are taxed at 22%, so you end up with $7,800</li></ul><div>That is a significant difference. 
The gain on the 401k money is 56% while the gain on the taxable account money is 41%. 401ks have an additional advantage in that many employers match funds. Running the numbers assuming your employer matches half what you deposit:</div><div><br /></div><div><b>401k with 50% employer match: </b>$5,000 pre-tax; 401k is invested pre-tax so all $5,000 goes in; employer adds $2500; doubled means gain is $7,500; that + original $7,500 are taxed at 22%, so you end up with $11,700.</div><div><br /></div><div>That's awesome...you effectively got a gain of 134%.<br /><br />Similar example for Roth IRA:</div></div><div><ul><li><b>taxable account</b>: $5,000 pre-tax = (1 - 0.24)*$5,000 = $3,800 invested; doubled means gain is $3,800; that's taxed at 15%, so you end up with $7,030</li><li><b>Roth IRA</b>: $5,000 pre-tax = (1 - 0.24)*$5,000 = $3,800 invested; doubled means gain is $3,800; that's untaxed, so you end up with $7,600</li></ul><div>Not quite as significant as the 401k in this case, but still great...Roth IRA gain was 52% while taxable account gain was 41%.<br /></div></div><div><br /></div><div>To summarize those in a table, if you're in the 24% tax bracket now and 22% tax bracket in retirement, an investment that doubles yields the following for the various account types:<br /><br /><table style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr style="background-color: #e6a117;"><th>Investment</th><th>Yield vs pretax investment</th></tr><tr><td>Taxable</td><td>41%</td></tr><tr><td>Roth IRA</td><td>52%</td></tr><tr><td>401k</td><td>56%</td></tr><tr><td>401k (with 50% employer match)</td><td>134%</td></tr></tbody></table></div><div><br /></div><div>Note...you might have a 401k plan through your employer that allows constant/in-plan conversion to a Roth IRA. This can be the best of both of the above.
Basically, you:<br /><ul style="text-align: left;"><li>max out your pre-tax 401k contribution</li><li>add after-tax money to 401k (this lets you get above the ~$19k annual limit)</li><li>convert the after-tax portion to a Roth in-plan so that it then grows tax-free (this gets around the Roth contribution limits and income restrictions)</li></ul><div>This lets you get way more tax-free growth, but is limited to people who make enough to need more than $19k/year invested and have employer plans that allow this. This is often called a 'megabackdoor' in case you want to ask your company's plan manager about it.</div></div><div><br /></div><div><h4>Living off of investments</h4></div><div>Imagine you're single and live off of capital gains. How much tax will you pay?</div><div><br /></div><div>There is no exact answer because it depends on a lot, but you can get a surprisingly large amount of money tax-free with this setup. Imagine all of your money is in taxable accounts to make it hard. Imagine you invested $600,000 total, and it's now worth $2,000,000. Remember from above that you are only taxed on the gains on your investments when they're sold and not the total amount sold. Current tax brackets have 0% capital gains tax on your first $40,000 in income. If you need $60,000/year to get by, you could thus sell:<br /><ul style="text-align: left;"><li>$40,000 of gains</li><li>$20,000 of initial investment</li></ul><div>That will give you $60,000, tax-free, even though your account is taxable (assuming this is your only income). If you're married, this is doubled. If you have some money in a Roth IRA, you can get out even more since those gains aren't taxed. Making this even crazier, say you do this until you die and have $1,200,000 in gains left over to pass on in inheritance.
Because of an (IMO) loophole called '<a href="https://www.investopedia.com/terms/s/stepupinbasis.asp">step up in basis</a>', the cost basis is reset to current value, so <b>no one ever pays taxes on those gains</b>.</div></div><div><br /></div><div>It's important to note that the above does not factor in tax-free income from municipal bonds or the standard deduction, so you can actually get even more income without paying any income taxes.</div><div><br /></div><h4 style="text-align: left;">Not covered here</h4><div>There are many tax advantages when it comes to real estate. There are some crazy complicated tax rules around stock as income from a company. There are also crazy complicated tax rules around income from things that you own (e.g., a business). Some people also just lie on their taxes, hide money, etc.</div><div><br /></div><h4 style="text-align: left;">Summary</h4><div>Using simple strategies, it is very easy to get taxes on investments well below 20%. None of the above are tax evasion or crazy loopholes or anything. Some will change over time (e.g., I personally hope the step up in basis is removed), but there are many completely legitimate ways to get investment taxes lower than ordinary income. You also don't need to be a billionaire to take advantage of them. Over half of all working Americans have 401ks for example.
This also leads to a few simple bits of advice:<br /><ul style="text-align: left;"><li>if you have access to a 401k through your employer and they match anything, take advantage of it if at all possible</li><li>max out a Roth IRA every year if at all possible</li><li>try to hold stocks for long enough to get the lowered tax rates</li><li>if you want a broad portfolio (e.g., mix of municipal bonds, treasuries, and stocks), put the worst tax-offenders (treasuries, then dividend stocks) in the tax-shielded accounts</li><li>consider tax implications when selling investments (e.g., try to sell older shares so you don't get hit with short-term capital gains taxes)</li></ul><div><br /></div></div><p></p>theboathttp://www.blogger.com/profile/01260139398901806725noreply@blogger.com0