Hello! I'm Jason. I'm a data scientist, actuary, and self-taught software engineer based in San Francisco. Currently I lead the data & analytics team at Symetra, an insurance company.
I created this site to share the things I'm interested in. You'll find posts explaining how I built this site, how I write code and solve puzzles using Python, and how I use math to understand the world - like using Bayesian inference to analyze my daily step count before and after I got a German Shepherd puppy. You can also find me on Twitter or GitHub - let's start a conversation!
Mar 11, 2022
I followed a winding path to solve this week's Riddler. First, I was convinced it was easy; then, I discovered some hidden complexity; finally, I realized there is much more under the surface!
Jan 28, 2022
This week's Riddler reminds me of the Ship of Theseus applied to the oil in a car's transmission. If we remove and replace a small amount of oil each month for many years, how much of the oil will be at least 12 months old?
Jan 24, 2022
After 18 months of development on our team's shared Python library, we had our first production pipeline failure. Here are my thoughts on what we can learn from it, and why it's good for our team's growth.
Jan 14, 2022
This week the Riddler takes on Wordle. I'll write code with the goal of solving any mystery word in three or fewer guesses.
Jan 7, 2022
I decided a great way to start 2022 was to solve this year's first Riddler! This is an optimization problem, where we solve for the strategy that minimizes the travel distance between three points along the edges of a triangle.
Jan 1, 2022
I often want to create objects that hold several related pieces of data together in the same place - like a `Book` object that has an `author`, `title`, etc. Python offers several methods to do this, but it's not always clear which one to use. I'll show different patterns that I use and discuss the advantages and disadvantages of each.
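As a rough illustration of two such patterns, here is a minimal sketch using only the standard library. The `Book` fields come from the description above; the `BookTuple` name and the sample values are placeholders I chose for the example, not code from the post itself.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class Book:
    """Mutable record: fields can be reassigned after creation."""
    author: str
    title: str


class BookTuple(NamedTuple):
    """Immutable record that also behaves like a plain tuple."""
    author: str
    title: str


# Dataclasses are mutable by default, so updating a field is allowed.
book = Book(author="Mary Shelley", title="Frankenstein")
book.title = "Frankenstein; or, The Modern Prometheus"

# NamedTuples support tuple unpacking but reject attribute assignment.
frozen = BookTuple(author="Mary Shelley", title="Frankenstein")
author, title = frozen
```

A dataclass is a reasonable default when the object needs to change over time, while a `NamedTuple` suits small read-only records.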
Dec 26, 2021
I've written before about how unit tests lead to better code. Some data scientists may not understand how or why unit testing should be used in their work. I'll outline steps you can take to incorporate unit testing into your data science projects and the key benefits you will get as a result.
Dec 23, 2021
Unit tests are a critical part of quality code. However, it's not always clear how or what you should be testing in your code. I'll walk through a unit testing example, then outline three reasons why I think unit testing will improve your code.
Jan 29, 2021
It's the 256th edition of the Riddler, and though I've taken many weekends off to spend time with my newborn, I became obsessed with this week's puzzle! We're asked to play the word guessing game from Lingo, a game show. It reminds me of another one of my favorite Riddlers where we identify the best strategy for playing the math game from Countdown, another game show. The puzzle is extremely challenging, and I couldn't find an exact solution. Instead, I used Python to implement the game engine, and used some heuristics to choose words that resulted in a high score.
Oct 2, 2020
I've taken a short break from solving the Riddlers because I've recently become a father! Our daughter, Emerson, was born in September, and I've been working on a new set of puzzles related to parenthood. However, a classic probability question from the Riddler drew me back in with the prospect of writing a dynamic programming solution.
Jun 19, 2020
As a self-professed Bayesian enthusiast, this week's Riddler Express is right up my alley! We want to calculate the odds that a coin is magical after flipping it a given number of times. Specifically, we want to calculate the number of flips before we're 99% certain the coin is actually magical. We'll use Bayes's formula to find an exact solution.
Jun 12, 2020
We're studying simulated petri dishes in this week's Riddler! We want to figure out how likely a colony of bacteria is to survive, given what we know about how often its cells divide. This is a classic problem that can be modeled with a stochastic process, but we'll try a different approach with very large Markov transition matrices built in Python and numpy.
Jun 12, 2020
I can never resist a good dice problem, and this week's Riddler Express is no different. In a technique called "bowling", you can try to throw a die so it lands on one of four sides, rather than six. How could we use this to our advantage in a simplified game of craps? We'll solve this with a bit of dynamic programming - my favorite!
Jun 5, 2020
We are carefully coloring a poster in this week's Riddler. We want to draw horizontal lines with a marker in order to fill the poster with ink as evenly as possible. How far apart should each marker line be? We'll use numpy for a computational approach that minimizes the standard deviation of our coloring scheme.
May 29, 2020
Suppose everyone in the United States wanted to join the same video call. If each of the 330 million participants joins and drops at a random time, how likely is it that at least one person overlaps with everyone else? This problem easily exceeds the limits of a practical simulation, but we'll write some code to develop intuition about the results before attempting to solve it analytically.
May 22, 2020
This week's Riddler starts with a peculiar fact - "Ohio" doesn't share any letters with the word "mackerel", and it's the only state for which that is true. How many other state/word combinations can we find, and which states match with the longest words? We'll use Python's super efficient sets, lists, and dictionaries to crunch millions of combinations in under two seconds to find the answer.
May 15, 2020
This week's Riddler asks us to determine the best dice-rolling strategy for our weekly Dungeons and Dragons game. With the option to roll once, or roll multiple times with various combinations of maximums and minimums, how can we optimize the odds of rolling the number we want?
May 8, 2020
This was a challenging Riddler about a toddler's inefficient eating habits. Our picky toddler takes a bite from an apple once every minute, and only if the randomly chosen spot has skin left on it. How long will it take to eat the entire apple? Spoiler - it's likely to outlast the toddler's attention span by quite some time.
May 1, 2020
This week's Riddler uses probability to design the ultimate jailbreak. Prisoners are given the opportunity to flip coins, and if all flipped coins are heads, each prisoner is released. Without communicating, what strategy should the prisoners use to maximize their odds of success?
Apr 24, 2020
The original Monty Hall Show featured three doors, two goats, and a brand new car. Contestants choose a door, Monty reveals a goat behind another door, and contestants are offered the chance to switch their original choice. After heated debate among probability nerds, it was eventually agreed that switching is the optimal strategy, which wins two out of three games. In this week's Riddler, we examine a variation of this game in which the number of goats is random. Does it change the decision to switch?
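The two-out-of-three figure for the classic game can be checked by enumerating every equally likely case. This is a minimal sketch of the standard three-door version only, not the random-goats variation from the puzzle, and the function name is my own.

```python
from fractions import Fraction
from itertools import product


def classic_monty_hall():
    """Return exact win probabilities (stay, switch) for the three-door game."""
    doors = range(3)
    stay = switch = total = 0
    # Every (car location, first pick) pair is equally likely.
    for car, pick in product(doors, repeat=2):
        # Monty opens a door that is neither the contestant's pick nor the car.
        opened = next(d for d in doors if d != pick and d != car)
        # Switching means taking the one door that is still closed.
        other = next(d for d in doors if d != pick and d != opened)
        stay += pick == car
        switch += other == car
        total += 1
    return Fraction(stay, total), Fraction(switch, total)


stay_odds, switch_odds = classic_monty_hall()
```

Using `Fraction` keeps the result exact, so there is no simulation noise to interpret.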
Apr 17, 2020
In memory of John Conway, we explore a modified version of the famous "Game of Life" in this week's Riddler. I implemented the Game in Python, which ended up being so much fun that many of the features aren't strictly required to solve the problem. Instead, they were amusing diversions that helped me explore this surprisingly nuanced game. It's probably just as Conway would have intended.
Apr 10, 2020
We're tracking spam messages in this week's Riddler. Spammers post messages on the column's comment board, and they also reply to each other's messages. Over a three-day time period, how many spam messages and replies should we expect to see? It turns out there are some fascinating connections with continuously compounding interest rates, which we'll derive from the differential equation. I'll also write a simulation using numpy and Poisson distributions to verify our analytical approach.
Apr 3, 2020
This week's Riddler was a fun calculus problem. Time to brush off our integrals! On a snowy day we try to pinpoint the moment the snow started based on how long it takes the plows to clear the roads.
Mar 27, 2020
Rolling and re-rolling dice is our task for this week's Riddler Classic. Each time we roll, we replace the sides of the dice with the values of our previous roll. This makes for a tricky probability space, but a very fun Python class to write and some Markov chain analysis to crunch in order to solve it.
Mar 6, 2020
We take all the "guess work" out of a classic board game in this week's Riddler - solving for the optimal strategy in Guess Who! My trusted technique of dynamic programming makes a (predictable) reappearance.
Feb 28, 2020
This week's Riddler asks us to estimate how long it takes to get a haircut with our favorite barber. Rather than scheduling an appointment, we roll the dice and hope we won't have to wait too long if we drop by unannounced. How long exactly? We'll use probability distributions and Monte Carlo simulation to estimate our idle time.
Feb 21, 2020
We tackle a solitaire game of coin flipping in this week's Riddler Classic. Using dynamic programming, we build a tree to explore all possible game states and work backwards to identify the ideal move at each state.
Jan 24, 2020
This week's Riddler Classic explores ideal strategy in a two-player game. We take turns removing coins from two piles, and the last one to remove a coin wins. I'll use pen and paper to sketch out the logical framework, then code a flexible solution using Python and dynamic programming.
Jan 17, 2020
This week's Riddler Classic is a delightful diversion - tracking delirious ducks as they randomly swim from rock to rock in a pond. How long will it take them to end up on the same rock? We'll use Markov chains in Python to solve it.
Jan 10, 2020
Gematria is a numerical system that links Hebrew characters to numbers. This week's Riddler Classic explores the "score" of a number, and asks us to identify any patterns that emerge.
Jan 3, 2020
This week's Riddler Classic was a fun way to welcome the new year. We're asked to find seven letters that maximize the score from the New York Times Spelling Bee puzzle. I use pure Python (lists, sets, and dictionaries only) to find the optimum pool of seven letters.
Nov 15, 2019
This week's Riddler tested our ability to think recursively, but surprisingly not of the coding variety. Instead, we'll use a series of equations to solve the problem analytically. Brings back memories of high school proofs... which I may or may not remember with complete fidelity.
Nov 1, 2019
A small twist on a classic problem is this week's challenge from the Riddler. We're going to tackle an extension of the secretary problem - also known as the Sultan's Dowry problem. This classic puzzle has led to fascinating research in optimal stopping theory, which we will use to help our Sultan choose the best possible suitor. Let's dive in!
Oct 31, 2019
I've started a new project called pyesg - Python Economic Scenario Generator. Economic Scenario Generators, or ESGs, are used to simulate possible future markets, like stock prices, interest rates, or volatility. Actuaries use ESGs to determine the potential values of insurance portfolios in the future. This helps them ensure that their companies will have enough money to pay claims even under the worst scenarios. Other professionals might use ESGs to understand how business decisions today could affect company value in the future. The Python ecosystem has amazing libraries for data analysis, machine learning, and many other fields, but not for generating economic scenarios. I hope that an ESG library for Python will make this type of analysis easier and more widely adopted.
Oct 18, 2019
I'm a sucker for a good maze problem from the Riddler Express. Let's over-engineer it using networkx and Python to extend the problem and see what we can learn!
Oct 18, 2019
This was a clever version of a classic problem for this week's Riddler Classic. With just two denominations of currency, what is the largest amount we can't create from a combination of bills?
Oct 4, 2019
Forgive the pun - this was a fantastic Riddler Express challenging you to calculate your odds of winning a million dollars!
Oct 4, 2019
This week's Riddler is a twist on the classic birthday problem. The birthday problem tells us that among a group of just 23 people, we are 50% likely to find at least one pair of matching birthdays. But what if we want to find three matching birthdays instead?
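The 23-person figure for a matching pair is quick to verify. This is a minimal sketch of the classic two-person version only, assuming 365 equally likely birthdays and ignoring leap years; the function name is my own, and the three-way match asked about in the puzzle needs a different calculation.

```python
def prob_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_unique = 1.0
    # The k-th person must avoid the k birthdays already taken.
    for k in range(people):
        p_all_unique *= (days - k) / days
    return 1.0 - p_all_unique


p_22 = prob_shared_birthday(22)
p_23 = prob_shared_birthday(23)
```

Comparing `p_22` and `p_23` shows where the probability first crosses one half.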
Sep 27, 2019
What happens when you create a baseball league from three teams with peculiar specialties? That's the objective of this week's Riddler. We're asked to determine whether it's better to specialize in home runs, doubles, or walks as a strategy to tally the most wins from a season of baseball in the Riddler League.
Sep 20, 2019
This week's Riddler was an entertaining blend of probability and one of my favorite sports, cycling. We're asked to choose the ideal pace for a team time trial - trying to balance the rewards of a competitive time with the risks of pushing our riders too hard and having them crack due to the effort. Plus, there's an extra credit problem!
Sep 13, 2019
The FiveThirtyEight Riddler this week asks us to make connections between states. Specifically, we want to map the connections between state abbreviations (e.g. CA for California). We've been tasked with finding the longest string of connections where the last letter from one state is the first letter from another, without repeating any states. With 59 state abbreviations to choose from, what is the longest string we can create?
Aug 23, 2019
This week's FiveThirtyEight Riddler was created by yours truly! It was the first puzzle I've submitted to the Riddler, and I hope you enjoyed it. This week we attempt to fool a bank with counterfeit hundred dollar bills.
Jun 7, 2019
Based on an actual statistical analysis problem from World War II, this week's Riddler asks us to estimate the population of German tanks given uncertain information about the tanks we've observed. Fortunately, despite the uncertainty in our observations, we can still provide reasonably accurate estimates for the total German tank population. We'll rely heavily on Bayesian analysis to solve this problem.
May 31, 2019
Computers continue to fascinate me. The Riddler this week deals with an explosion of combinations and math that is nearly impossible to grasp without the help of a computer. Specifically, we're interested in crafting an ideal strategy for the "numbers game" from the UK television show Countdown. The numbers game asks contestants to use four mathematical operations (addition, subtraction, multiplication, and division) with six numbers as inputs to solve for a single, three-digit target. Most of the time, this can be quite difficult, especially with a 30-second time limit. However, with the help of a computer, we can solve for every possible combination of input and output to identify the strategy that gives the humans the best chance to win!
May 17, 2019
This was a colorful Riddler Express. We start with a maze composed of edges of different colors. Our task is to identify the shortest path from start to finish using only edges of certain colors. This was a great opportunity to take Python's networkx library for a spin! We can build the maze as a network, where each edge has a "color" attribute, and use powerful solvers to do the path-finding for us!
May 17, 2019
This week's Riddler pits the army of the dead vs. the army of the living. As the two armies battle, any fallen soldiers from the living army rise to fight with the dead. How many soldiers would each side need to make it a fair fight?
Apr 19, 2019
The Riddler this week asks us about random points on the edge of a circle. Specifically, if we generate $n$ random points around the circumference of a circle, how likely are those points to fall on only one side?
Apr 5, 2019
Another weekly Riddler, this time with both an analytical and simulated solution!
Mar 29, 2019
I have a distinct memory of participating in my elementary school's spelling bee when I was in second grade. I was the unlikely runner-up, even though I was competing against children in third and fourth grade. What was the secret to my over-performance? Not my natural spelling ability, but rather the rules of the game - a participant is eliminated from the spelling bee after failing to spell a word correctly, which means that going last is an advantage. I was lucky enough to be near the tail end of the participants in my spelling bee, which surely improved my final ranking. This week's Riddler asks us to quantify that advantage.
Mar 22, 2019
This week's Riddler asks us to simulate a game of baseball using rolls of the dice. To solve this problem, we're going to treat the game of baseball like a Markov chain. Under the simplified dice framework, we identify various states of the game, a set of transition probabilities to subsequent states, and associated payoffs (runs scored) when certain states are reached as a result of game events. Using this paradigm, we can simulate innings probabilistically, count the runs scored by each team, and determine the winner.
Mar 1, 2019
The Riddler Express this week asks us about collecting sets of cards. In particular, we're interested in collecting a complete set of 144 unique cards. We purchase one random card at a time for $5 each. How many purchases should we expect to make - and how much money should we expect to spend - in order to collect at least one of every card?
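This is the classic coupon collector setup, so a short calculation gives the expected totals. The sketch below assumes every card is equally likely on each purchase and that purchases are independent; the function name is my own.

```python
from fractions import Fraction


def expected_purchases(n_cards):
    """Expected draws to see all n_cards: n * (1 + 1/2 + ... + 1/n)."""
    # With k cards still missing, each purchase finds a new one with
    # probability k / n_cards, so that stage takes n_cards / k draws on average.
    return n_cards * sum(Fraction(1, k) for k in range(1, n_cards + 1))


purchases = float(expected_purchases(144))
cost = 5 * purchases  # each card costs $5
```

Most of the expected spend comes from the last few missing cards, since each purchase is unlikely to land on one of them.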
Jan 13, 2019
I've learned that there are many automatic differentiation libraries in the Python ecosystem. Often these libraries are also machine learning libraries, where automatic differentiation serves as a means to an end - for example in optimizing model parameters in a neural network. However, the autograd library might be one of the purest, "simplest" (relatively speaking) options out there. Its goal is to efficiently compute derivatives of numpy code, and its API is as close to numpy as possible. This means it's easy to get started right away if you're comfortable using numpy. In particular autograd claims to be able to differentiate as many times as one likes, and I thought a great way to test this would be to apply the Taylor Series approximation to some interesting functions.
Dec 21, 2018
This week's holiday Riddler is a twist on the classic "birthday problem". The birthday problem asks us to calculate the probability that at least two people at a party have the same birthday. Most people hearing this problem for the first time are surprised at how few people you need - just 23 people give roughly 50% odds of finding at least one pair of matching birthdays! For this problem, we're interested in calculating how likely we are to hear the same song more than once from a shuffled playlist. Moreover, what can we infer about the size of the playlist, given that we hear repeats roughly half the time?
Dec 13, 2018
As a follow-up to my prior article on Black-Scholes in PyTorch, I wanted to explore more complex applications of automatic differentiation. As I showed before, automatic differentiation can be used to calculate the sensitivities, or "greeks", of a stock option, even if we use Monte Carlo techniques to calculate option price. Many exotic options can only be priced using Monte Carlo techniques, so automatic differentiation may be able to provide more accurate sensitivities in less time than traditional methods.
Dec 9, 2018
I've been experimenting with several machine learning frameworks lately, including TensorFlow, PyTorch, and Chainer. I'm fascinated by the concept of automatic differentiation. It's incredible to me that these libraries can calculate millions of partial derivatives of virtually any function with only one extra pass through the code. Automatic differentiation is critical for deep learning models, but I wanted to see how it could be applied to value financial derivatives.
Dec 6, 2018
I wrote yesterday about tracking my steps with a Garmin watch. Perhaps to keep me motivated and active, Garmin provides a daily step goal that moves up or down based on my activity. I've always been curious about how this algorithm works, but I couldn't find any resources that described it. Let's see if I can reverse engineer it instead.
Dec 5, 2018
Without a doubt, getting a puppy changes your life for the better. But I wanted to quantify this somehow. I used Bayesian inference to identify whether I logged more steps in the days since our puppy arrived.
Nov 1, 2017
I transitioned from a role as an actuarial consultant into the world of fintech a few years ago. The actuarial profession is relatively small and extremely specialized, but I believe actuarial methods and insights can play a significant role in the burgeoning fintech field. I wrote an article for The Actuary magazine summarizing the fintech landscape and the role that actuaries should be playing in it.
Sep 29, 2017
It turns out you can identify a doctored coin with a fairly high degree of certainty... It just takes lots and lots of trials.
Sep 22, 2017
This week's Riddler reminds me of a cross between sudoku or kakuro puzzles and some good old-fashioned geometry... you might call them geometric sudoku puzzles!
Sep 15, 2017
Before you build your campfire, what shapes can you make by breaking the sticks?
Aug 4, 2017
A classroom game of hot potato based on random walks; plus, my first recognition on FiveThirtyEight as a solver - my chart explaining a non-intuitive puzzle result was featured in the solutions!
Jul 14, 2017
As the owner of a dominant team looking to establish a dynasty, how can you stack the odds against your opponent? This week's FiveThirtyEight Riddler Classic has a heavy dose of combinatorics.
Jan 8, 2016
This was a fun dynamic programming exercise to try to identify a strategy that outsmarts a car salesman's game.