How to Use Data to Make (Slightly) Better Baby Name Decisions

What's the best way to make sure that your child doesn't have a faddish, boring, or weird name? Crunching numbers.

Naming humans is difficult. It’s a personal process that has long-term ramifications for the unborn and must be understood in a broader social context. Emotions must be considered, and so must data. But the data side of the naming process rarely gets its due. Naming a child, after all, is clearly no job for a machine. That said, soon-to-be parents can make better decisions by looking at available data on their name choices.

Whether explicit or implicit, a lot of factors influence the name of a baby. How many syllables does it have? Does it go well with your surname? Did you know someone with that name growing up and, more importantly, did you like that person? The answers to many of those questions vary a lot from family to family (and between opinionated family members). That said, there is one set of questions that data can answer with real numbers: How popular (i.e., frequently encountered) has a name been over the years? How popular is it now? How popular is it likely to be in the future?

For example, let’s look at my (Americanized) first name, George, in a dataset of the top 100 American male names of one, two, and three syllables:

The top graph represents the number of George births (y-axis) each year (x-axis). The middle graph shows the same counts normalized over the whole set. The bottom graph extrapolates the trend over the next couple of years, with one colored curve for each polynomial regression of 1st, 2nd, 3rd, and 4th degree. This last graph is an informed guess at the range of potential outcomes for the George community, and it speaks to the likelihood of the name receding from or gaining prominence. For George, there’s a fairly broad, but not massive, range of possibilities. In all likelihood, the name will remain popular, though not incredibly common or as popular as it once was, barring a pop cultural moment.
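For readers who want to try this themselves, here is a minimal sketch of the two computations behind the middle and bottom graphs. It assumes that "normalization" means dividing a name's yearly count by the total births recorded that year, which may not be exactly what the plots above use. The function names `normalize` and `extrapolate` are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np

def normalize(name_counts, total_counts):
    """Share of each year's recorded births that went to the name (assumed definition)."""
    return np.asarray(name_counts, dtype=float) / np.asarray(total_counts, dtype=float)

def extrapolate(years, values, degree, future_years):
    """Fit a polynomial of the given degree, then evaluate it at future years."""
    years = np.asarray(years, dtype=float)
    center = years.mean()  # centering the years keeps the fit numerically stable
    coeffs = np.polyfit(years - center, values, degree)
    return np.polyval(coeffs, np.asarray(future_years, dtype=float) - center)

# Made-up numbers for illustration only, not real birth counts.
years = np.arange(2000, 2010)
name_counts = np.array([300, 310, 325, 330, 350, 360, 365, 380, 390, 400])
total_counts = np.full(10, 100_000)

share = normalize(name_counts, total_counts)

# One forecast per polynomial degree, mirroring the 1st to 4th degree curves.
forecasts = {d: extrapolate(years, share, d, [2010, 2011, 2012]) for d in (1, 2, 3, 4)}
for degree, forecast in forecasts.items():
    print(degree, np.round(forecast * 100, 3))  # share of births, in percent
```

Higher-degree polynomials follow the historical curve more closely but tend to swing wildly just past the last data point, so the spread between the four forecasts is best read as a rough range rather than a prediction.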

Now, let’s look at more interesting plots, such as the name Shirley peaking in 1935, possibly because the child actress Shirley Temple rose to stardom around that time:

We start to see the immense power pop culture exerts on naming and also the degree to which names can quickly fade from grace — or not. An interesting and slightly different example shows the name Dylan spiking in the early 1990s — likely due to the debut of the fictional character Dylan McKay on Beverly Hills, 90210 — and what has happened since.

With that type of information, analytical parents can now make the conscious choice to pick an oldie-but-goodie, an about-to-be-rediscovered, or a bandwagon-y name for their offspring. Parents may react to the data in different ways. Some may be fine choosing a name that is likely to forever reveal their child’s exact age while others may want to choose something more original or more timeless. The key thing is understanding the nature of the decision before making it. And that is not only possible, it’s also fairly easy. The key is finding the data set. The best choice? National name data collected by the Social Security Administration.
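As a starting point, here is a small sketch for pulling one name's yearly counts out of that data. It assumes the national files contain one `name,sex,count` row per line, with one file per birth year; check the layout of the files you actually download. The helper names `count_name` and `yearly_series` are hypothetical, and the sample rows are made up.

```python
import csv

def count_name(lines, name, sex):
    """Return the recorded births for one name and sex in one year's rows."""
    for row in csv.reader(lines):
        if len(row) == 3 and row[0] == name and row[1] == sex:
            return int(row[2])
    return 0  # the name does not appear in that year's rows

def yearly_series(rows_by_year, name, sex):
    """Map each year to the name's count, in chronological order."""
    return {year: count_name(rows, name, sex)
            for year, rows in sorted(rows_by_year.items())}

# Made-up rows in the assumed three-column layout, not real data.
rows_by_year = {
    2002: ["Alex,M,130", "Sam,M,95"],
    2001: ["Alex,M,120", "Alex,F,15", "Sam,M,90"],
}

series = yearly_series(rows_by_year, "Alex", "M")
print(series)
```

In practice each value in `rows_by_year` could be an open file handle for that year's file, since `csv.reader` accepts any iterable of lines.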

Parents can use the information available to make a better decision (or at least agonize over it a bit more), as long as they’re willing to learn how to do a polynomial regression, which isn’t terribly hard if you’re down to Google it. I’d say take the time. You don’t have a kid yet, so you’ve got it.

This story was adapted from a story originally published on Georges Duverger’s personal website.