Q: How accurate is this program?
A: More accurate than a Magic 8-ball. Less accurate than distributing and collecting 300 million surveys.
Q: No, really. How accurate?
A: Well, it's hard to say. In order to determine how accurate this program is, we would need a program that was completely accurate for comparison purposes. If we had a program that was completely accurate, we'd use that program instead of this one. At that point, discovering how accurate this program is would no longer be worth the effort. Therefore, we can fairly confidently say that it is impossible to determine how accurate this program is. (Confused? We're just warming up.)
In our completely non-expert opinion, the program gives a decent ballpark estimate, but it shouldn't be used for anything more than that.
Q: Why isn't it more accurate?
A: There are a number of possible sources of inaccuracy:
First and foremost, the program is based upon a convenient fiction. Without getting too technical, the program assumes that a person's first and last names are independent of one another. In other words, it assumes that the probability of a person having a particular first name is the same no matter what last name they have. It isn't.
So, for example: The program assumes that the chance that your first name is "Juan" is the same regardless of whether your last name is "Arteaga" or "Epstein". Episodes of Welcome Back, Kotter aside, we would hazard a guess that there are not that many people in the U.S. actually named "Juan Epstein". Your family name makes certain first names more likely and certain others less likely, and the program cannot compensate for that.
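For the curious, here is roughly what "independent" means in practice. This is a minimal sketch, not the program's actual code: it assumes the estimate is simply population × P(first name) × P(last name), and every frequency in it is a made-up illustrative number, not a real Census figure. The function name `estimate_count` and both lookup tables are hypothetical.

```python
# A minimal sketch of the independence assumption.
# All frequencies below are HYPOTHETICAL illustrative numbers, not Census data.

US_POPULATION = 300_000_000  # rough figure, in the spirit of "300 million surveys"

# Hypothetical share of the population with each first name / last name.
FIRST_NAME_FREQ = {"juan": 0.0010, "william": 0.0240}
LAST_NAME_FREQ = {"arteaga": 0.00002, "epstein": 0.00003}


def estimate_count(first: str, last: str, population: int = US_POPULATION) -> float:
    """Estimate how many people share a full name, assuming first and last
    names are independent: P(first and last) = P(first) * P(last)."""
    # Names missing from a table get probability 0, so they are never counted.
    p_first = FIRST_NAME_FREQ.get(first.lower(), 0.0)
    p_last = LAST_NAME_FREQ.get(last.lower(), 0.0)
    return population * p_first * p_last


if __name__ == "__main__":
    for last in ("Arteaga", "Epstein"):
        print(f"Juan {last}: about {estimate_count('Juan', last):.1f} people")
```

With these made-up numbers, "Juan Epstein" comes out ahead of "Juan Arteaga" simply because "Epstein" is the more common surname in the table. Under independence, the ratio between any two surnames is the same for every first name, which is exactly the kind of error the program cannot correct.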
Second, the data is old. The data for this program comes from the U.S. Census Bureau's 1990 and 2000 censuses, which makes it between 9 and 19 years old. This is the most recent name data available from the Census Bureau (the 2000 census did not include data for first names, so the first-name data dates from 1990), but it is still old, and its accuracy may be somewhat questionable.
Third, the data appears biased towards more formal versions of names. The data comes from forms mailed to the Census Bureau, and it appears that most people put the full, formal version of their name on those forms rather than a nickname. So, for example, people who normally call themselves "Bill" would likely tend to put "William" on an official Census form. In fact, the data shows the name "William" outnumbering the name "Bill" 20 to 1. So, it appears that nicknames are under-represented in the statistics, and full formal names are over-represented.
Fourth, not every name is on the list. A name had to appear a minimum number of times to make the list at all, and about 10% of all responses were left off because their names appeared too few times. So, uncommon names are not represented on the list.
Fifth, we failed to make the required blood sacrifices to the gods of programming and statistics. Surely they will plague our endeavor with errors and inaccuracies.
Q: Can I use data from this site in my report / project / masters thesis / wikipedia article?
A: Sure. However, we take no responsibility for any merciless mocking from your teachers and/or peers for using questionable data. Kidding aside, please don't cite this in any sort of scholarly or semi-scholarly setting. It really isn't accurate enough to be used as a serious source.