When I was first considering going to college, I had access to one of the first databases of U.S. colleges and universities. This software was introduced well before the Internet and involved a series of reductive queries: it started with all schools in the database and then progressively reduced the set based on factors such as geography, majors available, tuition, and so on. I used this system to construct a short list of schools to consider and thought that the decision would be easy. Then I started talking to my friends. I found out that everyone seemed to be using different criteria. Some were considering the ratio of male to female students (which, I will admit, hadn’t occurred to me but seemed pretty important), while others were looking at the extracurricular activities available near each campus. There was also a whole host of information I would have loved to know, such as the exact criteria used for acceptance and the acceptance rate per 100 applications, but that information was confidential and never disclosed.
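The reductive-query process that software performed can be sketched in a few lines. This is only an illustration, not the original system: the school records, field names, and filter thresholds below are all hypothetical, chosen to mirror the criteria mentioned above (geography, majors, tuition).

```python
# Hypothetical records standing in for the college database.
schools = [
    {"name": "State U", "region": "Midwest",
     "majors": {"CS", "Biology"}, "tuition": 12000},
    {"name": "Coastal College", "region": "West",
     "majors": {"CS", "Art"}, "tuition": 30000},
    {"name": "Tech Institute", "region": "Midwest",
     "majors": {"CS", "Engineering"}, "tuition": 18000},
]

# Each query step removes candidates that fail one criterion,
# progressively shrinking the set: geography, then majors, then tuition.
candidates = [s for s in schools if s["region"] == "Midwest"]
candidates = [s for s in candidates if "CS" in s["majors"]]
candidates = [s for s in candidates if s["tuition"] <= 20000]

short_list = [s["name"] for s in candidates]
print(short_list)
```

The point of the anecdote still holds against this sketch: the short list depends entirely on which filters you thought to apply, and any criterion absent from the data (or from your query) never enters the decision at all.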
I had inadvertently stumbled on my first real “big data” experience, with the disparate sets of data that are always present and the danger of rushing to a decision just because I had “enough” data to make one. Having enough data to make any decision, something that can be referred to as a dispositive threshold, does not guarantee that you have enough data to make a good decision. Rushing to make a decision with the data you have, simply because you can, is possibly one of the biggest “big data” mistakes. Unless you take the time and effort to analyze the merits and implications of using the data in hand, there is no defensible premise that such data is sufficient or appropriate for making any particular decision.