It’s spring training time in sunny Florida, as well as in other areas of the country with mild weather and baseball. For those of you new to the allusion, it comes from a poem by Franklin Pierce Adams, also known as “Baseball’s Sad Lexicon”. Tinker, Evers, and Chance were the double play combination of the 1910 Chicago Cubs (shortstop, second base, and first base). Because of their effectiveness on the field, these Cubs were worthy opponents of the old New York Giants, of whom Adams was a fan, and who were the kings of baseball during most of the first two decades of the modern era (1901-1922). That is, until they were suddenly overtaken by their crosstown rivals, the Yankees, who, beginning with the arrival of Babe Ruth, came to dominate baseball for the next 40 years.
The analogy here is that the Cubs infielders, though individual players, didn’t think of their roles as completely separate. They had common goals and, in order to win on the field, needed to act as a unit. In executing the double play, they were a very effective unit. So why do we maintain these divisions in information management when the goals are the same?
Much has been written, both academically and commercially, about Business Intelligence, Business Analytics, and Knowledge Discovery in Databases. I’ve surveyed the literature, for good and for bad, and what I find is that these terms are thrown around, mostly by commercial firms in either information technology or consulting, all with the purpose of providing a discriminator for their technology or service. Many times the concepts are used interchangeably, or one is set up as a straw man to push an agenda or product. Thus, it seems some hard definitions are in order.
According to Techopedia:
Business Intelligence (BI) is the use of computing technologies for the identification, discovery and analysis of business data – like sales revenue, products, costs and incomes.
Business analytics (BA) refers to all the methods and techniques that are used by an organization to measure performance. Business analytics are made up of statistical methods that can be applied to a specific project, process or product. Business analytics can also be used to evaluate an entire company.
Knowledge Discovery in Databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique is a process that includes data preparation and selection, data cleansing, incorporating prior knowledge on data sets and interpreting accurate solutions from the observed results.
As with much of computing in its first phases, these functions were seen to be separate.
BI, based largely on the manner in which it was implemented in its first incarnations, is perceived as a means of gathering data into relational data warehouses or data marts and then building out decision support systems. These methods have usually involved a great deal of overhead in both computing and personnel, since the practical elements of gathering, sorting, and delivering data involved additional coding and highly structured user interfaces. The advantage of BI is its emphasis on integration. The disadvantage, from the enterprise perspective, is that the method and mode of implementation is phlegmatic at best.
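To make that pattern concrete, here is a minimal sketch of the classic BI cycle: load operational records into a relational data mart, then answer a canned decision-support question with SQL. It uses Python with an in-memory SQLite database, and every table, column, and figure is hypothetical.

```python
# A minimal sketch of the classic BI pattern: gather operational
# records into a relational data mart, then answer a decision-support
# question with SQL. All names and figures are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the data mart

conn.execute("""
    CREATE TABLE sales_fact (
        region     TEXT,
        product    TEXT,
        fiscal_qtr TEXT,
        revenue    REAL,
        cost       REAL
    )
""")

# The "gather" step: operational records land in the warehouse schema.
operational_feed = [
    ("East", "Widget-A", "FY15Q1", 120000.0, 80000.0),
    ("East", "Widget-B", "FY15Q1",  95000.0, 70000.0),
    ("West", "Widget-A", "FY15Q1", 143000.0, 90000.0),
]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 operational_feed)

# The "deliver" step: a canned roll-up of the kind a decision support
# dashboard would surface, here margin by region.
for region, margin in conn.execute("""
        SELECT region, SUM(revenue - cost) AS margin
        FROM sales_fact
        GROUP BY region
        ORDER BY margin DESC"""):
    print(f"{region}: {margin:,.0f}")
```

Note where the overhead lives: someone had to design the schema and code the roll-up before the first question could be answered, which is exactly the ponderousness described above.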
BA is BI’s younger cousin. Applications were developed and sold as “analytical tools” focused on a niche of data within the enterprise’s requirements. In this manner decision makers could avoid having to wait for the overarching and ponderous BI system to get to their needs, if it ever did. This led many companies to knit together specialized tools in so-called “best-of-breed” configurations to achieve some measure of integration across domains. Of course, given the plethora of innovative tools, much data import and reconciliation has had to be inserted into the process. Thus, the advantages of BA in the market have been to reward innovation and focus on the needs of the domain subject matter expert (SME). The disadvantages are the insertion of manual intervention into an automated process due to lack of integration, further exacerbated by so-called SMEs in data reconciliation–a form of rent-seeking behavior that only rewards body-shop consulting, unnecessarily driving up overhead. The panacea applied to this last disadvantage has been the adoption of non-proprietary XML schemas across entire industries, reducing both the overhead and the data silos found in the BA market.
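To illustrate why a shared, non-proprietary schema reduces that reconciliation overhead, here is a small sketch: when every tool in an industry emits and consumes the same element names, each consumer computes its measures directly from the exchange document, with no manual re-mapping in between. The schema, tags, and figures below are invented for the example.

```python
# Consuming a hypothetical, non-proprietary exchange schema: because
# every tool agrees on the element names, no reconciliation glue code
# sits between producer and consumer.
import xml.etree.ElementTree as ET

exchange_doc = """
<productPerformance>
  <product id="Widget-A">
    <revenue>120000.0</revenue>
    <cost>80000.0</cost>
  </product>
  <product id="Widget-B">
    <revenue>95000.0</revenue>
    <cost>70000.0</cost>
  </product>
</productPerformance>
"""

root = ET.fromstring(exchange_doc)
for product in root.findall("product"):
    revenue = float(product.findtext("revenue"))
    cost = float(product.findtext("cost"))
    # Any consumer of the schema computes margin the same way; the
    # integration point is the schema itself, not custom import code.
    print(product.get("id"), "margin =", revenue - cost)
```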
KDD is both our oldster and our youngster–grandpa and grandson hanging out. It is a term that describes a necessary function of insight–allowing the data to tell us what analytics are needed, rather than relying on a “canned” solution to determine how to approach a particular set of data. But it often does so using an older approach that predates BI, known as data mining. You will often find KDD linked to arguments in favor of flat file schemas, NoSQL (meaning flat, non-relational databases), and free use of the term Big Data, which becomes more meaningless each year it is used, given Moore’s Law. The advantage of KDD is that it allows for surveying across datasets to pick up patterns and interrelationships within our systems that are otherwise unknown, particularly given the way in which the human mind can fool itself into reifying an invalid assumption. The disadvantage, of course, is that KDD would have us go backward in identifying and categorizing data by employing data mining, a concept from early in computing in which a team of data scientists and data managers develops solutions to identify, categorize, and use data–manually doing what automation was designed to do. Understanding these limitations, companies focused on KDD have developed heuristics (cognitive computing) that identify patterns and possible linkages, removing a portion of the overhead associated with data mining.
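Here is a toy example of the KDD mindset, letting the data speak first: rather than posing a pre-set question to a canned report, the sketch sweeps every pair of measures for strong correlations and surfaces whatever turns up, a crude stand-in for the pattern-finding heuristics just mentioned. The measures and values are invented, and the correlation function used here requires Python 3.10 or later.

```python
# A toy KDD pass: survey the data for relationships we did not think
# to ask about, instead of running a pre-defined report. The measures
# and their values are hypothetical.
from itertools import combinations
from statistics import correlation  # Python 3.10+

measures = {
    "support_tickets": [12, 18, 9, 22, 30, 15],
    "deploy_count":    [3, 5, 2, 6, 8, 4],
    "revenue_k":       [110, 95, 120, 90, 85, 105],
}

# Sweep every pair of measures and flag strong linear relationships.
for (name_a, xs), (name_b, ys) in combinations(measures.items(), 2):
    r = correlation(xs, ys)
    if abs(r) > 0.8:  # simple threshold for "interesting"
        print(f"{name_a} vs {name_b}: r = {r:+.2f}")
```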
Keep in mind that you never get anything for nothing–the Second Law of Thermodynamics ensures that energy must be borrowed from somewhere in order to produce something–and its corollaries place limits on expected efficiencies. While computing comes as close to providing us with Maxwell’s Demon as any technology, even here the entropy is simply realized elsewhere (in the software developer’s effort and the hardware manufacturing process), though it is not fully apparent in the observed data processing.
Thus, manual effort must be expended somewhere along the way. In any case, all of these methods address the same problem–the conversion of data into information: information that people can consume, understand, place into context, and act upon.
As my colleague Dave Gordon has pointed out to me several times, additional techniques have been developed across all of these methods to make our use of data more effective. These include more powerful APIs, the aforementioned cognitive computing, and search based on the anticipated questions of the user, as employed by search engines.
Technology, however, is moving very rapidly, and so the lines between BI, BA, and KDD are becoming blurred. Fourth-generation technology that leverages API libraries to remain agnostic to the underlying data, combined with flexible and adaptive UI technology, can provide a comprehensive, systemic solution that brings together the goals of these approaches to data. With the ability to leverage internal relational database tools and flat schemas for non-relational databases, the application layer, which is oftentimes a barrier to the delivery of information, becomes open as well, putting the SME back in the driver’s seat. Being able to integrate data across domain silos provides insight into systems behavior and performance not previously available with “canned” applications written to handle and display data in a particular way, opening up knowledge discovery in the data.
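As a rough sketch of what “agnostic to the underlying data” can look like in practice, consider a thin adapter interface fronting both a relational store and a flat document store, so that the layers above consume one stream of records regardless of where, or how, those records live. Class names and data are hypothetical.

```python
# Two storage back ends behind one thin interface: the application
# layer no longer dictates how data is shaped or where it lives.
# All names and records are hypothetical.
import json
import sqlite3

class RelationalSource:
    """Adapter over a relational store (SQLite stands in here)."""
    def __init__(self, conn):
        self.conn = conn

    def records(self, name):
        cur = self.conn.execute(f"SELECT * FROM {name}")
        cols = [c[0] for c in cur.description]
        return [dict(zip(cols, row)) for row in cur]

class DocumentSource:
    """Adapter over a flat, non-relational (document) store."""
    def __init__(self, raw_json):
        self.collections = json.loads(raw_json)

    def records(self, name):
        return self.collections.get(name, [])

def integrate(sources, name):
    # Union records across silos; downstream analytics never knows
    # which store each record came from.
    merged = []
    for source in sources:
        merged.extend(source.records(name))
    return merged

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id TEXT, pct_complete REAL)")
conn.execute("INSERT INTO tasks VALUES ('1.1', 0.75)")
flat_store = '{"tasks": [{"id": "1.2", "pct_complete": 0.40}]}'

print(integrate([RelationalSource(conn), DocumentSource(flat_store)],
                "tasks"))
```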
What this means practically is that organizations sensitive to these changes will understand the practical application of sunk cost when it comes to aging systems provided by ponderous behemoths that lack the agility to introduce more flexible, less costly, and lower-overhead software technologies. It means that information management can be democratized within the organization among its essential consumers and decision makers.
Productivity and effectiveness are the goals.