Don’t Know Much…–Knowledge Discovery in Data

A short while ago I found myself in an odd venue where a question was posed about my being an educated individual, as if it were an accusation.  Yes, I replied, but then, after giving it some thought, I made some qualifications to my response.  Educated regarding what?

It seems that, despite a little more than a century of public education and widespread advanced education having been adopted in the United States, along with the resulting advent of widespread literacy, that we haven’t entirely come to grips with what it means.  For the question of being an “educated person” has its roots in an outmoded concept–an artifact of the 18th and 19th century–where education was delineated, and availability determined, by class and profession.  Perhaps this is the basis for the large strain of anti-intellectualism and science denial in the society at large.

Virtually everyone today is educated in some way.  Being “educated” means nothing–it is a throwaway question, an affectation.  The question is whether the relevant education meets the needs of the subject being addressed.  An interesting discussion about this very topic is explored at Sam Harris’ blog in the discussion he held with amateur historian Dan Carlin.

In reviewing my own education, it is obvious that there are large holes in what I understand about the world around me, some of them ridiculously (and frustratingly) prosaic.  This shouldn’t be surprising.  For even the most well-read person is ignorant about–well–virtually everything in some manner.  Wisdom is reached, I think, when you accept that there are a few things that you know for certain (or have a high probability and level of confidence in knowing), and that there are a host of things that constitute the entire library of knowledge encompassing anything from a particular domain to that of the entire universe, which you don’t know.

To sort out a well read dilettante from someone who can largely be depended upon to speak with some authority on a topic, educational institutions, trade associations, trade unions, trade schools, governmental organizations, and professional organizations have established a system of credentials.  No system is entirely perfect and I am reminded (even discounting fraud and incompetence) that half of all doctors and lawyers–two professions that have effectively insulated themselves from rigorous scrutiny and accountability to the level of almost being a protected class–graduate in the bottom half of their class.  Still, we can sort out a real brain surgeon from someone who once took a course in brain physiology when we need medical care (to borrow an example from Sam Harris in the same link above).

Furthermore, in the less potentially life-threatening disciplines we find more variation.  There are credentialed individuals who constantly get things wrong.  Among economists, for example, I am more likely to follow those who got the last financial crisis and housing market crash right (Joe Stiglitz, Dean Baker, Paul Krugman, and others), and those who have adjusted their models based on that experience (Brad DeLong, Mark Thoma, etc.), than those who have maintained an ideological conformity and continuity despite evidence.  Science–both what are called the hard and soft sciences–demands careful analysis and corroborating evidence to be tied to any assertions in their most formalized contexts.  Even well accepted theories among a profession are contingent–open to new information and discovery that may modify, append, or displace them.  Furthermore, we can find polymaths and self-taught individuals who have equaled or exceeded credentialed peers.  In the end the proof is in the pudding.

My point here is threefold.  First, in most cases we don’t know what we don’t know.  Second, complete certainty is not something that exists in this universe, except perhaps at death.  Third, we are now entering a world where new technologies allow us to discover new insights in accessing previously unavailable or previously opaque data.

One must look back at the revolution in information over the last fifty years and its resulting effect on knowledge to see what this means in our day-to-day existence.  When I was a small boy in school we largely relied on the published written word.  Books and periodicals were the major means of imparting information, aside from collocated collaborative working environments, the spoken word, and the old media of magazines, radio, and television.  Information was hard to come by–libraries were limited in their collections and there were centers of particular domain knowledge segmented by geography.   Furthermore, after the introduction of television, society had developed  trusted sources and gatekeepers to keep the cranks and flimflam out.

Today, new media–including all forms of digitized information–has expanded and accelerated the means of transmitting information.  Unlike old media, books, and social networking, there are also fewer gatekeepers in new media: editors, fact checkers, domain experts, credentialed trusted sources, etc. that ensure quality control, reliability, fidelity of the information, and provide context.  It’s the wild west of information and those wooed by the voodoo of self-organization contribute to the high risk associated with relying on information provided through these sources.  Thus, organizations and individuals who wish to stay within the fact-based community have had to sort out reliable, trusted sources and, even in these cases, develop–for lack of a better shorthand–BS detectors.  There are two purposes to this exercise: to expand the use of the available data and leverage the speed afforded by new media, and to ensure that the data is reliable and can reliably tell us something important about our subject of interest.

At the level of the enterprise, the sector, or the project management organization, we similarly are faced with the situation in which the scope of data that can be converted into information is rapidly expanding.  Unlike the larger information market, this data on the microeconomic level is more controlled.  Given that data at this level suffers from significance because it records isolated events, or small sample sizes, the challenge has been to derive importance from data where sometimes significance is minimal.

Furthermore, our business systems, because of the limitations of the selected technology, have been self-limiting.  I come across organizations all the time who cannot imagine the incorporation and integration of additional data sets largely because the limitations of their chosen software solution has inculcated that approach–that belief–into the larger corporate culture.  We do not know what we do not know.

Unfortunately, it’s what you do not know that, more often than not, will play a significant role in your organization’s destiny, just as an individual that is more self-aware is better prepared to deal with the challenges that manifest themselves as risk and its resultant probabilities.  Organizations must become more aware and look at things differently, especially since so many of the more conventional means of determining risk and opportunities seems to be failing to keep up with the times, which is governed by the capabilities of new media.

This is the imperative of applying knowledge discovery in data at the organizational and enterprise level–and in shifting one’s worldview from focusing on the limitations of “tools”: how they paint a screen, whether data is displayed across the x or y axis, what shade of blue indicates good performance, how many keystrokes does it take to perform an operation, and all manner of glorified PowerPoint minutia–to a focus on data:  the ability of solutions to incorporate more data, more efficiently, more quickly, from a wider range of sources, and processed in a more effective manner, so that it is converted into information to be able to be used to inform decision making at the most decisive moment.

One thought on “Don’t Know Much…–Knowledge Discovery in Data

  1. Nick,

    Good perspective. The best an educated person can hope for is that they do understand that they don’t know everything and are open to new ideas and concepts. I’ve known a number of people, some of whom had impressive degree designations, absolutely supreme in believing they were right beyond a doubt – “Yes, the World is flat!”

    I liked the article.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s