Like Tinker to Evers to Chance: BI to BA to KDD

It’s spring training time in sunny Florida, as well as other areas of the country with mild weather and baseball.  For those of you new to the allusion, it comes from a poem by Franklin Pierce Adams and is also known as “Baseball’s Sad Lexicon”.  Tinker, Evers, and Chance were the double play combination of the 1910 Chicago Cubs (shortstop, second base, and first base).  Because of their effectiveness on the field, these Cubs players were worthy opponents of the old New York Giants, of whom Adams was a fan, and who were the kings of baseball during most of the first fifth of a century of the modern era (1901-1922).  That is, until they were suddenly overtaken by their crosstown rivals, the Yankees, who came to dominate baseball for the next 40 years, beginning with the arrival of Babe Ruth.

The analogy here is that the Cubs infielders, while individuals, didn’t think of their roles as completely separate.  They had common goals and, in order to win on the field, needed to act as a unit.  In the case of executing the double play, they were a very effective unit.  So why do we have these dichotomies in information management when the goals are the same?

Much has been written both academically and commercially about Business Intelligence, Business Analytics, and Knowledge Discovery in Databases.  I’ve surveyed the literature, for good and bad, and what I find is that these terms are thrown around, mostly by commercial firms in either information technology or consulting, all with the purpose of attempting to provide a discriminator for their technology or service.  Many times the concepts are used interchangeably, or one is set up as a strawman to push an agenda or product.  Thus, it seems some hard definitions are in order.

According to Techopedia:

Business Intelligence (BI) is the use of computing technologies for the identification, discovery and analysis of business data – like sales revenue, products, costs and incomes.

Business analytics (BA) refers to all the methods and techniques that are used by an organization to measure performance. Business analytics are made up of statistical methods that can be applied to a specific project, process or product. Business analytics can also be used to evaluate an entire company.

Knowledge Discovery in Databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique is a process that includes data preparation and selection, data cleansing, incorporating prior knowledge on data sets and interpreting accurate solutions from the observed results.
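To make the KDD definition a little more concrete, here is a minimal sketch of its steps (selection, cleansing, discovery, and interpretation against prior knowledge) in plain Python.  Everything in it is an assumption for illustration: the records, the field names, the function names, and the 0.7 threshold are invented, and a real effort would use far more data and more careful statistics.

```python
from statistics import mean

# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"project": "A", "cost_variance": "-4.0", "schedule_slip_days": "12", "pm": "Smith"},
    {"project": "B", "cost_variance": "1.5", "schedule_slip_days": "0", "pm": "Jones"},
    {"project": "C", "cost_variance": "", "schedule_slip_days": "30", "pm": "Lee"},  # missing value
    {"project": "D", "cost_variance": "-9.0", "schedule_slip_days": "25", "pm": "Kim"},
    {"project": "E", "cost_variance": "0.5", "schedule_slip_days": "2", "pm": "Diaz"},
]


def select(records, fields):
    """Selection: keep only the fields relevant to the question being asked."""
    return [{f: r[f] for f in fields} for r in records]


def cleanse(records, numeric_fields):
    """Cleansing: drop rows with missing values and convert text to numbers."""
    cleaned = []
    for r in records:
        if any(r[f] == "" for f in numeric_fields):
            continue
        cleaned.append({**r, **{f: float(r[f]) for f in numeric_fields}})
    return cleaned


def discover(records, x, y):
    """Discovery: Pearson correlation between two numeric fields."""
    xs = [r[x] for r in records]
    ys = [r[y] for r in records]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


def interpret(r, threshold=0.7):
    """Interpretation: apply prior knowledge (here, an assumed threshold)."""
    if abs(r) >= threshold:
        return "strong relationship worth investigating"
    return "no strong relationship"


numeric = ["cost_variance", "schedule_slip_days"]
clean = cleanse(select(RAW, ["project"] + numeric), numeric)
r = discover(clean, "cost_variance", "schedule_slip_days")
print(len(clean), "usable records;", interpret(r))
```

The point of the sketch is only that each step named in the definition is a distinct piece of work, and that someone still has to decide which fields matter and what counts as a meaningful result.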

As with much of computing in its first phases, these functions were seen to be separate.

BI, based largely on the manner in which it has been implemented in its first incarnations, is viewed as a means of gathering data into relational data warehouses or data marts and then building out decision support systems.  These methods have usually involved a great deal of overhead in both computing and personnel, since practical elements of gathering, sorting, and delivering data involved additional coding and highly structured user interfaces.  The advantage of BI is its emphasis on integration.  The disadvantage, from the enterprise perspective, is that the method and mode of implementation are phlegmatic at best.

BA is BI’s younger cousin.  Applications were developed and sold as “analytical tools” focused on a niche of data within the enterprise’s requirements.  In this manner decision makers could avoid having to wait for the overarching and ponderous BI system to get to their needs, if ever.  This led many companies to knit together specialized tools in so-called “best-of-breed” configurations to achieve some measure of integration across domains.  Of course, given the plethora of innovative tools, much data import and reconciliation has had to be inserted into the process.  Thus, the advantages of BA in the market have been to reward innovation and focus on the needs of the domain subject matter expert (SME).  The disadvantages are the insertion of manual intervention in an automated process due to lack of integration, which is further exacerbated by so-called SMEs in data reconciliation–a form of rent seeking behavior that only rewards body shop consulting, unnecessarily driving up overhead.  The panacea applied to this last disadvantage has been the adoption of non-proprietary XML schemas across entire industries that reduce both the overhead and data silos found in the BA market.

KDD is both our oldster and youngster–grandpa and the grandson hanging out.  It is a term that describes a necessary function of insight–allowing one to determine what the data tells us is needed for analytics rather than relying on a “canned” solution to determine how to approach a particular set of data.  But it does so, oftentimes, using an older approach that predates BI, known as data mining.  You will often find KDD linked to arguments in favor of flat file schemas, NoSQL (meaning flat non-relational databases), and free use of the term Big Data, which is becoming more meaningless each year that it is used, given Moore’s Law.  The advantage of KDD is that it allows for surveying across datasets to pick up patterns and interrelationships within our systems that are otherwise unknown, particularly given the way in which the human mind can fool itself into reifying an invalid assumption.  The disadvantage, of course, is that KDD will have us go backward in terms of identifying and categorizing data by employing Data Mining, which is an older concept from early in computing in which a team of data scientists and data managers develop solutions to identify, categorize, and use that data–manually doing what automation was designed to do.  Understanding these limitations, companies focused on KDD have developed heuristics (cognitive computing) that identify patterns and possible linkages, removing a portion of the overhead associated with Data Mining.

Keep in mind that you never get anything for nothing–the Second Law of Thermodynamics ensures that energy must be borrowed from somewhere in order to produce something–and its corollaries place limits on expected efficiencies.  While computing itself comes as close to providing us with Maxwell’s Demon as any technology, even in this case entropy is being realized elsewhere (in the software developer and the hardware manufacturing process), even though it is not fully apparent in the observed data processing.

Thus, manual effort must be expended somewhere along the way.  In any sense, all of these methods are addressing the same problem–the conversion of data into information.  It is information that people can consume, understand, place into context, and act upon.

As my colleague Dave Gordon has pointed out to me several times, there are also additional methods that have been developed across all of these approaches to make our use of data more effective.  These include more powerful APIs, the aforementioned cognitive computing, and searching based on the anticipated questions of the user, as is done by search engines.

Technology, however, is moving very rapidly and so the lines between BI, BA and KDD are becoming blurred.  Fourth generation technology that leverages API libraries to be agnostic to underlying data, and flexible and adaptive UI technology, can provide a comprehensive systemic solution to bring together the goals of these approaches to data.  With the ability to leverage internal relational database tools and flat schemas for non-relational databases, the application layer, which is oftentimes a barrier to delivery of information, becomes open as well, putting the SME back in the driver’s seat.  Being able to integrate data across domain silos provides insight into systems behavior and performance not previously available with “canned” applications written to handle and display data a particular way, opening up knowledge discovery in the data.

What this means practically is that those organizations that are sensitive to these changes will understand the practical application of sunk cost when it comes to aging systems being provided by ponderous behemoths that lack agility in their ability to introduce more flexible, less costly, and lower overhead software technologies.  It means that information management can be democratized within the organization among the essential consumers and decision makers.

Productivity and effectiveness are the goals.

Something New (Again)– Top Project Management Trends 2017

Atif Qureshi at Tasque, whom I learned of via Dave Gordon’s blog, went out to LinkedIn’s Project Management Community to ask for the latest trends in project management.  You can find the raw responses to his inquiry at his blog here.  What is interesting is that some of these latest trends are much like the old trends which, given continuity, makes sense.  But it is instructive to summarize the ones that came up most often.  Note that while Mr. Qureshi was looking for ten trends, and taken together he definitely lists more than ten, there is a lot of overlap.  In total the major issues seem to be the five areas listed below.

a.  Agile, its hybrids, and its practical application.

It should not surprise anyone that the latest buzzword is Agile.  But what exactly is it in its present incarnation?  There is a great deal of rising criticism, much of it valid, that it is a way for developers and software PMs to avoid accountability.  Anyone reading Glen Alleman’s Herding Cats blog is aware of the issues regarding #NoEstimates advocates.  As a result, there are a number of hybrid implementations of Agile that have Agile purists howling and non-purists adapting as they always do.  From my observations, however, there is an Ur-Agile out there that is common to all good implementations, and I wrote about it previously in this blog back in 2015.  Given the time, I think it useful to repeat it here.

The best articulation of Agile that I have read recently comes from Neil Killick, with whom I have expressed some disagreement on the #NoEstimates debate and the more cultish aspects of Agile in past posts, but who published an excellent post back in July (2015) entitled “12 questions to find out: Are you doing Agile Software Development?”

Here are Neil’s questions:

  1. Do you want to do Agile Software Development? Yes – go to 2. No – GOODBYE.
  2. Is your team regularly reflecting on how to improve? Yes – go to 3. No – regularly meet with your team to reflect on how to improve, go to 2.
  3. Can you deliver shippable software frequently, at least every 2 weeks? Yes – go to 4. No – remove impediments to delivering a shippable increment every 2 weeks, go to 3.
  4. Do you work daily with your customer? Yes – go to 5. No – start working daily with your customer, go to 4.
  5. Do you consistently satisfy your customer? Yes – go to 6. No – find out why your customer isn’t happy, fix it, go to 5.
  6. Do you feel motivated? Yes – go to 7. No – work for someone who trusts and supports you, go to 2.
  7. Do you talk with your team and stakeholders every day? Yes – go to 8. No – start talking with your team and stakeholders every day, go to 7.
  8. Do you primarily measure progress with working software? Yes – go to 9. No – start measuring progress with working software, go to 8.
  9. Can you maintain pace of development indefinitely? Yes – go to 10. No – take on fewer things in next iteration, go to 9.
  10. Are you paying continuous attention to technical excellence and good design? Yes – go to 11. No – start paying continuous attention to technical excellence and good design, go to 10.
  11. Are you keeping things simple and maximising the amount of work not done? Yes – go to 12. No – start keeping things simple and writing as little code as possible to satisfy the customer, go to 11.
  12. Is your team self-organising? Yes – YOU’RE DOING AGILE SOFTWARE DEVELOPMENT!! No – don’t assign tasks to people and let the team figure out together how best to satisfy the customer, go to 12.
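Read closely, Neil’s list is a small flowchart: every “Yes” advances to the next question and every “No” sends you back somewhere.  A minimal sketch of that structure follows.  The “No” destinations are transcribed from the list above; the function names and the sample self-assessment are my own hypothetical additions, not anything from Neil’s post.

```python
# "No" destinations as written in the list above. None stands for GOODBYE.
# Note that question 6 sends you back to question 2, not to itself.
NO_GOTO = {1: None, 2: 2, 3: 3, 4: 4, 5: 5, 6: 2,
           7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12}
LAST_QUESTION = 12
DONE = "DOING AGILE"


def next_step(question, answer_is_yes):
    """Return the next question number, DONE after the final yes, or None for GOODBYE."""
    if answer_is_yes:
        return DONE if question == LAST_QUESTION else question + 1
    return NO_GOTO[question]


def first_blocker(answers):
    """Walk from question 1 and return the first question answered no, else DONE.

    `answers` maps question number -> bool (a hypothetical team self-assessment).
    """
    question = 1
    while question != DONE:
        if not answers[question]:
            return question
        question = next_step(question, True)
    return DONE


# Hypothetical team: everything is fine except frequent shippable delivery.
team = {q: True for q in range(1, 13)}
team[3] = False
print(first_blocker(team))
```

Laid out this way it is easier to see that the only exits are “GOODBYE” at the first question and success at the last; everything in between loops until the answer is yes.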

Note that even in software development based on Agile you are still “provid(ing) value by independently developing IP based on customer requirements.”  Only you are doing it faster and more effectively.

With the possible exception of the “self-organizing” meme, I find that items 1 through 11 are valid ways of identifying Agile.  Note, however, that the list says nothing about establishing closed-loop analysis of progress, and nothing about estimates or the need to monitor progress, especially on complex projects.  As a matter of fact, one of the biggest impediments noted elsewhere in industry is the inability of Agile to scale.  This limitation exists in its most simplistic form because Agile is fine in the development of well-defined limited COTS applications and smartphone applications.  It doesn’t work so well when one is pushing technology while developing software, especially for a complex project involving hundreds of stakeholders.  One other note–the unmentioned emphasis in Agile is technical performance measurement, since progress is based on satisfying customer requirements.  TPM, when placed in the context of a world of limited resources, is the best measure of all.

b.  The integration of new technology into PM and how to upload the existing PM corporate knowledge into that technology.

These are two sides of the same coin.  There is always debate about the introduction of new technologies within an organization and this debate places in stark contrast the differences between risk aversion and risk management.

Project managers, especially in the complex project management environment of aerospace & defense, tend, in general, to be a hardy lot.  Consisting mostly of engineers, they love to push the envelope on technology development.  But there is also a stripe of engineers among them that do not apply this same approach of measured risk to their project management and business analysis system.  When it comes to tracking progress, resource management, programmatic risk, and accountability they frequently enter the risk aversion mode–believing that the fewer eyes on what they do the more leeway they have in achieving the technical milestones.  No doubt this is true in a world of unlimited time and resources, but that is not the world in which we live.

Aside from sub-optimized self-interest, the seeds of risk aversion come from the fact that many of the disciplines developed around performance management originated in the financial management community, and many organizations still come at project management efforts from the perspective of the CFO organization.  Such rice bowl mentality, however, works against both the project and the organization.

Much has been made of the wall of honor for those CIA officers that have given their lives for their country, which lies to the right of the Langley headquarters entrance.  What has not gotten as much publicity is the verse inscribed on the wall to the left:

“And ye shall know the truth and the truth shall make you free.”

      John VIII-XXXII

In many ways those of us in the project management community apply this creed to the best of our ability to our day-to-day jobs, and it lies at the basis of all management improvement, from Deming’s concept of continuous process improvement through the application of Six Sigma and other management improvement methods.  What is not part of this concept is that one will apply improvement only when a customer demands it, though they have asked politely for some time.  The more information we have about what is happening in our systems, the better the project manager and the project team are armed to apply the expertise which qualified the individuals for their jobs to begin with.

When it comes to continual process improvement one does not need to wait to apply those technologies that will improve project management systems.  As a senior manager (and well-respected engineer) told me when I worked in the Navy: “If my program managers are doing their job virtually every element should be in the yellow, for only then do I know that they are managing risk and pushing the technology.”

But there are some practical issues that all managers must consider when managing the risks in introducing new technology and determining how to bring that technology into existing business systems without completely disrupting the organization.  This takes good project management practices that, for information systems, include good initial systems analysis, identification of those small portions of the organization ripe for initial entry in piloting, and a plan of data normalization and rationalization so that corporate knowledge is not lost.  Adopting more open systems that militate against proprietary barriers also helps.

c.  The intersection of project management and business analysis and its effects.

As data becomes more transparent through methods of normalization and rationalization–and the focus shifts from “tools” to the knowledge that can be derived from data–the clear separation that delineated project management from business analysis in line-and-staff organization becomes further blurred.  Even within the project management discipline, the separation of schedule analysts from cost analysts from financial analysts is becoming an impediment to fully exploiting the advantages of looking at all of the data that is captured and that affects project performance.

d.  The manner of handling Big Data, business intelligence, and analytics that result.

Software technologies are rapidly developing that break the barriers of self-contained applications that perform one or two focused operations or a highly restricted group of operations that provide functionality focused on a single or limited set of business processes through high level languages that are hard-coded.  These new technologies, as stated in the previous section, allow users to focus on access to data, making the interface between the user and the application highly adaptable and customizable.  As these technologies are deployed against larger datasets that allow for integration of data across traditional line-and-staff organizations, they will provide insight that will garner businesses competitive advantages and productivity gains against their contemporaries.  Because of these technologies, highly labor-intensive data mining and data engineering projects that were thought to be necessary to access Big Data will find themselves displaced as their cost and lack of agility are exposed.  Internal or contracted-out custom software development devoted along these same lines will also be displaced just as COTS has displaced the high overhead associated with these efforts in other areas.  This is due to the fact that hardware and process developments are constantly shifting the definition of “Big Data” to larger and larger datasets to the point where the term will soon have no practical meaning.

e.  The role of the SME given all of the above.

The result of the trends regarding technology will be to put the subject matter expert back into the driver’s seat.  Given adaptive technology and data–and a redefinition of the analyst’s role to a more expansive one–we will find that the ability to meet the needs of functionality and the user experience is almost immediate.  Thus, when it comes to business and project management systems, while these developments reinforce and make real the characteristics of Agile that I outlined above, the weakness of its applicability to more complex and technical projects is also revealed.  It is technology that will reduce the risk associated with contract negotiation, processes, documentation, and planning.  Walking away from these necessary components of project management obfuscates and avoids the hard facts that oftentimes must be addressed.

One final item that Mr. Qureshi mentions in a follow-up post–and which I have seen elsewhere in similar forums–concerns operational security.  In deployment of new technologies a gatekeeper must be aware of whether that technology will open the organization’s corporate knowledge to compromise.  Given the greater and more integrated information and knowledge garnered by new technology, it is incumbent upon us as good managers to ensure these improvements do not translate into undermining the organization.

Over at AITS.org — In Defense of Empiricism

What is the responsibility of those in high tech for ensuring that their products are used in an ethical manner?  That information management is a product of empiricism is self-evident.  Project and business managers who would delude themselves by relying on invalid information usually find themselves facing hard reality in a most unpleasant manner.  How do we separate out the fanciful from the real when information is flattened, highlighted by the issue, raised over the last year, of fake news?  These are the issues that I address in my latest post at AITS.org.  Please check it out.

Don’t Know Much…–Knowledge Discovery in Data

A short while ago I found myself in an odd venue where a question was posed about my being an educated individual, as if it were an accusation.  Yes, I replied, but then, after giving it some thought, I made some qualifications to my response.  Educated regarding what?

It seems that, despite a little more than a century of public education and widespread advanced education having been adopted in the United States, along with the resulting advent of widespread literacy, we haven’t entirely come to grips with what it means.  For the question of being an “educated person” has its roots in an outmoded concept–an artifact of the 18th and 19th centuries–where education was delineated, and availability determined, by class and profession.  Perhaps this is the basis for the large strain of anti-intellectualism and science denial in the society at large.

Virtually everyone today is educated in some way.  Being “educated” means nothing–it is a throwaway question, an affectation.  The question is whether the relevant education meets the needs of the subject being addressed.  An interesting discussion about this very topic is explored at Sam Harris’ blog in the discussion he held with amateur historian Dan Carlin.

In reviewing my own education, it is obvious that there are large holes in what I understand about the world around me, some of them ridiculously (and frustratingly) prosaic.  This shouldn’t be surprising.  For even the most well-read person is ignorant about–well–virtually everything in some manner.  Wisdom is reached, I think, when you accept that there are a few things that you know for certain (or have a high probability and level of confidence in knowing), and that there are a host of things that constitute the entire library of knowledge encompassing anything from a particular domain to that of the entire universe, which you don’t know.

To sort out a well read dilettante from someone who can largely be depended upon to speak with some authority on a topic, educational institutions, trade associations, trade unions, trade schools, governmental organizations, and professional organizations have established a system of credentials.  No system is entirely perfect and I am reminded (even discounting fraud and incompetence) that half of all doctors and lawyers–two professions that have effectively insulated themselves from rigorous scrutiny and accountability to the level of almost being a protected class–graduate in the bottom half of their class.  Still, we can sort out a real brain surgeon from someone who once took a course in brain physiology when we need medical care (to borrow an example from Sam Harris in the same link above).

Furthermore, in the less potentially life-threatening disciplines we find more variation.  There are credentialed individuals who constantly get things wrong.  Among economists, for example, I am more likely to follow those who got the last financial crisis and housing market crash right (Joe Stiglitz, Dean Baker, Paul Krugman, and others), and those who have adjusted their models based on that experience (Brad DeLong, Mark Thoma, etc.), than those who have maintained an ideological conformity and continuity despite evidence.  Science–both what are called the hard and soft sciences–demands careful analysis and corroborating evidence to be tied to any assertions in their most formalized contexts.  Even well accepted theories among a profession are contingent–open to new information and discovery that may modify, append, or displace them.  Furthermore, we can find polymaths and self-taught individuals who have equaled or exceeded credentialed peers.  In the end the proof is in the pudding.

My point here is threefold.  First, in most cases we don’t know what we don’t know.  Second, complete certainty is not something that exists in this universe, except perhaps at death.  Third, we are now entering a world where new technologies allow us to discover new insights in accessing previously unavailable or previously opaque data.

One must look back at the revolution in information over the last fifty years and its resulting effect on knowledge to see what this means in our day-to-day existence.  When I was a small boy in school we largely relied on the published written word.  Books and periodicals were the major means of imparting information, aside from collocated collaborative working environments, the spoken word, and the old media of magazines, radio, and television.  Information was hard to come by–libraries were limited in their collections and there were centers of particular domain knowledge segmented by geography.   Furthermore, after the introduction of television, society had developed  trusted sources and gatekeepers to keep the cranks and flimflam out.

Today, new media–including all forms of digitized information–has expanded and accelerated the means of transmitting information.  Unlike old media, books, and social networking, there are also fewer gatekeepers in new media: editors, fact checkers, domain experts, credentialed trusted sources, etc. that ensure quality control, reliability, fidelity of the information, and provide context.  It’s the wild west of information and those wooed by the voodoo of self-organization contribute to the high risk associated with relying on information provided through these sources.  Thus, organizations and individuals who wish to stay within the fact-based community have had to sort out reliable, trusted sources and, even in these cases, develop–for lack of a better shorthand–BS detectors.  There are two purposes to this exercise: to expand the use of the available data and leverage the speed afforded by new media, and to ensure that the data is reliable and can reliably tell us something important about our subject of interest.

At the level of the enterprise, the sector, or the project management organization, we similarly are faced with the situation in which the scope of data that can be converted into information is rapidly expanding.  Unlike the larger information market, this data on the microeconomic level is more controlled.  Given that data at this level suffers from a lack of significance because it records isolated events, or small sample sizes, the challenge has been to derive importance from data where sometimes significance is minimal.

Furthermore, our business systems, because of the limitations of the selected technology, have been self-limiting.  I come across organizations all the time that cannot imagine the incorporation and integration of additional data sets, largely because the limitations of their chosen software solution have inculcated that approach–that belief–into the larger corporate culture.  We do not know what we do not know.

Unfortunately, it’s what you do not know that, more often than not, will play a significant role in your organization’s destiny, just as an individual that is more self-aware is better prepared to deal with the challenges that manifest themselves as risk and its resultant probabilities.  Organizations must become more aware and look at things differently, especially since so many of the more conventional means of determining risk and opportunities seem to be failing to keep up with the times, which are governed by the capabilities of new media.

This is the imperative of applying knowledge discovery in data at the organizational and enterprise level–and in shifting one’s worldview from focusing on the limitations of “tools”: how they paint a screen, whether data is displayed across the x or y axis, what shade of blue indicates good performance, how many keystrokes it takes to perform an operation, and all manner of glorified PowerPoint minutiae–to a focus on data:  the ability of solutions to incorporate more data, more efficiently, more quickly, from a wider range of sources, and processed in a more effective manner, so that it is converted into information that can be used to inform decision making at the most decisive moment.

The Monster Mash — Zombie Ideas in Project and Information Management

Just completed a number of meetings and discussions among thought leaders in the area of complex project management this week, and I was struck by a number of zombie ideas in project management, especially related to information, that just won’t die.  The use of the term zombie idea is usually attributed to the Nobel economist Paul Krugman from his excellent and highly engaging (as well as brutally honest) posts at the New York Times, but for those not familiar, a zombie idea is “a proposition that has been thoroughly refuted by analysis and evidence, and should be dead — but won’t stay dead because it serves a political purpose, appeals to prejudices, or both.”

The point, to a techie–or anyone engaged in intellectual honesty–is that these ideas are often posed in the form of question begging, that is, they advance invalid assumptions in the asking or the telling.  Most often they take the form of the assertive half of the same coin derived from “when did you stop beating your wife?”-type questions.  I’ve compiled a few of these for this post and it is important to understand the purpose for doing so.  It is not to take individuals to task or to bash non-techies–who have a valid reason to ask basic questions based on what they’ve heard–but to address propositions put forth by people who should know better based on their technical expertise or experience.  Furthermore, knowing and understanding technology and its economics is really essential today to anyone operating in the project management domain.

So here are a few zombies that seem to be most common:

a.  More data equals greater expense.  I dealt with this issue in more depth in a previous post, but it’s worth repeating here:  “When we inform Moore’s Law by Landauer’s Principle, that is, that the energy expended in each additional bit of computation becomes vanishingly small, it becomes clear that the difference in cost in transferring a MB of data as opposed to a KB of data is virtually TSTM (“too small to measure”).”  The real reason why we continue to deal with this assertion is both political in nature and also based in social human interaction.  People hate oversight and they hate to be micromanaged, especially to the point of disrupting the work at hand.  We see behavior, especially in regulatory and contractual relationships, where the reporting entity plays the game of “hiding the button.”  This behavior is usually justified by pointing to examples of dysfunction, particularly on the part of the checker, where information submissions lead to the abuse of discretion in oversight and management.  Needless to say, while such abuse does occur, no one has yet pointed quantitatively to data (as opposed to anecdotally) that show how often this happens.
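For anyone who wants to see the scale involved, Landauer’s Principle puts the theoretical minimum energy to erase one bit at k·T·ln 2.  The short sketch below works that out for a KB and a MB.  The room temperature of 300 K and the decimal definitions of KB and MB are my assumptions, and real hardware dissipates many orders of magnitude more than this floor, so treat it as an illustration of the physical limit rather than a measurement of any actual system.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact in the SI)
ROOM_TEMP_K = 300.0               # assumed operating temperature

KB_BITS = 8 * 1000        # assumed decimal kilobyte
MB_BITS = 8 * 1000 ** 2   # assumed decimal megabyte


def landauer_joules(n_bits, temp_k=ROOM_TEMP_K):
    """Theoretical minimum energy (joules) to erase n_bits at temperature temp_k."""
    return n_bits * BOLTZMANN_J_PER_K * temp_k * math.log(2)


kb_energy = landauer_joules(KB_BITS)
mb_energy = landauer_joules(MB_BITS)

print(f"per bit: {landauer_joules(1):.3e} J")
print(f"1 KB:    {kb_energy:.3e} J")
print(f"1 MB:    {mb_energy:.3e} J")
print(f"MB - KB: {mb_energy - kb_energy:.3e} J")
```

The thousandfold difference in size is real, but at this floor both figures are so small that neither would register on any budget.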

I would hazard a guess that virtually anyone with some experience has had to work for a bad boss, where every detail and nuance is microscopically interrogated to the point where it becomes hard to make progress on the task at hand.  Such individuals, who have been advanced under the Peter Principle, must no doubt be removed from such a position.  But this happens in any organization, whether in private enterprise–especially in places where there is no oversight, checks and balances, means of appeal, or accountability–or in government, and it is irrelevant to the assertion.  The expense item being described is bad management, not excess data.  Thus, such assertions are based on the antecedent assumption of bad management, which goes hand-in-hand with…

b. More information is the enemy of efficiency.  This is the other half of the economic argument that more data equals greater expense.  And I failed to mention that where the conflict has been engaged over these issues, some unjustifiable figure is given for the cost of the additional data that is certainly not supported by the high tech economics cited above.  Both of these perspectives also draw on the non-techie conception that more data and information require the equivalent of pre-digital effort, especially in conceptualizing the work that often went into human-readable reports.  This is really an argument that supports the assertion that it is time to shift the focus from fixed report formatting functionality in software based on limited data to complete data, which can be formatted and processed as necessary.  If the right and sufficient information is provided up-front, then additional questions and interrogatories that demand supplemental data and information–with the attendant multiplication of data streams and data islands that truly do add cost and drive inefficiency–are at least significantly reduced, if not eliminated.

c.  Data size adds unmanageable complexity.  This was actually put forth by another software professional–and no doubt the non-techies in the room would have nodded their heads in agreement (particularly given a and b above), if opposing expert opinion hadn’t been offered.  Without putting too fine a point on it, a techie saying this to an open forum is equivalent to whining that your job is too hard.  This will get you ridiculed at development forums, where you will be viewed as an insufferable dilettante.  Digitized technology for well over 40 years has been operating under the phenomenon of Moore’s Law.  Under this law, computational and media storage capability doubles at least every two years under the original definition, though that equation has accelerated to somewhere between 12 and 24 months.  Thus, what was considered big data, say, in 1997 when NASA first coined the term, is not considered big data today.  No doubt, what is considered big data this year will not be considered big data two years from now.  Thus, the term itself is relative and may very well become archaic.  The manner in which data is managed–its rationalization and normalization–is important in successfully translating disparate data sources, but the assertion that big is scary is simply fear mongering because you don’t have the goods.
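As a rough illustration of why “big” is relative, the doubling arithmetic can be sketched in a few lines of Python.  The function name, the 18-year span, and the doubling periods below are assumptions chosen for illustration, not measured figures.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Return how many times capability multiplies over `years`
    if it doubles once every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


# Hypothetical span: 18 years after 1997 (an assumption for illustration).
span_years = 18

# Doubling periods of 24, 18, and 12 months, expressed in years.
for period in (2.0, 1.5, 1.0):
    factor = growth_factor(span_years, period)
    print(f"doubling every {period} years: about {factor:,.0f}x over {span_years} years")
```

Even under the slowest of these assumed doubling periods, the multiple runs into the hundreds, which is why a data set that strained the hardware at the start of the span looks unremarkable by the end of it.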

d.  Big data requires more expensive and sophisticated approaches.  This flows from item c above as well and is often self-serving.  Scare stories abound, often using big numbers which sound scary.  All data that has a common use across domains has to be rationalized at some point if it comes from disparate sources, and there are a number of efficient software techniques for accomplishing this.  Furthermore, support for agnostic APIs and common industry standards, such as the UN/CEFACT XML, takes much of the rationalization and normalization work out of a manual process.  Yet I have consistently seen suboptimized methods being put forth that require an army of data scientists and coders to essentially engage in brute force data mining–a methodology that has been around for almost 30 years, except that now it carries with it the moniker of big data.  Needless to say this approach is probably the most expensive and slowest out there.  But then, the motivation for its use by IT shops is usually based in rice bowl and resource politics.  This is flimflam–an attempt to revive an old zombie under a new name.  When faced with such assertions, see Moore’s Law and keep on looking for the right answer.  It’s out there.

e.  Performance management and assessment is an unnecessary “regulatory” expense.  This one keeps coming up as part of a broader political agenda beyond just project management.  I’ve discussed in detail the issues of materiality and prescriptiveness in regulatory regimes here and here, and have addressed the obvious legitimacy of organizations to establish one in fiduciary, contractual, and governmental environments.

My usual response to the assertion of expense is to simply point to the unregulated derivatives market largely responsible for the financial collapse, and the deep economic recession that followed once the housing bubble burst (and, aside from the cost of human suffering and joblessness, the expenses related to TARP).  Thus we know how well the deregulation of banking went.  Even after the Band-Aid of Dodd-Frank the situation probably requires a bit more vigor, and should include the ratings agencies as well as the real estate market.  But here is the fact of the matter: such expenses cannot be monetized as additive because “regulatory” expenses usually represent an assessment of the day-to-day documentation, systems, and procedures required when performing normal business operations and due diligence in management.  I attended an excellent presentation last week where the speaker, tasked with finding unnecessary regulatory expenses, admitted as much.

Thus, what we are really talking about is an expense that is an essential prerequisite to entry in a particular vertical, especially where monopsony exists as a result of government action.  Moral hazard, then, is defined by the inherent risk assumed by contract type, and should be assessed on those terms.  Given that the current trend is to raise thresholds, the question is going to be–in the government sphere–whether public opinion will be as forgiving in a situation where moral hazard assumes $100M in risk when things head south, as they do with regularity in project management.  The way to reduce that moral hazard is through sufficiency of submitted data.  Thus, we return to my points in a and b above.

f.  Effective project assessment can be performed using high level data.  It appears that this view has its origins in both self-interest and a type of anti-intellectualism/anti-empiricism.

In the former case, the bias is usually based on the limitations of either individuals or the selected technology in providing sufficient information.  In the latter case, the argument results in a tautology that reinforces the fallacy that absence of evidence is evidence of absence.  Here is how I have heard the justification for this assertion: identifying emerging trends in a project does not require that either trending or lower level data be assessed.  The projects in question are very high dollar value, complex projects.

Yes, I have represented this view correctly.  Aside from questions of competency, I think the fallacy here is self-evident.  Study after study (sadly not all online, but performed within OSD at PARCA and IDA over the last three years) has demonstrated that high level data averages out and masks indicators of risk manifestation, which could have been detected by looking at data at the appropriate level, which is the intersection of work and assigned resources.  In plain language, this requires integration of the cost and schedule systems, with risk first being noted through consecutive schedule performance slips.  When combined with technical performance measures, and effective identification of qualitative and quantitative risk tied to schedule activities, the early warning comes two to three months (and sometimes more) before the risk is reflected in the cost measurement systems.  You’re not going to do this with an Excel spreadsheet.  But, for reference, see my post Excel is not a Project Management Solution.
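The masking effect is easy to reproduce with toy numbers.  The sketch below uses made-up monthly schedule variances for three hypothetical work packages (none of this comes from the studies mentioned above); the project-level total stays positive every month while one package slips three months running.  The names, the figures, and the three-period rule are all assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical monthly schedule variance per work package
# (negative = behind schedule).  Illustrative numbers only.
schedule_variance: Dict[str, List[float]] = {
    "WP-A": [5.0, 6.0, 7.0, 8.0],
    "WP-B": [1.0, 2.0, 2.0, 3.0],
    "WP-C": [-2.0, -4.0, -6.0, -9.0],
}


def project_level(sv: Dict[str, List[float]]) -> List[float]:
    """Sum variances across work packages for each month (the high level view)."""
    return [sum(month) for month in zip(*sv.values())]


def has_consecutive_slips(series: List[float], run_length: int = 3) -> bool:
    """True if the series is behind schedule and worsening for
    `run_length` consecutive periods."""
    run = 0
    previous = 0.0
    for value in series:
        # A slip: behind schedule and worse than the prior period.
        if value < 0 and value < previous:
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0
        previous = value
    return False


rollup = project_level(schedule_variance)
flagged = [name for name, series in schedule_variance.items()
           if has_consecutive_slips(series)]

print("Project-level variance by month:", rollup)
print("Work packages with consecutive slips:", flagged)
```

In this toy case the rollup never goes negative, so a reviewer looking only at the top line would see nothing to question, while the lower level data flags the slipping package by the third month.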

It’s time to kill the zombies with facts–and to behead them once and for all.

Over at AITS: The Medium Controls the Present: Is It Too Late to Stop a Digital Dark Age?

“He who controls the past controls the future. He who controls the present controls the past.” ― George Orwell, 1984

Google Vice President Vint Cerf recently turned some heads at the annual meeting of the American Association for the Advancement of Science in San Jose, warning the attending scientists that the digitization of the artifacts of civilization may create a digital dark age. “If we’re thinking 1,000 years, 3,000 years ahead in the future, we have to ask ourselves, how do we preserve all the bits that we need in order to correctly interpret the digital objects we create?” Cerf’s concerns are that today’s technology will become obsolete at some future time, with the information of our own times locked in a technological prison.

To see the remainder of this post please go to AITS.org.