Shake it Out – Embracing the Future of Program Management – Part Two: Private Industry Program and Project Management in Aerospace, Space, and Defense

In my previous post, I focused on Program and Project Management in the Public Interest, and the characteristics of its environment, especially from the perspective of the government program and acquisition disciplines. The purpose of this exploration is to lay the groundwork for understanding the future of program management—and the resulting technological and organizational challenges that are required to support that change.

The next part of this exploration is to define the motivations, characteristics, and disciplines of private industry equivalencies. Here there are commonalities, but also significant differences, that relate to the relationship and interplay between public investment, policy and acquisition, and private business interests.

Consistent with our initial focus on public interest project and program management (PPM), the vertical with the greatest relationship to it is found in the very specialized fields of aerospace, space, and defense. I will therefore first begin with this industry vertical.

Private Industry Program and Project Management

Aerospace, Space & Defense (ASD). It is here that we find commercial practice that comes closest to the types of structure, rules, and disciplines found in public interest PPM. As a result, it is also here where we find the most interesting areas of conflict and conciliation between private motivations and public needs and duties. Particularly since most of the business activity in this vertical is generated by and dependent on federal government acquisition strategy and policy.

On the defense side, the antecedent policy documents guiding acquisition and other measures are the National Security Strategy (NSS), which is produced by the President’s staff, the National Defense Strategy (NDS), which further translates and refines the NSS, and the National Military Strategy (NMS), which is delivered to the Secretary of Defense by the Joint Chiefs of Staff of the various military services, which is designed to provide unfettered military advise to the Secretary of Defense.

Note that the U.S. Department of Defense (DoD) and the related agencies, including the intelligence agencies, operate under a strict chain of command that ensures civilian control under the National Military Establishment. Aside from these structures, the documents and resulting legislation from DoD actions also impact such civilian agencies as the Department of Energy (DOE), Department of Homeland Security (DHS), the National Aeronautics and Space Administration (NASA), and the Federal Aviation Administration (FAA), among others.

The countervailing power and checks-and-balances on this Executive Branch power lies with the appropriation and oversight powers of the Congress. Until the various policies are funded and authorized by Congress, the general tenor of military, intelligence, and other operations have tangential, though not insignificant effects, on the private economy. Still, in terms of affecting how programs and projects are monitored, it is within the appropriation and authorization bills that we find the locus of power. As one of my program managers reminded me during my first round through the budget hearing process, “everyone talks, but money walks.”

On the Aerospace side, there are two main markets. One is related to commercial aircraft, parts, and engines sold to the various world airlines. The other is related to government’s role in non-defense research and development, as well as activities related to private-public partnerships, such as those related to space exploration. The individual civilian departments of government also publish their own strategic plans based on their roles, from which acquisition strategy follows. These long terms strategic plans, usually revised at least every five years, are then further refined into strategic implementation plans by various labs and directorates.

The suppliers and developers of the products and services for government, which represents the bulk of ASD, face many of the same challenges delineated in surveying their government counterparts. The difference, of course, is that these are private entities where the obligations and resulting mores are derived from business practice and contractual obligations and specifications.

This is not to imply a lack of commitment or dedication on the part of private entities. But it is an important distinction, particularly since financial incentives and self-interest are paramount considerations. A contract negotiator, for example, in order to be effective, must understand the underlying pressures and relative position of each of the competitors in the market being addressed. This individual should also be familiar with the particular core technical competencies of the competitors as well as their own strategic plans, the financial positions and goals that they share with their shareholders in the case of publicly traded corporations, and whether actual competition exists.

The Structure of the Market. Given the mergers and acquisitions of the last 30 years, along with the consolidation promoted by the Department of Defense as unofficial policy after the fall of the Berlin Wall and the lapse of antitrust enforcement, the portion of ASD and Space that rely on direct government funding, even those that participate in public-private ventures where risk sharing is involved, operate in a monopsony—the condition in which a single buyer—the U.S. government—substantially controls the market as the main purchaser of supplies and services. This monopsony market is then served by a supplier market that is largely an oligopoly—where there are few suppliers and limited competition—and where, in some technical domains, some suppliers exert monopoly power.

Acknowledging this condition informs us regarding the operational motivators of this market segment in relation to culture, practice, and the disciplines and professions employed.

In the first case, given the position of the U.S. government, the normal pressures of market competition and market incentives do not apply to the few competitors participating in the market. As a result, only the main buyer has the power to recreate, in an artificial manner, an environment which replicate the market incentives and penalties normally employed in a normative, highly diverse and competitive market.

Along these lines, for market incentives, the government can, and often does, act as the angel investor, given the rigorous need for R&D in such efforts. It can also lower the barriers to participation in order to encourage more competition and innovation. This can be deployed across the entire range of limited competitors, or it can be expansive in its approach to invite new participants.

Market penalties that are recreated in this environment usually target what economists call “rent-seeking behavior.” This is a situation where there may be incumbents that seek to increase their own wealth without creating new benefits, innovation, or providing additional wealth to society. Lobbying, glad-handing, cronyism, and other methods are employed and, oftentimes, rampant under monosponistic systems. Revolving-door practices, in which the former government official responsible for oversight obtains employment in the same industry and, oftentimes, with the same company, is too often seen in these cases.

Where there are few competitors, market participants will often play follow-the-leader and align themselves to dominate particular segments of the market in appealing to the government or elected representatives for business. This may mean that, in many cases, they team with their ostensible competitors to provide a diverse set of expertise from the various areas of specialty. As with any business, profitability is of paramount importance, for without profit there can be no business operations. It is here: the maximization of profit and shareholder value, that is the locus of power in understanding the motivation of these and most businesses.

This is not a value judgment. As faulty and risky as this system may be, no better business structure has been found to provide value to the public through incentives for productive work, innovation, the satisfaction of demand, and efficiency. The challenge, apart from what political leadership decides to do regarding the rules of the market, is to make those rules that do exist work in the public interest through fair, ethical, and open contracting practices.

To do this successfully requires contracting and negotiating expertise. To many executives and non-contracting personnel, negotiations appear to be a zero-sum game. No doubt, popular culture, mass media and movies, and self-promoting business people help mold this perception. Those from the legal profession, in particular, deal with a negotiation as an extension of the adversarial processes through which they usually operate. This is understandable given their education, and usually disastrous.

As an attorney friend of mine once observed: “My job, if I have done it right, is to ensure that everyone walking out of the room is in some way unhappy. Your job, in contrast, is to ensure that everyone walking out of it is happy.” While a generalization—and told tongue-in-cheek—it highlights the core difference in approach between these competing perspectives.

A good negotiator has learned that, given two motivated sides coming together to form a contract, that there is an area of intersection where both parties will view the deal being struck as meeting their goals, and as such, fair and reasonable. It is the job of the negotiator to find that area of mutual fairness, while also ensuring that the contract is clear and free of ambiguity, and that the structure of the instrument—price and/or cost, delivery, technical specification, statement of work or performance specification, key performance parameters, measures of performance, measures of effectiveness, management, sufficiency of capability (responsibility), and expertise—sets up the parties involved for success. A bad contract can no more be made good than the poorly prepared and compacted soil and foundation of a house be made good after the building goes up.

The purpose of a good contract is to avoid litigation, not to increase the likelihood of it happening. Furthermore, it serves the interests of neither side to obtain a product or service at a price, or under such onerous conditions, where the enterprise fails to survive. Alternatively, it does a supplier little good to obtain a contract that provides the customer with little financial flexibility, that fails to fully deliver on its commitments, that adversely affects its reputation, or that is perceived in a negative light by the public.

Effective negotiators on both sides of the table are aware of these risks and hazards, and so each is responsible for the final result, though often the power dynamic between the parties may be asymmetrical, depending on the specific situation. It is one of the few cases in which parties having both mutual and competing interests are brought together where each side is responsible for ensuring that the other does not hazard their organization. It is in this way that a contract—specifically one that consists of a long-term R&D cost-plus contract—is much like a partnership. Both parties must act in good faith to ensure the success of the project—all other considerations aside—once the contract is signed.

In this way, the manner of negotiating and executing contracts is very much a microcosm of civil society as a whole, for good or for bad, depending on the practices employed.

Given that the structure of aerospace, space, and defense consists of one dominant buyer with few major suppliers, the disciplines required relate to the details of the contract and its resulting requirements that establish the rules of governance.

As I outlined in my previous post, the characteristics of program and project management in the public interest, which are the products of contract management, are focused on successfully developing and obtaining a product to meet particular goals of the public under law, practice, and other delineated specific characteristics.

As a result, the skill-sets that are of paramount importance to business in this market prior to contract award are cost estimating, applied engineering expertise including systems engineering, financial management, contract negotiation, and law. The remainder of disciplines regarding project and program management expertise follow based on what has been established in the contract and the amount of leeway the contracting instrument provides in terms of risk management, cost recovery, and profit maximization, but the main difference is that this approach to the project leans more toward contract management.

Another consideration in which domains are brought to bear relates to position of the business in terms of market share and level of dominance in a particular segment of the market. For example, a company may decide to allow a lower than desired target profit. In the most extreme cases, the company may allow the contract to become a loss leader in order to continue to dominate a core competency or to prevent new entries into that portion of the market.

On the other side of the table, government negotiators are prohibited by the Federal Acquisition Regulation (the FAR) from allowing companies to “buy-in” by proposing an obviously lowball offer, but some do in any event, whether it is due to lack of expertise or bowing to the exigencies of price or cost. This last condition, combined with rent-seeking behavior mentioned earlier, where they occur, will distort and undermine the practices and indicators needed for effective project and program management. In these cases, the dysfunctional result is to create incentives to maximize revenue and scope through change orders, contracting language ambiguity, and price inelasticity. This also creates an environment that is resistant to innovation and rewards inefficiency.

But apart from these exceptions, the contract and its provisions, requirements, and type are what determine the structure of the eventual project or program management team. Unlike the commercial markets in which there are many competitors, the government through negotiation will determine the manner of burdening rate structures and allowable profit or margin. This last figure is determined by the contract type and the perceived risk of the contract goals to the contractor. The higher the risk, the higher the allowed margin or profit. The reverse applies as well.

Given this basis, the interplay between private entities and the public acquisition organizations, including the policy-setting staffs, are also of primary concern. Decision-makers, influences, and subject-matter experts from these entities participate together in what are ostensibly professional organizations, such as the National Defense Industrial Association (NDIA), the Project Management Institute (PMI), the College of Scheduling (CoS), the College of Performance Management (CPM), the International Council on Systems Engineering (INCOSE), the National Contract Management Association (NCMA), and the International Cost Estimating and Analysis Association (ICEAA), among the most frequently attended by these groups. Corresponding and associated private and professional groups are the Project Control Academy and the Association for Computing Machinery (ACM).

This list is by no means exhaustive, but from the perspective of suppliers to public agencies, NDIA, PMI, CoS, and CPM are of particular interest because much of the business of influencing policy and the details of its application are accomplished here. In this manner, the interests of the participants from the corporate side of the equation relate to those areas always of concern: business certainty, minimization of oversight, market and government influence. The market for several years now has been reactive, not proactive.

There is no doubt that business organizations from local Chambers of Commerce to specialized trade groups that bring with them the advantages of finding mutual interests and synergy. All also come with the ills and dysfunction, to varying degrees, borne from self-promotion, glad-handing, back-scratching, and ossification.

In groups where there is little appetite to upend the status quo, innovation and change, is viewed with suspicion and as being risky. In such cases the standard reaction is cognitive dissonance. At least until measures can be taken to subsume or control the pace and nature of the change. This is particularly true in the area of project and program management in general and integrated project, program and portfolio management (IPPM), in particular.

Absent the appetite on the part of DoD to replicate market forces that drive the acceptance of innovative IPPM approaches, one large event and various evolutionary aviation and space technology trends have upended the ecosystem of rent-seeking, reaction, and incumbents bent on maintaining the status quo.

The one large event, of course, came about from the changes wrought by the Covid pandemic. The other, evolutionary changes, are a result of the acceleration of software technology in capturing and transforming big(ger) dataset combined with open business intelligence systems that can be flexibly delivered locally and via the Cloud.

I also predict that these changes will make hard-coded, purpose-driven niche applications obsolete within the next five years, as well as those companies that have built their businesses around delivering custom, niche applications, and MS Excel spreadsheets, and those core companies that are comfortable suboptimizing and reacting to delivering the letter, if not the spirit, of good business practice expected under their contracts.

Walking hand-in-hand with these technological and business developments, the business of the aerospace, space and defense market, in general, is facing a window opening for new entries and greater competition borne of emergent engineering and technological exigencies that demand innovation and new approaches to old, persistent problems.

The coronavirus pandemic and new challenges from the realities of global competition, global warming, geopolitical rivalries; aviation, space and atmospheric science; and the revolution in data capture, transformation, and optimization are upending a period of quiescence and retrenchment in the market. These factors are moving the urgency of innovation and change to the left both rapidly and in a disruptive manner that will only accelerate after the immediate pandemic crisis passes.

In my studies of Toynbee and other historians (outside of my day job, I am also credentialed in political science and history, among other disciplines, through both undergraduate and graduate education), I have observed that societies and cultures that do not embrace the future and confront their challenges effectively, and that do not do so in a constructive manner, find themselves overrun by it and them. History is the chronicle of human frailty, tragedy, and failure interspersed by amazing periods of resilience, human flourishing, advancement, and hope.

As it relates to our more prosaic concerns, Deloitte has published an insightful paper on the 2021 industry outlook. Among the identified short-term developments are:

  1. A slow recovery in passenger travel may impact aircraft deliveries and industry revenues in commercial aviation,
  2. The defense sector will remain stable as countries plan to sustain their military capabilities,
  3. Satellite broadband, space exploration and militarization will drive growth,
  4. Industry will shift to transforming supply chains into more resilient and dynamic networks,
  5. Merger and acquisitions are likely to recover in 2021 as a hedge toward ensuring long-term growth and market share.

More importantly, the longer-term changes to the industry are being driven by the following technological and market changes:

  • Advanced aerial mobility (AAM). Both FAA and NASA are making investments in this area, and so the opening exists for new entries into the market, including new entries in the supply chain, that will disrupt the giants (absent a permissive M&A stance under the new Administration in Washington). AAM is the new paradigm to introduce safe, short-distance, daily-commute flying technologies using vertical lift.
  • Hypersonics. Given the touted investment of Russia and China into this technology as a means of leveraging against the power projection of U.S. forces, particularly its Navy and carrier battle groups (aside from the apparent fact that Vladimir Putin, the president of Upper Volta with Missiles and Hackers, really hates Disney World), the DoD is projected to fast-track hypersonic capabilities and countermeasures.
  • Electric propulsion. NASA is investing in cost-sharing capabilities to leverage electric propulsion technologies, looking to benefit from the start-up growth in this sector. This is an exciting development which has the potential to transform the entire industry over the next decade and after.
  • Hydrogen-powered aircraft. OEMs are continuing to pour private investment money into start-ups looking to introduce more fuel-efficient and clean energy alternatives. As with electric propulsion, there are prototypes of these aircraft being produced and as public investments into cost-sharing and market-investment strategies take hold, the U.S., Europe, and Asia are looking at a more diverse and innovative aerospace, space, and defense market.

Given the present condition of the industry, and the emerging technological developments and resulting transformation of flight, propulsion, and fuel sources, the concept and definitions used in project and program management require a revision to meet the exigencies of the new market.

For both industry and government, in order to address these new developments, I believe that a new language is necessary, as well as a complete revision to what is considered to be the acceptable baseline of best business practice and the art of the possible. Only then will organizations and companies be positioned to address the challenges these new forms of investment and partnering systems will raise.

The New Language of Integrated Program, Project, and Portfolio Management (IPPM).

First a digression to the past: while I was on active duty in the Navy, near the end of my career, I was assigned to the staff of the Office of the Undersecretary of Defense for Acquisition and Technology (OUSD(A&T)). Ostensibly, my assignment was to give me a place to transition from the Service. Thus, I followed the senior executive, who was PEO(A) at NAVAIR, to the Pentagon, simultaneously with the transition of NAVAIR to Patuxent River, Maryland. In reality, I had been tasked by the senior executive, Mr. Dan Czelusniak, to explore and achieve three goals:

  1. To develop a common schema by supporting an existing contract for the collection of data from DoD suppliers from cost-plus R&D contracts with the goal in mind of creating a master historical database of contract performance and technological development risk. This schema would first be directed to cost performance, or EVM;
  2. To continue to develop a language, methodology, and standard, first started and funded by NAVAIR, for the integration of systems engineering and technical performance management into the program management business rhythm;
  3. To create and define a definition of Integrated Program Management.

I largely achieved the first two during my relatively brief period there.

The first became known and the Integrated Digital Environment (IDE), which was refined and fully implemented after my departure from the Service. Much of this work is the basis for data capture, transformation, and load (ETL) today. There had already been a good deal of work by private individuals, organizations, and other governments in establishing common schemas, which were first applied to the transportation and shipping industries. But the team of individuals I worked with were able to set the bar for what followed across datasets.

The second was completed and turned over to the Services and federal agencies, many of whom adopted the initial approach, and refined it as well to inform, through the identification of technical risk, cost performance and technical achievement. Much of this knowledge already existed in the Systems Engineering community, but working with INCOSE, a group of like-minded individuals were able to take the work from the proof-of-concept, which was awarded the Acker in Skill in Communication award at the DAU Acquisition Research Symposium, and turn it into the TPM and KPP standard used by organizations today.

The third began with establishing my position, which hadn’t existed until my arrival: Lead Action Officer, Integrated Program Management. Gary Christle, who was the senior executive in charge of the staff, asked me “What is Integrated Program Management?” I responded: “I don’t know, sir, but I intend to find out.” Unfortunately, this is the initiative that has still eluded both industry and government, but not without some advancement.

Note that this position with its charter to define IPM was created over 24 years ago—about the same time it takes, apparently, to produce an operational fighter jet. I note this with no flippancy, for I believe that the connection is more than just coincidental.

When spoken of, IPM and IPPM are oftentimes restricted to the concept of cost (read cost performance or EVM) and schedule integration, with aggregated portfolio organization across a selected number of projects thrown in, in the latter case. That was considered advancement in 1997. But today, we seem to be stuck in time. In light of present technology and capabilities, this is a self-limiting concept.

This concept is technologically supported by a neutral schema that is authored and managed by DoD. While essential to data capture and transformation—and because of this fact—it is currently the target by incumbents as a means of further limiting even this self-limited definition in practice. It is ironic that a technological advance that supports data-driven in lieu of report-driven information integration is being influenced to support the old paradigm.

The motivations are varied: industry suppliers who aim to restrict access to performance data under project and program management, incumbent technology providers who wish to keep the changes in data capture and transformation restricted to their limited capabilities, consulting companies aligned with technology incumbents, and staff augmentation firms dependent on keeping their customers dependent on custom application development and Excel workbooks. All of these forces work through the various professional organizations which work to influence government policy, hoping to establish themselves as the arbiters of the possible and the acceptable.

Note that oftentimes the requirements under project management are often critiqued under the rubric of government regulation. But that is a misnomer: it is an extension of government contract management. Another critique is made from the perspective of overhead costs. But management costs money, and one would not (or at least should not) drive a car or own a house without insurance and a budget for maintenance, much less a multi-year high-cost project involving the public’s money. In addition, as I have written previously which is supported by the literature, data-driven systems actually reduce costs and overhead.

All of these factors contribute to ossification, and impose artificial blinders that, absent reform, will undermine meeting the new paradigms of 21st Century project management, given that the limited concept of IPM was obviously insufficient to address the challenges of the transitional decade that broached the last century.

Embracing the Future in Aerospace, Space, and Defense

As indicated, the aerospace and space science and technology verticals are entering a new and exciting phase of technological innovation resulting from investments in start-ups and R&D, including public-private cost-sharing arrangements.

  1. IPM to Project Life-Cycle Management. Given the baggage that attends the acronym IPM, and the worldwide trend to data-driven decision-making, it is time to adjust the language of project and program management to align to it. In lieu of IPM, I suggest Project Life-Cycle Management to define the approach to project and program data and information management.
  2. Functionality-Driven to Data-Driven Applications. Our software, systems and procedures must be able to support that infrastructure and be similarly in alignment with that manner of thinking. This evolution includes the following attributes:
    • Data Agnosticism. As our decision-making methods expand to include a wider, deeper, and more comprehensive interdisciplinary approach, our underlying systems must be able to access data in this same manner. As such, these systems must be data agnostic.
    • Data neutrality. In order to optimize access to data, the overhead and effort needed to access data must be greatly reduced. Using data science and analysis to restructure pre-conditioned data in order to overcome proprietary lexicons—an approach used for business intelligence systems since the 1980s—provides no added value to either the data or the organization. If data access is ad hoc and customized in every implementation, the value of the effort cannot either persist, nor is the return on investment fully realized. It backs the customer into a corner in terms of flexibility and innovation. Thus, pre-configured data capture, extract, transformation, and load (ETL) into a non-proprietary and objective format, which applies to all data types used in project and program management systems, is essential to providing the basis for a knowledge-based environment that encourages discovery from data. This approach in ETL is enhanced by the utilization of neutral data schemas.
    • Data in Lieu of Reporting and Visualization. No doubt that data must be visualized at some point—preferably after its transformation and load into the database with other, interrelated data elements that illuminate information to enhance the knowledge of the decisionmaker. This implies that systems that rely on physical report formats, charts, and graphs as the goal are not in alignment with the new paradigm. Where Excel spreadsheets and PowerPoint are used as a management system, it is the preparer is providing the interpretation, in a manner that predisposes the possible alternatives of interpretation. The goal, instead, is to have data speak for itself. It is the data, transformed into information, interrelated and contextualized to create intelligence that is the goal.
    • All of the Data, All of the Time. The cost of 1TB of data compared to 1MB of data is the marginal cost of the additional electrons to produce it. Our systems must be able to capture all of the data essential to effective decision-making in the periodicity determined by the nature of the data. Thus, our software systems must be able to relate data at all levels and to scale from simplistic datasets to extremely large ones. It should do so in such a way that the option for determining what, among the full menu of data options available, is relevant rests in the consumer of that data.
    • Open Systems. Software solution providers beginning with the introduction of widespread CPU capability have manufactured software to perform particular functions based on particular disciplines and very specific capabilities. As noted earlier, these software applications are functionality-focused and proprietary in structure, method, and data. For data-driven project and program requirements, software systems must be flexible enough to accommodate a wide range of analytical and visualization demands in allowing the data to determine the rules of engagement. This implies systems that are open in two ways: data agnosticism, as already noted, but also open in terms of the user environment.
    • Flexible Application Configuration. Our systems must be able to address the needs of the various disciplines in their details, while also allowing for integration and contextualization of interrelated data across domains. As with Open Systems to data and the user environment, openness through the ability to roll out multiple specialized applications from a common platform places the subject matter expert and program manager in the driver’s seat in terms of data analysis and visualization. An effective open platform also reduces the overhead associated with limited purpose-driven, disconnected and proprietary niche applications.
    • No-Code/Low-Code. Given that data and the consumer will determine both the source and method of delivery, our open systems should provide an environment that supports Agile development and deployment of customization and new requirements.
    • Knowledge-Based Content. Given the extensive amount of experience and education recorded and documented in the literature, our systems must, at the very least, provide a baseline of predictive analytics and visualization methods usually found in the more limited, purpose-built hardcoded applications, if not more expansive. This knowledge-based content, however, must be easily expandable and refinable, given the other attributes of openness, flexibility, and application configuration. In this manner, our 21st century project and program management systems must possess the attributes of a hybrid system: providing the functionality of the traditional niche systems with the flexibility and power of a business intelligence system enhanced by COTS data capture and transformation.
    • Ease of Use. The flexibility and power of these systems must be such that implementation and deployment are rapid, and that new user environment applications can be quickly deployed. Furthermore, the end user should be able to determine the level of complexity or simplicity of the environment to support ease of use.
  1. Focus on the Earliest Indicator. A good deal of effort since the late 1990s has been expended on defining the highest level of summary data that is sufficient to inform earned value, with schedule integration derived from the WBS, oftentimes summarized on a one-to-many basis as well. This perspective is biased toward believing that cost performance is the basis for determining project control and performance. But even when related to cost, the focus is backwards. The project lifecycle in its optimized form exists of the following progression:

    Project Goals and Contract (framing assumptions) –> Systems Engineering, CDRLs, KPPs, MoEs, MoPs, TPMs –> Project Estimate –> Project Plan –> IMS –> Risk and Uncertainty Analysis –> Financial Planning and Execution –> PMB –> EVM

    As I’ve documented in this blog over the years, DoD studies have shown that, while greater detail within the EVM data may not garner greater early warning, proper integration with the schedule at the work package level does. Program variances first appear in the IMS. A good IMS, thus, is key to collecting and acting as the main execution document. This is why many program managers who are largely absent in the last decade or so from the professional organizations listed, tend to assert that EVM is like “looking in the rearview mirror.” It isn’t that it is not essential, but it is true that it is not the earliest indicator of variances from expected baseline project performance.

    Thus, the emphasis going forward under this new paradigm is not to continue the emphasis and a central role for EVM, but a shift to the earliest indicator for each aspect of the program that defines its framing assumptions.
  1. Systems Engineering: It’s not Space Science, it’s Space Engineering, which is harder.
    The focus on start-up financing and developmental cost-sharing shifts the focus to systems engineering configuration control and technical performance indicators. The emphasis on meeting expectations, program goals, and achieving milestones within the cost share make it essential to be able to identify fatal variances, long before conventional cost performance indicators show variances. The concern of the program manager in these cases isn’t so much on the estimate at complete, but whether the industry partner will be able to deploy the technology within the acceptable range of the MoEs, MoPs, TPPs, and KPPs, and not exceed the government’s portion of the cost share. Thus, the incentive is to not only identify variances and unacceptable risk at the earliest indicator, but to do so in terms of whether the end-item technology will be successfully deployed, or whether the government should cut its losses.
  1. Risk and Uncertainty is more than SRA. The late 20th century approach to risk management is to run a simulated Monte Carlo analysis against the schedule, and to identify alternative critical paths and any unacceptable risks within the critical path. This is known as the schedule risk analysis, or SRA. While valuable, the ratio of personnel engaged in risk management is much smaller than the staffs devoted to schedule and cost analysis.

    This is no doubt due to the specialized language and techniques devoted to risk and uncertainty. This segregation of risk from mainstream project and program analysis has severely restricted both the utility and the real-world impact of risk analysis on program management decision-making.

    But risk and uncertainty extend beyond the schedule risk analysis, and their utility in an environment of aggressive investment in new technology, innovation, and new entries to the market will place these assessments at center stage. In reality, our ability to apply risk analysis techniques extends to the project plan, to technical performance indicators, to estimating, to the integrated master schedule (IMS), and to cost, both financial and from an earned value perspective. Combined with the need to identify risk and major variances using the earliest indicator, risk analysis becomes pivotal to mainstream program analysis and decision-making.

Conclusions from Part Two

The ASD industry is most closely aligned with PPM in the public interest. Two overarching trends that are transforming this market that are overcoming the inertia and ossification of PPM thought are the communications and information systems employed in response to the coronavirus pandemic, which opened pathways to new ways of thinking about the status quo, and the start-ups and new entries into the ASD market, borne from the investments in new technologies arising from external market, geo-political, space science, global warming, and propulsion trends, as well as new technologies and methods being employed in data and information technology that drive greater efficiency and productivity. These changes have forced a new language and new expectations as to the art of the necessary, as well as the art of the possible, for PPM. This new language includes a transition to the concept of the optimal capture and use of all data across the program management life cycle with greater emphasis on systems engineering, technical performance, and risk.

Having summarized the new program paradigm in Aerospace, Space, and Defense, my next post will assess the characteristics of program management in various commercial industries, the rising trends in these verticals, and what that means for the project and program management discipline.

Potato, Potahto, Tomato, Tomahto: Data Normalization vs. Standardization, Why the Difference Matters

In my vocation I run a technology company devoted to program management solutions that is primarily concerned with taking data and converting it into information to establish a knowledge-based environment. Similarly, in my avocation I deal with the meaning of information and how to turn it into insight and knowledge. This latter activity concerns the subject areas of history, sociology, and science.

In my travels just prior to and since the New Year, I have come upon a number of experts and fellow enthusiasts in these respective fields. The overwhelming numbers of these encounters have been productive, educational, and cordial. We respectfully disagree in some cases about the significance of a particular approach, governance when it comes to project and program management policy, but generally there is a great deal of agreement, particularly on basic facts and terminology. But some areas of disagreement–particularly those that come from left field–tend to be the most interesting because they create an opportunity to clarify a larger issue.

In a recent venue I encountered this last example where the issue was the use of the phrase data normalization. The issue at hand was that the use of “data normalization” suggested some statistical methodology in reconciling data into a standard schema. Instead, it was suggested, the term “data standardization” was more appropriate.

These phrases do not describe the same thing, but they do describe processes that are symbiotic, not mutually exclusive. So what about data normalization? No doubt there is a statistical use of the term, but we are dealing with the definition as used in digital technology here, just as the use of “standardization” was suggested in the same context. There are many examples of technical terminology that do not have the same meaning when used in different contexts. Here is the definition of normalization applied to data science from Technopedia, which is the proper use of the term in this case:

Normalization is the process of reorganizing data in a database so that it meets two basic requirements: (1) There is no redundancy of data (all data is stored in only one place), and (2) data dependencies are logical (all related data items are stored together). Normalization is important for many reasons, but chiefly because it allows databases to take up as little disk space as possible, resulting in increased performance.

Normalization is also known as data normalization

This is pretty basic (and necessary) stuff. I have written at length about data normalization, but also pair it with two other terms. This is data rationalization and contextualization. Here is a short definition of rationalization:

What is the benefit of Data Rationalization? To be able to effectively exploit, manage, reuse, and govern enterprise data assets (including the models which describe them), it is necessary to be able to find them. In addition, there is (or should be) a wealth of semantics (e.g. business names, definitions, relationships) embedded within an organization’s models that can be exposed for improved analysis and knowledge transfer. By linking model objects (across or within models) it is possible to discover the higher order conceptual objects for any given object. Conversely, it is possible to identify what implementation artifacts implement a higher order model object. For example, using data rationalization, one can traverse from a conceptual model entity to a logical model entity to a physical model table to a database table, etc. Similarly, Data Rationalization enables understanding of a database table by traversing up through the different model levels.

Finally, we have contextualization. Here is a good definition using Wikipedia:

Context or contextual information is any information about any entity that can be used to effectively reduce the amount of reasoning required (via filtering, aggregation, and inference) for decision making within the scope of a specific application.[2] Contextualisation is then the process of identifying the data relevant to an entity based on the entity’s contextual information. Contextualisation excludes irrelevant data from consideration and has the potential to reduce data from several aspects including volume, velocity, and variety in large-scale data intensive applications

There is no approximation of reflecting the accuracy of data in any of these terms wihin the domain of data and computer science. Nor are there statistical methods involved to approximate what needs to be accomplished precisely. The basic skill required to accomplish these tasks–knowing that the data is structured and pre-conditioned–is to reconcile the various lexicons from differing sources, much as I reconcile in my avocation the meaning of words and phrases across periods in history and across languages.

In this discussion we are dealing with the issue of different words used to describe a process or phenomenon. Similarly, we find this challenge in data.

So where does this leave data standardization? In terms of data and computer science, this describes a completely different method. Here is a definition from Wikipedia, which is the proper contextual use of the term under “Standard data model”:

A standard data model or industry standard data model (ISDM) is a data model that is widely applied in some industry, and shared amongst competitors to some degree. They are often defined by standards bodies, database vendors or operating system vendors.

In the context of project and program management, particularly as it relates to government data submission and international open standards across vendors in an industry, is the use of a common schema. In this case there is a DoD version of a UN/CEFACT XML file currently set as the standard, but soon to be replaced by a new standard using the JSON file structure.

In any event, what is clear here is that, while standardization is a necessary part of a data policy to allow for sharing of information, the strength of the chosen schema and the instructions regarding it will vary–and this variation will have an effect on the quality of the information shared. But that is not all.

This is where data normalization, rationalization, and contextualization come into play. In order to create data for the a standardized format, it is first necessary to convert what is an otherwise opaque set of data due to differences into a cohesive lexicon. In data, this is accomplished by reconciling data dictionaries to determine which items are describing the same thing, process, measure, or phenomenon. In a domain like program management, this is a finite set. But it is also specialized knowledge and where the value is added to any end product that is produced. Then, once we know how to identify the data, we must be able to map those terms to the standard schema but, keeping on eye on the use of the data down the line, must be able to properly structure and ensure interrelationships of the data are established and/or maintained to ensure its effective use. This is no mean task and why all data transformation methods and companies are not the same.

Furthermore, these functions can be accomplished efficiently or inefficiently. The inefficient method is to take the old-fashioned business intelligence method that has been around since the 1980s and before, where a team of data scientists and analysts deal with data as if it is flat and, essentially, reinvents the wheel in establishing the meaning and proper context of the data. Given enough time and money anything can be accomplished, but brute force labor will not defeat the Second Law of Thermodynamics.

In computing, which comes close to minimizing that physical law, we know that data has already been imbued with meaning upon its initial processing. In lieu of brute force labor we apply intelligence and knowledge to accomplish this requirement. This is called normalization, rationalization, and contextualization of data. It requires a small fraction of other methods in terms of time and effort, and is infinitely more transparent.

Using these methods is also where innovation, efficiency, performance, accuracy, scalability, and anticipating future requirements based on the latest technology trends comes into play. Establishing a seamless flow of data integration allows, for example, the capture of more data being able to be properly structured in a database, which lays the ground for the transition from 2D to 3D and 4D (that is, what is often called integrated) program management, as well as more effective analytics.

The term “standardization” also suffers from a weakness in data and computer science that requires that it be qualified. After all, data standardization in an enterprise or organization does not preclude the prescription of a propriety dataset. In government, this is contrary to both statutory and policy mandates. Furthermore, even given an effective, open standard, there will be a large pool of legacy and other non-conforming data that will still require capture and transformation.

The Section 809 Panel study dealt directly with this issue:

Use existing defense business system open-data requirements to improve strategic decision making on acquisition and workforce issues…. DoD has spent billions of dollars building the necessary software and institutional infrastructure to collect enterprise wide acquisition and financial data. In many cases, however, DoD lacks the expertise to effectively use that data for strategic planning and to improve decision making. Recommendation 88 would mitigate this problem by implementing congressional open-data mandates and using existing hiring authorities to bolster DoD’s pool of data science professionals.

Section 809 Volume 3, Section 9, p.477

As operating environment companies expose more and more capability into the market through middleware and other open systems methods of visualizing data, the key to a system no longer resides in its ability to produce charts and graphs. The use of Excel as an ad hoc data repository with its vulnerability to error, to manipulation, and for its resistance to the establishment of an optimized data management and corporate knowledge environment is a symptom of the larger issue.

Data and its proper structuring is at the core of organizational success and process improvement. Standardization alone will not address barriers to data optimization. According to RAND studies in 2015 and 2017* these are:

  • Data Quality and Discontinuities
  • Data Silos and Underutilized Repositories
  • Timeliness of Data for use by SMEs and Decision-makers
  • Lack of Access and Contextualization
  • Traceability and Auditability
  • Lack of the Ability to Apply Discovery in the Data
  • The issue of Contractual Technical Data and Proprietary Data

That these issues also exist in private industry demonstrates the universality of the issue. Thus, yes, standardize by all means. But also ensure that the standard is open and that transformation is traceable and auditable from the the source system to the standard schema, and then into the target database. Only then will the enterprise, the organization, and the government agency have full ownership of the data it requires to efficiently and effectively carry out its purpose.

*RAND Corporation studies are “Issues with Access to Acquisition Data and Information in the DoD: Doing Data Right in Weapons System Acquisition” (RR880, 2017), and “Issues with Access to Acquisition Data and Information in the DoD: Policy and Practice (RR1534, 2015). These can be found here.

Both Sides Now — The Value of Data Exploration

Over the last several months I have authored a number of stillborn articles that just did not live up to the standards that I set for this blog site. After all, sometimes we just have nothing important to add to the conversation. In a world dominated by narcissism, it is not necessary to constantly have something to say. Some reflection and consideration are necessary, especially if one is to be as succinct as possible.

A quote ascribed to Woodrow Wilson, which may be apocryphal, though it does appear in two of his biographies, was in response to being lauded by someone for making a number of short, succinct, and informative speeches. When asked how he was able to do this, President Wilson is supposed to have replied:

“It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”

An undisciplined mind has a lot to say about nothing in particular with varying degrees of fidelity to fact or truth. When in normal conversation we most often free ourselves from the discipline expected for more rigorous thinking. This is not necessarily a bad thing if we are saying nothing of consequence and there are gradations, of course. Even the most disciplined mind gets things wrong. We all need editors and fact checkers.

While I am pulling forth possibly apocryphal quotes, the one most applicable that comes to mind is the comment by Hemingway as told by his deckhand in Key West and Cuba, Arnold Samuelson. Hemingway was supposed to have given this advice to the aspiring writer:

“Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote the first part of A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit. When you first start to write you get all the kick and the reader gets none, but after you learn to work it’s your object to convey everything to the reader so that he remembers it not as a story he had read but something that happened to himself.”

Though it deals with fiction, Hemingway’s advice applies to any sort of writing and rhetoric. Dr. Roger Spiller, who more than anyone mentored me as a writer and historian, once told me, “Writing is one of those skills that, with greater knowledge, becomes harder rather than easier.”

As a result of some reflection, over the last few months, I had to revisit the reason for the blog. Thus, this is still its purpose: it is a way to validate ideas and hypotheses with other professionals and interested amateurs in my areas of interest. I try to keep uninformed opinion in check, as all too many blogs turn out to be rants. Thus, a great deal of research goes into each of these posts, most from primary sources and from interactions with practitioners in the field. Opinions and conclusions are my own, and my reasoning for good or bad are exposed for all the world to see and I take responsibility for them.

This being said, part of my recent silence has also been due to my workload in–well–the effort involved in my day job of running a technology company, and in my recent role, since late last summer, as the Managing Editor of the College of Performance Management’s publication known as the Measurable News. Our emphasis in the latter case has been to find new contributions to the literature regarding business analytics and to define the concept of integrated project, program, and portfolio management. Stepping slightly over the line to make a pitch, I recommend anyone interested in contributing to the publication to submit an article. The submission guidelines can be found here.

Both Sides Now: New Perspectives

That out of the way, I recently saw, again on the small screen, the largely underrated movie about Neil Armstrong and the Apollo 11 moon landing, “First Man”, and was struck by this scene:

Unfortunately, the first part of the interview has been edited out of this clip and I cannot find a full scene. When asked “why space” he prefaces his comments by stating that the atmosphere of the earth seems to be so large from the perspective of looking at it from the ground but that, having touched the edge of space previously in his experience as a test pilot of the X15, he learned that it is actually very thin. He then goes on to posit that looking at the earth from space will give us a new perspective. His conclusion to this observation is then provided in the clip.

Armstrong’s words were prophetic in that the space program provided a new perspective and a new way of looking at things that were in front of us the whole time. Our spaceship Earth is a blue dot in a sea of space and, at least for a time, the people of our planet came to understand both our loneliness in space and our interdependence.

Earth from Apollo 8. Photo courtesy of NASA.

 

The impact of the Apollo program resulted in great strides being made in environmental and planetary sciences, geology, cosmology, biology, meteorology, and in day-to-day technology. The immediate effect was to inspire the environmental and human rights movements, among others. All of these advances taken together represent a new revolution in thought equal to that during the initial Enlightenment, one that is not yet finished despite the headwinds of reaction and recidivism.

It’s Life’s Illusions I Recall: Epistemology–Looking at and Engaging with the World

In his book Darwin’s Dangerous Idea, Daniel Dennett posited that what was “dangerous” about Darwinism is that it acts as a “universal acid” that, when touching other concepts and traditions, transforms them in ways that change our world-view. I have accepted this position by Dennett through the convincing argument he makes and the evidence in front of us, and it is true that Darwinism–the insight in the evolution of species over time through natural selection–has transformed our perspective of the world and left the old ways of looking at things both reconstructed and unrecognizable.

In his work, Time’s Arrow, Time’s Cycle, Stephen Jay Gould noted that Darwinism is part of one of the three great reconstructions of human thought that, in quoting Sigmund Freud, where “Humanity…has had to endure from the hand of science…outrages upon its naive self-love.” These outrages include the Copernican revolution that removed the Earth from the center of the universe, Darwinism and the origin of species, including the descent of humanity, and what John McPhee, coined as the concept of “deep time.”

But–and there is a “but”–I would propose that Darwinism and the other great reconstructions noted are but different ingredients of a larger and more broader, though compatible, type of innovation in the way the world is viewed and how it is approached–a more powerful universal acid. That innovation in thought is empiricism.

It is this approach to understanding that eats through the many ills of human existence that lead to self-delusion and folly. Though you may not know it, if you are in the field of information technology or any of the sciences, you are part of this way of viewing and interacting with the world. Married with rational thinking, this epistemology–coming from the perspectives of the astronomical observations of planets and other heavenly bodies by Charles Sanders Peirce, with further refinements by William James and John Dewey, and others have come down to us in what is known as Pragmatism. (Note that the word pragmatism in this context is not the same as the more generally used colloquial form of the word. For this type of reason Peirce preferred the term “pragmaticism”). For an interesting and popular reading of the development of modern thought and the development of Pragmatism written for the general reader I highly recommend the Pulitzer Prize-winning The Metaphysical Club by Louis Menand.

At the core of this form of empiricism is that the collection of data, that is, recording, observing, and documenting the universe and nature as it is will lead us to an understanding of things that we otherwise would not see. In our more mundane systems, such as business systems and organized efforts applying disciplined project and program management techniques and methods, we also can learn more about these complex adaptive systems through the enhanced collection and translation of data.

I Really Don’t Know Clouds At All: Data, Information, Intelligence, and Knowledge

The term “knowledge discovery in data”, or KDD for short, is an aspirational goal and so, in terms of understanding that goal, is a point of departure from the practice information management and science. I’m taking this stance because the technology industry uses terminology that, as with most language, was originally designed to accurately describe a specific phenomenon or set of methods in order to advance knowledge, only to find that that terminology has been watered down to the point where it obfuscates the issues at hand.

As I traveled to locations across the U.S. over the last three months, I found general agreement among IT professionals who are dealing with the issues of “Big Data”, data integration, and the aforementioned KDD of this state of affairs. In almost every case there is hesitation to use this terminology because it has been absconded and abused by mainstream literature, much as physicists rail against the misuse of the concept of relativity by non-scientific domains.

The impact of this confusion in terminology has caused organizations to make decisions where this terminology is employed to describe a nebulous end-state, without the initiators having an idea of the effort or scope. The danger here, of course, is that for every small innovative company out there, there is also a potential Theranos (probably several). For an in-depth understanding of the psychology and double-speak that has infiltrated our industry I highly recommend the HBO documentary, “The Inventor: Out for Blood in Silicon Valley.”

The reason why semantics are important (as they always have been despite the fact that you may have had an associate complain about “only semantics”) is that they describe the world in front of us. If we cloud the meanings of words and the use of language, it undermines the basis of common understanding and reveals the (poor) quality of our thinking. As Dr. Spiller noted, the paradox of writing and in gathering knowledge is that the more you know, the more you realize you do not know, and the harder writing and communicating knowledge becomes, though we must make the effort nonetheless.

Thus KDD is oftentimes not quite the discovery of knowledge in the sense that the term was intended to mean. It is, instead, a discovery of associations that may lead us to knowledge. Knowing this distinction is important because the corollary processes of data mining, machine learning, and the early application of AI in which we find ourselves is really the process of finding associations, correlations, trends, patterns, and probabilities in data that is approached in a manner as if all information is flat, thereby obliterating its context. This is not knowledge.

We can measure the information content of any set of data, but the real unlocked potential in that information content will come with the processing of it that leads to knowledge. To do that requires an underlying model of domain knowledge, an understanding of the different lexicons in any given set of domains, and a Rosetta Stone that provides a roadmap that identifies those elements of the lexicon that are describing the same things across them. It also requires capturing and preserving context.

For example, when I use the chat on my iPhone it attempts to anticipate what I want to write. I am given three choices of words to choose if I want to use this shortcut. In most cases, the iPhone guesses wrong, despite presenting three choices and having at its disposal (at least presumptively) a larger vocabulary than the writer. Oftentimes it seems to take control, assuming that I have misspelled or misidentified a word and chooses the wrong one for me, where my message becomes a nonsense message.

If one were to believe the hype surrounding AI, one would think that there is magic there but, as Arthur C. Clarke noted (known as Clarke’s Third Law): “Any sufficiently advanced technology is indistinguishable from magic.” Familiar with the new technologies as we are, we know that there is no magic there, and also that it is consistently wrong a good deal of the time. But many individuals come to rely upon the technology nonetheless.

Despite the gloss of something new, the long-established methods of epistemology, code-breaking, statistics, and Calculus apply–as do standards of establishing fact and truth. Despite a large set of data, the iPhone is wrong because the iPhone does not understand–does not possess knowledge–to know why it is wrong. As an aside, its dictionary is also missing a good many words.

A Segue and a Conclusion–I Still Haven’t Found What I’m Looking For: Why Data Integration?…and a Proposed Definition of the Bigness of Data

As with the question to Neil Armstrong, so the question on data. And so the answer is the same. When we look at any set of data under a particular structure of a domain, the information we derive provides us with a manner of looking at the world. In economic systems, businesses, and projects that data provides us with a basis for interpretation, but oftentimes falls short of allowing us to effectively describe and understand what is happening.

Capturing interrelated data across domains allows us to look at the phenomena of these human systems from a different perspective, providing us with the opportunity to derive new knowledge. But in order to do this, we have to be open to this possibility. It also calls for us to, as I have hammered home in this blog, reset our definitions of what is being described.

For example, there are guides in project and program management that refer to statistical measures as “predictive analytics.” This further waters down the intent of the phrase. Measures of earned value are not predictive. They note trends and a single-point outcome. Absent further analysis and processing, the statistical fallacy of extrapolation can be baked into our analysis. The same applies to any index of performance.

Furthermore, these indices and indicators–for that is all they are–do not provide knowledge, which requires a means of not only distinguishing between correlation and causation but also applying contextualization. All systems operate in a vector space. When we measure an economic or social system we are really measuring its behavior in the vector space that it inhabits. This vector space includes the way it is manifested in space-time: the equivalent of length, width, depth (that is, its relative position, significance, and size within information space), and time.

This then provides us with a hint of a definition of what often goes by the definition of “big data.” Originally, as noted in previous blogs, big data was first used in NASA in 1997 by Cox and Ellsworth (not as credited to John Mashey on Wikipedia with the dishonest qualifier “popularized”) and was simply a statement meaning “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

This is a relative term given Moore’s Law. But we can begin to peel back a real definition of the “bigness” of data. It is important to do this because too many approaches to big data assume it is flat and then apply probabilities and pattern recognition to data that undermines both contextualization and knowledge. Thus…

The Bigness of Data (B) is a function (f ) of the entropy expended (S) to transform data into information, or to extract its information content.

Information evolves. It evolves toward greater complexity just as life evolves toward greater complexity. The universe is built on coded bits of information that, taken together and combined in almost unimaginable ways, provides different forms of life and matter. Our limited ability to decode and understand this information–and our interactions in it– are important to us both individually and collectively.

Much entropy is already expended in the creation of the data that describes the activity being performed. Its context is part of its information content. Obliterating the context inherent in that information content causes all previous entropy to be of no value. Thus, in approaching any set of data, the inherent information content must be taken into account in order to avoid the unnecessary (and erroneous) application of data interpretation.

More to follow in future posts.

Like Tinker to Evers to Chance: BI to BA to KDD

It’s spring training time in sunny Florida, as well as other areas of the country with mild weather and baseball.  For those of you new to the allusion, it comes from a poem by Franklin Pierce Adams and is also known as “Baseball’s Sad Lexicon”.  Tinker, Evers, and Chance were the double play combination of the 1910 Chicago Cubs (shortstop, second base, and first base).  Because of their effectiveness on the field these Cubs players were worthy opponents of the old New York Giants, for whom Adams was a fan, and who were the kings of baseball during most of the first fifth of a century of the modern era (1901-1922).  That is, until they were suddenly overtaken by their crosstown rivals, the Yankees, who came to dominate baseball for the next 40 years, beginning with the arrival of Babe Ruth.

The analogy here is that the Cubs infielders, while individuals, didn’t think of their roles as completely separate.  They had common goals and, in order to win on the field, needed to act as a unit.  In the case of executing the double play, they were a very effective unit.  So why do we have these dichotomies in information management when the goals are the same?

Much has been written both academically and commercially about Business Intelligence, Business Analytics, and Knowledge Discovery in Databases.  I’ve surveyed the literature and for good and bad, and what I find is that these terms are thrown around, mostly by commercial firms in either information technology or consulting, all with the purpose of attempting to provide a discriminator for their technology or service.  Many times the concepts are used interchangeably, or one is set up as a strawman to push an agenda or product.  Thus, it seems some hard definitions are in order.

According to Technopedia:

Business Intelligence (BI) is the use of computing technologies for the identification, discovery and analysis of business data – like sales revenue, products, costs and incomes.

Business analytics (BA) refers to all the methods and techniques that are used by an organization to measure performance. Business analytics are made up of statistical methods that can be applied to a specific project, process or product. Business analytics can also be used to evaluate an entire company.

Knowledge Discover in Databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique is a process that includes data preparation and selection, data cleansing, incorporating prior knowledge on data sets and interpreting accurate solutions from the observed results.

As with much of computing in its first phases, these functions were seen to be separate.

The perception of BI, based largely on the manner in which it has been implemented in its first incarnations, is viewed as a means of gathering data into relational data warehouses or data marts and then building out decision support systems.  These methods have usually involved a great deal of overhead in both computing and personnel, since practical elements of gathering, sorting, and delivering data involved additional coding and highly structured user interfaces.  The advantage of BI is its emphasis on integration.  The disadvantage from the enterprise perspective, is that the method and mode of implementation is phlegmatic at best.

BA is BI’s younger cousin.  Applications were developed and sold as “analytical tools” focused on a niche of data within the enterprise’s requirements.  In this manner decision makers could avoid having to wait for the overarching and ponderous BI system to get to their needs, if ever.  This led many companies to knit together specialized tools in so-called “best-of-breed” configurations to achieve some measure of integration across domains.  Of course, given the plethora of innovative tools, much data import and reconciliation has had to be inserted into the process.  Thus, the advantages of BA in the market have been to reward innovation and focus on the needs of the domain subject matter expert (SME).  The disadvantages are the insertion of manual intervention in an automated process due to lack of integration, which is further exacerbated by so-called SMEs in data reconciliation–a form of rent seeking behavior that only rewards body shop consulting, unnecessarily driving up overhead.  The panacea applied to this last disadvantage has been the adoption of non-proprietary XML schemas across entire industries that reduce both the overhead and data silos found in the BA market.

KDD is our both our oldster and youngster–grandpa and the grandson hanging out.  It is a term that describes a necessary function of insight–allowing one to determine what the data tells us are needed for analytics rather than relying on a “canned” solution to determine how to approach a particular set of data.  But it does so, oftentimes, using an older approach that predates BI, known as data mining.  You will often find KDD linked to arguments in favor of flat file schemas, NoSQL (meaning flat non-relational databases), and free use of the term Big Data, which is becoming more meaningless each year that it is used, given Moore’s Law.  The advantage of KDD is that it allows for surveying across datasets to pick up patterns and interrelationships within our systems that are otherwise unknown, particularly given the way in which the human mind can fool itself into reifying an invalid assumption.  The disadvantage, of course, is that KDD will have us go backward in terms of identifying and categorizing data by employing Data Mining, which is an older concept from early in computing in which a team of data scientists and data managers develop solutions to identify, categorize, and use that data–manually doing what automation was designed to do.  Understanding these limitations, companies focused on KDD have developed heuristics (cognitive computing) that identify patterns and possible linkages, removing a portion of the overhead associated with Data Mining.

Keep in mind that you never get anything for nothing–the Second Law of Thermodynamics ensures that energy must be borrowed from somewhere in order to produce something–and its corollaries place limits on expected efficiencies.  While computing itself comes as close to providing us with Maxwell’s Demon as any technology, even in this case entropy is being realized elsewhere (in the software developer and the hardware manufacturing process), even though it is not fully apparent in the observed data processing.

Thus, manual effort must be expended somewhere along the way.  In any sense, all of these methods are addressing the same problem–the conversion of data into information.  It is information that people can consume, understand, place into context, and act upon.

As my colleague Dave Gordon has pointed out to me several times that there are also additional methods that have been developed across all of these methods to make our use of data more effective.  These include more powerful APIs, the aforementioned cognitive computing, and searching based on the anticipated questions of the user as is used by search engines.

Technology, however, is moving very rapidly and so the lines between BI, BA and KDD are becoming blurred.  Fourth generation technology that leverages API libraries to be agnostic to underlying data, and flexible and adaptive UI technology can provide a  comprehensive systemic solution to bring together the goals of these approaches to data. With the ability to leverage internal relational database tools and flat schemas for non-relational databases, the application layer, which is oftentimes a barrier to delivery of information, becomes open as well, putting the SME back in the driver’s seat.  Being able to integrate data across domain silos provide insight into systems behavior and performance not previously available with “canned” applications written to handle and display data a particular way, opening up knowledge discovery in the data.

What this means practically is that those organizations that are sensitive to these changes will understand the practical application of sunk cost when it comes to aging systems being provided by ponderous behemoths that lack agility in their ability to introduce more flexible, less costly, and lower overhead software technologies.  It means that information management can be democratized within the organization among the essential consumers and decision makers.

Productivity and effectiveness are the goals.

Takin’ Care of Business — Information Economics in Project Management

Neoclassical economics abhors inefficiency, and yet inefficiencies exist.  Among the core issues that create inefficiencies is the asymmetrical nature of information.  Asymmetry is an accepted cornerstone of economics that leads to inefficiency.  We can see in our daily lives and employment the effects of one party in a transaction having more information than the other:  knowing whether the used car you are buying is a lemon, measuring risk in the purchase of an investment and, apropos to this post, identifying how our information systems allow us to manage complex projects.

Regarding this last proposition we can peel this onion down through its various levels: the asymmetry in the information between the customer and the supplier, the asymmetry in information between the board and stockholders, the asymmetry in information between management and labor, the asymmetry in information between individual SMEs and the project team, etc.–it’s elephants all the way down.

This asymmetry, which drives inefficiency, is exacerbated in markets that are dominated by monopoly, monopsony, and oligopoly power.  When informed by the work of Hart and Holmström regarding contract theory, which recently garnered the Nobel in economics, we have a basis for understanding the internal dynamics of projects in seeking efficiency and productivity.  What is interesting about contract theory is that it incorporates the concept of asymmetrical information (labeled as adverse selection), but expands this concept in human transactions at the microeconomic level to include considerations of moral hazard and the utility of signalling.

The state of asymmetry and inefficiency is exacerbated by the patchwork quilt of “tools”–software applications that are designed to address only a very restricted portion of the total contract and project management system–that are currently deployed as the state of the art.  These tend to require the insertion of a new class of SME to manage data by essentially reversing the efficiencies in automation, involving direct effort to reconcile differences in data from differing tools. This is a sub-optimized system.  It discourages optimization of information across the project, reinforces asymmetry, and is economically and practically unsustainable.

The key in all of this is ensuring that sub-optimal behavior is discouraged, and that those activities and behaviors that are supportive of more transparent sharing of information and, therefore, contribute to greater efficiency and productivity are rewarded.  It should be noted that more transparent organizations tend to be more sustainable, healthier, and with a higher degree of employee commitment.

The path forward where there is monopsony power, where there is a dominant buyer, is to impose the conditions for normative behavior that would otherwise be leveraged through practice in a more open market.  For open markets not dominated by one player as either supplier or seller, instituting practices that reward behavior that reduces the effects of asymmetrical information, and contracting disincentives in business transactions on the open market is the key.

In the information management market as a whole the trends that are working against asymmetry and inefficiency involve the reduction of data streams, the construction of cross-domain data repositories (or reservoirs) that allow for the satisfaction of multiple business stakeholders, and the introduction of systems that are more open and adaptable to the needs of the project system in lieu of a limited portion of the project team.  These solutions exist, yet their adoption is hindered because of the long-term infrastructure that is put in place in complex project management.  This infrastructure is supported by incumbents that are reinforcing to the status quo.  Because of this, from the time a market innovation is introduced to the time that it is adopted in project-focused organizations usually involves the expenditure of several years.

This argues for establishing an environment that is more nimble.  This involves the adoption of a series of approaches to achieve the goals of broader information symmetry and efficiency in the project organization.  These are:

a. Instituting contractual relationships, both internally and externally, that encourage project personnel to identify risk.  This would include incentives to kill efforts that have breached their framing assumptions, or to consolidate progress that the project has achieved to date–sending it as it is to production–while killing further effort that would breach framing assumptions.

b. Institute policy and incentives on the data supply end to reduce the number of data streams.  Toward this end both acquisition and contracting practices should move to discourage proprietary data dead ends by encouraging normalized and rationalized data schemas that describe the environment using a common or, at least, compatible lexicon.  This reduces the inefficiency derived from opaqueness as it relates to software and data.

c.  Institute policy and incentives on the data consumer end to leverage the economies derived from the increased computing power from Moore’s Law by scaling data to construct interrelated datasets across multiple domains that will provide a more cohesive and expansive view of project performance.  This involves the warehousing of data into a common repository or reduced set of repositories.  The goal is to satisfy multiple project stakeholders from multiple domains using as few streams as necessary and encourage KDD (Knowledge Discovery in Databases).  This reduces the inefficiency derived from data opaqueness, but also from the traditional line-and-staff organization that has tended to stovepipe expertise and information.

d.  Institute acquisition and market incentives that encourage software manufacturers to engage in positive signalling behavior that reduces the opaqueness of the solutions being offered to the marketplace.

In summary, the current state of project data is one that is characterized by “best-of-breed” patchwork quilt solutions that tend to increase direct labor, reduces and limits productivity, and drives up cost.  At the end of the day the ability of the project to handle risk and adapt to technical challenges rests on the reliability and efficiency of its information systems.  A patchwork system fails to meet the needs of the organization as a whole and at the end of the day is not “takin’ care of business.”

River Deep, Mountain High — A Matrix of Project Data

Been attending conferences and meetings of late and came upon a discussion of the means of reducing data streams while leveraging Moore’s Law to provide more, better data.  During a discussion with colleagues over lunch they asked if asking for more detailed data would provide greater insight.  This led to a discussion of the qualitative differences in data depending on what information is being sought.  My response to more detailed data was to respond: “well there has to be a pony in there somewhere.”  This was greeted by laughter, but then I finished the point: more detailed data doesn’t necessarily yield greater insight (though it could and only actually looking at it will tell you that, particularly in applying the principle of KDD).  But more detailed data that is based on a hierarchical structure will, at the least, provide greater reliability and pinpoint areas of intersection to detect areas of risk manifestation that is otherwise averaged out–and therefore hidden–at the summary levels.

Not to steal the thunder of new studies that are due out in the area of data later this spring but, for example, I am aware after having actually achieved lowest level integration for extremely complex projects through my day job, that there is little (though not zero) insight gained in predictive power between say, the control account level of a WBS and the work package level.  Going further down to element of cost may, in the words of the character in the movie Still Alice, where “You may say that this falls into the great academic tradition of knowing more and more about less and less until we know everything about nothing.”  But while that may be true for project management, that isn’t necessarily so when collecting parametrics and auditing the validity of financial information.

Rolling up data from individually detailed elements of a hierarchy is the proper way to ensure credibility.  Since we are at the point where a TB of data has virtually the same marginal cost of a GB of data (which is vanishingly small to begin with), then the more the merrier in eliminating the abuse associated with human-readable summary reporting.  Furthermore, I have long proposed through this blog and elsewhere, that the emphasis should be away from people, process, and tools, to people, process, and data.  This rightly establishes the feedback loop necessary for proper development and project management.  More importantly, the same data available through project management processes satisfy the different purposes of domains both within the organization, and of multiple external stakeholders.

This then leads us to the concept of integrated project management (IPM), which has become little more than a buzz-phrase, and receives a lot of hand waves, mostly by technology companies that want to push their tools–which are quickly becoming obsolete–while appearing forward leaning.  This tool-centric approach is nothing more than marketing–focusing on what the software manufacturer would have us believe is important based on the functionality baked into their applications.  One can see where this could be a successful approach, given the emphasis on tools in the PM triad.  But, of course, it is self-limiting in a self-interested sort of way.  The emphasis needs to be on the qualitative and informative attributes of available data–not of tool functionality–that meet the requirements of different data consumers while minimizing, to the extent possible, the number of data streams.

Thus, there are at least two main aspects of data that are important in understanding the utility of project management: early warning/predictiveness and credibility/traceability/fidelity.  The chart attached below gives a rough back-of-the-envelope outline of this point, with some proposed elements, though this list is not intended to be exhaustive.

PM Data Matrix

PM Data Matrix

In order to capture data across the essential elements of project management, our data must demonstrate both a breadth and depth that allows for the discovery of intersections of the different elements.  The weakness in the two-dimensional model above is that it treats each indicator by itself.  But, when we combine, for example, IMS consecutive slips with other elements listed, the informational power of the data becomes many times greater.  This tells us that the weakness in our present systems is that we treat the data as a continuity between autonomous elements.  But we know that the project consists of discontinuities where the next level of achievement/progress is a function of risk.  Thus, when we talk about IPM, the secret is in focusing on data that informs us what our systems are doing.  This will require more sophisticated types of modeling.

Don’t Know Much…–Knowledge Discovery in Data

A short while ago I found myself in an odd venue where a question was posed about my being an educated individual, as if it were an accusation.  Yes, I replied, but then, after giving it some thought, I made some qualifications to my response.  Educated regarding what?

It seems that, despite a little more than a century of public education and widespread advanced education having been adopted in the United States, along with the resulting advent of widespread literacy, that we haven’t entirely come to grips with what it means.  For the question of being an “educated person” has its roots in an outmoded concept–an artifact of the 18th and 19th century–where education was delineated, and availability determined, by class and profession.  Perhaps this is the basis for the large strain of anti-intellectualism and science denial in the society at large.

Virtually everyone today is educated in some way.  Being “educated” means nothing–it is a throwaway question, an affectation.  The question is whether the relevant education meets the needs of the subject being addressed.  An interesting discussion about this very topic is explored at Sam Harris’ blog in the discussion he held with amateur historian Dan Carlin.

In reviewing my own education, it is obvious that there are large holes in what I understand about the world around me, some of them ridiculously (and frustratingly) prosaic.  This shouldn’t be surprising.  For even the most well-read person is ignorant about–well–virtually everything in some manner.  Wisdom is reached, I think, when you accept that there are a few things that you know for certain (or have a high probability and level of confidence in knowing), and that there are a host of things that constitute the entire library of knowledge encompassing anything from a particular domain to that of the entire universe, which you don’t know.

To sort out a well read dilettante from someone who can largely be depended upon to speak with some authority on a topic, educational institutions, trade associations, trade unions, trade schools, governmental organizations, and professional organizations have established a system of credentials.  No system is entirely perfect and I am reminded (even discounting fraud and incompetence) that half of all doctors and lawyers–two professions that have effectively insulated themselves from rigorous scrutiny and accountability to the level of almost being a protected class–graduate in the bottom half of their class.  Still, we can sort out a real brain surgeon from someone who once took a course in brain physiology when we need medical care (to borrow an example from Sam Harris in the same link above).

Furthermore, in the less potentially life-threatening disciplines we find more variation.  There are credentialed individuals who constantly get things wrong.  Among economists, for example, I am more likely to follow those who got the last financial crisis and housing market crash right (Joe Stiglitz, Dean Baker, Paul Krugman, and others), and those who have adjusted their models based on that experience (Brad DeLong, Mark Thoma, etc.), than those who have maintained an ideological conformity and continuity despite evidence.  Science–both what are called the hard and soft sciences–demands careful analysis and corroborating evidence to be tied to any assertions in their most formalized contexts.  Even well accepted theories among a profession are contingent–open to new information and discovery that may modify, append, or displace them.  Furthermore, we can find polymaths and self-taught individuals who have equaled or exceeded credentialed peers.  In the end the proof is in the pudding.

My point here is threefold.  First, in most cases we don’t know what we don’t know.  Second, complete certainty is not something that exists in this universe, except perhaps at death.  Third, we are now entering a world where new technologies allow us to discover new insights in accessing previously unavailable or previously opaque data.

One must look back at the revolution in information over the last fifty years and its resulting effect on knowledge to see what this means in our day-to-day existence.  When I was a small boy in school we largely relied on the published written word.  Books and periodicals were the major means of imparting information, aside from collocated collaborative working environments, the spoken word, and the old media of magazines, radio, and television.  Information was hard to come by–libraries were limited in their collections and there were centers of particular domain knowledge segmented by geography.   Furthermore, after the introduction of television, society had developed  trusted sources and gatekeepers to keep the cranks and flimflam out.

Today, new media–including all forms of digitized information–has expanded and accelerated the means of transmitting information.  Unlike old media, books, and social networking, there are also fewer gatekeepers in new media: editors, fact checkers, domain experts, credentialed trusted sources, etc. that ensure quality control, reliability, fidelity of the information, and provide context.  It’s the wild west of information and those wooed by the voodoo of self-organization contribute to the high risk associated with relying on information provided through these sources.  Thus, organizations and individuals who wish to stay within the fact-based community have had to sort out reliable, trusted sources and, even in these cases, develop–for lack of a better shorthand–BS detectors.  There are two purposes to this exercise: to expand the use of the available data and leverage the speed afforded by new media, and to ensure that the data is reliable and can reliably tell us something important about our subject of interest.

At the level of the enterprise, the sector, or the project management organization, we similarly are faced with the situation in which the scope of data that can be converted into information is rapidly expanding.  Unlike the larger information market, this data on the microeconomic level is more controlled.  Given that data at this level suffers from significance because it records isolated events, or small sample sizes, the challenge has been to derive importance from data where sometimes significance is minimal.

Furthermore, our business systems, because of the limitations of the selected technology, have been self-limiting.  I come across organizations all the time who cannot imagine the incorporation and integration of additional data sets largely because the limitations of their chosen software solution has inculcated that approach–that belief–into the larger corporate culture.  We do not know what we do not know.

Unfortunately, it’s what you do not know that, more often than not, will play a significant role in your organization’s destiny, just as an individual that is more self-aware is better prepared to deal with the challenges that manifest themselves as risk and its resultant probabilities.  Organizations must become more aware and look at things differently, especially since so many of the more conventional means of determining risk and opportunities seems to be failing to keep up with the times, which is governed by the capabilities of new media.

This is the imperative of applying knowledge discovery in data at the organizational and enterprise level–and in shifting one’s worldview from focusing on the limitations of “tools”: how they paint a screen, whether data is displayed across the x or y axis, what shade of blue indicates good performance, how many keystrokes does it take to perform an operation, and all manner of glorified PowerPoint minutia–to a focus on data:  the ability of solutions to incorporate more data, more efficiently, more quickly, from a wider range of sources, and processed in a more effective manner, so that it is converted into information to be able to be used to inform decision making at the most decisive moment.

I Can See Clearly Now — Knowledge Discovery in Databases, Data Scalability, and Data Relevance

I recently returned from a travel and much of the discussion revolved around the issues of scalability and the use of data.  What is clear is that the conversation at the project manager level is shifting from a long-running focus on reports and metrics to one focused on data and what can be learned from it.  As with any technology, information technology exploits what is presented before it.  Most recently, accelerated improvements in hardware and communications technology has allowed us to begin to collect and use ever larger sets of data.

The phrase “actionable” has been thrown around quite a bit in marketing materials, but what does this term really mean?  Can data be actionable?  No.  Can intelligence derived from that data be actionable?  Yes.  But is all data that is transformed into intelligence actionable?  No.  Does it need to be?  No.

There are also kinds and levels of intelligence, particularly as it relates to organizations and business enterprises.  Here is a short list:

a. Competitive intelligence.  This is intelligence derived from data that informs decision makers about how their organization fits into the external environment, further informing the development of strategic direction.

b. Business intelligence.  This is intelligence derived from data that informs decision makers about the internal effectiveness of their organization both in the past and into the future.

c. Business analytics.  The transformation of historical and trending enterprise data used to provide insight into future performance.  This includes identifying any underlying drivers of performance, and any emerging trends that will manifest into risk.  The purpose is to provide sufficient early warning to allow risk to be handled before it fully manifests, therefore keeping the effort being measured consistent with the goals of the organization.

Note, especially among those of you who may have a military background, that what I’ve outlined is a hierarchy of information and intelligence that addresses each level of an organization’s operations:  strategic, operational, and tactical.  For many decision makers, translating tactical level intelligence into strategic positioning through the operational layer presents the greatest challenge.  The reason for this is that, historically, there often has been a break in the continuity between data collected at the tactical level and that being used at the strategic level.

The culprit is the operational layer, which has always been problematic for organizations and those individuals who find themselves there.  We see this difficulty reflected in the attrition rate at this level.  Some individuals cannot successfully make this transition in thinking. For example, in the U.S. Army command structure when advancing from the battalion to the brigade level, in the U.S. Navy command structure when advancing from Department Head/Staff/sea command to organizational or fleet command (depending on line or staff corps), and in business for those just below the C level.

Another way to look at this is through the traditional hierarchical pyramid, in which data represents the wider floor upon which each subsequent, and slightly reduced, level is built.  In the past (and to a certain extent this condition still exists in many places today) each level has constructed its own data stream, with the break most often coming at the operational level.  This discontinuity is then reflected in the inconsistency between bottom-up and top-down decision making.

Information technology is influencing and changing this dynamic by addressing the main reason for the discontinuity existing–limitations in data and intelligence capabilities.  These limitations also established a mindset that relied on limited, summarized, and human-readable reporting that often was “scrubbed” (especially at the operational level) as it made its way to the senior decision maker.  Since data streams were discontinuous, there were different versions of reality.  When aspects of the human equation are added, such as selection bias, the intelligence will not match what the data would otherwise indicate.

As I’ve written about previously in this blog, the application of Moore’s Law in physical computing performance and storage has pushed software to greater needs in scaling in dealing with ever increasing datasets.  What is defined as big data today will not be big data tomorrow.

Organizations, in reaction to this condition, have in many cases tended to simply look at all of the data they collect and throw it together into one giant pool.  Not fully understanding what the data may say, a number of ad hoc approaches have been taken.  In some cases this has caused old labor-intensive data mining and rationalization efforts to once again rise from the ashes to which they were rightly consigned in the past.  On the opposite end, this has caused a reliance on pre-defined data queries or hard-coded software solutions, oftentimes based on what had been provided using human-readable reporting.  Both approaches are self-limiting and, to a large extent, self-defeating.  In the first case because the effort and time to construct the system will outlive the needs of the organization for intelligence, and in the second case, because no value (or additional insight) is added to the process.

When dealing with large, disparate sources of data, value is derived through that additional knowledge discovered through the proper use of the data.  This is the basis of the concept of what is known as KDD.  Given that organizations know the source and type of data that is being collected, it is not necessary to reinvent the wheel in approaching data as if it is a repository of Babel.  No doubt the euphemisms, semantics, and lexicon used by software publishers differs, but quite often, especially where data underlies a profession or a business discipline, these elements can be rationalized and/or normalized given that the appropriate business cross-domain knowledge is possessed by those doing the rationalization or normalization.

This leads to identifying the characteristics* of data that is necessary to achieve a continuity from the tactical to the strategic level that will achieve some additional necessary qualitative traits such as fidelity, credibility, consistency, and accuracy.  These are:

  1. Tangible.  Data must exist and the elements of data should record something that correspondingly exists.
  2. Measurable.  What exists in data must be something that is in a form that can be recorded and is measurable.
  3. Sufficient.  Data must be sufficient to derive significance.  This includes not only depth in data but also, especially in the case of marking trends, across time-phasing.
  4. Significant.  Data must be able, once processed, to contribute tangible information to the user.  This goes beyond statistical significance noted in the prior characteristic, in that the intelligence must actually contribute to some understanding of the system.
  5. Timely.  Data must be timely so that it is being delivered within its useful life.  The source of the data must also be consistently provided over consistent periodicity.
  6. Relevant.  Data must be relevant to the needs of the organization at each level.  This not only is a measure to test what is being measured, but also will identify what should be but is not being measured.
  7. Reliable.  The sources of the data be reliable, contributing to adherence to the traits already listed.

This is the shorthand that I currently use in assessing a data requirements and the list is not intended to be exhaustive.  But it points to two further considerations when delivering a solution.

First, at what point does the person cease to be the computer?  Business analytics–the tactical level of enterprise data optimization–oftentimes are stuck in providing users with a choice of chart or graph to use in representing such data.  And as noted by many writers, such as this one, no doubt the proper manner of representing data will influence its interpretation.  But in this case the person is still the computer after the brute force computing is completed digitally.  There is a need for more effective significance-testing and modeling of data, with built-in controls for selection bias.

Second, how should data be summarized to the operational and strategic levels so that “signatures” can be identified that inform information?  Furthermore, it is important to understand what kind of data must supplement the tactical level data at those other levels.  Thus, data streams are not only minimized to eliminate redundancy, but also properly aligned to the level of data intelligence.

*Note that there are other aspects of data characteristics noted by other sources here, here, and here.  Most of these concern themselves with data quality and what I would consider to be baseline data traits, which need to be separately assessed and tested, as opposed to antecedent characteristics.

 

Do You Believe in Magic? — Big Data, Buzz Phrases, and Keeping Feet Planted Firmly on the Ground

My alternative title for this post was “Money for Nothing,” which is along the same lines.  I have been engaged in discussions regarding Big Data, which has become a bit of a buzz phrase of late in both business and government.  Under the current drive to maximize the value of existing data, every data source, stream, lake, and repository (and the list goes on) has been subsumed by this concept.  So, at the risk of being a killjoy, let me point out that not all large collections of data is “Big Data.”  Furthermore, once a category of data gets tagged as Big Data, the further one seems to depart from the world of reality in determining how to approach and use the data.  So for of you who find yourself in this situation, let’s take a collective deep breath and engage our critical thinking skills.

So what exactly is Big Data?  Quite simply, as noted by this article in Forbes by Gil Press, term is a relative one, but generally means from a McKinsey study, “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”  This subjective definition is a purposeful one, since Moore’s Law tends to change what is viewed as simply digital data as opposed to big data.  I would add some characteristics to assist in defining the term based on present challenges.  Big data at first approach tends to be unstructured, variable in format, and does not adhere to a schema.  Thus, not only is size a criteria for the definition, but also the chaotic nature of the data that makes it hard to approach.  For once we find a standard means of normalizing, rationalizing, or converting digital data, it no longer is beyond the ability of standard database tools to effectively use it.  Furthermore, the very process of taming it thereby renders it non-big data, or perhaps, if a exceedingly large dataset, perhaps “small big data.”

Thus, having defined our terms and the attributes of the challenge we are engaging, we now can eliminate many of the suppositions that are floating around in organizations.  For example, there is a meme that I have come upon that asserts that disparate application file data can simply be broken down into its elements and placed into database tables for easy access by analytical solutions to derive useful metrics.  This is true in some ways but both wrong and dangerous in its apparent simplicity.  For there are many steps missing in this process.

Let’s take, for example, the least complex example in the use of structured data submitted as proprietary files.  On its surface this is an easy challenge to solve.  Once someone begins breaking the data into its constituent parts, however, greater complexity is found, since the indexing inherent to data interrelationships and structures are necessary for its effective use.  Furthermore, there will be corruption and non-standard use of user-defined and custom fields, especially in data that has not undergone domain scrutiny.  The originating third-party software is pre-wired to be able to extract this data properly.  Absent having to use and learn multiple proprietary applications with their concomitant idiosyncrasies, issues of sustainability, and overhead, such a multivariate approach defeats the goal of establishing a data repository in the first place by keeping the data in silos, preventing integration.  The indexing across, say, financial systems or planning systems are different.  So how do we solve this issue?

In approaching big data, or small big data, or datasets from disparate sources, the core concept in realizing return on investment and finding new insights, is known as Knowledge Discovery in Databases or KDD.  This was all the rage about 20 years ago, but its tenets are solid and proven and have evolved with advances in technology.  Back then, the means of extracting KDD from existing databases was the use of data mining.

The necessary first step in the data mining approach is pre-processing of data.  That is, once you get the data into tables it is all flat.  Every piece of data is the same–it is all noise.  We must add significance and structure to that data.  Keep in mind that we live in this universe, so there is a cost to every effort known as entropy.  Computing is as close as you’ll get to defeating entropy, but only because it has shifted the burden somewhere else.  For large datasets it is pushed to pre-processing, either manual or automated.  In the brute force world of data mining, we hire data scientists to pre-process the data, find commonalities, and index it.  So let’s review this “automated” process.  We take a lot of data and then add a labor-intensive manual effort to it in order to derive KDD.  Hmmm..  There may be ROI there, or there may not be.

But twenty years is a long time and we do have alternatives, especially in using Fourth Generation software that is focused on data usage without the limitations of hard-coded “tools.”  These alternatives apply when using data on existing databases, even disparate databases, or file data structured under a schema with well-defined data exchange instructions that allow for a consistent manner of posting that data to database tables. The approach in this case is to use APIs.  The API, like OLE DB or the older ODBC, can be used to read and leverage the relative indexing of the data.  It will still require some code to point it in the right place and “tell” the solution how to use and structure the data, and its interrelationship to everything else.  But at least we have a means for reducing the cost associated with pre-processing.  Note that we are, in effect, still pre-processing data.  We just let the CPU do the grunt work for us, oftentimes very quickly, while giving us control over the decision of relative significance.

So now let’s take the meme that I described above and add greater complexity to it.  You have all kinds of data coming into the stream in all kinds of formats including specialized XML, open, black-boxed data, and closed proprietary files.  This data is non-structured.  It is then processed and “dumped” into a non-relational database such as NoSQL.  How do we approach this data?  The answer has been to return to a hybrid of pre-processing, data mining, and the use of APIs.  But note that there is no silver bullet here.  These efforts are long-term and extremely labor intensive at this point.  There is no magic.  I have heard time and again from decision makers the question: “why can’t we just dump the data into a database to solve all our problems?”  No, you can’t, unless you’re ready for a significant programmatic investment in data scientists, database engineers, and other IT personnel.  At the end, what they deploy, when it gets deployed, may very well be obsolete and have wasted a good deal of money.

So, once again, what are the proper alternatives?  In my experience we need to get back to first principles.  Each business and industry has commonalities that transcend proprietary software limitations by virtue of the professions and disciplines that comprise them.  Thus, it is domain expertise to the specific business that drives the solution.  For example, in program and project management (you knew I was going to come back there) a schedule is a schedule, EVM is EVM, financial management is financial management.

Software manufacturers will, apart from issues regarding relative ease of use, scalability, flexibility, and functionality, attempt to defend their space by establishing proprietary lexicons and data structures.  Not being open, while not serving the needs of customers, helps incumbents avoid disruption from new entries.  But there often comes a time when it is apparent that these proprietary definitions are only euphemisms for a well-understood concept in a discipline or profession.  Cat = Feline.  Dog = Canine.

For a cohesive and well-defined industry the solution is to make all data within particular domains open.  This is accomplished through the acceptance and establishment of a standard schema.  For less cohesive industries, but where the data or incumbents through the use of common principles have essentially created a de facto schema, APIs are the way to extract this data for use in analytics.  This approach has been applied on a broader basis for the incorporation of machine data and signatures in social networks.  For closed or black-boxed data, the business or industry will need to execute gap analysis in order to decide if database access to such legacy data is truly essential to its business, or given specification for a more open standard from “time-now” will eventually work out suboptimization in data.

Most important of all and in the end, our results must provide metrics and visualizations that can be understood, are valid, important, material, and be right.