Take Me To The River, Part 2, Schedule Elements–A Digital Inventory of Integrated Program Management Elements

Recent attendance at various forums to speak has interrupted the flow of this series on IPM elements. Among these venues I was engaged in discussions regarding this topic, as well as the effects of acquisition reform on the IT, program, and project management communities in the DoD and A&D marketplace.

For this post I will restrict the topic to what are often called schedule elements, though that is a nebulous term. Nor should one conclude that, because I am dealing with this topic after cost elements, it is somehow inferior in importance to them. On the contrary, planning and scheduling are integral to applying resources and costs and to tracking cost performance, and in our systemic analysis their activities, artifacts, and elements are antecedent to cost element considerations.

The Relative Position of Schedule

But the takeaway here is this: under no circumstances should any program or project manager believe that cost and schedule systems represent a dichotomy, nor a hierarchy, of disciplines. They are interdependent and the behavior noted in one will be manifested in the other.

This is important to keep in mind, because the software industry, more than any other, has been responsible for reinforcing and solidifying this (erroneous) perspective. During the first generation of desktop application development, software solutions were built to automate the work of traditional line and staff functions. At the time, this made a great deal of sense.

From a sales and revenue perspective, it is easier to sell a limited niche software “tool” to an established customer base that will ensure both quick acceptance and immediate realization of productivity and labor savings. The connection from the purchase to ROI was easily traceable in the time span and at the level of the person performing their workaday tasks.

Thus, solutions were built to satisfy the needs of cost analysts, schedule analysts, systems engineers, cost estimators, and others. Where specific solutions left gaps, spreadsheet tools such as Microsoft Excel were employed to fill them. It was in no one’s interest to go beyond their core competency. Once a dominant incumbent or set of incumbents (an effective monopoly or oligopoly) inhabited a niche, they employed the usual strategies for “stickiness” to defend territory and raise barriers to new entrants.

What many organizations did not anticipate is that once you automate a function, the nature of the system–if one is to implement the most effective organizational structure–is transformed to conform to the most efficient flow and use of data, and to that data’s resulting transformation into information and intelligence. Oftentimes the skill set to use that intelligence does not exist, because the organizational structure was never adjusted to absorb the insights and synergy that come from larger, more comprehensive, and more credible and accurate datasets.

This is changing and must change, because the old way of using limited sets of data in the age of big(ger) data that provide a more comprehensive view of business conditions is not tenable. At least, not if a company or organization wants to stay relevant or profitable.

Characteristics and Basic Elements of the Project Schedule

If you were to perform a Google search of “project schedule” while reading this post, you would find a number of definitions, some of which overlap. For example, the PMBOK defines a schedule as, quite simply, “the planned dates for performing activities and the planned dates for meeting milestones.”

Thus our elements include planned dates, activities, and milestones. But is that all? Under this definition, any kind of plan, from a minor household renovation or upgrade to building an aircraft carrier would contain only these elements.

I don’t think so.

For complex projects and programs, which are the focus of this blog, our definition of a project schedule is a bit more comprehensive. If you go to A Guide for DoD Program Managers, mentioned in my last post, you will find even less specificity.

The reason for this is that what we define as a project schedule is part and parcel of the planning phase of a project, which is then elaborated in the time-phased planning elements for execution of the project through its lifespan into production. It is the schedule that ties together all of the disciplines in putting together a project–acquisition, systems engineering, cost estimating, and project performance management.

In attending schedule-focused conferences over the years and in talking to program management colleagues, I have repeatedly heard the refrain that:

a. It is hard to find a good scheduler, and

b. Constructing a schedule is more of an art than a science.

I can only say that this cedes the field to a small cadre of personnel who perform an essential function, but who do so with few objective tests of effectiveness or accountability–until it is too late.

But the reality is quite different from the fuzzy perception of schedule that is often assumed. All critical path method (CPM) schedules describe the same phenomena, though the lexicon will vary based on the specific proprietary application employed.

In government-focused and large commercial projects, the schedule is the heart of planning and execution. In the DoD world it is known as the Integrated Master Schedule (IMS), which utilizes the inherent bottom-up relationships of its elements to determine the critical path. The main sources regarding the IMS have a great deal of overlap, but tend to be either aspirational (and unfortunately not prescriptive in defining the basic characteristics of an IMS) or to reflect the “art over science” approach. For those following along, these are the DoD Integrated Master Plan and Integrated Master Schedule Preparation and Use Guide of 21 October 2005, the NAVAIR Integrated Master Schedule (IMS) Guidebook of February 2010, and the NDIA Planning and Scheduling Excellence Guide (PASEG) of 9 March 2016 (unfortunately no current direct link).

The key elements that comprise an IMS, in addition to those we identified under the PMBOK, are that it is a networked schedule consisting of specific durations assigned to specific work tasks that must be accomplished in discrete work packages. In most cases these durations will be derived either by some fixed, manual method or through the optimization algorithm applied by the CPM application. More on this below. These work packages are discrete, meaning that they represent the full scope of the work that must be accomplished during the specified duration for the creation of an end product. Discrete work is distinguished from level of effort (LOE) work, the latter being effort that is always expended, such as administrative and management tasks, that is not directly tied to the accomplishment of an end product.

These work packages are tied together to illustrate antecedent and progressive work through predecessor and successor relationships. Long-term planning activities, which cannot be fleshed out until more immediate work is completed, are set aside as placeholders called planning packages. Each of the elements tracked in the IMS is based on established criteria that define completion, events, and specific accomplishments.
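To make these elements a bit more concrete, here is a minimal sketch in Python (the field names and example tasks are my own hypothetical inventions, not drawn from any DiD or guide) of how the building blocks of a networked schedule might be represented: discrete work packages with durations and predecessor links, LOE effort, and a planning package held as a placeholder.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ScheduleElement:
    """A single element in a networked schedule (illustrative only)."""
    element_id: str
    description: str
    duration_days: int                  # assigned duration for the task
    predecessors: List[str] = field(default_factory=list)  # finish-to-start links
    element_type: str = "discrete"      # "discrete", "loe", or "planning_package"

# A toy network: discrete work packages, one LOE task, one planning package
ims = [
    ScheduleElement("WP-010", "Preliminary design", 20),
    ScheduleElement("WP-020", "Detailed design", 30, predecessors=["WP-010"]),
    ScheduleElement("WP-030", "Fabricate prototype", 45, predecessors=["WP-020"]),
    ScheduleElement("LOE-001", "Program management support", 95, element_type="loe"),
    ScheduleElement("PP-100", "Production planning (to be detailed later)",
                    60, predecessors=["WP-030"], element_type="planning_package"),
]

# Discrete work drives the network; LOE is tracked but kept off the critical path
discrete_scope = [e for e in ims if e.element_type == "discrete"]
print(f"{len(discrete_scope)} discrete work packages, "
      f"{sum(e.duration_days for e in discrete_scope)} days of discrete duration")
```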

The most comprehensive IMSs consist of detailed planning that include resources and elements of cost.

Detailed Elements of the IMS

Given these general elements, the best source of identifying the key elements of detailed schedules is also found in Department of Defense documents. The core document in this case is the Data Item Description for the IMS numbered as DI-MGMT-81650. The latest one is dated March 30, 2005. There are a minimum of 32 data elements, some of these already mentioned and which I will not repeat in this post since they are pretty well listed and identified in the source document.

For those not familiar with these documents, Data Item Descriptions (or DiDs–gotta love acronyms) represent the detailed technical documents for artifacts involved in the management of DoD-related operations. Thus, this provides us with a pretty good inventory of elements to source. But there are others that are implied.

For example, the 81650 DiD identifies an element known as “methodology.” What this means is that each scheduling application has an optimization engine, which is where the true differences in schedule construction and intellectual property reside. The elements that affect these calculations are time-based, duration-based, related to float and slack, and related to resources.

These time-based elements consist of early start, early finish, late start, late finish. Duration-based elements consist of shortest time, longest time, greatest rank weight. An additional element related to schedule float identifies minimum slack. Resources are further delineated by the greatest work content and the greatest cumulative resource content.

I would note that the NDIA PASEG adds some sub-elements to this list that are based on the algorithmic result of the schedule engines and, thus, tends to ignore the antecedent salient elements of validating the optimization engine found above. These additional sub-elements are total float, free float, soft constraints, hard constraints, and–also found in the aforementioned DiD–program, task, and resource calendars.
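None of these elements is mystical: every CPM engine derives the dates and float values above from the same forward and backward pass over the network. The sketch below is a simplified illustration, assuming only finish-to-start relationships and no calendars, constraints, or resource leveling, of where early and late starts and finishes, total float, and free float come from.

```python
# Minimal CPM forward/backward pass over finish-to-start links.
# Durations in days; no calendars, constraints, or resource logic.
tasks = {
    "A": {"duration": 5, "preds": []},
    "B": {"duration": 10, "preds": ["A"]},
    "C": {"duration": 3, "preds": ["A"]},
    "D": {"duration": 7, "preds": ["B", "C"]},
}

# Forward pass: early start (ES) and early finish (EF)
for t in tasks:  # dict preserves insertion order; predecessors are listed first
    preds = tasks[t]["preds"]
    tasks[t]["ES"] = max((tasks[p]["EF"] for p in preds), default=0)
    tasks[t]["EF"] = tasks[t]["ES"] + tasks[t]["duration"]

project_finish = max(t["EF"] for t in tasks.values())

# Backward pass: late finish (LF), late start (LS), then total float
for t in reversed(list(tasks)):
    succs = [s for s in tasks if t in tasks[s]["preds"]]
    tasks[t]["LF"] = min((tasks[s]["LS"] for s in succs), default=project_finish)
    tasks[t]["LS"] = tasks[t]["LF"] - tasks[t]["duration"]
    tasks[t]["total_float"] = tasks[t]["LS"] - tasks[t]["ES"]

# Free float: slack available without delaying any successor's early start
for t in tasks:
    succs = [s for s in tasks if t in tasks[s]["preds"]]
    earliest_succ_es = min((tasks[s]["ES"] for s in succs), default=project_finish)
    tasks[t]["free_float"] = earliest_succ_es - tasks[t]["EF"]

critical_path = [t for t in tasks if tasks[t]["total_float"] == 0]
print("Critical path:", " -> ".join(critical_path))   # A -> B -> D
```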

Normally, this is where a survey would end–with schedule-specific data elements focused on the details of the schedule. But we’re going to challenge our assumptions a bit more.

Framing Assumptions of Schedules and Programs

The essential document that provides a definition of the term “framing assumption” was published by RAND Corporation in 2014, entitled Identifying Acquisition Framing Assumptions Through Structured Deliberation, by Mark V. Arena and Lauren A. Mayer. The definition of a framing assumption is “any explicit or implicit assumption that is central in shaping cost, schedule, or performance expectations.”

As I have explored in my prior post, the use of the term “cost” is a fuzzy one. To some it means earned value management, which measures a small part of the costs of development and ownership of a system. To others it means total cost of ownership. Schedule is an implicit part of this definition, and then we have performance expectations, which I will deal with in a separate post.

But we can apply the concept of framing assumptions in two ways.

The first applies to the assumed purpose of the schedule. Why do we construct one? This goes back to my earlier statement that “…the schedule…ties together all of the disciplines in putting together a project–acquisition, systems engineering, cost estimating, and project performance management.”

For the NDIA PASEG the IMS is a “tool, not just a report” that “provides an ever-changing window into the progress (or lack of it) of current work effort. The strategic mission of the schedule is to point out future risks and opportunities.”

For the NAVAIR IMS Guide the IMS “At a top level…contain(ing) the networked, detailed tasks necessary to ensure successful program execution…” that “capture(s) project tasks and task relationships”, “show(s) the magnitude and how long each task will take”, “show(s) resources, durations, and constraints for each task” and “show(s) the critical path.”

For the DiD 81650 “The Integrated Master Schedule (IMS) is an integrated schedule containing the networked, detailed tasks necessary to ensure successful program execution.”

But the most comprehensive definition, which goes to the core of the purpose of an IMS, can be found in paragraph 1.2 of the DoD Integrated Master Plan and Integrated Master Schedule Preparation and Use Guide (IMP/IMS Guide). The elements of this purpose are worth transcribing, because if we have a requirement and cannot answer the “So what?” question–that is, if we cannot effectively determine why something must be done–then it probably does not need to be done (or we need to apply more rigor in developing our expertise).

What the IMP/IMS Guide does is clearly tie the schedule to the programmatic framing assumptions (in the sense in which RAND meant the term) from initial acquisition through planning. Thus, the Integrated Master Plan (IMP) is firmly established as an antecedent and intermediate planning process (not merely an artifact or tool) that feeds the program’s R&D execution process.

Taken as a whole, this set of processes and its resulting artifacts:

a. Provides offerors and acquiring activities with detailed execution planning, organization, and scheduling information that sets realistic expectations for the resulting contract action.

b. Serves as the execution plan for how the supplier will meet the contract’s performance requirements within cost and schedule constraints.

c. Provides a basis for integrating all of the functions involved in development and deployment of the system being acquired and, after award, sets the framing assumptions of the program.

d. Provides the basis for determining and assessing progress, identifying risks, determining the basis for contractual award fees and penalties, assessing progress on Key Performance Parameters (KPPs) and Technical Performance Measures (TPMs), determining alternative paths to project completion, and determining opportunities for innovation and new acquisitions not apparent at the time of the award.

What all of this means is that the Integrated Master Schedule is too important to be left to the master scheduler. Yes, the schedule is a “tool” to those at the most basic tactical level in work execution. Yes, it is also an artifact and record.

But, more importantly, it is the comprehensive notional representation of the project’s or program’s scope, effort, progress, and assessment.

Private and Government-focused Industry Practice

A word has to be said here about the difference between purely private industry practice in managing large projects and programs and the skewing in these posts toward industries focused on public sector acquisition.

In the schedule elements listed earlier there is a reference to resources and elements of cost, yet this is an area where standard practice diverges. In private industry the application of resource assignments to specific work is standard practice and is found in the IMS.

In companies focused on the public sector and DoD, the practice is to establish a different set of data outside of the schedule to manage resources. Needless to say, this creates problems of validating data across disparate systems at the lowest level of planning and execution of a project or program. The basis for it, I think, relates to viewing the schedule as a “tool” and not the basis for project execution. This “tool” mindset also allows for separate “earned value engines” that oftentimes do not synchronize with the execution of the schedule, not only undermining the practical value of both, but also creating systems complexity and inefficiency where none need exist.

Another gap found in many areas of public acquisition concerns the development of an integrated master plan antecedent to the integrated master schedule. The cause here, once again, I believe is viewing the discipline of systems engineering as separate–somehow walled off from the continuing assessment of program execution–though that assumption is not supported by program phasing and milestone planning and achievement.

From the perspective of Integrated Program/Project Management, these considerations cannot be ignored, and so our inventory of essential data elements must include elements from these practices.

But Wait! There’s More!

Most discussions at conferences and professional meetings will usually stop at this point–viewing cost and schedule integration as the essence of IPM–with “cost” limited to EVM. Some will add some “oh by the ways” such as technical performance and risk. I will address these in the next post as well.

There are also other systems and processes that are relevant to our inventory. But what I have covered thus far in this series should challenge you, if you have been paying attention.

I tackled cost first because of the assumptions implicit in equating it with EVM, and then went on to demonstrate that there are other elements of cost that provide a more comprehensive view. This is not to denigrate the value of EVM, which is an essential process in project management, but to demonstrate that its analytics are not comprehensive and, as with any complex system, require the contribution of additional information, depending on the level and type of work performance and progress being recorded and assessed.

In this post I have tackled the IMS, and have demonstrated that it is not a supplementary process, but central to all other processes and actions being taken in the execution of the project or program. Many times people enter the schedule from an assessment of cost performance–tracing cost drivers to specific schedule activities and then to tasks. But this has it backwards, an approach based on the best technology available sometime in the late 1990s.

It is the schedule that brings together all relevant information from our execution and control processes and systems. It seems to me that the schedule is perhaps the first place one should go: the first elements to trace are those related to schedule slippage and unexpected resource consumption, which can then be traced to contract cost impact.

But, of course, there is more–and these other elements may turn out to be of greater consequence than just cost and schedule considerations. More on these in my next post.

In Closing: Battle Rhythm and the Plans of the Day and Week

When I was on active duty in the Navy we planned our days and weeks around a Plan of the Day or Plan of the Week. This is a posted agenda so that the entire ship or command understands the major events that affect its operations. It establishes focus on the main events at hand and fosters communication both laterally and vertically within the chain of command.

As one rises in rank and responsibility it is important to understand the operational tempo of the unit or ship, its systems, and its subsystems. This is important in avoiding crisis management. This is known as Battle Rhythm.

Baked into the schedule (assuming proper construction and effective integrated product teaming) are the major events, milestones, and expected achievements of the program or project. Thus, there are items that should be planned around and anticipated on a daily, weekly, biweekly, monthly, quarterly, and major-milestone basis.

Given an effective battle rhythm, a PM should never complain about performance and progress indicators “looking into the rear view mirror.” If that is the case, then perhaps the PM should look at the effectiveness and timeliness of the underlying project and program systems. When a PMO complains of information and intelligence arriving too late to be actionable, it is actually describing a condition of ineffective, latent, and disjointed information and intelligence systems.

Thus, our next step in our next post is to identify more salient IPM elements that cut to the heart of the matter.

Take Me to the River, Part 1, Cost Elements – A Digital Inventory of Integrated Program Management Elements

In a previous post I recommended a venue focused on program managers to define what constitutes integrated program management. Since that time I have been engaged with thought leaders and influencers in both government and industry, many of whom came to a similar conclusion independently, agree with this proposition, and are working to bring it about.

My own interest in this discussion is from the perspective of maximizing the information ecosystem that underlies and describes the systems known as projects and programs. But what do I mean by this? This is more than a gratuitous question, because oftentimes the information essential to defining project and program performance and behavior is intermixed with, and therefore diluted and obfuscated by, that of the overall enterprise.

Project vs. Program

What I mean by the term project in this context is an organization that is established around a defined effort of fixed duration (a defined beginning and projected end) that is specifically planned and organized for the development and deployment of a particular end item, state, or result, with an identified set of resources assigned and allocated to achieve its goals.

A program is defined as a set of interrelated projects and sub-projects, also of fixed duration, that is specifically planned and organized not only for the development and deployment of a particular end item, state, or result, but one that also continues this role through sustainment (including configuration control), with an identified set of resources assigned and allocated to achieve its goals. As such, the program management team is also the first-level life-cycle manager of the end item, state, or result, and participates with other levels of the organization in these activities. (More on life-cycle costs below.)

Note the difference in scope and perspective, though oftentimes we use these terms interchangeably.

For shorthand, a small project of short duration operates at the tactical level of planning. A larger project, which because of size, complexity, duration, and risk approaches the definition of a program, operates at the operational level, as do most programs. Larger and more complex programs that will affect the core framing assumptions of the enterprise align their goals to the strategic level of planning. Thus, there are differences in scale, complexity and, hence, data points that can be captured at these various levels.

Another aspect of the question of establishing an integrated digital project and program management environment is sufficiency of data, which relates directly to scale. Sufficiency in this regard is defined as whether there is enough data to establish a valid correlation and, hopefully, draw a causation. Micro-economic foundations–and models–often fail because of insufficient data. This is important to keep in mind as we inventory the type of data available to us and its significance. Oftentimes additional data points can make up for those cases where there is insufficiency in the depth and quality of a more limited set of data points. Doing so will also mitigate subjectivity, especially in smaller efforts.
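As a toy illustration of the sufficiency point (the numbers below are simulated, not drawn from any program data), the same underlying relationship produces a stable correlation estimate with a few hundred data points and a wildly unstable one with a handful:

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)

def sample_correlation(n):
    """Noisy linear relationship: y depends on x plus noise."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [0.5 * x + random.gauss(0, 1) for x in xs]
    return pearson(xs, ys)

# Five points at a time: estimates jump around; five hundred: they settle down.
print("n=5 samples:  ", [round(sample_correlation(5), 2) for _ in range(3)])
print("n=500 samples:", [round(sample_correlation(500), 2) for _ in range(3)])
```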

Thus, in constructing a project or program, regardless of its level of planning, we often begin by monitoring the most basic elements. These are usually described as cost, schedule, performance, and risk, though I will discuss and identify other contributors that can be indexed.

This first post will concentrate on the first set of elements–those that constitute cost. In looking at these, however, we will find that the elements within this category are a bit broader than what is currently used in determining project and program performance.

Contract Costs

When we refer to costs in project and program management we oftentimes are referring to the direct and indirect costs expended by the supplier over the course of the effort, particularly in Cost Plus contractual efforts. From a data perspective, the breakout of cost places it into a number of subcategories.

Note that these are costs within the contract itself, as a cohesive, self-identifying entity. But there are other costs associated with our contracts which feed into program and project management. These are necessary to identify and capture if we are to take an holistic approach to these disciplines.

The costs that are anticipated by the contract are based on cost estimates, which need to be funded. These funded costs will be allocated to particular lines in the contract (CLINs), whether these be supporting contract efforts or deliverables. Thus, additional elements of our digital inventory include these items but lead us to our next categories.

Cost Estimates, Colors of Money, and Cash Flow

Cost estimates are the basis for determining the entire contract effort, and eventually make it into the project and program cost plan. Once cost estimates are applied and progress is tracked through the collection of actual costs, these elements are further traced to project and program activities, products, commodities, and other business categories, such as the indirect costs identified on the right hand side of the chart above.

Our cost plans need to be financed, as with any business entity. Though the most complex projects often are financed by some government entity because of their scale and impact, private industry–even among the largest companies–must obtain financing for the efforts at hand, whether these come from internal or external sources.

Thus two more elements present themselves: “colors” of money, that is, money that is provided for a specific purpose within the project and program cost plan which could also be made available for only some limited period of time, and the availability of that money sufficient to execute particular portions of the project or program, that is, cash flow.

The phase of the project or program will determine the type of money that is made available. These types are also reflected in the costs identified in the next section, but include, from a government financing perspective, Research, Development, Test and Evaluation (RDT&E); Procurement; Operations and Maintenance (O&M); and Military Construction (MILCON) dollars. By Congressional appropriation and authorization, each of these types of money may be provided for particular programs, and each type of authorization has a specific period in which it can be committed, obligated, and expended before it expires. The type of money provided also aligns with the phase of the project or program: whether it is still in development, production, deployment and acquisition, sustainment, or retirement.
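A hedged sketch of how these constraints might be checked programmatically follows. The obligation windows used below reflect commonly cited defaults (one year for O&M, two for RDT&E, three for Procurement, five for MILCON), but the governing appropriation language always controls, so treat the values and field names as placeholders.

```python
from dataclasses import dataclass

# Obligation availability in years by appropriation type. These reflect
# commonly cited defaults (O&M: 1, RDT&E: 2, Procurement: 3, MILCON: 5),
# but the governing appropriation controls -- treat them as placeholders.
OBLIGATION_YEARS = {"O&M": 1, "RDT&E": 2, "PROC": 3, "MILCON": 5}

@dataclass
class FundingLine:
    appropriation: str   # "color" of money
    fiscal_year: int     # first year of availability
    amount: float

    def can_obligate(self, current_fy: int) -> bool:
        """True if new obligations are still allowed in the given fiscal year."""
        window = OBLIGATION_YEARS[self.appropriation]
        return self.fiscal_year <= current_fy < self.fiscal_year + window

line = FundingLine("RDT&E", fiscal_year=2023, amount=4_500_000.0)
for fy in (2023, 2024, 2025):
    print(fy, "obligable" if line.can_obligate(fy) else "expired for new obligations")
```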

These costs will be captured in reporting of actual and projected rates of expenditure, tied to procurement, material management, and resource management systems.

Additional Relative Costs

As with all efforts, the supplier is not the only entity to incur costs on a development project or program. The customer also incurs costs, which must be taken into account in determining the total cost of the effort.

Anyone who has undergone any kind of major effort on their home, or even had to get other workaday things done–like deciding when to change the tires on the car or when to go to the dentist–implicitly understands that there is more effort in timing and determining the completion of these items than the cost of new kitchen cabinets, tires, or a filling. One must decide to take time off from work. One must look to their own cash flow to see if they have sufficient funds not only for the merchant, but for all of the sundry and associated tasks that must be done in preparation for and after the task’s completion. To choose to do one thing is to choose not to do another–an opportunity cost. Other people may be involved in the decision. Perhaps children are in the household and a babysitter is required. Perhaps home life is so disrupted that another temporary abode is necessary on a short-term basis.

All of these are costs that one must take into account, and at the individual level we do these calculations and plan these activities as a matter of fact.

In customer-supplier relationships the former incurs costs above the contract costs, which must be taken into account by the customer project or program executive. In the Department of Defense an associated element is called program management administration (PMA). For private entities this falls into allocated G&A and Overhead costs, aside from direct labor and material costs, but in all cases these are costs that have come about due to the decision to undertake the specific effort.

Other elements of cost on the customer side are contractually furnished facilities, property, material or equipment, and testing and evaluation costs.

Contract Cost Performance: Earned Value Management

I will discuss EVM in more detail in a later installment of this element inventory, but mention must be made of it here, since to exclude it would be grossly remiss.

At its core, EVM is a financial measure of the value of what has been physically achieved against a performance management baseline (PMB), tying actual costs and completion of work together through a work breakdown structure (WBS). It is focused on the contract level of performance, which in some cases may constitute the entire project, though not necessarily the entire effort for the program.
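Because EVM will come up again, a minimal sketch of its core arithmetic helps fix the idea: planned value, earned value, and actual cost rolled up from work packages, with the standard variances and indices derived from them. The work package figures below are invented purely for illustration.

```python
# Toy EVM rollup at a point in time (all figures invented for illustration).
# BCWS = planned value, BCWP = earned value, ACWP = actual cost of work performed.
work_packages = [
    {"id": "WP-010", "bcws": 100_000, "bcwp": 100_000, "acwp": 95_000},
    {"id": "WP-020", "bcws": 150_000, "bcwp": 120_000, "acwp": 140_000},
    {"id": "WP-030", "bcws": 80_000,  "bcwp": 40_000,  "acwp": 60_000},
]

pv = sum(wp["bcws"] for wp in work_packages)   # planned value to date
ev = sum(wp["bcwp"] for wp in work_packages)   # earned value (work accomplished)
ac = sum(wp["acwp"] for wp in work_packages)   # actual cost to date

cv, sv = ev - ac, ev - pv        # cost and schedule variance
cpi, spi = ev / ac, ev / pv      # cost and schedule performance indices

bac = 1_000_000                  # budget at completion (invented)
eac = bac / cpi                  # one common estimate-at-completion formula

print(f"CV={cv:+,} SV={sv:+,} CPI={cpi:.2f} SPI={spi:.2f} EAC={eac:,.0f}")
```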

The linkages to the other cost elements I have delineated elsewhere in this post range from strong to non-existent. Thus, while EVM is an essential means of linking contractual achievement to work accomplishment that, at various levels of fidelity, is tied to actual technical achievement, it does not capture all of the costs in our data inventory.

An essential overview in understanding what it does capture is best summed up in the following diagram taken from the Defense Acquisition University (DAU) site:

Commercial EVM elements, while not necessarily using the same terminology or highly structured process, possess a similar structure in allocating costs and achievement against baseline costs in developmental efforts to work packages (oftentimes schedule tasks in resource-loaded schedules) under an integrated WBS structure with Management Reserve not included as part of the baseline.

Also note that commercial efforts often include their internal costs as part of the overall contractual effort in assessing earned value against actual work achievement, while government contracting efforts tend to exclude these inherent costs. That being said, it is not that there is no cost control over these elements, since strict ceilings often apply to PMA and other such costs; it is that contract cost performance does not take these costs, among others, into account.

Furthermore, the chart above provides us with additional sub-elements in our inventory that are essential in capturing data at the appropriate level of our project and program hierarchy.

Thus, for IPM, EVM is one of many elements that are part of our digital inventory–and one that provides a linkage to other non-cost elements (the WBS). But in no way should it be viewed as capturing all essential costs associated with a contractual effort, let alone the more expansive project or program effort.

Portfolio Management and Life-Cycle Costs

There is another level of management that is essential in thinking about project and program management, and that is the program executive level. In the U.S. military services these are called Program Executive Officers (PEOs). In private industry they are often product managers, CIOs, and other positions that often represent the link between the program management teams and the business operations side of the organization. Thus, this is also the level of management organized to oversee a number of individual projects and programs that are interrelated based on mission, commodity, or purpose. As such, this level of management often concentrates on issues across the portfolio of projects and programs.

The main purpose of the portfolio management level is to ensure that project and program efforts are aligned with the strategic goals of the organization, which includes an understanding of the total cost of ownership.

In performing this purpose one of the functions of portfolio management is to identify risks that may manifest within projects and programs, and to determine the most productive use of limited resources across them, since they are essentially competing for the same dollars. This includes cost estimates and re-allocations to address ontological, aleatory, and epistemic risk.

Furthermore, the portfolio level is also concerned with the life-cycle factors of the item under development, so that there is effective hand-off at the production and sustainment phases. The key here is to ensure that each project or program, which is focused on the more immediate goals of project and program execution, continues to meet the goals of the organization in terms of life-cycle costs, and its effectiveness in meeting the established goals essential to the project or program’s framing assumptions.

But here we are focusing on cost, and so the costs involved are trade-off costs and opportunities, assessments of return on investment, and the aforementioned total cost of ownership of the end item or system. The costs that contribute to the total cost of ownership include all of the development costs, external and internal program management costs, procurement costs, operations and support costs, maintenance and life extension costs, and system retirement costs.
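As a simple illustration of why contract cost alone understates the picture, a total-cost-of-ownership rollup over the categories just listed (with dollar figures invented for the example) might look like this:

```python
# Total cost of ownership rollup; categories follow the paragraph above,
# dollar figures are invented for illustration.
life_cycle_costs = {
    "development": 250_000_000,
    "program management (external and internal)": 40_000_000,
    "procurement": 600_000_000,
    "operations and support": 900_000_000,
    "maintenance and life extension": 300_000_000,
    "system retirement": 50_000_000,
}

tco = sum(life_cycle_costs.values())
development_share = life_cycle_costs["development"] / tco

print(f"Total cost of ownership: ${tco:,}")
print(f"Development (the part EVM typically sees): {development_share:.0%}")
```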

Conclusion

I believe that the survey of cost elements presented in this initial post illustrates that present digital project and program management systems are limited and immature–capturing and evaluating only a small portion of the total amount of available data.

These gaps make it impossible, for example, to determine the relative significance of any one element–and the analytics that can be derived from it–over another; not to mention the inability to provide the linkages among these absent elements that would garner insights into cause-and-effect and predictive behavior in time for us to influence the outcome.

It is also clear that, when we strive to define what constitutes integrated project and program management, we must learn what is of most importance to the PM in performing those duties that are viewed as essential to success, and which are not yet captured in our analytical and predictive systems.

Only when our systems reach the level of cohesiveness and comprehensiveness in providing organizational insight and intelligence essential to project or program management will PMs ignore them at their own risk. In getting there we must first identify what can be captured from the activities that contribute to our efforts.

My next post will identify essential elements related to planning and scheduling.


Note: I am indebted to Defense Acquisition University’s resources in my research across many of my postings and link to them for the edification of the reader. For more insight into many of the points raised in this post I would recommend that readers familiarize themselves with A Guide for DoD Program Managers.


(Data) Transformation–Fear and Loathing over ETL in Project Management

ETL stands for data extract, transform, and load. This essential step is the basis for all of the new capabilities that we wish to acquire during the next wave of information technology: business analytics, big(ger) data, and interdisciplinary insight into processes that improves productivity and efficiency.

I’ve been dealing with a good deal of fear and loathing regarding the introduction of this concept, even though in my day job my organization is a leading practitioner in the field in its vertical. Some of this is due to disinformation by competitors playing upon the fears of the non-technically minded–the expected reaction of those who can’t do, in the last throes of avoiding irrelevance. Better to baffle them with bullshit than with brilliance, I guess.

But, more importantly, part of this is due to the state of ETL and how it is communicated to the project management and business community at large. There is a great deal to be gained here by muddying the waters even by those who know better and have the technology. So let’s begin by clearing things up and making this entire field a bit more coherent.

Let’s start with the basics. Any organization that contains the interaction of people is a system. For purposes of a project management team, a business enterprise, or a governmental body we deal with a special class of systems known as Complex Adaptive Systems: CAS for short. A CAS is a non-linear learning system that reacts and evolves to its environment. It is complex because of the inter-relationships and interactions of more than two agents in any particular portion of the system.

I was first introduced to the concept of CAS through readings published out of the Santa Fe Institute in New Mexico. Most noteworthy is the work The Quark and the Jaguar by the physicist Murray Gell-Mann. Gell-Mann received the Nobel Prize in Physics in 1969 for his work on elementary particles, such as the quark, and is a co-founder of the Institute. He also was part of the team that first developed simulated Monte Carlo analysis during a period he spent at RAND Corporation. Anyone interested in the basic science of quanta and how the universe works, and how that leads to insights into subjects such as day-to-day probability and risk, should read this book. It is a good popular scientific publication written by a brilliant mind, and very relevant to the subjects we deal with in project management and information science.

Understanding that our organizations are CAS allows us to apply all sorts of tools to better understand them and their relationship to the world at large. From a more practical perspective, what are the risks involved in the enterprise in which we are engaged and what are the probabilities associated with any of the range of outcomes that we can label as success. For my purposes, the science of information theory is at the forefront of these tools. In this world an engineer by the name of Claude Shannon working at Bell Labs essentially invented the mathematical basis for everything that followed in the world of telecommunications, generating, interpreting, receiving, and understanding intelligence in communication, and the methods of processing information. Needless to say, computing is the main recipient of this theory.

Thus, all CAS process and react to information. The challenge for any entity that needs to survive and adapt in a continually changing universe is to ensure that the information that is being received is of high and relevant quality so that the appropriate adaptation can occur. There will be noise in the signals that we receive. What we are looking for from a practical perspective in information science are the regularities in the data so that we can make the transformation of receiving the message in a mathematical manner (where the message transmitted is received) into the definition of information quality that we find in the humanities. I believe that we will find that mathematical link eventually, but there is still a void there. A good discussion of this difference can be found here in the on-line publication Double Dialogues.

Regardless of this gap, those of us who engage in the business of ETL must bring to the table the ability not only to ensure that the regularities in the information are identified and transmitted to the intended (or necessary) users, but also to distinguish the quality of the message in terms of the purpose of the organization. Shannon’s equation is where we start, not where we end. Given this background, there are really two basic types of data that we begin with when we look at a dataset: structured and unstructured data.

Structured data are those where the qualitative information content is either predefined by its nature or by a tag of some sort. For example, schedule planning and performance data, regardless of the idiosyncratic/proprietary syntax used by a software publisher, describes the same phenomena regardless of the software application. There are only so many ways to identify snow–and, no, the Inuit people do not have 100 words to describe it. Qualifiers apply in the humanities, but usually our business processes more closely align with statistical and arithmetic measures. As a result, structured data is oftentimes defined by its position in a hierarchical, time-phased, or interrelated system that contains a series of markers, indexes, and tables that allow it to be interpreted easily through the identification of a Rosetta stone, even when the system, at first blush, appears to be opaque. When you go to a book, its title describes what it is. If its content has a table of contents and/or an index it is easy to find the information needed to perform the task at hand.

Unstructured data consists of the content of things like letters, e-mails, presentations, and other forms of data disconnected from its source systems and collected together in a flat repository. In this case the data must be mined to recreate what is not there: the title that describes the type of data, a table of contents, and an index.

All data requires initial scrubbing and pre-processing. The difference here is the means used to perform this operation. Let’s take the easy path first.

For project management–and most business systems–we most often encounter structured data. What this means is that, by understanding and interpreting standard industry terminology, schemas, and APIs, the simple process of aligning data to be transformed and stored in a database for consumption can be reduced to a systemic and repeatable process without the redundancy of rediscovery applied in every instance. Our business intelligence and business analytics systems can be further developed to anticipate a probable question from a user so that the query is pre-structured to allow for near-immediate response. Further, structuring the user interface in such a way as to make the response to the query meaningful, especially when integrated with and juxtaposed against other types of data, requires subject matter expertise to be incorporated into the solution.
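A bare-bones sketch of what such a systemic, repeatable process can look like appears below. The file name, column names, and target table are hypothetical; the point is that extraction follows a known schema, the transformation is deterministic, and the load lands the data in a queryable store.

```python
import csv
import sqlite3

# Extract: the source follows a known, structured schema (columns are hypothetical).
def extract(path):
    with open(path, newline="") as f:
        yield from csv.DictReader(f)   # e.g. task_id, bcws, bcwp, acwp

# Transform: deterministic, repeatable rules -- no per-file rediscovery required.
def transform(rows):
    for row in rows:
        yield (
            row["task_id"].strip().upper(),
            float(row["bcws"]),
            float(row["bcwp"]),
            float(row["acwp"]),
        )

# Load: land the cleaned records in a queryable store for BI/analytics use.
def load(records, db_path="performance.db"):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS cost_performance "
                "(task_id TEXT, bcws REAL, bcwp REAL, acwp REAL)")
    con.executemany("INSERT INTO cost_performance VALUES (?, ?, ?, ?)", records)
    con.commit()
    con.close()

# Usage (assuming a monthly_extract.csv with the columns noted above):
# load(transform(extract("monthly_extract.csv")))
```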

Structured ETL is the place that I most often inhabit as a provider of software solutions. These processes are both economical and relatively fast, particularly in those cases where they are applied to an otherwise inefficient system of best-of-breed applications that require data transfers and cross-validation prior to official reporting. Time, money, and effort are all saved by automating this process, improving not only processing time but also data accuracy and transparency.

In the case of unstructured data, however, the process can be a bit more complicated and there are many ways to skin this cat. The key here is that oftentimes what seems to be unstructured data is only so because of the lack of domain knowledge by the software publisher in its target vertical.

For example, I recently read a white paper published by a large BI/BA publisher regarding its approach to financial and accounting systems. My own experience as a business manager and Navy Supply Corps Officer provides me with the understanding that these systems are highly structured and regulated. Yet the publisher treated this data with an unstructured approach to mining it–an approach blatantly advertised and, apparently, sold as state of the art.

This approach, which was first developed back in the 1980s when we first encountered the challenge of data that exceeded our expertise at the time, requires a team of data scientists and coders to go through the labor- and time-consuming process of pre-processing and building specialized processes. The most basic form of this approach involves techniques such as frequency analysis, summarization, correlation, and data scrubbing. This last portion also involves labor-intensive techniques at the microeconomic level such as binning and other forms of manipulation.
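To give a flavor of the most basic of those techniques, here is a small frequency-analysis and binning sketch over free text. The sample documents and variance thresholds are invented; real pipelines layer far more pre-processing on top of this.

```python
from collections import Counter
import re

# Invented sample of "unstructured" text, standing in for e-mails or memos.
documents = [
    "Schedule slip on the integration task; critical path impact expected.",
    "Cost variance driven by late material delivery and rework on the test article.",
    "Test article rework complete; schedule recovery plan in review.",
]

# Frequency analysis: tokenize, drop trivial words, count the rest.
stopwords = {"the", "on", "and", "by", "in", "a", "of", "to"}
tokens = [w for doc in documents
          for w in re.findall(r"[a-z]+", doc.lower()) if w not in stopwords]
print(Counter(tokens).most_common(5))

# Binning: collapse a continuous measure (invented variance percentages)
# into coarse categories for summarization.
variances = [-12.0, -3.5, 0.8, 4.2, 15.0]
def bin_variance(v):
    if v < -10: return "significant overrun"
    if v < 0:   return "overrun"
    if v < 10:  return "within threshold"
    return "significant underrun"
print(Counter(bin_variance(v) for v in variances))
```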

This is where the fear and loathing comes into play. It is not as if all information systems do not perform these functions in some manner; it is that in structured data all of this work has already been done and, oftentimes, is handled by the database system. But even here there is a better way.

My colleague Dave Gordon, who has his own blog, will emphasize that the identification of probable questions and the configuration of queries in advance, combined with the application of standard APIs, will garner good results in most cases. Yet one must be prepared to receive a certain amount of irrelevant information. For example, the query on Google of “Fun Things To Do” that you may use when planning for a weekend will yield all sorts of results, such as “50 Fun Things to Do in an Elevator.” This result includes making farting sounds. The link provides some others, some of which are pretty funny. In writing this blog post, a simple search on Google for “Google query fails” yields what can only be described as a large number of query fails. Furthermore, this approach relies on the data originator to have marked the data with pointers and tags.

Given these different approaches to unstructured data and the complexity involved, there is a decision process to apply:

1. Determine if the data is truly unstructured. If the data is derived from a structured database from an existing application or set of applications, then it is structured and will require domain expertise to inherit the values and information content without expending unnecessary resources and time. A structured, systemic, and repeatable process can then be applied. Oftentimes an industry schema or standard can be leveraged to ensure consistency and fidelity.

2. Determine whether only a portion of the unstructured data is relevant to your business processes and use it to append and enrich the existing structured data that has been used to integrate and expand your capabilities. In most cases the identification of a Rosetta Stone and standard APIs can be used to achieve this result.

3. For the remainder, determine the value of mining the targeted category of unstructured data and perform a business case analysis.

Given the rapidly expanding size of the data that we can access using the advancing power of new technology, we must be able to distinguish doing what is necessary from doing what is impressive. The definition of Big Data has evolved over time because our hardware, storage, and database systems allow us to access increasingly larger datasets that ten years ago would have been unimaginable. What this means is that–initially–as we work through this process of discovery, we will be bombarded with a plethora of irrelevant statistical measures and so-called predictive analytics that will eventually prove not to pass the “so-what” test. This process places users in a state of information overload, and we often see this condition today. It also means that what took an army of data scientists and developers to do ten years ago takes a technologist with a laptop and some domain knowledge to perform today. This last can be taught.

The next necessary step, aside from applying the decision process above, is to push our information systems to advance their processing to provide more relevant intelligence, visualized and configured to the domain expertise required. In this way we will eventually discover the paradox that effectively accessing larger sets of data will yield fewer, but more relevant, items of intelligence that can be translated into action.

At the end of the day the manager and user must understand the data. There is no magic in data transformation or data processing. Even with AI and machine learning it is still incumbent upon the people within the organization to be able to apply expertise, perspective, knowledge, and wisdom in the use of information and intelligence.

Move It On Over — Third and Fourth Generation Software: A Primer

While presenting to organizations regarding business intelligence and project management solutions I often find myself explaining the current state of programming and what current technology brings to the table. Among these discussions is the difference between third and fourth generation software, not just from the perspective of programming–or the Wikipedia definition (which is quite good, see the links below)–but from a practical perspective.

Recently I ran into someone who asserted that their third-generation software solution was advantageous over a fourth generation one because it was “purpose built.” My response was that a fourth generation application provides multiple “purpose built” solutions from one common platform in a more agile and customer-responsive environment. For those unfamiliar with the differences, however, this simply sounded like a war of words rather than the substantive debate that it was.

Anyone who has used a software application is usually unaware of the three basic logical layers that make up the solution: the business logic layer, the application layer, and the database structure. The user interface delivers the result of the interaction of these three layers to the user–what is seen on the screen.

Back during the early advent of the widespread use of PCs and of distributed computing on centralized systems, a group of powerful languages was produced that allowed machine operations to be handled by an operating system and software developers to write code focused on “purpose built” solutions.

Initially these efforts concentrated on automating highly labor-intensive activities to achieve maximum productivity gains in an organization, and on leveraging those existing systems to distribute information that previously would have required many hours of manual effort in mathematical and statistical calculation and visualization. The solutions written were based on what were referred to as third generation languages, and they are familiar even to non-technical people: Fortran, Cobol, C, C++, C#, and Java, among others. These languages are highly structured and require a good bit of expertise to program correctly.

In third generation environments, the coder specifies operations that the software must perform based on data structure, application logic, and pre-coded business logic. These three levels are highly integrated, and any change in one of them requires that the programmer trace the impact of that change to ensure that the operations in the other two layers are not affected. Oftentimes the change has a butterfly effect, requiring detailed adjustments to take into account the subtleties in processing. It is this highly structured, interdependent, “purpose built” structure that causes unanticipated software bugs to pop up in most applications. It is also the reason why software development and upgrade configuration control is so highly structured and time-consuming–requiring long lead-times to deliver even what most users view as relatively mundane changes and upgrades, like a new chart or graph.

In contrast, fourth generation applications separate the three layers and control the underlying behavior of the operating environment by leveraging a standard framework, such as .NET. The .NET operating environment, for example, provides both a library of interoperability across programming languages (known as the Framework Class Library, or FCL) and a virtual machine that handles exception handling, memory management, and other common functions (known as the Common Language Runtime, or CLR).

With the three layers separated, and many of the more mundane background tasks controlled by the .NET framework, a great deal of freedom is given to the software developer–freedom that translates into real benefits for customers and users.

For example, the database layer is freed from specific coding from the application layer, since the operating environment allows libraries of industry standard APIs to be leveraged, making the solution agnostic to data. Furthermore, the business logic/UI layer allows for table-driven and object-oriented configuration that creates a low code environment, which not only allows for rapid roll-out of new features and functionality (since hard-coding across all three layers is eschewed), but also allows for more precise targeting of functionality based on the needs of user groups (or any particular user).
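Since the post describes this idea in .NET terms, here is a language-agnostic sketch in Python of the table-driven principle (all report names, queries, and user groups are hypothetical): the behavior delivered to a user group is read from configuration data rather than hard-coded across the three layers.

```python
# Table-driven configuration: what a user group sees is data, not code.
# All names are hypothetical; the point is that adding a report or widget
# means adding a row, not recompiling application logic.
REPORT_CATALOG = {
    "evm_summary":   {"query": "SELECT task_id, bcwp, acwp FROM cost_performance",
                      "widget": "table"},
    "float_by_task": {"query": "SELECT task_id, total_float FROM schedule",
                      "widget": "bar_chart"},
}

USER_GROUP_CONFIG = {
    "program_office": ["evm_summary", "float_by_task"],
    "scheduler":      ["float_by_task"],
}

def build_dashboard(user_group):
    """Assemble a dashboard definition purely from configuration tables."""
    return [{"report": name, **REPORT_CATALOG[name]}
            for name in USER_GROUP_CONFIG[user_group]]

for panel in build_dashboard("scheduler"):
    print(panel["report"], "->", panel["widget"])
```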

This is what is meant in previous posts by new technology putting the SME back in the driver’s seat, since pre-defined reports and objects (GUIs) at the application layer allow for immediate delivery of functionality. Oftentimes data from disparate data sources can be bound together through simple query languages and SQL, particularly if the application layer table and object functionality is built well enough.

When domain knowledge is incorporated into the business logic layer, the distinction between generic BI and COTS is obliterated. Instead, what we have is a hybrid approach that provides the domain specificity of COTS (“purpose built”) with the power of BI, reducing the response time between solution design and delivery. More and better data can also be accessed, establishing an environment of discovery-driven management.

Needless to say, properly designed Fourth Generation applications are perfectly suited to rapid application development and deployment approaches such as Agile. They also provide the integration potential, given the agnosticism to data, that Third Generation “purpose built” applications can only achieve through data transfer and reconciliation across separate applications that never truly achieve integration. Instead, Fourth Generation applications can subsume the specific “purpose built” functionality found in stand-alone applications and deliver it via a single platform that provides one source of truth, still allowing for different interpretations of the data through the application of differing analytical approaches.

So move it on over nice (third generation) dog, a big fat (fourth generation) dog is moving in.

Learning the (Data) — Data-Driven Management, HBR Edition

The months of December and January are usually full of reviews of significant events and achievements during the previous twelve months. Harvard Business Review eases the search for some of the best writing on the subject of data-driven transformation by occasionally publishing, in one volume of its magazine OnPoint, the best writing on a critical subject of interest to professionals. It is worth making part of your permanent data management library.

The volume begins with a very concise article by Thomas C. Redman with the provocative title “Does Your Company Know What to Do with All Its Data?” He then goes on to list seven takeaways for optimizing the use of existing data that include many of the themes I have written about in this blog: better decision-making, innovation, what he calls “informationalize products”, and other significant effects. Most importantly, he refers to the situation of information asymmetry and how it provides companies and organizations with a strategic advantage that directly affects the bottom line–whether that be in negotiations with peers, contractual relationships, or market advantages. Aside from the OnPoint article, he also has some important things to say about corporate data quality. Highly recommended, and a good reason to implement systems that assure internal information systems fidelity.

Edd Wilder-James also covers a theme that I have hammered home in a number of blog posts in the article “Breaking Down Data Silos.” The issue here is access to data and the manner in which it is captured and transformed into usable analytics. His recommended approach to a task that is often daunting is to find the path of least resistance in finding opportunities to break down silos and maximize data to apply advanced analytics. The article provides a necessary balm that counteracts the hype that often accompanies this topic.

Both of these articles are good entrees to the subject and perfectly positioned to prompt both thought and reflection of similar experiences. In my own day job I provide products that specifically address these business needs. Yet executives and management in all too many cases continue to be unaware of the economic advantages of data optimization or the manner in which continuing to support data silos is limiting their ability to effectively manage their organizations. There is no doubt that things are changing and each day offers a new set of clients who are feeling their way in this new data-driven world, knowing that the promises of almost effort-free goodness and light by highly publicized data gurus are not the reality of practitioners, who apply the detail work of data normalization and rationalization. At the end it looks like magic, but there is effort that needs to be expended up-front to get to that state. In this physical universe under the Second Law of Thermodynamics there are no free lunches–energy must be borrowed from elsewhere in order to perform work. We can minimize these efforts through learning and the application of new technology, but managers cannot pretend not to have to understand the data that they intend to use to make business decisions.

All of the longer-form articles are excellent, but I am particularly impressed with the Leandro DalleMule and Thomas H. Davenport article “What’s Your Data Strategy?” from the May-June 2017 issue of HBR. When addressing big data at professional conferences and in visiting businesses, the conversation often turns to the handling of the bulk of non-structured data. But as the article notes, less than half of an organization’s relevant structured data is actually used in decision-making. The most useful artifact, which I have permanently plastered at my workplace, is the graphic “The Elements of Data Strategy”, and I strongly recommend that any manager concerned with leveraging new technology to optimize data do the same. The graphic illuminates the defensive and offensive positions inherent in a cohesive data strategy, leading an organization to the state the authors describe: “In our experience, a more flexible and realistic approach to data and information architectures involves both a single source of truth (SSOT) and multiple versions of the truth (MVOTs). The SSOT works at the data level; MVOTs support the management of information.” Eliminating proprietary data silos, eliminating redundant data streams, and warehousing data so that it can be accessed through a number of analytical methods together achieve the SSOT state that provides the basis for an environment supporting MVOTs.
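To make the SSOT/MVOT distinction concrete, here is a minimal sketch of my own (not DalleMule and Davenport’s) using only the Python standard library: a single normalized table serves as the single source of truth at the data level, while two SQL views deliver different versions of the truth at the information level. The table, view, and column names are hypothetical.

    # A minimal sketch of SSOT vs. MVOTs using only the Python standard library.
    # Table, view, and column names are hypothetical and purely illustrative.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Single source of truth: one normalized table at the data level.
    cur.execute("""
        CREATE TABLE contract_performance (
            contract_id TEXT,
            period      TEXT,
            bcws        REAL,  -- budgeted cost of work scheduled
            bcwp        REAL,  -- budgeted cost of work performed
            acwp        REAL   -- actual cost of work performed
        )""")
    cur.executemany(
        "INSERT INTO contract_performance VALUES (?, ?, ?, ?, ?)",
        [("C-001", "2017-01", 100.0, 90.0, 95.0),
         ("C-001", "2017-02", 120.0, 115.0, 118.0)])

    # Multiple versions of the truth: each view interprets the same data for a
    # different consumer without copying or re-keying it.
    cur.execute("""
        CREATE VIEW cost_analyst_view AS
        SELECT contract_id, period, bcwp - acwp AS cost_variance
        FROM contract_performance""")
    cur.execute("""
        CREATE VIEW schedule_analyst_view AS
        SELECT contract_id, period, bcwp - bcws AS schedule_variance
        FROM contract_performance""")

    print(cur.execute("SELECT * FROM cost_analyst_view").fetchall())
    print(cur.execute("SELECT * FROM schedule_analyst_view").fetchall())

The point of the sketch is that the cost analyst and the schedule analyst each get an interpretation tailored to their domain, yet neither copies, re-keys, or reconciles the underlying data.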

The article “Why IT Fumbles Analytics” by Donald A. Marchand and Joe Peppard, written in 2013, still rings true today. As with the Wilder-James article cited above, the emphasis here is on the work necessary to ensure that new data and analytical capabilities succeed, but it shifts to “figuring out how to use the information (the new system) generates to make better decisions or gain deeper…insights into key aspects of the business.” The heart of managing this effort is to put into place a project organization, as well as systems and procedures, that will support the organizational transformation that occurs as a result of the explosion of new analytical capability.

The days of simply buying an off-the-shelf, silo-ed “tool” and automating a specific manual function are over, especially for organizations that wish to be effective, competitive, and more profitable in today’s data and analytical environment. A more comprehensive and collaborative approach is necessary. As with the DalleMule and Davenport article, there is a very useful graphic that contrasts traditional IT project approaches with Analytics and Big Data (or perhaps “Bigger” Data) projects. Though the prescriptions in the article assume an earlier concept of Big Data optimization focused on non-structured data, making some of them overkill, an implementation plan is essential in supporting the kind of transformation that will occur, and managers act at their own risk if they fail to take this effect into account.

All of the other articles in this OnPoint issue are of value. The bottom line, as I have written in the past, is to keep the focus on solving business challenges rather than buying the new bright shiny object. At the same time, the days when business decision-makers could afford to stay within their silo-ed comfort zone are passing very quickly, so they need to shift their attention to solutions that address these new realities.

So why do this, apart from the fancy term “data optimization”? Because there is a direct return on investment in transforming organizations and systems into data-driven ones. At the end of the day the economics win out. Thus, our organizations must be prepared to support, and have a plan in place to address, the core effects of new data-analytics and Big Data technology:

a. The management and organizational transformation that takes place when deploying the new technology, which requires proactive socialization of the changing environment, the teaching of new skill sets, and new ways of working and doing business.

b. The transformation from a sub-optimized, silo-ed “tell me what I need to know” work environment to a learning environment driven by what the data indicates, which in turn depends on the skills cited above: intellectual curiosity, engaged domain expertise, and cross-domain competencies.

c. A plan that teaches the organization how best to use the new capability through a practical, hands-on approach focused on addressing specific business challenges.

Like Tinker to Evers to Chance: BI to BA to KDD

It’s spring training time in sunny Florida, as well as other areas of the country with mild weather and baseball.  For those of you new to the allusion, it comes from a poem by Franklin Pierce Adams and is also known as “Baseball’s Sad Lexicon”.  Tinker, Evers, and Chance were the double play combination of the 1910 Chicago Cubs (shortstop, second base, and first base).  Because of their effectiveness on the field, these Cubs players were worthy opponents of the old New York Giants, of which Adams was a fan, and who were the kings of baseball during most of the first fifth of a century of the modern era (1901-1922).  That is, until they were suddenly overtaken by their crosstown rivals, the Yankees, who came to dominate baseball for the next 40 years, beginning with the arrival of Babe Ruth.

The analogy here is that the Cubs infielders, while individuals, didn’t think of their roles as completely separate.  They had common goals and, in order to win on the field, needed to act as a unit.  In the case of executing the double play, they were a very effective unit.  So why do we have these dichotomies in information management when the goals are the same?

Much has been written, both academically and commercially, about Business Intelligence, Business Analytics, and Knowledge Discovery in Databases.  I’ve surveyed the literature, for good and bad, and what I find is that these terms are thrown around, mostly by commercial firms in either information technology or consulting, all with the purpose of providing a discriminator for their technology or service.  Many times the concepts are used interchangeably, or one is set up as a strawman to push an agenda or product.  Thus, it seems some hard definitions are in order.

According to Techopedia:

Business Intelligence (BI) is the use of computing technologies for the identification, discovery and analysis of business data – like sales revenue, products, costs and incomes.

Business analytics (BA) refers to all the methods and techniques that are used by an organization to measure performance. Business analytics are made up of statistical methods that can be applied to a specific project, process or product. Business analytics can also be used to evaluate an entire company.

Knowledge Discovery in Databases (KDD) is the process of discovering useful knowledge from a collection of data. This widely used data mining technique is a process that includes data preparation and selection, data cleansing, incorporating prior knowledge on data sets and interpreting accurate solutions from the observed results.

As with much of computing in its first phases, these functions were seen to be separate.

BI, based largely on the manner in which it was implemented in its first incarnations, is perceived as a means of gathering data into relational data warehouses or data marts and then building out decision support systems.  These methods have usually involved a great deal of overhead in both computing and personnel, since the practical elements of gathering, sorting, and delivering data involved additional coding and highly structured user interfaces.  The advantage of BI is its emphasis on integration.  The disadvantage, from the enterprise perspective, is that the method and mode of implementation is phlegmatic at best.

BA is BI’s younger cousin.  Applications were developed and sold as “analytical tools” focused on a niche of data within the enterprise’s requirements.  In this manner decision makers could avoid having to wait for the overarching and ponderous BI system to get to their needs, if ever.  This led many companies to knit together specialized tools in so-called “best-of-breed” configurations to achieve some measure of integration across domains.  Of course, given the plethora of innovative tools, much data import and reconciliation has had to be inserted into the process.  Thus, the advantages of BA in the market have been to reward innovation and focus on the needs of the domain subject matter expert (SME).  The disadvantage is the insertion of manual intervention into an automated process due to lack of integration, further exacerbated by so-called SMEs in data reconciliation–a form of rent-seeking behavior that only rewards body-shop consulting and unnecessarily drives up overhead.  The panacea applied to this last disadvantage has been the adoption of non-proprietary XML schemas across entire industries, which reduce both the overhead and the data silos found in the BA market.

KDD is both our oldster and our youngster–grandpa and grandson hanging out.  It is a term that describes a necessary function of insight–allowing the data to tell us what analytics are needed rather than relying on a “canned” solution to determine how to approach a particular set of data.  But it does so, oftentimes, using an older approach that predates BI, known as data mining.  You will often find KDD linked to arguments in favor of flat file schemas, NoSQL (that is, non-relational databases), and free use of the term Big Data, which becomes more meaningless each year it is used, given Moore’s Law.  The advantage of KDD is that it allows for surveying across datasets to pick up patterns and interrelationships within our systems that are otherwise unknown, particularly given the way in which the human mind can fool itself into reifying an invalid assumption.  The disadvantage, of course, is that KDD would have us go backward in identifying and categorizing data by employing data mining, an older concept from early in computing in which a team of data scientists and data managers develops solutions to identify, categorize, and use that data–manually doing what automation was designed to do.  Understanding these limitations, companies focused on KDD have developed heuristics (cognitive computing) that identify patterns and possible linkages, removing a portion of the overhead associated with data mining.
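As a toy illustration of that kind of cross-dataset surveying–a sketch of my own, not a description of any particular KDD product–the snippet below computes a Pearson correlation between two hypothetical measures pulled from different silos, the sort of relationship an analyst locked into a single “canned” view might never think to check. The figures are invented.

    # A toy sketch of cross-dataset pattern surveying: compute a Pearson
    # correlation between two measures that live in different "silos".
    # The monthly figures are invented for illustration only.
    from math import sqrt

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Hypothetical monthly figures pulled from two separate systems.
    schedule_slip_days = [2, 5, 3, 8, 12, 7]
    overtime_hours     = [40, 90, 55, 140, 210, 120]

    r = pearson(schedule_slip_days, overtime_hours)
    print(f"correlation between schedule slip and overtime: {r:.2f}")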

Keep in mind that you never get anything for nothing–the Second Law of Thermodynamics ensures that energy must be borrowed from somewhere in order to produce something–and its corollaries place limits on expected efficiencies.  While computing comes as close to providing us with Maxwell’s Demon as any technology, even in this case the entropy is being realized elsewhere (in the work of the software developer and in the hardware manufacturing process), even though it is not fully apparent in the observed data processing.

Thus, manual effort must be expended somewhere along the way.  In any case, all of these methods are addressing the same problem–the conversion of data into information: information that people can consume, understand, place into context, and act upon.

As my colleague Dave Gordon has pointed out to me several times, there are also additional methods that have been developed across all of these approaches to make our use of data more effective.  These include more powerful APIs, the aforementioned cognitive computing, and searching based on the anticipated questions of the user, as search engines do.

Technology, however, is moving very rapidly, and so the lines between BI, BA, and KDD are becoming blurred.  Fourth generation technology that leverages API libraries to remain agnostic to the underlying data, combined with flexible and adaptive UI technology, can provide a comprehensive, systemic solution that brings together the goals of these approaches. With the ability to leverage internal relational database tools and flat schemas for non-relational databases, the application layer, which is oftentimes a barrier to the delivery of information, becomes open as well, putting the SME back in the driver’s seat.  Being able to integrate data across domain silos provides insight into systems behavior and performance not previously available with “canned” applications written to handle and display data a particular way, opening up knowledge discovery in the data.
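What “agnostic to the underlying data” can look like in miniature is sketched below; this is my own illustration under assumed, hypothetical field names, not any vendor’s product. Small adapters normalize the same logical record from a JSON document and an XML document into one common form, so the analysis layer above no longer cares where the data originated.

    # A minimal sketch of data-agnostic ingestion: adapters normalize the
    # same logical record from JSON and XML sources into one common dict.
    # Field names and sample payloads are hypothetical.
    import json
    import xml.etree.ElementTree as ET

    def from_json(text):
        doc = json.loads(text)
        return {"task": doc["task"], "pct_complete": float(doc["pctComplete"])}

    def from_xml(text):
        root = ET.fromstring(text)
        return {"task": root.findtext("Task"),
                "pct_complete": float(root.findtext("PctComplete"))}

    json_payload = '{"task": "1.2.3 Integration Test", "pctComplete": "45"}'
    xml_payload = ("<Activity><Task>1.2.3 Integration Test</Task>"
                   "<PctComplete>45</PctComplete></Activity>")

    records = [from_json(json_payload), from_xml(xml_payload)]
    # Once normalized, the application layer no longer cares where the data came from.
    for rec in records:
        print(rec)

The same pattern extends to relational tables, flat files, and API responses: add an adapter, and the rest of the platform is untouched.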

What this means practically is that organizations sensitive to these changes will understand the practical application of sunk cost when it comes to aging systems provided by ponderous behemoths that lack the agility to introduce more flexible, less costly, and lower-overhead software technologies.  It means that information management can be democratized within the organization among the essential consumers and decision makers.

Productivity and effectiveness are the goals.

Post-Blogging NDIA Blues — The Latest News (Project Management Wonkish)

The National Defense Industrial Association’s Integrated Program Management Division (NDIA IPMD) just had its quarterly meeting here in sunny Orlando, where we braved the depths of sub-60-degree Fahrenheit temperatures to start each day.

For those not in the know, these meetings are an essential coming together of policy makers, subject matter experts, and private industry practitioners regarding the practical and mundane state of the practice in complex project management, particularly focused on the concerns of the federal government and the Department of Defense.  The end result of these meetings is to publish white papers and recommendations regarding practice that support continuous process improvement and the practical application of project management practices–allowing for a cross-pollination of commercial and government lessons learned.  This is also the intersection where innovations from large and small players alike are given an equal vetting and an opportunity to introduce new concepts and solutions.  This is an idealized description, of course, and most of the petty personality conflicts, competition, and self-interest that plague any group of individuals coming together under a common set of interests also play out here.  But the days are long and the workshops generally produce good products that become the de facto standard of practice in the industry. Furthermore, the control that keeps the more ruthless personalities in check is the fact that, while the market is large, the complex project management community tends to be a relatively small one, which reinforces professionalism.

The “blues” in this case are not so much borne of frustration or disappointment as of the long and intense days that the sessions offer.  The biggest news from an IT project management and application perspective was twofold: the data stream used by the industry to share data in an open-systems manner will be simplified, and the technology used to communicate it will move from XML to JSON.

Human-readable formatting to data-focused formatting.  Under Kendall’s Better Buying Power 3.0, the goal of the Department of Defense (DoD) has been to incorporate better practices from private industry where they can be applied.  I don’t see initiatives for greater efficiency and reduction of duplication going away in the new Administration, regardless of what a new initiative is called.

In case this is news to you, the federal government buys a lot of materials and end items–billions of dollars’ worth.  Accountability must be put in place to ensure that the money is properly spent to acquire the things being purchased.  Where technology is pushed and where there are no commercial equivalents that can be bought off the shelf, as in the systems purchased by the Department of Defense, measures of progress and performance (given that the contract is under a specification) are submitted to the oversight agency in DoD.  This is a lot of data, and to be brutally frank the method and format of delivery has been somewhat chaotic, inefficient, and duplicative.  The Department moved to address this with a somewhat modest requirement: open-systems submission of an application-neutral XML file under the UN/CEFACT XML standards.  This was called the Integrated Program Management Report (IPMR).  The move garnered some improvement where it has been applied, but contracts are long-term, so incorporating improvements through new contractual requirements takes time.  Plus, there is always resistance to change.  The Department is now moving to accelerate the removal of these inefficiencies in its data streams by eliminating the unnecessary overhead associated with formatting data for paper forms and dealing with data as, well, data.  Great idea and bravo!  The rub here is that, in making the change, the Department has proposed dropping XML as the technology used to transfer data and moving to JSON.

XML to JSON. Before I spark another techie argument about the relative merits of each, there are some basics to understand here.  First, XML is a markup language, while JSON is simply a data exchange format.  This means that XML is specifically designed to deal with hierarchical and structured data that can be queried, and validation and fidelity checks within the data are inherent in the technology. XML is also known to scale while maintaining the integrity of the data, which is intended for use in relational databases.  Furthermore, XML is hard to break: it is meant for editing and will maintain its structure and integrity afterward.
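As a small illustration of the validation that is native to the XML ecosystem, the sketch below uses the third-party lxml package to check a document against an XML Schema (XSD). The schema and documents are invented for illustration; they are not the UN/CEFACT or DoD schema discussed here.

    # A sketch of XSD validation using the third-party lxml package
    # (pip install lxml). The schema and documents are hypothetical.
    from io import BytesIO
    from lxml import etree

    xsd = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="Activity">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Task" type="xs:string"/>
            <xs:element name="PctComplete" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>"""

    schema = etree.XMLSchema(etree.parse(BytesIO(xsd)))

    good = etree.fromstring(
        b"<Activity><Task>1.2.3 Integration Test</Task>"
        b"<PctComplete>45</PctComplete></Activity>")
    bad = etree.fromstring(
        b"<Activity><Task>1.2.3 Integration Test</Task>"
        b"<PctComplete>forty-five</PctComplete></Activity>")

    print(schema.validate(good))  # True: structure and types check out
    print(schema.validate(bad))   # False: "forty-five" is not a decimal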

The counterargument encountered is that JSON is new! and uses fewer characters! (which usually turns out to be inconsequential), and people are talking about it for Big Data and NoSQL! (but this happened after the fact, and the reason for shoehorning it this way is discussed below).

So does it matter?  Yes and no.  As a supplier specializing in delivering solutions that normalize and rationalize data across proprietary file structures and leverage database capabilities, I don’t care.  I can adapt quickly and will have a proof-of-concept solution out within 30 days of receiving the schema.

The risk here, which applies to both DoD and the industry, is that the decision to go to JSON is being made only because it is the shiny new thing used by gamers and social networking developers.  There has also been a move to adapt it to other uses because of the history of significant security risks found in Java–so many that an entire Wikipedia page is devoted to them.  Oracle just killed off Java applets, though Java hangs on.  JSON, of course, isn’t Java, but it was designed from birth as JavaScript Object Notation (hence the acronym JSON), with the purpose of handling relatively small bits of data across web servers in a number of proprietary settings.

To address JSON’s deficiencies relative to XML, a number of tools have been and are being developed to replicate the fidelity and reliability found in XML.  Whether this is sufficient to be effective against a structured LANGUAGE remains to be seen.  Much of the overhead that techies complain about in XML is due to the native functionality related to the power it brings to the table.  No doubt, a bicycle is simpler than a Formula One racer–and this is an apt comparison.  Claiming “simpler” doesn’t pass the “So What?” test once the business processes involved are understood.  The technology needs to be fit to the solution.  The purpose of data transmission using APIs is not only to make data easy to produce but–you know–to achieve the goals of normalization and rationalization so that the data can be used on the receiving end, which is where the consumer (whom we usually consider to be the customer) sits.
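For completeness, the equivalent fidelity check on the JSON side has to be layered on through a separate specification, JSON Schema. The sketch below uses the third-party jsonschema package; the schema is invented for illustration and is not the schema under development by the Department or the industry.

    # A sketch of layering validation onto JSON with the third-party
    # jsonschema package (pip install jsonschema). The schema is hypothetical.
    import json
    import jsonschema

    schema = {
        "type": "object",
        "properties": {
            "task": {"type": "string"},
            "pctComplete": {"type": "number", "minimum": 0, "maximum": 100},
        },
        "required": ["task", "pctComplete"],
        "additionalProperties": False,
    }

    good = json.loads('{"task": "1.2.3 Integration Test", "pctComplete": 45}')
    bad = json.loads('{"task": "1.2.3 Integration Test", "pctComplete": "forty-five"}')

    jsonschema.validate(instance=good, schema=schema)   # passes silently
    try:
        jsonschema.validate(instance=bad, schema=schema)
    except jsonschema.ValidationError as err:
        print("rejected:", err.message)

This works, but it underscores the point above: the strength of the result will depend entirely on the quality of the published schema and on whether every producer and consumer actually enforces it.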

At the end of the day, the ability to scale and handle hierarchical, structured data will rely on the quality and strength of the schema and the tools published to enforce its fidelity and compliance.  Otherwise consuming organizations will be receiving a dozen different proprietary JSON files, which does not address the present chaos but simply adds to it.  These issues were aired during the meeting, and it seems that everyone is aware of the risks and that they can be addressed.  Furthermore, as the schema is socialized across solution providers, it will be apparent early whether the technology will be able to handle the project performance data resulting from the development of a high-performance aircraft or a U.S. Navy destroyer.