I’ve had a lot of discussions lately on data normalization, including being asked what constitutes normalization when dealing with legacy data, specifically in the field of project management. A good primer can be found at About.com, and there are also very good older papers on the web from various university IS departments. The basic principles of data normalization today consist of finding a common location in the database for each value, reducing redundancy, properly establishing relationships among the data elements, and providing flexibility so that the data can be properly retrieved and further processed into intelligence in such a way that the objects produced possess significance.
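A minimal sketch of those principles, using hypothetical project records (the supplier names, CAGE codes, and field names are illustrative assumptions, not drawn from any particular schema):

```python
# Denormalized: the supplier name is repeated in every task record,
# so a correction has to be made in many places.
flat_tasks = [
    {"task_id": "T1", "supplier": "Acme Corp", "supplier_cage": "1ABC5", "hours": 120},
    {"task_id": "T2", "supplier": "Acme Corp", "supplier_cage": "1ABC5", "hours": 80},
    {"task_id": "T3", "supplier": "Beta LLC", "supplier_cage": "9XYZ1", "hours": 200},
]

# Normalized: each supplier value lives in one place, and tasks
# reference it by key -- a relationship instead of a repetition.
suppliers = {}
tasks = []
for rec in flat_tasks:
    key = rec["supplier_cage"]
    suppliers[key] = {"name": rec["supplier"], "cage": key}
    tasks.append({"task_id": rec["task_id"], "supplier_key": key, "hours": rec["hours"]})

# Redundancy is reduced: two distinct supplier entries instead of
# three copies, and each value has one common location.
assert len(suppliers) == 2
```

The same idea scales up: once each element has one home and explicit relationships, it can be queried and recombined rather than reconciled by hand.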
Answering this question is important because our legacy data is of such size and complexity that it falls into the broad category of Big Data. The condition of the data itself varies widely in terms of quality and completeness. Without understanding the context, interrelationships, and significance of the elements of the data, the empirical approach to project management is threatened, since our ability to use this data to establish trends and perform parametric analysis is limited.
A good paper that deals with this issue was authored by Alleman and Coonce, though it was limited to Earned Value Management (EVM). I would argue that EVM, especially in the types of industries in which the discipline is used, is pretty well structured already. The challenge is in the other areas that are probably of more significance in getting a fuller understanding of what is happening in the project: schedule, risk, and technical performance measures.
In looking at the Big Data that has been normalized to date–and I have participated with others in putting a significant dent in this area–it is apparent that processes in these other areas lack discipline, consistency, completeness, and veracity. By normalizing data in sub-specialties that have experienced an erosion in enforcing standards of quality and consistency, technology becomes a driver for process improvement.
A greybeard in IT project management once said to me (and I am not long in joining that category): “Data is like water, the more it flows downstream the cleaner it becomes.” What he meant is that the more that data is exposed in the organizational stream, the more it is questioned and becomes part of our closed feedback loop: constantly being queried, verified, utilized in decision making, and validated against reality. Over time, more sophisticated and reliable statistical methods can be applied to the data, especially performance data of one sort or another: methods that take periodic volatility into account in trending and provide us with a means of ensuring credibility in using the data.
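As one hedged illustration of damping periodic volatility in trending, here is an exponentially weighted moving average over a hypothetical monthly cost performance index; the series and smoothing factor are assumptions for demonstration, not a prescribed method:

```python
def ewma(series, alpha=0.3):
    """Exponentially weighted moving average: recent periods carry more
    weight, but a single-period spike is damped rather than taken at
    face value in the trend."""
    smoothed = []
    current = series[0]
    for value in series:
        current = alpha * value + (1 - alpha) * current
        smoothed.append(current)
    return smoothed

# Hypothetical monthly performance index with a one-period spike in month 4.
cpi = [0.95, 0.97, 0.96, 1.20, 0.98, 0.97]
trend = ewma(cpi)

# The smoothed trend moves toward the spike but does not jump all the
# way to 1.20, so one volatile period does not dominate the trend line.
assert cpi[2] < trend[3] < 1.20
```

The point is not this particular smoother; it is that trending methods exist which keep a single noisy reporting period from being mistaken for a change in the underlying performance.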
In my last post on Four Trends in Project Management, I posited that the question wasn’t more or less data but the utilization of data in a more effective manner, and identifying what is significant and therefore “better” data. I recently heard this line repeated back to me as an argument against providing data. That conclusion is a misreading of what I was proposing. In today’s environment, reporting data at one level of a project hierarchy is no more work than reporting at any other level. Cost is therefore no longer a valid objection to data submission (unless, of course, the one taking that position must admit to deficiencies in their IT systems or the unreliability of their data).
Our projects must be measured against the framing assumptions in which they were first formed, as well as the established measures of effectiveness, measures of performance, and measures of technical achievement. In order to view these factors one must have access to data originating from a variety of artifacts: the Integrated Master Schedule, the Schedule and Cost Risk Analysis, and the systems engineering/technical performance plan. I would propose that project financial execution metrics are also essential in getting a complete, integrated view of our projects.
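To make the idea of an integrated view concrete, here is an illustrative sketch of how data drawn from those artifacts might be joined into one record per reporting period; the structure, field names, and thresholds are hypothetical assumptions, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class IntegratedProjectRecord:
    period: str                  # reporting period, e.g. "2015-06"
    schedule_float_days: float   # total float from the Integrated Master Schedule
    risk_p80_overrun_pct: float  # 80th-percentile overrun from the Schedule and Cost Risk Analysis
    tpm_achievement_pct: float   # technical performance measure achievement
    funds_obligated_pct: float   # financial execution against plan

    def flags(self):
        """Return hypothetical watch-list flags; thresholds are illustrative."""
        warnings = []
        if self.schedule_float_days < 0:
            warnings.append("negative float")
        if self.risk_p80_overrun_pct > 10:
            warnings.append("high cost risk")
        if self.tpm_achievement_pct < 90:
            warnings.append("technical shortfall")
        return warnings

record = IntegratedProjectRecord("2015-06", -3.0, 12.5, 95.0, 60.0)
```

The value of the integrated record is that schedule, risk, technical, and financial signals can be assessed together for the same period, rather than in separate stovepiped reports.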
There may be other supplemental data that is necessary as well. For example, the NDIA Integrated Program Management Division has a proposed revision to what is known as the Integrated Baseline Review (IBR). For the uninitiated, this is a process in which both the supplier and government customer project teams come together, review the essential project artifacts that underlie project planning and execution, and gain a full understanding of the project baseline. The reporting systems that identify the data to be reported against the baseline are identified and verified at this review. But artifacts are also submitted here that contain data relevant to the project and worthy of continuing assessment, which would preclude the need for manual assessments and reviews down the line.
We don’t yet know the answer to these data issues, and we won’t until all of the data is normalized and analyzed. Then the wheat can be separated from the chaff: a more precise set of data can be identified for submittal, normalized, and placed in an analytical framework that yields more precise and timely information, so that project stakeholders can handle any risks that manifest themselves within the window in which they can be handled (or make the determination that they cannot be handled). As the farmer says in the Chinese proverb: “We shall see.”