Innervisions: The Connection Between Data and Organizational Vision

During my day job I provide a number of fairly large customers with support to determine their needs for software that meets the criteria from my last post. That is, I provide software that takes an open data systems approach to data transformation and integration. My team and I deliver this capability with an open user interface based on Windows and .NET components augmented by time-phased and data management functionality that puts SMEs back in the driver’s seat of what they need in terms of analysis and data visualization. In virtually all cases our technology obviates the need for the extensive, time consuming, and costly services of a data scientist or software developer.

Over the course of my career both as a consumer and a provider of technology solutions, I have seen an evolution in software that began with simple point solutions being developed to automate particular manual processes, to more sophisticated solutions that are designed to automate a complex function. In most of these cases, a customer has identified a gap or deficiency in their requirements that represents an inefficiency or sub-optimization of their processes and then seek a software “tool” to acquire in order to address that specific purpose. The application of these “tools” combine to meet the overall vision of the organization or sub-system within the organization.

What Do You Do With A Problem Like “Tools”

The capabilities of software in terms of data handling capabilities and functionality double every 12-18 months in today’s environment. The use of the term “tools” for software, which is really based on a pre-2000 concept, is that in the mind’s eye software is analogous to any other tool. In the literature, particularly in that authored by consultants, this analogy is oftentimes extended to common household or construction tools: a wrench, a screwdriver, or a power drill. Under this concept each tool has a specific purpose and it is up to the SME to determine which tool is best for a specific job.

The problem with this concept is that not only is it obsolete, but it does great harm financially to the organization in terms of overhead costs, organizational efficiency and effectiveness.

First of all, most physical tools are fairly static in their specific use. A hammer is still a hammer, even if some sort of power is extended to give it power. It’s purpose remains to use force to insert a connective fastener, like a nail, into a medium, like a piece of wood. A nail gun, for instance, is a type of hammer. It is more powerful and efficient but, still, it is a glorified hammer. It is a superior tool in construction because it is more efficient, provides a consistency in quality, and is faster. It also eliminates the factors of arm strength, physical coordination, and visual alignment skills of the user; as anyone who has experienced a sore thumb as a result of a misaligned strike can attest. But a nail gun is still restricted to its specific function–sinking nails for the purpose of fastening.

Software, as it has evolved, was similarly based on the concept of a tool. The physical functions of a specific vocation were the first to undergo digitization: accountants and business operations personnel had spreadsheet software applications, secretarial and clerical staffs (yes, they used to exist) had word processing software, marketing and middle management could relay their ideas with presentation software, and the list went on.

As the power of software improved it followed the functions of traditional line-and-staff organizations. Many of these were built to replace the physical calculation of formulae and concepts that required a slide rule and, later, a scientific calculator. Soon scheduling software replaced manual GANTT planning, earned value software automated the calculation of basic EVM analytics, and risk software allowed for the complex formulation involved in assessing risk for the branch of a plan using simulated Monte Carlo analysis.

Each of these software applications targeted a specific occupation, and incorporated specific knowledge (functionality) required of that occupation.

Organizational software for multiple functions usually consisted of a suite of tools under the rubric of an ERP or Business Intelligence System. Modules and “bolt-ons” consisted of tying together business processes and point software requirements augmented by large software consulting staffs to customize the solutions. In actual practice, however, these were software tools tied together though a common brand and operating environment. Oftentimes the individual bolt-ons and tools weren’t even authored by the same development team with a common vision in mind, but a reaction to market forces that required a gap be filled through acquisition of a company or intellectual property.

Needless to say, these “enterprise” solutions aren’t that at all. Instead, they are a business-driven means to penetrate a vertical by providing scattershot functionality. Once inside a company or organization the other bolt-ons and modules are marketed in order to take over other business processes. Integration is achieved across domains through data transfer or other interpretive methods.

This approach has been successful, as it has been since the halcyon days when IBM dominated the computing market, especially among the larger software firms. It also meets many of the emotional and psychic needs of many senior managers. After all, the software firm–given its economic size–feels solid. The numbers of specialists introduced into the organization to augment staff provide a feeling of safety and accomplishment. C-level management and stockholders feel that risk is handled given that their software needs are being met at some level.

What this approach did not, and does not, meet is genuine data integration, especially given the realization that the data we have been using has been inadequate and artificially restricted based on what software providers were convincing their customers was the art of the possible. The term “Big Data” began to be introduced into the lexicon, and with it the economic realization that capturing and integrating datasets that were previously “impossible” to capture and integrate was (and presently is) an economic imperative.

But the approach of incumbents, whose priority is to remain “sticky” and to defend territory against new technologies, was to respond: “we have a tool for that.” Thus, the result has been the further introduction of inefficient individual applications with their inability to fully exploit data. Among these tools are largely “dumb”–that is, viewing data flat–data visualization tools that essentially paint pretty pictures from Excel or, when they need to be applied on a larger scale, default to the old business intelligence brute force approach of applying labor to derive the importance in data. Old habits are hard to change and what one person has done another can do. But this is the economic equivalent of what is called rent-seeking behavior. That is, it is inefficient and exploitative.

After all, if you buy what was advertised as a sports car you expect to see an engine under the hood and a transmission connected to a drivetrain and a pretty powerful one at that. What one does not expect is to buy the car but have to design and build the features of these essential systems while a team of individuals are paid by the hour to push us to where we want to go. Yet, organizations (and especially consultants) seem to be happy with this model when it comes to information management.

Thus, when a technology company like mine comes across a request for proposal, an informal invitation to participate in market research, or in exploratory professional meetings (largely virtual as of this writing), the emphasis and terminology is on software “tools”, which limits the ability of consumers to exploit technology because it mentally paints a picture that limits the definition of what software should do and can do.

This mindset, however, is beginning to change and, no doubt, our current predicament under the Coronavirus crisis will accelerate that transition.

To take our analogy one step further, we are long past the time when we must buy each component of an automobile individually and then assemble it in our own garage. Point solutions, which are set and inelastic, are like individual parts of the car.

Enterprise solutions consisting of different modules and datasets, oftentimes constructed from incompatible foundations, exacerbate this situation and add the element of labor to a supposedly automated process, like buying OEM products and having to upgrade the automobile we supposed bought to do its job, but still needed (with the help of a mechanic) to perform the normal functions of steering, stopping, and accelerating.

Open systems solutions provide more flexibility, but they can be both a blessing and a curse. The challenge is to provide the right balance of out-of-the-box point solution-type functionality while still providing enough flexibility for adaptability. Taking a common data approach is key to achieving this balance. This will require the abandonment of the concept of software “tools” and shifting the focus on data.

Data and Information Take Over: Two Models

The economic imperative for data integration and optimization developed from the needs of the organization and its practitioners–whether it be managers, analysts, or auditors working in a company, a business unit, a governmental agency, or a program or project organization–is to be positioned facing forward.

In order to face forward one must first establish a knowledge-based organization or, as oftentimes identified, a data-driven organization. What this means in real terms is that data is captured, processed, and contextualized so that its importance and meaning can be derived in a timely manner so that something can be done about what is happening. During our own present situation this is not just an economic imperative, but for public health an existential one for many of us.

Thus, we are faced with several key dimensions that must be addressed: size, manner of integration, contextualization, timeliness, and target. This applies to both known and unknown datasets.

Our known datasets are those that are already being used and populated in existing systems. We know, for example, that in program and project management that we require an estimate and plan, a schedule, a manner of organizing and tracking our progress, financial management and material management systems and others. These represent our pool of structured data, and understanding the lexicon of these systems is what is necessary to normalize and rationalize the data through a universal translator.

Our unknown datasets are those that require collection but, when done, is collected and processed in an ad hoc manner. Usually the need for this data collection is learned through the school of hard knocks. In other cases, the information is not collected at all or accidentally, such as when management relies on outside experts and anecdotal information. This is the equivalent of an organizational JOHARI window shown below.

Overview of Johari Window with quadrants
showing the relationships of self-knowledge and understanding

The Johari Window explains our perceptions and our relationship to the outside world. Our universe is not a construction of our own making or imagination. We cannot make our own reality nor are there “alternative facts.” The most colorful example of refuting this specious philosophical mind game is relayed to us in Boswell’s Life of Samuel Johnson.

After we came out of the church, we stood talking for some time together of Bishop Berkeley’s ingenious sophistry to prove the nonexistence of matter, and that every thing in the universe is merely ideal. I observed, that though we are satisfied his doctrine is not true, it is impossible to refute it. I never shall forget the alacrity with which Johnson answered, striking his foot with mighty force against a large stone, till he rebounded from it — “I refute it thus.”

We can deny what we do not know, or construct magical thinking. but reality is unmoved. In the case of Johnson he kicked the stone and the stone, also unmoved, kicked back in the form of the pain that Johnson felt when he “rebounded from it”.

Nor are the quadrants equal in our perceptual windows. Some people and organizations are very well informed and others less so, but the tension and conflict of our lives–both internally and externally–relates to expanding the “open” and “facade” portions of the Johari window so that we are not only informed of how others register us, but also to uncover the unknown, and to attempt to control how others perceive us in our various roles and guises.

We see this playing out in tracking the current Coronavirus pandemic. The absence of reliable widespread tests and testing infrastructure has impeded an understanding of the virus and the most effective strategies to deploy in dealing with it. Absent data, health and governmental agencies have been left with no choice but to use the same social distancing and travel restrictions deployed during the 1918 Influenza Pandemic and then, if lifting some of these, hope for the best.

This is the situation despite the fact that national risk assessments and risk registers, such as the U.S. National Security Council Pandemic Playbook and the U.K. National Risk Register, outlined measures to be taken given certain particular indicators. No doubt there are lessons to be learned here, but at the core lesson is the fact that, absent reliable and timely data that is converted into information that can be used in a decisive and practical manner, an organization, a state, or a nation risks its survival when it fails to imagine what information it needs to collect, absent the prosaic information that comes from performing the day-to-day routine.

Admittedly, there is no great insight here regarding this need (or, at least, there shouldn’t be). This condition is the reason why intelligence systems and agencies were created in the first place. It is why military and health services imagine scenarios and war-game them, and why organizations deploy brain-storming. Individuals and organizations that go into the world uninformed or self-deluded do not last long, and history is replete with such examples. Blanche DuBois relied on the kindness of strangers and we are best served by her experience as an archetype.

And yet, we still find ourselves struggling to properly collect, integrate, and utilize information at the same time that we have come to the realization that we need to collect and process information from larger pools of data. The root cause of this condition, as asserted above, rests in the mental framing of how to approach data and the problem that needs to be solved. It requires us to change the conceptual framework that relies on the concept of “tools.”

We can make this adjustment by realigning the object of the challenge so that it conforms with what we imagine to be the desired end-state. But, still, how do we determine what we need to collect? This is first a question of perception as opposed to one regarding knowledge: what one views as not only necessary but within the realm of possibility.

Once again, this dilemma is best served by models and, in this case, it is not unlike the Overton Window. Those preferring to eschew Wikipedia entries can also find a more detailed and nuanced definition at the source through the Mackinac Center for Public Policy website.

Overton Windows showing degrees of acceptability as modified by Joshua Trevino

Joseph Overton described the window as one of defining acceptable political policies in the mind of the public. He used the terms “more free” and “less free” to describe policies that think tanks recommend to describe the amount of government intervention, avoiding the left-right comparisons used by polemicists. Various adjustments and variations to the basic window have been proposed since his original use of the model, but it has been expanded to describe public perceptions in general on a host of socioeconomic concerns.

As with the Johari Window, I would posit that there is an analogous Overton Window in relation to information that frames what is viewed as the art of the possible. These perceptions influence the actions of decision-makers in assessing the risk involved in buying software solutions. When it comes to the rapidly developing field of data capture, transformation, and effective utilization, the perception from the start suggests some degree of risk and the danger of moving too quickly. For those in the field of data optimization, given that new technology capacity increases exponentially in shorter periods of time, the barrier here is to shift the informational Overton Window so that the market is educated on the risk-reward equation.

A Unified Model for Aligning Our Data

We have discussed two models up to this point in our exploration: an Informational Johari Window and an Informational Overton Window. Each of these models, using a simplified method, isolates different dimensions of the problem of data, which when freed of the concept of “tools” unlocking it, provides us with a clearer picture of the essential nature of its capture and utilization, and to what purposes.

We are now ready to take the next step in defining how to approach data to serve the strategic interests of the enterprise or organization.

For those of us in the information field, especially in the early years when applying solutions to line-and-staff organizations, what we found is that the very introduction of the new technology changed both the structure and nature of the organization. Initially we noted a sophisticated and accelerated version of the Hawthorne Effect. But there was something more elemental and significant going on.

Digital technology is amazingly attuned, especially when properly designed and deployed, to extend the functions of human knowledge gathering and processing. In this way it can be interpreted as an extension of human evolution–of the nature of human society acting as a complex adaptive system. In fact, there are so many connections between early physical, methodological, and industrial societal developments to digitization, such as the connection between the development of the Jacquard Loom to the development of the computer punch card (and there are others) that it seems that human society would have found a way to get to this point regardless of the existence of the intervening human pioneers, though their actual contributions are clear. (For further information on the waves of development see the books Future Shock and The Third Wave by Alvin Toffler.)

When many of us first applied digitized technology to knowledge workers (in my case in the field of contract management) we found that the very introduction of the technology changed perceptions, work habits, and organizational structures in very essential ways. Like the effect of the idea of evolution as described by Daniel Dennett, the application of digital evolution is like a universal acid–it eats through and transforms everything it touches.

For example, a report that, in the past, would have taken a week or two to complete, mostly because of the research required, now took a day or so. Procurement Action Lead Times (PALT) realized significant improvements since information previously only available in paper form was now provided on-line. At the same time, systems were now able to handle greater volumes of demand. As a result, customers’ expectations changed so much that they no longer felt that they had to hold back requests for fear of overloading the system and depend on human intervention. Suppliers, seeing many commodities experiencing steady and stable growth, reverted to just-in-time manufacturing.

Over time, typing pools and secretarial staffs, the former being commonplace well into the 1980s and the latter into the 1990s, except as symbols of privilege or prestige, disappeared. Middle management and many support staffs followed this trend in the early 2000s. Today, consulting services consisting of staffing personnel to apply non-value added manual solutions such as Excel spreadsheets and PowerPoint slides to display data that has already been captured and processed, still manage to hold on in isolated pockets. That this model is not sustainable nor efficient should be obvious except for the continued support these models lend to the self-serving concept of “tools.”

Thus, the next step in the alignment of data capture and utilization to organizational vision is the interplay between our models. Practical experience suggests, though anecdotal, that as forward-facing organizations adopt more powerful digitized technologies designed to capture more and larger datasets, and to better utilize that data, that they tend to move to expand their self-awareness–their Informational Johari Window.

This, in turn, allows them to distinguish between structured and unstructured data and the value–the qualitative information content–of these datasets. This knowledge is then applied to reduce the labor and custom code required for larger data capture and utilization. In the end, these developments then determine what is the art of the possible by moving and expanding the Informational Overton Window.

Combining these concepts from a data perspective results in a combined model as illustrated below from the perspective of the subject:

Data Window of Perception and Possibility (Subject)

Extending this concept to the external subject (object or others) results in the following:

Data Window of Perception and Possibility (Object or Others)

This simplistic model describes several ways of looking at the problem of data and how to align it with its use to serve our purposes. When we gather data from the world the result can be symmetrical or asymmetrical. That is, each of us does not have the capacity to collect the same data that may be relevant to our existence or the survival of our organizations or institutions.

This same concept of symmetry and asymmetry applies to our ability to process data into information and–further–to properly apply information to when it will contribute to a decisive outcome in terms of knowledge, understanding, insight, or action.

As with the psychological Johari Window, our model takes it account the unknown within the much larger data space. Think of our Big Blue Ball (which is not so big) within the context of space. All of space represents the data of the universe. We are finding that the secrets of vast space-time are found in quanta as well in the observations of large and distant celestial events and objects. Data is everywhere. Yet, we can perceive only a small part of the universe. That is why our Data Window does not encompass the entire data space.

The quadrants, of course, are rarely co-equal, but for purposes of simplicity they are shown as such. As with the psychological Johari Window of self-awareness, the tension and conflict within the individual and its relationship with the external world is in the adjustment of the sizes of the quadrants that, hopefully, tend toward more self-awareness and openness. From the perspective of data, the equivalent is toward the expansion of the physical expansion of the Data Window while the quadrants within the window expand to minimize asymmetry of external knowledge and the unknown.

The physical limitations of symmetry, asymmetry, and the unknown portions of the data space is further limited by our perceptions. Our understanding of what is possible, acceptable, sensible, radical, unthinkable, and impossible is influenced by these perceptions. Those areas of information management that fall within some mean or midpoint of the limitations of our perceptions represent current practice and which, as with the original Johari Window, I label as “policy,” though a viable alternative label would be “practice.”

Note that there perceptions vary by the position of the subject. In the case of our own perceptions, as for those reading this post, the first variation of the model is aligned vertically. For the case of the perceptions of others, which are important in understanding their position when advocating a particular course of action, the perception model is aligned horizontally across the quadrants.

The interplay of the quadrants within the Data Window directly affect how we perceive the use of data and its potential. Thus, I have labeled the no-man’s-land portion that pushes into areas that are unknown to the subject and external object is labeled as “The Frontier.”

To an American a “frontier” is an unexplored country while, historically, in the Old World a “frontier” is a border. The former promises not only risk, but, also opportunity and invites exploration. The latter is a limitation. No doubt, my use of the term is culturally biased to the first definition.

Intellectually and physically, as we enter the frontier and learn what secrets await us there, we learn. For data we may first see a Repository of Babel and deal with it as if it were flat. But, given enough exploration we will learn its lexicon and underlying structure and, eventually, learn how to process it into information and harness its content. This, in turn, will influence the size of the Data Window, the relative sizes of the quadrants, and our perceptions of the art of the possible.

Conception to Application

This model, I believe, is a useful antecedent concept in approaching and making comprehensible what is often called Big Data. The model also helps us be more precise in how we perceive and define the term as technology changes, given that exponential increases in hardware storage and processing capabilities expand our Data Window.

Furthermore, understanding the interplay of how wee approach data, and the consequences of our perceptions of it, allow us to weigh the risk when looking at new technologies and the characteristics they need to possess in order to meet organizational goals and vision. The initial bias, as noted by Paul Kahneman in his book Thinking, Fast and Slow, is for people to stick with the status quo or the familiar–the devil they know–in lieu of something new and innovative, even when the advantages of adoption of the new innovation are clearly obvious. It requires a reorientation of thinking to allow the acceptance of the new.

Our familiar patterns when thinking about information is to look for solutions that are “tools.” The new, unfamiliar concept that we find challenging is the understanding that we do not know what we do not know when it come to data and its potential–that we must push into the frontier in order to do so–and doing so will require not only new technology that is oriented toward the optimization of data, its processing from information to knowledge, and its use, but also a new way of thinking about it and how it will align with our organizational strategy.

This can only be done by first starting with a benchmark–to practically take stock–of where we individually as organizations and where we need to be in terms of understanding our mission or purpose. For project controls and project management there is no area more at odds with this alignment.

Recently, Dave Gordon in his blog The Practicing IT Project Manager argued why project managers needed to align their projects with organizational strategy. He noted that in 2015, during the development of the “Talent Triangle” that the Project Management Institute found that a major deficiency noted by organizations was that project managers needed to take an active role in aligning their projects with organizational strategy.

As I previously noted, there are a number of project management tools on the market today and a number of data visualization tools. Yet, there are significant gaps not only in the capture, quality, and processing of data, but also in the articulation of a consistent data strategy that aligns with the project organization and the overarching organization’s business strategy, goals, and priorities.

For example, in government, program managers spend a large portion of the year defending their programs to show that they are effectively and efficiently overseeing the expenditure of resources: that they are “executing program.” Failure to execute program will result in a budget mark, or worse, result in a re-baseline, or possible restructuring or cancellation. Projected production may be scaled back in favor of more immediate priorities.

Yet, none of our so-called “tools” fully capture program execution as it is defined by agencies and Congress. We have performance management tools, earned value tools, and the list can go on. A typical program manager in government spends almost five months assessing and managing program execution, and defending program and only a few minutes each month reviewing performance. This fact alone should be indicative that our priorities are misaligned.

The intersection of organizational alignment and program management in this case is related to resource utilization and program execution. No doubt, project controls and performance management contribute to our understanding of program execution, but they are removed from informing both the program manager and the organization in a comprehensive manner about execution, risk, and opportunity–and whether those elements conflict with or align with the agency’s goals. They are even further removed from an understanding of decisions related to program execution on the interrelationships across spectrum of the project and program portfolio.

The reason for this condition is that the data is currently not being captured and processed in a comprehensive manner to be positioned for its effective exploitation and utilization in meeting the needs of the various levels of the organization, nor does the perception of the specific data needed align with organizational needs.

Correspondingly, in construction and upstream oil and gas, project managers and stakeholders are most concerned with scope, timeliness, and the inevitable questions of claims–especially the avoidance or equitable settlement of the last.

As with government, our data strategy must align with our organizational goals and vision from the perspective of all stakeholders in the effort. At the heart of this alignment is data and those technologies “fitted” to exploit it and align it with our needs.

Sledgehammer: Pisano Talks!

My blogging hiatus is coming to an end as I take a sledgehammer to the writer’s block wall.

I’ve traveled far and wide over the last six months to various venues across the country and have collected a number of new and interesting perspectives on the issues of data transformation, integrated project management, and business analytics and visualization. As a result, I have developed some very strong opinions regarding the trends that work and those that don’t regarding these topics and will be sharing these perspectives (with the appropriate supporting documentation per usual) in following posts.

To get things started this post will be relatively brief.

First, I will be speaking along with co-presenter John Collins, who is a Senior Acquisition Specialist at the Navy Engineering & Logistics Office, at the Integrated Program Management Workshop at the Hyatt Regency in beautiful downtown Baltimore’s Inner Harbor 10-12 December. So come on down! (or over) and give us a listen.

The topic is “Unlocking Data to Improve National Defense Systems”. Today anyone can put together pretty visualizations of data from Excel spreadsheets and other sources–and some have made quite a bit of money doing so. But accessing the right data at the right level of detail, transforming it so that its information content can be exploited, and contextualizing it properly through integration will provide the most value to organizations.

Furthermore, our presentation will make a linkage to what data is necessary to national defense systems in constructing the necessary artifacts to support the Department of Defense’s Planning, Programming, Budgeting and Execution (PPBE) process and what eventually becomes the Future Years Defense Program (FYDP).

Traditionally information capture and reporting has been framed as a question of oversight, reporting, and regulation related to contract management, capital investment cost control, and DoD R&D and acquisition program management. But organizations that fail to leverage the new powerful technologies that double processing and data storage capability every 18 months, allowing for both the depth and breadth of data to expand exponentially, are setting themselves up to fail. In national defense, this is a condition that cannot be allowed to occur.

If DoD doesn’t collect this information, which we know from the reports of cybersecurity agencies that other state actors are collecting, we will be at a serious strategic disadvantage. We are in a new frontier of knowledge discovery in data. Our analysts and program managers think they know what they need to be viewing, but adding new perspectives through integration provide new perspectives and, as a result, will result in new indicators and predictive analytics that will, no doubt, overtake current practice. Furthermore, that information can now be processed and contribute more, timely, and better intelligence to the process of strategic and operational planning.

The presentation will be somewhat wonky and directed at policymakers and decisionmakers in both government and industry. But anyone can play, and that is the cool aspect of our community. The presentation will be non-commercial, despite my day job–a line I haven’t crossed up to this point in this blog, but in this latter case will be changing to some extent.

Back in early 2018 I became the sole proprietor of SNA Software LLC–an industry technology leader in data transformation–particularly in capturing datasets that traditionally have been referred to as “Big Data”–and a hybrid point solution that is built on an open business intelligence framework. Our approach leverages the advantages of COTS (delivering the 80% solution out of the box) with open business intelligence that allows for rapid configuration to adapt the solution to an organization’s needs and culture. Combined with COTS data capture and transformation software–the key to transforming data into information and then combining it to provide intelligence at the right time and to the right place–the latency in access to trusted intelligence is reduced significantly.

Along these lines, I have developed some very specific opinions about how to achieve this transformation–and have put those concepts into practice through SNA and delivered those solutions to our customers. Thus, the result has been to reduce both the effort and time to capture large datasets from data that originates in pre-processed data, and to eliminate direct labor and the duration to information delivery by more than 99%. The path to get there is not to apply an army of data scientists and data analysts that deals with all data as if it is flat and to reinvent the wheel–only to deliver a suboptimized solution sometime in the future after unnecessarily expending time and resources. This is a devolution to the same labor-intensive business intelligence approaches that we used back in the 1980s and 1990s. The answer is not to throw labor at data that already has its meaning embedded into its information content. The answer is to apply smarts through technology, and that’s what we do.

Further along these lines, if you are using hard-coded point solutions (also called purpose-built software) and knitted best-of-breed, chances are that you will find that you are poorly positioned to exploit new technology and will be obsolete within the next five years, if not sooner. The model of selling COTS solutions and walking away except for traditional maintenance and support is dying. The new paradigm will be to be part of the solution and that requires domain knowledge that translates into technology delivery.

More on these points in future posts, but I’ve placed the stake in the ground and we’ll see how they hold up to critique and comment.

Finally, I recently became aware of an extremely informative and cutting-edge website that includes podcasts from thought leaders in the area of integrated program management. It is entitled InnovateIPM and is operated and moderated by a gentleman named Rob Williams. He is a domain expert in project cost development, with over 20 years of experience in the oil, gas, and petrochemical industries. Robin has served in a variety of roles throughout his career and is now focuses on cost estimating and Front-End Loading quality assurance. His current role is advanced project cost estimator at Marathon Petroleum’s Galveston Bay Refinery in Texas City.

Rob was also nice enough to continue a discussion we started at a project controls symposium and interviewed me for a podcast. I’ll post additional information once it is posted.

Both Sides Now — The Value of Data Exploration

Over the last several months I have authored a number of stillborn articles that just did not live up to the standards that I set for this blog site. After all, sometimes we just have nothing important to add to the conversation. In a world dominated by narcissism, it is not necessary to constantly have something to say. Some reflection and consideration are necessary, especially if one is to be as succinct as possible.

A quote ascribed to Woodrow Wilson, which may be apocryphal, though it does appear in two of his biographies, was in response to being lauded by someone for making a number of short, succinct, and informative speeches. When asked how he was able to do this, President Wilson is supposed to have replied:

“It depends. If I am to speak ten minutes, I need a week for preparation; if fifteen minutes, three days; if half an hour, two days; if an hour, I am ready now.”

An undisciplined mind has a lot to say about nothing in particular with varying degrees of fidelity to fact or truth. When in normal conversation we most often free ourselves from the discipline expected for more rigorous thinking. This is not necessarily a bad thing if we are saying nothing of consequence and there are gradations, of course. Even the most disciplined mind gets things wrong. We all need editors and fact checkers.

While I am pulling forth possibly apocryphal quotes, the one most applicable that comes to mind is the comment by Hemingway as told by his deckhand in Key West and Cuba, Arnold Samuelson. Hemingway was supposed to have given this advice to the aspiring writer:

“Don’t get discouraged because there’s a lot of mechanical work to writing. There is, and you can’t get out of it. I rewrote the first part of A Farewell to Arms at least fifty times. You’ve got to work it over. The first draft of anything is shit. When you first start to write you get all the kick and the reader gets none, but after you learn to work it’s your object to convey everything to the reader so that he remembers it not as a story he had read but something that happened to himself.”

Though it deals with fiction, Hemingway’s advice applies to any sort of writing and rhetoric. Dr. Roger Spiller, who more than anyone mentored me as a writer and historian, once told me, “Writing is one of those skills that, with greater knowledge, becomes harder rather than easier.”

As a result of some reflection, over the last few months, I had to revisit the reason for the blog. Thus, this is still its purpose: it is a way to validate ideas and hypotheses with other professionals and interested amateurs in my areas of interest. I try to keep uninformed opinion in check, as all too many blogs turn out to be rants. Thus, a great deal of research goes into each of these posts, most from primary sources and from interactions with practitioners in the field. Opinions and conclusions are my own, and my reasoning for good or bad are exposed for all the world to see and I take responsibility for them.

This being said, part of my recent silence has also been due to my workload in–well–the effort involved in my day job of running a technology company, and in my recent role, since late last summer, as the Managing Editor of the College of Performance Management’s publication known as the Measurable News. Our emphasis in the latter case has been to find new contributions to the literature regarding business analytics and to define the concept of integrated project, program, and portfolio management. Stepping slightly over the line to make a pitch, I recommend anyone interested in contributing to the publication to submit an article. The submission guidelines can be found here.

Both Sides Now: New Perspectives

That out of the way, I recently saw, again on the small screen, the largely underrated movie about Neil Armstrong and the Apollo 11 moon landing, “First Man”, and was struck by this scene:

Unfortunately, the first part of the interview has been edited out of this clip and I cannot find a full scene. When asked “why space” he prefaces his comments by stating that the atmosphere of the earth seems to be so large from the perspective of looking at it from the ground but that, having touched the edge of space previously in his experience as a test pilot of the X15, he learned that it is actually very thin. He then goes on to posit that looking at the earth from space will give us a new perspective. His conclusion to this observation is then provided in the clip.

Armstrong’s words were prophetic in that the space program provided a new perspective and a new way of looking at things that were in front of us the whole time. Our spaceship Earth is a blue dot in a sea of space and, at least for a time, the people of our planet came to understand both our loneliness in space and our interdependence.

Earth from Apollo 8. Photo courtesy of NASA.

 

The impact of the Apollo program resulted in great strides being made in environmental and planetary sciences, geology, cosmology, biology, meteorology, and in day-to-day technology. The immediate effect was to inspire the environmental and human rights movements, among others. All of these advances taken together represent a new revolution in thought equal to that during the initial Enlightenment, one that is not yet finished despite the headwinds of reaction and recidivism.

It’s Life’s Illusions I Recall: Epistemology–Looking at and Engaging with the World

In his book Darwin’s Dangerous Idea, Daniel Dennett posited that what was “dangerous” about Darwinism is that it acts as a “universal acid” that, when touching other concepts and traditions, transforms them in ways that change our world-view. I have accepted this position by Dennett through the convincing argument he makes and the evidence in front of us, and it is true that Darwinism–the insight in the evolution of species over time through natural selection–has transformed our perspective of the world and left the old ways of looking at things both reconstructed and unrecognizable.

In his work, Time’s Arrow, Time’s Cycle, Stephen Jay Gould noted that Darwinism is part of one of the three great reconstructions of human thought that, in quoting Sigmund Freud, where “Humanity…has had to endure from the hand of science…outrages upon its naive self-love.” These outrages include the Copernican revolution that removed the Earth from the center of the universe, Darwinism and the origin of species, including the descent of humanity, and what John McPhee, coined as the concept of “deep time.”

But–and there is a “but”–I would propose that Darwinism and the other great reconstructions noted are but different ingredients of a larger and more broader, though compatible, type of innovation in the way the world is viewed and how it is approached–a more powerful universal acid. That innovation in thought is empiricism.

It is this approach to understanding that eats through the many ills of human existence that lead to self-delusion and folly. Though you may not know it, if you are in the field of information technology or any of the sciences, you are part of this way of viewing and interacting with the world. Married with rational thinking, this epistemology–coming from the perspectives of the astronomical observations of planets and other heavenly bodies by Charles Sanders Peirce, with further refinements by William James and John Dewey, and others have come down to us in what is known as Pragmatism. (Note that the word pragmatism in this context is not the same as the more generally used colloquial form of the word. For this type of reason Peirce preferred the term “pragmaticism”). For an interesting and popular reading of the development of modern thought and the development of Pragmatism written for the general reader I highly recommend the Pulitzer Prize-winning The Metaphysical Club by Louis Menand.

At the core of this form of empiricism is that the collection of data, that is, recording, observing, and documenting the universe and nature as it is will lead us to an understanding of things that we otherwise would not see. In our more mundane systems, such as business systems and organized efforts applying disciplined project and program management techniques and methods, we also can learn more about these complex adaptive systems through the enhanced collection and translation of data.

I Really Don’t Know Clouds At All: Data, Information, Intelligence, and Knowledge

The term “knowledge discovery in data”, or KDD for short, is an aspirational goal and so, in terms of understanding that goal, is a point of departure from the practice information management and science. I’m taking this stance because the technology industry uses terminology that, as with most language, was originally designed to accurately describe a specific phenomenon or set of methods in order to advance knowledge, only to find that that terminology has been watered down to the point where it obfuscates the issues at hand.

As I traveled to locations across the U.S. over the last three months, I found general agreement among IT professionals who are dealing with the issues of “Big Data”, data integration, and the aforementioned KDD of this state of affairs. In almost every case there is hesitation to use this terminology because it has been absconded and abused by mainstream literature, much as physicists rail against the misuse of the concept of relativity by non-scientific domains.

The impact of this confusion in terminology has caused organizations to make decisions where this terminology is employed to describe a nebulous end-state, without the initiators having an idea of the effort or scope. The danger here, of course, is that for every small innovative company out there, there is also a potential Theranos (probably several). For an in-depth understanding of the psychology and double-speak that has infiltrated our industry I highly recommend the HBO documentary, “The Inventor: Out for Blood in Silicon Valley.”

The reason why semantics are important (as they always have been despite the fact that you may have had an associate complain about “only semantics”) is that they describe the world in front of us. If we cloud the meanings of words and the use of language, it undermines the basis of common understanding and reveals the (poor) quality of our thinking. As Dr. Spiller noted, the paradox of writing and in gathering knowledge is that the more you know, the more you realize you do not know, and the harder writing and communicating knowledge becomes, though we must make the effort nonetheless.

Thus KDD is oftentimes not quite the discovery of knowledge in the sense that the term was intended to mean. It is, instead, a discovery of associations that may lead us to knowledge. Knowing this distinction is important because the corollary processes of data mining, machine learning, and the early application of AI in which we find ourselves is really the process of finding associations, correlations, trends, patterns, and probabilities in data that is approached in a manner as if all information is flat, thereby obliterating its context. This is not knowledge.

We can measure the information content of any set of data, but the real unlocked potential in that information content will come with the processing of it that leads to knowledge. To do that requires an underlying model of domain knowledge, an understanding of the different lexicons in any given set of domains, and a Rosetta Stone that provides a roadmap that identifies those elements of the lexicon that are describing the same things across them. It also requires capturing and preserving context.

For example, when I use the chat on my iPhone it attempts to anticipate what I want to write. I am given three choices of words to choose if I want to use this shortcut. In most cases, the iPhone guesses wrong, despite presenting three choices and having at its disposal (at least presumptively) a larger vocabulary than the writer. Oftentimes it seems to take control, assuming that I have misspelled or misidentified a word and chooses the wrong one for me, where my message becomes a nonsense message.

If one were to believe the hype surrounding AI, one would think that there is magic there but, as Arthur C. Clarke noted (known as Clarke’s Third Law): “Any sufficiently advanced technology is indistinguishable from magic.” Familiar with the new technologies as we are, we know that there is no magic there, and also that it is consistently wrong a good deal of the time. But many individuals come to rely upon the technology nonetheless.

Despite the gloss of something new, the long-established methods of epistemology, code-breaking, statistics, and Calculus apply–as do standards of establishing fact and truth. Despite a large set of data, the iPhone is wrong because the iPhone does not understand–does not possess knowledge–to know why it is wrong. As an aside, its dictionary is also missing a good many words.

A Segue and a Conclusion–I Still Haven’t Found What I’m Looking For: Why Data Integration?…and a Proposed Definition of the Bigness of Data

As with the question to Neil Armstrong, so the question on data. And so the answer is the same. When we look at any set of data under a particular structure of a domain, the information we derive provides us with a manner of looking at the world. In economic systems, businesses, and projects that data provides us with a basis for interpretation, but oftentimes falls short of allowing us to effectively describe and understand what is happening.

Capturing interrelated data across domains allows us to look at the phenomena of these human systems from a different perspective, providing us with the opportunity to derive new knowledge. But in order to do this, we have to be open to this possibility. It also calls for us to, as I have hammered home in this blog, reset our definitions of what is being described.

For example, there are guides in project and program management that refer to statistical measures as “predictive analytics.” This further waters down the intent of the phrase. Measures of earned value are not predictive. They note trends and a single-point outcome. Absent further analysis and processing, the statistical fallacy of extrapolation can be baked into our analysis. The same applies to any index of performance.

Furthermore, these indices and indicators–for that is all they are–do not provide knowledge, which requires a means of not only distinguishing between correlation and causation but also applying contextualization. All systems operate in a vector space. When we measure an economic or social system we are really measuring its behavior in the vector space that it inhabits. This vector space includes the way it is manifested in space-time: the equivalent of length, width, depth (that is, its relative position, significance, and size within information space), and time.

This then provides us with a hint of a definition of what often goes by the definition of “big data.” Originally, as noted in previous blogs, big data was first used in NASA in 1997 by Cox and Ellsworth (not as credited to John Mashey on Wikipedia with the dishonest qualifier “popularized”) and was simply a statement meaning “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.”

This is a relative term given Moore’s Law. But we can begin to peel back a real definition of the “bigness” of data. It is important to do this because too many approaches to big data assume it is flat and then apply probabilities and pattern recognition to data that undermines both contextualization and knowledge. Thus…

The Bigness of Data (B) is a function (f ) of the entropy expended (S) to transform data into information, or to extract its information content.

Information evolves. It evolves toward greater complexity just as life evolves toward greater complexity. The universe is built on coded bits of information that, taken together and combined in almost unimaginable ways, provides different forms of life and matter. Our limited ability to decode and understand this information–and our interactions in it– are important to us both individually and collectively.

Much entropy is already expended in the creation of the data that describes the activity being performed. Its context is part of its information content. Obliterating the context inherent in that information content causes all previous entropy to be of no value. Thus, in approaching any set of data, the inherent information content must be taken into account in order to avoid the unnecessary (and erroneous) application of data interpretation.

More to follow in future posts.

(Data) Transformation–Fear and Loathing over ETL in Project Management

ETL stands for data extract, transform, and load. This essential step is the basis for all of the new capabilities that we wish to acquire during the next wave of information technology: business analytics, big(ger) data, interdisciplinary insight into processes that provide insights into improving productivity and efficiency.

I’ve been dealing with a good deal of fear and loading regarding the introduction of this concept, even though in my day job my organization is a leading practitioner in the field in its vertical. Some of this is due to disinformation by competitors in playing upon the fears of the non-technically minded–the expected reaction of those who can’t do in the last throws of avoiding irrelevance. Better to baffle them with bullshit than with brilliance, I guess.

But, more importantly, part of this is due to the state of ETL and how it is communicated to the project management and business community at large. There is a great deal to be gained here by muddying the waters even by those who know better and have the technology. So let’s begin by clearing things up and making this entire field a bit more coherent.

Let’s start with the basics. Any organization that contains the interaction of people is a system. For purposes of a project management team, a business enterprise, or a governmental body we deal with a special class of systems known as Complex Adaptive Systems: CAS for short. A CAS is a non-linear learning system that reacts and evolves to its environment. It is complex because of the inter-relationships and interactions of more than two agents in any particular portion of the system.

I was first introduced to the concept of CAS through readings published out of the Santa Fe Institute in New Mexico. Most noteworthy is the work The Quark and the Jaguar by the physicist Murray Gell-Mann. Gell-Mann is received the Nobel in physics in 1969 for his work on elementary particles, such as the quark, and is co-founder of the Institute. He also was part of the team that first developed simulated Monte Carlo analysis during a period he spent at RAND Corporation. Anyone interested in the basic science of quanta and how the universe works that then leads to insights into subjects such as day-to-day probability and risk should read this book. It is a good popular scientific publication written by a brilliant mind, but very relevant to the subjects we deal with in project management and information science.

Understanding that our organizations are CAS allows us to apply all sorts of tools to better understand them and their relationship to the world at large. From a more practical perspective, what are the risks involved in the enterprise in which we are engaged and what are the probabilities associated with any of the range of outcomes that we can label as success. For my purposes, the science of information theory is at the forefront of these tools. In this world an engineer by the name of Claude Shannon working at Bell Labs essentially invented the mathematical basis for everything that followed in the world of telecommunications, generating, interpreting, receiving, and understanding intelligence in communication, and the methods of processing information. Needless to say, computing is the main recipient of this theory.

Thus, all CAS process and react to information. The challenge for any entity that needs to survive and adapt in a continually changing universe is to ensure that the information that is being received is of high and relevant quality so that the appropriate adaptation can occur. There will be noise in the signals that we receive. What we are looking for from a practical perspective in information science are the regularities in the data so that we can make the transformation of receiving the message in a mathematical manner (where the message transmitted is received) into the definition of information quality that we find in the humanities. I believe that we will find that mathematical link eventually, but there is still a void there. A good discussion of this difference can be found here in the on-line publication Double Dialogues.

Regardless of this gap, the challenge of those of us who engage in the business of ETL must bring to the table the ability not only to ensure that the regularities in the information are identified and transmitted to the intended (or necessary) users, but also to distinguish the quality of the message in the terms of the purpose of the organization. Shannon’s equation is where we start, not where we end. Given this background, there are really two basic types of data that we begin with when we look at a set of data: structured and unstructured data.

Structured data are those where the qualitative information content is either predefined by its nature or by a tag of some sort. For example, schedule planning and performance data, regardless of the idiosyncratic/proprietary syntax used by a software publisher, describes the same phenomena regardless of the software application. There are only so many ways to identify snow–and, no, the Inuit people do not have 100 words to describe it. Qualifiers apply in the humanities, but usually our business processes more closely align with statistical and arithmetic measures. As a result, structured data is oftentimes defined by its position in a hierarchical, time-phased, or interrelated system that contains a series of markers, indexes, and tables that allow it to be interpreted easily through the identification of a Rosetta stone, even when the system, at first blush, appears to be opaque. When you go to a book, its title describes what it is. If its content has a table of contents and/or an index it is easy to find the information needed to perform the task at hand.

Unstructured data consists of the content of things like letters, e-mails, presentations, and other forms of data disconnected from its source systems and collected together in a flat repository. In this case the data must be mined to recreate what is not there: the title that describes the type of data, a table of contents, and an index.

All data requires initial scrubbing and pre-processing. The difference here is the means used to perform this operation. Let’s take the easy path first.

For project management–and most business systems–we most often encounter structured data. What this means is that by understanding and interpreting standard industry terminology, schemas, and APIs that the simple process of aligning data to be transformed and stored in a database for consumption can be reduced to a systemic and repeatable process without the redundancy of rediscovery applied in every instance. Our business intelligence and business analytics systems can be further developed to anticipate a probable question from a user so that the query is pre-structured to allow for near immediate response. Further, structuring the user interface in such as way as to make the response to the query meaningful, especially integrated with and juxtaposed other types of data requires subject matter expertise to be incorporated into the solution.

Structured ETL is the place that I most often inhabit as a provider of software solutions. These processes are both economical and relatively fast, particularly in those cases where they are applied to an otherwise inefficient system of best-of-breed applications that require data transfers and cross-validation prior to official reporting. Time, money, and effort are all saved by automating this process, improving not only processing time but also data accuracy and transparency.

In the case of unstructured data, however, the process can be a bit more complicated and there are many ways to skin this cat. The key here is that oftentimes what seems to be unstructured data is only so because of the lack of domain knowledge by the software publisher in its target vertical.

For example, I recently read a white paper published by a large BI/BA publisher regarding their approach to financial and accounting systems. My own experience as a business manager and Navy Supply Corps Officer provide me with the understanding that these systems are highly structured and regulated. Yet, business intelligence publishers treated this data–and blatantly advertised and apparently sold as state of the art–an unstructured approach to mining this data.

This approach, which was first developed back in the 1980s when we first encountered the challenge of data that exceeded our expertise at the time, requires a team of data scientists and coders to go through the labor- and time-consuming process of pre-processing and building specialized processes. The most basic form of this approach involves techniques such as frequency analysis, summarization, correlation, and data scrubbing. This last portion also involves labor-intensive techniques at the microeconomic level such as binning and other forms of manipulation.

This is where the fear and loathing comes into play. It is not as if all information systems do not perform these functions in some manner, it is that in structured data all of this work has been done and, oftentimes, is handled by the database system. But even here there is a better way.

My colleague, Dave Gordon, who has his own blog, will emphasize that the identification of probable questions and configuration of queries in advance combined with the application of standard APIs will garner good results in most cases. Yet, one must be prepared to receive a certain amount of irrelevant information. For example, the query on Google of “Fun Things To Do” that you may use if you are planning for a weekend will yield all sorts of results, such as “50 Fun Things to Do in an Elevator.”  This result includes making farting sounds. The link provides some others, some of which are pretty funny. In writing this blog post, a simple search on Google for “Google query fails” yields what can only be described as a large number of query fails. Furthermore, this approach relies on the data originator to have marked the data with pointers and tags.

Given these different approaches to unstructured data and the complexity involved, there is a decision process to apply:

1. Determine if the data is truly unstructured. If the data is derived from a structured database from an existing application or set of applications, then it is structured and will require domain expertise to inherit the values and information content without expending unnecessary resources and time. A structured, systemic, and repeatable process can then be applied. Oftentimes an industry schema or standard can be leveraged to ensure consistency and fidelity.

2. Determine whether only a portion of the unstructured data is relative to your business processes and use it to append and enrich the existing structured data that has been used to integrate and expand your capabilities. In most cases the identification of a Rosetta Stone and standard APIs can be used to achieve this result.

3. For the remainder, determine the value of mining the targeted category of unstructured data and perform a business case analysis.

Given the rapidly expanding size of data that we can access using the advancing power of new technology, we must be able to distinguish between doing what is necessary from doing what is impressive. The definition of Big Data has evolved over time because our hardware, storage, and database systems allow us to access increasingly larger datasets that ten years ago would have been unimaginable. What this means is that–initially–as we work through this process of discovery, we will be bombarded with a plethora of irrelevant statistical measures and so-called predictive analytics that will eventually prove out to not pass the “so-what” test. This process places the users in a state of information overload, and we often see this condition today. It also means that what took an army of data scientists and developers to do ten years ago takes a technologist with a laptop and some domain knowledge to perform today. This last can be taught.

The next necessary step, aside from applying the decision process above, is to force our information systems to advance their processing to provide more relevant intelligence that is visualized and configured to the domain expertise required. In this way we will eventually discover the paradox that effectively accessing larger sets of data will yield fewer, more relevant intelligence that can be translated into action.

At the end of the day the manager and user must understand the data. There is no magic in data transformation or data processing. Even with AI and machine learning it is still incumbent upon the people within the organization to be able to apply expertise, perspective, knowledge, and wisdom in the use of information and intelligence.

Learning the (Data) — Data-Driven Management, HBR Edition

The months of December and January are usually full of reviews of significant events and achievements during the previous twelve months. Harvard Business Review makes the search for some of the best writing on the subject of data-driven transformation by occasionally publishing in one volume the best writing on a critical subject of interest to professional through the magazine OnPoint. It is worth making part of your permanent data management library.

The volume begins with a very concise article by Thomas C. Redman with the provocative title “Does Your Company Know What to Do with All Its Data?” He then goes on to list seven takeaways of optimizing the use of existing data that includes many of the themes that I have written about in this blog: better decision-making, innovation, what he calls “informationalize products”, and other significant effects. Most importantly, he refers to the situation of information asymmetry and how this provides companies and organizations with a strategic advantage that directly affects the bottom line–whether that be in negotiations with peers, contractual relationships, or market advantages. Aside from the OnPoint article, he also has some important things to say about corporate data quality. Highly recommended and a good reason to implement systems that assure internal information systems fidelity.

Edd Wilder-James also covers a theme that I have hammered home in a number of blog posts in the article “Breaking Down Data Silos.” The issue here is access to data and the manner in which it is captured and transformed into usable analytics. His recommended approach to a task that is often daunting is to find the path of least resistance in finding opportunities to break down silos and maximize data to apply advanced analytics. The article provides a necessary balm that counteracts the hype that often accompanies this topic.

Both of these articles are good entrees to the subject and perfectly positioned to prompt both thought and reflection of similar experiences. In my own day job I provide products that specifically address these business needs. Yet executives and management in all too many cases continue to be unaware of the economic advantages of data optimization or the manner in which continuing to support data silos is limiting their ability to effectively manage their organizations. There is no doubt that things are changing and each day offers a new set of clients who are feeling their way in this new data-driven world, knowing that the promises of almost effort-free goodness and light by highly publicized data gurus are not the reality of practitioners, who apply the detail work of data normalization and rationalization. At the end it looks like magic, but there is effort that needs to be expended up-front to get to that state. In this physical universe under the Second Law of Thermodynamics there are no free lunches–energy must be borrowed from elsewhere in order to perform work. We can minimize these efforts through learning and the application of new technology, but managers cannot pretend not to have to understand the data that they intend to use to make business decisions.

All of the longer form articles are excellent, but I am particularly impressed with the Leandro DalleMule and Thomas H. Davenport article entitled “What’s Your Data Strategy?” from the May-June 2017 issue of HBR. Oftentimes when addressing big data at professional conferences and in visiting businesses the topic often runs to the manner of handling the bulk of non-structured data. But as the article notes, less than half of an organization’s relevant structured data is actually used in decision-making. The most useful artifact that I have permanently plastered at my workplace is the graphic “The Elements of Data Strategy”, and I strongly recommend that any manager concerned with leveraging new technology to optimize data do the same. The graphic illuminates the defensive and offensive positions inherent in a cohesive data strategy leading an organization to the state: “In our experience, a more flexible and realistic approach to data and information architectures involves both a single source of truth (SSOT) and multiple versions of the truth (MVOTs). The SSOT works at the data level; MVOTs support the management of information.” Elimination of proprietary data silos, elimination of redundant data streams, and warehousing of data that is accessed using a number of analytical methods achieve the necessary states of SSOT that provides the basis for an environment supporting MVOTs.

The article “Why IT Fumbles Analytics” by Donald A. Marchand and Joe Peppard from 2013, still rings true today. As with the article cited above by Wilder-James, the emphasis here is with the work necessary to ensure that new data and analytical capabilities succeed, but the emphasis shifts to “figuring out how to use the information (the new system) generates to make better decisions or gain deeper…insights into key aspects of the business.” The heart of managing the effort in providing this capability is to put into place a project organization, as well as systems and procedures, that will support the organizational transformation that will occur as a result of the explosion of new analytical capability.

The days of simply buying an off-the-shelf silo-ed “tool” and automating a specific manual function are over, especially for organizations that wish to be effective and competitive–and more profitable–in today’s data and analytical environment. A more comprehensive and collaborative approach is necessary. As with the DalleMule and Davenport article, there is a very useful graphic that contrasts traditional IT project approaches against Analytics and Big Data (or perhaps “Bigger” Data) Projects. Though the prescriptions in the article assume an earlier concept of Big Data optimization focused on non-structured data, thereby making some of these overkill, an implementation plan is essential in supporting the kind of transformation that will occur, and managers act at their own risk if they fail to take this effect into account.

All of the other articles in this OnPoint issue are of value. The bottom line, as I have written in the past, is to keep a focus on solving business challenges, rather than buying the new bright shiny object. Alternatively, in today’s business environment the day that business decision-makers can afford to stay within their silo-ed comfort zone are phasing out very quickly, so they need to shift their attention to those solutions that address these new realities.

So why do this apart from the fancy term “data optimization”? Well, because there is a direct return-on-investment in transforming organizations and systems to data-driven ones. At the end of the day the economics win out. Thus, our organizations must be prepared to support and have a plan in place to address the core effects of new data-analytics and Big Data technology:

a. The management and organizational transformation that takes place when deploying the new technology, requiring proactive socialization of the changing environment, the teaching of new skill sets, new ways of working, and of doing business.

b. Supporting transformation from a sub-optimized silo-ed “tell me what I need to know” work environment to a learning environment, driven by what the data indicates, supporting the skills cited above that include intellectual curiosity, engaging domain expertise, and building cross-domain competencies.

c. A practical plan that teaches the organization how best to use the new capability through a practical, hands-on approach that focuses on addressing specific business challenges.

Something New (Again)– Top Project Management Trends 2017

Atif Qureshi at Tasque, which I learned via Dave Gordon’s blog, went out to LinkedIn’s Project Management Community to ask for the latest tends in project management.  You can find the raw responses to his inquiry at his blog here.  What is interesting is that some of these latest trends are much like the old trends which, given continuity makes sense.  But it is instructive to summarize the ones that came up most often.  Note that while Mr. Qureshi was looking for ten trends, and taken together he definitely lists more than ten, there is a lot of overlap.  In total the major issues seem to the five areas listed below.

a.  Agile, its hybrids, and its practical application.

It should not surprise anyone that the latest buzzword is Agile.  But what exactly is it in its present incarnation?  There is a great deal of rising criticism, much of it valid, that it is a way for developers and software PMs to avoid accountability. Anyone ready Glen Alleman’s Herding Cat’s Blog is aware of the issues regarding #NoEstimates advocates.  As a result, there are a number hybrid implementations of Agile that has Agile purists howling and non-purists adapting as they always do.  From my observations, however, there is an Ur-Agile that is out there common to all good implementations and wrote about them previously in this blog back in 2015.  Given the time, I think it useful to repeat it here.

The best articulation of Agile that I have read recently comes from Neil Killick, whom I have expressed some disagreement on the #NoEstimates debate and the more cultish aspects of Agile in past posts, but who published an excellent post back in July (2015) entitled “12 questions to find out: Are you doing Agile Software Development?”

Here are Neil’s questions:

  1. Do you want to do Agile Software Development? Yes – go to 2. No – GOODBYE.
  2. Is your team regularly reflecting on how to improve? Yes – go to 3. No – regularly meet with your team to reflect on how to improve, go to 2.
  3. Can you deliver shippable software frequently, at least every 2 weeks? Yes – go to 4. No – remove impediments to delivering a shippable increment every 2 weeks, go to 3.
  4. Do you work daily with your customer? Yes – go to 5. No – start working daily with your customer, go to 4.
  5. Do you consistently satisfy your customer? Yes – go to 6. No – find out why your customer isn’t happy, fix it, go to 5.
  6. Do you feel motivated? Yes – go to 7. No – work for someone who trusts and supports you, go to 2.
  7. Do you talk with your team and stakeholders every day? Yes – go to 8. No – start talking with your team and stakeholders every day, go to 7.
  8. Do you primarily measure progress with working software? Yes – go to 9. No – start measuring progress with working software, go to 8.
  9. Can you maintain pace of development indefinitely? Yes – go to 10. No – take on fewer things in next iteration, go to 9.
  10. Are you paying continuous attention to technical excellence and good design? Yes – go to 11. No – start paying continuous attention to technical excellent and good design, go to 10.
  11. Are you keeping things simple and maximising the amount of work not done? Yes – go to 12. No – start keeping things simple and writing as little code as possible to satisfy the customer, go to 11.
  12. Is your team self-organising? Yes – YOU’RE DOING AGILE SOFTWARE DEVELOPMENT!! No – don’t assign tasks to people and let the team figure out together how best to satisfy the customer, go to 12.

Note that even in software development based on Agile you are still “provid(ing) value by independently developing IP based on customer requirements.”  Only you are doing it faster and more effectively.

With the possible exception of the “self-organizing” meme, I find that items through 11 are valid ways of identifying Agile.  Given that the list says nothing about establishing closed-loop analysis of progress says nothing about estimates or the need to monitor progress, especially on complex projects.  As a matter of fact one of the biggest impediments noted elsewhere in industry is the inability of Agile to scale.  This limitations exists in its most simplistic form because Agile is fine in the development of well-defined limited COTS applications and smartphone applications.  It doesn’t work so well when one is pushing technology while developing software, especially for a complex project involving hundreds of stakeholders.  One other note–the unmentioned emphasis in Agile is technical performance measurement, since progress is based on satisfying customer requirements.  TPM, when placed in the context of a world of limited resources, is the best measure of all.

b.  The integration of new technology into PM and how to upload the existing PM corporate knowledge into that technology.

This is two sides of the same coin.  There is always  debate about the introduction of new technologies within an organization and this debate places in stark contrast the differences between risk aversion and risk management.

Project managers, especially in the complex project management environment of aerospace & defense tend, in general, to be a hardy lot.  Consisting mostly of engineers they love to push the envelope on technology development.  But there is also a stripe of engineers among them that do not apply this same approach of measured risk to their project management and business analysis system.  When it comes to tracking progress, resource management, programmatic risk, and accountability they frequently enter the risk aversion mode–believing that the less eyes on what they do the more leeway they have in achieving the technical milestones.  No doubt this is true in a world of unlimited time and resources, but that is not the world in which we live.

Aside from sub-optimized self-interest, the seeds of risk aversion come from the fact that many of the disciplines developed around performance management originated in the financial management community, and many organizations still come at project management efforts from perspective of the CFO organization.  Such rice bowl mentality, however, works against both the project and the organization.

Much has been made of the wall of honor for those CIA officers that have given their lives for their country, which lies to the right of the Langley headquarters entrance.  What has not gotten as much publicity is the verse inscribed on the wall to the left:

“And ye shall know the truth and the truth shall make you free.”

      John VIII-XXXII

In many ways those of us in the project management community apply this creed to the best of our ability to our day-to-day jobs, and it lies as the basis for all of the management improvement from Deming’s concept of continuous process improvement, through the application of Six Sigma and other management improvement methods.  What is not part of this concept is that one will apply improvement only when a customer demands it, though they have asked politely for some time.  The more information we have about what is happening in our systems, the better the project manager and the project team is armed with applying the expertise which qualified the individuals for their jobs to begin with.

When it comes to continual process improvement one does not need to wait to apply those technologies that will improve project management systems.  As a senior management (and well-respected engineer) when I worked in Navy told me; “if my program managers are doing their job virtually every element should be in the yellow, for only then do I know that they are managing risk and pushing the technology.”

But there are some practical issues that all managers must consider when managing the risks in introducing new technology and determining how to bring that technology into existing business systems without completely disrupting the organization.  This takes–good project management practices that, for information systems, includes good initial systems analysis, identification of those small portions of the organization ripe for initial entry in piloting, and a plan of data normalization and rationalization so that corporate knowledge is not lost.  Adopting systems that support more open systems that militate against proprietary barriers also helps.

c.  The intersection of project management and business analysis and its effects.

As data becomes more transparent through methods of normalization and rationalization–and the focus shifts from “tools” to the knowledge that can be derived from data–the clear separation that delineated project management from business analysis in line-and-staff organization becomes further blurred.  Even within the project management discipline, the separation in categorization of schedule analysts from cost analysts from financial analyst are becoming impediments in fully exploiting the advantages in looking at all data that is captured and which affects project performance.

d.  The manner of handling Big Data, business intelligence, and analytics that result.

Software technologies are rapidly developing that break the barriers of self-contained applications that perform one or two focused operations or a highly restricted group of operations that provide functionality focused on a single or limited set of business processes through high level languages that are hard-coded.  These new technologies, as stated in the previous section, allow users to focus on access to data, making the interface between the user and the application highly adaptable and customizable.  As these technologies are deployed against larger datasets that allow for integration of data across traditional line-and-staff organizations, they will provide insight that will garner businesses competitive advantages and productivity gains against their contemporaries.  Because of these technologies, highly labor-intensive data mining and data engineering projects that were thought to be necessary to access Big Data will find themselves displaced as their cost and lack of agility is exposed.  Internal or contracted out custom software development devoted along these same lines will also be displaced just as COTS has displaced the high overhead associated with these efforts in other areas.  This is due to the fact that hardware and processes developments are constantly shifting the definition of “Big Data” to larger and larger datasets to the point where the term will soon have no practical meaning.

e.  The role of the SME given all of the above.

The result of the trends regarding technology will be to put the subject matter expert back into the driver’s seat.  Given adaptive technology and data–and a redefinition of the analyst’s role to a more expansive one–we will find that the ability to meet the needs of functionality and the user experience is almost immediate.  Thus, when it comes to business and project management systems, the role of Agile, while these developments reinforce the characteristics that I outlined above are made real, the weakness of its applicability to more complex and technical projects is also revealed.  It is technology that will reduce the risk associated with contract negotiation, processes, documentation, and planning.  Walking away from these necessary components to project management obfuscates and avoids the hard facts that oftentimes must be addressed.

One final item that Mr. Qureshi mentions in a follow-up post–and which I have seen elsewhere in similar forums–concerns operational security.  In deployment of new technologies a gatekeeper must be aware of whether that technology will not open the organization’s corporate knowledge to compromise.  Given the greater and more integrated information and knowledge garnered by new technology, as good managers it is incumbent to ensure these improvements do not translate into undermining the organization.

Takin’ Care of Business — Information Economics in Project Management

Neoclassical economics abhors inefficiency, and yet inefficiencies exist.  Among the core issues that create inefficiencies is the asymmetrical nature of information.  Asymmetry is an accepted cornerstone of economics that leads to inefficiency.  We can see in our daily lives and employment the effects of one party in a transaction having more information than the other:  knowing whether the used car you are buying is a lemon, measuring risk in the purchase of an investment and, apropos to this post, identifying how our information systems allow us to manage complex projects.

Regarding this last proposition we can peel this onion down through its various levels: the asymmetry in the information between the customer and the supplier, the asymmetry in information between the board and stockholders, the asymmetry in information between management and labor, the asymmetry in information between individual SMEs and the project team, etc.–it’s elephants all the way down.

This asymmetry, which drives inefficiency, is exacerbated in markets that are dominated by monopoly, monopsony, and oligopoly power.  When informed by the work of Hart and Holmström regarding contract theory, which recently garnered the Nobel in economics, we have a basis for understanding the internal dynamics of projects in seeking efficiency and productivity.  What is interesting about contract theory is that it incorporates the concept of asymmetrical information (labeled as adverse selection), but expands this concept in human transactions at the microeconomic level to include considerations of moral hazard and the utility of signalling.

The state of asymmetry and inefficiency is exacerbated by the patchwork quilt of “tools”–software applications that are designed to address only a very restricted portion of the total contract and project management system–that are currently deployed as the state of the art.  These tend to require the insertion of a new class of SME to manage data by essentially reversing the efficiencies in automation, involving direct effort to reconcile differences in data from differing tools. This is a sub-optimized system.  It discourages optimization of information across the project, reinforces asymmetry, and is economically and practically unsustainable.

The key in all of this is ensuring that sub-optimal behavior is discouraged, and that those activities and behaviors that are supportive of more transparent sharing of information and, therefore, contribute to greater efficiency and productivity are rewarded.  It should be noted that more transparent organizations tend to be more sustainable, healthier, and with a higher degree of employee commitment.

The path forward where there is monopsony power, where there is a dominant buyer, is to impose the conditions for normative behavior that would otherwise be leveraged through practice in a more open market.  For open markets not dominated by one player as either supplier or seller, instituting practices that reward behavior that reduces the effects of asymmetrical information, and contracting disincentives in business transactions on the open market is the key.

In the information management market as a whole the trends that are working against asymmetry and inefficiency involve the reduction of data streams, the construction of cross-domain data repositories (or reservoirs) that allow for the satisfaction of multiple business stakeholders, and the introduction of systems that are more open and adaptable to the needs of the project system in lieu of a limited portion of the project team.  These solutions exist, yet their adoption is hindered because of the long-term infrastructure that is put in place in complex project management.  This infrastructure is supported by incumbents that are reinforcing to the status quo.  Because of this, from the time a market innovation is introduced to the time that it is adopted in project-focused organizations usually involves the expenditure of several years.

This argues for establishing an environment that is more nimble.  This involves the adoption of a series of approaches to achieve the goals of broader information symmetry and efficiency in the project organization.  These are:

a. Instituting contractual relationships, both internally and externally, that encourage project personnel to identify risk.  This would include incentives to kill efforts that have breached their framing assumptions, or to consolidate progress that the project has achieved to date–sending it as it is to production–while killing further effort that would breach framing assumptions.

b. Institute policy and incentives on the data supply end to reduce the number of data streams.  Toward this end both acquisition and contracting practices should move to discourage proprietary data dead ends by encouraging normalized and rationalized data schemas that describe the environment using a common or, at least, compatible lexicon.  This reduces the inefficiency derived from opaqueness as it relates to software and data.

c.  Institute policy and incentives on the data consumer end to leverage the economies derived from the increased computing power from Moore’s Law by scaling data to construct interrelated datasets across multiple domains that will provide a more cohesive and expansive view of project performance.  This involves the warehousing of data into a common repository or reduced set of repositories.  The goal is to satisfy multiple project stakeholders from multiple domains using as few streams as necessary and encourage KDD (Knowledge Discovery in Databases).  This reduces the inefficiency derived from data opaqueness, but also from the traditional line-and-staff organization that has tended to stovepipe expertise and information.

d.  Institute acquisition and market incentives that encourage software manufacturers to engage in positive signalling behavior that reduces the opaqueness of the solutions being offered to the marketplace.

In summary, the current state of project data is one that is characterized by “best-of-breed” patchwork quilt solutions that tend to increase direct labor, reduces and limits productivity, and drives up cost.  At the end of the day the ability of the project to handle risk and adapt to technical challenges rests on the reliability and efficiency of its information systems.  A patchwork system fails to meet the needs of the organization as a whole and at the end of the day is not “takin’ care of business.”

I Can’t Drive 55 — The New York Times and Moore’s Law

Yesterday the New York Times published an article about Moore’s Law.  While interesting in that John Markoff, who is the Times science writer, speculates that in about 5 years the computing industry will be “manipulating material as small as atoms” and therefore may hit a wall in what has become a back of the envelope calculation of the multiplicative nature of computing complexity and power in the silicon age.

This article prompted a follow on from Brian Feldman at NY Mag, that the Institute of Electrical and Electronics Engineers (IEEE) has anticipated a broader definition of the phenomenon of the accelerating rate of computing power to take into account quantum computing.  Note here that the definition used in this context is the literal one: the doubling of the number of transistors over time that can be placed on a microchip.  That is a correct summation of what Gordon Moore said, but it not how Moore’s Law is viewed or applied within the tech industry.

Moore’s Law (which is really a rule of thumb or guideline in lieu of an ironclad law) has been used, instead, as a analogue to describe the geometric acceleration that has been seen in computer power over the last 50 years.  As Moore originally described the phenomenon, the doubling of transistors occurred every two years.  Then it was revised later to occur about every 18 months or so, and now it is down to 12 months or less.  Furthermore, aside from increasing transistors, there are many other parallel strategies that engineers have applied to increase speed and performance.  When we combine the observation of Moore’s Law with other principles tied to the physical world, such as Landauer’s Principle and Information Theory, we begin to find a coherence in our observations that are truly tied to physics.  Thus, rather than being a break from Moore’s Law (and the observations of these other principles and theory noted above), quantum computing, to which the articles refer, sits on a continuum rather than a break with these concepts.

Bottom line: computing, memory, and storage systems are becoming more powerful, faster, and expandable.

Thus, Moore’s Law in terms of computing power looks like this over time:

Moore's Law Chart

Furthermore, when we calculate the cost associated with erasing a bit of memory we begin to approach identifying the Demon* in defying the the Second Law of Thermodynamics.

Moore's Law Cost Chart

Note, however, that the Second Law is not really being defied, it is just that we are constantly approaching zero, though never actually achieving it.  But the principle here is that the marginal cost associated with each additional bit of information become vanishingly small to the point of not passing the “so what” test, at least in everyday life.  Though, of course, when we get to neural networks and strong AI such differences are very large indeed–akin to mathematics being somewhat accurate when we want to travel from, say, San Francisco to London, but requiring more rigor and fidelity when traveling from Kennedy Space Center to Gale Crater on Mars.

The challenge, then, in computing is to be able to effectively harness such power.  Our current programming languages and operating environments are only scratching the surface of how to do this, and the joke in the industry is that the speed of software is inversely proportional to the advance in computing power provided by Moore’s Law.  The issue is that our brains, and thus the languages we harness to utilize computational power, are based in an analog understanding of the universe, while the machines we are harnessing are digital.  For now this knowledge can only build bad software and robots, but given our drive into the brave new world of heuristics, may lead us to Skynet and the AI apocalypse if we are not careful–making science fiction, once again, science fact.

Back to present time, however, what this means is that for at least the next decade, we will see an acceleration of the ability to use more and larger sets of data.  The risks, that we seem to have to relearn due to a new generation of techies entering the market which lack a well rounded liberal arts education, is that the basic statistical and scientific rules in the conversion, interpretation, and application of intelligence and information can still be roundly abused and violated.  Bad management, bad decision making, bad leadership, bad mathematics, bad statisticians, specious logic, and plain old common human failings are just made worse, with greater impact on more people, given the misuse of that intelligence and information.

The watchman against these abuses, then, must be incorporated into the solutions that use this intelligence and information.  This is especially critical given the accelerated pace of computing power, and the greater interdependence of human and complex systems that this acceleration creates.

*Maxwell’s Demon

Note:  I’ve defaulted to the Wikipedia definitions of both Landauer’s Principle and Information Theory for the sake of simplicity.  I’ve referenced more detailed work on these concepts in previous posts and invite readers to seek those out in the archives of this blog.

River Deep, Mountain High — A Matrix of Project Data

Been attending conferences and meetings of late and came upon a discussion of the means of reducing data streams while leveraging Moore’s Law to provide more, better data.  During a discussion with colleagues over lunch they asked if asking for more detailed data would provide greater insight.  This led to a discussion of the qualitative differences in data depending on what information is being sought.  My response to more detailed data was to respond: “well there has to be a pony in there somewhere.”  This was greeted by laughter, but then I finished the point: more detailed data doesn’t necessarily yield greater insight (though it could and only actually looking at it will tell you that, particularly in applying the principle of KDD).  But more detailed data that is based on a hierarchical structure will, at the least, provide greater reliability and pinpoint areas of intersection to detect areas of risk manifestation that is otherwise averaged out–and therefore hidden–at the summary levels.

Not to steal the thunder of new studies that are due out in the area of data later this spring but, for example, I am aware after having actually achieved lowest level integration for extremely complex projects through my day job, that there is little (though not zero) insight gained in predictive power between say, the control account level of a WBS and the work package level.  Going further down to element of cost may, in the words of the character in the movie Still Alice, where “You may say that this falls into the great academic tradition of knowing more and more about less and less until we know everything about nothing.”  But while that may be true for project management, that isn’t necessarily so when collecting parametrics and auditing the validity of financial information.

Rolling up data from individually detailed elements of a hierarchy is the proper way to ensure credibility.  Since we are at the point where a TB of data has virtually the same marginal cost of a GB of data (which is vanishingly small to begin with), then the more the merrier in eliminating the abuse associated with human-readable summary reporting.  Furthermore, I have long proposed through this blog and elsewhere, that the emphasis should be away from people, process, and tools, to people, process, and data.  This rightly establishes the feedback loop necessary for proper development and project management.  More importantly, the same data available through project management processes satisfy the different purposes of domains both within the organization, and of multiple external stakeholders.

This then leads us to the concept of integrated project management (IPM), which has become little more than a buzz-phrase, and receives a lot of hand waves, mostly by technology companies that want to push their tools–which are quickly becoming obsolete–while appearing forward leaning.  This tool-centric approach is nothing more than marketing–focusing on what the software manufacturer would have us believe is important based on the functionality baked into their applications.  One can see where this could be a successful approach, given the emphasis on tools in the PM triad.  But, of course, it is self-limiting in a self-interested sort of way.  The emphasis needs to be on the qualitative and informative attributes of available data–not of tool functionality–that meet the requirements of different data consumers while minimizing, to the extent possible, the number of data streams.

Thus, there are at least two main aspects of data that are important in understanding the utility of project management: early warning/predictiveness and credibility/traceability/fidelity.  The chart attached below gives a rough back-of-the-envelope outline of this point, with some proposed elements, though this list is not intended to be exhaustive.

PM Data Matrix

PM Data Matrix

In order to capture data across the essential elements of project management, our data must demonstrate both a breadth and depth that allows for the discovery of intersections of the different elements.  The weakness in the two-dimensional model above is that it treats each indicator by itself.  But, when we combine, for example, IMS consecutive slips with other elements listed, the informational power of the data becomes many times greater.  This tells us that the weakness in our present systems is that we treat the data as a continuity between autonomous elements.  But we know that the project consists of discontinuities where the next level of achievement/progress is a function of risk.  Thus, when we talk about IPM, the secret is in focusing on data that informs us what our systems are doing.  This will require more sophisticated types of modeling.

Big Time — Elements of Data Size in Scaling

I’ve run into additional questions about scalability.  It is significant to understand the concept in terms of assessing software against data size, since there are actually various aspect of approaching the issue.

Unlike situations where data is already sorted and structured as part of the core functionality of the software service being provided, this is in dealing in an environment where there are many third-party software “tools” that put data into proprietary silos.  These act as barriers to optimizing data use and gaining corporate intelligence.  The goal here is to apply in real terms the concept that the customers generating the data (or stakeholders who pay for the data) own the data and should have full use of it across domains.  In project management and corporate governance this is an essential capability.

For run-of-the-mill software “tools” that are focused on solving one problem, this often is interpreted as just selling a lot more licenses to a big organization.  “Sure we can scale!” is code for “Sure, I’ll sell you more licenses!”  They can reasonably make this assertion, particularly in client-server or web environments, where they can point to the ability of the database system on which they store data to scale.  This also comes with, usually unstated, the additional constraint that their solution rests on a proprietary database structure.  Such responses, through, are a form of sidestepping the question, nor is it the question being asked.  Thus, it is important for those acquiring the right software to understand the subtleties.

A review of what makes data big in the first place is in order.  The basic definition, which I outlined previously, came from NASA in describing data that could not be held in local memory or local storage.  Hardware capability, however, continues to grow exponentially, so that what is big data today is not big data tomorrow.  But in handling big data, it then becomes incumbent on software publishers to drive performance to allow their customers to take advantage of the knowledge contained in these larger data sets.

The elements that determine the size of data are:

a.  Table size

b.  Row or Record size

c.  Field size

d.  Rows per table

e.  Columns per table

f.  Indexes per table

Note the interrelationships of these elements in determining size.  Thus, recently I was asked how many records are being used on the largest tables being accessed by a piece of software.  That is fine as shorthand, but the other elements add to the size of the data that is being accessed.  Thus, a set of data of say 800K records may be just as “big” as one containing 2M records because of the overall table size of fields, and the numbers of columns and indices, as well as records.  Furthermore, this question didn’t take into account the entire breadth of data across all tables.

Understanding the definition of data size then leads us to understanding the nature of software scaling.  There are two aspects to this.

The first is the software’s ability to presort the data against the database in such as way as to ensure that latency–the delay in performance when the data is loaded–is minimized.  The principles applied here go back to database management practices back in the day when organizations used to hire teams of data scientists to rationalize data that was processed in machine language–especially when it used to be stored in ASCII or, for those who want to really date themselves, EBCDIC, which were incoherent by today’s generation of more human-readable friendly formats.

Quite simply, the basic steps applied has been to identify the syntax, translate it, find its equivalents, and then sort that data into logical categories that leverage database pointers.  What you don’t want the software to do is what used to be done during the earliest days of dealing with data, which was smaller by today’s standards, of serially querying ever data element in order to fetch only what the user is calling.  Furthermore, it also doesn’t make much sense to deal with all data as a Repository of Babel to apply labor-intensive data mining in non-relational databases, especially in cases where the data is well understood and fairly well structured, even if in a proprietary structure.  If we do business in a vertical where industry standards in data apply, as in the use of the UN/CEFACT XML convention, then much of the presorting has been done for us.  In addition, more powerful industry APIs (like OLE DB and ODBC) that utilize middleware (web services, XML, SOAP, MapReduce, etc.) multiply the presorting capabilities of software, providing significant performance improvements in accessing big data.

The other aspect is in the software’s ability to understand limitations in data communications hardware systems.  This is a real problem, because the backbone in corporate communication systems, especially to ensure security, is still largely done over a wire.  The investments in these backbones is usually categorized as a capital investment, and so upgrades to the system are slow.  Furthermore, oftentimes backbone systems are embedded in physical plant building structures.  So any software performance is limited by the resistance and bandwidth of wiring.  Thus, we live in a world where hardware storage and processing is doubling every 12-18 months, and software is designed to better leverage such expansion, but the wires over which data communication depends remains stuck in the past–constrained by the basic physics of CAT 6 or Fiber Optic cabling.

Needless to say, software manufacturers who rely on constant communications with the database will see significantly degraded performance.  Some software publishers who still rely on this model use the “check out” system, treating data like a lending library, where only one user or limited users can access the same data.  This, of course, reduces customer flexibility.  Strategies that are more discrete in handling data are the needed response here until day-to-day software communications can reap the benefits of physical advancements in this category of hardware.  Furthermore, organizations must understand that the big Cloud in the sky is not the answer, since it is constrained by the same physics as the rest of the universe–with greater security risks.

All of this leads me to a discussion I had with a colleague recently.  He opened his iPhone and did a query in iTunes for an album.  In about a second or so his query selected the artist and gave a list of albums–all done without a wire connection.  “Why can’t we have this in our industry?” he asked.  Why indeed?  Well, first off, Apple iTunes has sorted its data to optimize performance with its app, and it is performed within a standard data stream and optimized for the Apple iOS and hardware.  Secondly, the variables of possible queries in iTunes are predefined and tied to a limited and internally well-defined set of data.  Thus, the data and application challenges are not equivalent as found in my friend’s industry vertical.  For example, aside from the challenges of third party data normalization and rationalization, iTunes is not dealing with dynamic time-phased or trending data that requires multiple user updates to reflect changes using predictive analytics which is then served to different classes of users in a highly secure environment.  Finally, and most significantly, Apple spent more on that system than the annual budget of my friend’s organization.  In the end his question was a good one, but in discussing this it was apparent that just saying “give me this” is a form of magical thinking and hand waving.  The devil is in the details, though I am confident that we will eventually get to an equivalent capability.

At the end of the day IT project management strategy must take into account the specific needs of classes of users in making determinations of scaling.  What this means is a segmented approach: thick-client users, compartmentalized local installs with subsets of data, thin-client, and web/mobile or terminal services equivalents.  The practical solution is still an engineered one that breaks the elephant into digestible pieces that lean forward to leveraging advances in hardware, database, and software operating environments.  These are the essential building blocks to data optimization and scaling.