jim.shamlin.com

2: Components of BI Architecture

"Architecture" refers to information systems. The author indicates a need for users to have a high-level understanding in order to participate in the discussions about building and leveraging systems, but are often confounded because of the level of technical detail. (EN: which may be intentional - one of the disempowerment tactics of IT is to use technobabble to disorient their business "partners").

The author provides examples of the various systems that are already in place (manufacturing systems, sales and marketing systems, supply chain systems, the accounting system, etc.), asserting that each of these creates and stores data that can be used as BI. These may be custom-developed systems, but more often they are purchased packages from a vendor that specializes in a specific task: a logistics system from Lawson, an HR management system from PeopleSoft, an accounting system from SAP, etc.

In addition to business systems, data may also come from other sources, such as the systems of vendors and customers, data from Web sites that pertains to customer behavior before and after a formal transaction, and data from research firms.

Ideally, there would be a consistent information flow across all systems, but this is seldom the case: systems designed to perform a specific function manage data related to that function only, data is stored in sources that are specific to that system, and there is no correspondence of data between systems. For example, a customer ID code may be stored in multiple databases (the order system, the accounting system, the logistics system, the marketing system) and they ID for a single customer is likely to be a different code in each of these systems.

(EN: The author does not mention a few other significant problems of separate systems: First, that they are entirely introspective and focus exclusively on a specific task, such that a logistics system focuses on logistics tasks, and fails to collect data, and may even dispose of data, that may be useful to CRM, because it is considered irrelevant to the tasks that system is intended to perform. Second, they are often opaque "black box" systems that are not designed to share data and are often designed to suppress data and make it unavailable, often for the ostensible reason of security. Third, that even in the modern age, where vendors acknowledge the demand for open systems, solutions are often designed to be inhospitable, even hostile, to other systems as a method of handcuffing the customer to the vendor's products - a brand-x e-commerce solution will communicate with a brand-x logistics system, but not a brand-y system.)

Data Transfer: From Operational to Data Warehouse

A key task for BI is combining the data from multiple operational systems into a single data warehouse, a transfer process that is often abbreviated as ETL (extract, transform, and load), with the "transform" process being the most time-consuming, as it must rectify inconsistent codes (a different customer ID in different systems), handle incomplete data, and translate abstract codes to meaningful terms.

Transformation may also involve calculation and analysis as a method of cleaning the data or making it more meaningful. For example, the original system may record the date on which an item was ordered, the date on which items were available in inventory, and the date on which the order was shipped, but the shipping manager may merely need to know the number of days it takes for his department to ship an order, taking into account inventory shortages.

There is also metadata generated by the ETL process: the date and times of extraction and loading tasks, the source from which the data originated, indication of the logic for matching and calculations, and more. This may not be all that interesting to the business users, but may be critical to the design and monitoring of the data warehouse system.

While there exist packaged solutions for ETL, many companies rely on custom-coded ETL due to the incomplete or unsuitable nature of the existing solutions (some specific vendor packages mentioned: Informatica Power Center, IBM Websphere Data State, Oracle Data Integrator, Ab Initio, and Microsoft Integration Services).

In an ideal situation, all data would be extracted and warehoused. Even if a datum is not presently used, it may be needed at a future date, an the absence of data creates a blind spot. However, for various reasons/excuses (the storage cost, the time available, impact of performance, etc.) it is more common to collect only data that is strictly needed for current BI analyses.

A side note: Master Data Management (MDM) has arisen as a discipline that specializes in the quality (accuracy and timeliness) of the "master" data - such as a single code that correlates the various customer ID codes in operational systems to a single code in the warehouse. While this seems an unglamorous maintenance function, MDM is often a pivotal role in uncovering and discovering the relationships between the data, which is of great value to BI.

The Data Warehouse

The data warehouse is a central collection of data that has been united from the various operational systems. It may be called by other names, which may be in part due to the negative baggage associated with the term: typically, a data warehouse is a huge and expensive system and is perceived as having little business value (the data it contains is redundant to the systems that actually do the work), but it is critical to BI.

The author enumerates the reasons that a data warehouse is necessary, which largely comes down to the limitations of operational systems:

Simply stated: an operational system simply is not designed for BI, outside of a limited analysis of the data they contains, in the context of the functions they are designed to perform, which necessitates a data warehouse to aggregate information from across the enterprise.

Neither is it a wise approach to attempt to deploy BI tools that work directly with the operational systems. While this seems a faster and cheaper approach to BI, it introduces overhead "costs" in terms of data storage and processing requirements that can degrade the performance of the operational systems.

An aside: some companies have implemented "data marts," each of which is a subset of a data warehouse that serves the needs of a particular business unit or function. For example, manufacturing and marketing may have separate data marts, each tailored to the needs of departmental management. The author asserts that, according to her industry research, this approach has met with limited success and, over time, has a higher implementation cost than a central data warehouse for the entire enterprise.

Data Warehouse Tables

The data warehouse consists of a large number of individual tables within a relational database that are cross-referenced. These tables are categorized as "fact tables" (which may contain millions of detailed rows of data) and "dimension tables" (also called look-up[ or reference tables, which contain aggregated information that has undergone some basic analyses).

The data warehouse is also a read-only environment for users. The data-entry tasks are generally automated ETL functions that import data from other systems, and the primary function of the data warehouse is to make that data available to those who need to analyze it.

Data in the warehouse is often organized hierarchically, whereas data in transaction systems is optimized for performance, and qualitative dimensions (such as product category, customer type, etc.) are irrelevant and unnecessary for their operations.

The Data Warehouse Technology Platform

The components of the data warehouse are:

There is reference to data warehouse "appliances," which are systems that combine some or all of the tasks above into a single server. These are popular among small businesses and those that operate data marts, as they propose to provide a complete, optimized solution at a lower cost-of-ownership, but they bring with them the limitations of operational systems (myopia and incompatibility) and are notoriously difficult to extend beyond the scope and capacity for which they are designed. Similarly, there are vendors that offer "Software as Service" solutions, which have the same limitations to some degree.

(EN: also worth noting, the use of a pre-packaged solution, whether an appliance or service, provides cost savings at the sacrifice of competitive advantage. Where multiple companies use the same approach to BI, none of them has a competitive advantage over the others, and "best practices" discovered at one firm will be made available to their competitors, particularly when a service model is employed.)