ABSTRACT
The practice of building a series of independent data marts and networking them into a distributed data warehouse is rapidly gaining acceptance and promises to quickly become the paradigm for data warehouse construction. This paper examines the current status of Web-based data warehousing and its prospects.
INTRODUCTION
The concept of the data warehouse has been around for decades. Recently, the proliferation of PCs and networking technologies has made the implementation of data warehouses feasible for businesses of all sizes. A data warehouse, in simplest terms, is a database. But it is more than that. A data warehouse is a computer system capable of storing, retrieving and managing large amounts of data gathered from diverse sources within a company. It is the major player in a decision support system, as user interfaces enable users to access the data stored in the data warehouse on-line and in real-time (13, 15).
The first formal data warehouses were implemented in the late 1990s. Data and computing have in some ways come full circle. In the early 70s, data was centralized in mainframe environments but not easily accessible to end-users. In the mid-80s, with the proliferation of networked environments and personal computers, data and computing became decentralized, giving greater access to end-users. Unfortunately, this progression led to inconsistent data across the organization. Data warehousing attempts to achieve the best of both worlds by centralizing and integrating an enterprise's data while still providing easy access to end-users (11, 18).
This paper examines the current status of Web-based data warehousing and explores its potential for future development. Concepts, challenges and limitations of the traditional centralized data warehouse will first be examined. The architecture of the distributed data warehouse will then be discussed together with its challenges to corporate information management. The following section will focus on ways of applying Web and intranet technology to enhance the concept of distributed data warehousing. Technical limitations and security issues for the Web-based distributed data warehouse will then be discussed. The paper will conclude with some recommended directions for the future development of Web-based data warehousing.
TRADITIONAL CENTRALIZED DATA WAREHOUSE
Historically, legacy systems were built within companies to satisfy data processing needs for various departments such as accounting, human resources, and sales. These systems were typically developed independently of one another, with little concern for coordination of data. For example, if human resources needed employee records, they created and stored employee records. Similarly, if accounting needed employee records, they created and stored employee records. Duplication of effort is commonplace in legacy systems. In addition, when information must be accumulated from different systems, such as for a management report request, coordination is difficult. A request went to the Information Systems department and had to go through appropriate prioritizing and processing. Timelines were less than optimal (1, 14, 16).
The Internet has revolutionized information delivery to the masses, and demand for real-time delivery of information to decision-makers within organizations has grown at the same pace. As a matter of fact, given the customer-driven e-marketplaces, both B2C and B2B, timely delivery of information for decision making within companies is critical to their profitability and, possibly, to their survival.
Legacy systems existing to this day still fulfill the requirements for which they were developed. They were built and maintained independently, with each department responsible for its own system and access to the system's stored data. Also, the programming languages and database platforms utilized for each system were not necessarily the same, and the amount of data stored on any of the systems might be very large (6).
To solve those problems, a top-down centralized data warehousing architecture, as illustrated in Figure 1, is used. The interaction associated with the architecture begins with an Extraction, Transformation, Migration, and Loading (ETML) process working from legacy and/or external data sources. Extraction, transformation, and migration process data from these sources and output it to a centralized Data Staging Area. Following this, data and metadata are loaded into the Enterprise Data Warehouse and the centralized metadata repository. Once these are constituted, Data Marts are created from summarized data warehouse data and metadata. The data warehouse has an atomic data layer and also contains detailed historical data. In contrast, the data marts contain lightly and highly summarized data and also metadata (9). Using data warehouses has its limitations:
* It cannot create more data
* There can be a learning curve in utilizing a data warehouse
* It takes too long to build
* Business processes can become complicated by warehouse systems
* The cost of capturing and maintaining data can be too high
* Data may be gathered for the sake of data
* A short development life span may result in an inelegantly built system
* It may require a great deal of maintenance
* Access to month-old data may not be worth the cost of a data warehouse
FIGURE 1
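The ETML flow described for the top-down architecture can be illustrated with a minimal Python sketch. The source rows, field names, and the functions `extract`, `transform`, `load`, and `build_mart` are hypothetical and chosen only for illustration; a real ETML tool would work against actual legacy systems and a database rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical rows from two legacy systems with inconsistent formats.
hr_rows = [{"emp_id": "17", "name": " Ada Lopez ", "dept": "sales"}]
sales_rows = [
    {"rep": 17, "amount": "120.50", "month": "2001-03"},
    {"rep": 17, "amount": "80.00", "month": "2001-03"},
    {"rep": 17, "amount": "45.25", "month": "2001-04"},
]

def extract():
    """Extraction: pull raw rows from each source system."""
    return {"hr": hr_rows, "sales": sales_rows}

def transform(raw):
    """Transformation: reconcile types and formats in the staging area."""
    employees = {int(r["emp_id"]): r["name"].strip() for r in raw["hr"]}
    facts = [
        {"emp_id": int(r["rep"]), "month": r["month"], "amount": float(r["amount"])}
        for r in raw["sales"]
    ]
    return employees, facts

def load(employees, facts):
    """Loading: keep atomic rows in the warehouse alongside simple metadata."""
    return {
        "employees": employees,
        "sales_facts": facts,
        "metadata": {"sales_facts": ["emp_id", "month", "amount"]},
    }

def build_mart(warehouse):
    """Data mart: a summarized view derived from the atomic warehouse data."""
    totals = defaultdict(float)
    for fact in warehouse["sales_facts"]:
        totals[(fact["emp_id"], fact["month"])] += fact["amount"]
    return dict(totals)

warehouse = load(*transform(extract()))
mart = build_mart(warehouse)
```

In this sketch the warehouse retains all three atomic sales rows, while the mart holds only one total per employee and month, mirroring the distinction between detailed and summarized data.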
DISTRIBUTED DATA WAREHOUSE
The second data warehousing systems architecture, the "Bottom-up distributed" architecture, became popular because the first architecture took too long to implement, was often politically unacceptable, and was too expensive. Figure 2 shows the Bottom-up distributed architecture, which provides a rationalization for department heads and others with budgets to use the new technology of data warehousing to produce applications relevant to their organizational roles.
The central idea in Bottom-up architecture is to construct the data warehouse incrementally over time from independently developed data marts. The process begins with ETML for one or more data marts. No common data staging area is required. There is generally a separate area for each data mart. There may not even be standardization on the ETML tool (10, 16).
In centralized architecture, data marts use lightly and highly summarized data. But in distributed architecture, they also use atomic and detailed data, including historical data. Since the data marts are to be the building blocks of the data warehouse, they must contain all of the data that will appear in the projected data warehouse (16).
Distributed differs from centralized architecture in that it provides no common metadata components across data marts. This is the most important difference between the two architectures from the standpoint of integration. In fact, the evolution of architecture beyond the basic centralized and distributed patterns is largely the evolution of increasingly sophisticated metadata and meta-object structures in an effort to achieve integration (17, 23).
While distributed architecture was quite successful in meeting initial expectations in building data marts, it very soon was widely perceived as unacceptable in the long run for the very reason that it failed to provide a common metadata component. Without shared metadata, it is difficult to construct the data warehouse from data marts. So the distributed architecture, in its pure form, fails to fulfill its promise of an incremental approach to the data warehouse. This failure also leads to new "stovepipes" or "legamarts" over time (9).
Though the "legamart" critique of the pure distributed architecture was decisive, the idea of an incremental approach to data warehouse construction through application specific data marts is very valuable. Therefore, distributed architecture has been modified to suit the need to make an incremental, relatively inexpensive, value-driven approach to data warehousing work.
The modified version supports an incremental approach to the data warehouse through data mart development by creating a shared framework for development. The framework includes enterprise subject areas, common dimensions, metrics, business rules, and data sources, all represented in a logically common (but not necessarily physically centralized) Global Metadata Repository (GMR) (23).
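One way to picture the role of shared metadata is a check that each data mart defines common dimensions and metrics the same way the repository does. The sketch below is an assumption made for illustration: the `GMR` dictionary, the column names, and the function `nonconforming` are hypothetical and do not describe any particular repository product.

```python
# Hypothetical global metadata repository: shared dimensions and metrics.
GMR = {
    "dimensions": {"customer_id": "int", "date": "date", "product_id": "int"},
    "metrics": {"revenue": "decimal", "units": "int"},
}

def nonconforming(mart_schema, gmr=GMR):
    """Return columns that reuse a shared name but with a different type."""
    shared = {**gmr["dimensions"], **gmr["metrics"]}
    return sorted(
        col for col, typ in mart_schema.items()
        if col in shared and shared[col] != typ
    )

# Two independently developed marts (hypothetical schemas).
sales_mart = {"customer_id": "int", "date": "date", "revenue": "decimal"}
support_mart = {"customer_id": "str", "date": "date", "tickets": "int"}

print(nonconforming(sales_mart))    # no conflicts with the shared definitions
print(nonconforming(support_mart))  # customer_id is typed differently
```

A mart that passes such a check can later be joined with other marts on the shared dimensions, which is the property the pure distributed architecture lacks.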
Concurrent access, i.e. the ability to support multi-user access to the data, with several users simultaneously reading and writing to the database, is one of the most important of the basic DBMS services. Distributed database management software and multiprocessor hardware add new levels of complexity to the issue. The problems of concurrency control are complicated in a distributed warehouse environment by the fact that there are many "local" users accessing data in their departmental data mart at the same time as others are processing "global" queries against the warehouse as a whole (8, 23).
To ensure concurrency control, the DBMS must not only decide which copy of the data will be accessed by a query, it must also find a way to update the other copies of that data if a change is made. Most data warehouses are implemented using multiprocessor hardware, either SMP (symmetric multiprocessing, or shared-memory processing) or MPP (massively parallel processing, or shared-nothing systems).
FIGURE 2
There are many ways concurrently executing transactions can interfere with one another and compromise the integrity and consistency of the database. Three major examples are:
1. Lost updates
2. Violation of integrity constraints
3. Inconsistent retrieval
Most concurrency control concentrates on updates, because interference between updating transactions can corrupt the data.
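The lost update problem can be made concrete with a small Python sketch. The `Account` class and both functions are hypothetical; the first function replays a fixed interleaving of two transactions by hand so that the outcome is deterministic, and the second shows the same updates protected by a lock.

```python
import threading

class Account:
    """A single shared data item with a lock guarding it."""
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def interleaved_lost_update(account):
    """Replay an unlucky interleaving of two transactions, T1 and T2."""
    t1_read = account.balance        # T1 reads the balance
    t2_read = account.balance        # T2 reads the same, now stale, balance
    account.balance = t1_read + 50   # T1 writes its result
    account.balance = t2_read + 30   # T2 overwrites it; T1's update is lost
    return account.balance

def locked_update(account, delta):
    """Read-modify-write as one critical section, so no update is lost."""
    with account.lock:
        account.balance = account.balance + delta

lost = interleaved_lost_update(Account(100))

safe = Account(100)
threads = [threading.Thread(target=locked_update, args=(safe, d)) for d in (50, 30)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Starting from 100, the unprotected interleaving ends at 130 because the deposit of 50 is overwritten, whereas the locked version ends at 180 regardless of which thread runs first.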
In a heterogeneous or homogeneous distributed system, the DBMSs' schedulers and transaction managers themselves are distributed across the nodes in the network. At each node, there must also be a "global" scheduler and transaction manager to act as coordinator for those transactions initiated locally that need to process at more than one site. The alternative would be to appoint one site as coordinator for all global transactions. The three basic methods of concurrency control are locking, time stamping and optimistic methods (19).
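Of the three methods named above, the optimistic approach is perhaps the easiest to sketch: transactions read without locking, and a write is accepted only if the item has not changed since it was read. The `VersionedStore` class below is a hypothetical single-node illustration using version numbers; it says nothing about how a distributed coordinator would validate transactions across sites.

```python
class VersionedStore:
    """Toy key-value store that validates writes against a version number."""
    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        """Return (value, version); missing keys read as (None, 0)."""
        return self._data.get(key, (None, 0))

    def commit(self, key, new_value, read_version):
        """Accept the write only if nobody committed since our read."""
        _, current_version = self._data.get(key, (None, 0))
        if current_version != read_version:
            return False  # validation failed; caller must re-read and retry
        self._data[key] = (new_value, current_version + 1)
        return True

store = VersionedStore()
store.commit("stock", 10, 0)           # initial load

value_a, version_a = store.read("stock")   # transaction A reads
value_b, version_b = store.read("stock")   # transaction B reads the same version

first = store.commit("stock", value_a - 1, version_a)   # A commits first
second = store.commit("stock", value_b - 3, version_b)  # B is stale and rejected
```

Transaction B would have to re-read the item and try again, which is the price the optimistic method pays for avoiding locks when conflicts are rare.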
IMPACT OF THE INTERNET AND INTRANET
The WWW can hold the key to more affordable warehousing for those willing to look and learn. It is the model of a successful distributed environment, and the use of web browsers has taught everyone a better way to deploy software and access data across the organization. The Internet, and especially intranets, can introduce a new level of collaborative analysis and information sharing among decision makers when it replaces older, more costly, conventional networks.
The web can also change the way data is accessed and analyzed in support of mission-critical decisions. Data published on the Web is easily accessible and, once obtained, is ideally suited for query, reporting and analysis by non-technical users. There are three basic approaches to distributing and maintaining the software that runs on desktop computers: fat clients, thin clients, and a multi-tiered approach. The first generation of Internet decision support aids must meet certain criteria. They must (a) support interactive analysis, (b) be able to keep pace with the infrastructure, and (c) ensure security and give users flexibility.
Most intranets currently manage unstructured content such as text pages, images, and even audio files as static HTML documents. A data warehouse stores structured content and raw alphanumeric data, but with the right tools and correct architecture, a data warehouse can be accessed through the corporate intranet. The web-enabled warehouse forms the foundation of a comprehensive enterprise information infrastructure that can confer three advantages to its corporate owner (2, 7):
* Improved cost/benefit ratio of an intranet architecture
* Enhanced decision support due to information integration
* Improved user collaboration on projects and key business decisions
Without relational technologies, data warehousing would not be possible. But it is the next generation in computing, the Internet/intranet, that will make data warehousing cost-effective and available to all those who can benefit from access to its contents. An intranet can deliver vital information and reports to each desktop in an organization at the click of a mouse, regardless of the user's location or the location of the data. Even remote users, traveling or just temporarily away from their desks, can access data through dial-up connections or the World Wide Web, provided correct security procedures are observed (16).
An intranet disseminates a uniform view of information to all users, producing a high degree of coherence for the entire organization. The communications, report formats, and interfaces are consistent, making the information they contain easy to read and understand. Another important advantage of an intranet is that it provides a flexible and scalable nonproprietary solution for a distributed data warehouse implementation. It enables the integration of a diverse computing environment into a cohesive information network.
An intranet makes data stored at any node equally available to all authorized users regardless of their location in the company, and can be easily extended to serve remote corporate locations and business partners through a wide area network, an extranet, or even a virtual private network. External users can access data and drill down or print reports through proxy servers located outside the corporate firewall. Even customers have limited access through the WWW by visiting the company's home page for information about products and services (2, 10).
In summary, an intranet is flexible, scalable, cost-effective and easy to implement, maintain, and use, making it the best architecture to maximize the benefits of a corporate data warehouse. The challenge in putting a data warehouse on an intranet is in properly enabling SQL access to the warehouse from HTML browsers. For this implementation to succeed, at least three application layers are needed: an analytic layer, a file management layer, and a security layer.
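The core of enabling SQL access from a browser is a server-side step that runs a query and returns the result as HTML. The sketch below shows only that step, using Python's standard `sqlite3` module with an in-memory database as a stand-in for the warehouse; the table, the data, and the function `query_to_html` are hypothetical, and the analytic, file management, and security layers mentioned above are deliberately left out.

```python
import sqlite3
from html import escape

def query_to_html(conn, sql, params=()):
    """Run a parameterized query and render the result as an HTML table."""
    cur = conn.execute(sql, params)
    headers = [col[0] for col in cur.description]
    rows = cur.fetchall()
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

# Stand-in warehouse with a hypothetical summary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.5), ("West <HQ>", 80.0)],
)

page = query_to_html(
    conn,
    "SELECT region, total FROM sales WHERE total > ? ORDER BY region",
    (50,),
)
```

User-supplied values are passed as query parameters rather than pasted into the SQL text, and cell contents are escaped before being placed in the page; both choices are small examples of what a security layer would have to enforce.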
As for the future of web-enabled data warehousing, the answer will be more complexity and more third party vendor solutions, at least until the industry begins to normalize itself and a few architectures and enabling technologies become standards, e.g. CGI versus its proposed replacement, two-tier versus n-tiered architecture, Java versus ActiveX etc.
TECHNICAL LIMITATIONS OF WEB TECHNOLOGY
Searching the web is difficult because different sites may take different views of what information should be represented as data and what information as metadata, a problem known as schematic heterogeneity management. Another common problem that arises is that data, which exists in one representation in some data source, may be needed in a different representation for some other purposes, a problem known as data heterogeneity management (3). Both issues will be discussed here.
Managing Schematic Heterogeneity
Web searching is sometimes difficult due to the fact that different sites may take different views of what information should be represented as data and what information as metadata. Consider a simple query requesting Web documents authored by Atwood. A keyword search for documents containing the word Atwood as data will retrieve many irrelevant documents. A more accurate search must understand how authorship is represented in different documents and reconcile different representations.
One document may have this information in the data itself (perhaps as a phrase "Atwood is the author of this..."), another within metadata (perhaps in an HTML tag author). This form of data heterogeneity has been called schematic heterogeneity since it occurs when data under one schema (or structure) corresponds to schema labels (such as HTML tags or attribute names) in another. Schematic heterogeneity is not unique to the Web; it can arise in structured databases such as relational or object-oriented databases.
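The Atwood example can be sketched in Python with the standard `html.parser` module. The function `find_author` below is a hypothetical illustration that reconciles just two representations of authorship, a `<meta name="author">` tag and the phrase "... is the author of" in the body text; it is not the technique of any system cited in this paper.

```python
import re
from html.parser import HTMLParser

class AuthorFinder(HTMLParser):
    """Collect authorship from metadata and keep body text for a fallback."""
    def __init__(self):
        super().__init__()
        self.meta_author = None
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "author":
            self.meta_author = attrs.get("content")

    def handle_data(self, data):
        self.text.append(data)

def find_author(html_doc):
    """Return the author whether it is stored as metadata or as data."""
    parser = AuthorFinder()
    parser.feed(html_doc)
    if parser.meta_author:
        return parser.meta_author.strip()
    match = re.search(r"(\w+) is the author of", " ".join(parser.text))
    return match.group(1) if match else None

doc_meta = '<html><head><meta name="author" content="Atwood"></head><body>A novel.</body></html>'
doc_text = "<html><body><p>Atwood is the author of this essay.</p></body></html>"
doc_noise = "<html><body><p>We visited Atwood Street.</p></body></html>"

authors = [find_author(d) for d in (doc_meta, doc_text, doc_noise)]
```

A plain keyword search would return all three documents, while this reconciliation accepts the first two and rejects the third, where "Atwood" appears only as a street name.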
Some researchers have developed techniques for reconciling and managing schematic heterogeneity, such as the popular Cuypers system illustrated in Figure 3. This system addresses the above-mentioned problems and provides an effective means of transforming multimedia information for target users in a very flexible manner. Techniques like the Cuypers system can greatly enhance the sophistication and effectiveness of search tools over heterogeneous data (22).
Managing Data Heterogeneity
The world today is full of information sources, all with their own ways of representing data. One common problem that arises is that data, which exists in one representation in some data source, is needed in a different representation for some other purpose. As a simple example, the owner of a data source may want to publish his data using a specific XML DTD, though it is stored in some different (legacy) format. As another example, data warehouses bring data from one or more sources together, in a new form that allows for efficient decision support queries. Today, such situations are for the most part dealt with manually, by an expert user who has knowledge of both the source and target representations. Converting from one data representation to another is a time-consuming and labor-intensive project, with few tools available to ease the task (8).
Some researchers have tried to produce a tool for creating mappings between two data representations semi-automatically (i.e., with user input) and work with an associated "meta query engine," one that can query data in the source representation, and, if any exists, in the target representation as well. They will take as input a target schema and a source schema, and generate as output view definitions (queries). These queries, executed by the meta query engine, will take data from the source and transform it to match the target schema, cleansing and transforming it as needed to be compatible with existing data visible through that schema (5, 7).
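The idea of turning a source-to-target mapping into a view definition can be sketched as follows. The mapping, the table and column names, and the function `view_definition` are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for the "meta query engine"; the mapping is assumed to be written by a trusted expert, since its expressions are placed directly into the SQL text.

```python
import sqlite3

def view_definition(view, source_table, mapping):
    """Build a CREATE VIEW statement from {target_column: source_expression}."""
    columns = ", ".join(f"{expr} AS {target}" for target, expr in mapping.items())
    return f"CREATE VIEW {view} AS SELECT {columns} FROM {source_table}"

# Mapping supplied with user input: target column -> expression over the source.
mapping = {
    "employee_id": "emp_no",
    "full_name": "TRIM(fname) || ' ' || TRIM(lname)",
    "salary_usd": "ROUND(salary_cents / 100.0, 2)",
}
sql = view_definition("employee_v", "legacy_emp", mapping)

# Hypothetical legacy source with untidy names and salaries stored in cents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE legacy_emp (emp_no INTEGER, fname TEXT, lname TEXT, salary_cents INTEGER)"
)
conn.execute("INSERT INTO legacy_emp VALUES (7, ' Ada ', 'Lopez', 512345)")
conn.execute(sql)

row = conn.execute(
    "SELECT employee_id, full_name, salary_usd FROM employee_v"
).fetchone()
```

Querying the generated view returns the source data already renamed, cleansed, and converted to the target representation, without copying or rewriting the legacy table.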
Processing Web data poses new challenges for database and information retrieval technologies. Among these challenges are: the amounts of Web data; its relevance ranging from useless to highly useful sources; its heterogeneity ranging from plain text to structured documents, to sounds and images; the way data is generated, from static HTML pages to database queries, and its unpredictability. Despite these challenges, today we can search the entire Web, mine significant portions of it, integrate data from multiple Web data sources, and query Web data. New technologies, such as those based on the XML standard, are aimed at enterprise-wide Web applications and data integration (19).
In many ways, the WWW is not similar to a database. For example, there is no uniform structure, no integrity constraints, no transactions, and no standard query language or data model. And yet, the powerful abstractions developed in the database community may prove to be key in taming the web's complexity and providing valuable services. Of particular importance is the view of a large web site as being not just a database, but an information system built around one or more databases with an accompanying complex navigation structure. In that view, a web site has many similarities to non-web information systems.
Several trends will have significant impact on the use of database technology for web applications. The first is XML. The considerable momentum behind XML and related metadata initiatives can only help the applicability of database concepts to the web by providing the much-needed structure in a widely accepted format. While the availability of data in XML format will reduce the need to focus on wrappers converting human-readable data to machine-readable data, the challenges of semantic integration of data from Web sources still remain (2, 21).
Building on the experience in developing methods for manipulating semi-structured data, our community is in a unique position to develop tools for manipulating data in XML format. In fact, some of the concepts developed are already being adapted to the XML context. Other projects under way in the database community in the area of metadata architecture and languages are likely to take advantage of and merge with the XML framework.
FIGURE 3
The second trend that will affect the applicability of database techniques for querying the web is the growth of the so-called hidden web. The hidden web refers to the web pages that are generated by programs given user inputs, and are therefore not accessible to web crawlers for indexing. Some researchers claimed that close to 80% of the web is already in the hidden web. If our tools are to be able to benefit from data in the hidden web, we must develop techniques for identifying sites that generate web pages, classify them and automatically create query interfaces to them (17).
There is no shortage of possible directions for future research in this area. In the past, the bulk of the work has focused on the logical level, developing appropriate data models, query languages and methods for describing different aspects of Web resources. In contrast, problems of query optimization and query execution have received relatively little attention in the database community, and pose some of the more important challenges for future work. Some of the important directions in which to enrich our data models and query languages include the incorporation of various forms of metadata about sources (e.g. probabilistic information) and the principled combination of querying structured and unstructured data sources on the WWW.
Designing a web site as an information system built around one or more databases requires extending information systems design methodologies. Using these principles to build web sites will also impact the way we query the web and the way we integrate data from multiple web sources.
SECURITY ISSUES FOR WEB-BASED DATA WAREHOUSE
The Internet operates much like a postal service, only at lightning speed, sorting and routing mail through "post offices" in seconds. As good as the Web is at message handling, it is inherently terrible at security because of the way it is designed and implemented. The architecture of the Internet is basically an open system. There is no built-in security on the Web to keep other people from reading, copying and even altering the mail as information is transmitted.
Major threats facing companies adopting Web-based distributed data warehousing today can be divided roughly into three categories (17):
1. Introduction of malicious programs into the system, such as Trojan horses, logic bombs, worms and viruses.
2. Snooping, which involves the theft or compromise of data while in transit between endpoints of a network.
3. Physical theft of data, such as client lists, corporate plans and product designs.
A multitude of security tools is available for use in an Internet/intranet environment, namely firewalls, encryption algorithms, public key infrastructure, Internet Protocol security, and virtual private networks. Communications can be encrypted, authenticated, and validated. Client computers and browsers provide a variety of security services. Host machines can be logged, validated and stripped down to bare bones. All these technological services come with a wide variety of choices and customization options. However, the best technological tools in the world are useless if the organization has no security policy (4).
Technological tools alone cannot provide adequate security for a distributed data warehouse. Before implementing any tools, the following fundamental issues of network security must be addressed:
1. The type of data housed at each location must be identified and assigned a security level.
2. The possible threats to the network should be identified and categorized as to severity.
3. A security policy must be drawn up and agreed upon by all concerned.
4. Everyone, from the CEO to the lowest-ranking user, must be made aware of the company's security policies.
5. Someone must be in charge of security, and the position of head of security should be treated seriously.
Most medium-size and large corporations already have a well-established security department, but these employees alone are not enough to handle security concerns for a distributed data warehouse. Their efforts will need to be augmented with support from at least two other areas:
1. The telecommunications department, especially those concerned with the configuration of the distributed network, must play a key role in securing the data warehouse.
2. The warehouse management team should also contribute to the overall security effort. In fact, this management team should take the lead in formulating and implementing all aspects of warehouse security.
Overall, the security policy of the organization must include contingency plans to deal with both successful and unsuccessful attacks on their Web servers. Administrators should regularly monitor system logs, looking for traces of unusual activity and investigate the cause. Nevertheless, on the World Wide Web, it is not a question of whether a site will be attacked, but rather a question of when it will happen. These emergency plans should include fallback procedures to restore the environment to its "prehacked" condition and a blueprint to systematically investigate the circumstances leading up to the attack (4, 17).
It should be emphasized that the security of a data warehouse is dependent primarily on the quality of the company's security policy and the way in which it imposes that policy on itself. If the security policy is inadequate, not uniformly enforced, or has weak points in it, even the best technology cannot cure the problem. It is only with a well-thought-out policy and strict enforcement that any organization can hope to protect itself from damaging security breaches.
OTHER IMPLEMENTATION ISSUES
In today's business environment, it is imperative that a company's data warehouse store huge amounts of both internal and external data. Although Web technology can help eliminate the barriers created by differences in hardware, operating systems and application development tools, a few special considerations should be taken into account when a Web-based data warehousing infrastructure is to be implemented.
1. As an immense amount of historical data can be accumulated, it is necessary to clearly identify the sources of those data and the individuals responsible for collecting them. Periodic reviews of the data to be maintained online must be scheduled to ensure that the efficiency of users' data mining activities is preserved.
2. Data collected for a Web-based warehouse may involve different languages. It may be necessary to create an internal mapping mechanism to ensure that data are translated correctly from one language to another.
3. Web-based data warehousing may facilitate the integration of data collected from various locations. However, the success of a Web-based data warehouse project requires the cooperation of all users at all locations. The responsibilities of each site's users thus need to be clearly defined and observed in order to maintain the quality of the information.
4. Consistent user-system interfaces must be developed. These interfaces would link a user-accessible automated directory to the information stored in the warehouse to improve users' awareness of the data, communication and analysis requirements of the data warehouse.
5. The effects of the local political environment on ways of storing data into and retrieving data from the data warehouse must be explored. Some countries impose strict Internet surveillance systems, which may limit the types of data that users can access with their accustomed browser software. Appropriate procedures need to be developed to allow users to view needed information on the Web without offending local political regulations and culture.
6. Web-based data warehousing provides the mechanism to offer business intelligence (BI) and knowledge management (KM) capabilities to users. This can significantly improve the efficiency of accessing decision support information using conventional browsers. The architecture of a data warehouse thus may require the selection of appropriate BI and KM tools.
FUTURE DEVELOPMENT AND OUTLOOK
Historically, data warehouses have been a technical initiative designed to help control corporate costs and optimize operations. Almost all data warehouse initiatives are centrally focused and firms have benefited greatly from their use. Yet experience and statistics have shown that centralized data warehouses are very difficult to build - nearly 80% of these projects either fail outright or are redirected short of completion (12, 18, 24).
The convergence in the late 1990s of numerous technological developments (the Internet/intranet, the browser, and high-performance, high-speed servers) makes possible a more cost-effective, easier approach to data warehousing. The practice of building a series of independent data marts and networking them into a distributed data warehouse is rapidly gaining acceptance and promises to quickly become the paradigm for data warehouse construction.
The distributed data warehouse can be started small (a single data mart), can be constructed quickly (three months per data mart), is easy to manage (short-term, achievable goals), provides quick return on investment (three months from initial investment to finished mart), is almost infinitely scalable (just add another server), and is usually less than one-third the cost of a centralized warehouse.
The use of a web browser, coupled with web-enabled decision-support and analysis software, provides easy access to server-based distributed warehouses. This combination of hardware and software offers many advantages over traditional warehouse interface tools (7, 11, 21):
1. It eliminates the need for many dispersed applications on user desktops that are nearly impossible to maintain and support.
2. Unlike many custom-built interfaces, Web browsers are cheap and easy to install.
3. Web browsers are easy to use, and the training time is significantly reduced.
4. The cost associated with expanding the corporate network to include all warehouse users is greatly reduced when using the Internet as the vehicle for access.
5. The problems posed by multiple operating systems (or different versions of the same system) on the desktop are eliminated.
Nevertheless, despite the above advantages, attention should be paid to avoid the following pitfalls to ensure the continuing success of the data warehouse:
1. Data warehouse driven by technology instead of needs.
Successful implementation of a data warehouse requires, among other things, a significant investment of time and energy on the part of many of those who will be its users to ensure the end results meet their needs. Thus, integration with the business strategy should be emphasized more than the technological components.
2. Treating the data warehouse as a destination instead of a journey.
It is clear that one constant in today's business environment is change. Successful businesses adapt to change in order to remain competitive, and as they change, the systems that record business activities change to capture the most relevant data. The data itself changes due to new regulatory requirements, evolving market conditions and many other factors. User needs change as they gain knowledge about their business and as they think of more creative ways to use the data warehouse. As a result, requirements for the data warehouse evolve along with the business, and new subject areas, capabilities and dimensions should be added incrementally. A data warehouse must be dynamic, flexible and extensible in order to meet the changing requirements of users and the business as a whole (1, 13).
CONCLUDING REMARKS
A few promising directions in the development of data warehouses are outlined below to conclude this paper.
1. Mission-Critical Data Warehouses
Data warehouses may have been initially justified as decision support systems, but the highest payoff comes when they are integrated with operational systems to improve the performance of staff, systems and whole enterprises. The following three technologies will be important in achieving such results:
* Replication, especially heterogeneous replication, to ensure problem-free transfer of data and confidence that everyone is looking at the same, up-to-date information.
* System management, with special emphasis on security, to ensure that data warehouse systems can be trusted.
* Intelligent file transfer to radically increase the speed of the data warehouse and to preformat the data to fit the target environment.
2. Proactive Data Warehouses
Most data warehouses sit around waiting for analysts and other users to ask questions. Next-generation systems will not wait. They will be proactive, asking questions themselves all the time, and when they find an answer that needs action, they will tell the appropriate people. Such "detect and alert" systems can transform data warehouses from useful to critical and may have the following capabilities:
* Database event alerters to monitor databases automatically looking for changes that exceed critical thresholds.
* Integration of data warehouses with groupware and email technologies for intelligent distribution of alerts.
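The "detect and alert" idea above can be illustrated with a minimal sketch. It assumes that a periodic job has already summarized the warehouse into metric snapshots; the `ThresholdRule` class, the `detect_alerts` function, the metric names and the e-mail addresses are all hypothetical and are not taken from any particular product. Actual delivery through groupware or e-mail is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ThresholdRule:
    """Hypothetical alert rule: fire when a metric's relative change exceeds a limit."""
    metric: str            # name of the monitored warehouse measure (assumed)
    max_change: float      # critical threshold as a fraction, e.g. 0.20 means 20%
    recipients: List[str]  # people who should be told when the rule fires


def detect_alerts(
    previous: Dict[str, float],
    current: Dict[str, float],
    rules: List[ThresholdRule],
) -> List[Tuple[str, str]]:
    """Compare two metric snapshots and return (recipient, message) pairs."""
    alerts: List[Tuple[str, str]] = []
    for rule in rules:
        old = previous.get(rule.metric)
        new = current.get(rule.metric)
        # Skip metrics that are missing or that have no usable baseline.
        if old is None or new is None or old == 0:
            continue
        change = (new - old) / abs(old)
        if abs(change) > rule.max_change:
            message = (
                f"{rule.metric} changed by {change:+.1%} "
                f"(limit {rule.max_change:.0%})"
            )
            alerts.extend((recipient, message) for recipient in rule.recipients)
    return alerts


if __name__ == "__main__":
    rules = [
        ThresholdRule("weekly_sales", 0.20, ["sales-manager@example.com"]),
        ThresholdRule("inventory_level", 0.50, ["logistics@example.com"]),
    ]
    previous = {"weekly_sales": 1000.0, "inventory_level": 400.0}
    current = {"weekly_sales": 700.0, "inventory_level": 420.0}
    for recipient, message in detect_alerts(previous, current, rules):
        print(f"To {recipient}: {message}")
```

In this sketch only the sales metric exceeds its threshold, so only the sales recipient is notified. A production alerter would more likely be driven by database triggers or a scheduler than by in-memory dictionaries.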
3. Object-Oriented Data Warehouses
One of the clearest cases for object-oriented technologies is in the exploitation of data warehouses. Hundreds of millions of dollars are being invested in creating data warehouse infrastructures. Return on that investment will come only from viable business-critical applications, and those applications will come only from a cascading process of application creation, enhancement and recreation. Data warehouses are also being enhanced to support complex data such as video, audio, images, diagrams and text. Support of these non-traditional data types in the same warehouse as traditional data types, along with the ability to quickly develop and enhance the infrastructure, is a perfect application of object technology.
4. Dynamic Query Optimization
A significant downside of data warehouses is queries that either never return or are cancelled because they consume too many resources. Dynamic query optimization relieves much of that burden. Using statistics on the distribution of data values in the database, the intelligent query optimizer can choose the most resolving and narrowing searches for higher reliability and faster access. Server-defined resource control capabilities can also prevent runaway queries before they consume valuable CPU or disk resources.
Information technology will continue its astonishing pace of advancement, which will help bring Web-based data warehousing to decision-makers of all levels in virtually all organizations. However, a data warehouse is a very complex system. It has to be carefully designed and managed to make it work, and user training is a must before meaningful information can be retrieved from this modern information infrastructure.
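As a rough illustration of the dynamic query optimization direction described above, the following sketch uses value-frequency statistics to apply the most narrowing equality condition first, and enforces a simple resource limit to stop a runaway query. It is a toy model over in-memory rows, not the algorithm of any actual database engine; all function names, the `RunawayQueryError` class and the `max_rows_examined` parameter are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Predicate = Tuple[str, Any]  # (column, required value); equality conditions only


class RunawayQueryError(RuntimeError):
    """Raised when a query exceeds its server-defined resource budget."""


def build_statistics(rows: List[Row], columns: List[str]) -> Dict[str, Counter]:
    """Collect per-column value frequencies, a stand-in for optimizer statistics."""
    return {col: Counter(row[col] for row in rows) for col in columns}


def estimate_selectivity(
    stats: Dict[str, Counter], predicate: Predicate, total_rows: int
) -> float:
    """Estimated fraction of rows that satisfy column == value."""
    column, value = predicate
    return stats[column][value] / total_rows if total_rows else 0.0


def order_predicates(
    stats: Dict[str, Counter], predicates: List[Predicate], total_rows: int
) -> List[Predicate]:
    """Put the most narrowing predicate (lowest selectivity) first."""
    return sorted(
        predicates, key=lambda p: estimate_selectivity(stats, p, total_rows)
    )


def run_query(
    rows: List[Row],
    predicates: List[Predicate],
    stats: Dict[str, Counter],
    max_rows_examined: int,
) -> List[Row]:
    """Scan rows with reordered predicates under a row-examination budget."""
    ordered = order_predicates(stats, predicates, len(rows))
    result: List[Row] = []
    for examined, row in enumerate(rows, start=1):
        if examined > max_rows_examined:
            raise RunawayQueryError(
                f"query stopped after examining {max_rows_examined} rows"
            )
        # all() short-circuits, so the narrowest predicate rejects most rows early.
        if all(row[col] == value for col, value in ordered):
            result.append(row)
    return result


if __name__ == "__main__":
    rows = [{"region": "East", "product": "widget"} for _ in range(8)]
    rows += [
        {"region": "East", "product": "gadget"},
        {"region": "West", "product": "gadget"},
    ]
    stats = build_statistics(rows, ["region", "product"])
    query = [("region", "East"), ("product", "gadget")]
    print(order_predicates(stats, query, len(rows)))
    print(run_query(rows, query, stats, max_rows_examined=100))
```

Here the condition on `product` matches far fewer rows than the condition on `region`, so it is evaluated first. Real optimizers work with histograms, indexes and cost models rather than a full scan, but the principle of ranking searches by estimated selectivity is the same.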
REFERENCES
1. America's Network. "The Transforming Power of Data Warehousing," March 15, 1997.
2. America's Network. "Trends and Development in Data Warehousing," May 15, 1999.
3. Appleton, E. "The Right Server for Your Data Warehouse," Datamation, 41:5, March 15, 1995, pp. 56-58.
4. Atkins, M. "Eliminating the Risks of Exposing Unreliable Data in Your Data Warehouse to Customers, Suppliers, and Partners," The Data Warehouse Institute, 11, 2000.
5. Berson, A. and S. Smith. Data Warehousing, Data Mining and OLAP. New York: McGraw-Hill, 1997.
6. Bustamente, G. and K. Sorenson. "Decision Support at Lands' End - An Evolution," IBM Systems Journal, 33:2, 1994, pp. 228-238.
7. Gannon, T. and D. Bragger. "Data Warehousing with Intelligent Agents," Intelligent Enterprise Magazine, 1:1, October 1998.
8. Haas, L., R. Miller, B. Niswonger, M. Roth, P. Schwarz, and E. Wimmers. "Transforming Heterogeneous Data with Database Middleware: Beyond Integration," IEEE Data Engineering Bulletin, 22(1):31-36, 1999.
9. Hackney, D. Understanding and Implementing Successful Data Marts. Reading, MA: Addison-Wesley, 1997.
10. Haley, B., H. Watson, and D. Goodhue. "The Benefits of Data Warehousing at Whirlpool," University of Virginia, USA, 1999.
11. Hall, C. "Data Warehousing for Business Intelligence," Cutter Consortium, 2000.
12. Hernandez, M., R. Miller, L. Haas, L. Yan, C. Ho, and X.C. Tian. "A Semi-Automatic Tool for Schema Mapping, System Demonstration," SIGMOD, May 2001.
13. Inmon, W., J. Welch, and K. Glassey. Managing the Data Warehouse. New York: John Wiley & Sons, 1997.
14. Koslow, P. and W. Inmon. "Commandeering Mainframe Database for Data Warehouse Use," Application Development Trends, 1:8, August 1994, pp. 57-61, 64.
15. Lee, S.J. and K. Siau. "A Review of Data Mining Techniques," Industrial Management and Data System, 101:1, 2001, pp. 41-46.
16. Longman, C. "The Enterprise Data Warehouse: To Centralize or De-centralize, That is the Question - Or Is It?" Data Warehouse Institute, 11, 2000.
17. Martin, J. "Cybercorp: Trends in Distributed Data Warehousing," DM Review, January 1999.
18. Miller, R., M. Haas, and M. Hernandez. "Schema Mapping as Query Discovery," Proceedings of the Twenty-sixth International Conference on Very Large Data Bases (VLDB), Cairo, Egypt, September 2000.
19. Miller, R., M. Hernandez, L. Haas, L. Yan, C. Ho, R. Fagin, and L. Popa. "The Clio Project: Managing Heterogeneity," SIGMOD Record, 30:1, March 2001, Postscript.
20. Orr, K. "Data Warehousing Technology," The Ken Orr Institute, 2000.
21. Radding, A. "Support Decision Makers with a Data Warehouse," Datamation, 41:5, March 15, 1995, pp. 53-56.
22. Van Ossenbruggen, J., J. Geurts, F. Cornelissen, L. Hardman, and L. Rutledge. "Towards Second and Third Generation Web-based Multimedia," Proceedings of the 10th World Wide Web Conference, May 1-5, 2001, Hong Kong, pp. 479-488.
23. White, C. "Managing Distributed Data Warehouse Meta Data," DM Review, February 1999.
24. Yan, L., R. Miller, L. Haas, and R. Fagin. "Data-Driven Understanding and Refinement of Schema Mappings," SIGMOD, May 2001.
CHANG-TSEH HSIEH
University of Southern Mississippi
Hattiesburg, Mississippi 39406
BINSHAN LIN
Louisiana State University in Shreveport
Shreveport, Louisiana 71115