Global aggregate view selection in a peer warehousing system

Publisher

University of Ottawa (Canada)

Abstract

Selecting views to materialize is one of the most important decisions in designing a data warehouse. In a peer data warehouse design, this problem is harder than in a centralized design, because the costs of global communication, data transfer, and data transformation between peers must also be considered to obtain the final integrated answers to global queries. The objective of our work is to select optimal sets of materialized aggregate views on different peers in a peer data warehousing system, such that the total cost of answering the global queries posed at a given peer and of maintaining the materialized views is minimized. In this thesis, we develop a theoretical framework for analyzing and solving the P2P global materialized view (GMV) selection problem. We extend the concepts of Expression AND-DAG, query aggregation lattice, and cost model, as defined for centralized data warehouses, to peer data warehousing semantics. In our problem scope, the P2P Expression AND-DAG and the P2P query aggregation lattice are constructed dynamically. Our cost model takes into account the data transfer costs between peers and the maintenance costs of global materialized views. We then extend an existing greedy algorithm for centralized view selection to solve our P2P view selection problem. We assume that peer data warehouse dimensions are consistent throughout the whole system, and that only one instance of the P2P GMV selection algorithm executes in the P2P system at a time. Finally, we simulate the P2P view selection algorithm. The simulation results show that the query frequencies and update frequencies of the materialized views, as well as the data transfer rate, have a critical impact on the final view selection. Moreover, the longest path in the P2P network, the number of granularity levels in the global dimension lattice, and the number of data warehouse dimensions affect both the processing time of the P2P view selection algorithm and the minimum combined cost of answering global queries and maintaining the materialized views.
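
To make the idea of cost-driven greedy selection concrete, the following is a minimal illustrative sketch and not the algorithm developed in the thesis. It assumes a linear-scan query cost, a per-row charge for shipping results from a remote peer, and a maintenance cost proportional to view size. All names (`View`, `answer_cost`, `greedy_select`), the peers `P0` and `P1`, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    dims: frozenset  # group-by dimensions of this aggregate view
    rows: int        # assumed size of the view in rows
    peer: str        # peer that would host the view


def answer_cost(query, view, query_peer, transfer_per_row):
    """Cost of answering `query` from `view`, or None if `view` cannot answer it."""
    if not query.dims <= view.dims:
        return None
    cost = view.rows  # assumption: answering means scanning the whole view
    if view.peer != query_peer:
        cost += transfer_per_row * query.rows  # assumption: ship the result rows
    return cost


def greedy_select(views, base, freqs, update_freq, query_peer,
                  transfer_per_row=1.0, maint_per_row=0.1, k=2):
    """Pick up to k views, each time taking the one with the largest net gain."""
    selected = [base]
    current = {q: answer_cost(q, base, query_peer, transfer_per_row) for q in views}
    for _ in range(k):
        best, best_gain = None, 0.0
        for v in views:
            if v in selected:
                continue
            saving = 0.0
            for q in views:
                c = answer_cost(q, v, query_peer, transfer_per_row)
                if c is not None and c < current[q]:
                    saving += freqs.get(q.name, 0) * (current[q] - c)
            # Net gain: weighted query savings minus assumed maintenance cost.
            gain = saving - update_freq * maint_per_row * v.rows
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no remaining view pays for its own maintenance
            break
        selected.append(best)
        for q in views:
            c = answer_cost(q, best, query_peer, transfer_per_row)
            if c is not None and c < current[q]:
                current[q] = c
    return selected, current


# Toy lattice over three dimensions; the base view lives on a remote peer.
base = View("PST", frozenset({"product", "store", "time"}), 1000, "P1")
ps = View("PS", frozenset({"product", "store"}), 400, "P0")
pt = View("PT", frozenset({"product", "time"}), 600, "P1")
p = View("P", frozenset({"product"}), 50, "P0")

selected, current = greedy_select(
    [base, ps, pt, p], base,
    freqs={"PST": 1, "PS": 5, "PT": 2, "P": 10},
    update_freq=1, query_peer="P0",
)
print([v.name for v in selected])
```

In this toy setting, raising `transfer_per_row` or `update_freq` shifts which views are worth materializing, which mirrors the kind of sensitivity to query frequency, update frequency, and transfer rate that the abstract reports.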

Citation

Source: Masters Abstracts International, Volume: 47-06, page: 3651.
