Liknande böcker
Effective and Efficient Summarization of Two-Dimensional Point Data : Approaches for Resource Description and Selection in Spatial Application Scenarios
Bok av Stefan Kufer
Space is everywhere, and so is spatial data. The digital revolution has led to an enormous pool of available spatial data, and the amount of newly generated spatial data is increasing day by day. Often, this spatial data is two-dimensional point data that in addition is associated with data objects such as media objects (e.g. pictures or texts). As a consequence of the huge amount of data objects which is generated and then has to be maintained, there is a need for effective and efficient search systems which are capable of addressing the spatial properties of the data objects. In this context, different search scenarios exist: The data objects might be maintained in a distributed system (i.e. on a set of different machines) or in a centralized system (i.e. on a single machine) which, in each case, leads to varying requirements for appropriate search systems. However, in both scenarios, the concept of resource description and selection is an applicable paradigm for conducting similarity searches with regard to the spatial properties of the data objects. Hereby, when focusing on the spatial aspects, a resource is an abstract entity that administers spatial point data. For describing the spatial footprint of a resource, its spatial data point set can be 'summarized' by means of geometrically delineated areas which cover this data point set. These resource descriptions are then usable for a targeted selection of the resources which administer the relevant data objects based on spatial properties.This thesis is concerned with the effective and efficient summarization of two-dimensional spatial point data as a means for resource description and selection in spatial application scenarios. The term 'effective' refers to a spatially very accurate geometric delineation of the data point set to describe whereas the term 'efficient' relates to its storage-space-efficient representation. Two search-based spatial application scenarios are assessed in this thesis, and in both, the search task can be modelled by adhering to the concept of resource description and selection.The first scenario is a distributed application scenario in which spatial point data is maintained by a set of independent resources such as peers in a peer-to-peer network. Given a concrete query, the resource descriptions shall enable the targeted selection of the peers administering the relevant data points while ignoring 'irrelevant' resources. The distributed application scenario is the main part of this thesis, and our summarization approaches are specifically developed for this purpose. Hereby, a variety of concepts is considered which include the compression of the data points, the use of arbitrarily complex bounding volumes to delineate them, the representation of complex objects (such as complex bounding volumes) with a set of simple objects, and a division of the data point set to describe into groups and the subsequent concise delineation of each group. Overall, 14 summarization approaches are presented in this thesis which can be categorized into data partitioning approaches, space partitioning approaches, and hybrid approaches. Following the specification of suitable resource selection schemes which are based on the various summaries, an extensive evaluation is conducted by means of assessing the approaches' performances for k nearest neighbor (kNN) queries. The evaluation considers different data collections and varying environmental conditions in order to assess the robustness of the approaches.In the second application scenario, a utilization of our summarization approaches in a multidimensional data structure is assessed---which constitutes a centralized application scenario. More specifically, the two summarization approaches of the distributed application scenario which are most suitable for this purpose are integrated into an R-tree which administers sets of two-dimensional point data. The R-tree is a tree-based, centralized multidimensional data