# Matching and Retrieval of 3D Deformable Models (三维立体可变形模型的比对与搜寻)

CITY UNIVERSITY OF HONG KONG 香港城市大學

Matching and Retrieval of 3D Deformable Models (三維立體可變形模型的比對與搜尋)

Submitted to Department of Computer Science 電腦科學系 in Partial Fulfillment of the Requirements for the Degree of Master of Philosophy 哲學碩士學位

by

Tam Kwok Leung 譚國亮

November 2004 二零零四年十一月

i

Abstract

Due to the popularity of computer graphics, 3D games and CG movies, a large number of 3D models have been created. Many of them are freely available on the Internet and can be found in many personal homepages, repositories and even libraries. Though the number of 3D models keeps on increasing, 3D models are time and effort consuming to build. The group of deformable models (or so called animating models or articulated models) is especially a good example. These models are usually high quality meshes that require weeks of work to construct. As such, it would be advantageous to build models by reusing or adapting from existing ones. To facilitate reuse and sharing, research on 3D model matching and retrieval becomes essential. Existing content-based 3D model matching methods can be categorized into four major approaches: geometry-based, transform-based, image-based and

topology-based. Among these four approaches, only topology-based matching methods can handle highly deformable models, i.e., models representing the same object but in different postures. In this research project, we target ourselves at the topology-based retrieval techniques. We believe that a successful topological retrieval system will facilitate sharing and reuse of these models, and it will greatly benefit the game and movie production industry in particular by reducing the overall production cost. Topology-based methods analyze 3D models based on their skeletal information. Since skeletal features are invariant to models’ postures, topology-based methods can thus handle deformable models and the matching results agree with human intuition.

ii

However, there are only several topology-based retrieval methods, and most of them require high computational cost. Researching for a fast and accurate method to retrieve deformable models becomes our ultimate goal. In this thesis, we explore these methods and also take a novel approach to analyze the same problem. Instead of using explicit skeleton or multi-resolution reeb graph (MRG), we propose the use of topological characteristics to represent a model. Since our method makes use of the flow and transportation algorithm (Earth Mover Distance) for model matching, it is shown to be fast and efficient. As a comparison, our proposed method is 4.5 and 15 times faster than the method MRG in feature extraction and matching, respectively. Though our method does not compare models by skeletons, the additional high-dimension geometric features that we introduce are capable of discriminating different models of similar and dissimilar skeletons. From our experiments, our method also outperforms MRG in accuracy and can even achieve a precision of 0.77 at a recall rate of 1.0.

iii

iv

Acknowledgements

I would like to extend my appreciation and gratitude to the following people for their contributions to this research study. With my deepest gratitude and respect, I am especially indebted to both of my supervisors Dr. Rynson W. H. Lau and Dr. Chong-Wah Ngo. Their insightful remarks and guidance have expanded my horizons in my research study. This thesis would not be completed without their invaluable advice, patience and endless support. I would also like to thank the two thesis examiners, Prof Horace H. S. Ip and Dr. George Baciu, for their insightful and constructive comments on this thesis. Special thanks go to Mr. Tat-Wan Leung, who has made this whole study process much more enjoyable. His companion, inspiration and collaboration have encouraged me throughout the study. Many thanks also go to Mr. Augus Ming-Kin Siu, Miss. Ada Sau-Kuen Wan, Mr. Wilson Chi-Ho Tsang, Miss. Miranda Kwok-Yee Lee, Miss. Karen Pui-Kwan Lam and Mr. Alan Yat-Kin Chan for their support and great friendship. Most of all, my wholehearted thanks go to my mother and my sister for their unconditional love and support. And finally, my deepest thanks must go to my dear Yan for her encouragement, concern and understanding.

v

Table of Content

Abstract .......................................................................................................................... i Acknowledgements ...................................................................................................... iv 1. Introduction ...............................................................................................................1 2. Literature Review ......................................................................................................4 2.1. Introduction................................................................................................... 4 2.2. Four Major Approaches ................................................................................ 4 2.2.1 Geometry-based .................................................................................. 4 2.2.2 Transform-based.................................................................................. 8 2.2.3 Image-based ........................................................................................ 9 2.2.4 Topology-based ................................................................................. 10 2.2.5 Summary ........................................................................................... 12 2.3. Summary of Other Relevant Techniques .................................................... 13 2.3.1 Rotation-Invariance............................................................................... 13 2.3.2 Multi-resolution analysis....................................................................... 15 3. Problems, Challenges and Solution Outline ..........................................................18 3.1. Problems...................................................................................................... 18 3.2. Challenges ................................................................................................... 19 3.3. Solution Outline .......................................................................................... 20 4. Point Based Topological Features and Bipartite Matching ..................................21 4.1. Introduction................................................................................................. 21 4.2. Feature Extraction ....................................................................................... 22 4.2.1 Topological Point Extraction............................................................. 22 4.2.2 Geometric Information Extraction .................................................... 27 4.3. Feature Matching ........................................................................................ 30 4.3.1 Similarity Measure for Topological points........................................ 30 4.3.2 Similarity Measure for Models ......................................................... 30 4.4. Experimental Results .................................................................................. 32 4.4.1 Performance Evaluation.................................................................... 33 4.4.2 Matching Accuracy ........................................................................... 34 4.4.3 LSD and Our Feature Extraction Algorithm ..................................... 36

vi

4.4.4 Complexity........................................................................................ 38 4.5. Conclusion .................................................................................................. 40 5. Topological Points, Rings Features and Earth Mover Distance ...........................42 5.1. Introduction................................................................................................. 42 5.2. Bi-Directional LSD Analysis ...................................................................... 44 5.2.1 Modified LSD ................................................................................... 46 5.2.2 Bi-Directional Analysis..................................................................... 47 5.2.2.1 Clustering of Local Maximum Vertices ............................... 48 5.2.2.2 Protrusion Region (PR) Extraction ..................................... 49 5.2.2.3 Segment Region (SR) Extraction ......................................... 51 5.3. Feature Extraction .................................................................................... 54 5.3.1 Topological Information................................................................. 54 5.3.2 Geometric Information................................................................... 54 5.3.2.1 Effective Area - Weights of Importance ............................... 54 5.3.2.2 Normalized Geodesic Sum - Spatial Information ............... 55 5.3.2.2 Surface Distribution – Geometric Surface data .................. 57 5.4. Feature Matching ..................................................................................... 59 5.4.1 Distance Measure of Two Topological Characteristics.................. 59 5.4.2 Distance Measure of Two Models.................................................. 59 5.5. Experimental Results ............................................................................... 60 5.6. Invariance Properties................................................................................ 63 6. Evaluation of Bi-Directional LSD Analysis ...........................................................66 6.1. Introduction................................................................................................. 66 6.2. Comparison between Bi-Directional LSD Analysis and MRG .................. 66 6.2.1 Performance Evaluation.................................................................... 67 I. Precision and Recall Graph .................................................................... 67 II. Similarity Matrix ................................................................................... 71 6.2.2 Complexity & Time Analysis............................................................ 73 6.3. Comparison between “Topological Points Based Bipartite Matching” (MLSD) and Bi-Directional LSD Analysis (BDLA) ......................................... 78 6.4. Discussion ................................................................................................... 79 7. Conclusion and Future Work .................................................................................82 7.1. Conclusion .................................................................................................. 82 7.2. Future Work................................................................................................. 84 References....................................................................................................................87

vii

Vitae .............................................................................................................................93

List of Figures

Figure 4.1 Figure 4.2 Figure 4.3 Figure 4.4 Figure 4.5 Figure 4.6 Figure 4.7 Figure 5.1 Figure 5.2 Figure 5.3 Figure 5.4 Figure 5.5 Figure 5.6 Figure 5.7 Figure 5.8 Figure 5.9 Figure 6.1 Figure 6.2 Figure 6.3 Figure 6.4 Critical points produced by LSD. ......................................................... 23 Selection of saddles in our method. ...................................................... 25 Comparison of LSD and our method.................................................... 26 Surface Mesh are partitioned into black and white bands. ................... 29 Bipartite graph matching. ..................................................................... 32 Plot of Precision and Recall.................................................................. 34 Some example model groups and similarity values.............................. 41 Instability of saddle point identification. .............................................. 45 Topological Points (Dots) and Topological Rings (Dashed Curves). ... 46 Notations used in Bi-Directional Analysis............................................ 47 Graphical representation for the extraction of PR. ............................... 50 Protrusion and Segment Regions.......................................................... 53 Surface Mesh are partitioned into color bands. .................................... 57 13 model groups in our model database. .............................................. 61 Precision and recall graph..................................................................... 62 Similarity values of query examples..................................................... 65 Performance comparison between our methods and other methods..... 68 Precision and recall graphs for some individual model groups. ........... 70 The neck of the horse suffers from slicing direction problem. ............. 71 Mean Similarity Matrix of BDLA and MRG. ....................................... 71

List of Tables

Table 4.1 Table 4.2 Table 4.3 Table 5.1 Table 5.2 Table 6.1 Table 6.2 Table 6.3 Table 6.4 Table 6.5 Table 6.6 Similarity of models with similar skeletons. .......................................... 35 Similarity of models with dissimilar skeletons....................................... 36 Comparison between LSD and our feature point extraction algorithm.. 37 Mean Similarity of non-similar skeleton models.................................... 62 Mean Similarity of similar skeleton models........................................... 62 Mean Similarity of Model Groups (BDLA). .......................................... 69 Complexity Analysis of BDLA. ............................................................. 74 Time analysis of BDLA and MRG for feature extraction. .................... 77 Time analysis of BDLA and MRG for feature matching. ..................... 77 Time analysis of BDLA and MLSD. ...................................................... 79 Comparison between MLSD and BDLA................................................ 79

1

Chapter 1 Introduction

Recent advances in computer graphics and modeling tools have led to the increase of 3D models that are publicly available on the internet. This huge number of manually annotated but unclassified geometry models urges the research of 3D search engine [Funkhouser03]. Apart from this reason, there are also several applications that support the development of 3D retrieval system. First, in medical and chemical fields, data like human organs [Keim99] and protein structures are stored in 3D data format. Developing an efficient matching algorithm, which can allow automatic comparison and retrieval of such data, is a challenging but beneficial task. Second, high quality meshes that are usually found in animation and CG movies are difficult to build. To reduce the overall production cost, reusing or adapting from existing models is one of the good strategies. A retrieval system that can support searching and retrieval of models definitely helps achieve the goal. As a 3D search engine becomes essential to many applications, it urges more research and development toward this direction. Traditional retrieval methods are based on annotations. Examples include audio, images and text retrieval system. However, human annotation depends on many factors, like languages, cultures, personal experience and even the ways the materials are recognized or used. There is no definite answer for manual annotation. Being one kind of these multimedia data, 3D models suffers from the same problem. Recently, researches focus on content-based retrieval techniques. Content-based methods analyze the multimedia data from their actual contents. Features are then extracted

2

and compared automatically without manual interference. Content-based methods are thus believed to be more robust and accurate. Content-based techniques, in general, comprise two main parts: feature extraction and feature matching. Feature extraction concerns the extraction of features which can be used to represent the multimedia data. Feature matching, on the other hand, concerns the comparison of features and, is usually accompanied by a distance function for measuring the similarity between two features. In our survey work, existing 3D retrieval methods can be roughly classified into four major approaches according to the features used. These four approaches are namely geometry-based, transform-based, image-based and topology-based. We will give examples and detail the differences of these methods in chapter 2. From our survey work, a lot of successful 3D retrieval systems have been purposed recently. One of them is the “Princeton Search Engine” [Funkhouser03], which targets at rigid geometry models. In this project, on the contrary, we target ourselves at the group of “3D deformable models” (or the so called animating models or articulated models). These models are usually high quality meshes that require weeks of work to construct. Though these models are time and effort consuming to build, we find that they are used most frequently in computer graphics, game industry, animation and CG movie production. As such, a successful retrieval system that facilitates reuse and sharing of these models would be greatly beneficial to all these industries. In order to handle deformable models, we must analyze the topological properties of these models, and thus, our method is classified as topology-based. While it differs from existing topology-based methods that analyze models based on skeletons or

3

reeb graph, our method uses topological points and rings as features. Furthermore, our method tries to combine additional geometric features for discriminating models of similar and dissimilar skeletons while other topology-based methods focus on discriminating models of dissimilar skeletons only. Our experiments show that the proposed method is fast, accurate, and is not affected by model deformation. It is also invariant to rotation, scaling and translation. The rest of the thesis is organized as follows. Chapter 2 surveys and classifies existing 3D models matching and retrieval techniques. Chapter 3 gives a brief overview of our research goals, challenges and suggests a solution outline. Chapter 4 puts forward a preliminary approach which allows fast feature extraction and deformable model matching. Chapter 5 presents a more advanced algorithm which solves most of the problems identified in our preliminary approach. Chapter 6 gives a detail comparison between our approach and another topology-based method [Hiliga01]. Finally, in Chapter 7, we conclude the research contributions of this work and discuss some possible future directions.

4

Chapter 2 Literature Review

2.1. Introduction

Recently, many content-based retrieval methods have been proposed for matching 3D models. As mentioned in Chapter 1, content-based techniques comprise two main parts, feature extraction and feature matching. Different retrieval methods may have different target models and may use different features and matching algorithms. Here, we categorize these methods into four approaches according to the features that they use.

2.2. Four Major Approaches

In our survey, we categorize existing 3D retrieval methods into four approaches. They are geometry-based, transform-based, image-based and topology-based. In the following sections, we will introduce them briefly and a summary will be given in section 2.2.5.

2.2.1 Geometry-based

The geometry-based approach analyzes models based on their shape and size. These properties include points, lines (rays), line segments, distances, planes, spaces, volumes, angles and even curvature. To give a better understand of the approach, we

5

can identify several properties that they use: volume, curvature, reflective symmetry, anisotropic scale, distribution and energy.

1. Volume

In [Keim99], two non-overlapping volumes (Maximum Included Volume and Minimum Surrounding Volume) are introduced to represent a 3D model hierarchically. The difference of the two hierarchical volumes is then considered as an error measure. This is also the earliest work that support multi-resolution indexing search.

2. Curvature

In [Shum96] curvature captured from an enclosing sphere is suggested as features. Because all models are represented as curvature spheres, rotational matching is then applied for model similarity measure.

3. Reflective Symmetry Descriptor

In [Kazhdan03a], reflective symmetry property is researched. Reflective Symmetry Descriptor (RSD) is a global measure between the model and its symmetric difference. In the project, RSD do not perform as good as other methods in model retrieval. However, it is a good tool for enhancing other shape descriptors. The capability of enhancement comes from the fact that RSD is orthogonal to other shape descriptors. When it is combined with them, a performance boost is resulted.

4. Anisotropic

In [Kazhdan04] the anisotropoic scale is researched. Anisotropic scale, which is similar to RSD, is used to enhance other shape descriptors. The shape matching

6

paradigm is designed to have two parts. The first part is the similarity between two “normalized” isotropic models using general shape descriptors. The second part is the measure between two anisotropic scales of the two models. It is shown in the experimental result that anisotropic scale can help enhancing the performance of other descriptors.

5. Distribution

In this type of method, distributions (or the so called feature vectors or histogram) are the core part of the whole retrieval techniques. [Paguet98] is one of the earliest works in this type of method. It considers distribution of normals and cords as model features. [Vranic00] purposes to shoot rays from the center of mass through the 20 vertices of dodecahedron to the model surface. A 20-dimension feature vector, which captures the distance from model surface to the sphere, is thus formed for matching. [Osada01] purposes several shape descriptors also. Among them, the distribution produced by the distances between two random points (D2 descriptor) performs the best. Since random points are in great number and are sampled all over the model surface, the distribution is intrinsically invariant to rotation. Because D2 is easy to implement, many improvements have been suggested. [Ip02a] seeks the possibility to apply D2 on manufacturing models. In [Vandeborre02], apart from distance distribution, feature vectors of volume and curvature are also used with linear weightings. [Ohbuchi02] considers the use of moment of inertia, average distance to axis and variance distance to axis as the new feature vectors. [Ohbuchi03a] uses a 2D histogram that incorporates both angle and distance information. [Ohbuchi03b] improves over [Ohbuchi03a] by enabling multi-resolution analysis using alpha shape. In all these works, Ohbuchi et al. focuses their work in using elastic matching instead of Euclidean distance and significant improvements are resulted. Apart from these,

7

different variations of distributions are also used in other research methods. For example, [Liu03] extracts directional histogram and [Kortgen03] captures distributions of point distance for each point as feature histograms. In [Ip02b], [Wong03], and [Cheung03], improvements are made over “Extended Gaussian Image” [Horn84] for eigenhead models. While “Extended Gaussian Image” is a distribution of surface normals, their works focus on genetic algorithm and evolutionary computation analysis. [Alarcon02] captures a set of spin images as features. Spin images are a series of distance histograms. Since the spin images are captured for each point and are in great number, Self-Organizing Map (SOM) is purposed for feature reduction.

6. Energy

Unlike the other geometric properties, the energy methods (or the so called morphing methods) capture the work done or energy for models’ morphing. In [Novotni01], an energy type method is suggested. The algorithm calculates the distance field that is required to morph a voxelized model directly to another target model. The work done required for morphing becomes the dissimilarity measure. [Yu03] and [Leifman03] both consider the distances traveled by the rays from the model surface to a sphere’s surface. Such distances (maps) are proportional to the energy required to morph an object into a sphere. While [Yu03] concerns the topology and concavity of a model, [Leifman03] emphases on the use of octree technique for multi-resolution analysis. In [Tangelder03], salient points are extracted from physical properties like curvature, area and middle points with corresponding weights. The proportional transportation distance, which calculates the work done for morphing two distributions, is then used as the similarity measure.

8

2.2.2 Transform-based

Transformation techniques have been used as an analyzing tool in many applications for a long time. Audio, image, video analyses are just some of the examples. In 3D model analysis, many research works also focus on transform-based techniques. The capability of handling feature reduction, rotation-invariant analysis and

multi-resolution search are some of the issues that make this approach method popular. Among all the transform-based techniques for 3D models, several transformation functions can also be identified. They include, but not limit to, Fourier Transform (Spherical Harmonic), Wavelets Transform, and Zernike Transform. Vranic et al. purposes a series of 3D descriptors based on Fourier Transform. [Vranic01a] suggests a continuous principal component analysis (PCA) for pose-normalization. Fourier transform is then applied on the S2 surface (the spherical polar coordinates) and corresponding Fourier coefficients are taken to be the model features. [Vranic01b] gives a detail discussion on applying Fourier descriptor on the voxelized model. [Vranic02] improves over [Vranic01a] by using a more complex spherical function which takes both real and imaginary part into consideration. On the other hand, [Paquet98] and [Lau02] suggest the use of Daubechies and Haar Wavelets Transform, respectively. Wavelets perform better than Fourier in that it captures both spatial and frequency information in their features. In [Lau02], Haar wavelets are shown to perform better than Daubechies wavelets for 3D models. For the case of Fourier and Wavelets Transform, pose-normalization algorithms

9

cannot be avoided. However, such normalization algorithms like PCA may not be stable when the model possesses more than one dominant axis. To avoid the use of PCA, [Funkhouser03] and [Kazhdan03b] purpose to apply Fourier Transform on a sequence of surface partitions of a model. These surface partitions are the intersections between the model surface and a set of concentric spheres (Concentric Spherical Harmonics). Since the Fourier coefficients on the concentric surface partitions are independent of rotation, the method greatly stabilizes the matching performance. In [Novotni03], 3D Zernike Transform is purposed, which improves over the “Concentric Spherical Harmonics”, by capturing additional coherence information in the radial direction.

2.2.3 Image-based

The image-based approach, as its name suggests, makes use of image views as features and employs image matching algorithm for dissimilarity measure. [Min02] purposes a 2D sketch interface for capturing users’ drawing as inputs. The drawing is then matched with the Fourier coefficients obtained from frequency analysis on 2D images (from different viewpoints) of a model. [Chen03] purposes a 4D light-field descriptor, which is represented by a collection of 2D images, to describe a 3D model. For each 2D image, Zernike and Fourier coefficients are extracted for image matching. [Ohbuchi03c], on the other hand, considers the depth images (2.5D) as features. Since depth images can be rendered with hardware acceleration, the feature extraction process can thus be speeded up.

10

2.2.4 Topology-based

The topology-based approach analyzes models based on skeletal or topological information. Since our method (detailed in Chapter 4 and Chapter 5) can be classified into the topology-based approach, we give detail descriptions for all these methods in this section. Among all, [Hiliga01] is the earliest work in this approach. It first partitions a model into different intervals using integral geodesic (or the so called sum of geodesic or average geodesic). Integral geodesic measures how far a point is away from the surface center. Unlike Euclidean distance, geodesic distance [Kimmel98, Novotni02] is measured on the model surface and is not affected by model deformation. Then, by analyzing the neighbors of each component in each interval, a Multi-Resolution Reeb Graph (MRG) is built hierarchically. In each node, area and length are used as geometric features for discriminating different models. To match two MRG tree, heuristic algorithm is applied in a coarse-to-fine manner. It first matches the two root nodes in the two MRG trees. Then the matching traverses down the two MRG trees following the child node with maximum similarity. The matching goes on until all the nodes in either MRG trees are matched. In [Chen02], an improvement is also purposed on the pre-processing stage of MRG. In [Bespalov03], a scale-space representation is suggested to describe manufacturing models hierarchically. The representation is based on singular-value-decomposition (SVD). The SVD calculation again bases on geodesic distance. By using SVD, a model can be partitioned into two half recursively into a binary tree. As the model is represented as a tree, matching is similar to that of MRG. [Tal04] analyzes models based on component graph. It first splits a model based on

11

mesh decomposition algorithm. Each component node is then described by one primitive. The choice of primitive is based on a non-linear least-square optimization algorithm. All these primitives are then connected to neighboring components to form a component graph. To match two component graphs, an optimal error-correcting subgraph isomorphism algorithm is applied. In [Sundar03], voxel thinning [Gagvani99] is applied to extract skeleton from a voxelized model. Clustering algorithm is then applied to extract nodes for skeletal graph construction. Apart from topological matching, the radial distribution of the node edges is preserved for local shape matching. To match two skeletons, maximum cardinality minimum weight bipartite matching is applied. The recursive bipartite formulation is then further enhanced by depth-first search and greedy approach. [Biasotti03] compares and contrast three extended reeb graph (ERG) representations. These 3 ERGs are skeleton extraction algorithms and are characterized by different distance functions used. In [Biasotti03], an error tolerant graph isomorphism algorithm is also purposed for node matching. In topology-based approach, we can see that topological features are closely related to models’ skeletal representation and are thus invariant to model deformation. Owing to this reason, topology-based approach is the only approach that can handle deformable models. However, among all the topology-based methods, all of them focus on skeletal matching only. Their methods and experiments concern the discrimination of dissimilar skeleton models and pay little attention to similar skeleton models. As a result, none of them shows retrieval work on large database. In general, all skeletal (or topological) matching techniques are derived from graph matching and they are usually slow in practice.

12

2.2.5 Summary

As a summary, the geometry-based approach is efficient and easy to implement. The distribution method, in particular, has many varieties of implementations because of its simplicity. However, regarding to the matching accuracy, they are usually outperformed by other approaches like transform-based and image-based. The two geometric properties, RSD [Kazhdan03a] and anisotropic scale [Kazhdan04] cannot perform as well as others. However, because of their orthogonality, they can be used to enhance other 3D descriptors. The energy method concerns the work done required to morph a model into a specific shape (e.g., sphere) or the target object. Though there are only a few methods found in this type of methods, more research works are expected in the future. The main reason is its ability to detect small changes in the model’s topology and concavity. For a detail comparison among the geometry-based, transform-based and image-based approaches, we refer the readers to [Min03] and [Shilane04]. Transform-based approach has several properties that make it popular. The most important property is its ability to support multi-resolution analysis. With the recent breakthrough in concentric spherical harmonic, the accuracy of transform-based approach improves dramatically because

pose-normalization steps can be avoided. For the image-based approach, its main advantage is its independence from 3D data representation. If a model can be rendered as images, the image-based approach can be applied. In order to be rotation independent, methods in this approach use multiple views as features to represent a model. As the feature size increases, the matching cost also increases. However, regarding to the usefulness, image-based approach can give a better human-computer interface [Min03] because users may provide a 2D sketch to give an initial query to

13

the retrieval system. Topology-based approach is the only approach that can truly handle highly deformable models. Since their features are extracted along the models’ skeleton, they are invariant to model deformation. On the other hand, the methods in this approach are usually slower than others. The matching algorithm is especially the case. According to [Biasotti03] and [Sundar03], if no heuristic algorithm can be found, skeletal graph matching usually suffers from sub-graph isomorphism, which is NP-Hard.

2.3. Summary of Other Relevant Techniques

Among all types of 3D model retrieval techniques, the capability of being invariant to scaling, translation and rotation towards different models is essential. Scaling and translation invariance can be handled easily by scaling models to unit length and translating the center of mass to the origin, respectively. However, for rotation invariance, it is dependent on the features that the method used. In section 2.3.1, we reviewed on how rotation-invariant properties can be achieved. Apart from rotation-invariant properties, multi-resolution analysis is another essential technique for matching 3D models of different level of details. In section 2.3.2, we will also give a brief summary.

2.3.1 Rotation-Invariance

In the following section, we will give a brief survey on existing methods for handling rotation invariance. Five main approaches can be found as follows: 1. Using a large number of samples

14

The straight forward approach that can achieve rotation-invariance is to design features that are independent to rotation. The geometric distribution [Osada01] and reeb-graph [Hiliga01] retrieval methods are good examples of this type. In [Osada01], large number of points is used to calculate distances for distribution feature formation. Since there is a large amount of sample points and the features are independent of the coordinate axis, these features are intrinsically rotation-invariant. The multi-resolution reeb graph (MRG) method [Hiliga01] uses “sum of geodesic” as a model partitioning function. This function is a measure of the relative distance away from the surface center. Once again, since “sum of geodesic” is calculated using all the vertices on the surface, it is intrinsically rotation-invariant. 2. Using pose-normalization algorithm Examples that fall in this category are [Vranic00] and [Tangelder03]. Both of them propose modified versions of principle component analysis (PCA) for pose-normalization. After applying PCA, the dominant axis is chosen and aligned to one of the main axes in the coordinate system. However, PCA has one main drawback. If there is more than one dominant axis in the model, PCA may not give a stable result. The plane model is a good example. Because the plane body and wings may have equal length, the angle of rotation may be misjudged. 3. Using fourier transform for feature normalization In [Yu03] and [Liu03], Fourier transform is applied to achieve rotation-invariant property. However, unlike [Vranic02], they apply Fourier transform on the feature maps and distributions instead of using Fourier to extract features. By

15

doing so, the feature map and distributions, which are originally rotation-dependent, become rotation-invariant. 4. Using concentric spherical harmonic [Kazhdan03b] is the first research work that applies Fourier transform on a sequence of concentric spherical partitions. Since the concentric spherical partitions are invariant to rotation, the features captured thus can handle model rotation. Similar to [Kazhdan03b], [Novotni03] proposes to use the Zernike transform instead of the Fourier one. 5. Using multiple view features and corresponding matching algorithms This approach is broadly used by image-based algorithms. For example in [Chen03] and [Ohbuchi03c], multiple images from different viewpoints are captured as features. Since these images are captured from different angles, the matching algorithm must scan through the image list for the best matches.

2.3.2 Multi-resolution analysis

Multi-resolution analysis can allow retrieval methods to compare and search models based on level of details. In our survey, there are two main approaches to handle this. They are summarized as follows: 1. Hierarchical approaches A. Octree-based Octree is the recursive partitioning of Euclidean sub-space using cubes. In

16

[Kiem99] and [Leifman03], octree is used for storing the features in a hierarchical manner. During searching and comparison, higher level of detail can be compared down the tree where finer and smaller partitions can be found. B. Multi-resolution Reeb Graph The multi-resolution reeb graph (MRG) is similar to the octree method that requires regular partitioning. However, MRG differs from octree that it does not partition the Euclidean Space. Instead, MRG partitions the model directly into different intervals. A hierarchy is then built based on these intervals. C. Alpha shape Alpha shape is used in [Ohbuchi03b] to create a hierarchical structure for multi-resolution analysis. Unlike Octree and MRG, no partitioning is required. On the contrary, different radius of alpha shape is used to generate simplex for describing a model (simplex: points, lines, triangles). For example, if the alpha shape radius equals zero, a point cloud representation is resulted. When the alpha shape radius tends to infinity, the convex hull of the model is returned. In [Ohbuchi03b], 5 alpha radius values are used to generate 5 representations for every model. For each representation, feature extraction is applied. These set of features are said to be hierarchically describing the shape of the model.

For all the hierarchical approaches, searching and comparison follows a

17

coarse-to-fine strategy. That is, all the matching are carried out from the coarsest level traversing down the hierarchy until a match is found or a certain level-of-detail is reached.

2.

Using transform and coefficients All the methods in transform-based approach can be classified into this type. [Vranic02] and [Lau02] are some of the examples. In these two cases, coefficients from Fourier or Wavelets transform are used as features. In the transformed feature space, lower order coefficients (harmonics) correspond to lower frequency information and higher order coefficients (harmonics) correspond to higher frequency information. To allow the analysis of a deeper level, more (lower and higher) coefficients can be used as features. We would refer the readers to [Vranic02] for a better graphical representation.

18

Chapter 3 Problems, Challenges and Solution Outline

3.1. Problems

As explained in previous chapters, research for a successful retrieval system can be beneficial to many different fields of applications. However, there are also many different types of 3D model inputs. (Non-) manifold meshes, (non-) closed meshes, ill-defined 3D models (polygon soup or point cloud models), manufacturing models and high quality deformable models are only some of the model types that can be identified. As such, it is difficult to design a system that can read all these models as inputs while being fast and giving good matching results. Therefore, we believe that defining good project goals as guidance would be advantageous in this research project. Among all the four types of retrieval methods discussed earlier, we believe that researching for a topology-based method would give the greatest benefit to movie and game industry. First, most of the high quality models in these industries are deformable models (i.e., the same object but of different postures). The topology-based approach is the only choice that can match this kind of models. Second, there are relatively few methods of topology-based. Researching for a fast and accurate method would be more useful for future analysis. Combining all these, we hope that our retrieval method can satisfy the following project goals:

19

a) Content-based approach b) Fast and efficient c) Invariant to scaling, translation and rotation d) Accurate in discriminating different models e) Insensitive to noise and small features f) Match human inspection g) Can handle deformable models

3.2. Challenges

After defining our project goals, we observe that most of the challenges arise from being topology-based. First, methods of topology-based are relatively slow in practice. In order to allow matching of topological information, features are usually represented by skeletons [Sundar03, Tal04] or reeb graph [Hilaga01]. Matching of skeletal graphs is usually carried out by sub-graph isomorphism analysis which is NP-hard [Biasotti03]. Typically, heuristic algorithms are required to handle the matching problem. Second, most topology-based methods focus on skeleton matching only, i.e., to discriminate models of dissimilar skeletons. When the model database becomes large, it is difficult for them to discriminate models of similar skeletons. [Hilaga01] and [Sundar03] try to handle this by capturing additional geometric features. However the features are relatively local and cannot tell the overall shape. So it is still not sufficient to discriminate models if their skeletons are similar. While combining geometric features is a good clue, ensuring features to be invariant to scaling, translation, rotation and deformation becomes another major

20

technical problem.

3.3. Solution Outline

To tackle the above challenges, we propose a solution outline as follows: 1) All of the existing topology-based methods try to capture features as skeletal graph. Since a graph is used, the matching process is slow due to the use of sub-graph isomorphism. To avoid this, we propose the use of point based features to represent the topological information of a model. As models are now represented as sets of points, matching of models becomes the matching of two point sets. There are efficient matching algorithms to do this. 2) To further reduce the matching cost, we try to capture topologically important points only. Topological points are not regularly sampled from the model’s surface. They are extracted based on skeletal importance. We believe that these points not only invariant to model tessellation, but their reduced number also reduce the overall matching time. 3) Additional usage of geometric features can help identify different models of similar skeletons. Unlike [Hilaga01] that uses low-dimension features, we apply high-dimension geometric features to describe the overall model surface. By doing so, we believe that our method can differentiate models of similar skeletons, and this claim has been verified in our experiments to be shown later.

21

Chapter 4 Point Based Topological Features and Bipartite Matching

4.1. Introduction

In this chapter, we present a novel method for efficient 3D model comparison. The method was accepted and presented at the Computer Graphics International 2004 Conference [Tam04a]. The method is designed to match highly deformed models through capturing two types of information, topological and geometric features. In this method, we propose a feature point extraction algorithm, which is based on the “Level Set Diagram”. Our algorithm can extract topological points from a general 3D model. These topological points represent the skeletal structure of the model. Additionally, we also capture both spatial and curvature information to describe the global surface of 3D objects. This differs from traditional topology-based 3D matching methods that use only low-dimension local features. Our method can accurately distinguish different types of 3D models even if they have similar skeletons. By applying the bipartite graph matching technique, our method can outperform two other methods [Osada01, Vranic01b] as demonstrated in our experimental results. The rest of the chapter is organized as follows. Section 4.2, presents the feature extraction method in details. Section 4.3 discusses our similarity measure. Section 4.4 presents and evaluates some of the experimental results. Finally, section 4.5

22

briefly concludes this chapter.

4.2. Feature Extraction

In this subsection, we describe in detail our method for features extraction. It involves two main steps. Section 4.2.1 discusses how we extract topological points from a 3D model, while section 4.2.2 discusses how we extract additional geometric information for describing each of the topological points for matching.

4.2.1 Topological Point Extraction

Traditional topology-based methods use explicit skeletons or reeb graph to represent a 3D mesh. In order to avoid the relatively slow sub-graph isomorphism matching process, we try to extract topological points as features instead. In this section, we talk about our algorithm which is adapted from the Level Set Diagram algorithm.

4.2.1.2 Level Set Diagram In [Lazarus99], Lazarus et al. presented a “Level Set Diagram” (LSD) algorithm to construct the skeleton of a 3D model. We prefer LSD to be our feature point extraction method above others like [Chuang02, Manzanera99, Zhou99, Li02] because of its efficiency, ease of coding and its ability to identify critical points on the mesh surface. The algorithm actually simulates the marching of a waterfront. The word “waterfront” is used from the graphical explanation in [Hart99]. A critical point arises when the topology of a waterfront changes. There are three types of critical

23

points: minima, maxima and saddles (Figure 4.1). If a waterfront opens or closes, maxima are registered. On the other hand, if waterfronts meet leading to a split or merge, saddles are registered. All waterfronts grow from a single minimum. To find the minimum from a mesh, LSD uses a heuristic approach. It first traverses the mesh two times to determine the diameter of the mesh. After obtaining the diameter, the two vertices that are separated by largest possible distance are obtained, and the minimum point is selected as the first vertex. By analyzing all these critical points, a skeleton can be built. We would refer readers to [Lazarus99, Steiner02] for details where [Steiner02] is the modified version of [Lazarus99] that can handle genus topology. Maximum: The critical point where a wavefront closes Saddle: The critical point where wavefronts meet leading to a split or merge Minimum: The source point where a wavefront starts --Waterfronts

Figure 4.1 Critical points produced by LSD.

4.2.1.2 Modified Level Set Diagram Though LSD runs reliably on smooth manifolds, it performs poorly in identifying critical points on general 3D models, which may have arbitrary curvature and

24

probably noise. The waterfronts maintained by LSD can easily be corrupted, leading to incorrect identification of these points. Here, we propose a novel approach to tackle this deficiency. In our experiment, we find that many of these incorrect points are wrongly identified during the registration of saddles. To avoid this problem, we propose to count the number of unvisited vertices adjacent to a waterfront. This is done by running a depth-first search on the waterfront before registering a saddle. In practical coding, “waterfront” corresponds to the priority queue of the LSD algorithm. We consider three cases below: z A Good saddle: Waterfronts can be uniquely identified and more than one

unvisited vertex (the grey region in case A of Figure 4.2) is found adjacent to each waterfront. Since this is considered as a normal case in [Lazarus99], we register the point as a normal saddle. z A Bad saddle with good maximum: No unvisited vertices (the grey region in

case B of Figure 4.2) are found adjacent to waterfronts. As such, there is no significant protrusion. We register the point as a maximum and disallow all vertices on the waterfronts to become saddles or maxima. z A Bad saddle: Two or more waterfronts are identified but only one waterfront

has at least one adjacent unvisited vertex. This is a degenerate case (case C of Figure 4.2). Our approach does not register the point as a saddle.

25

LSD A

Our Approach

C Dashed line: Circle: Saddle

B Waterfront Dot: Maxima

Shaded Area: Unvisited Region Figure 4.2 Selection of saddles in our method.

It should be noted that our depth-first search procedure is completely different from the original contour split procedure proposed in [Lazarus99]. The contour split procedure is applied on the dual graph of the model to analyze its skeleton. Our depth-first search procedure, however, is directly applied on the waterfronts, not the dual graph, and is used to filter noise and solve the problem of incorrect topological point identification. Figure 4.3 compares the results of LSD with our method. We can see that our method can extract more reliable topological points and is more noise resistant for general 3D models. The deficiency of the LSD method (left diagram of Figure 4.3) is due to the corruption of waterfronts during mesh transversal. Our method (right diagram of Figure 4.3), on the other hand, extracts critical points more accurately and reliably without such a deficiency.

26

LSD

Our Approach

Cube: Maxima Circle: Invalid critical points Square: Redundant points

Sphere: Saddles Dotted Circle: Missing of critical points due to noise

Figure 4.3 Comparison of LSD and our method.

Though our method can reduce most redundant and incorrect points, it sometimes misses very small features. In the dotted circle (right diagram of Figure 4.3), our method fails to find any points on one of the two ears. This defect is caused by our noise filtering step that two feature points are considered as noise if they are only one vertex away. However, missing of feature points only occurs infrequently, and the error percentage is far lower than that of the LSD method as will be shown in Table 4.3. By using bipartite matching (detailed in Section 4.3), the matching error becomes insignificant. Although the LSD method considers three types of critical points, namely, maxima, minima and saddles, to simplify our feature matching process (which will be shown in section 4.3), we have grouped maxima and minima into the same set and called them Local Maximum Points. Hence, for the rest of the chapter, we only consider two types of topological points: local maximum points and saddle points.

27

4.2.2 Geometric Information Extraction

While considering topological points as the skeletal representation (topology information), we use the 1) sum of geodesic distance and the 2) curvature distribution as geometry information. Feature 1: Sum of Geodesic Distance The Sum of Geodesic Distance (or the so called Average Geodesic or Integral Geodesic) was firstly introduced in [Hiliga01]. Let g(p,q) be the geodesic distance between points p, q ∈ S , where S is the surface of the model. The Sum of Geodesic Distance, G, is defined as:

G (v ) = ∫ g (v, p)?S

p∈S

Eq. 4.1

To describe the spatial location of a topological point relative to the surface center in a scale-invariant manner, we use the normalized geodesic sum, Gnorm, as a distance measure.

G (v) ? Min q∈S G (q ) Max q∈S G (q ) ? Min q∈S G (q )

G norm (v) =

Eq. 4.2

where v ∈ S and 0 ≤ Gnorm ≤ 1 . It should be noted that, in [Hiliga01], the Sum of Geodesic Distance is used as an index for the sampling interval. However, in our method, it is used as a feature to describe the spatial location of a topological point directly. The normalization formula used is also different. To do this, we find the minimum and the maximum values of G. In [Hiliga01], these values are approximated by calculating the geodesic sum on a large set of vertex regions. Since most deformable models are of high

28

quality meshes, it is common to have a vertex count of above 10,000. The calculation of geodesic sums becomes a very computationally expensive process. In our case, we notice that the vertices associated with the maximum and the minimum values of G, in most cases, are located on or near a path (the diameter) between two furthest points. The path has already been found during the preparation stage of LSD (Section 4.2.1.1). By applying binary search constrained by gradient descent detection, we can easily find an approximation of Maxq∈S G (q) and Min q∈S G (q) . Among all the models

in our database, we find that such an approximation only derivates from the actual value by no more than 3.76% and the number of geodesic sum computations is less than 20. Feature 2: Curvature Distribution To describe the global surface curvature change with respect to a topological point t, we use a feature vector V(t) of dimension n to store the curvature information. In order to handle topological change, the surface mesh is partitioned according to the geodesic distance as shown in Figure 4.4. We define c(t) to be the maximum geodesic distance obtained with respect to t: c(t ) = Maxq∈S g (t , q) Eq. 4.3

and let P be a partition of the model surface S with respect to t such that

P 1 (t ) ∪ P 2 (t ) ∪ ... ∪ P n (t ) = S

c(t ) c(t ) (i ? 1) ≤ g (t , q) < (i )} n n

Pi (t ) = {q ∈ S |

Eq. 4.4

29

where i=1,…,n. t , q ∈ S are vertices on the surface and n is the number of partitions.

Black and white bands: Partitions Pi(t) are obtained according to the geodesic distances originated from the topological point at the dog’s tail.

Figure 4.4 Surface Mesh are partitioned into black and white bands.

To approximate the curvature of the model surface, we apply the “Gauss Bonnet” (angle deficit) method [Meyer02]. Since the Gaussian curvature KG(q) at vertex q is only a discrete local approximation, it may be affected by noise or become very large at corners. We normalize the curvature to the range of [0,1] by the following function:

Normalized KG (q) = 1 ? 1 (1+ | KG (q) |) Eq. 4.5

We can then define our feature vector V(t) with respect to t as the sum of Normalized Gaussian curvature in individual partition Pi(t).

Vi (t ) = ∑ q∈P ( t ) Normalized K G ( q )

i

Eq. 4.6

where i=1,…n. t is a topological point such that t ∈ S . In our prototype system, we set n (the dimension of V(t)) to 20 for all the models in our database.

30

4.3. Feature Matching

In this section, we detail the steps for computing the similarity of two objects. This involves two measures. Section 4.3.1 presents how we compute the similarity of two topological points, while section 4.3.2 presents how we compute the similarity of two models, which is based on the point set similarity measure.

4.3.1 Similarity Measure for Topological points

To determine the similarity value of two topological points t1 and t2, we use the following formula to combine the normalized geodesic sums Gnorm and the curvature features V with ratio Wt: Dist tp (t1 , t 2 ) = Wt × | G norm (t1 ) ? G norm (t 2 ) | + (1 ? Wt ) × L2, norm (V (t1 ), V (t 2 )) Simtp (t1 , t2 ) = 1 ? Disttp (t1 , t2 ), Simtp ∈ [0,1] Eq. 4.7

4.3.2 Similarity Measure for Models

Since we now have a set of topological points and a similarity value for each point, matching two models can be formulated as a bipartite graph matching problem. Let G = (V , E ) be a bipartite graph such that there is a partition V = A ∪ B and every edge in E has one end-point in A and the other in B. Let M ? E be a matching such that no vertices are incident to more than one edge in M. We also define the cardinality |M| to be the number of edges in M and let c : E → R be a cost function

31

that returns real number (R) on the edge (E) of G. The cost (weight) of a matching is the sum of the costs of its edges, i.e.,

c ( M ) = ∑e∈M c (e)

Eq. 4.8

The “Maximum Weight Maximum Cardinality Bipartite Matching” (MWMCB Matching) problem is thus to find the matching on graph G such that |M| and c(M) are maximized, where c(M) is the MWMCB weight. For a practical implementation, we would refer readers to [Mehlhorn99]. To allow maximum number of matched topological points and maximum similarity weight between two models, we construct a bipartite graph G = ( A ∪ B, E ) , where A and B are the two topological point sets for comparison and Simtp is the edge weight c (Figure 4.5). By applying a standard MWMCB matching algorithm on G, we can measure the similarity of two point sets. Let n1 and n2 be the numbers of vertices in A and B, respectively. The number of matched edges should then be equal to n1, where n1 ≤ n2 . Hence, the maximum bound of the MWMCB weight, c(M), is shown as follows:

c( M ) ≤ Max (Simtp (t1, t2 )) × n1 ≤ n1

t1∈A, t2∈B

Eq. 4.9

The point set similarity is then normalized as follows: Simpoint set ( A, B ) = c( M ) n2 Eq. 4.10

In our similarity measure, we only allow bipartite matching among the same type of topological points in the two models. That is, we calculate the similarity for the two sets of local maximum points, (Amax, Bmax), and for the two sets of saddle points, (Asaddle, Bsaddle), separately. We then combine them together with ratio Ws as the

32

similarity measure for two models:

Simmodel ( A, B) = WS × Simpoint set ( Asaddle , Bsaddle ) + (1 ? WS ) × Simpoint set ( Amax , Bmax )

Eq. 4.11

Suppose M = {e2, e4, e5, e6} then c(M) = c(e2) + c(e4) + c(e5) + c(e6)

Sim point set ( A, B ) =

c( M ) n2

and n2 = # of point in Set A

Figure 4.5 Bipartite graph matching.

4.4. Experimental Results

We have implemented the proposed method in a prototype system to test its performance. To demonstrate its ability to handle deformable models, we have chosen 10 base models from a popular industrial modeling tool named “Poser”. For each of these 10 base models, we generate 3 more different poses as deformed models. Incorporating some models downloaded from the Internet, we have created a database of 70 different models. For each of these models, we generate an addition of 7 models by rotating them along the xy axis, yz axis, non-uniform scaling along the xy axis, yz axis, and reducing their vertex counts by 25%, 50% and 75%. To create the reference database, we manually categorize the 560 models into 25 groups. Some of the sample models are shown in Figure 4.7.

33

4.4.1 Performance Evaluation

Figure 4.6 shows the Precision-Recall graph. We use each of the 560 models as a query to the database (Wt=0.5 and Ws=0.5) and average the matching results. From the plot, we can see that our method can achieve a high precision value of 0.54 even at high recall value of 1.0. The features captured are found to be invariant to rotation, non-uniform scaling and independent of model deformation in the experiment. As a comparison, we have also plotted in Figure 4.6 the precision and recall of a geometry-based method, D2 [Osada01] (feature size: 72) and a frequency-based method, Fourier [Vranic01b] (feature size: 21). They are two of the best methods that we have tested in our earlier work [Lau02]. The results indicate that our method performs much better than [Osada01, Vranic01b], which cannot handle deformable models. We can see that the precision of the geometry-based and the frequency-based methods drops dramatically when the recall is above 0.1. Since our method depends only on model topology but not on model tessellation, the number of topological points captured is relatively small (minimum 2, mean 18, maximum 60 topological points for our model database), and the number of geodesic sum computations is thus limited. On the contrary, the voxelization process [Sundar03] involves high computational and memory costs, and interval sampling [Hilaga01] requires a large number of geodesic sum computations.

34

Figure 4.6 Plot of Precision and Recall.

4.4.2 Matching Accuracy

1) Matching of Models with Similar Skeletons Table 4.1 shows the similarity ratings of our method on models with similar skeletons. We can see that our method can distinguish models with very similar skeletons. This can be explained by the use of additional spatial and curvature features. For the case of cats and dogs, though the numbers of topological points captured are the same, i.e., they have similar skeletons, they have relatively different similarity rating. The difference is mainly contributed by the spatial information and surface curvature distribution features. Our observation here is that dogs have longer

35

limbs than cats. The spatial locations of topological points for dogs located at limbs are thus further away from the surface center than those of the cats. In addition, since dogs have a larger body than cats, the curvature distributions are also different.

Table 4.1 Mean Similarity Cats Dogs Horses Dolphins

Similarity of models with similar skeletons.

Cats Dogs Horses Dolphins

0.91 0.85 0.82 0.78

0.85 0.95 0.83 0.81

0.81 0.83 0.92 0.71

0.78 0.81 0.71 0.93

2) Matching of Models with Dissimilar Skeletons Table 4.2 shows the similarity ratings of our method on models with dissimilar skeletons. We can see that the similarity values differ significantly. For the case of dogs and human beings, the similarity rating is much smaller than the case of cats and dogs. This can be accounted for by two factors, the topological and geometric effects. From our observation, there are many fingers and toes in human models while there are none in dogs. Hence, the skeletal (topological) features are different. For geometric effects, we observe that human beings have very long arms and legs, but have relatively shorter body than those of the dogs. Such differences not only affect the spatial location (geodesic sum) of the topological points, but also the curvature distribution features. All these dissimilarities are reflected as larger difference in values of the similarity table.

36

Table 4.2 Similarity of models with dissimilar skeletons. Mean Similarity Frogs Humans Dogs Frogs Humans Dogs

0.95 0.63 0.42

0.62 0.88 0.31

0.42 0.31 0.95

(Note that in tables 4.1 and 4.2, all the models within each group are a series of deformable models.)

4.4.3 LSD and Our Feature Extraction Algorithm

To study the reliability of our feature point extraction algorithm for extracting topological points, we pick some models from our model database for comparison. In Table 4.3, alternating columns compare the topological points captured by the LSD method and our feature point extraction algorithm. We can see that the LSD method captures a large number of redundant and incorrect topological points. Since these invalid points spread across a large region, it may be difficult to solve the problem through clustering (e.g., the horse in the 1st column and 3rd row of Table 4.3). On the other hand, our method captures topological points precisely and most of them agree with human inspection. Most saddle points are located at the articulate joints while the local maximum points are located at the tips of a protrusion. For the example of noisy sphere (i.e., the epcot in the 1st column and 6th row of Table 4.3), our method is able to correctly return the local maximum points at the two tips, which match the topology of a sphere. Overall, our method works better than LSD in capturing topological points on general 3D models.

37

Table 4.3 Comparison between LSD and our feature point extraction algorithm. LSD Topological points* Total: 16 C: 76 I: 16 M: 0 R: 44 Total: 16 C: 46 I: 7 M: 0 R: 23 Total: 16 C: 84 I: 6 M: 0 R: 62 Total: 16 C: 30 I: 3 M: 0 R: 11 Total: 15 C: 58 I: 9 M: 0 R: 34 Total: 2 C: 6 I: 0 M: 0 R: 4 Our Approach Topological points* Total: 16 C: 16 I: 0 M: 0 R: 0 Total: 16 C: 16 I: 0 M: 0 R: 0 Total: 16 C: 14 I: 0 M: 2 R: 0 Total: 16 C: 16 I: 0 M: 0 R: 0 Total: 15 C: 14 I: 0 M: 1 R: 0 Total: 2 C: 2 I: 0 M: 0 R: 0

Sphere: Captured (C): Incorrect (I): Missing (M): Redundant (R):

Saddle points Local Maximum Cube: Number of topological points captured by algorithm Number of incorrect points far away from topological location (e.g., articulate joints, tips of protrusion) Number of topological points that algorithm fail to capture Redundant points locate near topological location captured by the algorithm

(*) Topological Points include both local maximum and saddle points

38

4.4.4 Complexity

Apart from having a high matching accuracy, our method also has a relatively low computational cost. In our topological point extraction algorithm, we modified the LSD method by introducing a depth-first search in the pre-registration of saddle points. The LSD algorithm [Lazarus99] (without contour split) requires

O ( nlog n + e) , where n is the total number of vertices and e is the number of edges.

To apply depth-first search for checking valid saddles or maxima, it requires O(λS (nv + ne )) , where nv and ne are the numbers of vertices and edges, respectively, explored during the depth-first search. λ S is the total number of saddle points found. Hence, the overall complexity is O(nlogn + e + λ S (nv + ne )) . Comparing with MRG [Hiliga01], although our method also needs to compute the expensive geodesic sum, we only apply it on topologically important feature points. The total complexity of computing the geodesic sum for the whole model is

O((?max + ?saddle + β )(nlogn + e)) , where ?max and ? saddle are the numbers of local

maximum points and saddle points, respectively. β is the number of binary search iterations (usually, less than 20) required to find Minq∈S G (q) . It should be noted that

?max + ?saddle + β is much smaller than the total number of base vertices required by

[Hiliga01]. For example, we consider the most complicated human model where

?max + ?saddle = 60 and β = 20 , making a total of 80. Whereas in MRG, about 150

base vertices [Hiliga01] and their geodesic sums are computed. These geodesic sums are then interpolated to become those of the other model’s vertices. Hence, the interpolated values associated with the other model’s vertices are only an

39

approximation from those of the base vertices. Our method, on the other hand, computes the geodesic sum directly on all topological points and is thus believed to be more accurate. To compare two models, we apply the MWMCB matching algorithm. This algorithm has a complexity of O( N ( M + NlogN )) , where N=|A| + |B|, M=|A| × |B|, and |A|, |B| are the numbers of vertices in the two topological point sets A and B, as discussed in section 4.2. For [Hiliga01], the complexity of comparing two models is O(NR(MR+NR)), where NR, MR are the R-nodes counts of the two models. Although the complexity of our method is higher than [Hiliga01] here, it should be noted that the number of matching points (i.e., the number of topological points) is less than that of [Hiliga01]. In our method, feature points are dependent on model topology only, not on tessellation, and thus the number of matching points is limited. For comparison, our method has on average 18 topological points for each model of our database, while the method in [Hiliga01] has about 300 r-nodes for each model. To give some statistics on matching time, we have maximum, minimum and average matching times of 110ms, <0.5ms and 0.71ms, respectively, on a PIII-500MHz machine. By reducing the number of geodesic sum computations (which consume 90% of the cost of the feature extraction process) and the number of nodes in the matching process, our method can support faster content-base retrieval and allow users to submit their query model on the web environment. We would like to point out that when a 3D model contains more and more topological and/or geometric details, the reeb graph based matching methods perform less accurate [Bespalov03]. The multi-resolution version of reeb graph [Hiliga01], which follows topological alignment of a model, is not a skeleton. It stores no explicit topological information and not even the shape of the model. For geometric

40

information, area and length are the only features used. These features are relatively local, and have no specific information to describe the model surface. To better tackle such limitation, we derive our topological points from LSD. LSD not only represents the actual skeleton, but also gives intrinsic topological properties, like genus from its structure. Further, by grouping feature points into local maximum points (protrusion tips) and saddle points (branches or handles (genus)), our method can better represent the model. To distinguish similar models of different shapes or poses, features that can describe the overall surface must be used. In our method, the additional curvature feature vectors are especially reserved for such purpose.

4.5. Conclusion

In this chapter, we have proposed a novel 3D model matching method through analyzing the models in both topological and geometric domains. As demonstrated in our experimental results, the method is efficient, invariant to rotation, non-uniform scaling. The new method also shows a high matching accuracy towards highly deformable models. We can conclude two facts from the experimental results. First, the attempt to represent 3D models by topological points is confirmed to be a good initial start for our research direction. Unlike existing topology-based methods that use reeb graph [Hiliga01] or explicit skeletion [Sundar03], we use topological points to represent the skeletal information. This is relatively more efficient. Second, the combination of features from different domains is a possible clue for advanced model retrieval. To distinguish different skeletons models, topological features play an important role. To discriminate different models that are of similar skeletons, additional high-dimensional geometric features become the essential hint. Among all

41

the topological methods, our method can handle different models with similar skeletons and we have presented all the essential algorithms in this chapter.

Human

Boys Query: 0.947 1.0

Dog frog

Girls 0.908 0.89 0.88 0.82 0.79

0.945

Query: 0.97 1.0

hand Cat

Query: 0.929 0.928 1.0 0.977 0.904 0.899

Query: 0.94 1.0

Raptor Horse

Query: 0.93 0.82 1.0 0.92 0.90 0.77

Query: 0.96 1.0 0.95 0.90

Query: 0.95 1.0 0.92 0.83

Figure 4.7 Some example model groups and similarity values.

42

Chapter 5 Topological Points, Rings Features and Earth Mover Distance

5.1. Introduction

In chapter 4, we have considered the use of topological points and geometric features to represent a model for matching. Although the method is robust and accurate, there are several issues that we have not addressed. In this chapter, we first focus on these problems as well as our proposed solutions. After that, we present our new algorithm for advanced deformable model matching that can handle all these problems. As a side note, part of the content of this chapter has been successfully published in the International Conference of Pattern Recognition 2004 [Tam04b]. Unaddressed Issues and our solutions: 1) To extract topological points reliably from 3D models, we proposed the use of modified LSD in the last chapter. Although modified LSD produces lesser noise, it still suffers from two problems: the “slicing direction” [Mortara02] and the instability of saddle point identification. The former problem may lead to the loss of some features because LSD works poorly on some non-tubular shaped models. The later problem, on the other hand, may lead to unstable topological features. To handle these two problems, we have designed a Bi-directional LSD analysis method and decided to extract topological rings instead of saddle points as the features. In the last chapter, we have discussed the use of Maximum Weight Maximum

43

Cardinality Bipartite Matching as a similarity measure for two models. Though it is shown to give good retrieval results, the matching algorithm sometimes fails to give good similarity measures that match human inspection. After careful analysis, we have found that the consideration of topological points alone in our bipartite matching process is not sufficient. First, the number of points extracted may be affected by two factors: 1) the contrast of topological characteristics, and 2) noise. If the models are ill-formed or the features contain a lot of noise, our matching algorithm will therefore suffer. Second, topological points may not truly represent the models’ skeletons. For example, a dog may be considered as similar to a dolphin if they have a similar number of topological points. However, according to human inspection, there are some differences in term of skeletons and therefore they should have different similarity values. In order to tackle this problem and to avoid slow skeleton matching, we apply additional features, the area of importance, to further describe the topological characteristics. The rest of the chapter is organized as follows. Section 5.2, introduces our Bi-Directional LSD Analysis. Section 5.3 presents our feature extraction method in details. Section 5.4 discusses our similarity measure. Section 5.5 presents and evaluates some of the experimental results. Finally, Section 5.6 briefly discusses some of the properties of our algorithm.

44

5.2. Bi-Directional LSD Analysis

To solve the “slicing direction” and the instability of saddle point identification problems, we have designed an algorithm, “Bi-Directional LSD Analysis”, for advanced feature extraction. Before presenting the algorithm, we can take a deeper look at these two problems. Problem 1: “Slicing Direction” The “Slicing direction” problem is firstly described in [Mortara02]. Since LSD uses only one source point as the minimum for skeleton extraction, a slicing direction is privileged and it may lead to the loss of some features if the features are located at non-tubular region. To solve that, [Mortara02] tries to find all the feature locations and grows waterfronts from them at the same time. However, since all waterfronts are grown at the same starting time, the locations of saddle points may be dependent on the length of the model’s limbs and connections.

Problem 2: “Instability of saddle point identification” To have a better understanding of this problem, we may consider the following example. Given a human model, we can identify two minima using LSD (the two yellow triangles in Figure 5.1 left and right diagram). In LSD and the Modified LSD (in section 4.2.1.2), we choose one of the two minima as starting seed arbitrarily. However, the two seeds can lead to different saddle point locations. The minimum that is located at the leg (Figure 5.1 left diagram) may grow waterfronts upwards to the head, and eventually, saddles will be registered on the shoulder. However, considering a minimum located at the head (Figure 5.1 right

45

diagram), waterfronts will be grown downward and saddles will be registered under the shoulder. As we can see, though Modified LSD works best at locating maximum points, saddle points are dependent of the choice of minimum.

,

,

Topological Points

Waterfronts

Growing Trend of waterfronts

Figure 5.1 Instability of saddle point identification.

Bi-Directional LSD Analysis tries to solve the first problem by running two (instead of one) independent Modified LSD processes from the two minima. Though “slicing direction” is privileged from one direction, missing features can now be found by the second direction as the slicing direction is reversed. To solve the second problem, we try to find the topological rings to replace the idea of saddle points. Topological rings that are shown in Figure 5.2 (Dashed Curves) are independent of the choice of minima.

46

Figure 5.2 Topological Points (Dots) and Topological Rings (Dashed Curves).

The Bi-directional LSD Analysis consists of two parts: Modified LSD and Bi-directional Analysis. The first part tries to improve LSD’s performance on general models, while the second part tries to solve the two problems mentioned above and prepares the features for model matching.

5.2.1 Modified LSD

In Chapter 4, we have observed that the major problem of LSD is its poor saddle identification on general models. To avoid such problem, we propose to count the number of unvisited vertices adjacent to a waterfront during the registration of saddles so that we know whether a saddle is a good saddle, bad saddle or a maximum point instead. To tackle another problem of LSD, the “slicing direction” problem, we try to apply two Modified LSDs at the two furthest points of a model. Further analysis is then based on these two LSD trees. Details are discussed next.

47

5.2.2 Bi-Directional Analysis

From our observation, the Modified LSD works best on locating maximum and minimum critical points. In order to extract features reliably, we propose to analyze a model from its maximum and minimum points first and Bi-directional Analysis is thus designed in a bottom-up manner. It consists of three steps: (1) clustering of local maximum vertices, (2) protrusion region (PR) extraction, and (3) segment region (SR) extraction. Minimum / Source points: s1 , s2 ∈ V Maximum

Saddles Parent-Child LSD Tree: T1 LSD Tree: T2 , Relationship } ∈ Ti

Local Maximum M i = {

} ∈ Ti , Saddle Points S i = {

Figure 5.3 Notations used in Bi-Directional Analysis.

Notations: To simplify our discussion, we adopt the following notations for our

analysis and an example can be found in Figure 5.3. Suppose that we have a surface mesh G = (V, E), where V is the set of vertices, E is the set of edges. Let gv(k, v), gvs(k, R) be the geodesic distances of k respect to a vertex v and vertex set R, respectively, and path(k, v) be the shortest path, where k , v ∈ V and R ? V . Let also the two furthest points from the Modified LSD be s1 , s 2 ∈ V and the two

48

corresponding LSD skeletal trees be T1 and T2 , such that Ti = (Vi , Ei ) , where Vi ? V and root (Ti ) = s i for i=1,2. We define P(v) as the parent of vertex v within Ti , where v ∈ Vi . To further simplify our analysis, we consider two types of critical points only: Local Maximum M i ={maxima and minima in Vi },

z

z

Saddle Points S i = {saddles in Vi }, M i ∪ S i = Vi , i = 1,2 .

5.2.2.1

Clustering of Local Maximum Vertices

Since the Modified LSD is best at locating local maximum, most of the vertices in M1 and M2 coincide and can be paired locally. We define a mv_cluster = {m, n} where n = f(m) and define f(m) as follows:

f (m) = {n | gv (n, m) ≤ gv ( P(m), m) and , gv(n, m) = Mink∈M j ( gv(k , m)) }

Eq. 5.1 where m ∈ M i , P (m) ∈ S i , n, k ∈ M j and i ≠ j

Normally, a mv_cluster contains two vertices m and n=f(m). However, there are also cases where f(m) = φ . 1) 2) m = si , where i = 1,2 ?n ∈ M j , g v (n, m) > g ( P(m), m)

49

The first case occurs when m is the source point. In that case, since root (Ti ) = s i , so P(m) is undefined. For the second case, we cannot find n because of one of the following reasons: (i) it is the missing feature point due to the “slicing direction” problem, (ii) m is noise such that n cannot be found, or (iii) the two points are located at totally non-tubular shape region that they are too far away from each other. For all the cases above, we simply let n = m, for our upcoming analysis. To find all mv_clusters, we apply the following pseudocodes: 1. 2. 3. 4. 5. Put all vertices in M1 and M2 into one set U. Select a vertex m in the set U, apply f(m) and find the corresponding n. If n cannot be found, let n = m. Removed m, n from U. A new mv_cluster is registered = {m, n}. Repeat step 2 to 5 until set U is empty.

5.2.2.2

Protrusion Region (PR) Extraction

After obtaining a series of mv_clusters, we can then extract Protrusion Region (PR). A PR physically means the mesh region where protrusion tip resides. These regions include fingers, toes, ears, etc. To define PR, we first find a geodesic limit (l) with respect to a starting seed z. Let

z , v1 , v2 , t ∈ V and k ∈ ? 3 , we define ? to be a segment ratio:

(t ? v1 ) ? (v 2 ? v1 ) v 2 ? v1

2

?=

Eq. 5.2

such that k = v1 + ? (v 2 ? v1 ) is the closest point to z.

50

Figure 5.4 Graphical representation for the extraction of PR.

Assuming gv (v1 , z ) ≤ gv (v2 , z ) , geodesic limit is defined as:

gv (v1 , z ), if ? < 0 ? ? ? ? l (t , z ) = ?(1 ? ? ) × gv (v1 , z ) + ? × gv (v2 , z ), if 0 ≤ ? ≤ 1? ? gv (v2 , z ), if ? > 1 ? ? ?

Eq. 5.3

Using the geodesic limit, we can define a new PR, PR new , as the region bounded by a mv_cluster and its parent vertices {{m, f(m)}, {P(m), P(f(m))}}:

PRnew = {t | g v (t , z ) ≤ l (t , z ), t ∈ {V \ ∪ PRold}}

Eq. 5.4

where ∪ PRold are all the extracted PR so far and geodesic limit l is defined by letting

v1 = P(m) , v2 = P( f (m)) , z = {x | Max x∈ path ( m, f ( m )) ( xv1 + xv 2 ))} which

denotes the start seed for the PR extraction and x is a vertex on path(m, f(m)). Let

neigh(t) be the one-ring neighborhood of t, boundary RPR of the PR is then defined

as:

51

RPR = PR \ {t | y = neigh(t), t , y ∈ PR, ?y } and P(m),P(f(m)) ∈ RPR Eq. 5.5 As stated in Section 5.2.2.2, noise can be extracted as a mv_cluster and eventually extracted as a PR. To filter away these incorrect protrusions, we first calculate the average angle normal ∠ avgNorm ( PR) for each of them. ∠ avgNorm ( PR) is defined below:

? ?1 ? Norm (t ) ? Norm ( z ) Area ( t ) Cos × ∑t∈PR ? Norm(t ) × Norm( z ) ? ∠avgNorm ( PR) = ∑t∈PR Area(t )

? ? ? ?

Eq. 5.6

where z is the index vertex as shown in Figure 5.4 and t is a vertex in PR. Since these noises correspond to flat regions, the values of ∠ avgNorm ( PR) are small. In practice, we consider a PR having ∠ avgNorm ( PR) > thres to be a valid protrusion. In our experiment, we find that a thres (45-60 degree) can produce very good result.

5.2.2.3

Segment Region (SR) Extraction

After PR extraction, all the remaining regions are sequentially extracted as Segment Regions (SR). SR physically means the mesh region where significant branches occur, such as legs, arms and necks. By extending the idea of PR, we can extract SR similarly, where SR is defined exactly as PR except for two points: 1. SR is no longer originated from a start vertex z like PR. Instead, it is originated

from the boundary of an existing PR or SR (RPR or RSR). 2. Since the starting seed is no longer a point z, but a boundary vertex set R, the

geodesic distance function is also changed from gv(k, v) to gvs(k, R).

52

Hence, a new SR, SR new , is defined as follows:

SRnew = { t | g vs (t , R ) ≤ l (t , R ), t ∈ {V \ {∪ PR + ∪ SRold }}}

Eq. 5.7

where ∪ PR + ∪SRold are all the regions extracted so far and we define geodesic limit l by letting v1 = P ' ( m ) , v 2 = P ' ( f ( m )) , z = R , m, f ( m) ∈ R and P ' (m ) defines the ancestor of m which is not yet visited by any PR or SR so far. The boundary of the SR, RSR, is then defined as:

RSR = SR \ {t | y = neigh(t), t ∈ SR , y ∈ ∪ PR + ∪ SR , ?y}

Eq. 5.8

In order to extract SR reliably, we try to extract the outmost SR first. To do so, we simply calculate

p∈S

Gvs ( RPR )

or

Gvs ( RSR )

and

heap

sort

them

where Gvs ( R ) = ∫

gvs ( p, R )?S . After that, the RPR or RSR with largest Gvs ( R) are

automatically selected as the next SR to extract. Eventually, a final mesh region will be left (e.g., the body of a human model). This last part is then extracted as a final SR. Figure 5.5 shows the results after our extraction process. All PR and SR are colored differently. For the raptor and dino-pet models, nose, jaw, fingers, toes and tail are PR and neck, arms, legs and body are SR. For the female model, fingers, toes, ear tips, eye balls and mouth cavity are PR and arms, legs, neck, body are SR.

53

Raptor

Dino-pet

Female

Figure 5.5 Protrusion and Segment Regions.

The pseudocode for Segment Regions Extraction: 1. 2. 3. 4. 5. 6. 7. 8. 9. For all the RPR, calculate Gvs ( RPR ) and heap sort the result. Select the RPR (or RSR) in the heap with largest Gvs ( R ) as the next SR to extract. Obtain m and f(m) from RPR (or RSR) where m, f ( m) ∈ R and R= RPR (or RSR). Calculate P ' ( m ) , P ' ( f ( m )) If P' (m) and P' ( f (m)) is undefined, go to Step 10. Use {{m, f ( m )}, {P ' ( m ), P ' ( f ( m ))}} to extract SR using Eq. 5.6. Extract the corresponding RSR from SR using Eq. 5.7. Let P ' (m), P' ( f (m)) be the new m, f (m) in RSR . Calculate Gvs( RSR) and heap sort the result.

10. Repeat Step 2 to Step 10, until the heap is empty. 11. Extract the final SR if it exists.

54

5.3.

Feature Extraction

5.3.1 Topological Information

After Bi-Directional LSD analysis, we obtain a set of local maximum clusters, together with a set of bounded regions B = ∪ PR + ∪SR . Since the local maximum clusters are located at protrusion tips and the rings of boundary vertices (R) of both PR and SR are located at articulated joints, their existences actually represent the skeletal information of a model. Such features are not affected by model deformation. For the following subsections, we consider the start vertex z in all PR as topological points and all vertex boundaries R as topological rings.

5.3.2 Geometric Information

Apart from topological characteristics, we are also the first to combine the use of high-dimension geometric features to discriminate different models of similar skeletons. To obtain geometric information for a model, we extract the following features to describe each topological points and rings.

5.3.2.1 Effective Area - Weights of Importance

Different topological points and rings should possess different importance. For example, the importance of a topological point located at the finger would be smaller than that of a topological ring at the waist. To capture these essential features for our models, we consider the use of effective area to represent the importance. Let z be the topological points, RPR, RSR be the topological rings, and B() be the bounded regions associated with the topological points and rings. We also let Area(), Parent() and Children() be the area, parent and children, respectively, of the bounded region.

55

Before we define EffectArea(U), we need to define two more weight-distributing functions:

AreaShareF romParent( U)

Area(B(U)) ? if Parent( B(U)) is n ot the fin al SR ? Area(Paren t(B(U)) ) × Area(b) + Area(Paren t(B(U)) ∑ ? b∈Children(P arent(B(U))) =? Area(B(U)) ? Area(Paren t(B(U)) ) × if Parent( B(U)) is the final S R ? ∑ b∈Children(Parent(B(U))) Area(b) ?

AreaLeft(U) = Area(B(U)) ×

Area(B(U)) ∑b∈Children((B(U)) Area(b) + Area(B(U ))

The effective area EffectArea(U) is defined as follows:

Area(B(U))/2 if U is a topological point ? ? EffectArea(U) = ? Area( B(U )) / 2 + AreaShareFromParent(U) if U is a RPR ? AreaShareFromParent(U) + AreaLeft(U) if U is a RSR ?

Eq. 5.9

∑ EffectArea(U) = 1 , U is a topological charateristics

5.3.2.2

Normalized Geodesic Sum - Spatial Information

To describe the spatial location of b, we use geodesic sum G(U) as the spatial information for topological points and rings. We compute G(U) as follows:

? if U is a topological point ∫p∈S gv(p,U)?S ? G(U) = ? gvs(o,U) × G(w) + gvs(w,U) × G(o) if U is a topological ring ? gvs(o,U) + gvs(w,U) ?

Eq. 5.10 where o = {v | G (v) = Minq∈S G (q), v ∈ S } and w is the ancestor topological point of

56

which a topological ring U originates. Finally, the normalized geodesic sum is then calculated as follows:

Gnorm (U ) =

G (U ) ? Minq∈S G(q) Maxq∈S G (q) ? Minq∈S G (q)

, U is a topological point or ring

Eq 5.11

In practice, Maxq∈S G (q ) actually equals to the maximum of all G(U) calculated above. To find Minq∈S G (q) (i.e., vertex o), we have also developed a hierarchical approach which takes around 25 iterations to converge. The pseudocode for our hierarchical approach is shown below: Function: Extract_Center_Point (Partition p) 1. 2. Find the path with furthest distance (diameter) using dijkstra on p. For each vertex on the path, calculate the distance sum from the other vertices. The one with smallest sum would be the center vertex (c) of the region.

Function: Mesh_Partitioning (Given a mesh S or a partition P, r) 1. 2. Randomly pick an unvisited vertex v on S or P. Apply the dijkstra algorithm to find a partition p with gv(t,v) ≤ r , where t ∈ S or P , and t is unvisited. 3. 4. Store p, and repeat steps 1-3, until all the vertices in S (or P) are visited. Return the collection of partitions.

Function: Hierarchical Partitioning (Given a mesh S) 1. Let r =

0.1× area(S ) , run Mesh_Paritioning (S, r)

57

2.

For each of the partitions (p) obtained i. ii. Apply Extract_Center_Point (p) to find the center vertex c. Calculate G(c) = ∫ gv(t,c)?S .

t∈S

3. 4. 5.

Find the partition p with minimum G(c) . Repeat steps1-3 using r = r / 2 until Area(p) is less than 0.005 × Area( S ) . The last c found becomes the approximated vertex with Minq∈S G (q) .

5.3.2.2

Surface Distribution – Geometric Surface data

To describe the global surface change with respect to topological points and rings, we use three feature vectors K, A, H (each of them is a 20-dimension histogram) to store the curvature, area and average distance information, respectively, with respect to its topological features as in Figure 5.6. After grouping all the vertices t ∈V with respect to geodesic distance gv(t , z) or gvs(t , R) into 20 bands (band1-20), we calculate the curvature, area and average distance in the same interval to form the three feature vectors K1-20, A1-20, H1-20, respectively.

Vertices, t, of two surface meshes are partitioned into 20 bands with respect to

gvs(t , R) from a topological ring R near the dogs’ legs (the dashed line).

Figure 5.6 Surface Mesh are partitioned into color bands.

58

To further improve the matching result, we apply Depth First Search to each band of K1-20 and H1-20 in order to identify individual components. For example, a yellow color band (in Figure 5.6) may have components at the body, at the two legs as well as the tail of the dog model. We then take average for each component using area as weight.

Normalized curvature (NC) for a component:

? ? 1 ? ? ?1 ? (1+ | K (t ) |) ? G ? ?

∑

t∈Component

Eq. 5.12

where K G (t ) is the gaussian curvature, and t is a vertex. Average distance (AD) for a component:

∑

t∈Component

Area(t ) × t ? CM Area(t )

∑

Eq. 5.13

t∈Component

where CM is the center of mass of the component, and t is a vertex. K1-20, A1-20, H1-20 is defined as follows:

Area(Component j ) Area( Band i (U ))

K i (U ) =

Component j ∈Band i (U )

∑

× NC (Component j ) Eq. 5.14

H i (U ) =

Component j ∈Band i (U )

∑

Area(Component j ) Area( Band i (U ))

× AD(Component j ) Eq. 5.15

59

Ai (U ) = ∑t∈Band (U ) Area(t )

i

Eq. 5.16

In all case,

U Band

i

=S,

U Component

j

= Band i (U )

? if U is a topological point, geodesic = g v ? c(U) c(U) Band i(U) = ?t ∈ S (i ? 1) ≤ geodesic(t,U) < (i), ? if U is a topological ring, geodesic = g vs ? n n ?

Eq. 5.17 , where c(U ) = Maxq∈S geodesic(q,U ) .

As a side note, since we are using area as weight for each component, it is not useful for A1-20 because it is already a distribution of area.

5.4.

Feature Matching

5.4.1 Distance Measure of Two Topological Characteristics

To determine the similarity value of two topological characteristics U 1 and U 2 , we use the following formula:

Dist (U 1 ,U 2 ) = W1 × | Gnorm (U 1 ) ? Gnorm (U 2 ) | + W2 × L2,norm ( K (U 1 ), K (U 2 ))

+ W3 × L2,norm ( A(U 1 ), A(U 2 )) + W4 × L2,norm ( H (U 1 ), H (U 2 ))

Eq. 5.18

where U are the topological points or rings and W1, W2, W3 and W4 are ratios such that W1 + W2 + W3 + W4 = 1 .

5.4.2 Distance Measure of Two Models

Since we have a set of topological characteristics as model features and a distance

60

measure between them, we can use Earth Mover Distance (EMD) [Rubner98] to compute the distance between two feature sets (signatures). Earth Mover Distance is a distance measure which calculates minimum amount of work that is required to transform one signature into another. By letting EMD weight equals to the effective area and letting the signature distance be Dist(), we can compute the distance between two models.

5.5.

Experimental Results

In our experimental database, there are 150 different models which are of different groups and postures. To test the invariant properties of our method towards rotation and scaling, we prepare 3 additional sets by rotating against xy-axis, random scaling between (1.0, 2.0], and rotating by yz-axis plus random scaling to produce a total of 600 models. We then manually categorize these models into 13 groups. All of them are shown in Figure 5.7. Tables 5.1 and 5.2 show some of our matching results. We can see that our method can distinguish models based on their skeletons and shapes. For example in Table 5.1, boy, frog and dolphin are totally different models in term of skeletons. Their similarities are shown to have high contrast. In Table 5.2, boy, girl and baby models are similar models in term of skeleton, but different models in term of shapes. Our method can still distinguish them but with a smaller similarity difference. All these match human intuition correctly. Note that the similarity values are taken using all models and the value is normalized by maximum and minimum work done. Figure 5.8 shows the precision and recall graph of our method. It demonstrates that our method outperforms Geometry (D2) [Osada01] (feature size: 72) and Frequency

61

(Fourier) [Vranic01b] (feature size: 21) which are shown to have good performance in [Lau02]. On method can achieve a high accuracy with precision of 0.77 even at a recall rate of 1.0. On the other hand, it also reviews that our method is capable of handling deformable models while [Osada01, Vranic01b] cannot.

dog

boy

cat

baby

dino

raptor

female

frog

dino-pet

horse

dolphin

wolf

hand

Figure 5.7 13 model groups in our model database.

62

Table 5.1 Mean Similarity of non-similar skeleton models.

Boys Boys Frogs Dolphins 1 0.430 0.127

Frogs 0.430 1 0

Dolphins 0.127 0 1

Table 5.2 Mean Similarity of similar skeleton models.

Boys Boys Girls Babies 1 0.840 0.592

Girls 0.840 1 0.551

Babies 0.592 0.551 1

Figure 5.8 Precision and recall graph.

63

5.6.

Invariance Properties

In this section, we analytically show that our feature extraction process is invariant to rotation, scaling and translation. There are two kinds of features used in our method: topological and geometrical features. Topological points and rings are extracted based on Level Set Diagram and are parts of skeletons. Skeletons are invariant to rotation, scaling and translation because they are independent of the coordinate system. As a result, given the same but rotated, scaled or translated 3D model, the extraction of these topological points and rings will not be affected. There are mainly two kinds of geometrical features that we use: the normalized geodesic sum and distribution of curvature, area or average distance. To check if the normalized geodesic sum is invariant to rotation, scaling and translation, we can consider the invariance of the surface center first. Surface center is the location which has smallest average geodesic distance to all the vertices on the surface. (Analogically, the center of mass is the location which has the smallest average Euclidean distance to all the other vertices in Euclidean space.) Therefore, rotation, scaling and translation of the model will not affect the location of the surface center (as well as its center of mass). Normalized geodesic sum is the measure that calculates the spatial distance from the surface center. As seen, rotation and translation does not change such spatial distance. For uniform scaling, the spatial distance may be affected. However, because the geodesic sum is normalized by the maximum and minimum geodesic sums to the range of [0,1], uniform scaling again does not affect this geometric property.

64

To check the invariance of the geometric distribution, we can check if the partitioning of the surfaces can be affected by rotation, scaling and translation. Again, the partitioning is based on the maximum geodesic distance that is measured from the topological characteristics (points or rings). Since each partition is proportionally and equally spaced, rotation, scaling and translation will not affect such partitioning and the features remain unchanged. In generally, both topological and geometric information are not affected by the deformation of the 3D models. First, topological characteristic is part of the skeletal information, and so, it is not affected by the model deformation. Second, for geometric features, normalized geodesic sum and distribution are based on geodesic distance. And since geodesic distance is the distance on the surface, these geometric features will not be affected by model deformation also.

Query

Query

Query

0.588

0.344

0.343

0.517

0.471

0.411

0.564

0.512

0.271

0.338

0.309

0.246

0.345

0.308

0.303

0.188

0.181

0.179

0.245

0.227

0.216

0.287

0.280

0.273

0.177

0.173

0.172

Figure 5.9 Similarity values of some query examples.

65

66

Chapter 6 Evaluation of Bi-Directional LSD Analysis

6.1. Introduction

In previous chapters, we have carried out some experiments and analyses to test the performance of our proposed methods. In those experiments, our topology-based approach in Chapter 5 outperforms [Osada01] and [Vranic01b] where [Osada01] and [Vranic01b] are geometry-based and transform-based methods, respectively. In this chapter, we try to compare our method in Chapter 5 with MRG (Multi-Resolution Reeb Graph) [Hiliga01], with both of them being topology-based methods. After that, we will also give a brief comparison between the two methods that we propose in Chapters 4 and 5. In the following discussion, we will refer to our method discussed in Chapter 4 as the “MLSD” (Modified LSD feature extraction) method and the method discussed in Chapter 5 as the “BDLA” (Bi-Directional LSD Analysis) method.

6.2. Comparison between Bi-Directional LSD Analysis and MRG

Among all the topology-based methods, MRG [Hiliga01] is the earliest work that can handle deformable models. It first splits a 3D model into regular intervals by measuring the sum of geodesic distance. Then, by identifying the individual components within an interval, a multi-resolution reeb graph is built hierarchically to represent the 3D model. In each node, area and length is proposed as geometry data

67

for matching. The main difference between MRG and Bi-Directional LSD Analysis is that our method does not extract features from the whole mesh surface. We focus on topologically important locations only. Moreover, by using additional high-dimensional geometry data, we believe that our method can distinguish models of similar and dissimilar skeletons. In this section, we are going to give a comparison between MRG and BDLA. Our evaluation focuses mainly on two issues: accuracy and speed.

6.2.1 Performance Evaluation

I. Precision and Recall Graph

In Figure 6.1, we have plotted the mean precision and recall of several methods. In particular, we focus on the first two PR curves, BDLA and MRG. The experiment is carried out using datasets described in Section 5.5. From the graph, we can see that BDLA and MRG work equally well when recall is below 0.4. When recall is above 0.4, BDLA outperforms MRG in precision. The increase in performance can be reasoned by the use of additional geometric high-dimension features. In MRG, only local features (area and length) are registered in each node of the reeb graph. During node matching, it is difficult to tell the difference between the overall shapes of the models. This is especially the case when the skeletons of models are similar, for example, dogs, wolves, babies and boys. When the skeletons are similar, the geometric features are the only features that can discriminate them. To tell the difference, BDLA tries to use additional geometric features. Area, curvature and average distance information are especially computed for such purpose. For the case of babies and boys, we can observe that the bodies of boys are relatively longer than

68

those of the babies.

Figure 6.1 Performance comparison between our methods and other methods.

For the case of dogs and wolves, the bodies and ears of the wolves are fatter and shaper, respectively, than those of the dogs. All these differences alter the distributions of area, curvature and average distance and are reflected in the similarity values as shown in Table 6.1. To further verify our claim, we plot the individual precision and recall graph for each group. Some examples are shown in Figure 6.2. We observe the following performance differences in groups, where “>”, “ ≈ ”, “<” mean “performs better”, “performs equally well” and “performs poorer”, respectively.

69

Performance Model Groups

BDLA > MRG Raptors, Dolphins, Dogs, Dinopets, Dinos, Babies, Wolf

BDLA ≈ MRG Hands, Girls, Frogs, Cats, Boys

BDLA < MRG Horses

We observe that BDLA performs better or equally well as MRG in all the model groups, except for the Horse group. As shown in Figure 6.2, our method performs better than MRG for distinguishing similar and dissimilar skeletons models.

Table 6.1 Mean Similarity of Model Groups (BDLA). baby baby boy girl dog wolf cat horse boy girl dog wolf cat horse

1 0.60 0.55 0.27 0.28 0.27 0.14

0.59 1 0.84 0.12 0.20 0.19 0.12

0.55 0.84 1 0.05 0.12 0.14 0.09

0.27 0.12 0.05 1 0.66 0.39 0.34

0.28 0.20 0.12 0.66 1 0.54 0.51

0.27 0.19 0.14 0.39 0.54 1 0.40

0.14 0.12 0.09 0.34 0.51 0.40 1

To account for the performance drop in the Horse group, we find that it is caused by the intrinsic “slicing direction” problem of LSD. Considering the two horse models in Figure 6.3, we can see that the neck of the right horse is extracted as a segment region while there is none in the left one. The difference is due to the two different starting minima used in our BDLA. In the left horse, the two minima are located at the tail and the horse’s head. In the right horse, the two minima are located at the tail and at the front leg. Since one of the seeds in the right horse is located at the front leg, a saddle can be found at the back of the neck, and this additional saddle point makes

70

the difference. According to [Mortara02], the “slicing direction” can be solved by using multiple seeds. We believe that by applying more starting seeds (say three seeds in total) in our BDLA, the issue can be handled. As a side note, though the problem occurs in the horse model, the performance drop is slight. We believe that our method is fast and reliable in determining topological characteristics. On the other hand, we are still investigating for a more reliable and efficient topology or skeleton extraction algorithm that performs better and faster than LSD.

Babies

Boys

Horses

Dogs

Wolves

Cats

Figure 6.2 Precision and recall graphs for some individual model groups.

71

Figure 6.3 The neck of the horse suffers from slicing direction problem.

II. Similarity Matrix

In Figure 6.4, we have plotted the mean similarity matrix of BDLA and MRG. The matrix is normalized by the maximum and minimum values for each group and we use grayscale to represent the normalized similarity value. The darker the color in the similarity matrix, the higher the similarity between the corresponding groups. To give a better understanding of the MRG similarity matrix, we also rearrange the order for some of the model groups. These groups are shown in italic and underlined fonts in Figure 6.4. From the similarity matrix, we can observe three things: BDLA MRG

Figure 6.4 Mean Similarity Matrix of BDLA and MRG.

72

1.

BDLA performs as good as MRG in identifying models of the same group. It can be reviewed from the diagonal lines of the two similarity matrices. All the groups can identify their corresponding classes with the highest similarity values comparatively. This leaves the diagonal lines the darkest in color.

2.

BDLA gives a higher similarity contrast between dissimilar models. This can be seen from the circled regions in Figure 6.4. In the circled regions, BDLA gives a lighter color than MRG does. This means, BDLA can give a higher similarity contrast within these regions. The regions actually correspond to the similarity values of dinosaurs, human beings against 4-legs animals. Note that although we have changed the group order of the MRG’s similarity matrix, the circled regions corresponds to the same model groups. The reason that BDLA can give higher contrast is the use of both topological and geometric high-dimension features. During models matching, the difference in the number of topological features and their corresponding importances require greater work done for Earth Mover Distance to rearrange the distribution. Together with the geometric features discussed in the previous section, BDLA can thus discriminate models of dissimilar skeleton models with high contrast. While for MRG, the similarity measure is based on reeb graph matching only. For dissimilar skeleton models, the differences can only be captured at the very low level of the reeb graph using low-dimension geometric features. These features (area and length) are relatively local and may not tell the overall shape of a model. As a result, it is relatively difficult to tell the difference compared to BDLA.

3.

BDLA gives better matching results for similar skeleton models when compared

73

to MRG. There are two observations we can make. First, MRG considers the hand models to be similar to many 4-leg animals like, dogs, wolves and horses, while BDLA can discriminate the hands from these models. Second, we can see that MRG again considers cats, dino and dolphins to be very similar. However, they are different models by human inspection, and BDLA can discriminate them easily. In the above cases, we can see that these mismatched models are similar in term of skeletons. For example, dogs, wolves, cats and horses are 4-leg animals with tails. Dolphin has 4 side fins and a back fin, while human hands have 5 fingers. If we consider only topological information, it is difficult to tell the difference. For example, in BDLA, the number of topological characteristics would be the same. To obtain a better matching performance, BDLA uses additional high-dimension geometric features and weights of importance to describe the overall shape of these models, as contrast to the low-dimension geometric features used in MRG node matching. This contributes to the reason for BDLA outperforming MRG.

6.2.2 Complexity & Time Analysis

Apart from performance analysis, we have also investigated the time complexity of difference processes of BDLA.

Feature Extraction

Table 6.2 shows all the steps and summarizes the corresponding complexity calculations. For MV_Clusters, PR and SR extraction, we adapt the dijkstra

74

algorithm in practical coding. For each extraction of mv_cluster, protrusion and segment region, our algorithm stops when the cluster or region is identified. Since we do not extract visited vertices, only boundary vertices of each region may be visited more than once. The algorithm will process all the vertices and its complexity is slightly higher than O(n log n + e) but bounded by R × O(n log n + e) where R ≈ 3 is a constant. In order to find the minimum sum of geodesic, we have also designed a hierarchical approach which hasη × O(n log n + e) andη ≤ 25 in our experiment. For geodesic sum calculation, it takes ? × O(n log n + e) for all the ? topological points and rings. In practice, we pre-calculate all sum of geodesic during the PR and SR extraction. This allows us to heap sort the values afterward. In general, the whole BDLA feature extraction algorithm takes (4 + R + ? + η ) × O(n log n + e) . Since

(4 + R) is roughly a constant, the complexity of the whole algorithm depends on

η + ? , “the number of geodesic sum calculations”.

Table 6.2 Complexity Analysis of BDLA.

Topological Feature Extraction Modified LSD MV_Clusters , PR & SR Extraction Heap sort for SR extraction 4 × [O(n log n + e) + λs × O(nv + ne )] , nv + ne << n < R × O(n log n + e) , where R is a constant << n . In our experiment R ≈ 3 O( ? log ? ) , where ? << n

Geometric Feature Extraction Geodesic Sum Hierarchical Approach Area Weight

? × O(n log n + e) η × O(n log n + e) , η ≤ 25

O ( n)

75

Area, Curvature, Average Distance Distribution Overall Feature Extraction Notations:

? × [O(n) + O(n + e)]

(4 + R + ? + η ) × O(n log n + e)

?

n e

λs

nv , ne R

The number of topological characteristics found = Number of topological points + rings The number of vertices in the models The number of edges in the models The number of saddles encountered in MLSD The number of vertices and edges visited during DFS traversal in MLSD A constant

Similar to BDLA, MRG also requires the computation of geodesic sums for interval partitioning. It first samples a large number of seeds regularly over the model surface. Then, it calculates the geodesic sums and interpolates the values over other vertices. In order to obtain a good approximation for interval partitioning, the number of seeds required is usually over 130 (Table 6.3). Compared to MRG, BDLA runs much faster because we do not need to sample large number of seeds on the model surface. On the other hand, we limit the geodesic calculation to the topologically important locations only. (i.e., η + ? ≈ 55 ) Since geodesic sum calculations dominate the time required for the feature extraction process, by doing so, we greatly reduce the time required for this process. In Table 6.3, we show the time required for feature extraction using BDLA and MRG as a comparison. As we observe from the table, BDLA runs a lot faster than MRG. The time that is required by MRG is 4.5 times that of BDLA.

76

Feature Matching

To compare the features of two models, we apply the Earth Mover Distance (EMD). A theoretical computation analysis on the complexity of Earth Mover Distance is hard [Rubner00] because it is based on simplex algorithm. However, according to [Rubner00], if EMD is formulated as a bipartite graph problem with signatures of the same size, the time complexity is roughly equals to O(n 3 log n) where n is the number of topological characteristics. Compared to MRG, which has an overall complexity of O(nr × (mr + nr )) , BDLA has a higher complexity. However, as shown in Table 6.4, matching time for BDLA is

9ms = 15 times faster than that of MRG. From our 0.6ms

observation, the speedy performance of BDLA is resulted from the small number of topological characteristics, n, required. On average, n = 30 while the number of r-nodes nr = 551. It should be noted that the number of topological characteristics in BDLA are dependent on the model’s topology only, and is invariant to tessellation. As seen, even though BDLA has a higher complexity, the actual number of ground distance calculations is limited and relatively smaller compared to MRG. The total number of ground distance computations required is approximately equal to: For BDLA (n = 30) : (30) × log(30) ≈ 40000

3

For MRG (nr = 551) : (551) × (551 + 551) = 607202

so

40000 1 ≈ 607202 15

77

Table 6.3 Time analysis of BDLA and MRG for feature extraction.

BDLA Average number of vertices / faces per model Average number of geodesic sum calculated (per model) Average time for all geodesic calculation (per model) Average total time required for feature extraction per model Machine 8598 / 17192

MRG 8598 / 17192

55

132

14.74 s

65.64 s

25.88 s

68.12 s

Pentium 4 - 2.4GHz / 1G RAM

Table 6.4 Time analysis of BDLA and MRG for feature matching.

BDLA Average number of topological features (per model) Average total time required for matching one model Average total time required for one query Total time required for our retrieval experiments (600 queries) Machine 30 points and rings

MRG 551 r-nodes

1 ms

15 ms

0.6s

9s

8.62 min

92.86 min

Pentium 4 - 2.4GHz / 1G RAM

78

6.3. Comparison between the two Proposed Method: Modified Level Set Diagram (MLSD) and Bi-Directional LSD Analysis (BDLA)

From Figure 6.1, we can see that BDLA outperforms MLSD in accuracy because BDLA gives a better precision recall curve. The increased performance is mainly due to the use of additional high-dimension geometric features. In MLSD, only one high-dimension geometric feature (curvature distribution) is used. However, in BDLA, there are three high-dimension geometric features (curvature, area and average distance distribution). On the other hand, from Table 6.5, we can see that MLSD outperforms BDLA in time requirement. In the feature extraction process and the matching process, MLSD is 1.2 and 2.78 times, respectively, faster than BDLA. This can be explained by the use of only one LSD analysis in MLSD and the features extracted in MLSD are much smaller than those of BDLA. To conclude, though MLSD performs less accurate than BDLA and MRG, its performance is still relatively better than those of the other two geometric [Osada01] and transform-based [Vranic01b] methods. MLSD extract features based on skeleton alignment, and so it can handle deformable models where [Osada01] and [Vranic01b] cannot. However, MLSD has a much lower computational cost as compared with BDLA and MRG. This means that MLSD can be used as initial retrieval filtering to speed up the whole retrieval process. Another advantage of MLSD is that it can support genus models. For BDLA, since our major concern focuses on protrusion extraction and targets at articulated models, it currently cannot support genus models. Table 6.6, we summarize the comparison between MLSD and BDLA.

79

Table 6.5 Time analysis of BDLA and MLSD for feature extraction & matching.

BDLA Average Number of features Average Total Time required for feature extraction per model Average time for matching one model Average total time for one query Total time for 600 queries (Our Experiment) Machine 30 points and rings

MLSD 26 topological points

25.88s

21.29s

1 ms 0.6 s 8.62 min

0.359 ms 0.2154 s 2.15 min

Pentium 4 - 2.4GHz / 1G RAM

Table 6.6 Comparison between MLSD and BDLA.

MLSD Feature Extraction Feature Matching Evaluation Others Topological Geometrical Similarity Measure Accuracy Performance Genus Topological Points 1x High-Dimension Features Bipartite Matching - Depends on number of features Less Accurate Around 2 times faster LSD (support)

BDLA Topological Points, Rings 3x High-Dimension Features Earth Mover Distance - Depends on total weight More Accurate Future Work

6.4. Discussions Strengths

In this section, we have evaluated the performance of BDLA and MRG in term of accuracy and speed. Regarding to the accuracy, we can see that BDLA works equally

80

well or even outperforms MRG on most of the model groups. This can be explained by the use of high-dimension topological and geometric features in BDLA. In MRG, though reeb graph implicitly stores the topological information, the low-dimension geometric features (area and length) are insufficient for discriminating different models of similar skeletons. As a result, BDLA performs better than MRG in discriminating different models of similar and dissimilar skeletons. The high performance of BDLA also indicates that the uses of topological characteristics (points and rings) are appropriate for representing deformable models. These features are invariant to rotation, scaling, transformation and are not affected by model deformation. BDLA also outperforms MRG in term of speed. In the feature extraction process, BDLA tries to reduce the geodesic calculations to topologically important (skeletal) locations only. This change greatly cuts down the time required for feature extraction because geodesic calculation is the dominant part for topological analysis. As a result, our feature extraction process is 4.5 times faster than that of MRG. In the feature matching task, BDLA also outperforms MRG. Though the complexity of BDLA’s matching algorithm (EMD) is higher than that of MRG, BDLA runs a lot faster than MRG. The reason is solely because of the limited topological characteristics required in BDLA. As the feature size is small, BDLA runs extremely fast. It is 15 times faster than MRG. We believe that the high accuracy and speed of BDLA make it a competitive candidate for online deformable models retrieval system.

Weaknesses

Though BDLA outperforms MRG in both accuracy and speed, there are several issues that we have not explored yet. First, BDLA currently cannot support

81

non-manifold, non-closed models or work poorly on models with genus number greater than zero. This is due to the use of Level Set Diagram (LSD) because LSD cannot handle or works poorly on such models. On the other hand, although Modified LSD has greatly reduced the noise that appears in LSD, Modified LSD, similar to LSD, works best on tubular-shaped model only. Note that because we only consider neighboring vertices within the same waterfronts in MLSD, the features may still be affected by the noise problem if the models are finely tessellated.

Summary

In conclusion, our experience in developing a retrieval system that can support all types of models (such as non-manifold, non-closed, disjoint models, polygon soap and point cloud models) while being fast and accurate is extremely difficult. Though our method, BDLA, is not as practical as MRG when handling some of these ill-defined models, the research idea of combining topological characteristics (topological points and rings), geometric features (spatial locations,

high-dimensional geometric features) and weights are novel, and can be extended for future research analysis.

82

Chapter 7 Conclusion and Future Work

7.1. Conclusion

This thesis examines the matching and retrieval techniques of 3D deformable models. Particularly, we focus ourselves in designing a novel feature extraction and matching algorithm. Our method can be classified as a new topology-based method. In the following section, we summarize our research contributions as follows: 1. We are the first researchers to purpose the use of topological points and rings as features to represent a 3D deformable model. This differs from traditional topology-based methods that use reeb graph [Hiliga01], explicit skeleton [Sundar03, Biasotti03] or components [Tal04] as topological features. 2. Apart from topological features, this is also the earliest work to stress on using high-dimension geometric features to distinguish models of similar and dissimilar skeletons. This not only differs from traditional topology-based methods that concern solely the matching of dissimilar skeletons models [Sundar03, Biasotti03, Tal04], but also give a better matching performance than MRG [Hiliga01] which uses only low-dimension local features. Among all the topology-based methods, none of them shows 3D retrieval results in their experiment. We are also the first work that can show our retrieval results and compare it with other methods. 3. Since we use topological points and rings as features, we also introduce the use

83

of Earth Mover Distance as the distance measure. Earth Mover Distance is an algorithm for flow and transportation problem. Our work is the first project to show that matching of topological entities is not necessary accompanied by skeleton or graph matching algorithms. On the other hand, by using additional high-dimension geometric features and weights to describe the importance of nodes, the matching can be modeled as a flow and transportation problem. We believe that the idea of node weighting can be applied to other skeleton or graph matching algorithm for topological analysis. 4. We have introduced the use of Modified LSD for reliable topological point extraction. Though it is used as a feature extraction process here, we believe that it is also useful for many other applications which require reliable skeleton extraction. On the other hand, Bi-Directional LSD Analysis can also be extended for model segmentation. 5. Above all, we have introduced in this thesis a novel matching and retrieval method for 3D deformable models. The features we have extracted are invariant to scaling, translation and rotation. From our experimental results, we have shown that our method outperforms MRG [Hiliga01] in both accuracy and speed. This also opposes the general idea that topology-based methods are usually slow in practice. To conclude, our content-based feature extraction and matching algorithm is fast and efficient. The features extracted are invariant to scaling, translation and rotation. Since the features are not affected by model deformation, our method can handle deformable models. To solve the noise problem of LSD, we have designed and applied the Modified LSD to filter noise. In our feature matching process that uses

84

Earth Mover Distance, we also apply small weightings towards noise and small features. All these measures help stabilize our similarity measure. Judging from our experiment results, we can say that we have met the project goals, which we have set out in Chapter 3.

7.2. Future Work

There are many future directions for our 3D deformable model retrieval system. According to their natures, we group them into two parts:

z

Improvements to the existing system First, our feature extraction process can be improved. Referring to Section 6.2.1 (I), though Bi-Directional LSD Analysis performs very well in 12 out of 13 groups of models, it fails to extract the “neck” features from some of the horses models. The failure is caused by the intrinsic “slicing direction problem” of LSD. We believe that by adding more starting seeds to our algorithm [Mortara02], the problem can be solved. On the other hand, we may also consider designing a new skeleton extraction algorithm, or re-using other slower methods like mesh segmentation [Katz03] algorithms, to extract topological points and rings. Second, we may apply weighting to each component of our high-dimension geometric features. For example, a component in our geometric distribution that is far away from the topological characteristic may have less influence than a nearby one. Weightings may tell the importance and also improve the matching result. L2-Norm is currently used to calculate the distance between two

85

geometric distributions. We believe that by applying elastic matching [Ohbuchi02] instead of L2-Norm, the matching results may be furthered improved.

z

Extension of current work One of the major problems of a retrieval system is its reduction in efficiency towards large database. This is especially essential to our system because retrieval of topological features is slow when the datasets grow. To handle this, a general approach is to index features in a spatial database and then use an indexing algorithm (e.g. r-tree) for fast retrieval. Another approach is to design a matching algorithm that obeys triangle-inequality and then use an indexing algorithm for fast retrieval. For the first approach, it is very difficult for us to transform our existing features (sets of points and sets of distributions) into a point in the feature space of spatial database. However, for the second approach, there is room for us to research. In our feature matching process, we adapt the Earth Mover Distance (EMD) as the distance measure. If we can show that EMD together with our ground distance follows the triangle inequality, it is possible for us to build a fast indexing algorithm for retrieving deformable models. Another major concern of a 3D model retrieval system is the input data format. Non-manifold, ill-defined polygon clouds are only some of the poor 3D data formats most researchers try to avoid. In our system, we assume that all models are of proper formats. In the future, we will extend our algorithm to accept this kind of ill-defined models. One of the approaches is to apply mesh repairing techniques to the input model [Nooruddin03] before actual processing.

86

To conclude, in this work, we have presented our Bi-Directional LSD Analysis for 3D deformable model matching and retrieval. Though the purposed method is fast and accurate compared to other methods, we believe that we have only scratched the surface of the current research area. A lot of improvements, ideas and new directions are awaiting us to discover and explore.

87

References

[Alarcon02]

P. d. Alarcon, A. D. Pascual-Montano, and J. M. Carazo, "Spin Images and Neural Networks for Efficient Content-Based Retrieval in 3D Object Databases", Proc. of CIVR 2002, pp. 225-234, 2002.

[Bespalov03]

D. Bespalov, A. Shokoufandeh, W. Regli, and W. Sun, "Scale-Space Representation of 3D Models and Topological Matching", Proc. of ACM/SIGGRAPH Symp. on Solid Modeling and Applications, Jun 2003.

[Biasotti03]

S. Biasotti, S. Marini, M. Mortara, G. Patane, M. Spagnuolo, and B. Falcidieno, "3D Shape Matching through Topological Structures", Lecture Notes in Computer Science, vol. 2886, pp. 194-203, 2003.

[Chen02]

D. Chen and M. Ouhyoung, "A 3D Object Retrieval System Based on Multi-Resolution Reeb Graph", Proc. of Computer Graphics Workshop, pp. 16, Jun 2002.

[Chen03]

D. Chen, X. Tian, Y. Shen, and M. Ouhyoung, "On Visual Similarity Based 3D Model Retrieval", Proc. of Eurographics, Sept. 2003.

[Cheung03]

K. Cheung, H.S. Wong, and H. Ip, "3D Graphical Model Retrieval by Evolutionary Computation-based Adaptive Histogram Binning", Proc. of Intl. Conf. on Visual Information Systems (VISual'2003), 2003.

[Chuang02]

J. Chuang, C. Tsai, and M. Ko, "Skeletonization of three-dimensional object using generalized potential field", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1241-1251, Nov 2002.

[Elad01]

M. Elad, A. Tal, and S. Ar, "Content Based Retrieval of VRML Objects An Iterative and Interactive Approach", Proc. of EG Multimedia, pp. 97-108, Sept 2001.

[Funkhouser03]

T. Funkhouser, P. Min, M. Kazhdan, J. Chen, A. Halderman, D. Dobkin, and D. Jacobs, "A Search Engine for 3D Models", ACM Transactions on Graphics, 22(1):83-105, Jan 2003.

[Gagvani99]

N. Gagvani and D. Silver, "Parameter Controlled Volume Thinning",

88

Graphical Models and Image Processing, 61(3): 149-164, 1999. [Hart99] J. Hart, "Computational Topology for Shape Modeling", Proc. of Shape Modeling International, pp. 36-45, 1999. [Hilaga01] M. Hilaga, Y. Shinagawa, T. Kohmura, and T. Kunii, "Topology Matching for Fully Automatic Similarity Estimation of 3D Shapes", Proc. of SIGGRAPH 2001, pp. 203-212, Aug 2001. [Horn84] B. Horn, "Extended Gaussian images", Proc. of IEEE,

72(12):1671-1686, Dec 1984. [Ip02a] C. Ip, D. Lapadat, L. Sieger, and W. C. Regli, "Using Shape Distributions to Compare Solid Models", Proc. of ACM/SIGGRAPH Symp. on Solid Modeling and Applications, Jul 2002. [Ip02b] H. Ip and W. Wong, "3D Head Model Retrieval Based on Hierarchical Facial Region Similarity", Proc. of International Conference on Visual Interface (VI2002), pp. 314-319, May 2002. [Katz03] S. Katz and A. Tal, "Hierarchical Mesh Decomposition using Fuzzy Clustering and Cuts", Proc. of SIGGRAPH 2003, pp. 954-961, July 2003. [Kazhdan03a] M. Kazhdan, B. Chazelle, D. Dobkin, T. Funkhouser, and S. Rusinkiewicz, "A Reflective Symmetry Descriptor for 3D Models", Algorithmica, 2003. [Kazhdan03b] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, "Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors", Proc. of Symp. on Geometry Processing, 2003. [Kazhdan04] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, "Shape Matching and Anisotropy", Proc. of SIGGRAPH 2004, Aug 2004. [Keim99] D. A. Keim, "Efficient Geometry-based Similarity Search of 3D Spatial Databases", Proc. of SIGMOD, pp. 419-430, 1999. [Kimmel98] R. Kimmel and J. A. Sethian, "Computing Geodesic Paths on Manifolds", Proc. of National Academy of Sciences, 95(15): 8431-8435, Jul 1998.

89

[Kortgen03]

M. Kortgen, G. Park, M. Novotni, and R. Klein, "3D Shape Matching with 3D Shape Contexts", Proc. of Central European Seminar on Computer Graphics, Apr 2003.

[Lau02]

R.W.H. Lau and B. Wong, "Web-Based 3D Geometry Model Retrieval", World Wide Web, 5(2):193-206, 2002.

[Lazarus99]

F. Lazarus and A. Verroust, "Level Set Diagrams of Polyhedral Objects", Proc. of ACM/SIGGRAPH Symp. on Solid Modeling and Applications, 1999.

[Leifman03]

G. Leifman, S. Katz, A. Tal, and R. Meir, "Signatures of 3D Models for Retrieval", Proc. of The Israel-Korea Bi-National Conference on Geometric Modeling and Computer Graphics, pp. 159-163, Feb 2003.

[Li01]

X. Li, T. W. Toon, and Z. Huang, "Decomposing polygon meshes for interactive applications", Proc. of Symp. on Interactive 3D graphics, 2001.

[Liu03]

X. Liu, R. Sun, S. Kang, and H. Shum, "Directional Histogram Model for Three-Dimensional Shape Similarity", Proc. of Conf. on Computer Vision and Pattern Recognition, Jun 2003.

[Manzanera99]

A. Manzanera, T. Bernard, F. Preteux, and B. Longuet, "Medial Faces from a Concise 3D Thinning Algorithm", Proc. of Int'l Conf. on Computer Vision, pp. 337-343, Sep 1999.

[Mehlhorn99]

K. Mehlhorn and S. Naher, “LEDA: A Platform for Combinatorial and Geometric Computing”, Cambridge University Press, 1999.

[Meyer02]

M. Meyer, M. Desbrun, P. Schroder, and A. Barr, "Discrete Differential-Geometry Operators for Triangulated 2-Manifolds", Proc. of VisMath, May 2002.

[Min02]

P. Min, J. Chen, and T. Funkhouser, "A 2D Sketch Interface for a 3D Model Search Engine", Proc. of SIGGRAPH 2002 Technical Sketches, pp. 138, Jul 2002.

[Min03]

P. Min, J. Halderman, M. Kazhdan, and T. Funkhouser, "Early Experiences with a 3D Model Search Engine", Proc. of Symp. On Web3D, pp. 7-18, Mar 2003.

90

[Mortara02]

M. Mortara and G. Patane, "Affine-invariant Skeleton of 3D Shapes", Proc. of Shape Modeling International, 2002.

[Nooruddin03]

F. Nooruddin and G. Turk, "Simplification and Repair of Polygonal Models Using Volumetric Techniques", IEEE Transactions on Visualization and Computer Graphics, 9(2): 191-205, Apr 2003.

[Novotni01]

M. Novotni and R. Klein, "A Geometric Approach to 3D Object Comparison", Proc. of Int’l Conf. on Shape Modeling and Applications, pp. 167-175, May 2001.

[Novotni02]

M. Novotni and R. Klein, "Computing Geodesic Distances on Triangular Meshes", Proc. of The Int’l Conf. in Central Europe on Computer Graphics, Visualization and Computer Vision'2002, Feb 2002.

[Novotni03]

M. Novotni and R. Klein, "3D Zernike Descriptors for Content Based Shape Retrieval", Proc. of ACM/SIGGRAPH Symp. on Solid Modeling and Applications, Jun 2003.

[Ohbuchi02]

R. Ohbuchi, T. Otagiri, M. Ibato, and T. Takei, "Shape-Similarity Search of Three-Dimensional Models Using Parameterized Statistics", Proc. of the Pacific Graphics 2002, pp. 265-274, Oct 2002.

[Ohbuchi03a]

R. Ohbuchi, T. Minamitani, and T. Takei, "Shape Similarity Search of 3D Models by Using Enhanced Shape Functions", Proc. of Theory and Practice in Computer Graphics, 2003.

[Ohbuchi03b]

R. Ohbuchi and T. Takei, "Shape-Similarity Comparison of 3D Shapes Using Alpha Shapes", Proc. of Pacific Graphics 2003, Oct 2003.

[Ohbuchi03c]

R. Ohbuchi, M. Nakazawa, and T. Takei, "Retrieving 3D Shapes Based On Their Appearance", Proc. of ACM SIGMM Workshop on Multimedia Information Retrieval, Nov 2003.

[Osada01]

R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Matching 3D Models with Shape Distributions", Proc. of Shape Modeling International, May 2001.

[Paquet98]

E. Paquet and M. Rioux, "Content-Based Access of VRML Libraries", Proc. of Multimedia Information Analysis and Retrieval 1998, pp. 20-32, 1998.

91

[Rubner98]

Y. Rubner, C. Tomasi, and L. Guibas, "A metric for distributions with applications to image databases", Proc. of the IEEE Int’l Conf. on Computer Vision, pp. 59-66, Jan 1998.

[Rubner00]

Y. Rubner, C. Tomasi, and L. Guibas, "The Earth Mover's Distance as a Metric for Image Retrieval", Journal of Computer Vision, 40(2):99-121, Nov 2000.

[Shilane04]

P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser, "The Princeton Shape Benchmark", Proc. of Shape Modeling International, Jun 2004.

[Shum96]

H. Y. Shum, M. Hebert, and K. Ikeuchi, "On 3D Shape Similarity", Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR '96), pp. 526-531, Jun 1996.

[Steiner02]

D. Steiner and A. Fischer, "Cutting 3D freeform objects with genus-n into single boundary surfaces using topological graphs", Proc. of the ACM/SIGGRAPH Symp. on Solid Modeling and Applications, Jun 2002.

[Sundar03]

H. Sundar, D. Silver, N. Gagvani, and S. Dickinson, "Skeleton Based Shape Matching and Retrieval", Proc. of Shape Modeling and Applications Conf., May 2003.

[Tal04]

A. Tal and E. Zuckerberger, "Mesh Retrieval by Components", Technical Report, Faculty of Electrical Engineering, Technion, CCIT-475, EE-2004, Mar 2004.

[Tam04a]

K.L. Tam, R.W.H. Lau, and C.W. Ngo, "Deformable Geometry Model Matching Using Bipartite Graph", Proc. of Computer Graphics International (CGI), pp. 335-342, Jun 2004.

[Tam04b]

K.L. Tam, R.W.H. Lau, and C.W. Ngo, "Deformable Geometry Model Matching by Topological and Geometric Signatures", Proc. of Int'l Conf. on Pattern Recognition (ICPR), Vol. 3, pp. 910-913, Aug 2004.

[Tangelder03]

J. Tangelder and R. Veltkamp, "Polyhedral Model Retrieval Using Weighted Point Sets", Proc. of Shape Modeling International, 2003.

[Vandeborre02]

J. Vandeborre, V. Couillet, and M. Daoudi, "A Practical Approach for 3D Model Indexing by combining Local and Global Invariants", Proc. of Int’l Symp. on 3D Data Processing Visualization and Transmission

92

(3DPVT 2002), Jun 2002. [Vranic00] D. V. Vranic and D. Saupe, "3D Model Retrieval", Proc. of the Spring Conf. on Computer Graphics and its Applications (SCCG2000), pp. 89-93, May 2000. [Vranic01a] D. Vranic, D. Saupe, and J. Richter, "Tools for 3D-object Retrieval: Karhunen-Loeve Transform and Spherical Harmonics", Proc. of the IEEE 2001 Workshop Multimedia Signal Processing, pp. 293-298, Oct 2001. [Vranic01b] D. Vranic and D. Saupe, "3D Shape Descriptor Based on 3D Fourier Transform", Proc. of ECMCS 2001, pp. 271-274, Sep 2001. [Vranic02] D. Vranic and D. Saupe, "Description of 3D-Shape Using a Complex Function on the Sphere", Proc. of the IEEE Int’l Conf. on Multimedia and Expo (ICME 2002), pp. 177-180, Aug 2002. [Wong03] H. Wong, K. Cheung, and H. Ip, "An evolutionary optimization approach for 3D human head model classification", Proc. of ACM SIGMM Int’l Workshop on Multimedia Information Retrieval, 2003. [Yu03] M. Yu, I. Atmosukarto, W. Leow, Z. Huang, and R. Xu, "3D model retrieval with morphing-based geometric and topological feature maps", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2003. [Zhou99] Y. Zhou and A. Toga, "Efficient Skeletonization of Volumetric Objects", IEEE Transactions on Visualization and Computer Graphics, 5(3):196-209, 1999.

93

Vitae

Gary Tam received a B.Eng. degree in Computer Science from the Hong Kong University of Science and Technology in 2002. He is currently pursuing a MPhil degree at City University of Hong Kong. His research focuses on Computer Graphics and Pattern Recognition.

Publications:

K.L. Tam, R.W.H. Lau, and C.W. Ngo, "Deformable Geometry Model Matching Using Bipartite Graph", Proc. of Computer Graphics International (CGI), pp. 335-342, Jun 2004. K.L. Tam, R.W.H. Lau, and C.W. Ngo, "Deformable Geometry Model Matching by Topological and Geometric Signatures", Proc. of International Conference on Pattern Recognition (ICPR), Vol. 3, pp. 910-913, Aug 2004.