##### 1. Background

In computer vision, image segmentation is the process that decomposes an image into multiple segments. The goal of this process is to partition an image into something more meaningful and easier for subsequent analyses. For example, for the image shown in Figure 1, human eyes can easily recognize that there are two persons walking on a beach. Apparently, they have just finished snorkeling. To computers, however, this image is nothing but an array of pixel data, with each image pixel (picture element) containing three different kinds of color values (Red, Green, and Blue). If we can decompose the image into several smaller regions with each region containing similar colors or textures, it becomes easier for the computer to recognize the possible objects in the image (like humans, diving boots, and beach) and understand the image content.

Over the past few decades, hundreds or even thousands of segmentation algorithms have been proposed, trying to produce segmentation results that are close to what human eyes perceive. Among these segmentation approaches, graph-based methods and clustering-based methods have proven to be quite successful and have been widely used. However, for the beach image in Figure 1, graph-based methods tend to merge image pixels together based on the local similarity in colors or textures. This kind of approach may cause the sea region in the image to be segmented into several small image segments that are slightly different in color or to be merged with a few portions of the sky that share similar colors with the sea. On the other hand, for clustering-based methods, it is usually necessary to explicitly specify the number of image segments before segmentation. This requirement would be impractical and inconvenient in the automatic analysis of this beach image.

##### 2. Current Progress

###### 2.1 Overview

In this semester, we investigated whether a deeper understanding of the probability distribution of the image data in the feature space could be helpful for the improvement of the image segmentation process. In our study, we developed a hierarchical image segmentation algorithm based on a split-and-merge scheme. That is, we first split the image into small image regions based on some pre-defined criteria. After that, we merge image regions with similar characteristics together to form visually meaningful image segments.

In Figure 1, we illustrate this two-phase framework for image segmentation. In the first phase, named the split phase, we propose an efficient partition algorithm, called Just-Noticeable-Difference Bayesian Sequential Partitioning (JND-BSP), to derive a set of partitioning cells in the commonly used CIE L*a*b* color space, as illustrated in (a) and (b). In the second phase, named the merge phase, we propose a simple but effective merging criterion to sequentially construct a hierarchical structure that represents the relative similarity among these partitioned regions. This hierarchical representation consists of monotonically decreasing number of clusters, as illustrated in (c). With this hierarchical representation, if we only want to decompose the image into two segments, we get the right sub-image in (c). On the other hand, if we want to decompose the image into 11 segments instead, we get the left sub-image in (c). Based on this hierarchical representation, we can decompose the image into any number of image segments as we wish. This capability will be very useful for us to perform object recognition or scene analysis.

###### 2.2 Technical Details

In the first phase of the proposed segmentation framework, we developed the so-called JND-BSP algorithm based on the Bayesian Sequential Partitioning (BSP) algorithm proposed in 2013 by the hosting laboratory of my visit, Professor Wing H. Wong’s laboratory at Stanford University. The original BSP algorithm explores the probability distribution of the feature space based on binary partitioning. Compared to traditional approaches for probability density estimation, such as histogram or kernel density estimation, the BSP approach has proven to be more capable of providing accurate and efficient estimates when the feature space is of moderate to high dimensions.

However, even though the original BSP algorithm can provide efficient estimates of the probability density distribution, we actually focus more on the detection of cluster separation in image segmentation, rather than an accurate description of data distribution. Hence, we come up with a new partitioning strategy, called Just-Noticeable Difference-Bayesian Sequential Partitioning. Compared to the original BSP algorithm, the proposed JND-BSP method provides improvement in two aspects: (1) the JND-BSP algorithm takes into account the color differences among image pixels which is an important clue for image segmentation; and (2) the JND-BSP algorithm adopts the concept of just-noticeable difference to construct a proper prior model for the guidance of data partitioning. In the literature, Just-noticeable difference (JND) is defined as the smallest color difference that can be distinguished by human eyes. In the proposed JND-BSP algorithm, we split the image data into plentiful smaller image regions so that within each image region the color difference between adjacent pixels is always smaller than the JND value. That is, there would be no visually apparent color difference within each split image region. Figure 2 shows a comparison of the splitting results of the original BSP algorithm and the newly developed JND-BSP algorithm. For each split image region, we reproduce its color values by the averaged color value of that region. We can see that the new method can better partition the given image into a set of smooth image regions than the original BSP algorithm, in the sense that the color values in the partitioned region are closer to the averaged color value of that region

After splitting the image into M image partitions, we merge these regions gradually to perform hierarchical clustering. In our algorithm, we define a measure to evaluate the similarity between every two image regions. As we gradually merge the most similar regions into meaningful clusters, we end up with a hierarchical structure consisting of different numbers of image segments on different layers, as illustrated in Figure 1(c).

###### 2.3 Evaluation

To evaluate the performance of the proposed framework on image segmentation, we compare our algorithm (with total 201 cells for each image) with a few popular image segmentation algorithms, including the multi-scale Normalized Cut (MNCut) [1], Graph-Based Image Segmentation (GBIS) method [2], and Meanshift method [3] over the Berkeley BSD dataset. Figure 3 shows the segmentation results with different number of image segments. It can be seen that the proposed method generates quite reasonable image segments that are close to human perception. Moreover, it is worth mentioning that all the other methods need to determine the number of image segments beforehand and then perform the segmentation process. In comparison, the proposed method builds the entire hierarchical structure in a single run. After that, we can simply deduce any number of image segments as we wish. This capability provides a great flexibility for subsequent analyses.

##### 3. Conclusions and Future Work

I am grateful for the Fulbright research grant, which gives me the opportunity to closely cooperate with the research team at Stanford University to develop a new image segmentation method based on the state-of-the-arts probability density estimation algorithm. So far, we have developed a hierarchical image segmentation algorithm based on a modified version of the Bayesian Sequential Partitioning algorithm developed by Professor Wing H. Wong’s laboratory in 2013. The newly proposed JND-BSP algorithm can efficiently decompose an image into plentiful smooth image regions. By analyzing the similarity of these partitioned image regions, we can effectively construct a hierarchical representation scheme to describe the inherent structure of the image. Compared with other segmentation methods, the proposed method performs very well on capturing the image content structure at different resolutions.

Currently, we only apply the JND-BSP algorithm over the CIE L*a*b* color space for image decomposition. Better segmentation results would be expected if we add in more appropriate image features, like image textures. Since the original BSP algorithm has been proved to be very efficient when dealing with moderate-to-high dimensional feature space, we would expect the proposed JND-BSP algorithm can also perform efficiently when more types of image features are utilized for image segmentation in the near future.

##### 4. References

[1] T. Cour, F. Benezit and J. Shi, “Spectral Segmentation with Multi-scale Graph Decomposition,” Computer Vision and Pattern Recognition, vol. 2, pp. 1124-1131, 2005.

[2] Pedro F. Felzenszwalb and Daniel P. Huttenlocher, “Efficient Graph-Based Image Segmentation,” International Journal of Computer Vision, Volume 59, Number 2, September 2004.

[3] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, NO. 5, pp. 603-619, 2002.