# Weakly Supervised Object Co-Localization via Sharing Parts Based on a Joint Bayesian Model


## Abstract


## 1. Introduction

- (1) We propose a novel framework based on a Bayesian hierarchical topic model for weakly supervised object localization. Without requiring additional object annotations, class-level supervision is applied jointly to latent parts and appearances to improve object recognition. In addition, the appearance and position of each feature are modeled jointly, which helps visualize the object parts.
- (2) We show how the joint Bayesian model exploits shared parts for object co-localization across the dataset. By sharing a common set of features, objects of the same semantic category can be found simultaneously in each class, and part sharing allows robust predictions to be made from only a few training images. Because it needs little training data and shares features, our model also substantially reduces computational cost.
- (3) We define a constraint to distinguish noisy images from clean ones: noisy images are identified by measuring, within each category, the rate at which information is transferred through shared parts. Furthermore, to illustrate the effectiveness of our model, we present experiments on two challenging datasets that exhibit strong intra-class variation and inter-class diversity. The results demonstrate that our method is robust in object discovery and localization.

## 2. Related Work

## 3. Methods

#### 3.1. Symbol Description

#### 3.2. The Joint Topic Model for Objects

#### 3.3. Parameter Learning

#### 3.4. Supervision via Class Label Constraint on Both Appearances and Topics

#### 3.5. Probabilistic Parts Sharing

#### 3.6. Object Localization

#### 3.7. Complexity

## 4. Experiments

#### 4.1. Experimental Settings

#### 4.2. PASCAL VOC 2007 6 × 2

#### 4.3. Object Discovery Dataset

#### 4.4. Time Complexity

## 5. Discussion and Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest


**Figure 1.** The framework for the co-localization task. In this framework, the goal is to localize the airplane in each image. Yellow indicates a salient region. Our model distinguishes the correct images from noisy images that share only a few parts.

**Figure 2.** The proposed joint topic model for describing shared visual parts. Shaded nodes are observed data and rounded rectangles are hyperparameters. Let ${w}_{j}$ and ${v}_{j}$ denote the appearances and two-dimensional positions, respectively, of the ${N}_{j}$ features in image $j$. Using the visual codebook, the $i$th feature in image $j$ is described by its discrete appearance ${w}_{ji}$ and its position ${v}_{ji}$.
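The caption describes each feature by a discrete appearance $w_{ji}$ obtained from a visual codebook. A common way to obtain such a word is nearest-codeword assignment under Euclidean distance. The minimal Python sketch below assumes that rule and a precomputed codebook (for example, learned with k-means); neither choice is specified in this excerpt, and the function names `nearest_codeword` and `quantize_image` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Position = Tuple[float, float]


def nearest_codeword(descriptor: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codeword closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = -1, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize_image(features: List[Tuple[Vector, Position]],
                   codebook: List[Vector]) -> Tuple[List[int], List[Position]]:
    """Map each (descriptor, position) pair of image j to (w_ji, v_ji).

    Returns the discrete appearances w_j and the 2D positions v_j, in feature order.
    """
    words = [nearest_codeword(desc, codebook) for desc, _ in features]
    positions = [pos for _, pos in features]
    return words, positions


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # toy 3-word codebook (hypothetical)
    features = [([0.1, -0.2], (10.0, 20.0)),
                ([4.8, 5.1], (30.0, 40.0)),
                ([0.9, 1.2], (50.0, 60.0))]
    w_j, v_j = quantize_image(features, codebook)
    print(w_j)  # [0, 2, 1]
```

Any other quantizer (soft assignment, approximate nearest neighbours) could replace `nearest_codeword` without changing how $w_{ji}$ and $v_{ji}$ are paired.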

**Figure 3.** Visualizations of the part distributions learned by our model. Top: category distance embedding computed by multidimensional scaling, in which the coordinates of each object category are chosen to approximate pairwise KL distances; the left cluster is marked with blue arrows, the middle with a red arrow and the right with yellow arrows. Bottom: category distance dendrogram from hierarchical clustering, with branch lengths proportional to inter-category distances. The dendrogram shows the relationships between the classes in the distance embedding in more detail.
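Both panels of Figure 3 start from pairwise KL distances between categories. Because KL divergence is asymmetric, the sketch below assumes a symmetrized version, $\tfrac{1}{2}[\mathrm{KL}(p\,\|\,q) + \mathrm{KL}(q\,\|\,p)]$, computed between hypothetical per-category distributions over shared parts, and uses average-linkage agglomerative clustering to produce the dendrogram merges. The paper's actual distance and linkage choices are not given in this excerpt, and the MDS embedding step is omitted.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetrized KL divergence (an assumed choice of 'KL distance')."""
    return 0.5 * (kl(p, q) + kl(q, p))


def distance_matrix(part_dists: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances between categories, stored for both key orders."""
    d: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(sorted(part_dists), 2):
        d[(a, b)] = d[(b, a)] = symmetric_kl(part_dists[a], part_dists[b])
    return d


def average_linkage(names: List[str],
                    d: Dict[Tuple[str, str], float]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering; each merge (a, b, height) is one dendrogram junction."""
    def cluster_dist(c1: Cluster, c2: Cluster) -> float:
        return sum(d[(x, y)] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters: List[Cluster] = [(n,) for n in names]
    merges: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((c1, c2, cluster_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    # Toy distributions over three shared parts (hypothetical numbers, not from the paper)
    part_dists = {
        "airplane": [0.7, 0.2, 0.1],
        "boat": [0.6, 0.3, 0.1],
        "horse": [0.1, 0.2, 0.7],
        "motorbike": [0.2, 0.2, 0.6],
    }
    d = distance_matrix(part_dists)
    for a, b, h in average_linkage(sorted(part_dists), d):
        print(a, "+", b, f"at height {h:.3f}")
```

Average linkage is used here because its merge heights never decrease, so they can be drawn directly as branch lengths; single or complete linkage would be equally valid assumptions.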

**Figure 4.** Examples of co-localization results for each class. For each category, the upper row shows the original images and the lower row the corresponding visualizations. Yellow regions indicate salient parts, each rendered as dozens to hundreds of ellipses. In the (**b**) airplane_right, (**c**) bicycle_left and (**f**) boat_right classes, red ellipses mark images in which no object is discovered because they share only a few parts.

**Figure 5.** Examples of co-localization results on the Object Discovery dataset. Red boxes show our localizations and green boxes the ground truth. Yellow boxes in the bottom rows mark incorrect localizations.
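Whether a box in Figure 5 counts as correct or wrong is usually decided by its intersection-over-union (IoU) with the ground truth, and CorLoc is the percentage of images whose predicted box is correct. The sketch below assumes the common PASCAL-style threshold of IoU ≥ 0.5, one predicted box per image, and axis-aligned `(x_min, y_min, x_max, y_max)` coordinates; the exact evaluation protocol is not stated in this excerpt.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def is_correct(pred: Box, ground_truths: Sequence[Box], threshold: float = 0.5) -> bool:
    """A localization counts as correct if it overlaps any ground-truth box enough."""
    return any(iou(pred, gt) >= threshold for gt in ground_truths)


def corloc(preds: Sequence[Box], gts_per_image: Sequence[Sequence[Box]],
           threshold: float = 0.5) -> float:
    """Percentage of images whose single predicted box is correct."""
    hits = sum(is_correct(p, gts, threshold) for p, gts in zip(preds, gts_per_image))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
    gts = [[(0, 0, 10, 10)], [(20, 20, 30, 30)]]
    print(corloc(preds, gts))  # 50.0
```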

- for each image $j \in \{1, \dots, J\}$
  - sample a topic distribution $\pi_j \sim \mathrm{Dir}(\alpha)$
  - sample a class label $l_j \sim \mathrm{Multi}(O)$
  - for each class $k \in \{1, \dots, K\}$
    - sample a topic $z_{ji} \sim \pi_j$, with $z_{ji} \in T = \{1, \dots, K\}$
    - for each observation $i \in \{1, \dots, N_j\}$
      - sample a visual word $w_{ji} \sim \mathrm{Multi}(\eta_k; z_{ji}, l_j)$
      - sample the corresponding location $v_{ji} \sim \mathcal{N}(\mu_{z_{ji}}, \Lambda_{z_{ji}})$
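Simulating the generative process above helps show what kind of data the model assumes. The following Python sketch is a minimal approximation, not the authors' implementation: it draws the topic $z_{ji}$ once per feature inside the observation loop (as in standard LDA-style models) and omits the loop over classes. The layout `eta[l][k]` (one word distribution per class label and topic), the diagonal covariance used for $\Lambda$, and all toy parameters are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def categorical(weights: Sequence[float], rng: random.Random) -> int:
    """Draw an index with probability proportional to `weights`."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def generate_image(n_features, alpha, class_prior, eta, mu, sigma, rng):
    """Generate one synthetic image j.

    eta[l][k] : distribution over visual words for class l and topic k (hypothetical layout)
    mu[k]     : 2D mean position of topic k
    sigma[k]  : per-axis standard deviation of topic k (diagonal covariance assumed)
    """
    pi_j = sample_dirichlet(alpha, rng)            # topic proportions pi_j ~ Dir(alpha)
    l_j = categorical(class_prior, rng)            # class label l_j ~ Multi(O)
    features: List[Tuple[int, int, Tuple[float, float]]] = []
    for _ in range(n_features):                    # i = 1..N_j
        z = categorical(pi_j, rng)                 # topic z_ji ~ pi_j (drawn per feature here)
        w = categorical(eta[l_j][z], rng)          # visual word w_ji given (z_ji, l_j)
        v = (rng.gauss(mu[z][0], sigma[z][0]),     # location v_ji ~ N(mu_z, Lambda_z)
             rng.gauss(mu[z][1], sigma[z][1]))
        features.append((z, w, v))
    return l_j, pi_j, features


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy setting: 2 topics, 3-word vocabulary, 2 classes (all values hypothetical)
    eta = [[[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],     # class 0
           [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]]     # class 1
    label, pi_j, feats = generate_image(5, [1.0, 1.0], [0.5, 0.5], eta,
                                        mu=[(10.0, 10.0), (50.0, 50.0)],
                                        sigma=[(2.0, 2.0), (5.0, 5.0)], rng=rng)
    print(label, [round(p, 3) for p in pi_j], len(feats))
```

Because words depend on both $z_{ji}$ and $l_j$ while locations depend only on $z_{ji}$, the same spatial part can be shared across classes even when its appearance statistics differ, which is the intuition behind part sharing in the text.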

**Table 2.** CorLoc results (%) for three configurations on PASCAL VOC 2007 6 × 2. The original method uses 6 classes without left and right annotations. The improved method builds on the first configuration and adds the constraint for handling noisy images. The full method builds on the second configuration and uses 12 classes instead of 6. Because the original and improved methods do not distinguish left and right views, each class has a single score, listed under its left column.

| Method | Airplane (left) | Airplane (right) | Bicycle (left) | Bicycle (right) | Boat (left) | Boat (right) | Bus (left) | Bus (right) | Horse (left) | Horse (right) | Motorbike (left) | Motorbike (right) | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ours (original) | 14.55 | – | 14.29 | – | 16.095 | – | 13.67 | – | 14.90 | – | 12.29 | – | 14.30 |
| Ours (improved) | 27.935 | – | 26.54 | – | 29.89 | – | 36.44 | – | 27.67 | – | 26.06 | – | 29.09 |
| Ours (full) | 46.51 | 43.59 | 43.75 | 42.00 | 45.45 | 46.51 | 57.14 | 56.52 | 43.75 | 45.65 | 43.59 | 44.12 | 46.55 |
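The Avg. column of Table 2 appears to be the unweighted mean of the available per-class scores (six entries for the original and improved configurations, twelve for the full configuration). The short Python check below recomputes it from the table values, with `None` standing for an empty cell; the helper name `average_available` is hypothetical.

```python
from typing import List, Optional


def average_available(scores: List[Optional[float]]) -> float:
    """Mean over the entries that are present (None marks an empty cell)."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)


# Per-class CorLoc values copied from Table 2, in column order (left, right per class)
rows = {
    "original": [14.55, None, 14.29, None, 16.095, None, 13.67, None, 14.90, None, 12.29, None],
    "improved": [27.935, None, 26.54, None, 29.89, None, 36.44, None, 27.67, None, 26.06, None],
    "full": [46.51, 43.59, 43.75, 42.00, 45.45, 46.51, 57.14, 56.52, 43.75, 45.65, 43.59, 44.12],
}
reported_avg = {"original": 14.30, "improved": 29.09, "full": 46.55}

for name, scores in rows.items():
    avg = average_available(scores)
    print(f"{name}: computed {avg:.2f}, reported {reported_avg[name]:.2f}")
```

All three recomputed means agree with the reported averages to two decimal places.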

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wu, L.; Liu, Q.
Weakly Supervised Object Co-Localization via Sharing Parts Based on a Joint Bayesian Model. *Symmetry* **2018**, *10*, 142.
https://doi.org/10.3390/sym10050142

**Chicago/Turabian Style**

Wu, Lu, and Quan Liu.
2018. "Weakly Supervised Object Co-Localization via Sharing Parts Based on a Joint Bayesian Model" *Symmetry* 10, no. 5: 142.
https://doi.org/10.3390/sym10050142