Topic driven multimodal similarity learning with multi-view voted convolutional features

Research output: Contribution to journal › Article

Authors:

  • Xinjian Gao
  • Tingting Mu
  • John Y. Goulermas
  • Meng Wang


Similarity (and distance metric) learning plays a very important role in many artificial intelligence tasks that aim at quantifying the relevance between objects. We address the challenge of learning complex relation patterns from data objects exhibiting heterogeneous properties, and develop an effective multi-view multimodal similarity learning model with much improved learning performance and model interpretability. The proposed method first computes multi-view convolutional features to achieve improved object representation, then analyses the similarities between objects by operating over multiple hidden relation types (modalities), and finally fine-tunes all model variables by back-propagating a ranking loss to the convolutional layers. We develop a topic-driven initialization scheme, so that each learned relation type can be interpreted as a representative of semantic topics of the objects. To improve model interpretability and generalization, sparsity is imposed over these hidden relations. The proposed method is evaluated on the image retrieval task using challenging image datasets, and is compared with seven state-of-the-art algorithms in the field. Experimental results demonstrate significant performance improvement of the proposed method over the competing ones.
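The core idea of scoring a pair of objects over multiple hidden relation types can be sketched as follows. This is a minimal illustrative example, not the paper's actual model: the feature dimensions, the bilinear form `x^T M_m y` for each modality, and the sparse modality weights `w` are all assumptions standing in for the learned multi-view convolutional features and the sparsity-regularized hidden relations described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_modalities = 8, 3  # illustrative feature dimension and number of hidden relation types

# Stand-ins for the multi-view convolutional features of two objects.
x = rng.standard_normal(d)
y = rng.standard_normal(d)

# One bilinear relation matrix per hidden relation type (modality); in the
# paper these would be learned and topic-initialized, here they are random.
M = rng.standard_normal((n_modalities, d, d))

# Sparse non-negative modality weights, mimicking the sparsity imposed over
# the hidden relations (the second modality is switched off entirely).
w = np.array([0.7, 0.0, 0.3])

# Per-modality bilinear similarities x^T M_m y, combined by the sparse weights.
per_modality = np.einsum("i,mij,j->m", x, M, y)
similarity = float(w @ per_modality)
print(similarity)
```

A model of this shape is what a pairwise ranking loss would fine-tune end-to-end: the gradient of the loss flows through `w` and `M` back into the feature extractor producing `x` and `y`.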

Bibliographical metadata

Original language: English
Journal: Pattern Recognition
Early online date: 9 Mar 2017
Publication status: Published - Mar 2018