AIML Special Presentation: Learning Spatial-Context-Aware Global Visual Feature Representation for Instance Image Retrieval
In instance image retrieval, exploiting local spatial information within an image has proven effective for boosting retrieval performance. Making ordinary global image representations spatial-context-aware would therefore be highly valuable, because retrieval based on global representations is appealing for its algorithmic simplicity, low memory cost, and compatibility with sophisticated data structures. This talk describes a novel feature learning framework for instance image retrieval that embeds local spatial context information into the learned global feature representations.
In parallel with the visual feature branch of a convolutional neural network (CNN) backbone, a spatial context branch is designed with two modules: online token learning and distance encoding. For each local descriptor produced by the CNN, the former module indicates the types of its surrounding descriptors, while the latter captures their spatial distribution. The visual feature branch and the spatial context branch are then fused to produce a single global feature representation per image. As experimentally demonstrated, thanks to this spatial-context-aware characteristic, the proposed framework clearly improves the performance of global-representation-based image retrieval while maintaining all of its appealing properties.
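The pipeline above (token assignment for local descriptors, distance-based encoding of their spatial layout, and fusion into one global vector) can be sketched with a toy NumPy example. This is only an illustration of the idea, not the talk's actual design: the grid size, descriptor dimension, number of tokens, the fixed nearest-token assignment (standing in for learned online tokens), the inverse-distance weighting, and the sum-pooling fusion are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: a 4x4 grid of 8-D local descriptors from a backbone.
H, W, D = 4, 4, 8
K = 3  # number of context tokens (assumption)
feats = rng.normal(size=(H, W, D))

# --- "Online token learning" (sketch): assign each local descriptor to its
# nearest token; the token id stands for the "type" of each descriptor.
# In the actual framework the tokens would be learned end-to-end.
tokens = rng.normal(size=(K, D))
dists = np.linalg.norm(feats.reshape(-1, D)[:, None, :] - tokens[None], axis=-1)
token_ids = dists.argmin(axis=1).reshape(H, W)

# --- "Distance encoding" (sketch): for each position, accumulate the token
# types of surrounding descriptors, weighted by inverse spatial distance,
# so the encoding reflects how each type is spatially distributed around it.
ys, xs = np.mgrid[0:H, 0:W]
ctx = np.zeros((H, W, K))
for i in range(H):
    for j in range(W):
        d = np.sqrt((ys - i) ** 2 + (xs - j) ** 2)
        w = np.where(d > 0, 1.0 / np.maximum(d, 1e-6), 0.0)  # exclude self
        for k in range(K):
            ctx[i, j, k] = (w * (token_ids == k)).sum()

# --- Fusion: concatenate visual and spatial-context channels, then pool
# (sum pooling here) into a single global feature vector per image.
fused = np.concatenate([feats, ctx], axis=-1)       # (H, W, D + K)
global_feat = fused.reshape(-1, D + K).sum(axis=0)
global_feat /= np.linalg.norm(global_feat)          # L2-normalize for retrieval
print(global_feat.shape)
```

The result is one fixed-length, L2-normalized vector per image, so retrieval remains a simple nearest-neighbor search over global features, which is exactly the appealing property the framework aims to preserve.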