LTH-image

Learning bottom-up visual recognition processes using automatically generated ground truth data

Kalle Åström, Lund University

Abstract:

The state of the art in finding matching images in large databases is based on quantizing descriptors of interest points into visual words. The co-occurrence of visual words between a query image and the images in the database is then used to generate hypotheses of matched images. High similarity between matching image representations (as bags of words) rests on the assumption that matched points in the two images end up in the same word under hard assignment, or in similar representations under soft assignment. Such methods resemble bottom-up processes. The steps of such processes can be learned from ground truth on corresponding feature points or on image classes. In this talk we discuss how such ground truth data can be obtained using computationally more expensive methods. As an example we study the problem of generating vocabularies for bag-of-words image retrieval. We use training and testing data with detected feature points and their descriptors, with partial ground truth on correspondences between points. To optimize the vocabulary, we propose minimizing the entropy of the soft assignment of points. Our clustering is based on hierarchical two-class divisions. The results of the entropy-based clustering are compared with hierarchical k-means. On real test data, the new vocabularies show decreased entropy and an increased true-positive rate.
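The ideas in the abstract can be illustrated with a minimal sketch: a vocabulary built by recursive two-class (2-means) splits, a Gaussian-kernel soft assignment of descriptors to the resulting visual words, and the mean assignment entropy used as the quality measure to be minimized. This is an assumption-laden toy version, not the talk's actual method: the kernel, the bandwidth `sigma`, and the plain 2-means splitting rule are all placeholders for illustration.

```python
import numpy as np

def soft_assign(descriptors, centers, sigma=1.0):
    # Soft assignment: Gaussian kernel weight of each descriptor to each
    # visual word, normalized so each row is a probability distribution.
    # (The Gaussian kernel and sigma are assumptions for this sketch.)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)

def assignment_entropy(descriptors, centers, sigma=1.0):
    # Mean entropy of the soft assignments; lower entropy means each
    # descriptor lands more decisively in a single visual word.
    p = soft_assign(descriptors, centers, sigma)
    return float((-p * np.log(p + 1e-12)).sum(axis=1).mean())

def two_means(X, iters=20, seed=0):
    # One two-class division: plain 2-means on the rows of X.
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in (0, 1):
            if (lab == k).any():
                c[k] = X[lab == k].mean(0)
    return c, lab

def hierarchical_vocab(X, depth):
    # Recursive two-class splits yield up to 2**depth leaf centers,
    # i.e. the visual words of the vocabulary.
    if len(X) == 0:
        return []
    if depth == 0 or len(X) < 2:
        return [X.mean(0)]
    _, lab = two_means(X)
    return (hierarchical_vocab(X[lab == 0], depth - 1)
            + hierarchical_vocab(X[lab == 1], depth - 1))
```

A usage sketch on synthetic descriptors: build a depth-2 vocabulary (up to four words) and report the mean soft-assignment entropy.

```python
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(50, 8)) for m in (0.0, 2.0, 4.0, 6.0)])
centers = np.array(hierarchical_vocab(X, depth=2))
print(len(centers), assignment_entropy(X, centers))
```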

Slides