Classification and Retrieval of Images

This website describes CLARET II, an image classification and retrieval system developed as a collaborative research project between the University of Surrey and BBC Research. The system is intended to assist in annotating User Generated Content (UGC): images sent to the BBC by the general public, which are then annotated and displayed on BBC web sites.


News Interactive's User Generated Hub (UGH) receives hundreds of images per week from the public, and any major incident very quickly increases that number to an unworkable volume. During the London bombings of July 7th, hundreds of images were received in a very short space of time, and the first pictures of the incident on the BBC's web site came from the public. Opening up the BBC to UGC therefore creates a practical problem: large volumes of material can be submitted in a short space of time, and because the material is often topical it must be dealt with quickly. This project addresses object recognition and retrieval of still image content so that selections can be made rapidly.


CLARET implements a method for the simultaneous recognition and localisation of multiple object classes in image content. The system analyses still images or video key frames. The recognition method is based on appearance clusters built from local image features. The appearance clusters are shared amongst several object classes and images, and are represented in a hierarchical tree structure. A probabilistic model allows the detection of various objects in the images as well as the classification of entire scenes.

CLARET's object recognition method is used in two different scenarios. The first scenario is object detection for different object categories, e.g. pedestrians, cars, motorbikes and bicycles. The image categories are learned through a training process, and an image is annotated in terms of the learned categories. If the resulting confidence factor for a category in an image is above a pre-determined threshold, the category is judged to be present in the image. This scenario can be applied to UGC to generate keyword metadata automatically, and to make archived content easier to search and reuse. The second scenario is image retrieval (search) based on similarity to a selected example image; also known as ‘query by example’. The search area is indexed through a training process, and its images are then ranked by similarity to the uploaded query image. This scenario can be applied to UGC in a breaking news situation to order the incoming UGC and so filter out the content relevant to the breaking news story.
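The two scenarios can be sketched in a few lines of Python. This is only an illustration, not CLARET's actual code: the function names, the default threshold of 0.5 and the use of cosine similarity between feature vectors are all assumptions made for the example, and the source does not state which similarity measure or threshold the system uses.

```python
import math

def annotate(confidences, threshold=0.5):
    """Scenario 1: keep the categories whose confidence is above the threshold.

    `confidences` maps a category name to a confidence factor.
    The threshold value is an assumption for this sketch.
    """
    return sorted(name for name, score in confidences.items() if score > threshold)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_example(query, indexed):
    """Scenario 2: order indexed images from most to least similar to the query.

    `indexed` maps an image name to its feature vector.
    """
    return sorted(
        indexed,
        key=lambda name: cosine_similarity(query, indexed[name]),
        reverse=True,
    )

# Hypothetical confidence factors for one image.
keywords = annotate({"car": 0.9, "bicycle": 0.2, "pedestrian": 0.6})

# Hypothetical feature vectors for three indexed images and one query image.
ranking = rank_by_example(
    [1.0, 0.0, 0.0],
    {"a": [0.0, 1.0, 0.0], "b": [1.0, 0.1, 0.0], "c": [0.5, 0.5, 0.0]},
)
```

In a breaking news situation, the head of such a ranking would be the incoming images most similar to the chosen example.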

Both classification and retrieval require training from selected images. Classification requires training for each image category, and retrieval requires training for the search area. The training process builds a mathematical model of the image category or search area. Training with non-segmented images requires minimal manual intervention compared with training on segmented images. The University of Surrey produced a classification training model built from approximately a hundred non-segmented images of each image category (pedestrians, cars, motorbikes, bicycles and rocket propelled grenades). BBC Research produced a retrieval training model built from over five hundred non-segmented landscape and urban images.


Figure 1. Simplified diagram of the system.

Figure 1 illustrates the general bag-of-words concept used for object category classification.
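The general bag-of-words idea can be sketched as follows: each local feature descriptor of an image is assigned to its nearest appearance cluster, and the image is then described by a normalised histogram of cluster counts. This is a minimal sketch under stated assumptions: it uses a flat list of cluster centres and squared Euclidean distance, whereas CLARET organises its clusters in a hierarchical tree, and all names here are hypothetical.

```python
def nearest_cluster(descriptor, centres):
    """Index of the cluster centre closest to the descriptor (squared Euclidean distance)."""
    def squared_distance(centre):
        return sum((d - c) ** 2 for d, c in zip(descriptor, centre))
    return min(range(len(centres)), key=lambda i: squared_distance(centres[i]))

def bag_of_words(descriptors, centres):
    """Normalised histogram counting how many descriptors fall into each cluster."""
    histogram = [0.0] * len(centres)
    for descriptor in descriptors:
        histogram[nearest_cluster(descriptor, centres)] += 1.0
    total = sum(histogram)
    if total == 0:
        return histogram  # no descriptors: return the all-zero histogram
    return [count / total for count in histogram]

# Two hypothetical cluster centres and four hypothetical local descriptors.
centres = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [0.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
histogram = bag_of_words(descriptors, centres)
```

The resulting histogram is a fixed-length description of the image, whatever the number of local features, which is what allows a classifier or a similarity ranking to be applied to it.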


  1. Piotr Koniusz, NICTA, Canberra Research Lab, INRIA LEAR, University of Surrey
  2. Denise Bland, BBC Research
  3. Krystian Mikolajczyk, Imperial College London, University of Surrey
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License