TheDigitalLifestyle.tv - Your App Authority

Interview: This Guy Knows Just How Photogenic The Apple Store Can Be

Monday, May 18, 2009 at 6:05AM

A few weeks back, we mentioned on TDL Live an interesting research study that analyzed Flickr photos and found the Fifth Ave. Apple Store was the 28th most photographed location in the world. We had a chance to ask one of the researchers, David Crandall from Cornell, about the study and the methodology of pinpointing the most popular photo locales:

In basic terms, what was the scope/intention of your research?

We wanted to explore how to automatically organize photos based on the geographic location of where they were taken. People don't take photos with uniform density across the globe; instead there are concentrations of photos at a relatively small number of places, like in cities and at landmarks within those cities. So it makes sense to try to organize photos according to these clusters of human photographic activity.

Our paper looks at two basic tasks. One is: given a large collection of geo-tagged photos, can we discover these concentrations of photographic activity, and automatically extract textual and visual descriptions of them? That's what led to the automatically-generated maps that you see in our paper. The second question is: given a photo that hasn't been geo-tagged, can we discover where it was taken using the visual content of the image and/or its textual tags?

Can you take us through the basics of the process? What kind of computing power did it take?

Sure. To build the annotated maps, we took a collection of 35 million photos from Flickr and looked at the geographic distribution of the geo-tags (latitude and longitude coordinates). We take that photo distribution and find "hotspots" of activity at different scales (a "city-size" scale with radius 50km and a "landmark-size" scale with radius 50m) using a clustering technique called Mean Shift. We then find a textual description of each hotspot by looking for text tags that occur often on photos inside the hotspot but infrequently on photos outside. We also find a visual description of the hotspot, in the form of a single "representative image," by looking for a scene that has been photographed by many different users. This is more difficult than it might sound, because deciding whether two photos are of the same scene (despite differences in viewing angle, illumination, color shifts, zoom, etc.) is not an easy task for a computer. We do the image matching using a technique from computer vision called SIFT (the Scale Invariant Feature Transform), and then use spectral graph clustering to find the representative scene. The whole process is completely automatic.

[Image: Heat Map of Worldwide Photographic Activity. Courtesy David Crandall]
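
Editor's illustration: the hotspot-finding and tag-scoring steps described above can be sketched roughly as follows. This is a minimal sketch, not the researchers' code. It assumes photo locations have already been projected to flat x/y coordinates in kilometers, uses a simple flat-kernel Mean Shift, and scores tags with a smoothed inside/outside frequency ratio, which is one plausible reading of "often inside but infrequently outside." The function names (`mean_shift`, `group_modes`, `distinctive_tags`) and the toy photos and tags are made up for the example.

```python
import math
from collections import Counter


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: slide each point to the mean of its
    neighbors within `bandwidth` until it stops moving."""
    modes = []
    for x, y in points:
        for _ in range(max_iter):
            near = [(qx, qy) for qx, qy in points
                    if math.hypot(qx - x, qy - y) <= bandwidth]
            if not near:  # defensive guard; not expected in practice
                break
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if shift < tol:
                break
        modes.append((x, y))
    return modes


def group_modes(modes, bandwidth):
    """Merge converged modes lying within one bandwidth of an existing
    center; return the hotspot centers and a label per input point."""
    centers, labels = [], []
    for mx, my in modes:
        for i, (cx, cy) in enumerate(centers):
            if math.hypot(mx - cx, my - cy) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append((mx, my))
            labels.append(len(centers) - 1)
    return centers, labels


def distinctive_tags(photo_tags, labels, cluster, top=3, smoothing=1.0):
    """Rank tags by (share of photos inside the hotspot carrying the tag)
    divided by (smoothed share of photos outside carrying it)."""
    inside, outside = Counter(), Counter()
    n_in = n_out = 0
    for tags, label in zip(photo_tags, labels):
        if label == cluster:
            inside.update(set(tags))
            n_in += 1
        else:
            outside.update(set(tags))
            n_out += 1
    scores = {
        tag: (count / n_in) / ((outside[tag] + smoothing) / (n_out + smoothing))
        for tag, count in inside.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


# Toy data (hypothetical): five photos, positions in km on a flat plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0)]
photo_tags = [
    ["nyc", "applestore"],
    ["applestore", "travel"],
    ["nyc", "applestore"],
    ["seattle", "travel"],
    ["seattle", "spaceneedle"],
]

centers, labels = group_modes(mean_shift(points, bandwidth=1.0), bandwidth=1.0)
top_tags = distinctive_tags(photo_tags, labels, cluster=0, top=2)
```

Changing `bandwidth` is what moves the analysis between scales: a large radius groups photos into city-sized hotspots, a small one into landmark-sized hotspots. The common tag "travel" scores low here because it appears outside the hotspot as well as inside it.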

For the second task, automatically geo-locating an untagged image, we build a model of what each landmark "looks like" using a large set of training images. Then when an unlabeled image comes along, we can find which of the landmark models fits it best. Our models are based on a popular machine learning approach called Support Vector Machines (SVMs). This task in particular required a lot of computing power to process the millions of images in the dataset. We used a cluster of Linux machines running Hadoop, which is an open-source framework for writing data-intensive distributed computing applications. The cluster has 120 processors (480 cores), 270 terabytes of disk space and almost 1 terabyte of RAM, and it took a few days to run all of the experiments in the paper on this cluster.
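
Editor's illustration: the idea of training one model per landmark and picking the best-fitting one can be sketched like this. It is a toy sketch under stated assumptions, not the setup from the paper: the interview does not say which features, kernel, or solver were used, so this uses a linear SVM trained by sub-gradient descent on the regularized hinge loss, a one-vs-rest scheme, and two-number feature vectors standing in for real image descriptors. All names (`train_linear_svm`, `train_landmark_models`, `classify`, `landmark_a`, `landmark_b`) are hypothetical.

```python
def train_linear_svm(samples, targets, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via sub-gradient descent on the L2-regularized hinge loss.
    `targets` must be +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: only samples inside the margin push the model.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def train_landmark_models(features, names):
    """One-vs-rest: one SVM per landmark, trained against all the others."""
    models = {}
    for name in sorted(set(names)):
        targets = [1 if n == name else -1 for n in names]
        models[name] = train_linear_svm(features, targets)
    return models


def classify(models, x):
    """Return the landmark whose model gives the highest decision score."""
    def score(name):
        w, b = models[name]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)


# Toy training set (hypothetical 2-D features, two landmarks).
features = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.2, 1.5]]
names = ["landmark_a", "landmark_a", "landmark_b", "landmark_b"]

models = train_landmark_models(features, names)
guess_1 = classify(models, [1.8, 0.1])
guess_2 = classify(models, [0.1, 1.9])
```

Each landmark's model is trained independently of the others, which is part of why this kind of workload spreads naturally across a cluster.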

What were some of the surprises to you in the results?

We were surprised at how well the computer vision analysis worked, both in finding the representative images and in performing the landmark classification. We used relatively simple vision algorithms (that are 5-10 years old) because using newer, more computationally-demanding algorithms wouldn't have been possible at the scale of millions of images. We had thought that the visual analysis based on these simpler algorithms would work better than random chance but not nearly as well as using the textual tags. In fact, we found that using visual features was better than textual tags for finding representative images, and almost as good as textual features for the landmark classification problem. I think these results demonstrate what is possible by coupling robust but relatively simple algorithms with very large datasets (and large amounts of computational power).

We were also surprised by some of the hotspots in photographic activity on Flickr, such as the Apple Store's unexpected prominence on the list of top landmarks.

Can you tell us about the other landmarks that came in around (above/below) the Fifth Ave. Apple Store?

Sure. The five landmarks ranked directly above are the Lincoln Memorial (23rd), the British Museum (24th), the Brandenburg Gate (25th), the Tower of London (26th), and the Ponte di Rialto in Venice (27th). Directly below the Apple Store are the Space Needle in Seattle (29th), Pike Place Market in Seattle (30th), Westminster Bridge in London (31st), the World War II Memorial in Washington, DC (32nd), and the Old Town Square in Prague (33rd).

Do you believe Flickr is now representative of the photo-taking population as a whole, or are there some biases, more tech-savvy photographers for example, that could keep us from extrapolating to all photography?

There are definitely biases. For example, it seems that Flickr is most popular in Europe and North America, which means that landmarks in other parts of the world are likely underrepresented. We could correct for this to some extent by analyzing multiple photo-sharing sites; Fotolog is popular in South America, for example. But there's still likely to be a bias towards tech-savvy photographers on any of these sites, as you suggested. We haven't tried to measure or correct for these biases -- the point of our paper wasn't to produce accurate rankings of landmarks, but instead to show that such rankings (and visual and textual descriptions of places) could be inferred completely automatically from a large collection of geo-tagged images.

It seems to me this is something that would be interesting to watch trend over time: see what the most photographed places/things are two, five, ten years down the road.

I agree. It would be especially interesting if we could correct for the biases that I mentioned above: then we could separately study how the overall popularity of landmarks changes with time, and how the user population of Flickr (or of photo-sharing sites in general) changes with time.

Any reaction/comment from Flickr on the research?

I've communicated with Flickr somewhat during the development process, to ask questions about their API for example. They were very helpful and seem like a great group of people.

And we always have to ask: Mac or PC?

As I mentioned, we use a cluster of Linux-based PCs for running compute-intensive jobs. As for day-to-day machines, our team of four people is evenly split between Macs and PCs (with me on the Mac side, with an ancient but much-beloved PowerBook G4).

You can find out more about the research here.

Ryan Ritchey