The Architecture of Visual Information

Hello everyone, and thanks for having me here. My name
is Alex, and since I was young, at least according to surviving evidence, I’ve been interested in 2 things:

… Computers …

… And photography …

Nowadays I work as a designer in a lovely company
down in London called Webcredible, and one of the things that keeps fascinating me is the intersection between people, computers, and images

– the images that we create and most often store,
manipulate and share using computers. I think there are a few interesting challenges and opportunities in this space, and that’s what I’d like to talk about today. But first …

These are some of the most well known paleolithic cave
paintings in Lascaux in France, estimated to be more than 17,000 years old. Nobody can really tell for sure who created them and why, but whoever did must have had a reason. For many thousands of years, painting was all people could do in order to create images. Then a magic technique known as photography came along.

This is the earliest photo that still survives today, shot
between 1826 and 1827. It’s simply the view outside the photographer’s window. The materials used at that time weren’t quite as sensitive to light as you’d want them to be, so this exposure took an estimated 8 hours. Not quite what you’d call an Instagram … However, even though we’ve had photos since 1826, it took much longer until it was commonplace to work with photos on a computer screen. Even when the World Wide Web was 'invented' 23 years ago, around 1990, it was a text-based world.

This is a screenshot of the first web browser, showing
some of the websites from the early 90s. Lots of text, zero images. Of course, you could be forgiven for not including any images – in that era, digital cameras were little more than prototypes.

This is one of the first digital SLR cameras. It
cost US$30,000, had 200MB of storage for 1MP photos, and needed a suitcase of electronics to go with it, weighing a total of 25kg. So it’s no surprise that it would be 2 more years after the invention of the WWW, in 1992, before the first photographic image was uploaded to what we now call a website.

This is the image in question – a pop music
group started by four women working at CERN, the physics institute where the web started. The year 1992 marks another milestone.

It’s the year when the image file format we all
know as JPEG was finalised. JPEG is now the most popular and most efficient format for sharing photographic images online. And it took a bit longer until people agreed how to embed images in webpages.

This is Marc Andreessen in 1993, proposing the IMG tag
for embedding images in HTML. The full discussion thread goes on for quite a while, during which they discuss some interesting things that still haven’t been implemented in web browsers. It’s all available to read online, if you’re interested in seeing how standards are made, and what the web might have looked like if embedding images was implemented in a different way. -- Since those discussions took place, 20 years have passed, and nowadays we seem to be infatuated with images. We create images anywhere and everywhere. In fact creating images has become a way of interacting with the world around us.

Are you an athlete parading in the Olympic Stadium during
the opening ceremony? Why not take a few photos to show your friends you were really there. It doesn’t matter if you’re Chinese …

… or Spanish – I watched the whole thing and
there were people holding cameras in almost all delegations!

Maybe you’ve come to an event featuring your favourite science
fiction TV characters – why not take some photos of them too?

Or maybe you’re going to a concert – enough people
are now “watching” a concert through their cameraphones that artists are starting to complain!

Or maybe you’ve just escaped a crash-landed airplane – surely
that’s a photo opportunity? -- Of course we don’t just create many more images – we also share them more than ever. It’s become our way of sharing our everyday lives.

350 million photos a day is rather significant when you
have ~600 million active users per day

2 of the largest startup acquisitions in the last year,
Instagram & Tumblr, facilitate sharing of images, static or moving. 1.1 billion = 1.1 Instagrams! Even social networks originally focused on textual information, such as Twitter, quickly felt the need to create and standardise ways for people to share images.

And compared to the good old days of film photography,
we have reached an interesting point. It’s never been easier to preserve images for an indefinite amount of time. Because visual information can now be so efficiently compressed and because storage is so cheap, we can make as many copies of an image as we like, with no degradation over time. But it’s not just that we’re creating loads of images; the way we’re creating them is also changing.

We're looking into a future where images might be created
in an unattended, subconscious way, with little direct human intervention. The obvious device that's received a lot of publicity in terms of image making in the future is Google Glass. Unless you’ve been away from Earth in the last few months, you’ve probably seen this:

It’s funny that people have focused so much on Google
Glass because, at least in its current incarnation, it isn't programmed in any way to record everything around you. It’s actually quite cumbersome to take a photo: you have to give it a voice command, 'ok glass, take a picture', which I think is unlikely to help you capture any very spontaneous and interesting moments. But there are a couple of other products, both released in the last year, that are more interesting.

Memoto is a wearable camera that clips onto your clothes
and simply takes 1 photo every 30 seconds – when you plug it back into a computer, it both charges and uploads all its photos to the cloud.

Autographer is another similar wearable camera, which is meant to
be a bit smarter. It has a number of sensors in addition to the camera, monitoring things like changes in light level and temperature, and it only takes photos when it thinks that something interesting is happening. For example, when it detects a sudden change in temperature, it might assume you’ve gone from outdoors to indoors, so it will attempt to take a photo. Another thing that’s likely to change in the future is that whatever images you create won’t need to be coming from your own perspective.

Reuters put up a lot of these last summer for
the Olympics. They’re remotely controlled cameras with robotic mounts. The "photographer", if you still want to use this term, is sitting at a laptop, watching a live view from the camera, and can move it and trigger it at any time. In fact, one photographer can watch multiple cameras at any time. If you don’t quite have the budget of Reuters, maybe you can try something simpler.

This is a prototype of a throwable ball camera –
it’s actually 36 cameras arranged along the surface of a sphere made out of foam. You just throw it up in the air, and when it reaches its highest point, just before it starts falling down again, it triggers all cameras at the same time and creates a panoramic photo – a bit like this:

This is just a fraction of what you can see
– you can pan the panorama around and look at all other sides or up into the sky even if you wish. So it’s a bit like “shoot first, frame later”. So if a photographer doesn’t need to be there to actually press the shutter, how long until we no longer need a photographer, the cameras shoot automatically and you just scour through later and pick the best shots? I decided to give that a try. One of my other passions apart from photography is cycling, and if you combine the two, I also like taking nice photos of cyclists. Outside our office in London there’s a rather busy road, with many cyclists passing every day, and some of them have all sorts of weird and interesting bikes. Now I could of course just sit outside our office with a camera and take photos, but I do have a day job to do. So I did this instead:

I got a webcam pointing out of the window, and
put together a small image recognition algorithm that detects when there’s a bicycle in the frame and takes a photo. It’s not that hard to do – all you need is to find if there are 2 circles that aren’t too large or too small, and aren’t too close to or too far from each other. And there you go, you have a bicycle, and you can take a photo of it. Leave this running for a day, and you end up with hundreds of photos – and that’s the problem. When we manage to create a mountain of visual information – what do we do with it next? Did we stop storing our photos in a physical shoebox, only to end up with a digital one?
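To make the two-circle idea a bit more concrete, here is a minimal Python sketch of the check itself. It is not the code that ran on the webcam: it assumes some circle detector (for example a Hough circle transform) has already produced a list of circles, and the `Circle` class, the function name and all the thresholds are made up for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass
class Circle:
    x: float  # centre x, in pixels
    y: float  # centre y, in pixels
    r: float  # radius, in pixels


def looks_like_bicycle(circles, min_r=20, max_r=120, min_gap=1.5, max_gap=4.0):
    """Return True if two wheel-sized circles are spaced roughly like bicycle wheels.

    All thresholds are illustrative guesses; min_gap and max_gap are expressed
    as multiples of the average radius of the two circles being compared.
    """
    # Keep only circles that are neither too small nor too large to be a wheel.
    wheels = [c for c in circles if min_r <= c.r <= max_r]
    for a, b in combinations(wheels, 2):
        mean_r = (a.r + b.r) / 2
        distance = hypot(a.x - b.x, a.y - b.y)
        # The wheels must be neither too close together nor too far apart.
        if min_gap * mean_r <= distance <= max_gap * mean_r:
            return True
    return False


if __name__ == "__main__":
    frame = [Circle(100, 200, 40), Circle(220, 200, 40)]
    print(looks_like_bicycle(frame))
```

A check this simple will also fire on anything else with two similar circles in the frame, which is part of why you end up with so many photos to sort through.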

To find out how to deal with all that, I
want to first take a step back and talk about why images are interesting, and what sets them apart from other types of content that we’re perhaps more used to dealing with. But before we get into the differences, let’s start with a similarity: images, like any other piece of content, are created for many different reasons.

If you look at an example screenshot of a photo
stream provided by Apple, you’d be forgiven for thinking that all people use their phone cameras for is to take photos of their friends while on holiday or strolling around. The reality is always a bit different.

Here’s a random screenshot from one point in my photo
stream. Back in 2005, a group of researchers from Microsoft Research started studying early adopters of camera phones, tried to classify their photos and understand what drives people to take photos.

2 dimensions … Now, even though people might have a
very specific intention when they create an image, this intention isn’t always easy to distinguish. Images are much more generic - they're very much open to interpretation by whoever looks at them.

Here’s a random photo I got off Flickr the other
day. Taken out of context, it can mean anything. Who knows what the person who photographed it wanted to say? Perhaps there’s something interesting about all these people who have gathered in the park. Perhaps it’s about the guy on the left playing badminton. Maybe he just wanted to show that the weather is good. Or maybe the photographer just shot this to send it to a friend and plan where in this park they were going to meet up. It’s actually a photo of Central Park in New York, taken on Memorial Day. But even with this extra context, it’s still difficult to narrow down why this image was created.

This image could represent either of these 2 statements. And
this is what we mean when we say “a picture is worth a thousand words”. It’s just that it’s not always easy to know which of these thousand words are the most important ones. Since images are very generic when taken out of context, we can often create our own context to lead to a specific interpretation. Take this image for example …

It’s just a photo of some colourful Crocs sandals, right?
Now let’s put something else next to it.

We’ve managed to create a rather humorous picture mocking Apple’s
new iPhone colours and cases. In fact that’s how LOLCATS and much of the visual humor on the internet is created – by juxtaposing images with text or with other images, in combinations that create humorous connotations. Another interesting attribute of images is that they’re believable. You’ve probably heard this phrase …

When are you more likely to believe me – if
I just say to you: “I’ve had a camera since I was very young”, or if I show you this photo?

Just to survive in our day to day lives, our
brains have to process a lot of visual information, and they don't usually have time to look at it in detail or question it - they assume it's true. For all you know, it might have been somebody else in this photo – I doubt any of you made a serious effort to compare my face and check if it was really me in this photo. Because images are so believable, there’s also high value in faking them.

This seems to be a favourite technique of totalitarian regimes.
Stalin was known to routinely have photos altered to remove people he’d fallen out with. More recently, the Iranians wanted to make their missile test appear more impressive, so they just used Clone Brush in Photoshop to add an extra missile. So there are lots of interesting things about images, but there are also some issues. Because the web started as a hyper-text project, images were always a bit like second-class citizens.

Hyperlinks, one of the very foundations of the web, don't
always work well with images. What people usually expect to get when they click or tap on an image is an enlarged version of that image.

In some cases, for example in Facebook, it's also possible
to click on different people inside an image and go to their profiles. But it's not always clear if the image or any region inside the image is clickable at all - there's no blue underline, or any other obvious design convention to delineate boundaries. Anything too obvious will probably end up being intrusive and compete with the aesthetics of images. Facebook has gotten around this by only showing these links in a special mode, or only when you’re hovering over an image, but it’s still an unresolved issue.

Another issue with images compared to text is scaling to
different screen sizes and resolutions. Text has a linear structure: a series of words with convenient gaps in between. Whether you manage to fit in 10 or 20 words in a column of text, people will be able to read and understand it. With images, important details that are shown crystal clear on a large screen might get easily lost when you scale them down to a smaller screen. James Chudley from cxPartners has written an interesting blog post about this which I encourage you to read in full. He created these examples that show how the problem could be tackled in some cases: that’s by picking the most important detail in an image and zooming in instead of scaling down.
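As a rough sketch of "zoom in instead of scaling down", the Python function below works out a crop rectangle of the target size centred on the important detail, nudged so it stays inside the image. It assumes somebody (or something) has already marked the focal point; the function name and parameters are hypothetical and not taken from the blog post.

```python
def crop_around_detail(image_w, image_h, focus_x, focus_y, target_w, target_h):
    """Return (left, top, right, bottom) for a crop centred on the focal point.

    The crop is target-sized where possible, shrunk to the image size if the
    target is bigger than the image, and shifted to stay within the image.
    All values are in pixels.
    """
    crop_w = min(target_w, image_w)
    crop_h = min(target_h, image_h)
    # Centre on the focal point, then clamp so the box stays inside the image.
    left = min(max(focus_x - crop_w // 2, 0), image_w - crop_w)
    top = min(max(focus_y - crop_h // 2, 0), image_h - crop_h)
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    # A 2000x1000 photo shown on a 320x480 screen, with the detail at (1500, 500).
    print(crop_around_detail(2000, 1000, 1500, 500, 320, 480))
```

The resulting box could be handed to whatever image library you use for the actual crop. The hard part, deciding where the focal point is, still needs a person or some of the image analysis discussed later.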

So if images are so interesting, but also so multifaceted,
how do we tame them? What can we do to design for a world where a lot of the information we share is visual? As information architects, part of our strategy has always been to try and gather as much metadata as possible about each piece of data – and to devise ways of searching and browsing around using that metadata.

So where do we get all that metadata? One way
is of course to get people to create it, for example allow them to give titles and tags to their photos. In practice, this happens very rarely in a private context. Only professional photographers regularly sit and tag their photos, because they have an obvious benefit if their photos can be found and used. Most of us, even when we share a photo online, rarely bother to add a lot of meaningful information. But fortunately nowadays photos come with a lot of metadata embedded from the point where they’re produced – the camera itself.

All this data (which you might have heard of as
EXIF tags) is usually embedded by default, and usually stays with the photo unless it's removed by the user or by some badly-made image processing software. That's not to say that you should place absolute trust in any of this information, as there's no way you can validate it and it's trivial to change it - I could take a photo of you now, and make it appear like it was taken on the other side of the world. You may think this metadata is trivial and not much help in organising an image collection, but you can actually put it to very good use. The most obvious example you can see is the Camera Roll in iOS 7.

Going from a linear structure to grouping a series of
photos by location and time gives a very good approximation of the different things you were doing when you took these photos. It’s such an obvious thing once you’ve seen it, and it requires so little processing, that it makes you wonder why it wasn’t done earlier. Even just using the time in photos, you can get some pretty inspiring uses.
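To show how little processing this kind of grouping needs, here is a minimal Python sketch. It is not how the Camera Roll is actually implemented, which I don't know; the `Photo` class, the function names and the 2-hour and 1 km thresholds are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    name: str
    taken_at: datetime
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def distance_km(a, b):
    """Great-circle (haversine) distance between two photos, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def group_into_moments(photos, max_gap=timedelta(hours=2), max_km=1.0):
    """Split photos into groups whenever there is a big jump in time or place."""
    moments = []
    for photo in sorted(photos, key=lambda p: p.taken_at):
        if moments:
            last = moments[-1][-1]
            close_in_time = photo.taken_at - last.taken_at <= max_gap
            if close_in_time and distance_km(last, photo) <= max_km:
                moments[-1].append(photo)
                continue
        # First photo, or too far from the previous one: start a new group.
        moments.append([photo])
    return moments
```

Each photo is only compared with the one taken just before it, so a slow walk across a city stays in one group while a gap of a few hours or a train journey starts a new one.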

This is a tool called “Photo time capsule”, built by
an amazing photo blog called Photojojo. Once you subscribe and give them the link to your photostream on Flickr, they’ll send you every couple of weeks a selection of the photos you uploaded exactly one year ago. This summer I got one of these in my mailbox, reminding me that last summer I was in Copenhagen with my wife, and we were doing some late night cycling. Reminiscing is an important reason behind creating such images, and reminiscing is all about time. -- Unfortunately the straightforward metadata stops here – if you want to gather more information you’ll need to process the image in some way.

You could focus on purely visual characteristics, for example extracting
the colour (or colours) of the image, whether it's overall a dark or bright image and so on. This is useful if you can think of a reason to search or filter images in this way, but in the end it doesn't give you that many hints about the meaning of the image. There are 2 other things that computers nowadays can extract from images in a pretty reliable way – text, and faces.

Text recognition algorithms can scour through images and identify pieces
of text that exist in them. This is especially useful for images in the “functional” category that I mentioned before, for photos that were taken just because it was quicker and easier to photograph than to scan something. This is why it’s one of the most popular features of Evernote, a piece of software that aids note-taking in any form.

Face detection, simply recognising the presence of a face in
an image, is so straightforward that it's now possible even in the cheapest compact cameras out there. Face detection offers us an important cue about the meaning of a photo. If there's only 1 face in an image, and it takes up a significant proportion of the frame, we might be able to assume that the image is someone's portrait. If on the other hand we detect 20 faces in an image, it might be a photo of a crowd. Or a group photo.
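That rule of thumb can be written down in a few lines. The Python sketch below assumes a face detector has already returned the size of each detected face; the function name, the labels and the cut-off values (15% of the frame for a portrait, 10 faces for a crowd) are illustrative guesses rather than established numbers.

```python
def classify_by_faces(face_boxes, frame_width, frame_height,
                      portrait_share=0.15, crowd_size=10):
    """Guess what kind of photo this is from the detected faces alone.

    face_boxes is a list of (width, height) tuples in pixels, one per face.
    """
    if not face_boxes:
        return "no people detected"
    if len(face_boxes) == 1:
        width, height = face_boxes[0]
        share = (width * height) / (frame_width * frame_height)
        # One face filling a good part of the frame suggests a portrait.
        return "portrait" if share >= portrait_share else "single person"
    if len(face_boxes) >= crowd_size:
        return "crowd or group photo"
    return "small group"


if __name__ == "__main__":
    print(classify_by_faces([(400, 500)], 1000, 1000))
    print(classify_by_faces([(50, 50)] * 20, 1000, 1000))
```

Like everything else inferred from an image, the label is only a hint: a single large face could just as well be a poster in the background.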

Face recognition is a bit more complex for computers. We're
probably still a few years away from a solution that could recognise hundreds of thousands or millions of people with any degree of reliability. But if you're looking to recognise people out of a limited set, for example which of my friends are in this photo, there's commercially available software that works well enough, such as iPhoto on the Mac, Picasa from Google etc.

One of the holy grails of image processing is being
able to recognise all objects in an image. For example being able to take this image and recognise that it contains a Macbook, a table, and a cup of coffee. Again, even though there have been successful examples, you usually have to limit your search to specific objects under controlled circumstances. If for example you were looking for the Starbucks logo, you could detect it with a reasonable degree of confidence. But if it's difficult to recognise arbitrary objects, another approach could be to add man-made objects to the environment that can be easily recognised by computers. A good example is QR codes.

This is a product sketch presented a couple of years
ago by the design studio BERG from London. This little QR code up there is supposed to be generated by an e-paper display, and provides a unique representation of the time and location where the photo was shot, in a format that can be recognised by computers - hence the title “clocks for robots”. BERG envisaged that the metadata provided in this barcode can trigger something in your smartphone, for example launch an app when you take the photo. It can also trigger something when the photo is uploaded to a 3rd party service – for example applying some tags to the photo. Now when you mention all these possibilities around automatic photo capture and tagging, there’s one concern that consistently comes up: privacy.

So what about privacy? Isn't the world going to be
a worse place when we all walk around with a camera, able to take and share photos without anyone noticing? To start with, this isn't a very new concern. Let me show you what Google Glass looked like in the 1880s:

To be fair, small cameras of that era could only
take a limited number of photos before you had to change the glass plate or the film inside them. But they still caused a stir ...

There were comic songs written, mocking such devices ...

And lots of upset people demanding that something should happen.
I haven’t seen any comedies about Google Glass yet, but I’ve definitely seen a lot of anger. A few weeks after Google Glass was released, a campaign group called "Stop the cyborgs" was founded. Source: http://www.billjayonphotography.com/The%20Camera%20Fiend.pdf

They created signs like this and according to their website
they want to "encourage as many places as possible to become ‘Surveillance free zones’". This is a pointless request, and one that's ultimately unenforceable, as image capture devices are becoming smaller and more invisible. A couple of weeks ago, while I was finalising this talk, yet another wearable device with a camera emerged: the Samsung Galaxy Gear "smart watch".

Good luck trying to spot and ban people wearing this.
I guess you could introduce airport-style screening at the entrance of your venue, which of course ends up being a more intrusive behaviour than what you're trying to prevent. Or you could try and impose controls on the manufacture of photographic devices, where it’s impossible to draw a line. I don’t think you’ll ever get any “stop the cyborgs” signs going up; in fact I think there’s another category of signs that’s going to become obsolete.

If there is actually a museum of obsolete signs out
there, they should be prepared to add this sign to their collection. So if this ends up in the museum, what next? Is there anything we should restrict?

In fact, a lot of people out there are quite
reasonable and don’t need a legal threat to comply – more of a nudge. I think we can provide this nudge by analysing what is being photographed and shared.

This is what Instagram (or any similar photo sharing service)
could look like if it wanted to make people think about privacy. It could even use face recognition to learn which of your friends don’t like their photo posted for the whole world to see. Or use other metadata like location, for example to make sure that photos you take inside or near your house are only available to your friends, not the whole world. There is no blanket rule that applies to everyone, but at the same time we’ve seen that people won’t realistically sort through all their photos and apply privacy controls. If we give them a chance to do it more easily, it might just work. And finally, if we accept that people will continue carrying a camera with them wherever they go, then the camera becomes an opportunity.

Some people, including some artists as we saw earlier, view
this as an annoyance – from the perspective of a designer, I see it as an opportunity. We have a lot of input devices pointing at something. What’s the best use for them? Could we get people to take back something more than just a blurry video? Could we use them to show something interesting and enhance their experience? Could the whole concept of a live music event be different if everyone has a screen on them? It’s for us to try and find out. One thing is for sure:
