Speaking as a photography instructor and photographer, I find that mobile photographic technology – in particular, the pretty decent camera attached to nearly every mobile device – has been something of a double-edged sword.
On the one hand, never, ever has photography been as free, easy, convenient and accessible to so many. Keypoint Intelligence predicts that worldwide, we’ll shoot nearly 1.5 trillion new photographs in 2021, and it’s long been suggested that we currently take more photographs every two minutes than were taken in the whole of the 19th century.
On the other hand, never has the world seen so many “selfies”, pet photos, or curated, retouched pictures of an entrée of tacos and salsa (or take-out coffee cups!). Perhaps putting a camera in everyone’s hand and expecting a new Ansel Adams is like giving everyone a slide rule and expecting a new Isaac Newton.
Mobile device cameras improve in technical quality with each new model, and there is no shortage of apps for taking photos, creatively editing them, or publishing and sharing them. Yet other than a gimmicky new digital filter now and again, the technology seems relatively static, and the benefits of mobile device cameras would appear limited to their convenience and the fact that people actually carry and use them.
But these new cameras, unlike all the old ones, are connected. No matter how mundane, predictable or derivative the subject, if these billions of photos make it to the cloud they have the potential to be part of a greater whole: crowdsourced imagery.
Crowdsourcing – obtaining ideas, information or input from a large and unspecific group of individuals – isn’t new, and neither is crowdsourcing of photographs. Google Maps solicits photographs from some smartphone users to supplement their “Street View” images of various locations. The WhatWasThere project collects historical photographs submitted by users to build a searchable and interactive map of former buildings and landmarks – a Google Maps of the past. And Picture Post, hosted by the University of Oklahoma, uses crowdsourced photographs to enable “citizen scientists” to monitor and track environmental change locally and globally.
But these are all examples of active crowdsourcing – photographers must choose to participate, take sometimes specific photographs, and upload them to a dedicated site. What of all those other photographs taken for their own purposes and shared? What information could we distill from them? Not from any particular one of them, but from all of them together?
As artificial intelligence software becomes more adept at analyzing photos and extracting information from them, these masses of individual photographs have the potential to become greater than the sum of their parts.
Google – who else? – is currently developing an artificial intelligence system that uses Neural Radiance Fields (NeRFs) to analyze multiple images of the same scene (taken at different times, by different photographers, under different conditions) to create virtual 3-D models of that scene. The theory behind NeRFs is beyond the scope of this post, but in short the technique determines where light rays terminate in individual photographs in order to calculate an aggregate volumetric model. This has been done by others using purpose-made photographs, but crowdsourced photographs bring the added complications of different lighting, different environmental conditions, and people or vehicles in the scene – all of which Google is able to effectively “subtract” by examining many, many images.
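For the curious, the heart of the technique is classical volume rendering: a trained network predicts a density and a colour at sample points along each camera ray, and those samples are composited into a single pixel colour. The sketch below shows only that compositing step, with made-up inputs standing in for the network’s predictions – the names here are mine, not Google’s or the NeRF authors’ actual code:

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Composite a pixel colour along one camera ray, NeRF-style.

    densities: (N,) volume density at each sample point along the ray
    colors:    (N, 3) RGB predicted at each sample point
    deltas:    (N,) distance between consecutive sample points
    """
    # Opacity of each segment: alpha_i = 1 - exp(-density_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: how much light survives to reach sample i unoccluded
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = transmittance * alphas
    # The pixel colour is the transmittance-weighted sum of sample colours
    return (weights[:, None] * colors).sum(axis=0)
```

Intuitively, a dense (opaque) sample both contributes its own colour and blocks everything behind it – which is exactly how, given enough photographs from enough angles, the system can work out where surfaces sit in space.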
The resulting 3-D digital models can then be rendered as still images from any viewpoint or animated. Lighting and background can be changed, either based on samples from the image dataset or arbitrarily. As digital models, they can be used for archival purposes, for educational purposes, in augmented reality and virtual reality applications, for games and simulations, or anywhere a detailed virtual model of a real-world scene is needed.
The weakness of this technology is its dependence on a large number of photos of the same place. It’s unlikely that thousands of strangers have photographed your house, so the personal applications of this are distant, but for now, world landmarks and other popular locations can easily provide sufficient photographic material.
And unlike other crowdsourced photographic projects, this one is passive. Machine-read captions, embedded GPS data or object recognition software can identify photographs that have already been shared on image-sharing sites like Flickr or Instagram, as well as social media, news, travel or commercial websites.
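As a small aside on that embedded GPS data: camera phones typically store location in EXIF as degree/minute/second values plus a hemisphere reference, and converting those into the signed decimal degrees a mapping service expects is straightforward. A minimal illustrative helper (the function name is my own, not from any particular library):

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style GPS degrees/minutes/seconds plus a hemisphere
    reference ('N', 'S', 'E' or 'W') into signed decimal degrees."""
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative by convention
    return -decimal if ref in ("S", "W") else decimal
```

With coordinates in this form, photos of the same landmark can be clustered by location alone, before any caption reading or object recognition is needed.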
This is not end-user technology – there’s no app to download and play with, nor as yet even a way to directly enjoy the results of this technology — but it does demonstrate the potential of “cloud-sourcing” of image material made possible by the ubiquity of camera-enabled mobile devices, and our willingness to use them prolifically and share the results. Outputs of this technology will also surely find their way back into our mobile gadgets in the form of 3-D games, virtual tourism, AR and VR applications and educational resources.
Future implementations of this or similar technology could include:
- Using “cloud-sourced” images to identify and track shoreline erosion, deforestation or similar environmental change
- Enhancements to current mapping applications such as Google Maps based on user photographs (imagine a 3-D, immersive “Street View”)
- Accurate and detailed virtual architectural models for study, preservation or reconstruction (think: Notre Dame Cathedral)
- Virtual human 3-D models (why should buildings have all the fun?)
At the rate of a trillion and a half new photographs each year, our collective “photo album” is only getting larger, and the larger it gets, the more information we can expect to be able to harvest from it. Not even Google could afford to hire millions of photographers around the world, but with our gadgets in our hands, and with no extra effort, we can all pitch in. Many hands make light work.
Hi Sean,
Thanks for an interesting look at a new technology. It is fascinating to consider how many pictures are taken each year, and how many photos must exist of certain popular landmarks and sites. I like that something somewhat practical can be done with some of these photos. I sometimes think about how many times I have taken pictures of the same landmarks, and then how many other people have almost exactly the same, but slightly different, pictures of the Taj Mahal or the Halifax Citadel! There are a few different educational applications that I can think of – a photography class contributing to a bank of photos, or an art or history class studying a landmark, looking at past and current images of a site. The ubiquity of cameras has changed the way we interact with the world around us, and it is amazing to see how new technologies are being adapted to take advantage of that.
While teaching drafting, I could see this technology being useful for looking at a famous area with a couple of buildings, then discussing city site plans. Or being able to analyze a structure as it is being built. These are the ideas that come to mind for me as a shop teacher.
You mentioned how this could keep Google Street View up to date. I am currently house hunting with my partner, and I checked out the Google Street View of a house before we requested a viewing. The street view looked pretty decent (they also didn’t have a photo of the front of the house on the listing). I don’t know how old the street view was, but the house looked far worse in person – clearly neglected since the time of the photo. It would have been nice to have an updated photo. Then again, I don’t know how many people would take a photo of the house and upload it to the cloud… but I could have appreciated it.
Hi Michael. I suppose it will be well into the future when sufficient photos of mundane places exist to allow this technology to model them, but while the aspect of this that most fascinates me is the use of existing images, there may come a time when, either for money or bragging rights, tech companies like Google use this or similar technology to volume-map everything. Remember when you pretty much had to live in a large city to see your house on Google Maps satellite view? Then pretty much everything could be seen on satellite view? Remember when street view was kind of the same? I suspect that once enough photographs exist to allow NeRFs to do their work, we’ll see more and more 3-D information about everything. Only a few trillion more pictures to go!
Hey there Sean,
I think the application for this would be well suited to education. Other fields of study like archeology, sociology, and even anthropology could benefit from this neat combination of photography and computer science. It is a shame an application has not yet been developed, but it would be enjoyable to play with and to share creations in a Google Street View format.
Do you think this technology will only be applicable for education and business purposes or be featured in a more user-friendly format like street view in the near future?
Are there any other more accessible forms of technology that could be direct competition for NeRF?
Hi Anton. I could certainly see this technology being rolled into Google Maps street view. As illustrated, it features crowdsourced photos (rather than photos purposefully shot) but the same algorithms for, say, removing pedestrians or accounting for lighting and weather suggest that if the Google Maps car is willing to drive around the block a few times, a passable 3-D model of the houses, stores, buildings, etc., could be possible with minimal effort on Google’s part.
Could the same tech find a place in an end-user app? Perhaps. I could imagine it being used with, say, Augmented Reality to allow better integration of data/information and the real world scene, or to generate “reverse” 3-D models of interior spaces for various purposes. As far as accessible similar technologies go, they would appear to exist (something automated, after all, must be generating the blocky, Lego-like 3-D view that already exists in Google Maps street view), but this is just a level up from that.
Hi Sean,
Thank you for opening my eyes to a world that I have not yet had the chance to consider. I have never thought about the sheer number of photos that are being taken each day, and I think that this technology that you have presented is an innovative and creative way to take advantage of the massive amount of photographic data that is out there. As I was watching the video that you added to your presentation, I immediately thought about video games, and how this type of technology could provide a replicated world that could be used in a virtual setting, such as a video game. Video games, such as Microsoft’s new Flight Simulator, have attempted to render an exact replica of the Earth and its landscape through the use of mapping software, but I feel like this is the next level, especially considering the realistic detail of the 3D images that are being generated using NeRFs. Beyond the creation of the images, did you come across any speculative uses for NeRF?
Hi Kelvin. I didn’t find any other specific uses for NeRF, above and beyond whatever we might imagine we could do with some automagically-generated 3-D models of places and things, though this could change either as the technology becomes feasible for things that aren’t buildings and such, or when we’ve amassed enough “cloud” images to model more than the CN Tower and similar landmarks. So the technology itself isn’t going to turn our lives, as we know them, on their heads, but I think it’s a good early example of what we might be able to do when we start to harness the power of what’s already out in the cloud, and in that respect, I suppose there’s considerable overlap between something like this and “Big Data”. Once we can analyze what’s out there passively and by machine, I could foresee doing similar things to analyze everything from fashion trends to global literacy. We’re making information faster than we’re using it, but we might catch up!
This technology could really help advance the realm of 3D modeling, design, and printing. Utilizing new technology like this can improve our immersive worlds, enabling access to world sites through technology instead of traveling directly to them and increasing our carbon footprints. There are a lot of ways this can benefit our economy and environment as well, as you mentioned.
What are some of the concerns or drawbacks you can see with this new technology? Who would own the rights to these models and concepts if they come from a built collective?
Hi Elixa. I’m not sure I see any specific concerns with this technology, or at least not now when it seems limited to public landmarks and similar — if we “progress” to where we can model humans based on crowdsourced/cloudsourced photos then perhaps I’d have more to say! As far as ownership goes, while I’m not a lawyer I would have to assume that typical copyright considerations would not apply. If I take a photo of a hummingbird perched on a loonie, I will own the copyright for the reproduction or distribution of that image, but copyright would not prevent someone else from (say) using that image to figure out how big (or small!) hummingbirds are. In other words, we can use copyrighted images… we just can’t republish or distribute them.
That said, insofar as the landmarks that are currently the subject of this technology are, in a sense, in the public domain, I would certainly hope that Google would make the 3-D models of them public as well. In classic Google style, I suspect they did this to show they could, not in the hope of a financial windfall.
I see something similar here to the transformation of Don Tapscott’s career from the crowdsourcing of ideas in Wikinomics [https://dontapscott.com/books/wikinomics/], how mass collaboration changes everything, to The Blockchain Research Institute [https://dontapscott.com/research-programs/blockchain-research-institute/] where an agent for change takes a deep dive into the strategic business advantages of connectedness.
You would not own the copyright to that image of the hummingbird on the Loonie, since copyright on currency belongs to the government of Canada; but would they share rights with you as well as royalties, connected through blockchain to monetize and manage the proliferation of the image?
The deep architecture of the internet of things could include crowdsourced lidar cave maps, video, and images, and your thoughts about the future potential are really interesting. Using neural radiance fields to optimize a volumetric presentation of a scene by and for the collective good of the planet seems a worthwhile test, and you can start with some open-source code on GitHub: https://github.com/bmild/nerf. We will watch eagerly!