It has been said that 80% of what people learn is visual.
Allen Klein
That holds for most people, although Veritasium disagrees, calling this the biggest myth in one of his videos.
Visual learner or not, visual stimulation adds to our experience. The entire Metaverse and 3D rendering technologies are built on that premise.
In this blog, we will briefly discuss one such piece of research by the Google AI team, in which they recreate majestic mountains, dramatic seascapes, and serene forests in extreme detail from a bird's-eye view, starting from just a single picture.
Infinite Nature
In a research effort the Google AI team calls Infinite Nature, they show that computers can learn to generate such rich 3D experiences simply by viewing nature videos and photographs. Their latest work on this theme, InfiniteNature-Zero (presented at ECCV 2022), can produce high-resolution, high-quality flythroughs starting from a single seed image, using a system trained only on still photographs, a capability not demonstrated before.
They call the underlying research problem perpetual view generation: given a single input view of a scene, how can we synthesize a photorealistic set of output views corresponding to an arbitrarily long, user-controlled 3D path through that scene?
Perpetual view generation is incredibly challenging because the system must generate new content on the other side of large landmarks (e.g., mountains) and render that new content with high realism and in high resolution.
Perpetual view generation
Google Research introduced this problem in the paper “Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image” (ICCV 2021), which proposed a method for generating an endless stream of new views of a scene with a deep neural network, starting from a single photograph and without needing any 3D model of the scene. The follow-up work, InfiniteNature-Zero (ECCV 2022), learns the same capability from still photographs alone.
The key idea is to train a deep neural network to repeatedly extend the scene: the current view is rendered into the next camera pose, and the network then refines the result, inpainting missing regions and restoring detail. The training signal comes from collections of nature videos (in the original Infinite Nature) or single landscape photographs (in InfiniteNature-Zero), rather than from rendered 3D models.
Once the network is trained, it can keep generating new views of the scene along an arbitrary, user-controlled camera trajectory, and the generated images stay realistic and consistent with the input photograph.
The Google research team has demonstrated that the method produces convincing flythroughs of real-world natural scenes, from coastlines and mountains to forests, starting from a single photo.
The potential applications of perpetual view generation are numerous. For example, it could be used to generate new views of a scene for a virtual reality headset from any desired vantage point, to render new views for a video game, or to create special effects for movies and TV shows.
Perpetual view generation is based on a render-refine-repeat approach: render the current frame into the next camera pose, refine the rendered result with a generative network, and repeat the process with the refined frame as the new input.
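To make that loop concrete, here is a minimal, self-contained toy sketch in Python. Everything in it (the constant stand-in disparity, the roll-based warp, the mean-fill “refiner”) is my own illustration of the control flow only; it is not the actual Infinite Nature code or API.

import numpy as np

def estimate_disparity(frame):
    # Stand-in: the real system predicts per-pixel disparity with a network.
    return np.full(frame.shape[:2], 0.5)

def render_new_view(frame, disparity, pose):
    # Stand-in: the real system reprojects pixels into the new camera pose.
    # Here we just shift the image sideways and mark the revealed strip as missing.
    shift = int(20 * pose * disparity.mean())
    warped = np.roll(frame, shift, axis=1)
    mask = np.zeros(frame.shape[:2], dtype=bool)
    mask[:, :shift] = True          # disoccluded region with no source pixels
    return warped, mask

def refine(warped, mask):
    # Stand-in: the real system inpaints the masked region and sharpens the frame
    # with a generative network; we just fill holes with the image mean.
    out = warped.copy()
    out[mask] = warped[~mask].mean()
    return out

def perpetual_view_generation(seed_image, camera_path):
    frames, frame = [seed_image], seed_image
    for pose in camera_path:                        # user-controlled trajectory
        disparity = estimate_disparity(frame)       # Render...
        warped, mask = render_new_view(frame, disparity, pose)
        frame = refine(warped, mask)                # ...Refine...
        frames.append(frame)                        # ...Repeat
    return frames

frames = perpetual_view_generation(np.random.rand(256, 256, 3), [1.0, 1.0, 1.0])
print(len(frames), frames[-1].shape)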
Previous Research Paper
Earlier, the Google team released the first paper in this line of work, “Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image”.
You can read more about the paper by clicking the link above.
In fact, the Google Research team released a Colab demo showing how to use it.
I recommend that enthusiasts try this earlier version to understand why the research is ground-breaking.
Pros of Perpetual view generation
Achieving perpetual view generation from a single image involves several challenges, and solving them is what sets the method apart:
1. More than view synthesis: Perpetual view generation goes beyond regular view synthesis, which only lets us move the camera slightly around a still image frame. A notable piece of research in view synthesis is Facebook's SynSin.
2. Inpainting new regions: In perpetual view generation, when the camera moves, the new view needs to be generated in real time, and parts of it may be entirely new, with no counterpart in the old view (see the toy sketch after this list).
In the images above, the regions marked in pink need inpainting as the camera moves forward, because the view beyond them is either occluded or was never imaged. The AI has to create those new regions on its own.
3. Outpainting: Perpetual view generation can also generate views outside the image boundaries, i.e., content that was not part of the input image.
4. Super resolution: Perpetual view generation also addresses the blurriness that would otherwise appear as the camera moves closer, which is what makes the HD, realistic videos look as if a drone camera were flying over the scene.
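To see why those pink regions appear, here is a tiny numpy experiment. It is entirely my own toy, not code from the paper: pixels are forward-warped sideways by a depth-dependent amount (closer pixels move more), and the pixels in the new view that receive no source pixel are exactly the disoccluded regions that need inpainting.

import numpy as np

H, W = 64, 64
image = np.random.rand(H, W, 3)                                # stand-in source frame
depth = np.linspace(1.0, 5.0, W)[None, :].repeat(H, axis=0)    # toy depth map

# Simulate a small sideways camera move: the parallax shift is inversely
# proportional to depth, so nearby pixels move further than distant ones.
baseline = 4.0
shift = np.round(baseline / depth).astype(int)

warped = np.zeros_like(image)
covered = np.zeros((H, W), dtype=bool)
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
new_xs = np.clip(xs + shift, 0, W - 1)
warped[ys, new_xs] = image
covered[ys, new_xs] = True

holes = ~covered   # target pixels that no source pixel landed on
print(f"{holes.mean():.1%} of the new view is disoccluded and must be inpainted")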
You can also read my blog on super resolution using Real-ESRGAN.
Instructions for installing dependencies
There are a few requirements before we begin installing and using Infinite Nature. We need the following environment:
- Python 3.8
- CUDA 11.3
We will be using Conda to install dependencies.
Please download Infinite Nature from GitHub and save it into a directory.
To install the required libraries, run:
conda env create -f enviornment_infinite_nature_zero.yml
To install softmax splatting for point cloud rendering, clone and pip install the library from here.
To install the StyleGAN2 backbone, run:
git clone https://github.com/rosinality/stylegan2-pytorch
Apply the patch to update the StyleGAN network code:
git apply stylegan2_upgrade.patch
and copy the modified model.py to the models directory.
Finally, copy the ‘op’ folder to the models/networks directory.
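Putting the StyleGAN2 steps together, the whole sequence looks roughly like this. This is only a sketch: I am assuming the patch file sits at the root of the infinite_nature_zero checkout and that the commands are run from that root.

git clone https://github.com/rosinality/stylegan2-pytorch
cd stylegan2-pytorch
git apply ../stylegan2_upgrade.patch
cp model.py ../models/
cp -r op ../models/networks/
cd ..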
Downloading data and pretrained checkpoint
The authors include a pretrained checkpoint trained on the LHQ dataset, along with input examples, which can be downloaded by running:
wget https://storage.googleapis.com/gresearch/infinite_nature_zero/infinite_nature_zero_ckpts_data.zip
unzip infinite_nature_zero_ckpts_data.zip
Running Infinite Nature
The assumption is that input images are at resolution 256x256.
python -m pvg_lhq-test
This will run 100 steps of InfiniteNature-Zero, using auto-cruise to control the camera pose, and save the frames to release-test-outputs/. You will get results like the following (note that different runs of the generator will produce different sequences).
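If you want to turn the saved frames into a single video, a standard ffmpeg command like the one below works. The *.png filename pattern inside release-test-outputs/ is my assumption; adjust it to match the actual frame names.

ffmpeg -framerate 30 -pattern_type glob -i 'release-test-outputs/*.png' -c:v libx264 -pix_fmt yuv420p flythrough.mp4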
References
- InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
- GitHub
- Research Paper
- YouTube
If you liked this article, please leave a clap and comment. Follow me for more tech blogs.
P.S.: I have skipped the architecture of the model in this blog. Please comment if you want me to cover it in detail in another blog.