In this work, we advance the neural head avatar technology to the megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion. We demonstrate that the suggested architectures and methods produce convincing high-resolution neural avatars, outperforming the competitors in the cross-driving scenario. Lastly, we show how a trained high-resolution neural avatar model can be distilled into a lightweight student model which runs in real time and locks the identities of neural avatars to several dozens of pre-defined source images. Real-time operation and identity lock are essential for many practical applications of head avatar systems.

The recent success of neural implicit scene representations (Mildenhall et al., 2020) for the problem of 3D reconstruction has inspired several works on so-called 4D head avatars (Park et al., 2021a; Gafni et al., 2021; Park et al., 2021b; Yang et al., 2021; Lombardi et al., 2018, 2019), which treat the appearance and motion modeling of an avatar as a non-rigid reconstruction of the training video. These methods handle the non-rigidity of motion in different ways: they either learn it from scratch (Yang et al., 2021; Park et al., 2021a, b), use pre-trained motion extractors (Gafni et al., 2021), or rely on pre-computed coarse meshes (Lombardi et al., 2018, 2019). While all these methods can achieve impressive realism of renders and fidelity of motion, they require multi-shot training data, are trained separately for each avatar, and often fail to represent motions unseen during training. In contrast, our method can impose motion from an arbitrary video sequence onto an appearance obtained from a single image while still achieving megapixel resolution of the renders.
The resolution of talking head models is currently upper bounded by the available video datasets (Chung et al., 2018; Wang et al., 2021), which contain videos of at most 512 × 512 resolution. This further restricts the enhancement of the output quality on the existing datasets using standard high-quality image and video synthesis techniques (Wang et al., 2018c, b). Alternatively, the problem could be treated as single image super-resolution (SISR): this way, we require only a dataset of still high-resolution images for training, which is easier to obtain. However, the quality of the outputs of a one-shot talking head model varies greatly depending on the imposed motion, which results in poor performance of standard SISR methods (Yang et al., 2020). These classic approaches rely on supervised training procedures with an a priori known ground truth, which we cannot provide for the novel motion data since we only have one image per person. We address this problem in a novel way by combining supervised and unsupervised training, and achieve considerably better performance for arbitrary motion data than the SISR-based solution.
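To make the combined objective concrete, below is a minimal sketch (not the released implementation) of a training step that mixes a supervised restoration loss on degraded high-resolution stills, where the ground truth is known by construction, with an unsupervised adversarial loss on avatar renders that have no ground truth. The `enhancer` and `discriminator` modules, the synthetic degradation, and the loss weight are all illustrative assumptions.

```python
# Hedged sketch: combining supervised and unsupervised objectives for an
# image enhancer. Module names and weights are assumptions, not the paper's
# actual implementation.
import torch.nn.functional as F

def training_step(enhancer, discriminator, hr_still, avatar_render):
    # Supervised branch: synthetically degrade a real high-resolution still,
    # then restore it, so the ground truth is known by construction.
    lr = F.interpolate(hr_still, scale_factor=0.25, mode="bilinear")
    lr = F.interpolate(lr, size=hr_still.shape[-2:], mode="bilinear")
    supervised_loss = F.l1_loss(enhancer(lr), hr_still)

    # Unsupervised branch: a novel-motion avatar render has no ground truth,
    # so an adversarial term pushes the enhanced output toward the
    # distribution of real high-resolution images instead.
    adversarial_loss = -discriminator(enhancer(avatar_render)).mean()

    return supervised_loss + 0.1 * adversarial_loss  # weight is a placeholder
```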
Method

During the first stage, we train our base model (Figure 2) by sampling two frames x_s and x_d from a random training video. To encode the appearance of the source frame, we predict volumetric features v_s and a global descriptor e_s from the source image via an appearance encoder E_app. In parallel, we predict the motion representations of both the source and driving images using a motion encoder E_mtn. These representations consist of the explicit head rotations R_{s/d}, translations t_{s/d}, and the latent expression descriptors z_{s/d}. They are used to predict the 3D warpings w_{s→} and w_{→d} via the separate warping generators W_{s→} and W_{→d}. The first warping removes the source motion from the appearance features v_s by mapping them into a canonical coordinate space, and the second one imposes the driver motion.
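A minimal sketch of how such a 3D warping can be applied to a feature volume in PyTorch is shown below; the tensor shapes and the identity placeholder grid are assumptions, and in the model the sampling grid would come from the warping generators rather than from `affine_grid`.

```python
# Hedged sketch: resampling a feature volume with a predicted 3D warping.
# Shapes and the identity grid are placeholders, not the paper's settings.
import torch
import torch.nn.functional as F

B, C, D, H, W = 2, 32, 16, 64, 64
v_s = torch.randn(B, C, D, H, W)  # appearance volume from E_app

# Stand-in for w_{s->}: a per-voxel sampling grid in [-1, 1] normalized
# coordinates, here just the identity transform.
theta = torch.eye(3, 4).unsqueeze(0).repeat(B, 1, 1)
w_canon = F.affine_grid(theta, size=(B, C, D, H, W), align_corners=False)

# Resample the appearance volume into the canonical coordinate space,
# removing the source motion; applying w_{->d} afterwards imposes the
# driver motion in the same way.
v_canon = F.grid_sample(v_s, w_canon, align_corners=False)
```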
The canonical volume is processed by a 3D convolutional network G_3D, and the driving volume v_{s→d} is orthographically projected into 2D features and processed by a 2D convolutional network G_2D, which predicts the output image x̂_{s→d}.
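Putting the stages together, a schematic forward pass under the notation above might look as follows. Every module is a stand-in for the corresponding network, and the call signatures, the grid-based warps, and the depth-to-channel projection are assumptions rather than the released implementation.

```python
# Hedged sketch of the base model's forward pass; all signatures are assumed.
import torch.nn.functional as F

def base_model_forward(x_s, x_d, E_app, E_mtn, W_src, W_drv, G_3D, G_2D):
    # Appearance of the source frame: volumetric features + global descriptor.
    v_s, e_s = E_app(x_s)

    # Motion of both frames: head rotation, translation, expression latent.
    R_s, t_s, z_s = E_mtn(x_s)
    R_d, t_d, z_d = E_mtn(x_d)

    # First warping: map appearance features into the canonical space,
    # removing the source motion.
    w_s2c = W_src(R_s, t_s, z_s, e_s)          # plays the role of w_{s->}
    v_canon = F.grid_sample(v_s, w_s2c, align_corners=False)

    # Process the canonical volume, then impose the driver motion.
    v_canon = G_3D(v_canon)
    w_c2d = W_drv(R_d, t_d, z_d, e_s)          # plays the role of w_{->d}
    v_s2d = F.grid_sample(v_canon, w_c2d, align_corners=False)

    # Orthographic projection, here as a depth-into-channels fold, followed
    # by the 2D network that predicts the output image.
    B, C, D, H, W = v_s2d.shape
    return G_2D(v_s2d.reshape(B, C * D, H, W))  # x̂_{s->d}
```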