Dynamic Texture Modeling
by
Radek Richtr
A dissertation thesis submitted to
the Faculty of Information Technology, Czech Technical University in Prague,
in partial fulfilment of the requirements for the degree of Doctor.
Dissertation degree study programme: Informatics Department of Theoretical Computer Science
Prague, May 2018
Department of Pattern Recognition
Institute of Information Theory and Automation
Czech Academy of Sciences
Pod Vodárenskou Věží 4
182 08 Prague 8
Czech Republic
Copyright © 2018 Radek Richtr
________________________________________________________________________________________
This dissertation thesis deals with a novel dynamic texture model. The proposed toroidal model can represent a vast amount of possible dynamic textures realisation. The proposed model is based on triple toroid-shaped tiles and capable of texture analysis, synthesis, enlargement, and editing. The crucial toroidal property allows synthesizing infinitely large output simultaneously in a spatial and temporal domain. Furthermore, the analysis and synthesis parts are entirely separated.
This dissertation thesis presents several techniques to enhance the visual quality of results and their potential amount. Particular attention is therefore paid to a dynamic texture inpainting problem, especially inpainting with a dynamic background which is generally recognized as a challenging problem.
Afterwards, the dynamic texture similarity the problem is addressed and a novel Fourier transformation based criterion is presented. Subsequently, with proposed criterion utilising and psycho-physical test, the results of stated inpainting method and synthesizing result are validated.
Finally, a model capability, particularly its editing abilities, to extend the existing textural BTF model to the dynamic domain is discussed and furthermore illustrated. Consequently, the novel DBTF model for representing a dynamics object is introduced.
Keywords:
Dynamic texture, temporal texture, video texture, dynamic texture synthesis, dynamic texture editing, inpainting, error concealment, textural similarity, textural models, toroidal tile.
In particular, the main contributions of the dissertation thesis are as follows:
Creating a sufficiently general and descriptive model that offers a more in-depth insight into dynamic in the dynamic texture. The model is capable of synthesizing infinitely large output simultaneously in spatial dimensions and temporal dimension (see Chapter 3).
The novel toroidal model capable of the dynamic textures editing is presented. The method allows to suppress visual disturbances, edit texture property like dynamics and color tone or create the mix-of-DTs. Due to temporal editing method, the interactive dynamic texture that consists of states with varying dynamics can be created (see Chapter 4).
The proposed dynamic texture inpainting approach yields to handle challenging object removal case where the object to remove is placed on the dynamic background. The proposed method focuses on the perceptual quality in term of human perception and produces visually high-quality results (see Chapter 5).
The novel multidimensional spatiotemporal frequency criterion regarding human perception and a different way of perceiving spatial and temporal dimension is developed and validated through series of psycho-physical tests (see Chapter 6).
The Bidirectional Texture Function is currently the best visual texture representation of various textured materials which can be simultaneously modeled and acquired. The possible combination of BTF and DT model is addressed in this dissertation thesis. The stated combination allows developing novel medium for dynamic material under varying observing and illumination conditions (see Chapter 7).
________________________________________________________________________________________
First of all, I would like to express here my great gratitude to my dissertation thesis supervisor, Prof. Michal Haindl He has been a constant source of encouragement and insight during my research and helped me with numerous problems, professional advancements and especially proofreading.
Special thanks go to the staff of the Department of Pattern Recognition of the Institute of Information Theory and Automation, who maintained a pleasant and flexible environment for my research. I would like to express special thanks to the department management for providing most of the funding for my research. Further thanks belong to the staff around the SAGElab for providing psycho-physical tests background.
My research has also been partially supported by the Czech Science Foundation as project No. 14-10911S, by the Czech Science Foundation as project No. 102/08/0593, and SGS SGS15/117/OHK3/1T/18 and SGS14/102/OHK3/1T/18.
Finally, my greatest thanks go to my family members, for their infinite patience and care.
Dedication
Rosebud. Always.
________________________________________________________________________________________
*
________________________________________________________________________________________
List of Figures
*
________________________________________________________________________________________
List of Tables
*
Miscellaneous Computer Graphics Abbreviations
Miscellaneous Methods Abbreviations
Miscellaneous Abbreviations
2D Tile
Various operators and functions
Overlap error
Optimal cut
Spatiotemporal tile and patch
DT editing
DT inpainting
DT similarity criterion
DBTF
Graph Theory
The use of visual textures allows the concentration of considerable amount of information within a single medium. While for designers a texture adds material feeling to design and allows the improve simple geometry, for computer scientist textures provide a considerable amount of information as a key component to image analysis, human perception understanding, scene segmentation and texture synthesis.
In many cases, the static images are not enough to obtain the sufficient amount of information to scene content analysis and recognition. Multiple images of a scene from a different viewpoint or at different time instants can add a desired semantic description. For example, the movement vector of some object in the scene, the weather and wind direction determining from water surface or forest images, detection of movement behind scene object and many other. For many processes analysation, the entire sequences of images (textures) are crucial or even necessary.
This necessity is due to the fact that the crucial information is not detectable by the mere scene geometry in a one-time interval or by the scene radiometry but in the scene and its evolution over time, or its dynamics only. Even though this dissertation thesis is devoted to analysis, modeling, synthesis and editing of images of scenes that exhibit varying statistics in the scene geometry and radiometry through time series which is closely related to analysis, modeling, synthesis and editing of still texture the problem definition and topics overview will by often presented in still texture domain first. Video sequences that have certain homogeneous visual and dynamic properties will continue to be called as dynamic textures from now on.
Unfortunately, neither static nor dynamic rigorous mathematical texture definition exists. Dynamic textures (DTs) can be vaguely defined as spatially repetitive motion patterns exhibiting homogeneous properties in the temporal well as spatial domain. Dynamic textures examples might be smoke, haze, fire or liquids, also waving trees or straws or some moving mechanical objects.
Visual texture modelling is the crucial part for any computer-based visualisation application because whatever size is the measured texture (or generally any data), it is always insufficient and requires its enlargement to cover the required visualised object’s surface area. The typical example in still texture domain is the recent most advanced visual surface material representation in the form of the bidirectional texture function (BTF). Dynamic texture synthesis editing aims to create a visual texture which is perceptually similar, ideally visually indiscernible, to the target dynamic texture and has the required spatial and temporal extent.
Texture inpainting and error concealment (although it can also be seen as texture editing), unlike classical texture synthesis, does not enlarge the dimension of the resulting texture and instead removes inconsistent or unwanted objects in the texture. The original motivation for image inpainting was to reconstruct damaged images or videos frame by frame due to the ageing of stored media or noise.
From the original task that consists of error area and artifact restoration, methods continuously generalize to the problem of editing and object removal. This task has been intensively studies in the area of image restoration, and many useful results have been reached. However the subject of the video (or dynamic texture) inpainting is far less extensively studied, probably because of the complexity of high dimensional data processing and due to obstacles caused by the need for homogeneity of the temporal dimension. Since more methods can generate the desired results, it is necessary to compare the results between each other to obtain the optimum results. Method results comparing can be achieved either by psycho-physical tests or via a suitable validated similarity or quality metric. Emphasize that the term (video) inpainting in the context of this thesis does not just mean to recover missing or unwanted data but to synthesize it with regard to the whole dynamic texture. Therefore, in the interest of the quality, borderline patch error and homogeneity of the texture as a whole the minimal residual part of the input texture could be sacrificed to a certain extent.
As the term texture is crucial for this thesis topics, the term definition and usage is first briefly discussed in the Section 1.1, followed by a discussion of a more specific term dynamic texture in Section 1.2. The general complex texture models are briefly described in Section 1.3. The dissertation thesis motivation (Sec. 1.4), problem statement (Sec. 1.5), goals (Sec. 1.6) and structure are presented in the order given.
The world around us is full of textures and materials which are visually demonstrated as visual textures. The surface of any visible object is, in some scale, textured as well as the apparent visual surface (like a dense forest, or a crowd of people, thick bushes, …) because also more object can be in some scale perceived as a texture. A spectrum of visual textures is extremely wide from natural surfaces of a wood plant, skin, artificial surface like metal, car paint, linoleum, or complex visual surfaces like grass, hair, dense forest or crowd of people. In both real and apparent surfaces the texture refers to surface characteristics and appearance in terms of color, arrangement, roughness, glossy, coarse or fine, reflectivity, etc.
Texture can be obtained as a measurement from the real world by photographing or measuring with a specific photometric device. Also, texture can be painted or rendered from virtual scene artificially by computer graphics tools.
Here is proper to say that there is one very specific sort of textures which has a very tight connection to human perception - sound textures. Although the sound texture synthesis is interesting topics (see, e.g. [?, ?]) and even sound texture can be present in some natural dynamics visual texture (i.e. the sound of water in waterfall scene, birds song in the dynamic texture of the forest) this dissertation thesis is strictly focused to visual dynamic texture without any sound texture track.
The basic division between tactile textures and visual textures is represented by a primary sense that the given texture is perceived. Tactile texture directly refers to a tangible human feel of real surfaces. Visual textures refer to human visual perception. The impression of visual texture is more than the impression of tactile textures influenced by the environment - illumination direction, intensity and spectral characteristics, optical conditions of environment, but also spectral properties of the surrounding surface as well as the observer’s own (i.e., the properties of the eye). This dissertation thesis is focused on visual textures, so the term texture is therefore used as phrase visual texture although this will be highlighted in some specific situations.
After the insight into some dictionaries, various definitions of the term texture can be found. The terms are related to vast characteristics and research or art areas as painting, taste, music, etc. More relevant definitions are related to structural arrangement or fabrics. The various definitions of textures, for example, consist of:
English oxford living dictionary:
”The feel, appearance, or consistency of a surface or a substance.” |
Merriam-Webster:
”The disposition or manner of union of the particles of a body or substance; the visual or tactile surface characteristics and appearance of something.” |
Dictionary.com:
”The characteristic physical structure given to a material, an object, etc., by the size, shape, arrangement, and proportions of its parts.” |
Art Vocabulary Terms:
”The tactile quality of the surface of a work of art.” |
American Heritage Collegiate Dictionary:
”Something composed of closely interwoven elements; the structure formed by the threads of a fabric.” |
The vast spectrum of characteristics as a representation of the original 16th-dimensional function lead to many purpose-oriented models like Bidirectional Texture Function, Spatially Varying Bidirectional Reflectance Distribution Function, Bidirectional Surface Scattering Reflectance Distribution Function, etc. Depending on the model, textures are represented by two or more dimensional images or sets of images.
Visual texture
Although textures are remarkably often used in computer graphics and science neither visual nor tactile rigorous mathematical texture definition exists, and term texture is often considered to be very vague and obscure. The reason is probably the extremely wide spectrum of properties and close relation to particular human perception. Even worse, in many cases, the key features are clearly contradictory or almost mutually exclusive.
The range of these properties covers regularity versus stochasticity, uniformity versus distortion, sharpness versus noisiness, locality versus global properties. Moreover, these properties often fluctuate with the distinctive texture scale and view angle.
From a large number of approaches and researchers, let’s name the most basic ones which differ fundamentally from each other and represent basic views and approaches. The different texture definitions usually consequently lead to strictly different computational approaches to texture synthesis and analysis. Francos, Meiri and Porat[?] defines a texture vaguely as a structure which is made of a large ensemble of elements that resemble each other very-much, with some kind of an order in their locations, so that there is no one element which attracts the viewer’s eye in any unique way. Although the definition is intuitively correct, it is in many cases inadequate, even inaccurate in terms of visual perception and visual quality. The robust self-similarity criterion with no irregularity to ”attract viewer’s eye” leads to material-like or poor perception (see. i.e. marble texture in Figure 1.1).
Haralick[?, ?] defines texture as an ”organised area phenomenon”. As a typical structural approach, this leads to texture decomposition into ”primitives” with some specific spatial distributions. The structural decomposition is strongly driven by the human visual experience of visual textures. The spatial structure of primitive objects (i.e. apples, leaves, or scales on in image 1.1) is seen as a realisation of a stochastic field interestingly satisfy many of different texture definition in a similar way as a primitive itself. The primitives can be realised not in a stochastic way but some higher order placement (i.e. wool, knot pattern, tile floor). With both principle (stochastic and regular) structure approach can cover a wide variety of visual textures.
In apparent contradiction to structural approaches is stochastic one, i.e. based on Markov Random Field[?]. Cross and Jain[?] consider that texture is ”A stochastic, possibly periodic, two-dimensional image field.” This approach is popular probably due to the possibility of stochastic field mathematical notation and therefore a more explicit link with multidimensional mathematical processes, unlike structural approach. For example, texture definition as a stochastic process that generates the texture is possible with the stochastic approach.
Without limitation to generality and a particular model, the consensus across the community says that the key textural property is spatial and temporal homogeneity. From the structural point of view satisfying the need for homogeneity require a repetition of texture primitives and therefore some degree of self-resemblance. Even if the exact primitives can vary in color, illuminance or shape the basics, visual appearance is similar. Typical examples can be fractal[?] or building block, tiled floor, or kilts and tartans. The statistical approach defines homogeneity as statistical stationarity. Therefore the self-similarity is required at the level of the statistical values of the signals. The degree of homogeneity can serve to classify the texture into several basic groups. For example, Zhou[?] classifies the texture into the homogeneous, weakly-homogeneous, and inhomogeneous classes.
Texture Synthesis
The using of texture is a quintessential part of almost any computer-based visualisation application. Because of the desired object which needs to be textured can has any size the appropriate amount of texture is needed. Although in some exceptional cases the measured amount of data may be sufficient, it may easily prove inadequate if the size of the object is changed. What’s more, sufficiently precise measurement can be resource-intensive, time-consuming or even impossible, i.e., reconstruction and preservation of incomplete museum artifacts. The typical example of a precise but time-consuming medium in still texture domain is the recent most advanced visual surface material representation in the form of the bidirectional texture function.
Texture modelling and synthesis aims to create a visual texture which is perceptually similar, ideally visually indiscernible, to the target original texture and has the required spatial extent. The synthesized or enlarged texture should have similar, or the same perceptual characteristics as stated for example De Bonet[?] but should also avoid to verbatim repetitions[?] of some pattern like tileable textures (see Figure 1.2).
Texture enlargement and synthesis are closely related to texture model approach. The structural approach, therefore, leads directly to intelligent sampling - picking some representative patches of appropriate shape from given input texture sample of limited size and consequently using them to generate the desired size texture. Statistical approach based synthesis is used in some defined mathematical models, often with relation to a particular physical phenomenon. The generated texture is created according to given analytical function. Of course, the hybrid methods, i.e. as propose Nealen[?] are possible, usually trying to pick the advantages of both approaches.
At first, it is good to say that dynamic textures are sometimes called as temporal textures in some literature, e.g. articles as by Szummer[?], or video texture, e.g. articles by Schödel[?]. Both terms share a large part of the characteristics but due to the excessive generality of both terms which may in principle cover an unnecessary amount of data the term dynamic texture[?] will be exclusively used in this dissertation thesis. Remind, that the similar areas as i.e. motion textures[?] are not part of this thesis topic, as its do not model the time-varying spectral information but the motion measurement itself.
Similarly to other types of visual textures mentioned above, there is neither exact definition of the dynamic texture [?, ?]. The Dynamic texture can be characterised by some spatially invariant statistics like other visual textures [?], but additionally, temporal invariance through particular images is also required. Consequently, the spatial and temporal homogeneity is required. It is important to say that the spatial domain homogeneity and the temporal domain homogeneity have fundamentally different characteristics. This can readily be seen by a simple experiment: Imagine some dynamic texture, e.g. the irises on Fig. 1.2. For human perception the content of the dynamic texture is clear and distinct - yellow flowers of irises moving in the wind. If the spatial axes will be switched, the human recognized content should be preserved. Moreover, the DT’s dynamic still remains intact. When any spatial axis (for example, horizontal dimension) will be changed with the temporal axis (or, geometrically said, the whole data block will rotate around Y axis by 90 degrees) the resulting dynamic texture will be for human perception not the video of irises, but a video with oscillating yellow / green lines. Moreover, the dynamics of the resulting dynamic texture will drastically change. The experiment can be well viewed on volumetric view of a dynamic texture Irises on Figure 1.3.
In many typical cases, the static images are not enough to obtain the sufficient amount of information to analyse. The recognition capabilities can be strictly limited as an image is only one-time interval realisation. For example speed of some object, or its movement direction needs the set of images and thus set of more time interval observation for an effective estimation. Multiple images of a scene from a different viewpoint or at different time instants can add a semantic description which. For many processes analysation, the entire sequences of observations are crucial or even necessary. This necessity is due to the fact that the crucial information is not recognizable by the mere scene geometry in the one-time interval or by the scene radiometry, but also in scene dynamics.
The fields of application for dynamic texture consist of video restoration and recognition, animation, computer games, virtual reality, dynamic object modeling, video rendering and many other. The basics advantage of dynamic textures allows, due to temporal high-order characteristics, bypassing difficult modeling of some complex spatial structures and their dynamics in particular. Moreover, using dynamic textures enables to keep photorealistic material with plausible and real dynamics.
From interminable types of dynamic textures, the examples could be grass, trees in the wind, flowers, smoke, foliage, snowing, insect swarm, waving flags, human crowd, sea waves, haze, fire, and many others. The determination of a dynamic texture by classes are difficult problem itself with tight connections to human perception. An immense space of possible dynamic texture realisation with a different color, illumination of structure can be seen on Figure 1.3 as a small subset of WATER type DTs examples.
A real material surface reflectance is a complex physical phenomenon which depends on many physical conditions. The general reflectance function[?] can be expressed in form of 16-dimensions function:
(1.1) |
where ς_{i} is incident light spectral value. The triplet x_{i}, y_{i} and z_{i} denote the illuminating surface location in time t_{i}. ω_{i} = [θ,ϕ_{i}] is illumination spherical incidence angles observed at time t_{v}. Surface location is denoted by x_{v}, y_{v} and z_{v} with spherical reflectance angles ω_{v} = [θ_{v},ϕ_{v}] and appropriate spectrum ς_{v}. ω_{t} = [θ_{t},ϕ_{t}] are the corresponding transmittance angles. Note that ω = [θ,ϕ] are the elevation and azimuthal angles, respectively.
Clearly, GMF is an extremely complex function to measure or even model. Thus more suitable model for visual texture representing is needed. The wide spectrum of simplifying assumptions from general GMF leads to particular specific model as the existence of a subsurface scattering, texture isotropy condition, independence on simultaneous rotation of illumination and viewing azimuthal angles around the surface normal and many others. From complex GRF taxonomy, the following model has the closest relation to this dissertation thesis topics.
The Bidirectional Surface Scattering Reflectance Distribution Function (BSSRDF) is a 9-dimensional model proposed by Nicodemus[?] as a visual texture reflectance representation:
(1.2) |
where parameters are a GRF (1.1) model generalisation based on specular (reflected light) and diffuse components that are determined by the light scattering in surface micro-structure. Bidirectional Surface Scattering Reflectance Distribution Function is currently the most general yet, with some simplifying assumptions, suitable GRF submodel. Note that no satisfactory BSSRDF dataset was measured yet and only approximate visualization methods exist, but still, BSSRDF is probably the best GRF approximation which is possible that will be measured in the near future.
Bidirectional Texture Function (BTF) is 7-dimensional reflectance model and current visual texture state-of-the-art. The model is defined as a BSSRDF subspace:
(1.3) |
and was (in monospectral variant) first proposed by Dana[?]. The BTF is currently the best visual texture representation of various textured materials which can be simultaneously modeled and acquired even for nonplanar and opaque surfaces. The process of BTF measuring is time consumable and complex, but the results are currently the most high-end and physically correct surface materials appearance modeling. Above all, the adaptability of the model to the illumination conditions is extremely wide.
BTF as current state-of-the-art is used in top computer graphic applications, photorealistic material modeling , filming, virtual reality modeling, museums preservation, visual scene analysis, recognition of complex real-world, biometric and medical applications and many others.
Multispectral visual texture The well known and widely used model for GRF is a simple multispectral visual texture:
(1.4) |
where both viewing and illumination direction if fixed. Emphasize that contrary to frequently misunderstanding the function is 3-dimensional with two spatial and one spectral dimension. The VT is usual and mostly utilized form of a texture, or generally, any image representation.
Monospectral Texture Recall, that monospectral texture obviously miss the ς parameter:
(1.5) |
Procedural Texture The model similar to (1.4) is already mentioned material procedural (grayscale) texture in monospectral:
(1.6) |
or multispectral color variant:
(1.7) |
Dynamic texture Finally, the dynamic texture model is defined as a multispcestral texture (1.4) with dynamics and so in the most general and pure form as
(1.8) |
Texture synthesis In the computer graphic and more specifically in pattern recognition, the study of dynamics textures dates three decades back. The pioneering works perform dynamic texture classification based on temporal features extracted by segments of dynamic textures. By many years the definitions of this medium type vary from temporal texture, video texture, spatiotemporal textures, dynamic texture and many other. The medium was more or less precisely described in many different ways [A.5] based on statistical or structural properties. While the structural approaches typically exhibit high-quality results, the statistical ones present general methods but typically more time complex ones.
The methods typically try to generalise the pure spatially based approaches but lack the temporal dimension effective handling. Thus effective method as [A.2] that can produce high-quality results concerning main medium property - dynamics - is needed.
Texture editing Visual texture modelling is the critical part for any computer-based visualization application because whatever size is the measured texture, it is always inadequate and requires its enlargement to cover the required visualized objects surface area. The available material sample size is either too small for rendering complex and large virtual scenes if we measure material visual properties of real existing objects, or the measurement technique does not allow us to measure larger material samples. The typical example is the recent most advanced visual surface material representation in the form of the bidirectional texture function [?]. The amount of such measured data similarly to dynamic textures is immense, e.g., in the range of terabytes even for such spatiotemporally restricted measurements thus any such texture visualization inevitably requires enlargement and some compression capability simultaneously.
Thus texture editing method[A.3] capable of creating photorealistic synthetic textures, which are either difficult to measure or which even do not exist in nature and is general enough to create a broad spectrum of dynamics textures is desired.
Textural Similarity Criterion The ranking of dynamic texture similarities is a difficult problem due to real material textures complex dependencies on 16 physical observation parameters with a combination of temporal dimension behaviour. This problem is not satisfactorily solved even for simpler static textures Currently the only reliable, but extremely impractical and an expensive option is to exploit the methods of visual psycho-physics [?].
Clearly, criterion[A.1] that correlates well with psycho-physical tests and are capable of ranking the synthesis or inpainting results is missing.
Dynamic Texture Inpainting Inpainting (mostly image inpainting) is intensively researched part of image processing. The original motivation for image inpainting was to reconstruct damaged images or videos frame by frame due to the ageing of stored media mechanical damages, or noise [?]. Later many methods were introduced to use an image and video inpainting not only for reconstruction but also for editing of image or videos in the object removal area.
Even if there are many effective approaches (mostly working with static background), current methods have still many drawbacks. The most of dynamic textures inpainting approaches suffer from false optical flow or produces results with visually easy recognizable artefacts. Even if the synthesized dynamic texture is valid, the shape of a mask is easy observable like a (moving) silhouette in the dynamic background. Another usual problem is time complexity. Time demanding operations are typical, i.e. for convolution nets approaches in which the synthesizing of an even small texture can take weeks to proceed. So some method[A.4] which can overcome these drawback is needed.
This dissertation thesis aims to develop a novel dynamic texture model capable of synthesizing dynamic textures and improving the current methods. The proposed model is applicable in dynamic texture enlargement and capable of synthesising a significant amount of dynamic textures type from water, natural scenes, human-made objects, and many other.
The model generality leads to an ability to edit dynamic textures and even recreate photorealistic synthetic textures, which are either difficult to measure or which even do not exist in nature.
As the model is capable of effectively edit the dynamic textures, the intensively researched image processing topics called inpainting is handled with it.
As the only way to compare the quality of a synthesis and inpainting results are tedious psycho-physical experiments, the mathematical criterion of similarity is final key topics of this dissertation thesis.
The main goals of this dissertation thesis sustain of:
The thesis is organized into ten chapters as follows:
At the beginning of this chapter, in Section 2.1, the necessary theoretical minimum required to enter the topic is introduced. As the primary goals of this thesis is a novel approach to synthesize dynamic textures with using of a triple toroidal-patch, the dynamic texture as a term in relation to other complex textural media is explained first in Section 2.1.1 following with Section 2.1.2 where principles of a toroidal patch are recapitulated.
In the second State-of-the-art section (Sec. 2.2) goals coinciding with dynamic textures and field overlap between them are discussed first, followed with dynamic texture overview.
Whereas the using of synthesis approach to editing and inpainting is a crucial part of this dissertation thesis, the related areas of research covering still and dynamic textures synthesizing and inpainting are discussed next.
Hence any synthesizing method results must be compared in some ways, last part of state-of-the-art section summarises approaches for texture fidelity benchmarking, similarity computing and psycho-physical test performance to compare the quality of synthesis and/or inpainting results.
Let’s mention here what 4D means in the context of this dissertation thesis, dynamic textures and common still texture models. The most general and accurate description of any still visual texture can be obtained by 16D general reflectance function describing the real material textures complex dependencies on 16 physical observation parameters as postulates Haindl in [?].
The more general 9D bidirectional surface scattering reflectance distribution function (BSSRDF) is defined by Nicodemus in [?] as light transport within the interaction of the light beam with material surface.
Bidirectional texture function as introduced by Dana[?, ?] is BSSRDF generalisation with no surface scattering and thus depends on illumination and viewing direction and texture spatial coordinates resulting in 7D function.
Presumptions on material properties influencing light scattering have resulted in more disposable approximative 5D material representation called bidirectional reflectance distribution function. The bidirectional reflectance distribution function derivation is based on BTF representation of spatial dependency on the points of light incidence and reflection.
The most straightforward way to describe a 2D texture with spatial dimensions, say, u,v (width and height), where values at given coordinates can be defined as magnitude on black-white spectre. Color still 3D textures are natural extensions with two spatial and one spectral dimension, where the last dimension can be one channel from RGB, HSV, or in any other color model.
The solid color 4D textures are very specific procedural textures of material (e.g. wood, marble, …) with three spatial dimensions (width, height, depth) and one spectral dimension. On the contrary, the dynamic color textures (usually called as dynamic textures) are a direct extension of still color textures with two spatial, one spectral and one temporal dimension (see Table 1.1).
The most general term relevant to dynamic textures are video textures as postulate Schödl in [?] or as a general temporal autoregressive model by Szummer[?]. While video textures represent a general infinitely varying stream of any pictures with no exact limitations and represent general medium, dynamic textures as defined by Doretto[?] [?] are a subset of video textures and as a linear dynamical system (LDS) are restricted by many limits.
Although neither static nor dynamic (or temporal) rigorous mathematical texture definition exists[?], dynamic textures can be vaguely defined as spatially repetitive motion patterns exhibiting homogeneous temporal properties[?] (i.e., spatial and clearly discernible temporal homogeneity between frames). Examples might be grass, haze, fire or liquids, also waving trees or straws or some moving mechanical objects. So the dynamic texture can be defined as a realisation of 4D stochastic random field. Like 2D and 3D textures, many of them can be modeled only as a mix of several dynamic textures[?, ?] or by layered dynamic textures[?].
In general it can be said that dynamic textures must have certain spatially invariant statistics[?] (be homogenous) and clearly discernible temporal homogenity between frames. Similarly, Doretto[?] postulate dynamic texture as generative video model defined by a random processed with some observed variable and hidden variables.
Three significant properties[?, ?] are present in dynamic textures - the static texture itself, global dynamics and local dynamics. They intuitively define whole texture in the meaning of human perception and texture classification.
The static texture represents a structure of the scene, and its objects, its visual appearance, illumination, color in one DT frame. In the other words, the structure is analogous to still texture and represents whole frames of DTs. The preservation of these properties is crucial[?, ?, ?] to preserve the visual quality appearance of dynamic textures in the meaning of geometric (G^{0}) continuity.
The second significant property represents moving of whole scene (handshake, camera motion) and typically must be fixed by appropriate techniques[?].
The third property - local dynamics - is usually caused by the periodic motion of a small object in the texture (i.e. oscillation or overlapping and disappearing of small objects) and create the local optical flow (OF). Not preserving of the optical flow causes a violation of G^{{1,2,3}} continuity curves - inconsistencies in dynamics due to the entire image, create a false optical flow, or abrupt changes in OF velocity.
Optical flow Optical flow determining is one of fundamental problems[?] in the processing of image sequences. Its goal is to compute an approximation of the 2D or 3D motion field of velocities and directions and thus add spatiotemporal information to spatial image structure. The great potency to use such information reflect to wide research determined to this extensive field - many methods for optical flow with various approaches has been proposed.
As the optical flow computation becomes more standard and widely used process, for example in many computer vision libraries, variation of approaches grows. Today the optical flow approaches can challenges problems associated with complex natural scenes, including nonrigid motion, real sensor noise, motion discontinuities, automated tracking systems and many more.
Clearly, a fully comprehensive survey is beyond the scope of this paper. For this dissertation thesis the dense optical flow[?], widely used traditional pyramidal Lucas-Kanade[?, ?] can be used with advantage of simplicity in case of larger structures movement detection. For handling both rough and detailed local dynamics the optical flow prove itself to be satisfactory[?, ?, ?, ?].
Synthesis Approaches The video synthesis (and thus the dynamic texture synthesis, too) is extensively studied in past three decades. Many contributors try to established sufficient and exact statistical generative models of video sequences by many different approaches which can be divided by many characteristics.
The two main approaches can be applied to the dynamic texture modelling - the deterministic structural models approach based on the measured data sampling (intelligent modeling) and the mathematical statistical model based approach. The mathematical models can be further divided to two basic categories, while the first is procedural model devoted to some particular physical phenomena. The second mathematical model category are adaptive models like i.e. LDS or autoregressive models.
Note that the 4D procedural models for color 3D-spatial material textures (i.e. models of marble, wood) are offtenly taken into this class even if their labelling as ”dynamic” is questionable and comes from the number of dimensions. On the other hand models for behavior of water surface, hair, flames, or trees and plant is comes from this category.
The other distinction is the basis of synthesizing approach, which can be whole frames, spatial or spatio-temporal data patches, pixels, on other statistical basis functions (like Gabor filters or steerable filters).
Finally, the last major distinction is between a physically-based model that simulates some (usually very specific) DT types like water or gaseous phenomena i.e. by NavierStokes equations or fluid motion hydrodynamics and ad hoc experimental model with no more less physically-based principles i.e. on Potts field.
Mention that, of course, the division by dimension of output is possible as some methods can synthesise results only in some dimension (temporal enlargement) or are strictly limited in spatial dimension (synthesize only a multiple of original DT size) while others can create infinitely large results. Of course, an even infinitely large output may not be sufficient as the result of some method continually degrades.
Dynamic texture classes An extremely wide range of potential realizations of a particular dynamic texture requires their clustering to some particular classes. Although it is possible to use specific mathematical criteria for some sorting of textures (such as average color, homogeneity, entropy, …), it is more practical[] to use a rather semantic category. Note that the dynamic texture classes are addressed as FOLIAGE (marked with small uppercase caps) and the concrete dynamic texture realisation (measured or synthesized) as Walk (small caps). Due to the inaccurate definition of the dynamic texture itself, their classification into these thematic classes is of course very fuzzy.
Deterministic structural models approach based on the measured data sampling varies in the form of used measured data. The used patch form greatly determinate the synthesize (inpainting, editing, …) methods spectrum of outputs, their quality and possible method limitations.
Mention here, that the word ”tile” (noted as ) is used as the shape of the data used by given method regarding spatial and temporal dimensions. The term ”patch” (or ) then refers to the specific tile content given by the input data. Usually, one tile can generate more patches. The ambiguous term ”patch tile” then covers both of the above meanings, including their properties simultaneously.
If more than only one input dynamic texture is used, the terms tile and patches are used in a similar way. The tile term is usually shared by more inputs and addresses its shared properties. The patch term is then bound to concrete data from concrete dynamic texture realization.
Double Toroidal-Shaped Tile One of the possible and well-known forms of the data patch tile is a double toroidal-shaped tile which is created from given input (dynamic or still) texture sample. The double toroidal-shaped tile is a tile of which the top edge optimally fits the bottom edge, and simultaneously the right edge optimally fits the left edge. For calculating such type of tiles an (sub)optimal fit of opposite edges must be achieved to suppress the visual perception of edge existence. For creating nontrivial and visually high-quality results more than one tile must be found, so computed suboptimal tiles are a compromise between multiple tiles. For tile of size of M pixels in height and N pixels in width, the top vertical edge is common parametric curve:
(2.1) |
(2.2) |
and subsequently for horizontal edges of tiles:
(2.3) |
(2.4) |
Thus tile convex surface can be written as:
(2.5) |
(2.6) |
(2.7) |
Toroidal mapping To define mapping between toroidal surface and flat double toroidal tile let’s use for simplicity rectangular tile (Figure 2.1) with sizes M a N.
(2.8) |
The toroidal tile principles are based on a torus mapping. Here lets repeat the basics toroidal principles with illustration of a torus mapping. A rectangular double toroidal patch that correspond with unfolded surface of a torus has the size of M pixels in height and N pixels in width. Introducing both horizontal and vertical overlaps extends the M × N unfolded double toroidal patch by h additional rows and v additional columns.
The given rectangular texture patch is assumed to be indexed on the regular two-dimensional lattice:
(2.9) |
where M denotes the number of rows and N the number of columns (See Figure 2.1, the darker areas on left/right and top/down are the overlapping areas.) of the patch tile. In case of rectangular double toroidal texture patch:
(2.10) |
The surface of a torus τ can be defined parametrically using poloidal and toroidal coordinates as follows:
where s is poloidal coordinate, t is toroidal coordinate, R_{2} is the radius of the tube and R_{1} is the distance from the center of the torus to the center of the tube.
Mapping between rectangular lattice I_{r} and the torus surface τ, poloidal and toroidal angles are utilized as a generalization of poloidal and toroidal coordinates, respectively:
(2.15) |
Then, f_{τ} : (r_{1},r_{2}) → (s,t) is desired mapping function from the pixel position r = [r_{1},r_{2}] ∈ I^{T } to the poloidal angle s and toroidal angle t of a torus τ. They respectively, are defined as follows:
For all possible values from the range of f_{τ}, i.e., any pair of s ∈ and t ∈, inverse mapping f_{τ}^{-1} : (s,t) → (r_{ 1},r_{2}) can be meaningfully defined using the following formulas:
Using parametric equations (2.11), (2.12), (2.13), and (2.15) of the torus τ, poloidal and toroidal angle can be projected onto the surface of τ. Obviously, the pair (s,t) projects to any point P[x(s,t),y(s,t),z(s,t)] as well as pairs (s + 2kπ,t + 2lπ), where k,l ∈ ℤ. Such projection results in overlapping pixels.
The spectrum of research work in this dissertation thesis covers several areas of Computer Vision, Texture Fidelity and Texture Analysis and Synthesis. Therefore this chapter consists of several themes and related topics overview. At first, related textural models are discussed here as key topics for dissertation thesistheme. Next, the other topics work as inpainting and texture similarity are discussed.
The classical and the intuitively best way to create a physically correct model of the natural scene for obtaining desired render (video). In this approaches the precise geometric created manually, semimanually using photogrammetry [?], or physical model is created and later by (usually approximated or simplified) physical laws simulated through time to create an output video, dynamic or static texture in the desired time momentum under given circumstances.
Many techniques for building physical models were applied for synthesizing textures, mostly the natural phenomena (for a more detailed overview see Barzel[?]). In most cases physically-based approaches are focused to one specific dynamic visual process for example gaseous phenomena [?, ?, ?], fire [?, ?], hair [?], trees and plants [?], or water [?, ?, ?] (with higher or lower approximation of physical laws). Some of them use particle systems which can be used to synthesize fire [?] or other types of textures and goals [?, ?, ?, ?] and [?] where some other approaches are used in cooperation of the particle-like model. Many methods work with three-dimensional procedural textures [?].
Physically-based approaches have proven successful also for walkinggaits [?, ?, ?], mechanical systems [?], and some of these approaches have already been used to synthesize videos representing artificial life scenarios [?].
These approaches have many good qualities - the quality of textures are directly proportional to the modeled scenes details and if the simulated laws are not approximated, but correctly simulated, the output scene can be real as its physical artwork is (which not trivially implies, that the quality of output texture do not decrease by the time). Also, the ability to accurately edit any scene is extremely advantageous as can be seen in Popovic [?].
Unfortunately, physically-based approaches have many disadvantages. Models are usually adapted to simulate a particular physical phenomenon. The flexibility of a created model is low too since the model must often be created for every scene separately. Although they are physical models, still is in major part only approximation of phenomena, and in some cases (i.e., water) it is complicated to establish them.
It is oblivious that in case of simulating scene by geometric models there are two crucial and mutually complementary areas - the exact geometry of scene and material / textural properties. The modelling of the complex scene consisting, i.e. natural objects trees, leaves, grass of complex shape like water surfaces is extremely difficult and time demanding even using auxiliary geometry modelling techniques such as photogrammetry [?, ?, ?, ?] or procedural modelling, i.e. L-systems [?].
Many other approaches attempt not to build a robust model, but simply compute directly on the texture. As we later explain, we can say these approaches belong to statistical models, hence they use some additional information but not to build the exact physical model.
The pioneering work by Nelson and Polana[?] perform dynamic texture (or temporal textures as called by authors) classification based on temporal features extracted by segments of dynamic textures. More interestingly Nelson and Polana suggest categories from visual motion in their later article [?]: temporal textures, activities and motion events. Both activities and motion events can be described more or less precisely - activities (walking, jumping, car movement, etc.) as motion patterns that are strictly periodic in time and localised in space, and motion events (e.g., opening door or falling tree) that do not show temporal or spatial periodicity. In contrast, temporal textures exhibit some statistical regularity but have indeterminate spatial and temporal extent. The term temporal textures has gradually changed into a dynamic texture. Authors continued with [?] which focus to recognition with a model based on optical flow normal component features. Their approach is based only on non-rigid motion instead of (grayscale) picture itself. By this only temporal characteristics are used, and a large amount of information remains unused. This approach based only on optical flow is, of course, inaccurate for many types of textures.
In a later article, Nelson and Polana used spatio-temporal curves to track parts of texture to characterize and find periodic motions by using a centroid of the pack of moving pixels through frames. Every pack has one bounding box divided into smaller rectangular cells, and the motion magnitude in each cell is computed and summed. These sums of motions vectors are used as features for finding a periodic pattern by using of Fourier analysis.
Similar like Nelson and Polana[?] who used only temporal characteristics, Allmen and Dyer[?] used spatio-temporal curves that are tangent to the optical flow by using of Runge-Kutta approximation method for differential equations. They presented some result on synthetic textures (probably to overcome problems with noise, occlusion or disappearing of small objects).
Seitz and Dyer expand their work with another article [?] that goals to detect the cyclic motion. Their work needs fixed viewpoint but works well for some specific types of textures (f.e. X-rays of a heart).
Some other minor work used this approach based on computing spatio-temporal curves. Goulg and Shah[?] try to overcome a problem with disappearing object and regenerate continuity of curves, Anderson, Bur and van der Wall[?] similar like Heeger [?] used the motion energy as features and tries to use them with other techniques (pyramid representation instead of a rectangular grid) to track objects in the texture.
Approaches can also be divided according to their access to the texture data too - most common methods use Gabor filters (i.e. Bigun[?]), steerable filters by Freeman[?], Heeger[?], or Simoncelli[?]. Of course, the finding of the best filter can be part of model learning itself like in Zhu[?]. Many approaches like De Bonet[?], Efros[?], Wei[?] or Lin[?, ?] and others represent the texture in the form of Markov Random Field - the output texture is represented like a regular grid of nodes, where each node represents one pixel in the texture and marginal probability of a pair of nodes depend on the similarity of corresponding pixels data (typically red, green and blue values, but depends to a color model and similarity metrics). By this, the solution can be rewritten like the finding maximal likelihood of nodes in the grid. Usually, some approximation is searched, because this formulation called the problem of probabilistic inference is proven to be NP-hard.
There are some other statistical models, for example, based on fractals and their self-similarity across scales (which determine the fractal dimension). Heeger and Pentland[?] describe a fractal model for simulating gaseous phenomena like clouds, tree leaves and other turbulent flows. Similarly Stam[?] use Fourier analysis to model turbulent wind gaseous phenomena in similar ways as Li[?], who suggest similarly based segmentation model.
When some model, but not precisely physical base, is created and the output texture created by mathematical model simulating, we can speak about parametric model.
The modeling of a dynamic texture as a (linear) spatiotemporal auto-regressive model (STAR) was studied by Szummer[?, ?] who modeled grayscale dynamic textures like a spatiotemporal causal model. Szummers autoregressive model is intuitively, but (as he present it) grayscale limit and computationally demanding.
Later, as said before, Doretto[?, ?, ?, ?] and Soatto[?] presented auto-regressive moving average (ARMA) process based on SVD. This approach contains time-consuming iterative gradient method for parameters estimation and works in most cases on grayscale textures only, but allow to editing synthetized texture by changing of model parameters. Fast synthesis of dynamic color textures was later presented by Filip[?] Chan[?] or Constantini[?, ?] who use tensor decomposition and SVD techniques and others [?][?] presented similar approaches. Since parametric models are very general and usable for recognition[?] it can be difficult to model many types of dynamic and video textures.
The other approaches for the synthesizing of textures are non-parametric and use some samples to represent the texture. Many approaches from this class are focused on enlarging textures only in spatial dimensions, but the addressed problem is similar, and in some cases, it can be generalized to the time dimension (but in major part the simple and direct generalize is impossible). The sample of a texture can be the whole frame like presented Schödl[?] or noticeably smaller like published De Bonet[?] who used multi-scale-filter to analysis and synthesis of texture. Later works of Efros[?], Wei[?] and others improve this area. The part of this class which represents texture, not pixel-per-pixel, but by some larger patch is intensively studied by Efros[?], Kwatra[?, ?] and others[?, ?]. Wang[?] represent texture like a linear superposition of deformable patches which position is derived by MRF. Basically, these approaches are create by intelligent sampling of input data subspaces in different form by copying whole patches from input texture with some offset.
Procedural techniques are a relatively quick solution for the purpose of synthesis but often offer lesser semantic and intertexture view because there are not learnable statistical models which result in the lack of flexibility for purposes as recognition and classification.
The dynamic texture models can be used to synthesizing, texture enlargement, restoration, or many other purposes. The prominent one is the inpainting and error concealment.
The term inpainting was introduced by Bertalmio[?], but methods for hole fitting were researched earlier, e.g., Masnou and Morel[?] replaced arbitrary shaped areas in images by other content by level-lines to disocclude the hole to inpaint or Roosmalen[?] use autoregressive models and MRF-like approach. Inpainting methods often use texture synthesis methods like quilting[?, ?], graphcut [?], exemplars [?], Fourier transformation and decomposing to structure image and a texture image[?, ?], and several others[?]. These methods still mainly concentrate only to image inpainting or works with videos in another ways - i.e., Liu[?] and Huang[?] used a similar method to video stabilization.
In the area of DT (or video) processing focused to video inpainting, there are two major approaches, mostly consisting of the previously mentioned synthesizing method and models. The first one is focused on temporal dimension and infinite looping data creation and the second one is focused on spatial dimensions and similarly as the image inpainting, on the hole fitting. The Schödl’s method [?] is a typical temporal focused approach which can be with some adjustment utilised or used as inpainting methods. They typically work frame by frame, but it can only enlarge duration of a video. Similarly, Liao[?] and Agarwala[?] looped whole video by different ways, part of it or only some dynamic texture that occurs in videos as part of mix-of-DTs. The second main approach is focused mainly on spatial dimensions. Although this area was once primarily due to computational complexity neglected, it is now widely researched. Newson[?], Wexler[?], Granados[?], Patwardhan[?], Strobel[?], Ebdelli[?], Huang[?] or Xie[?] are the typical representative of video inpainting approaches, which can handle camera motion, occluding, and different illumination. These methods typically segment input to background and foreground and use small patches and energy minimizing border line error criterion to fill up the holes. Usually, the inpainting task required a user given mask [?, ?] even if is marked as an automatic algorithm.
It is interesting that only a very small part of current approaches (like Kwatra[?], Ding[?], Lin[?], Lguensat[?], Liao[?] or Huang[?]) works with input data as dynamic textures. The non-DT approaches typically suffer from a false optical flow, artifacts from deleted objects [?] or loss of frequencies in inpainted areas [?] and high computational requirements as they usually use computational demanding approaches as EM-like-optimization in [?], or neural convolution nets as in [?].
Nowadays poplar energy-based convolution neural network models like [?] allow to synthesize and inpaint large part of dynamic textures. Even if these approaches are promising in restoring a large number of smaller discontinuous areas like salt and paper noise, they lack the consistent visual quality, creating the visually easy recognizable silhouette and suffer from time-consuming extremal calculations and large learning data requirements.
Another great part that has connection to this dissertation thesis are texture similarity metrics. Whereas the dynamic texture basically consists of a set of images (and moreover because of significant lack of dynamic texture criterions) the similarity ranking of still texture is taken into account.
Mutual similarity assessment and similarity ranking of two or more visual textures is a challenging problem due to real material textures complex dependencies on 16 physical observation parameters as described for example in [?].
Evaluation of how well various texture models conform to human visual perception is essential not only for assessing the similarities between model output and the original measured texture but also for optimal settings of model parameters, for a fair comparison of distinct models, material recognition, etc. This problem is not satisfactorily solved even for simpler static textures [?, ?].
Currently the only reliable, but incredibly impractical and expensive option is to exploit the methods of visual psycho-physics. The methods for psycho-physical testing require a lengthy process of experiment design, tightly controlled laboratory condition, and above all representative panel of human testing subjects. Such testing obviously cannot be performed on a daily basis. Several published static texture criteria allow testing selected texture properties such as the texture regularity as presented by Lin[?], etc. Others like Rubner[?] or Zujovic[?, ?] claim to test general texture quality.
The recent test [?] on a public texture fidelity benchmark of several state-of-the-art image quality measures and several recently published static texture criteria confirms their insufficient reliability and low robustness. The evaluated criteria were - the structural similarity (SSIM) index[?], the visual information fidelity (VIF) methods[?], the visual signal-to-noise- ratio[?] (VSNR), the mean-squared error[?] (MSE), the complex wavelet - structural similarity (CW-SSIM) index[?], and the structural texture similarity measure (STSIM-1, STSIM-2, STSIM-M) by Zujovic[?].The results have demonstrated, that the standard image quality criteria (MSE, VSNR, VIF, SSIM, CW-SSIM) do not correlate well with a human quality assessment of textures at all. Inasmuch as the criteria are inaccurate in the field of static textures, their inappropriate behaviour in the time domain can be assumed.
Although the STSIM texture criteria have a significantly higher correlation with human ranking, they do not successfully solve similarity problem. The textural qualitative fully multispectral criteron by Kudělka[?] based on the generative Markov texture model statistics slightly outperforms the best alternative - the STSIM fidelity criterion.
All previous texture similarity criteria can be formally generalized also for dynamic textures if they are applied to the corresponding frame couples of compared dynamic textures and subsequently combined these partial results. While the mentioned methods like regularity ranking by Lin[?], quality evaluation by Rubner[?], Zujovic[?, ?], or structural similarity by Wang[?] and other method intended for static textures can by used frame-by-frame the criterion directly fitted to a temporal domain are more appropriate.
The another possibility of similarity ranging is of course possibility to evaluate similarity of parameters from known dynamic texture models like ARMA[?], STAR[?] or others. However, admissible ranking only for some oversimplified tests such as identical dynamic textures which differ only in additive noise level can be expected.
As there is no general textural model capable to model vast amount of possible dynamic texture realisation, this chapter consists of our novel model description which can overcome the most of current methods drawbacks. The examples of the enlargement and synthesis results are in Figure 3.1.
The sections in this chapter are as follows. Firstly, the basic principles and potential of the created model are stated in the Sec. 3.1. Next, in the Sec. 3.2, the crucial information about the toroidal method principles are defined. The toroidal method is then extend to temporal dimension in Sec. 3.3 in spatial and in Sec. 3.4 in temporal toroidal property keeping. Finally, the synthesis process is described in the Sec. 3.5.
The toroid-shaped patch dynamic texture model is capable of texture analysis, synthesis, enlargement, and editing. The model is developed concerning maintaining several vital requirements which consist mainly of the visual quality of the synthetic dynamic texture, editing capabilities of the approach, performance of the algorithm and dynamic texture compression. Also, the requirement of creating infinitely large output is a critical feature of the method. A synthesised dynamic texture should ideally produce such a visual perception so that original measured dynamic texture and the synthetic one are visually indiscernible.
The visual quality of the synthesized dynamic texture is naturally determined and limited by the chosen dynamic texture sample (or samples). Moreover, the used sample (or samples) represent whole dynamic texture - its dynamics, structure, color, etc. The representativeness and descriptiveness of the picked sample (or samples) is the essential prerequisite for visually plausible synthetic results. The intelligent sampling allows maintaining these requirements by choosing samples that consist of crucial representative texture parts. Also, observations-nuisance-factors like an artifact of unwanted regularity must be avoided.
The presented method principle extracts one or more mutually interchangeable texture patches with toroid-shape with borders optimized by minimal error boundary cuts. The extracted triple toroid-shaped dynamic texture patches can be due to toroidal property subsequently seamlessly repeated in all data space directions spatial as well as temporal. The method estimates optimal triple toroid-shaped tile and several triple toroid-shaped patches. Dynamic texture tiles are created by combining estimated (dense) optical flow and double toroid-shaped dynamic texture frame tiles. The basic principle of a presented method can be seen in Figure 3.2 together with a comparison to different typical approaches. The bottom three images illustrate methods patch and suggest the computational difficulty of the methods. The top three images demonstrate problems with the keeping of significant frequency from a given input texture.
The crucial characteristic of the proposed method is analytical part strictly separated from the synthesis step. Given these steps separated each part of approach can be implemented independently. The synthesis step then consists only by copying data patches and takes negligible computation. Due to separating and storing the crucial data patches only the compression of dynamic texture is possible. Moreover, this allows to implement different synthesis part - different arrangement, used driven arrangement, or even random arrangement only.
The presented toroid-shaped approach can be used on 4D multispectral dynamic textures, 3D monospectral dynamic textures or even still textures. As the temporal dimension has different visual perception properties than spatial dimensions, the principle of spatial and temporal cuts creating is similar but differs. Two distinct approaches for the spatial and optical flow driven temporal cuts respectively were then developed.
The reason of developing two distinct approaches the spatial and temporal approach instead of one general spatiotemporal method is widely used but an improper assumption of equality of spatial and temporal dimensions instead of accepting the temporal dimension as different and with a unique property. The justification can be easily seen from the spatial and temporal property of dynamic textures - a human perceives and detects errors differently in the space and time domain [?], e.g. the well-known visual phenomenon called filling-in have different behaviour in temporal and spatial domain [?].
The triple toroid-shaped tile is a patch from the original data with the specific property to have toroidal border condition in each spatial (horizontal and vertical) and temporal dimensions. The textural tile is assumed to be indexed on the regular three-dimensional toroidal lattice. The lattice size depends on the estimated texture periodicity.
The optimal tile cut search algorithm which respects its toroidal border condition produces a tile which can be seamlessly repeated in both spatial and temporal dimensions; thus the boundary between two tiles placed abreast is invisible. The main idea of the triple toroid-shaped tiling is to find some relevant and sufficiently representative patches from the source dynamic texture, which can be directly seamlessly copied to the output dynamic texture, where the offset is driven by the patch placed on the three-dimensional lattice of the patch label.
Recall that the term tile noted as represent toroidal shape while term patch noted as is bounded to particular data that fill the tile shape. The dimensions of both media consist of two spatial dimensions only for tile and two spatial and one spectral dimension for a data patch. The extent of tile and patch to temporal dimension can be found in the following section (Sec. 3.3).
Before the full triple toroid-shaped tile method finding let’s introduce the simpler double toroid-shaped tile finding and property. The double toroid-shaped tile, or patch from the original data, is two-dimensional shape with the specific property to have toroidal border condition in each spatial (horizontal, vertical) dimension (See Figure 3.3). The textural tile is assumed to be indexed on the regular two-dimensional toroidal lattice.
For simplicity, let’s assume that only one double toroid-shaped tile. The toroidal property ensures that the top edge has precisely the same shape as bottom tile and simultaneously the left edge has exactly the same shape as right edge. Note that the double toroid-shaped tile described in this section is addressed in one particular frame in whole textural space .
For determining the overlap area size in which the optimal cut can be found the distance between the overlap areas need to be computed (the illustration is in the Figure 3.4). Evaluation of the overlap size is based on pixel error on areas and particularly on the error between corresponding pixels from overlapping marginal and corner parts. The similar pixel error is subsequently used to optimal cut findings in the founded areas.
Let us define the square overlap error arrays _{}^{h} and _{}^{v} and their appropriate elements for a multispectral pixel vector _{r} as follows:
where the dim = 2 vector L_{2}norm is defined as:
The multiindex r has four components r = [r_{1},r_{2},r_{3},r_{4}], where the components are row, column, spectral, and frame indices, respectively. Operator ∙ denotes all values of the corresponding index. Remind that the overlap rectangular regions are defined as:
and remind that whole rectangular tile is defined as:
Note that the corner area evaluation is more complicated. In fact, the overlapping corner area has more than only one corresponding area. The diagonal area part must be taken into account. The corder areas and its index set is defined as:
(3.8) |
To handle corner areas, the error arrays ^{h}, ^{v} and ^{d} with specific horizontal, vertical and diagonal shift and elements _{r}^{v}, _{ r}^{h} and _{ r}^{d} for particular pixel Y _{ r} are calculated as follows:
which can be more comprehensive written down as _{}^{}:(3.12) |
The horizontal and vertical overlap error are calculated as sum of all particular pixel overlap error weighted by appropriate overlap region size on the ζ area.
(3.13) |
The optimal overlap is then found by minimizing the error:
(3.14) |
where ⌊ζ⌋ denotes the corresponding overlap area of ζ. I_{ζ} is the overlap area index set.
The particular total overlap error then consist of differences between overlapping horizontal and vertical regions. The appropriate errors and are calculated as the sum of all pixel errors in the overlap region weighted by the number of all pixels:
for every overlap area type. Or more readily for horizontal and vertical areas:
Considering one corner overlap only the overlapping corner error can be calculated as the following sum:
(3.19) |
With given optimal overlap the exact optimal cut which distinguishes between both cut sides can be found. Every optimal cut then divides overlap area to two-part - top/bottom and left/right which creates the desired toroidal shape (the illustration is in Figure 3.5). The searching for optimal cuts is adequate to find the minimal distance on simple vertex weight labeled undirected lattice graph
(3.20) |
where V is set of all vertices, E is set of all edges. Every edge is a pair of two vertices from V on rectangular 2D lattice with size v × h, h × v and (v - h) × (h - v) for horizontal, vertical and corner overlaps respectivelly.
The E sizes can be - as the G is lattice graph - computed as (M - 1) × N + M × (N - 1). The w is vertices weight function
(3.21) |
that is crucial for successfully representing the overlaps as graph G. The size of V is given by ⌊ζ⌋ and appropriate ⌊I_{v}⌋, ⌊I_{h}⌋ and ⌊I_{d}⌋.
The search for optimal cuts for both the horizontal and vertical edge is equivalent to find minimum cost path in two-dimensional lattice G from the bottom to the top row and similarly from left to right column (with some adjusting). The optimal cuts cannot be found independently - the critical toroidal property must be kept thus some adjustment is necessary. To ensure that the tile optimally fits the adjacent one, horizontal and vertical minimal error boundary cuts are searched within the optimal horizontal and vertical overlap regions only with adjusting the start and end node which must have the same vertical/horizontal index or indices within the ⌊Δ⌋ neighbourhood. This implies that due to toroidal property desired optimal cuts parts have limited degree of freedom - apparently, the cut termination is strictly given by a start part. Moreover, intersections of spatial cuts correspond to vertices comprising the rectangle and cuts in the four corner part must be identical one.
Spatial minimal error boundary cuts are searched in the error lattice ϵ_{}^{h} and ϵ_{}^{h} defined by appropriate ϵ_{r}^{h} and ϵ_{ r}^{v}values respectivelly. The whole spatial cut consists of two adjacent lattices defined as:
which are both defined simply as weighted composition of both appropriate lattice values:
(3.24) |
for the horizontal error lattice and for the vertical error array is defined as:
(3.25) |
The ensures the same range of both error arrays parts. For illustration of cut search and arrays arrangement in DT see Figure 3.5 and Figure 3.6. Remind that due to toroidal property the first and last cut points are the same and thus the error lattices are cylinders. Moreover, as the corner diagonal parts are precisely the same, the lattice can be mapped as torus where a one cut is longitudinal and other is latitudinal.
The desired spatial cuts C_{h} and C_{v} are series of nodes on lattice graph indices in the spatial dynamic texture subspace - frame). The appropriate cuts from regions I_{V } and I_{H} are defined as:
where the length of cuts (total number of cut nodes) are denoted as n_{V } for vertical and n_{H} for a horizontal cut. Note that the n_{V } and n_{H} are not equal to the appropriate lattice size ⌊I_{V }⌋_{v} and ⌊I_{H}⌋_{h} as the series C_{V } and C_{H} are not monotone. In the case of n_{V } = ⌊I_{V }⌋_{v} and n_{H} = ⌊I_{H}⌋_{h} the spatial cuts C_{*} and subsequent i_{0},…,i_{nV } and j_{0},…,j_{nH} are monotone series s_{n} = s_{n-1} + 1. Say that in the case monotonic series the dominant dimension is horizontal one for C_{h} and vertical for C_{v}. If the n_{V } > ⌊I_{V }⌋_{v} the C_{k} → C_{k+1} should be any node that is not from C_{0,…,k}.
The pixel or node neighbourhood for a given r is denoted by Δ_{r} operator. The neighbourhood could be monotone with respect to some dimension which is in all cases the dominant dimension of the area in which the pixel/node come from. The monotone neighbourhood is noted as ^{}.
The neighbourhood Δ^{8} represent well known 8-neigbourhood which is, in the monotonic case of ^{8} and limited to only 3 adjacent values (3.34). The ^{4} similarly denotes 4-neigbourhood. The hierarchical monotonic neighbourhood for given r with size of p is then denoted as _{r}^{p} (see eq. (3.35)).
The neigbourhood Δ_{Ck} for node C_{k} can be defined as 4-neigbourhood (Δ_{Ck}^{4}), 8-neigbourhood (Δ_{Ck}^{8}), p-distance monotonic neigbourhood (Δ_{ Ck}^{mp}), or any else. The neigbourhood definition is determined by a method used to found the optimal cut and by a textural property.
In case of using more general p-distance monotonic neighbourhood (^{p}) the distance is defined in a similar manner with the possibility of greater step in the non-dominant direction.
The optimal cut total error is defined as sum of all pixel errors from which the cut consist. For the cut lines if is defined as:
Here, let’s overlook the error lattice and optimal cut relation (see Figure 3.6, pixel error are illustrated as darker area, optimal cut are marked by yellow and violet lines. Note their the shape and temporal behaviour especially in the I_{d} areas.). For any given error lattice I_{} and some series start point [i_{0},j_{0}] one of dimension values is defined as 0 where the another can take any valid value from I_{}. For any start point s = [0,j_{0}] or s = [i_{0}, 0] the sub-optimal cut exist. Obliously, there exist ⌊I_{H}⌋_{v} and ⌊I_{V }⌋_{h} possible starting points. Denote the cut lines shat starts (or even contains) aby given point s as C^{{s}}. Hence the spatial cut C_{}^{{s}} is (in at least in one dimension) continuous line there exist at least ⌊I_{H}⌋_{v} and ⌊I_{V }⌋_{h} sub-optimal cuts that divide the lattice. The minimal spatial cut search then consists from minimizing the cuts error values and searching from appropriate optimal start points:
Subsequently, the optimal spatial error cuts searching consist of finding the minima error cuts than contains the start points s^{*} and t^{*}:
There are many ways to find particular optimal cuts specified as above. The alternative Kwatra approaches can be used, the graph algorithm as Dijkstra or A* for efficient cut search. Furthermore, the dynamic programming as more general and robust solution can be utilised.
Dynamic Programming One of many ways that are possible for an optimal toroidal cut search is to construct integrated error values arranged structures according to the selected adjacency nodal points (pixels). In cases of dynamic programming-like rules, the desired optimal cut is restricted to be monotonous in the way of dominant spatial dimensions for each horizontal and vertical case accordingly. For every horizontal and vertical cut search in the proper field I_{h} and I_{v} the lattice is taken as an oriented level graph in the direction of main spatial dimension. Each column/row for the horizontal/vertical case is considered as a single level of the graph structure, and the optimal cut is searched from the bottom to top level of a graph, and the appropriate start/end point has defined with the same row/column indices. The dynamic programming approach is limited to find the monotonous series only regardless of error definition. Naturally, this strictly limits the possibilities of using the neighborhood definition. The 8-neighbourhood (^{8}) with limiting to be monotonic one in natural pick there. Note that due to restricting the connectivity is then defined as:
(3.34) |
in the vertical case and similarly in the horizontal dimension.
In case of using more general p-distance monotonic neigbourhood (_{}^{p}) the distance is defined in similar manner with possibility of greater step in the non-dominant direction. For a given p-distance is vertical connectivity defined as:
(3.35) |
The horizontal and vertical cut search is then defined for Δ_{+}^{8} by the according rules:
and for _{}^{p} according to (3.35) as:
Like usually in the dynamic programming approaches the error lattice is filled from bottom to top (right to left) and traversed in the reverse direction. For lattice error composition and thus any given key start point p the values in the same row / column is set to ∞. The recursive level per level error rule application that starts from the appropriate row/column fill the whole lattice by the corresponding values ε_{r}^{h+} / ε_{ r}^{v+}. The optimal cut is then the minimal error series fun in the reversed main dimension direction.
Dijkstra Algorithm The search for optimal cuts for both the horizontal and vertical edge is equivalent to find minimum cost path in two-dimensional lattice G from the bottom to top row and similarly from left to right column.
The Graph G represent the error lattice. The edge relation is defined by the used neighbourhood relation. The full Δ^{8} symmetric neighbourhood is the most general case which leads to non-monotonic cut lines while the Δ_{+}^{8} or _{}^{p} can create monotonic cut lines as the dynamic programming approach. The given node weights as defined as:
(3.40) |
The weight for every node is computed as _{r}^{H} (3.24) and _{ r}^{V } (3.25) with _{ r}^{H} (3.28) and _{ r}^{V } (3.29) from the whole ζ area. The cut lines are then a result of a widely recognized Dijkstra algorithm.
Similarly, the optimal cuts C^{*} are found by multiple algorithm evaluations from the corresponding start and end points and final minimization. Note that in the case of symmetric relation some sub-optimal cut lines C_{}^{{sj}} and C_{}^{{si}} can be identical.
A* Algorithm If the resulted spatial cuts are desired to be a non-monotonical an A* algorithm can be used. The advantage of the pathfinding algorithm is apparently due to its possibility to evaluate distance to the desired final point with defining the heuristics to the final point as a L^{2} distance from actual to the final point. Even if the tiles produces by an A* algorithm can be very effective, the problematic use of the algorithm results in a spatiotemporal domain event and its hard restrictions on assembling planes lads to its non-use.
Double toroid-shaped tile The subspace of a texture that is generated as an intersection of the space between horizontal and vertical cuts is called a tile. Even if the searching algorithm can vary, the tile has the same toroidal property. With a term tile, the found shape is called, contrary to term patch which is used to particular data including shift if more patches are found in a given dynamic texture. The rectangular tile is a subspace of a whole dynamic texture or more precisely:
(3.41) |
in which the I_{h}, I_{v} and I_{d} was detected. In the case that cuts are monotonic-series the spatial patch can be easily defined as union of the space between two vertical cuts and two horizontal cuts:
(3.42) |
The located single frame double toroid-shaped tile found is propagated along the optical flow to subsequent frames to model the movement of the textural structures. The dense optical flow is used to represent a dynamic texture global dynamics. The propagated double toroid-shaped tile in every frame serves as the initialisation for local modification of such a double toroid-shaped tile to each frame-specific final shape.
The toroidal animated tile (and subsequentially all its derivate) is a three-dimensional shape where the bottom and down planes have the same shape as well as the right and left plane. The front and back plane are consist of double toroid-shaped tile I_{T } (eq. 3.2.2).
The overlap area in which the tile is computed (see Figure 3.7 for illustration, white arrows illustrates the tile movement) is moved by a local Φ_{} values to generate rectangular subspace I_{rt}. An optimal location for the triple toroid-shaped tile can be then found in the similar way that the double-toroid shaped tile
Temporal and dynamic dimension
As the animated double toroid-shaped tile (and all derived constructs) and patch are subspace that posses more than only two spatial dimensions, the restriction of only one actual frame that was used in the previous section is no longer valid.
Moreover the multi-index r has from now five components r = [r_{1},r_{2},r_{3},r_{4},r_{5}], where the components are row, column, spectral, frame and dynamics indices, respectively. The r_{5} dimension is not stored in original data but computed. So exactly it is a property of function of a given r, but hence the optical flow Φ of the whole is usually precomputed and assigned to appropriate locations. Note that as the r_{5} is optical flow vector in the three-dimensional space, it has of course three values x,y and t respectively.
Hence on now the r and all appropriate computing have more dimensions, the metrics between two corresponding overlapping pixels has to be updated. Moreover, the last dynamic dimension is three-vector. The simple difference of its components is not satisfying. As a r_{5} is an optical flow vector the important properties are the vector size and their direction.
The rectangular subspace of a in which the is searched for is called a rectangular tile I_{rt}:
(3.51) |
Even if the searching algorithm can vary, the tile has the same toroidal property. The animated (or dynamic) tile is a subspace of a whole dynamic texture . The tile consists of temporal-horizontal (^{h*}) and temporal-vertical (^{v*}) spatiotemporal union with the exception of corner duplicate parts. The appropriate size of spatiotemporal cut planes are:
Mention, that for more fluently writing the temporal-horizontal plane and temporal-vertical spatiotemporal will be called as horizontal and vertical if the context is clear. The spatioteporal is now defined as combination of the space between the horizontal and vertical spatiotemporal cut planes:
(3.54) |
The most simplest spatiotemporal and can be created as set of and with increasing temporal indices.
The combination of a dynamics represented by a optical flow Φ_{} from every previous _{(t-1)} can create an animated patch. The animated patch is then similarly defined as:
(3.55) |
where i is from 1 to T and Φ_{i-1}^{x} and Φ_{ i-1}^{y} are horizontal and vertical parts of an average optical flow computed on a appropriate and animated tile is defined in the similar manner. The resulting animated tile then replicate the texture movement but with boundary given from only one (first) frame.
As the set of double toroid-shaped tiles of the same shape is called the animated tile, the tiles fitted to local dynamics are called a dynamic double toroid-shaped animated tile.
Hence the dynamic approach is chosen to double-toroid tile search the corresponding process is expended to the temporal domain with some changes that allows adapting to local and global optical flow and structural changes in the dynamic texture throughout time.
Theoretically, the optimal triple toroid-shape tile can be found by propagating the optimal double toroid-shaped tile from _{[r4=0]} (0-th frame) to the rest of .
and subsequently searching for the minimal error spatiotemporal cut Ĉ_{y}^{h*} and simultaneously searching for the minimal error spatiotemporal cut Ĉ_{x}^{v*}, where x and y values denote the order of starting point from optimal cuts Ĉ^{h*} and Ĉ^{v*}. The desired cut planes ^{h*} and ^{v*} is then union of all Ĉ_{y}^{h*} and Ĉ_{ x}^{v*} respectivelly.
Unfortunately, this approach cannot lead to a visually satisfying solution. See this on an example of ^{h*}: The founded local (in the manner of one column/row) optimal spatiotemporal cuts Ĉ_{y}^{h*} and Ĉ_{ y±1}^{h*} has no relation and can be very different. Even if the spatiotemporal cut local changes are restricted to ±1 due to (3.57) and (3.56) this property are not shared in the vertical dimension and values of _{x,y}^{h*} and _{ x,y±1}^{h*} can strongly vary.
To ensure that the property will be consistent in both spatial dimensions the Ĉ_{y}^{h*} and Ĉ_{ x}^{v*} search has to be enhanced to keep the similar property in the primary dimension and in the secondary dimension too. In another word, the Ĉ should be computed with respect to adjacent cuts.
Iterative spatiotemporal cut searching The simplest way to restrict the relation between two adjacent spatiotemporal cuts is iterative computation of Ĉ_{y}^{h*} and Ĉ_{ x}^{v*} on base of previously found Ĉ_{ x-1}^{v*} and Ĉ_{ y-1}^{h*} respectively. The new spatiotemporal cut is then limited by previously found adjacent cut by its neighbourhood:
where ^{-}Ĉ denotes previously computed spatiotemporal cut and _{ r}^{v}, _{ r}^{h}, _{ r}^{ht} and _{ r}^{vt} are defined as spatial and spatiotemporal adjacent sets derived by used Δ neighbourhood, respectively. In the simplest case it consist of:
(3.60) |
and _{r}^{h} and _{ r}^{ht} is defined in similar manner.
The iterative spatiotemporal cut searching can be enhanced by using of different Δ, namely ^{p}. The used neighbourhood is not limited to be the same as in double toroidal-shaped tile search but should be consist. Note that using of general neighbourhood obviously lead to a general spatiotemporal plane and consequently higher computational time.
The iterative approach, of course, produces two solutions for every spatiotemporal plane search. The ^{↑}^{} and ^{↓}^{} according to chosen order of iterative process. Moreover, another problem occurs. Hence the tile borders are toroidal the repeated iterative finding of Ĉ_{Cyh}^{h*} lead after N - 1 steps to Ĉ_{Cy-1h}^{h*}. Again, the problem is possibility inconsistency between this two cuts. The solution is excluding some space and thus limiting the possible solutions from the first cut in reverse order by repeating the neighbourhood driven δ relation.
Restrictive Spatiotemporal Cut Searching
The drawback of an iterative solution process can by bypassed by two-steps optical flow driven tile animating process. The animated tile serves as the initialisation to define the rough area in which the exact dynamic spatiotemporal tile can be found (see Figure 3.8, note the areas marked around the already founded (sub)optimal cut).
At first an (dense) optical flow Φ in the whole space is detected. The array with values of an optical flow vectors of whole dynamic texture noted is as Φ_{I} and values of the optical flow vectors from corresponding pixel Y _{r} as Φ_{r}. The average optical flow of some area, namely the patch , is noted as Φ_{}
The rough solution is generated by combining the Φ_{} with the boundaries frame by frame (Sec. 3.3.1). The resulting animated tile then replicate the texture movement but with boundary given from only one (first) frame. This animated tile represents two of the three main scene features. The global dynamics (scene or large objects movement, e.g., whole trees) is represented by the optical flow and the texture itself by the double toroid-shaped tile. However, local dynamic (like leaves or grass blades shivering) cannot be undoubtedly sufficiently represented this way.
The local dynamics problem can be solved by the spatiotemporal plane fitting. The animated tile then can be used as initialisation to plane fitting. The combination of the restricted neighbourhood with the animated tile generates an extended animated tile subspace area (see Figure 3.9) in which the optimal spatiotemporal cuts (3.56) and (3.57) can be found without violating the consistency of and the spatiotemporal cutting planes.
The extended animated tile subspace area remarkably speeds up the computation time needed to find the optimal tile shape in comparison to search the whole textural space. Moreover, this approach is beneficial also in avoiding some typical DT modeling problems like artifacts introduced by extremely varying speed or problems with highly dynamic texture, e.g., fire or water details.
_{r}^{v*} | = ε_{ r}^{v+} + min | _{ [i=r2]v*+[0,-pv,∙,∙,-1]}^{v*} , … , | |||||
… , _{ [i=r2]v*+[0,0,∙,∙,-1]}^{v*} , … , _{ [i=r2]v*+[pv,0,∙,∙,-1]}^{v*} | (3.61) |
_{r}^{h*} | = ε_{ r}^{h+} + min | _{ [j=r1]h*+[-ph,0,∙,∙,-1]}^{v*} , … , | |||||
… , _{ [j=r1]h*+[0,0,∙,∙,-1]}^{h*} , … , _{ [j=r1]h*+[ph,0,∙,∙,-1]}^{h*} | (3.62) |
where _{[i=r2]}^{v*} denotes the r_{ 2}-th value of spatiotemporal vertical cut plane and _{[j=r1]}^{h*} denotes the r_{1}-th value of spatiotemporal horizontal cut plane and p = denotes the size of the boundary area (see Figure 3.9). The vector p is computed as average dense optical flow for the whole in the previous frame.
The resulting tile, whether created by an iterative or non-iterative method (restrictive approach is used in this dissertation thesis), respect all three dynamic texture properties. The global dynamics like whole scene movement is handled by tile animating and ensures the focusing of tile to the same structural part of a texture. The optimal spatial double toroid-shaped tile from the first frame roughly fit the exact spatial tile location. Despite the dynamics of the dynamic texture process through time the homogeneity of a dynamic texture keep this rough solution valid. The local structural changes, which are possible in the dynamic texture, are then handled by recomputing the exact tile shape by an iterative or restrictive method. This allows adapting the dynamic tile recomputing process to local dynamics which can vary throughout the time.
To disrupt visible periodicity of the synthetic texture generated using single double toroid-shaped tile, multiple mutually interchangeable tiles can be used. At first, a single, double toroidal tile of optimal size is found. Then multiple tiles are found as the most fitting patch with the same used tile or recomputed consensus tile. To allow efficient texture synthesis particularly from the visual quality point of view, it is necessary to find appropriate size of an optimal tile. Due to the assumption of homogeneity for the whole texture the computed tile size is optimal for the whole texture.
The desired patches _{{0,…,np}} are then searched by minimizing the overlap error on shifted tile _{r+δ} with three dimensional δ shift (see the illustration in the Figure 3.10, where three tile locations with three generated patch are illustrated). Emphasize that the overlap area only has to be taken into account as the inner part of texture (non-colored areas in Figure 3.10) can vary.
The final is then searched on the composite error array created as a sum of all particular errors. Apparently with the growing tiles number the particular total cut-planes error grows as the tiles are results of a compromise between more patches. Another approach is to find an optimal triple toroidal tile and the next tile search as minimizing the error between original _{i} and shifted _{r+δ}. Thus one optimal (primary) and some suboptimal (secondary) tiles are used.
Note that the satisfying number of patches (n_{p}) is usually low (see next chapter). The resulting tiles are sufficiently visually effective in the number of two, or three.
Fourier Transformation Tile Size Detection
Instead of the overlap error minimizing the Fourier transformation (illustrated by Figure 3.11 as block of proposed method scheme 3.18) can be utilized to determine an ideal size. The optimal tile size created by and dominant texture spatial frequency ensures that the patch will contain the sufficiently representative minimal information tor the texture synthesis.
Hence the frequency property of a dynamic texture is the same (or very similar) thought time due to DTs temporal homogeneity, the dominant frequency is computed on first dynamic texture frame (the average from whole DT can be computed, but the test proposed to suggest is it is not necessary).
Now, let’s assume color input texture t-th frame
where [r_{4} = t] notations is used to denote the same, but more lengthy _{[∙,∙,∙,t,∙]} notation. The frame has the size M × N × S. The S spectral planes are computed separately and the results are next combined.
Three mono-spectral texture samples Y is subsequently transformed with the aid of discrete 2D Fourier transform from the spatial domain to its frequency domain :
(3.63) |
Next, an amplitude spectrum A(u,v) is calculated from the texture representation in the frequency domain:
The result is then transformed to centered representation yielding in A(u,v), u ∈, v ∈. Because an image is the real function of two real variables defined on a finite domain, amplitude spectrum A(u,v) is symmetrical about the origin A(0, 0). Hereafter it is of course sufficient to consider only u ∈ and v ∈.
The dominant spatial freuency can be founde as maximum from amplitude spectrum A(u,v) with coefficient A_{m}:
Of course the central strongest frequency ( A(0, 0)) are excluded from the process. Coefficient A(0, 0) is image mean of , and it is not used. In order to create more optimal double toroidal tiles, it is necessary to detect dominant frequency which is repeated in in both spatial directions at least twice. It follows constraints on u and v in equation (3.66). Integer indices f_{r} and f_{c} give information how many times the dominant frequency is repeated in vertical and horizontal direction, respectively. If f_{r} < 0, then it is set to f_{r} = |f_{r}|.
In case that f_{r} = 0 or f_{r} = 1, detected dominant frequency f_{c} is aligned with horizontal direction and it is necessary to determine dominant frequency in vertical direction. Frequency f_{r} is determined according to the largest coefficient A_{m}^{r}:
In case that f_{c} = 0 or f_{c} = 1, detected dominant frequency f_{r} is aligned with vertical direction and therefore dominant frequency f_{c} in horizontal direction is determined according to the largest coefficient A_{m}^{c}:
Estimated parameters f_{r} and f_{c} are used for the following precise determination of double toroid-shaped tile size.
The dynamic double toroid-shaped tile can be used to synthesize infinitely large output in spatial dimensions only. The temporal transition which can create the toroidal cut plane in the already founded dynamic double toroid-shaped patch allows to seamlessly repeat the patch and create an infinitely large sequence in the temporal meaning.
The resulting tile that consists of six spatiotemporal cut planes is called a Triple Toroidal-Tile. Firstly let’s suppose only one temporal transition is used.
Similar to spatial overlap the temporal overlap has to be computed to determine the area in which the temporal cut can be searched for. TThis can be easily done as computing the all-to-all frames distance. This well-known principle is used e.g. in [?] and many other approaches with distance ϵ as defined in eq. (3.43):
(3.69) |
where t and u are unknown temporal coordinate of the first of most similar frames, the l_{t} is a minimal distance between the searched frames. The l_{t} of course depends on the source data to keep the consistent frequency property the l_{t} is set as:
(3.70) |
The temporal overlap is then defined in the same manner as spatial overlaps:
(3.71) |
The creating of the temporal cut which seamlessly loop the tile can be done e.g. by the graph-cut algorithm[?] of by a frame-to-frame loop[?] with using of similarity matrix(Figure 3.12) where every x - y location is representation of x and y frames similarity.
The computing of triple toroid-shaped tiles from the dynamic double toroid-shaped tile is not trivial. Toroidal properties of the sample must be maintained. On the Figure 3.13 this problem is illustrated: The temporal and any spatial dimensions are represented by the horizontal and vertical dimension respectively. First, two rows show the same texture with the cut in it; black line denotes the most similar patch frame u and t founded by eq. (3.70). The second row show textures synthesized in the temporal dimension. The third row presents synthesis in temporal and spatial dimensions. The red dotted line shows the location where the spatial continuity is violated. It can be easily seen, that if the temporal cut is not double toroidal itself the border violates the spatial homogeneity.
The cost of whole (permissible) temporal cut is minimized (see 3.69) so as to preserve the spatial properties of the toroid-shaped patch (maximal change of t between adjacent values of I_{x,y,t} and I_{x+i,y+j,t} is c_{t})
(3.72) |
or similarly as:
(3.73) |
where Ĉ^{Ty+} and Ĉ^{Tx+} represent horizontal and vertical cut lines and ^{Tx} their appropriate total error. ^{T+} then denotes whole temporal cut plane. Now lets look to ways how to find an optimal ^{T+}:
It is oblivious that the most similar temporal cut consists from the simple plane (i.e. double toroid-shaped tile). This solution is clearly not satisfying as the dynamic texture usually does not consist sets of the same frames. The set-of-cuts method is based on the similarity of not only one-pixel value but whole frame areas. However, rather than finding a single transition time for the whole image one way to determine the temporal cut is to find whole similar areas for the transition.
The overlaying frames are iteratively cut with minimal error cutlines to two halves (See Figure 3.14). Then the transition error of both halves is evaluated and compared. The final frame then consists of one half from frame t and another from frame t + ⌊I^{T }⌋_{ t}.
Next cuts are computed iteratively on both planes (second row in Figure 3.14 for the second iteration and so on.). The cutlines are computed until the areas consist of at least N pixels.
Of course, this approach can lead to the greedy-like algorithm. Optimal set-of-cuts can be easily found by not determining the next state but recursive applying the filling rules. Every cutline then create two possible states consisting from two halfplanes A and B which are filled by a frames t and t + ⌊I^{T }⌋ or t + ⌊I^{T }⌋ and t respectively. It is easy to see that the number of possible states is limited to 2^{2}^{i-1} for i > 0 where i is an iteration. Fortunately, the method reaches final state very quickly in less than five iterations. The all possible set-of-cuts (4-tree of cuts) is then compared by their transition error.
This result is very similar to determining the time transition per-pixel basis, but local spatial properties of the texture are kept. By this, the infinitely long sequence of a texture can be found, and because patch satisfies the global texture dynamic, this simple approach can work with it. Every spatial cut line Ĉ^{t+} are computed on the appropriate error arrays composed of error between two overlayed frames. The error ϵ^{t} for every pixel is defined as eq. (3.43). The error array is computed exactelly like in the dynamic toroidal shape tile as ϵ^{h+} (3.38) and ϵ^{v+} (3.39) as the set-of-cut approach is focused to spatial areas. The set of cuts then consist from set of C_{h0}^{*} computed on the given areas.
The first cut is computed on the error array as eq. (3.74):
and two adjacent cuts on the upperhalfplane and bottom halfplane:
Then the set-of-cuts is searched by minimizing all particular errors:
where every cut line is found as its own subtree minimization eq. (3.74). The index set I_{[Ca,Cb]}^{t} denotes area between two cuts C_{a} and C_{b} in the frame t. The t+ denotes next frame and t- previous frame. The _{v}^{*} is defined in the similar manner.
The problem with this approach can be viewed on Figure 3.13. This can be easy overcame by enforcing the start and end coordinates (in temporal dimension) to be the same similarly as in the double toroidal tile. Moreover, the highest and the lowest areas must be continuous (so it must consist of the same frame ± 1). It can be resolved by creating a tree of all potential cuts and then traversing them to minimize this error. This approach is effective in many textures, and because of every cut is created by the single way in one frame it can fit the objects in the texture very well.
The double-tree structure for ^{T }_{ h}^{*} can be defined as:
where every cut line Ĉ has up to four (L1, R1, L2, R2) descend from next iteration. And appropriate _{h}^{*} is created by filling the areas between upper and bottom plane with corresponding t values. The errors and cut lines structure (and thus cut plane) for the vertically oriented set is defined in the exact same manner.
Note that apparently the two optimal set-of-cuts can be found - ones with dominant horizontal dimension (like Figure 3.14 and ones with dominant vertical dimension. Obviously, both version can be computed with the one with lowest transition error can be picked.
Interestingly this the set-of-cuts approach is well suited for dynamic textures with very low dynamics (i.e. mostly static scene) and for dynamic textures with extremely high dynamics like (e.g. FIRE texture).
In some cases, the previous approaches for the temporal cut are not satisfying. Every single cut is computed in one frame and derived by spatial data, but in fact, it drives the temporal offset. If the temporal dynamic of the texture is significantly greater than the local spatial dynamic, the temporal cut plane can be computed differently. Like in set-of-cuts the problem is to create a cut with no boundary spatial errors. This can be done by using a similar procedure like in propagating the double toroidal tile though temporal dimension. This process depends on the direction (top-down or left-right), but in most cases, the results are nearly the same.
The finding the plane by the second way is better to the texture with lesser local spatial dynamics (e.g. leaves or grass in string wind, flowers or leaves). As can be seen in Figure 3.15 the planes generated by this approach is more general, but satisfy the toroidal boundary rules and are double toroidal like a spatial patch (this can be easily done to setting error cost of ending unwanted coordinates to infinity before computing the structure).
The three dimensional error array are defined in the exact same manner as eqs. (3.62) and (3.61):
where the temporal-horizontal and temporal-vertical error arrays are defined as:
The desired ^{T } plane can be found by iterative spatiotemporal cut search (Sec. 3.3.2) and/or restrictive spatiotemporal cut search (Sec. 3.3.2).
Similar problem to the visually repetitive pattern which is solved by using patches can occur in the temporal dimension. The short sequence of a dynamic texture can be easily recognised and degrade the visual impression. As in the spatial dimension, even in the temporal dimension, such an impression is undesirable.
The approach that can solve this is simple: to find more than one temporal cut-plane in the dynamic double animated toroidal-shape (see Figure 3.16). Another temporal cut-planes are found inside the founded triple toroidal tile by the similar process as the first one and similarly a case of finding more suboptimal tiles. Note that in the case of more temporal cut-planes only primary-secondary approach is used. The cut planes founded by fitting to more (at least three total transition error - Figure 3.16) are not visually satisfying.
For the determining of more temporal jumps two approaches could be utilized: The every next cut-plane could be founded by minimizing of the transition error to the already existing cuts (see Figure 3.16). By this way only the same shaped cut-planes shifted temporally can be found. The second approach is to step back to similarity metric and found the second similar frames and compute whole cut-planes again in the different time overlap. Note that the triple toroidal patch should be extended this way (see Figure 3.17) if one (or both) cut-planes are found outside already founded triple toroid-shaped tile but inside the dynamic animated double toroid-shaped tile (illustrated by the whole rectangular area in both figures).
Note that the defined error distance based on color and dynamics solves the well-known problems with periodical movements (e.q. pendulum) in which the image structure can be identical, but the dynamics differ.
The synthesis part is entirely separated from the analytical part and thus is is extremely fast. Texture analysis part, which is inspired by the idea of optimized but straightforward repetition of the input texture sample, consists in the determination of parameters δ_{h} and δ_{v} controlling the overlaps and subsequent minimal error boundary cut based edge optimisation. The output of the analytical part is one or more triple toroidal patch and hence the toroidal shape they can be easily put abreast without any additional computation.
As follows from the analytical part an idea, the synthesis of any texture of required size is an efficient repetition of the created triple toroid-shaped texture patch in all directions until the texture of any required size is generated.
The synthesis of any size textures is merely the repetition of the created triple toroidal patches in all dimensions until the desired size is fulfilled. The only computation in synthesis is to generate once a random sequence of patches labels and next computing the offset of pixels, which can be implemented in real time just from to the actual to the next frame.
The generated lattice of labels are called arrangement and can be user-driven od randomly generated. The process of generating arrangement is then called tiling. The overall method flowchart is in the Figure 3.18.
This chapter describes a set of specific synthesis algorithm enhancements that serve to increase the limited number of results and, above all, to consider their visual quality of creating mix-of-DTs (Figure 4.1). The pure texture synthesis methods can produce results with visually recognizable artifacts and patterns as recapitulated in the introductions section 4.1 followed with an overview of presented editing methods which simultaneously allows to spatially and temporarily enlarge the original dynamic texture(s) can be found in the section 4.2. Finally, the mix-of-textures principles and transition textures and temporal shift utilizing are described in the section 4.3.
Natural dynamic texture modelling is a very challenging and difficult task, due to an unlimited variety of possible materials, illumination, and viewing conditions, simultaneously complicated by the strong discriminative functionality of the human visual system. Visual texture modelling is the critical part of any computer-based visualization application because whatever size is the measured texture, it is always inadequate and requires its enlargement to cover the required visualized object’s surface area. The available material sample size is either too small for rendering complex and large virtual scenes. The amount of such measured data similarly to dynamic textures is immense, e.g., in the range of terabytes even for spatiotemporally restricted measurements. Moreover, the textural enlargement may not be sufficient to cover some complex scene. Texture editing enables to create photo-realistic synthetic dynamic textures, which are either difficult (or unable) to measure or which even do not exist in nature. The editing principles allow synthesizing result composed from from more than only one input dynamic texture . The carefully selected with one consistent or even more different shapes are then put together to create a desired result.
The resulting edited dynamic texture is, therefore, a mixture of several color dynamic textures that realistically match the given color textures appearance and respects their original dynamics. The presented method simultaneously allows to spatially and temporarily enlarge the original dynamic textures to fill any required four-dimensional dynamic texture space.
Dynamic texture editing and its results allow to reach huge compression ratio if numerous DTs can be constructed from several bases small DT samples, or to study the relationship between model parameters and their impact on visual DT appearance, etc. The common visual problem repetition in the spatial and temporal dimension is suppressed by not only using more samples and their random placement in the synthesized texture but with temporal deformation of the texture dynamics too.
The realistic appearance of the dynamic textures mix requires to edit the patch color space and to find transition patches which consist of more than one type of the texture. These border patches are found similarly to the multi-texture analysis patch step. Again, all time-consuming processing such as the finding of optimal spatio-temporal triple toroidal patches, are done in the analytical step only. The analytical and synthesis step are still separated. Therefore the synthesis of the edited and enlarged texture can be done very efficiently by applying the simple set of repeating rules to fill the three-dimensional arrangement lattice of labels.
The presented editing method is based on using several triple toroid-shaped tiles (Sec. 3.4). Moreover, with patches from more than one DT types, the number of possible outputs is significantly higher. Few triple toroid-shaped tiles can create numerous noticeably larger synthesized DTs with similar stochastic properties. By using of toroidal-shaped tile and fitting them by optical flow, the synthesized texture can be recreated from various input textures to generate many arrangements of output textures or even mix-of-textures.
The repetition of the spatial and temporal dimension is suppressed both by using overlapping samples and their random placement in the synthesized texture and temporal deformation of the texture dynamics. The method can create a large scale of dynamic texture types and create results with high-quality, visually satisfying realistic appearance.
Editing method Presented approach allows DT editing in several ways. The system of toroidal samples allows to apply the textural operators to tone colors, change dynamics or differs texture itself without the possibility to affect the homogeneity of the whole dynamic texture. For example to change the dynamics of one object in the dynamic texture usually requires area segmentation which must be consistent in all frames, detection of a given object and finally changing its dynamics.
Hence the editing methods take more than one input dynamic textures to distinguish them one of them is called primary, and rest are called secondary. The primary dynamic texture is used to compute (multiple) optimal triple toroidal tile while other textures only serve as a patch source for the same shaped-tile textural data (Figure 4.2). Whereas the quality of a tile decrease with the number of patches. Using more dynamic textures for finding the optimal global tile (tile optimal for more than one DT) has the similar effect. Even if in some cases the optimal global tile can be found, usually the primary-secondary approach has more satisfying results.
Presented approach obviously allows to skip the segmentation step since the texture is divided into logically consistent patches (given by spatial similarity and similar dynamics) from previous analysis steps so the changes made in one patch will not affect other areas.
Elementary, but not trivial type of texture editing is to synthesize texture consisting of multiple input textures. The illustrative example of a resulting arrangement can be seen in figure 4.3. The result is created from from more input dynamic textures (different colors in Figure 4.3) with the same shape. The resulting texture is, therefore, a mix of textures, which can vary in two of the three fundamental properties of DT - the texture itself (this includes changes in lighting, but also the other conditions - detail, different viewing angle, etc.) and local dynamics (intensity and direction of wind, rain, etc.).
Of course, the resulting combination of given inputs should be meaningful. For example, the combination of DT types WATER and CLOUDS textures can satisfy some trivial texture similarity, but of course are preposterous in most cases (except horizon and similar arrangement). The example of a mix between River and Shrubs which can be on first look correctly synthesized is in the Figure 4.4. Therefore, the rest of the text assumes that this requirement is met.
Lets look at the possible amount of synthesized results. It is obvious that the maximal number of possibly synthesized textures is:
(4.1) |
if we assume a constant distribution of patches in input textures. N_{in_tex} is number of input textures, N_{p_per_tex} is average numbers of patches per input texture. The denotes the resulted synthesized dynamic texture. Therefore ⌊⌋_{w} and ⌊⌋_{h} are the vertical and horizontal size of the synthesized texture respectively, and ⌊⌋_{h} and ⌊⌋_{v} are the horizontal and vertical size of the used optimal , respectively.
More generally and because we often use the different number of patches in every texture (to ensure the better visual appearance of synthesized textures) the (4.1) can be simplified to:
(4.2) |
where first product in (4.1) becomes to N_{patches} as the number of all patches found by analysis part of the algorithm.
The synthesis of the mix-of-DTs consists just from finding the same triple toroid-shaped tiles or - more accurately - by finding one (or more) optimal triple toroid-shaped tiles in one primary DT and finding the minimal cost of placing this shape to other textures by finding the optimal time-spatial shift δ for the optimal tile. If there are many input textures the computational time can be reduced by using texture pyramid, testing only local optical flow, or testing global optical flow first (respectively in backward order).
Using several input textures can cause the problem (see Figure 4.4) with slightly different color tone or illumination of particular input DTs. This problem can be overcomed, i.e., by tone mapping [?, ?, ?], by tone operators[?] and many others.
The color tone of patches can be fitted to the color tone of the selected input texture or a consensus of all textures. There are many advanced algorithms and approached for color tone mapping problem [?] which can be used for mapping color tone property of one texture to another. Again, proposed solution enables to mapping color property of only used patches and not the whole texture. Here straightforward but satisfying approach - fitting average H, S, L values of one texture to another by per pixel multiplication. This simple approach creates good results in the most of cases (see Figure 4.5) of synthesized DTs, but any advance method, i.e. Durand[?] method is of course possible. Of course, this straightforward approach works well primary on the DT with the same type and DT that are visually similar. The advanced tone reproducing operators like Ward[?] or Tumblin[?]are required (see Durand[?] for comprehensive overview).
With varying color tones the number of possibly synthesized textures obviously then grows to:
(4.3) |
where N_{in_tex}^{*} is a number of input textures (color tones) to which the synthesized result can be mapped. In this automatic approach, we assume that the input DTs are selected manually and are visually similar and do not differ drastically in color (i.e. grass and clouds). The similar dynamic texture structure is therefore needed.
Figure 4.5 illustrates the method. An optimal shape is computed in the primary texture and found by a spatio-temporal shift in the secondary texture. Colortone operators are extracted from all input textures. In the synthesis step, the color tone operators of textures are fitted to one input texture to create a similar color appearance.
This trivial approach can handle the camera automatic color balance (as in the Irises), different illumination or similar problems and increase the credibility of the result. Of course, as the color tone operator are vast and popular research area the developing of an advanced dynamic texture color tone operators is beyond the scope of this work and is part of one of the possible follow-up of this work.
Even if the triple dynamic toroid-shaped tile approach can successfully handle the mix-of-DT (see Figure 4.5) only by trivial color tone matching, in many cases it is insufficient. Sometimes dynamic textures strongly vary in both structure and dynamics, and patches that satisfy the similarity cannot be found.
The solution to this problem is based on using transition textures. We assume the existence of the transition (border) between types of DT but also the lack of one (or both) texture itself. Figure 4.6 illustrate basics method idea with subfigures consisting from two DTs (labelled non-mutually).
The precomputation of a transition patches (Figure 4.7, right) may be difficult to obtain, but once (and if) they are computed, the synthesis and analysis are fully separated, and thus thy creation of any arrangement is extremely quick. This is called as an offline method from now. The second approach consists of transition patch founding by the process of establishing every arrangement. Hence the transition patches are created on the flow the method is called as online from now.
The offline method is, of course, a variant of a well-known marching cubes[?] algorithm. The basics 2D examples (subspace form 16 needed cases) can be seen on the Figure 4.7. The drawback of this method is clear - the high requirements on the input data. Note that usually, it is not possible to find and precompute all possible cases even in the spatial dimensions only, but some consistent subspaces are possible.
In the transition patches approach the optimal is computed in the primary texture only and by appropriate cut-plane errors minimizing found by a spatio-temporal shift in the secondary textures. Finally, when the arrangement is filled, the δ^{*} to desired patch in the transition texture is searched for.
The difference is composed of difference between at least two of temporal-horizontal (top or bottom), temporal-vertical (left or right) or horizontal-vertical (front and end) cut-planes. The appropriate shift (location of _{t} in the _{t}) are found by error minimization:
where parameters of every ϵ function are transition texture _{T }, an optimal tile _{P }, desired shift δ_{r}, parameter δ_{s} that established if appropriate cut-plane (from transitional texture) is left / bottom / front or right / top / end and finally the primary or secondary texture patch ,
The error is computed with using of pixel error (3.43) between appropriate pixels from both textures. Their location is given by the which has its own location in the primary and secondary textures and desired location in the transition texture. As the empty arrangement location are filled iteratively, some location on the arrangement lattice could be empty. Therefore the difference is computed as:
(4.5) |
or if the arrangement hole is in the straight line:
(4.6) |
where ← denotes results from previous steps and therefore the toroidal knowing possibility of the future patch (as illustrate purple to green and yellow to purple areas in Figure 4.8).
The particular errors are computed as:
where δ_{} is patch location (global indices for the [0, 0, 0] of appropriate I_{th}) in the appropriate texture. ^{th}, ^{tv} and ^{vh} are derived as part of . The ϵ^{tv}, ^{tv}, ϵ^{vh}, and ^{vh} are computed in the exact manner with previously error (3.43).
The random DT arrangement is generated for an online method by iterative growing (erosion) of DT types on a 3D lattice from two or more arbitrary coordinates. Areas of contiguous type grow in the random direction by given defined neighbourhood until it is possible. The areas on the lattice which remain unfilled are then filled by transition patches iteratively for every particular location and adjacent patches.
For each area labeled as a transitional (labeled ’??’ in Fig 4.8) the best fitting patch from border texture is found by minimizing the error between this sample itself and each already completed adjacent sample (the one patch for every location or one patch for every type of surroundings DT). Patches that are computed in the second step are searched for every empty location the best-fit patch is found in the given transition textures. The computed to different is illustrated by violet in Figure 4.8. The yellow marked location (and thus ) and green marked location denotes another cut-planes. One of them (or both if this is last empty space, none is this is first filled space) are computed in the previous step. The optimal patch is then computed by minimizing ϵ* consisting from at least two border error cost to different or even . The distance is computed even for each already transition patch completed as adjacent in the previous step as can be seen on yellow and green marked areas. By this, the second analysis phase is added, but still, the analysis and synthesis step are separated.
Although such type of the source data is difficult to obtain, it is visually sufficient to have even a small data sample. The dominant component that determines the quality of the appearance of the synthesized result tends to be both significant pure DTs.
Mix of textures with border tiling
Again, lets look at the amount of possibly existing results. In transition texture based approach, the number of possibly synthesized textures is limited by:
(4.8) |
where N_{t} denotes the possible number of transition texture combinations which is of course limited by theoretically all combinations on the maximal side and using of only one texture (transition is a line) on the minimal side:
(4.9) |
The ^{i}N_{ res} then denotes all possible result combinations from every DT type area i:
(4.10) |
where ⌊^{i}⌋ is size of the appropriate i-th area and ^{i}_{ ct} possibly color-tones variations.
Many orders in which the right transition patch is, of course, possible, e.g., weighted by a number of already know neighbourhood patches. Recall that the transition patch detection can be in online version seen as the second analysis step which follows the arrangements assembly. Again the final result synthesis follows this second analysis step. Moreover, if the assumption of temporal homogeneity is used, we can presume that optimal temporal overlap can be found in the first analysis part of a whole dynamic texture. Because the temporal jumps detection is the dominating time-consuming part of an algorithm, is evaluating in first analysis step greatly reduce the second step computational time.
The using of more patches from more inputs, even border type is not satisfactory in many cases. The one universal tile is in many cases not capable of representing the various dynamic texture content.
In this cases, the easy but powerful enhancement could be done. Of course, every approach increasing the generality of the method brings its own limitations and limits the fundamental advantages of the method. In this case, it is a crucial toroidal property. If we give up the toroidal properties on a limited number of samples (transition tiles), we can provide a higher possibility of adapting the method to the desired result mix-of-DTs arrangement.
Instead of locating the transition patches with a given toroidal shape the new optimal transition tile is searched for.
The transition tile can be found not only with a given adjacent content but as a transition between two DTs types and two corresponding optimal tiles. The desired transition tiles then consist of (at least) two cuts from two optimal tiles. The last cuts are searched ad hoc in the online variant, or can be computed as toroidal to transition tile itself.
Again, the optimal rectangular tile for the transition tile searching is founded by error minimization. Lets denote the original pure dynamic texture as ^{A} and ^{B} and the dynamic texture in which the transition is searched as ^{T }. The appropriate index I_{ rt}^{T } set for a rectangular tile is then:
The optimal shift δ^{*A} and δ^{*B} are found by minimizing the error cost between at least two cut planes. The fixed tiles cut planes from both pure dynamic textures and all possible locations in the transition texture are searched by spatiotemporal shift. The difference is composed of a difference between at least two of temporal-horizontal (top or bottom), temporal-vertical (left or right) or horizontal-vertical (front and end) cut-planes. The appropriate shift is found by error minimization:
where parameters of every ϵ function are transition texture, pure texture patch and tile, desired shift, and the last parameter established if appropriate cut-plane (from transitional texture) is left / bottom / front or right / top / end. The _{S} and _{S} are driven by appropriate arrangements. Again, the ϵ^{*}_{} and ^{*}_{} are computed as before:
(4.14) |
where ← denotes results from previous steps and therefore the toroidal knowing possibility of the future patch (as illustrate purple to green and yellow to purple areas in Figure 4.8).
The particular error are computed as:
where δ_{} is patch location (global indices for the [0, 0, 0] of appropriate I_{th}) in the appropriate texture and ^{th} is from . The ϵ^{tv}, ^{tv}, ϵ^{vh}, and ^{vh} are computed in the exact manner.
Lets remind here, that a deterministic finite state machine[?] is a quintuple (Σ,S,s_{0},δ,F), where:
with this on the mind, the interactive textures can be defined as a medium with the (nondeterministic) state machine behavior, where every state q ∈ S is represented as appropriate patch in the meaning of temporal persistence. The here represent the whole texture .
Interactive texture The DTs mixing and editing can be done directly in temporal meaning only. The input textures and their corresponding patches can be manually labeled by its global and local dynamics (i.e. strong of wind, the density of rain and so on).
The patches from their labeled textures are then found and labeled so on. The SM states then represent . The transitions δ represents temporal cut:
and/or transition to another :
where the (4.3.3) represent loop and (4.3.3) state change. Note that if more temporal jumps exist, the more loops are defined. Likewise more very similar patches can be labelled as one state with inside transitions or the superpositioned state can be defined. s_{0} is any state, F is empty. The Σ consist of one symbol that drives the state change every T frames. Alternatively, Σ consist of one symbol per every q ∈ S and state changes are driven. In the automatic approaches every δ the state-transition function has its probability p.
For example for Thuja SM in figure 4.11 four states of a dynamic texture is illustrated: heavy rain, start rain, stop rain and no rain. All states except no rain are represented by one patch, while the no rain is represented by two patches no rain (texture-1) and no rain (texture-2). The arrows denotes possible changes in the states of a dynamic texture with corresponding probability p:
Note the p = 1 probabilities handle the nonrepeatable textures. This transition textures has the same meaning like transition textures in the section 4.3, but with strictly temporal meaning. With rule that can chance the inner state of the no rain:
and with complementary rules to ensure a possible repeat of the same texture:
the rules set is complete.
The resulting interactive texture could be played fully automatically with predefined probabilities to vary its appearance. The state changes, even if not strongly vary itself and its nondeterministic repeating strongly enhance the visual quality of the results.
For the nondriven, automatic dynamic texture the patches are organised with the transition from state to state:
(4.18) |
where card(q) denotes the state cardinality.
Local Temporal Shift In order to avoid the typical synthesizing problem - periodicity (usually given by the small amount of input data) - the spatial editing methods like multiple patches, several inputs DTs or a mix-of-DTs are utilised. Besides editing of the spatial dimension to increase results variability, another pure temporal method can be used.
As was discussed before, using only several patch samples leads to evident repetition. The same samples only slightly timeshifted produces different (but still correct) dynamics, which disrupt the overall impression of their periodicity.
Of course, simple random time shift of one part of DT can cause many inconsistencies in the synthesized result. The optimal approach would be a segmentation (consistent in all frames), detection of an object and changing only its dynamics. Our approach allows to skip the segmentation step since the texture is divided into logically consistent parts (given by spatial similarity and similar dynamics) from previous analysis steps.
The time shift Λ should be applied gradually with increasing size with the distance from the appropriate cut plane. The particular gradual λ step should be limited by T to avoid recognizable distortions. Obviously, the size of the area in which the time is gradually changed is:
(4.19) |
If λ < 1, the desired values, of course, does not exist. The desired data could be approximated (which typically leads to blurring) of the nearest existing frame, or a rounded t-th ⌈λ_{t}⌉ one could be used. The Λ and appropriate λ values are chosen as:
(4.20) |
of if the interior patches are supposed to have only consistent time shift as:
This technique strongly increases the quality of results with suppressing the visual periodicity. While the patches remain structurally the same, the different dynamics drastically suppress the visual periodicity. For examples and the detailed view, we refer the results section.
This chapter overview the inpainting problem and capability of our dynamic texture model to deal with it (see Figure 5.1 for results illustration). First, the inpainting problem is addressed in section 5.1 in the relation of this dissertation thesis. Next, in section 5.2, the adaptation of the presented synthesizing and editing model to inpainting problem is explained together with addressing the different condition and circumstances like an inability to obtain exact ground truth and ways to handle it. The section 5.3 consist of core definition of a inpainting model principles. Finally, in the last section, 5.4 is explained how to bypass the need for a mask for inpainting algorithm and cases where it is possible.
As discussed previously in chapter 2, the inpainting and error concealment problem addresses many particular problems with various assumptions and circumstances. The main key property for this problem consists of:
This very basics question could determine the circumstances and method limitations. It is oblivious that in the case of using an artificial mask (like in all Thuja DTs) there is already ground truth information. Regrettably, this not means that the original DT part and newly inpainted data could be easily compared, i.e. directly by pixel-by-pixel distance. As the texture (and subsequently dynamic texture) definition says, the texture as some random field realization could vary in its exact (pixel) values. In other words, the inpainted part of texture can be very different and still have the property of the original texture. This drastically limits the possibilities of comparison or even determination which method is suitable, better, or even satisfyingly working (more at chapter 6). The existence of the original dynamic texture, of course, could be used for visual comparing or psycho-physical test.
The critical role in problem has, of course, the object which is meant to be inpainted. Even if the occluding object naturally occurs in the scene, there are several cases that may more occur. In the first case, the object is moving thought scene, or during the duration of the scene, it exits (or enters). So there is a time momentum for every location in the texture, for which it is unoccluded. Note that the specific amount of time depended on scene dynamics is required and so short sequence even for whole texture is not enough. Moreover, the size of the area to inpaint must be taken into account. The size could be expressed precisely as a percentage of the actually viewed data and to whole dynamic texture (if the object varies in size) or inaccurately with relevance to the property of the inpainted dynamic texture. In the latter case, it is appropriate to determine whether the area to inpaint is substantially smaller than the underlying objects (mostly lower frequencies) from which the dynamic texture is composed, the same size or greater size (i.e. waves on water DT, or leaves/branches in the trees containing DT).
Of course, the primary case of a scene containing more than only one dynamic texture is different from scene consisting of only one. Ideally, the complex scenes could be divided into segments which could be handled separately.
The case with object naturally occurs in the scene can be extended to the case with many objects. The higher number of objects themselves may not be problematic. The problem occurs if we want to keep some of them and inpaint only the others. The case with many moving objects all of which are intended to be removed can be addressed as above. Let’s note that the problem becomes drastically more difficult if the original texture is supposed to preserve.
With the above-defined assumptions, it is possible to define the following cases to be used in this work:
The presented triple toroid-shaped patch model could be easily adapted to inpainting problem by inpaint by synthesis approach. The analysis part of the algorithm that focuses on the dynamic texture which is damaged could determine the property of a damaged texture (i.e. optimal tile size). The input 4D DT’s frequencies and the optimal double toroid-shaped tile are computed by using the Fourier analysis. Additionally, time properties are analysed again by a pyramidal Lucas-Kanade optical flow of the whole 4D and Fourier analysis. Finally, the search for similar patches of the same shape is done together with and computing the optimal temporal jumps.
Synthesizing part of the approach is based on finding small triple toroid-shaped tiles which can be used to create seamlessly fill the of arbitrary size in any spatial and temporal dimension. The exact location and shapes of triple toroid-shaped tiles can be improved by the error mask which is usually given by the user but is not necessary for presented toroidal approach.
Thus the method could be easily modified to synthesise the data to fill the error hole or replace the unwanted object with newly synthesised texture.
The presumption of toroidal tile shape is (in almost all cases) in contradiction to the shape of area to inpaint. This could be easily solved by recomputing the patch in significantly smaller and restricted area. In fact and with using of some assumptions and textural property this does not need to be taken as a disadvantage.
On the other hand, the toroidal approach allows reducing the computational time drastically. Classical intelligent sampling and statistical methods typically try to minimize cost around the three-dimensional spatial area , so the amount of method calculation is therefore dependent on the size of a . This lead to significantly increasing computational cost with the size. Furthermore, as the synthesized area consists from more samples, the inside borders and patch require their own optimization and thus the methods computational time depends on both error hole surface and volume. Compared to that, the toroidal approach computation is only weakly dependent (or complete independent in some cases) on size at all as the computation time is given only by tile size which used multiple tiles. The all possible hole dependent calculation consists only on a one-off calculation of the optical flow value and its comparison and outside border error minimization.
The area to inpaint, both artificial or natural, is denoted as . Usually is visualised as a black or white area in the dynamic texture visualization in the case or artificial mask and as a reddish area in case of removing some objects from the scene (see Figure 5.1). In the particular case of dynamic texture with no mask (as Walk and Daisies) no mask is shown.
The key property is, of course, its size. Due to dynamic texture analysis and size given by the dominant texture frequency, the optimal tile could be smaller or larger than the existing . In the first case, the in texture is smaller than the primary visual part of surrounding dynamic texture. The dominant frequency given optimal tile could be relatively small in case of very stochastic texture (like water, sand, …) but relatively large for texture consists of trees of more complex scenes too. In this case, the set of patches must be used to cover the whole area as illustrated on Figure 5.2. The smaller patch size allows utilizing some degrees of freedom - moving with the patch in both spatial and temporal axis to find the minimal boundary cost. Of course, hence the combining size of used is larger then the and some overlap is created (see Figure 5.2, the ζ^{*} area) In the second case the is significantly larger than the and again, some overlap is created.
Of course, the using only necessary subpart of a used patch is possible (again with possible shifts) but this, even for well-computed data patches, usually leads to visually easy recognizable silhouettes (see Figure 5.6 and accompanying videos) . The final approach is to use I_{ζ} area to find the between the inside similar as interior part of a rectangular tile I_{T } and surrounding dynamic texture. The area and appropriate index sec could be easily approximated by a rectangular area I_{ζ}^{r} = I_{ ζ} \ I_{}. Emphasize that in this case the presented method partially lost its great advantage of independence to size. Moreover, as the proposed test showed, the using of an optimal synthesizing tile is sufficient, and then the -optimal patch is not necessary.
The synthesis of dynamic texture to fill the consists of the appropriate use of the computed patches and to cover it. The main idea is to cover a hole and minimal amount of patches placed on the three-dimensional tile label lattice. The toroidal property allows covering the area with a seamlessly consisting texture that sufficiently represents surroundings dynamic texture with an advantage of the possibility of assembling many arrangements of tiled tiles (see Figure 5.3).
Lets note ^{ζ} as the neighbourhood area. The whole rectangular index set I_{ζ}^{r} = I_{ ζ} \ I_{} is its more straightforward and rough form and consist of possible unnecessary areas and could be fitted better.
As the ζ area of a particular _{i} is _{i}^{ζ} = \. The more suitable ^{ζ} can be defined as a symmetric difference of and all used ’s area: ^{ζ} = △ ∪_{ i=0}^{n} _{ i} The desired ⌊^{ζ}⌋ area size is of course minimal with meaning of pixel count.
The error cost then consists similarly like in the transition patches founding approaches from minimizing the appropriate cut-planes error with a given tile shape. The desired δ_{r}^{*} shift is due to (combined) size with relation to size limited.
Lets denote the rectangular index of as I_{}:
(5.1) |
The surrounding area in which the tile could occur is then of limited by its original optimal size. With relation to I_{}, the appropriate index set is:
The I_{ζ} is then the neighborhood of the hole in which the final (possibly combined) tile shape could be placed to encompass it.
The desired δ_{r}^{*} naturally could come only from I_{ ζ}. Moreover, the only some subspace of I_{ζ} is appropriate. The desired subspace of I_{ζ} is its left-bottom-front quadrant.
Lets denote the left-bottom-front I_{ζ-} quadrant as I_{ζ-}.
or as a I_{ζ} subspace:
then the optimal shift searching is based on minimizing appropriate errors:
where ^{B} are border patches only, _{ B} is border cut-planes error and ϵ^{ϕ} is the dynamic error. Note that ϵ^{ϕ} should be expected to zero in time-homogenous dynamic texture (as the all are representative texture samples). Moreover, the values are precomputed, so their difference estimation is extremely effective. Remind that is average optical flow vectors and ϵ^{ϕ} their difference. The ⌊∩⌋ factor scale the error down for only partially used patches.
The particular error for cut planes are computed as:
where the particular cut-plane error are computed only if the plane is outside plane or, in another words, for only unique indices:
Here, δ_{} is patch location (global indices for the [0, 0, 0] of appropriate I_{th}) in the appropriate texture and ^{th} is from . The ^{tv}, ^{tv}, ϵ^{vh}, and ^{vh} are computed in the exact manner.
An example can be seen in Figure 5.4 where lime, green, pink and violet ellipses mark the bot verticals and both horizontals cuts. Every color marked cuts is then computed with distance to appropriate toroidal cuts (lime to green and pink to violet) and the appropriate color marked area in the ^{ζ}.
The proposed inpainting method (see Figure 5.4) is as follows: Input - obtaining an input 4D DT and given mask (the retained parts are denoted by the mask in blue while the parts to be removed parts are in red) or computed of one. Double Toroid-Shaped Tile - the optimal tile computing using FT. The red ovals marks area for vertical toroidal cut, violet ovals marks area for horizontal toroidal cuts. A set of Double Toroid-Shaped Tiles - if the mask is given, the double toroid-shaped tile borders are fitted to hole (denoted by red in input mask). The tiles are (as suggest layered image) propagated through time by the OF and their exact shape is computed. The color ovals mark corresponding areas for recomputing the patch borders. Triple Toroid-Shaped Tiles - the set of double toroidal-shaped tiles are used to find optimal temporal jump (blue circle arrow) and create the triple toroidal-shaped tiles. The red round arrow denotes the exact continuity of the trimmed temporally toroidal sample.
The input 4D DT’s frequencies and the optimal double toroid-shaped tile are computed by using the Fourier analysis, their time properties and pyramidal Lucas-Kanade optical flow of whole 4D input are analyzed in the parallel pipeline, followed by the search for similar patches of the same shape and computing the optimal temporal jumps. This information is used for its generalization to the dynamic triple toroid-shaped tile as in pure synthesis variant.
The exact location and shapes of triple toroid-shaped tiles can be improved by the error mask which is usually given by the user but is not necessary for the presented approach. Usually, we use only rough hand-drawn area which is entirely sufficient for good results.
The static double toroid-shaped tiles are found by the pure synthesis method and later expanded using DTs optical flow and propagated throughout the time.The tiles are finally fitted to local dynamics by minimizing the boundary error (5.6) and estimating the optimal tiles shift (5.5). The specific cut-planes can be computed in the whole I_{ζ}^{r} area in a similar manner as whole new optimal not-toroidal tile or as a composition of more planes for the I_{ζ}.
Emphasize that as the do not need to have similar size in all dimension, the ideally fitted cut plane could consist of several cut planes for a general nonrectangular I_{ζ}. The different plane fitting method could be used but as their result are visually problematic (See Figure 5.6) that approaches are not used in this thesis.
Moreover, with the assumptions that the input data are a homogenous stochastic dynamic texture or mix-of-DTs and that the used patches and so every particular used patch are a dynamic texture of the same type with the same (or very similar) property the created output will be consistent DT with the same type. Thus in case of temporal homogeneity:
(5.9) |
as the dynamics is consistent thought temporal dimension.
The border error and visually discriminable inpainted part correlates with the increasing occurrence of complex structures, and hence with declining randomness. For a significant part of dynamic textures this is satisfied, and thus the tile can be computed globally, and the exact fitting is not required. Thus, moreover, the
(5.10) |
where _{inp} is arrangement of patches that cover the error hole. And thus the estimation could by bypassed for very stochastic and selfsimilar dynamic textures.
The illustration of this textural property can be seen in Figure 5.5, where founded optimal patch are randomly placed to and dynamic texture. Both simplifying cases are used in both Cherry and Withered. Note that in Withered the first simplifying assumptions are used very loosely and mostly as an example for the test. Both simplifying cases are not of course mutually exclusive. Note that in the dynamic texture the main and mostly dominating property is texture’s dynamics and so the similar dynamics of the inpainted patch and its neighbourhood is crucial.
Typical inpainting problems
Every inconsistency criterion has it own typical manifestation (see Figure 5.6). The sufficiently synthesized dynamic texture of a patch is, of course, crucial for a suitable inpainting result. The inadequate texture synthesis results are easy to observe as it is de facto just replacing one unwanted texture (object) with another. Usually, this criterion is satisfied. The manifestation of wrongly synthesized texture could be missing or shifted high frequencies, which are easily recognizable even in still frames. The insufficiently optimized border error leads to visually easy observable patch location and discontinuity of the dynamic texture in both spatial and temporal dimensions even if synthesized texture itself could have similar textural property. It is important to distinguish between border error caused by wrongly synthesized dynamic texture (waves DT in Figure 5.6) and sufficiently synthesized texture with high border error (stairs).
Mostly the border error optimizing is insufficient only in the temporal domain - the Boat texture (bottom in Figure 5.6) seems to be sufficient and challenging to detect in the still picture visually, but are easily recognizable as moving silhouette when observed as a video. This is a typical problem of methods that optimize the patch frame by frame or do not distinguish between spatial and temporal dimensions. As for the dynamic texture synthesis, the approach from chapter 3 is used, the essential texture frequencies together with structural and color property are preserved. Toroid-shaped patches of a suitable size naturally respect and reproduce the lost frequencies without loss of structural information and without the need to adjust the results through an artificial supply of texture features, estimating parameters and other techniques like ofthomographies [?, ?] which can enhance the quality of the result.
Local dynamics problems are usually caused by the periodic motion of small objects in the texture and create the optical flow. Not preserving of the optical flow causes a violation of G^{{1,2,3}} continuity curves - inconsistencies in dynamics due to the entire image, create a false optical flow, or abrupt changes in optical flow velocity.
The potential maximum number of needed patches c for rectangular I_{ }
Sˆtriviallydependsontheholeandpatchareasizesandequalsto :
The detection of a toroid-shaped patch does not require a knowledge of the mask. The criterion of patch minimal error is trivially minimized for samples consisting entirely from one single dynamic textures from which the surrounding consist from. As the rough tile size and thus even possible arrangement can be known before computing the tiles itself, the optimal patch tile could be computed can be created concerning the future minimization. Thus using of, even consistent and optimal toroidal, patches from different dynamic texture occurring in given input (mix-of-texture) naturally broke criterion and are excluded. Detection of multiple samples and their mutual errors then ensure elimination of a sample that consists entirely of . This can be arranged by excluding the data by its mask (is exist) and using a positive mask (denoting the source texture for tile computation).
The hole or an object to be inpainted is usually given by the mask [?] or segmentation as the preprocessing is needed. Since presented approach deals mainly with DTs, which must preserve textural property, these undesirable parts can be roughly detected automatically. υ, ϒ
Presented approach can detect inconsistencies primary due to different optical flow [?]. The 5D optical flow field Φ is used to detect parts of texture with different local dynamics.
The dissimilarity measure ϒ detects a measure from a some (small) area Δ_{{r}}^{p} with dimensions given by p = [p_{1},p_{2},p_{3}] and centered around r to another the most diverging area Δ_{{r+δr}}^{p}. ϕ(Δ_{ {r}}^{p}) denotes dynamics of the area and I_{ {r}}^{Δp} its index set.
The dissimilarity measure of an area Δ_{{r}}^{p} to the rest of is basically sum of area-to-area measures from original to every another area in the texture (with excluding overlaps)
(5.12) |
The dissimilarity measure ϒ between two areas and their corresponding inside dynamics vectors is defined as:
where υ(,r,s) is a measure between _{r}^{ϕ} and _{ r}^{ϕ}. _{ [x,y,t]} is size vector c = [c_{1},c_{2},c_{3}] composed from three x,y,t values respectively. _{r}^{ϕ} is equal to ϕ(_{ r}) and therefore to optical flow vector for corresponding pixel r from dynamic texture . Both _{r}^{ϕ} and _{ s}^{ϕ}values are part of the superstructure areas with the corresponding shift inside. The particular measure of corresponding pixel vectors is:(5.14) |
r and s are multiindices. ⌊Δ^{p}⌋ denotes the size of the used areas, * is a scalar product. High values of ϒ indicate parts with different dynamics than the rest of the dynamic texture. The equation (5.15) computation between some area and whole rest of the dynamic texture can be effectively enhanced by ϒ computation to not the whole but only to some randomized placed Δ areas. We usually use ϒ_{n} = 5 to 7 areas with the size of ∝⌊⌋ plus forbidding computation of very near (adjacent) pairs which forbids local similarity in an inpainted area. The 5.15 then becomes:
Emphasize that if the locations of Δ_{{r}}^{p} regions are randomized and ϒ is calculated for ⌊Δ⌋∝⌊⌋ sized regions, the automatic inpainting in the resulting DT can leave small inconsistent areas (see DT Water). The high ϒ values suggest that given area should be inpainted. Due to the ϒ distribution, the high and low values are well separated and can be easily detected (i.e. by moving average and standard deviation abruption). The higher values are sufficient indicator for inpainting. Since analysis and synthesis are divided an adjustment of the number of samples can be made with the negligible computational time.In this chapter, an overview of presented criterion for comparison dynamic textures is presented. At first in the section 6.1 the frequency characteristics and analysis are discussed. Next, in the second section 6.2, the developed criterion model is presented in its most general form. The adaptation of the criterion to the synthesis problem and the inpainting problem are in section 6.3. Finally, the criterion is compared to the psycho-physical test in the last section 6.4 at the end of this chapter.
There are two typical basic approaches to similarity / recognition / classifying problem in still texture domain. The first ones focus on pixel-based features and characteristics of texture and their clustering and their subsequent comparing. The second approach then focuses on whole texture description as its color, structural, wavelet characteristics etc.
While pixel-based characteristics can be relatively easily extended to more dimensions, the comparable structural approaches development is notably more complex. The fundamental problem in extending pixel-based features is a different property of a temporal dimension in the first place and a close relationship to the importance of spatial relationships. The computing of metrics frame by frame is also inadequate as this approaches calculates similarities only in the spatial dimension and lack the temporal property. Even when computed from the all or several time points (frames) the criterion only approximate a fully dynamical criterion.
Despite these drawbacks, spatial metrics have their place in calculating the likeness of dynamic textures. The color shift in the synthesized result, the occurrence (or vice versa) of noise, and the occurrence of artifacts are cases that are well-detectable by these methods. Without strictly dynamic metrics, however, their use - as the psycho-physical tests (see Sec. 6.4) show - is still insufficient.
Moreover, the criterion should have an ability to adapt to human vision system and take into account the human possibility to recognize and distinguish particular spatial and temporal frequencies.
Contrast Sensitivity To take into account the properties of a human visual system and its sensitivity to contrast the criterion that suppresses frequencies difficult to obtain or distinguish is suitable. As the crucial frequency principle is a binary decision-making process, another method to suppress unwanted and enhanced important frequencies is desired. As the weighting function which well suited our method the contrast sensitivity function (CSF) model of Burbeck and Kelly[?] can be used. The factor can be computed as:
(6.1) |
where f_{t} is temporal frequency and f_{t} spatial frequency without any difference in direction.
The proposed DT similarity criterion for dynamic textures is based on the three-dimensional Fourier transformation. As the notion of Fourier transformation properties varies in literature, the short introduction based on defines notation and terminology used in this dissertation thesis follows. The Fourier transformation of a function f(x_{1},x_{2},x_{3}) finds the spatial frequencies ξ = (ξ_{1},ξ_{2},ξ_{3}). The 3-dimension Fourier transformation for the f(x_{1},x_{2},x_{3}) function can be written as:
(6.2) |
where the set of all spatial frequencies is called the spectrum. The intensities of one given frequency is called a magnitude |f(ξ_{1},ξ_{2})|. The energy spectrum or the power spectrum, is symmetric about the origin because |f(ξ_{1},ξ_{2})| = |f(-ξ_{1},-ξ_{2})|. The function f(x_{1},x_{2}) is the intensity of light at each point (x_{1},x_{2}). Even when we take x = (x_{1},x_{2}) as spatial variable and ξ = (ξ_{1},ξ_{2}) as frequency variable, the frequency values of ξ is still strictly related to spatial dimensions of f(x_{1},x_{2}). Similarly the ξ = (ξ_{1},ξ_{2},ξ_{3}) of f(x_{1},x_{2},x_{3}) (of f^{3} for simplicity) has only temporal meaning only if the vector ξ has direction parallel with temporal axis.
There is a natural geometric interpretation of the zero phase condition which will be often used in this dissertation thesis as important properties of the complex exponential. For a fixed ξ the equations
(6.3) |
determine a family of parallel lines in the (x1,x2) - plane. Take n = 0. Then the condition on x_{1} and x_{2} is
(6.4) |
and we recognize this as the equation of a line through the origin with (ξ_{1},ξ_{2}) as a normal vector to the line. Then (ξ_{1},ξ_{2}) is a normal to each of the parallel lines in the family.
In higher dimensions the words to describe the harmonics and the spectrum are analogous and due the linear separability - the 3-dimension Fourier transformation 6.2 for the f(x_{1},x_{2},x_{3}) function can be rewritten as:
In three-dimensions a given ξ defines a family ξ ⋅ x = integer of parallel planes (of zero phase) in the (x_{1},x_{2},x_{3})-space. The normal to any of the planes is the vector ξ = (ξ_{1},ξ_{2},ξ_{3} and adjacent planes are a distance apart. The exponential is periodic in the direction ξ with period .
Significant Dynamic Texture Frequencies
The combination of 2D and 1D Fourier transformation is used to detect dynamics of significant local and global spatial frequencies. Although there is no pure 3D Fourier transformation, due to linear separability resulting values respond to 3D Fourier transformation. Note that spectral dimension data are solved multiple over the entire detection, and therefore 4D dynamic texture is taken as an n-tuple of 3D monospectral dynamic textures.
Whereas the crucial part for video similarity perception is its structures, dynamic behavior[?] the key principle of the presented criterion is to detect spatially significant frequencies ξ from the set of f^{2} and observe their behaviour by another Fourier analysis eq. (6.2). This behavior is mainly described by a complex exponential with the normal ξ with dominant temporal component ξ_{3}. Components ξ_{3} similar to ξ_{1,2} are less significant due to combining spatial and temporal directions.
Because of the separability of the individual vectors information and for simplicity of computational model the harmonics with unit normal vectors are projected to ξ^{t} with only temporal dimension. In other words, the any given is represented by nearest ∥ t, t = (0, 0, 1), where the distance d between (ξ_{1},ξ_{2},ξ_{2}) and (ξ_{1}^{t},ξ_{ 2}^{t},ξ_{ 2}^{t}) is defined as standard L_{ 2} norm in ℝ^{3} space:
(6.5) |
The set of all unit vectors are denoted as . The ^{t} then denotes all unit vectors ∥ t, t = (0, 0, 1). In order to try to separate the key frequencies
Provided there is a group of representative normal (unit) vectors we assume a new harmonic . Note that obliviously the
Whereas the harmonics set expresses the behavior of one spacial frequency alongside temporal dimension, it is essential to use only those harmonics that carry essential visual information ( ). Only if the spatial frequency temporal behavior can be considered as significant, it should be used in the criterion. Note that the significant frequencies designation characterises the given frequencies through all the compared DTs (i.e., Y _{s} and Y _{c}). In other words, when a specific spatial frequency shows high magnitude values and is therefore crucial for the reconstruction of a given dynamic texture, this particular frequency and its time behavior are used for comparison.
Of course, not only the harmonics carry the necessary information. One can admit that even information about non-occuring spatial frequencies can carry a considerable amount of information, but with respect to the general aspect of image processing and it Fourier transformation usage, we consider this information as a complement to strong frequencies.
There are many problems in crucial frequency determination: How to separate crucial and inessential frequency? What values of magnitude should be included in these groups? Should be the priority the spatial frequencies distribution or magnitude values maximization? Also, how many harmonics are sufficient to represent the dynamic texture for the comparison?
For now let’s just suppose that there is significantly smaller number of ≪ MN ≈ w log _{2}(MN). For more detailed analysis and testing, see next chapter.
The DT similarity criterion is based on the comparison of harmonic frequencies time series related to significant spatial frequencies in the compared dynamic textures (Figure 6.1).
Let’s denote a set of harmonics (of Y _{s} or Y _{c} DT) ξ_{Y }^{ζ,∙} for δ spectral band:
(6.6) |
where is defined as w_{f} log _{2}(NM)-th largest vales from Y _{c} ∪ Y _{s}. ∙ denotes all corresponding index values (e.g., temporal frames) and ζ = (ξ_{1},ξ_{2}) is some fixed spatial frequencies ξ_{1},ξ_{2}. w_{f} ≥ 1 is the parameter set to speed up the process and to better separate the noise from crucial spatial frequencies. We set experimentally this value to w_{f} = 2 to 3. Recall that due to steadily growing audiovisual data resolution, this step sharply reduces the criterion of computational time complexity. From now, takes an illustrative example in Figure 6.2 of 10 computed crucial frequencies (horizontal axis) with magnitudes from 0 to 255 (vertical axis) from original and synthesized dynamic texture.
Whereas the harmonics set expresses the behavior of one spacial frequency alongside temporal dimension, it is essential to use only those harmonics that carry essential visual information. If the spatial frequency temporal behavior can be considered significant, it is used in the criterion. Note that the significant frequencies designation characterizes the given frequencies through all the compared DTs (i.e., Y _{s} and Y _{c}). This detection ensures that the missing or modified frequencies in the synthesized texture are detected and evaluated.
Now with the rough knowing of characterisation let us denote general function with a measure L between two DTs Y _{s} and Y _{c} as:
(6.7) |
where ↑ is the desired value orientation, β is the Fourier transformation maximum, θ_{Y sδ,Y cδ}^{ξ1,ξ2} for a δ spectral band is:
(6.8) |
where Y _{s} and Y _{c} are source and comparison DTs, respectively. L(u,v) is the measure between impulse characteristic u and v (a distance between two harmonics sets) like eq. (6.12), (6.13) or(6.14). M_{x3}(γ) is a function which return the maximal value for the argument vector γ and scale value to the range ⟨0; 1⟩ over all frame’s maxima. ΘY _{s},Y _{c} is a set of all measure values θ_{Y s,Y c}^{∙,∙} between DTs Y _{ s},Y _{c} across all spatial frequencies. Physical meaning of θ_{s,c}^{ξ1,ξ2} represents different dynamics of the spatial frequency ξ = (ξ_{1},ξ_{2}), i.e., periodicity and magnitude variability between textures Y _{s} and Y _{c}. The M function could be substituted by a different function that reproduces human visual sensitivity as i.e. CSF(ζ,x_{3}).
Emphasize that the ^{L}θi,j^{Y s,Y c} is taken as negative to preserve the definition of similarity meaning of criterion and therefore the maximization criteria. For the purpose of comparing the effectivity of determination the equation 6.7 for whole set of appropriate harmonics.
The harmonics or more precisely the distance function values between them are then weighted by the function M() which correspond to human perception and further emphasize the substantial difference frequencies.
The M calculation optimization
Understanding what frequencies are crucial for texture comparison, of course, should not be taken simply from a statistical point of view. The order of statistical and human perception criterion enhancements is questionary. The alternative description of equation 6.6, which could look as:
(6.9) |
Clearly the explicit version of the perception enhancement in eq. 6.6 involve less computational time. Again, for now just suppose that M is function takes value from 0 to 1 with relation to human perception of particular time / space frequency. Moreover, the CSF(ζ,t) (6.1) can be used to emphasize the visually important frequencies.
where c_{i} and s_{i} are i-th values from the given harmonics ξ. This effortless distance has an advantage of simplicity, metrics definition and above all physical significance. Many textural metrics (let us mention them, for example, STSIM) have poor or no exact physical relationship.
The physical meaning of θξ_{1},ξ_{2}^{s,c} without weighting represents pure different dynamics of the spatial frequency ξ = (ξ_{1},ξ_{2}), i.e., periodicity and magnitude variability between textures Y _{s} and Y _{c}. The values are directly dependent on image representation and in the most usual and general are from -255 to 0, or in the little readable form -θξ_{1},ξ_{2}^{s,c} from 0 to 255, where the value has physical meaning of the sum of differences between harmonics set per impulse. The any given value of the distance can be directly interpreted.
Another clear advantage is the probability distribution of values directly related to the input textures and thus the whole range of values.
Lost frequencies
When the eq. 6.10 interprets the harmonics distance in its purest way as the absolute difference; there are another and more interesting ways to distinguish the dynamics that can occur in the compared dynamic textures.
For the purpose of comparing dynamic textures where one was created by a modification of the other and thus it shares major common part of the criterion function, some additional information can be obtained.
Let us assume that we have two dynamics textures - the synthesized dynamic texture Y _{s} and original source texture Y _{o}. The dynamic texture Y _{s} was created by modifying the original DT Y _{o}.
The loss of information between the synthesized dynamic texture Y _{s} and the original dynamic texture Y _{o} can be expressed concerning dynamics as the absence of some spatial/temporal frequencies (see Figure 6.4) which thy synthesizing / modeling approach is not able to articulate. This particular case can be caused, for example, by inappropriate sample or insufficiently learned the model and is very common. The HPS consequence of this is a loss of structure, missing sharp edges and details, as well as low frequencies that DT synthesizing methods are unable to represent.
The lost frequencies, both spatial and temporal, term is:
where s_{i} (original) and c_{i} (compared) are i-th values from the given harmonics ξ.False frequencies
However, due to synthesis or inpainting methods, some other phenomena that reduce visual quality of the synthesized DTs exist. The problem is not the missing information, but false frequencies introduced by these methods and not related to the original texture. False synthetic DT periodicity is visually disturbing the perception of the quality of DTs. This is the apparent repetition of the same samples as well as the visible boundaries between data samples from which the new texture composed.
Problem is not always the frequency variation made by the method. The missing information computed as L^{lost}(ξ_{ Y s}^{ζ,∙},ξ_{ Y c}^{ζ,∙}) is not the only problem that can occur.
The missing information problem is often directly caused by the used method and not by the input dynamic texture. That approach determined phenomena reduce visual quality of the synthesized dynamic textures and are manifested as false edges, step color changes, etc. Often these phenomena may be described as a violation of G^{0} or C^{0} continuity in both spatial and temporal dimensions.
The problem is therefore not the missing information, but false frequencies (see Figure 6.5) introduced by these methods and not related to the original texture. Usually, this method determined unfaithfulness information are manifest themselves as false periodicity and homogeneity violation.
In relation to the name of the equation (6.12), it can be named as false frequencies of approach determined frequencies. For the sake of simplicity (and as a complement to eq. (6.12) the first term will be used here.
The false frequencies, both spatial and temporal, term is:
Here it is good to remind that the newly created false frequencies are harmonics e^{±2πixξ} with normal ξ = (ξ_{1},ξ_{2},ξ_{3}) where ξ_{(1,2,3)} varies to any combination as the patches are R^{n} space with noticeable error (which are usually minimized by the synthesizing method). Thus L^{false}(s,c) can be efficiently computed on any harmonics ξ_{ Y }^{x1,x2,x3} from which we, for simplicity and to streamline the computation process, pick ξ_{Y }^{ζ,∙} and ξ_{ Y }^{x1,x2,x3} where x_{(1,2,3)} are maximized mutual value difference (i.e., ξ yielding an oblique harmonics).
In the most frequent case the false repetition is caused by an inappropriate patch (which are usually easy to recognizable), unstable or inadequately learned synthesis model, etc.
The inadequate tile size or using of only one patch is the most obvious example: If there is only one patch (even tileable one), the repeating pattern could be easily noticeable even if there are no visually recognizable patch border which could which would reduce the visual quality of the result. Even when the repetition is due to texture stochasticity hard to observe in image space, the peaks are in Fourier frequency space are - comparing to original texture - noticeably easier to observe.
Combined measure
Both methods (6.12), (6.13) complement the absolute difference in texture behavior and allow more accurate behavior comparison of different DT synthesis approaches and more detailed analysis of their weaknesses and strengths.
In the proposed criterion we use their combination:
where α_{l} and α_{f} are weight parameters between 0 and 1. After our extensive tests, it seems that for the purpose of computed similarity of DTs the values of α_{l} = α_{f} = 1 are sufficient. For applications comparing the original and modified (synthesized or inpainted) textures the parameter α_{l} seems to be more important and determining, thus α_{l} ≤ α_{f}.Whereas quality criteria try to emulate human perceptions, it is necessary to verify them. Thus any computed similarity criteria must be compared to human perception. Unfortunately, this requires tedious psycho-physical experiments.
The psycho-physical test should be ideally performed in a controlled environment under the supervision of the presenter. Of course, the other form of testing could be done - i.e. testing on participants own PC in their similarly as the test performed by Haindl and Kudělka[?]^{1} . In these cases, of course, the different variants of the environment (surrounding objects, lighting, etc.), hardware, and much more needs to be taken into account.
Because the DT compression is one of the critical factors for DT inpainting test, it is optimal to perform a test on uncompressed data. Unfortunately, the data amount produced by current resolution in uncompress form are difficult to handle in a remote test. The use of compression methods leads to blurring of sharp edges (both edges in the image and especially edges between the original and synthetic parts of DT). The similar effect occurs in downsampling the resolution. The using of compressed or downsampled dynamic texture for test leads to biased evaluations, often deflected in favour of modified data, in which the adjustment is less pronounced.
For our tests, participants should always compare the data in maximum quality and resolution undiminished. For comparing with another method, dynamic textures resolution^{2} will be downsampled from full resolution to reproduce the similar artifacts and texture quality or alternative methods.
For a proper comparison of inpainting methods, two test with a different setup is suitable. First suited to evaluate the ability to determine the border and the second test with a goal to rank the inpainted result:
Blind test For blind tests, participants should rate the displayed dynamic texture by its visual quality and whether it contains or does not contain obvious interference, retouch or artifacts. The used scale from 0 to 5, is suitable and sufficiently detailed. The maximal values 5 denotes the highest quality of the presented DTs. As a control mechanism, the original dynamic textures without any editing or inpainting is appropriate to be added to the evaluating process.
Informed test In the informed tests, the same set of dynamic textures should be used. Unlike the blind test, additional information should be presented to participants. This information consists of original dynamic textures and the corresponding masks thus participants can directly compare changes in the presented results.
To obtain a direct comparing ranking, two dynamic textures result can be simultaneously presented. If results from more inpainting methods exist, the all pair combinations should be presented and evaluated. To ensure the blindness of the test, the presented dynamic texture pairs should be randomly placed on the left / right sides. The designed arrangement can be seen in Figure 6.6.
The bidirectional texture function is currently the best visual texture representation of various textured materials which can be simultaneously modeled and acquired. The process of BTF measuring is time consumable, but the results are currently the most high-end and physically correct surface materials appearance modeling. Above all, the adaptability of the model to the illumination conditions is extremely wide. This short chapter consists of a BTF and DT model combination demonstration (see Figure 7.1).
The short BTF model property overview is in the first section 7.1, while the dynamic bidirectional texture function and using of a proposed toroidal model editing capabilities to synthesize is discussed in the second section 7.2.
The BTF model is a 7-dimensional model, subspace of a more general BSSRDF model. Remind, that the 7-dimensional multispectral function is defined as:
(7.1) |
where ς is spectral index, x and y are planar coordinates. ω_{i} = [θ_{i},ϕ_{i}] is illumination direction and ω_{v} = [θ_{v},ϕ_{v}] is viewing direction. Y denote a random field (true unobservable image) for pixel r.
The BTF data acquiring is a lengthy, complex and complicated process resulting in a huge data set. For example, single texture measurement on the University of Bonn setup consists of 6561 (a combination of 81 view (ω_{v}) and 81 illumination (ω_{i}) source position) multispectral images.
For the desired DBTF model the BTF can be then taken as generalised DBTF model with the assumption of time dimension as a constant. The whole BTF image space is then subspace of DBTF model:
(7.2) |
where t denotes an observing time. The whole dynamic BTF model is then defined as:
(7.3) |
with a clear relationship to BTF model:
(7.4) |
where Ø denote missing index. Recall, that the dynamic texture is in its simplest form defined as:
(7.5) |
Similarly, the direct extension of a BTF model (1.5) to temporal domain as combination of BTF and DT model leads, due to compliance parameters ς,x,y that are shared by both models in the exact same meaning, to the resulting DBTF 8-dimensional model which can be suited as:
(7.6) |
where ς is spectral index, x and y are planar coordinates. ω_{i} = [θ_{i},ϕ_{i}] is illumination direction and ω_{v} = [θ_{v},ϕ_{v}] is viewing direction. The t parameter is again a newly added temporal dimension index.
The measures are then stored as the set of classical rectified original measurement (frames) with possibility to convert them into set of apparent BRDF: ABRDF_{[r,t]}(θ_{i},ϕ_{i},θ_{v},ϕ_{v}).
As this dissertation thesis recommends to address dynamic texture as 5-dimensional function with specifically designated dynamic dimension as a representation of high-order texture inner dynamics, the function 7.9 becomes to:
(7.7) |
where vector ϱ denotes the appropriate dynamic, namely, i.e. optical flow vectors.
DBTF Measuring
The crucial problem of acquiring a suitable DBTF data is, of course, demanding measurement. The BTF measuring itself is a very complex process which is furthermore complicated due to an assumption of a homogenous dynamic process.
The inability of a natural dynamic texture deterministic behaviour through all measurements complicates the whole data obtaining process as the dynamics is usually supplied artificially by any sort of a proper device (of course, the artificial dynamic texture with deterministic behaviour is also possible, but usually is not our object of interest).
Inasmuch as the measurement is an iterative process, the sample is different for every measure. Moreover, every measurement takes a significant time. The possibility of obtaining more measurement simultaneously is theoretically possible, but extremely complicated (currently, BRDF simultaneous measuring is possible).
The problem of varying subject form, however, is only apparent as the texture is defined as the random realisation of a stochastic field with unvarying parameters. So, with the textural assumptions applied to dynamic texture temporal dimension the samples of a dynamic texture (and thus even dynamic BTF) are defined as the same texture (even different realisation) as the dynamic texture are seen as generative video model defined by a random processed with some observed variable and hidden variables[?].
Therefore, if an (artificial) dynamics has constant properties through whole measuring process, all samples, even obtained in the different time, can be considered as one dynamic texture.
Interestingly, the constant properties of a dynamic process source can be defined as a hidden ℧ process, which defines the texture inner dynamics. The general multidimensional function ℧ thus generate the texture dynamics:
(7.8) |
The dynamic BTF whose dynamic properties vary but has the same structural data can be therefore defined in a more general way as varying dynamic DBTF:
(7.9) |
The general dynamic generative function ℧ with hidden vector parameter κ_{i} is then, i.e. fan wind and δ the direction, wind strength etc. The ℧ function can then define the high order and long-term dynamics property and can be understood as generative and determining function for dynamic texture transition function in temporal meaning. For example, temporal transition textures between two stable states (e.g. strong wind and weak wing, or raining process) can be modeled this way.
Approximate DBTF modeling
Whereas a moderate precision measurement of one single material made by controlled robotic arms with camera and light source can take around one terabyte, the amount of a dynamic material represented by a dynamic BTF, therefore, increases significantly. Even if the temporal dimension length can usually be noticeably smaller then a spatial resolution, the amount of data is considerably tremendous.
The amount of data and problematic comprehensive measurement leads to use an approximate model, in which only noticeably smaller subset of a ω_{i} and ω_{v} are used.
Moreover, the BTF measuring problem caused by self-occluding is even more significant in dynamic materials. Especially in the case of extreme low angles of view, the distortion of the given dynamic texture is due to a non-planar material plane problematic and excludes the simple approximation of values.
The using of additional information as a measured and registered depth map can be used to produce a slightly better result, i.e. by utilizing the displace mapping algorithm. The depth map could be used as an additional spectral plane to archive better suited toroidal tile and patch. The methods that blur the sharp edge can be utilized.
DBTF modeling
The utilizing of a proposed toroidal model to a DBTF material is similar to a BTF synthesizing process and DT editing process. The set of measured dynamic textures are labelled by with appropriate ω_{i} and ω_{v}. The optimal triple toroidal tile is found in a primary dynamic texture (say sample with ω_{v} = [0^{∘}, 0^{∘}] and ω_{ i} = [45^{∘}, 0^{∘}]) and propagated to secondary measures or can be computed as a compromise solution. The appropriate sample is then applied to a desired location in the rendered scene.
The problem occurs when the scene illuminance r observing direction is varying. The temporal length of a used triple toroidal patches limits the angular speed of the light source and even the viewing direction. The angular speed
(7.10) |
with respect to the texture plane is directly limited by a sample density and patches length, t denotes time and ϕ an angle. As the only sparse measurements are performed the maximal illumination and viewing angle changes are defined as:
(7.11) |
If the ω are not linear dependent to a ⌊⌋_{t} value, the dynamic texture transition is inaccurate. Obviously, the visualization expected error measure can be described as:
(7.12) |
and should be minimized.
The simple example of modeling a dynamic texture measured under varying illumination condition and arrangement from measured texture patches is demonstrated in Figure 7.2.
This chapter consist of two main parts both closely links to previously presented inpainting method and dynamic texture similarity criterion. At first section 8.1, the created dynamic texture database dismantled with a comparison to existing one, namely and primary DynTex database is presented. Next, in section 8.2, the database is used to compare results and validate the Spatiotemporal Fourier Transformation Criterion which was presented in the previous chapter.
As an excellent, inspiring material, the widespread and often used DynTex[?] database class labels could be used. The database offers three classification benchmarks dataset of dynamic textures with various length measured under varying conditions including camera span, zoom, and different weather but unfortunately only in PAL resolution. Most DynTex textures are in three occurrences per each class, and current database contains over 650 sequences while covering the wide texture classes as flowers, water, fountain, foliage, flags and many others. For used datasets the classes can be seen in Table 8.1:
The illustration of DT types Calm water and Flower an be seen of Figure 8.1 and Figure 8.2. Usually, every class contains tens of textures. All three datasets consist almost 650 textures with various length, camera move or level of detail.
Database Description Four our database we consider setting which is considered as optimal for presented method test but can be used in many other approaches. The basics of technical settings parameters are:
while the main reason is to measure the maximal amount of data with the same parameters.
For the validating purpose and to test of presented inpainting and synthesizing method it is optimal to satisfy not only given technical parameters but to focus on parameters of dynamic textures themselves. The more shots of the same dynamic textures, more measures with different dynamics and luminance etc. is then needed. Varying parameters allow for the extensive test of various modeling methods and its robustness.
The other appropriate parameters for build database should consist of:
Figure 8.3 illustrates representative measured subset of types GRASS and FOLIAGE DTs. Note i.e. DT 11 and 33 as a textures with similar structure, but very different dynamics. Our whole database consist of 1751 dynamic textures with duration of from twenty second at least one minute.
The dynamic texture database and its classes were used to spatio-temporal Fourier transformation criterion presented in the previous chapter. It is clear that just like a slightly edited dynamic texture (i.e. by inpainting) should be very similar to the original dynamic texture, the value of similarity of textures belonging to the same class should be noticeably higher than for textures outside this class. So the ratio between inter-class and intra-class similarities could be used to cross-validate the criterion definition (or alternatively to determine the class consistency). Again note, that for the purpose of this dissertation thesis the primary and crucial property of a dynamic texture is its dynamics.
Lets remind that the similarity based on spatio-temporal Fourier transformation criterion between two given dynamic texture _{s} and _{c} with specific metric L is denoted as:
(8.1) |
where |^{A}| and |^{B}| denotes number of textures in classes A and B.
Recall, that the scrips notation are organised for any function α as
More comprehensive with using of ∙ notation the similarity between classes could be written down simply as:
(8.2) |
The weighted value of the intra-class similarity is then
(8.3) |
Since ∙ denote all possible value of given variable or all elements of a set, ^{∙}Y denotes a set of all DT classes and ^{∙\A}Y a set of all DTs classes other than A. Consequently, ^{∙\A}Y _{ ∙} denotes all DTs from all classes excluding A. The sum of similarities of one DT ^{A}Y _{ s} from given class A to all DTs in all other classes is then the inter-class similarities is:
(8.4) |
The ratio of similarity values between one particular DT ^{A}_{ x} from a given class labeled by A to all other DTs in every class (for simplicity suppose the same class size) is then ratio between intra-DT class similarity values and inter-DT class similarity values:
The overall inter-class average ratio can be subsequently express as:(8.6) |
or precisely, but more cryptic with two-level ∙∙ notation as:
The total mean overall ratio can be therefore express as:The ratio values ^{}ψ = 1 represent extremely similar dynamics or dynamics that is unrecognizable by our criterion.
Since our criterion does not acquire absolute values smaller than 1, the ratio ψ > 1 determines significantly higher mutual similarities of the intra DTs than similarity to DTs from other classes. Thus the overall and total Ψ > 1 should be maximized to achieve the highest classes discriminability.
Let us mention that in some significant cases, the ratio of the distance of values may be less than 1. These cases are usually caused either by different dynamics (e.g., leaves versus branches, lee, etc.) within the DT or by the texture being composed of multiple DT textures.
Suppose, that every texture is random field realisation with given parameters. Then even vaguely defined dynamic texture classes should have similar parameters. Hence the proposed criterion is defined as similarity it is obliviously maximized for identity.
Since each texture is a random field realisation with some given parameters, which is homogeneous in both spatial and temporal domain, the texture property from any texture subparts should be remarkably similar but not identical. With respect of this fact, any given dynamic texture can be seen as its concrete dynamic texture class and its sub-parts as textures belonging into (note that the sub-parts should have similar size due to due to possible variations in texture scale).
So for the validation test, the given amount of small randomly placed dynamic texture sub-parts were extracted from every used dynamic texture. Every dynamic texture is then taken as its own dynamic texture type and texture sub-parts as a concrete dynamic texture. As the input DTs (or classes for this purpose) are homogenous texture, the requirement of a consisting property is trivially satisfied.
As a source for used dynamic textures, the class GRASS was chosen as an example and typical class with the most dynamic textures and the most differing dynamics realisations. The several sup-parts from chosen high-resolution textures can be seen in Figure 8.4.
The illustration of the sub-parts similarity values and used sup-parts is on Figure 8.5. The single values inside textures represent the inter-texture similarities and intra-texture similarities ratio and then express the similarity of every particular texture to all other textures. Clearly, the Daisies texture (on the left) is different from most another DTs. On the other side, the Grass textures are more similar to all other. The paired values then represent direct ration between two textures (exactly the all-to-all sub-part similarities ratio) with an order of top-left and left.right. Then the similarity from Daisies DT to bottom Grass DT is 1.590 and subsequently the ratio from bottom to right Grass texture is 1.093.
The proposed dynamic texture model shows the ability to model suitable and visually plausible dynamic textures. Here, representative frames from synthesized results are showed. Step between showed frames is at least 10 frames in every sequence. Examples of representative frames of a natural dynamic textures are shoved in Table 9.1. The last column of a table demonstrates that method could synthesize result from input consisting of more DTs. Figure demonstrates model ability to adapt to optical flow and enlarge DT in all dimensions. Moreover, two patches are used (here without any editing enhancements).
As textures consisting of fire and water phenomena are one of the hardest to synthesise, some examples of representative frames are shoved. Figure 9.1, Figure 9.3 and Figure 9.4 shows the synthesized (time and spatial enlargement) fire sequences. The input sequences representative frames are in Figure 9.2.
All synthesized Fire textures were synthesized with method set-of-cuts to handle very erratic flame behaviour. Note, that after many iterations and when flames drastically change their location or even disappear the cuts quality can degrades for a small amount of time.
Another dynamic texture type which is difficult to model is water. Three different dynamic texture enlargement with the various property is shown. The various temporal and horizontal enlargement is demonstrated in Figure 9.5 and various vertical and temporal enlargement in Figure 9.6. The Figure 9.7 demonstrate problematic case as the strong visible content (the reflection of the tree on the surface) is presented. The method set-of-cuts is used in the second case. Let’s mention, that all dynamic textures are recommended to see as a video.
Limitations Presented toroidal editing approach can work with many input DTs types, but there are some restrictions. If some perspective distortion is present, the quality of the result can vary widely depending on the severity of distortion (but still can be synthesized, see Tab. 9.1, third row), dynamic textures with small non periodical color gradient generate optically visible transition on seams because the are not any repetitive parts to find. The strong unpredicted local dynamics with fast disappearing of small objects can create artifacts too.
The result with noticeably spatial enlargement can be seen in Figure 9.2. More patched used in every sample. The original texture is on the right, the three textures on the left are downsampled. Figure 9.10 demonstrates noticeably changes in the visual quality if more patches are used.
A set of individual synthesis algorithm enhancements that serve to increase the limited number of results and, above all, to consider their visual quality was proposed. The enhancement utilized the previously proposed 4D dynamic multispectral textures synthesis and editing based on toroid-shaped tiles.
The synthesized examples illustrate good performance of our approach (many types of DT can be seen on the video). Representative frames of a result subset are shown in figures in this chapter.
The mix-of-DTs are showed in Figure 9.12, Figure 9.11 and Figure 9.13. The Figure 9.12 is pure mix-of-DTs with slight colortone enhancement, the Figure 9.11 demonstrates border patches approach and and Figure 9.13 demonstrate border tile approach.
Moreover, dynamic texture editing approaches are suitable to synthesize dynamic BTF textures. The example of results can be seen in section 9.6.
The figure 9.15 illustrate zoomed detail of a synthesized texture that consists of only one used patch where the time shift of Λ = 5 and Λ = 10 are used. Notice that extremely small value of (sec) is sufficient and safe from disturbing the global Φ values.
The temporal dimension editing is showed in Figure 9.15 and scheme of an interactive texture with multiple temporal jump in Figure 9.14. Note small zoomed detail that helps to recognize slightly different dynamics.
The proposed method was extensively tested on high-resolution dynamic textures from the DynTex database and our database of DTs. All used dynamic textures were color with resolution up to horizontal dimension 720 pixels and varying length (from seconds to minutes). Both natural and artificially added occlusions and holes were used. The analysis usually takes several minutes (see Table 9.5) and it strongly depends on the length and the self-similarity of a texture (due to the branch and bound method). The synthesis time is negligible, and it is only limited by memory operation and the video storing/coding operations. Because the inpainting time of our method consists of strictly separated analytical and synthesis parts, our method allows adjustment of the mask or patch usage (picking object to remove - i.e. chose which duck to remove in the DT Ducks). Presented inpainting approach can work with video texture as well, but dynamic texture must be present. Presented dissimilarity criterion was tested on numerous DTs with various, but usually satisfying results. Non-periodical color gradient, strong handshake or chaotic local dynamics might degrade the resulting inpainted DT quality but can be efficiently solved by a various known method.
The inpainting time of our method consists of strictly separated analytical and synthesis parts. Moreover, hence the patches are found in the analytical part and computation of the minimizing borders is negligible or can be skipped. Our solution is strongly resilient to the hole size or the number of holes - i.e., on DT Cherry the analysis part takes 37 minutes for one hole, 38 minutes for two holes, and DT Whitered where the analysis part takes 23 minutes for two holes and 26 minutes for one hole, respectively. Similarly, the time complexity of analysis of Water DT with one or tho holes are almost the same (see Table 9.5). The comparison or our non-optimized C++ implementation with the inpainting algorithm of Newson [?] can be checked on Table 9.5 (measured on a laptop with 2.3Ghz Intel i7 CPU and 8GB memory). Note that although we can inpaint much longer sequences, here due to the time complexity of the Newson method [?] we use much shorter ones. No other calculations than the analysis and synthesis time are required.
Note that the alternative Newson methods are published as Matlab code, so the time comparison should be interpreted with this on the mind. Even so, it is necessary to state that the presented method is (by taking advantage of toroidal approach) an order of magnitude faster. Emphasize, however, the principal drawback of combined synthesis and analysis in alternative approaches. The presented method does not need a significant additional amount of time with increasing output dimension resolution. Comparison to this efficient algorithm with typically inpainting approach can demonstrate the strengths of our approach - strong resilience to the error area size and the number of error regions (error concealment) in contrast to other approaches, i.e. [?].
The other state-of-the-art method like [?] and [?] typically takes at least hours, i.e. [?] process short video with 854 × 480 pixels, 90 frames and 18% missing took around 3 hours. DTs with a similar property like Thuja or Kafka took around minutes in our solution. The only time-consuming part of our approach is to find proper time jumps, but this can be extremely shorted by user input or setting desired loop length. Usually, our solution is extremely fast due to triple toroid-shaped approach while other approaches repeatedly computed borders and new patches inside the hole which is extremely time-consuming for larger videos and holes.
Long and short DT inpainting Typically, the overlay error reaches a minimum if the source DT contains similar or exactly the same data like with the same or different internal structure (δ_{r}^{*} contains only temporal shift). This type of DT we called long DT. Usually, inpainting of the long DT creates a visually much better result (for comparison see Figure 9.16). Note that even in the (very) short DT type our solution creates visually seamless results (see Figure9.17, detailed images of DT Walk) in comparison with alternative methods. Recall that all other input DTs are the short type.
Dissimilarity A simple but sufficient criterion for optical flow dissimilarity was presented. This criterion was shown to be sufficient for most DTs used. The ability to substitute segmentation of inpainted area, or using additional masks have been demonstrated. Due to the placement of area-to-inpaint inside the set of patches and thus no need of tight mask the presented measure is mostly sufficient.
Synthesis-based inpainting We presented DT inpainting not as an iterative greedy-like process but by synthesizing a sufficient large DT and insertion this newly synthesized DT to the inpainted area by the energy minimization criterion. The synthesizing process preserves all essential textural properties like local dynamics and both high and low frequencies. No additional computing or editing of the inpainted data are needed to reach visually satisfying results.
Video inpainting We demonstrated the ability of our method to work properly with video textures containing DTs. A rough mask indicating the source of dynamic textures is useful but not always necessary - information from the dynamic textures surrounding the repaired area is mostly sufficient to determine the correct sample through an iterative search for the optimal filling samples.
Psycho-Physical Validation The presented method results were tested on four groups of almost 150 participants in two test containing blind quality rating and informed pair evaluation. The result (Figure9.21 and Fig 9.20) has shown that the presented inpainting results dominated in most cases and their cases the results were of comparable quality.
Spatio-Temporal Criterion Validation The presented method results were evaluated by spatio-temporal Fourier transformation based criterion and compared to the results of three alternative approach. Proposed method dominates over most of the comparisons with the exception of two cases where the alternative method of Newson showed less lost data values (however, mention here a noticeably higher rate of false frequencies).
Structure preserving Our approach works well on various types of DTs but if the structure of the DT lack high homogeneity diminished visual quality can follow. If the input DT does not contain sufficiently similar (or another sample DT) structures, the presence of a spatial or temporal cut can be visually recognizable (e.g., Walk-short taken from 10 frames). In these particular cases, another DTs can be used as an input, or a similar patch must be adjusted (e.g., by color-tone modeling, manually editing, etc.). In the case that the structural problem is given by the temporal cut, it can be easily solved by reusing a time-consistent toroid sample and preserving the inpainted patches throughout all DT duration.
Frequency preserving We demonstrate an extremely good efficiency of our solution in the case of preserving both low and high frequencies from the original DTs. Many alternative methods have problems, due to random patch sizes, with preserving some frequencies - typically high on very homogeneous DT (Whitered or Cherry and Walk), and low on more video texture-like ones (like Kafka or Walk). DT Walk is a good example, where due to the inadequate patch size the inpainted alternative methods results can not synthesize larger structures (see Figure9.17, fifth row, middle column). The parameter of the patch size is in our solution driven by the most crucial frequency, and so our solution can preserve them. Different problems can be seen on the Kafka input where a patch based solution cannot recognize extremely strong and important periodicity (and to obtain satisfying result parameters need to be manually changed). Similar to the spatial frequency problem is a problem with temporal frequencies and optical flow as it can be seen on the inpainted Walk DT. The optical flow (moving of grass in the wind) of the scene is in our approach well preserved but can be insufficiently reproduced by patches with the wrong size.
The illustration of an ^{L}ϕ^{Y s,Y c} behavior and typical changes in the DTs FT is shown in Table 9.7 for original undamaged frames (taken at a different time) and inpainted frames. Note that false and lost frequencies behavior are method given, strongly similar throughout all frames, not only visualized ones.
The psycho-physical tests were performed as a double-blind test on four groups of participants (48, 48, 24, and 30 participants). Note that participants were students unfamiliar with results as well as presenters. As environments for test the schoolroom were used. In first two (more significant) test dynamic textures were presented by the high-resolution projector. For third and fourth test the Scalable Amplified Group Environment with the resolution of 9600x4320 was used. Whereas the results of both types did not show any observable variations, they are presented as one. All participants in every group and environment setting were subjected to the two designed test - blind and informed.
The psycho-physical test enviroment All tests were performed in Faculty of Information Technology background. For the first and second test of dynamic texture similarity criterion, the schoolroom with high-resolution was chosen. For the third and fourth test the background of SAGElab ^{1}. All dynamic textures were presented on the Scalable Amplified Group Environment with the resolution of 9600x4320.
The psycho-physical test property All performed psycho-physical tests were performed as a double-blind test thus all participants including presenters were unfamiliar with results. Of course, only authors provided / publicated results and codes were used. Four groups of participants were subjected to tests in the number of 48, 48, 24 and 31 participants. The used scale for the rating was used values from 0 to 5, where 5 denotes higher quality of presented DTs.
Blind psycho-physical test arrangement The first psycho-physical test arrangement consists of presenting one inpainted dynamic textures after another while participant should write down the presumed quality of a presented dynamic texture (except low-resolution caused artifacts ). Participants should rate the displayed dynamic texture by its visual quality and whether it contains or does not contain apparent interference, retouch or artifacts. As a control mechanism, the original dynamic textures (if exists) was added to test set.
Informed psycho-physical test arrangement Participants should rate the two displayed dynamic textures by its visual quality and whether it contains or does not contain apparent interference, retouch or artifacts. In the informed test the additional information was presented to participants: The original dynamic texture and mask (if available). For used arrangement see Figure 9.19. Thus participants could directly compare the inpainted texture with original and moreover compare more results at one time. In case of more than two methods result exist, all possible combination was used. The presented dynamic texture pairs were randomly placed on the left / right side and both participants and presenters were unfamiliar with the right order.
The human ratings (see Appendix B) of over 150 participants was used to validate the proposed criterion. The observers evaluated similarity using a rating between 0 to 5. Tab. C.1 shows the comparison between our criterion values and values obtained from testing. Since the difference may also have negative values, the goal is to minimize the absolute value. A positive value means an underestimation of the criterion due to values obtained from testing; a negative value means overestimation of the criterion.
The complete results of both psycho-physical tests are in Figure 9.20. Note that results are organized from left to right with declining predominance of our method. For comparing inpainting results the more comprehensive graph is shown in Figure 9.21. Emphasize that pairs labeled as R×O compared our inpainting results with original unedited data (taken from the slightly different time or without the artificial mask). Labels R denote proposed method, N denotes alternative method by Newson[?], X denotes Xie [?] method and label E denotes method by Ebdeli[?]. Label O denotes using of original dynamic texture. The values are grouped by used dynamic texture and mask thus DT Thuja has more results.
From the tests carried out, it can be clearly seen that the present method dominates in the vast majority of the comparisons performed. Very interesting is the apparent dominance when comparing more complex textures and the difference between the blind and the informed test. The distinction between the blind and informed test values is expected. The apparent inability to determine (without additional information) the texture edited area and therefore the original removed texture data if of course inpainting goal. Note also the low variance of the informed test values for our method and therefore the relatively consistent quality of our results.
It seems surprising that the values of the informed test in most cases exceeds blind solution values. This indicates high-quality results of all methods that stand up to the participants’ knowledge of the mask. Moreover, the knowledge of the mask and thus ability to focus at exact inpainted locations with a combination of great blind-informed quality jump suggest that used method is very effective. Usually, that means that participants cannot find an artifact in the blind test or anything that suggest what was inpainted. Note the great leap in H-Ivy, Q-Thuja or J-Kachny. The decreasing results C-Boat and T-Withered suggest that with the knowledge of mask, the participant were able to (or they thought so) identify artifacts / exact mask boundary.
Note that presented inpainting solution is considered as lower in both cases only in one Thuja version but with a difference of 0.07 in informed test and thus is considered as the same. Proposed solution is overcomed only in P-Thuja, D-Canoe and K-Water with difference of 0.36, 0.34 and 0.3. Interestingly, the difference between our inpainted solution and undamaged original (case A-Boat) is only 1,17. From that point of view, the difference in order of < 0, 0.5 > can be considered as extremely small. Moreover, this stressed the clear domination of proposed method in a major part of tested DT sequences.
Note that some other tests were performed on small groups (tens) of participants and different DTs and arrangements.
The behavior of a proposed similarity criterion with varying w_{f} is showed in Table 9.8. It can be clearly seen that with the decreasing w_{f} and thus with a smaller number of used frequencies (which must be subsequently stronger in the spatial frequency magnitude) the value ratio is consistent. Thus for test included in this article the value of w_{f} is set to 3.
The overview of all three L methods and comparison of the corresponding ^{L}ϕ^{Y s,Y c} with all used frequencies and the crucial only (w_{f} is set to 3) is showed in Table 9.9 as a difference between proposed and alternative methods criterion value. The values for DTs Boats, Walk and Daysies is shown. Used resolution is 720 × 576 for the first DT and 1920 × 1080 for the second and third one.
The overall comparison of the spatio-temporal Fourier transformation criterion values is shown in Table 9.10 as the difference between alternative method values and our method values comparison for DTs Boats, Canoe, Grass, Ivy and Walk. The values of Our^{L}ϕ^{Y s,Y c} - Alt^{L}ϕ^{Y s,Y c} are thus maximal and positive if presented method result is rated as more similar and negative if the alternative method result is rated as better (more similar). Note that only in two cases - Grass_{lost} and Ivy_{lost} the alternative methods are rated as better and in case of Boat-Ebdeli_{lost}, Canoe-Newson_{lost} and Walk-Newson_{lost} the results are very similar.
The time complexity comparison of our method with a very effective approach of Newson[?] can be seen in Table 9.5, the quality comparison in Figure 9.17, and in the accompanying video. We chose the Newson’s method over newer one as [?] as a new and very good inpainting algorithm with open Matlab code, existing comparison with another inpainting method[?] with known strengths and limitations, and as typical patch based inpainting approach.
Experimental Dynamic Textures Subset
The proposed similarity criterion was tested on our dynamic texture database and the DynTex[?] database. To determine the texture class, the DynTex class labels (SEA, FLOWERS, FOLIAGE, etc.) were used also for our own database.
Our DTs have a noticeably higher resolution (including Full HD and higher) and time duration in the order of tens of seconds or minutes. Figure 9.22 illustrates representative GRASS and FOLIAGE Full HD DTs with all to all computed similarity criterion (only a representative part is shown for explanation purposes).
The dataset consists of visually similar textures, yet with different dynamics. The database contains textures with pronounced (strong wind, rain) and low dynamics as well as textures with similar dynamics (directional wind) but very different structure.
Due to the high variability of dynamics and structure, while maintaining a similar class, this is an interesting and representative dataset allowing useful comparison and demonstration of strong and weak characteristics of the criterion. The criterion was also tested on other datasets with similar results. The furthermore criterion values can be found n the next chapter.
The submitted dataset has been subjected to detailed psycho-physical testing with more than 150 users who evaluated the likeness of individual DTs. This user evaluation (Tab. B.1) is therefore used for the proposed criterion validation.
These results show both pairs rated as extremely similar (21-14), and couples that have been rated as highly different (11-12, 10-12). For example, DTs 23-22 and 02-12 have similar structures and similar dynamics but very different color, but these couples were evaluated as relatively similar.
The resulting synthesis of an approximative DBTF shows that the DBTF as a medium is possible and valid. The incredibly complex natural object consisting of a high number of self-occluding and self-shadowing could be very problematic to model in a classical way.
Of course, proposed DBTF and DVBTF model are topics, which will be in focus in further works.
The presented result shows Grass texture in Figure 9.23 composed of many samples under a varying illuminance. Note that only one patch is used and thus temporal shift editing method is applied to suppress the dynamics repetition. Larger arrangement is showed in Figure 9.24 with stronger time shift (δ = 7).
More complex results which consist of the samples acquired under different viewing angles can be seen in Figure 9.25. The measured object (plant), has a strongly varying property under different view directions (top three row, all consist of a one spatial and one temporal dimension enlarged texture). The result is then composed together to one resulting texture which covers the desired object texture from θ_{v} = 90to10.
The presented results (Figure 9.23) show Grass texture composed of many samples under a different illuminance. Note that only one patch is used and thus again editing method is applied to suppress the dynamics repetition.
The new approach for fast 4D multispectral dynamic textures modeling was presented. The proposed dynamic texture model is based on a triple toroid-shaped tile, has separated analysis and synthesis and is able to create an infinitely long dynamic texture in both spatial and temporal domain. Therefore the synthesis can be done very quickly and effectively. The synthesized high-resolution examples illustrate good performance of a proposed approach for many types of dynamic textures. Our approach can edit temporal and spatial properties of DT and can easily create a mix of dynamic textures.
A set of specific synthesis algorithm enhancements that increase the limited number of results and, above all, to enhance their visual quality was proposed. The editing ability and creating of mix-of-DTs was subsequently applied to the synthesis of approximative dynamic BTF textures.
The proposed dynamic texture model ability to inpainting problem was demonstrated. The model utilizing allows very efficient filling of an error area with maintaining the visual plausibility. Moreover, due to toroidal principle, the process takes several minutes instead of hours, days, or more. The inpainted examples illustrate good performance of our approach for many types of DTs. We have demonstrated our solution on several DT inpainting problems and validated its efficiency through performed psycho-physical tests.
A novel entirely spectral dynamic textural similarity criterion was presented and successfully validated with several state-of-the-art criteria, which were straightforwardly extended to temporal domain. Due to the absence of reliable DT benchmarking extensive psycho-physical measurements of dynamic texture similarities with more than 150 participants was performed. The presented criterion significantly corresponds with similarity values obtained from psycho-physical tests. The validation of presented criterion has clearly shown its discriminatory and sufficiently diversifying capabilities within a given dynamic texture class.
A sufficiently general and descriptive model that offers a more in-depth insight into dynamic in the dynamic texture, its regularity and many others properties of a dynamic texture was proposed. The creation (synthesis) of the new dynamic texture which is visually similar to the original one is the fundamental contribution of this dissertation thesis (see Chapter 3).
Spatial Domain For only spatial domain enlargement, the double toroidal-shaped patch approach was presented. The sufficiently representative toroidal-shaped patches allowed extremely fast yet effective synthesis with the possibility to handle various types of DT.
Temporal Hence the perceptual property between visual and temporal dimension differs, the temporal domain must be handled differently. The presented model temporal synthesis utilize spatial dimensions synthesis and enhance it with an ability to adapt to the dynamics of an every particular DT.
Pure synthesis approach is useful but not satisfactory enough in many specific cases. The editing of patch arrangements, suppression of visual disturbances, editing of the dynamics or specific color balancing that increases the amount of synthesis results and field of uses was demonstrated (see Chapter 4).
Interactive texture Due to the presented general temporal synthesis approach, the transition between more DT types can be created. Synthesized results have varying visual and dynamics circumstances like illumination or weather conditions. The transition order can be due to separated analysis and synthesis easily controlled.
Mix-of-DTs The general video data usually contains more dynamic textures. Presented approach can analyze mix-of-textures and subsequently synthesize great amount of specific arrangements containing many different dynamic texture types.
Temporal editing One of the intelligent sampling method disadvantages is the strictly limited amount of source data patch which can lead to visually repetitive effect. The slight temporal changes of the dynamics and structural factors which can distinctly reduce this undesirable behaviour was shown.
Object removal Assuming that unwanted object distorts surrounding area, which is supposed to have dynamic texture property, the need of some exact mask can be, in some cases, bypassed. Distorting parts in given dynamic texture can be roughly marked automatically by proposed dissimilarity measure based on optical flow and color information.
Multidimensional Frequency Similarity Criterion A novel dynamic texture criterion based on the Fourier transformation and properties of spatiotemporal frequencies was presented. The proposed criterion allows for a qualitative comparison of both synthesis and inpainting results. Presented criterion results correlate well with a performed psycho-physical test (see Chapter 8).
Synthesis validation The multidimensional frequency similarity criterion has made it possible to detect and measure crucial frequencies in the temporal and spectral domain and thus be used to synthesis results quality verification.
Inpainting validation A typical inpainting problem is the neglect of fundamental spatial frequencies in the newly created data patch or the creation of frequencies that are not to be found in a given source dynamic texture. The proposed criterion has made it possible to measure such frequencies and thus compared two inpainting results in quality in this way.
The author of the dissertation thesis suggests to explore the following:
________________________________________________________________________________________
________________________________________________________________________________________
[A.1] Richtr, R. and Haindl, M. Dynamic Texture Similarity Criterion, In the 24^{th} International Conference on Pattern Recognition (ICPR). IEEE, August 2018.
[A.2] Haindl, M. and Richtr, R. Dynamic Texture Enlargement, In Proceedings of the 29^{th} Spring Conference on Computer Graphics. SCCG ’13, pages 5-12, New York, NY, USA, 2013. ACM.
The paper has been cited in:
[A.3] Richtr, R. and Haindl, M. Dynamic Texture Editing, In Proceedings of the 31^{th} Spring Conference on Computer Graphics. SCCG ’15, pages 133-140, New York, NY, USA, 2015. ACM.
Journal papers under review:
[A.4] Richtr, R. and Haindl, M. Dynamic Texture Inpainting, In: IEEE Transactions on Image Processing.
________________________________________________________________________________________
[A.5] Radek Richtr Dynamic Textures Modeling. Ph.D. Minimum Thesis, Faculty of Information Technology, Prague, Czech Republic, 2013.
[A.6] Richtr, R. and Haindl M. Dynamic Texture Editing. Poster at 7th International Workshop Data - Algorithms - Decision Making, Mariánská, Czech Republic, 2011.