Posted in research
Throughout this semester, I have stressed the importance of the skill of converting a heuristic observation into a principled methodology. In this post, I would like to summarize two examples we have seen in image processing:
1. Heuristic: the importance of edge orientation. It is easy to observe that an ideal filter should smooth along the edge orientation but not across it (to avoid blurring the edges). How do we convert such a heuristic into a principle? There are at least two ways: covariance-based adaptation and geometric PDEs. The former is the autoregressive (AR) model we have covered, where edge orientation information is embedded in the local covariance and the basic mathematical tool is least-squares estimation; the latter is the anisotropic diffusion model developed by Weickert (a good reference book is “Mathematical Problems in Image Processing” by Aubert and Kornprobst), where edge orientation information is carried by the local gradients. As far as I know, the connection between these two approaches has not been explored, and it might lead to some interesting results (just as Perona-Malik diffusion admits a robust statistical interpretation).
2. Heuristic: the importance of edge preservation. It is easy to see the need for a filtering algorithm with edge-preserving capability. But how do we achieve this goal? We have shown how wavelet thresholding works: since edges correspond to significant coefficients (singularities) in the wavelet domain, nonlinear thresholding is an effective strategy. We have also shown how Perona-Malik diffusion works: the edge-stopping function makes the diffusion nonlinear, so smoothing is suppressed where the gradient is large. Total-variation diffusion also has this capability because of the l1 norm it adopts. So you see that many different tools in fact implement the same heuristic idea.
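To make item 1 concrete, here is a minimal Python sketch of covariance-based (least-squares) adaptation. It assumes a simple four-neighbour linear predictor rather than the exact AR model from class; the function name `ls_neighbor_weights`, the synthetic vertical-edge image, and the training window are all hypothetical. On a window straddling a vertical edge, the least-squares weights concentrate on the north/south neighbours (along the edge) and nearly vanish for the west/east ones (across it), so the edge orientation emerges from the data without being specified explicitly.

```python
import numpy as np

def ls_neighbor_weights(img, rows, cols):
    """Least-squares fit of x(i,j) ~ aN*x(i-1,j) + aS*x(i+1,j) + aW*x(i,j-1) + aE*x(i,j+1)
    over every pixel (i, j) in the training window rows x cols."""
    A, b = [], []
    for i in rows:
        for j in cols:
            A.append([img[i - 1, j], img[i + 1, j], img[i, j - 1], img[i, j + 1]])
            b.append(img[i, j])
    coeffs, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return dict(zip(["N", "S", "W", "E"], coeffs))

# Hypothetical test image: vertical edge (0 on the left, 1 on the right) plus mild noise.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[:, 16:] = 1.0
img += 0.01 * rng.standard_normal(img.shape)

# Training window straddling the edge (columns 13..18).
w = ls_neighbor_weights(img, rows=range(1, 31), cols=range(13, 19))
along = w["N"] + w["S"]           # weight placed along the edge
across = abs(w["W"]) + abs(w["E"])  # weight placed across the edge
```

Only the sum of the N and S weights is well determined here, since those two neighbours are nearly identical along a vertical edge; that sum is what carries the orientation information.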
So here is my tip for going from heuristic to principle: come up with a mathematical formulation of your heuristic and dig into the mathematical literature for solutions. The use of mathematics separates science from art.
Posted in Uncategorized
For a long time, I felt that I did not understand the problem of pattern recognition (PR), for the following reasons: how do we define a pattern, and what do we mean by recognition? Intuitively, you can brush away these questions by saying that a pattern is just some feature we can extract from the given data and recognition is to assign it a label. I have no objection to these engineering (application-driven) interpretations, but what bothers me is whether PR has a scientific meaning. In other words, where do I place PR in the hierarchy of scientific disciplines? It appears to me that PR is closely related to machine learning, but unlike learning (which addresses how to improve a machine system or how to reverse-engineer the brain), PR seems to be an ill-posed problem because it is an abstraction of many engineering problems of similar flavor. When you dig deeper and try to gain a fundamental understanding, you find nothing, because the deeper issues all involve the two questions I asked above and PR cannot answer them alone (how are we supposed to solve a chicken-and-egg problem?). After much struggle, I concluded that it is better for me to decouple PR into pattern theory and recognition applications. Pattern theory sides with learning theory (in fact, they often talk about the same hypotheses), while recognition applications are a dreadful business where you might beat the very best out there without knowing how you did it (like the outcome of the Netflix Prize).
Posted in understanding
In many image processing applications, human eyes are the ultimate judge of the quality of reconstructed images (yes, MSE-based metrics are also used, but they often correlate more with fidelity than with quality). It is almost taken for granted that the objective of image processing is to turn bad-looking images into good-looking ones (e.g., to remove noise, blur, or artifacts, or to improve resolution and contrast). However, there are several scenarios where we should be careful about propagating such a belief:
1. Unfair comparison. Take super-resolution (SR) as an example. Suppose you have obtained two SR images: one linearly interpolated (model-based) and the other a sparsity-regularized reconstruction (learning-based or data-driven). The latter could look much better than the former, and this is indeed a great engineering achievement. But scientifically, what conclusion can we draw? None, because the learning-based method has used information (training data) that is inaccessible to the model-based one. This example also illustrates the difference between hypothesis-driven and result-driven research: seeing is not believing when you do not start from a legitimate hypothesis (note that “training data helps SR” is a trivial hypothesis that must be true by default).
2. Small samples. Take image denoising as an example. Suppose Algorithm A outperforms Algorithm B on all eight test images (both visually and in terms of PSNR). Yes, Algorithm A is a great technical achievement. But what scientific conclusion can we draw? Can we claim that A is better than B for all natural images? Apparently not. The moral of this example is that even if A outperforms B on a small sample, the result cannot be generalized without caution. We can only say that, based on the results we have obtained, A seems to fit a certain class of images better than B.
3. Uncontrollable distortion. This scenario might sound ad hoc, but it is misleading (to careless people). Take high-dynamic-range (HDR) image reconstruction as an example. How do we compare different algorithms? Viewing the results on a common LCD monitor is meaningless because almost all display devices these days are low-dynamic-range (LDR); you would need an HDR display device in this case. A more common distortion occurs during electronic publishing due to resizing, compression, or halftoning. Therefore, viewing images in printed journals might leave a different impression from viewing them on LCD monitors.
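For reference, here is a minimal sketch of the MSE-based metric mentioned above, PSNR = 10 log10(peak^2 / MSE), assuming 8-bit images (peak = 255); the function name `psnr` is just for illustration. Casting to floating point before subtracting matters: subtracting two `uint8` arrays directly wraps around and silently corrupts the MSE.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; returns inf for identical images."""
    # Cast first so that differences of uint8 images do not wrap around modulo 256.
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```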
Posted in ee565
The following animation shows an intriguing property of motion perception: there exist two attractors (clockwise and anticlockwise) in this dynamic system.

Folk wisdom says that whether you see it rotating clockwise or anticlockwise determines whether your left brain or right brain is dominant. What is more interesting to me is how we can have two stable interpretations of the same phenomenon. It clearly shows the existence of multistability in the cognitive system, which Gestalt psychologists advocated a long time ago. But the underlying machinery remains elusive: no neuroscientist can yet explain how a network of neurons achieves such multistability.
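As a toy illustration of how a network can hold two stable states, here is a minimal Hopfield network sketch with Hebbian weights, zero thresholds, and synchronous sign updates. The stored pattern is an arbitrary hypothetical example, and this is not a model of the actual perceptual circuit. Because the energy -1/2 s^T W s is unchanged under s -> -s, storing a pattern x automatically makes -x a fixed point too, and a slightly corrupted input falls into whichever of the two attractors is nearer.

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian weights W = sum_p x_p x_p^T / N with the diagonal set to zero."""
    X = np.array(patterns, dtype=float)
    W = X.T @ X / X.shape[1]
    np.fill_diagonal(W, 0.0)
    return W

def run(W, state, steps=10):
    """Synchronous updates s <- sign(W s) with zero thresholds (a zero field maps to +1)."""
    s = np.array(state, dtype=float)
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1.0, -1.0)
    return s

x = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=float)  # hypothetical stored pattern
W = hopfield_weights([x])
```

Calling `run(W, x)` and `run(W, -x)` returns each state unchanged, so the network has (at least) two attractors even though only one pattern was stored.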


One useful clue is the existence of multiple attractors in the Hopfield network: for example, with zero thresholds, if x is an attractor, so is -x. The other important clue is the wagon-wheel effect (shown above), which can be viewed as a simplified multistable system. This time, the changing speed of the green waves makes it easier to appreciate the phase transition, i.e., the sudden switch from one direction to the opposite. Careful inspection shows that when the speed is slow, there is no uncertainty in our visual perception; as the speed increases (in either direction), it becomes harder and harder for our eyes to keep up with the apparent motion. Interestingly, the extreme cases of the two opposite directions are indistinguishable, which causes the phase transition. The only sensible explanation for this confusion is the limited temporal sampling resolution of the human visual system (HVS), for the same reason that we can’t see a hummingbird’s flapping wings: they are far too fast.
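The sampling explanation can be made concrete with a minimal sketch, assuming (as a simplification) that the visual system samples the scene at a fixed frame rate and always picks the smallest rotation consistent with consecutive frames; the function name and the example speeds are hypothetical. A wheel with n identical spokes looks the same after a rotation of 360/n degrees, so the perceived step per frame is the true step wrapped into a window of that width.

```python
def apparent_step(true_step_deg, n_spokes=1):
    """Per-frame rotation perceived when a wheel with n identical spokes is sampled once per frame.

    Rotations that differ by a multiple of 360/n_spokes degrees yield identical frames, so we
    assume the observer picks the smallest consistent step, i.e. the true step wrapped into
    [-period/2, period/2).
    """
    period = 360.0 / n_spokes
    return (true_step_deg + period / 2) % period - period / 2

print(apparent_step(10))    # 10.0: slow forward rotation is seen correctly
print(apparent_step(350))   # -10.0: almost a full turn per frame looks like a slow backward turn
```

Near half a period per frame, a tiny change in speed flips the perceived direction, and steps of exactly +180 and -180 degrees (for a single-spoke wheel) produce identical frame sequences, which matches the sudden reversal described above.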
Posted in digital video processing
This week, we discussed wavelet thresholding, an extremely simple operation that has proven effective in image denoising. Given that thresholding is among the simplest nonlinear operators, there is a lot we can say about the role of nonlinearity in image processing.
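To make the operation concrete, here is a minimal sketch of wavelet-thresholding denoising, assuming an orthonormal 1-D Haar transform with three levels and soft thresholding of every detail band; the threshold value and the piecewise-constant test signal are illustrative assumptions, not the settings used in class. Note how little machinery is involved: the only nonlinear step is `soft_threshold`.

```python
import numpy as np

def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of haar_forward."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def soft_threshold(c, lam):
    """Shrink coefficients toward zero by lam; those with |c| <= lam become exactly 0."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def denoise(y, lam, levels=3):
    """Haar-decompose y, soft-threshold every detail band, then reconstruct.
    len(y) must be divisible by 2**levels."""
    a = np.asarray(y, dtype=float)
    details = []
    for _ in range(levels):
        a, d = haar_forward(a)
        details.append(soft_threshold(d, lam))
    for d in reversed(details):
        a = haar_inverse(a, d)
    return a

# Hypothetical test signal: a box (two edges) plus white Gaussian noise.
rng = np.random.default_rng(1)
clean = np.zeros(256)
clean[100:200] = 1.0
noisy = clean + 0.1 * rng.standard_normal(256)
restored = denoise(noisy, lam=0.33)
```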
What is wrong with linear models? A linear combination of jointly Gaussian variables is still Gaussian, and for Gaussian signals the linear MMSE estimator is optimal. We have seen so many nice results by constraining ourselves to the linear regime; why do we want to get out? The scientific reason is that nature does not obey linear laws. As I mentioned in class, heavy-tailed distributions (see also the Pareto distribution) and the 80-20 rule have been widely observed in many complex systems in the natural and social sciences. The fundamental flaw of linear laws is that they are too “simple” to account for the complexity of natural or social systems. Starting from chaos (e.g., white Gaussian noise), linear filtering alone is not enough to create any meaningful order. You might think of the IIR filter (AR model) we used for texture synthesis: it takes random noise as input and outputs synthetic patterns, and the operation appears to be linear (inverse filtering). Those images contain some order, but no more than a linear superposition of sines and cosines (e.g., the linear Wiener filtering result we showed in class). Even though linear superpositions of wavelet bases exhibit much more order than those of Fourier bases, I personally think they are still far from the truth (one argument I have is that Hilbert space is too small to account for such complexity).
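A quick numerical check of the claim that linear filtering of Gaussian noise stays Gaussian: the minimal sketch below drives an AR(1) recursion (a simple IIR synthesis filter, standing in for the AR texture model) with white Gaussian noise and measures the excess kurtosis, which is 0 for a Gaussian. For contrast, it also applies a pointwise cubic nonlinearity to the same noise. The coefficient 0.9, the sample size, and the choice of a cube are arbitrary assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (equals 0 for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0)

rng = np.random.default_rng(0)
w = rng.standard_normal(200_000)  # white Gaussian noise ("chaos")

# AR(1) synthesis y[k] = a * y[k-1] + w[k]: a linear IIR filter driven by noise.
a = 0.9
y = np.empty_like(w)
y[0] = w[0]
for k in range(1, len(w)):
    y[k] = a * y[k - 1] + w[k]

linear_out = excess_kurtosis(y)          # close to 0: correlated, but still Gaussian
nonlinear_out = excess_kurtosis(w ** 3)  # large: strongly heavy-tailed
```

The AR(1) output has an excess kurtosis close to 0, whereas the cubed noise is strongly heavy-tailed; even a trivial pointwise nonlinearity changes the character of the distribution in a way no linear filter can.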
Then how do we come up with nonlinear models? In addition to thresholding, we have polynomial, trigonometric, exponential, and hyperbolic functions… Which one should we choose? Facing this jungle of nonlinear functions, one can’t help thinking that if nature is governed by some universal principle, it must be simple and elegant. All these names and equations were created by humans, not to reveal the essence of nature but to facilitate communication between people (my cliché: mathematics is a language that supports logical reasoning). The fundamental law of nonlinearity must be both simple (to describe mathematically) and complex (capable of accounting for the complexity in nature). Where is this law? I am looking for it, and I welcome you to join my search.
Posted in ee565
In class, I mentioned that translation invariance (TI) is a difficult concept in image processing. To understand TI, you need to understand down-sampling first; to understand down-sampling, we need to talk about the sampling theorem, i.e., analog-to-digital (AD) conversion. The Nyquist-Shannon sampling theorem states the condition for perfect reconstruction of band-limited signals from their discrete samples. The interpolation formula is based on the sinc function (the inverse FT of an ideal low-pass filter). Where does TI come from? DSP textbooks tell you the story after AD conversion (i.e., how aliasing is introduced by down-sampling). But the full story starts in continuous space: given a pair of AD and DA converters, f(t)->f(n)->\hat{f}(t), how can we ensure that the system is TI?
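The Whittaker-Shannon interpolation formula behind the sampling theorem, f(t) = \sum_n f(nT) sinc((t-nT)/T), can be checked numerically. The minimal sketch below assumes T = 1 and the test signal f(t) = sinc(t/2), which is band-limited to 0.25 cycles per unit, below the Nyquist limit of 0.5; the infinite sum is truncated to |n| <= 2000, so the reconstruction is accurate only up to a small truncation error. NumPy's `np.sinc` is the normalized sinc, sin(pi x)/(pi x).

```python
import numpy as np

def sinc_interpolate(samples, n, t, T=1.0):
    """Whittaker-Shannon interpolation: sum over n of f(nT) * sinc((t - nT) / T)."""
    return float(np.sum(samples * np.sinc((t - n * T) / T)))

def f(t):
    # Band-limited test signal: its spectrum is confined to |frequency| <= 0.25.
    return np.sinc(0.5 * t)

T = 1.0
n = np.arange(-2000, 2001)  # truncation of the infinite sum (an assumption)
samples = f(n * T)
t0 = 0.3                    # a point between two samples
approx = sinc_interpolate(samples, n, t0, T)
exact = float(f(t0))
```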
Unless f(t) is band-limited, the issue of TI is more subtle than you might think. Consider f(t) to be a speech signal and f(t-s) its shifted version, where s is an arbitrary real number. Common sense tells us that the human auditory system (HAS) does not distinguish between f(t) and f(t-s), so it is in some sense invariant to translation. But how does the HAS achieve this? Imagine you want to build an engineering system (AD+DA) to do the same thing. Suppose a uniform sampling strategy with sampling period T is used, and let us look at the maximum (or minimum, whichever you like) of f(n) and \hat{f}(t). If the interpolation filter h(n) has nonnegative taps with \sum h(n)=1 (linear interpolation is one example), every reconstructed value is a convex combination of samples, so the filter is non-expansive and max[\hat{f}(t)]<=max[f(n)]. Now compare the maxima of the sampled versions of f(t) and f(t-s): we should be able to make them differ by varying s, right? Then, for any sampling period T>0, the maxima of the reconstructions of f(t) and f(t-s) cannot in general be the same. But if the reconstruction were perfect, we would end up with max[f(t)] \neq max[f(t-s)], a contradiction, since a shift cannot change the maximum. The fundamental flaw in this attempt to achieve TI is linearity. I can’t rigorously prove that TI is unachievable with uniform sampling and linear interpolation, but nature seems to go the other way around (nonuniform sampling and nonlinear interpolation).
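The argument about maxima can be illustrated with a minimal sketch, assuming a Gaussian pulse as f(t) (it is not band-limited), unit sampling period T = 1, and linear interpolation via `np.interp` as the DA converter; the helper name `sampled_max` is hypothetical. When a sample lands on the peak, the reconstructed maximum is 1; after a half-sample shift, both the sampled and the reconstructed maxima drop to exp(-0.5), about 0.607, even though the true maximum of f(t - s) is still 1.

```python
import numpy as np

def sampled_max(shift, T=1.0, sigma=0.5):
    """Sample f(t - shift), with f a Gaussian pulse of width sigma, at t = nT; reconstruct by
    linear interpolation; return (max of samples, max of reconstruction)."""
    n = np.arange(-10, 11)
    samples = np.exp(-((n * T - shift) ** 2) / (2 * sigma ** 2))
    t = np.linspace(n[0] * T, n[-1] * T, 20001)  # fine grid standing in for continuous time
    f_hat = np.interp(t, n * T, samples)         # nonnegative weights that sum to 1
    return float(samples.max()), float(f_hat.max())

print(sampled_max(0.0))  # a sample hits the peak
print(sampled_max(0.5))  # the peak falls halfway between two samples
```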
Posted in ee565
Convexity is a concept you don’t usually see in image processing textbooks. It is a somewhat advanced mathematical tool for engineering students. In the mathematical literature, you can refer to Rockafellar’s “Convex Analysis” (comprehensive and deep) and Boyd and Vandenberghe’s “Convex Optimization” (available online at http://www.stanford.edu/~boyd/cvxbook/). In the signal processing literature, Youla’s 1978 paper was likely the first to introduce convex projection into image restoration, and Combettes’ 1993 review article in the Proceedings of the IEEE is very readable for engineering students. Projection-based methods have found several successful applications in image processing, such as deblurring, inpainting, and post-processing. Many other methods, such as wavelet thresholding and total-variation (TV) diffusion, can also be interpreted as projections onto convex sets.
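Here is a minimal projection-onto-convex-sets (POCS) sketch in the spirit of the projection-based methods above. It assumes a toy 1-D recovery problem rather than any specific method from the cited papers: a low-pass signal with every fourth sample missing is recovered by alternating between two convex sets, the band-limited signals (a subspace) and the signals that agree with the observed samples (an affine set). The signal, the band, and the missing-sample pattern are hypothetical choices; because the two sets intersect only at the true signal here, the iterates converge to it.

```python
import numpy as np

N = 64
n = np.arange(N)
# Hypothetical ground truth: a low-pass signal occupying DFT bins |k| <= 3 only.
x_true = 1.0 + np.cos(2 * np.pi * n / N) + 0.5 * np.sin(2 * np.pi * 3 * n / N)

known = np.ones(N, dtype=bool)
known[::4] = False                       # every 4th sample is missing
observed = np.where(known, x_true, 0.0)  # missing samples start at 0

band = np.zeros(N, dtype=bool)
band[:4] = True   # bins 0, 1, 2, 3
band[-3:] = True  # bins -3, -2, -1

def project_bandlimited(x):
    """Orthogonal projection onto the subspace of signals supported on `band`."""
    X = np.fft.fft(x)
    X[~band] = 0.0
    return np.real(np.fft.ifft(X))

def project_data(x):
    """Projection onto the affine set of signals that match the observed samples."""
    y = x.copy()
    y[known] = observed[known]
    return y

def pocs(n_iter):
    x = observed.copy()
    for _ in range(n_iter):
        x = project_data(project_bandlimited(x))
    return x
```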
However, this post is more about non-convexity than about convexity. Why do we care about non-convexity? As I mentioned in class, the collection of all photographic images of the same size does not form a convex set, because for a pair of images x and y, ax+(1-a)y (with 0<a<1) does not in general produce another meaningful image (the deeper explanation is the lack of a superposition principle in the physical world, i.e., two objects cannot occupy the same location in space). Geometrically, it is useful to think of the collection of images as a manifold that is locally isomorphic to a lower-dimensional Euclidean space. How to exploit such a nonconvex manifold constraint has been a grand challenge for the IP community for many years. Recent advances (including my own work) have shown that deterministic annealing (also called graduated non-convexity) is an effective technique for nonconvex optimization. Please refer to the book “Visual Reconstruction” by A. Blake and A. Zisserman for more details (it is available online, and my VIP library should contain a link).
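The continuation idea behind graduated non-convexity can be sketched on a 1-D toy problem. The minimal example below is an assumption-laden illustration, not Blake and Zisserman's exact construction: the objective f(x) = 0.1x^2 + sin(3x) is hypothetical, and the family of surrogates is obtained by Gaussian smoothing of f, which for this f has the closed form 0.1(x^2 + sigma^2) + exp(-4.5 sigma^2) sin(3x). For large sigma the surrogate is convex; gradient descent is run while sigma is gradually reduced to 0, warm-starting each stage from the previous solution.

```python
import numpy as np

def f(x):
    """Hypothetical nonconvex objective with many local minima."""
    return 0.1 * x ** 2 + np.sin(3 * x)

def grad_smoothed(x, sigma):
    """Gradient of f convolved with a Gaussian of std sigma:
    f_sigma(x) = 0.1 * (x^2 + sigma^2) + exp(-4.5 * sigma^2) * sin(3x)."""
    return 0.2 * x + 3.0 * np.exp(-4.5 * sigma ** 2) * np.cos(3 * x)

def descend(x, sigma, steps=200, lr=0.05):
    """Plain gradient descent on the smoothed objective f_sigma."""
    for _ in range(steps):
        x = x - lr * grad_smoothed(x, sigma)
    return x

x0 = 4.0
x_plain = descend(x0, sigma=0.0)  # gradient descent on f itself

x_gnc = x0
for sigma in np.linspace(2.0, 0.0, 21):  # from a convex surrogate back to f
    x_gnc = descend(x_gnc, sigma)
```

Started from x = 4, plain gradient descent stops in a nearby local minimum (around x = 3.6), while the graduated version ends near the global minimum (around x = -0.51).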
Posted in ee565
In this week’s computer assignment, many of you ran into an obstacle: it turns out that only a few students with a biometrics background knew how to calculate the Receiver Operating Characteristic (ROC) curve. You might feel disappointed, since I never even mentioned the ROC in class: “How am I supposed to work this out? It was not covered at all!”
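Since the ROC was not covered in class, here is a minimal pure-Python sketch of the standard construction: sort the scores, sweep the decision threshold from high to low, record the false positive rate and true positive rate at each distinct score, and integrate with the trapezoidal rule to get the area under the curve (AUC). The function names and the convention that label 1 means genuine (positive) are assumptions; scikit-learn's `sklearn.metrics.roc_curve` provides an equivalent routine.

```python
def roc_curve(scores, labels):
    """Return lists (fpr, tpr) obtained by sweeping a threshold over the scores.

    labels: 1 for a genuine (positive) sample, 0 for an impostor (negative) sample.
    A sample is declared positive when its score >= threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Consume all samples tied at this threshold together.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr)))
```

For scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` with labels `[1, 1, 0, 1, 0, 0]`, the AUC is 8/9: of the nine (genuine, impostor) pairs, only one (0.6 versus 0.7) is ranked the wrong way.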
Yes, I admit my responsibility. However, as you may have sensed from the flavor of this class, there are many things I intentionally leave out and expect you to learn on your own. I take this approach because I think it is an important step in research and development: you are not always given all you need to solve a problem. When there is a concept you have never heard of, just google it and learn it; when there is a tool you are not familiar with (e.g., MEX, LaTeX, Perl …), just grab it and play with it until you master it. Since no class can cover ALL useful concepts and tools, why not leave some out and give you a chance to learn them yourself?
Learning a new concept or tool is never easy. A good tip is to evoke your curiosity: being a Curious George is a wonderful thing in learning. When we were small kids, we had great curiosity about the world, and that is why we could learn fast. What happens to our curiosity when we grow up? How come so many adults become used to their routines and are never willing to touch new things? I don’t know, and no one knows. But apparently, the more curious you are, the more you can learn and the better your chance to succeed, regardless of your profession: a simple fact that many people tend to forget.
Posted in ee565
This year’s Nobel Prize in Physics was unexpectedly awarded to three engineers: one is the “father of fiber optics” Charles Kao (a Shanghai native who served as Vice-Chancellor of CUHK), and the other two are the inventors of the CCD sensor. In the history of the Nobel Prize, a famous earlier case of engineers getting lucky was when two Bell Labs engineers accidentally discovered the cosmic microwave background radiation (CMBR) in 1964. The discovery of the CMBR might have been a matter of luck; the inventions of fiber optics and CCD sensors are the fruit of perseverance and ingenuity. Before Kao’s first success with fiber optics, transmitting information through glass fiber was thought to be a crazy idea. It took great courage and perseverance for him to find low-loss glass fibers and support the validity of his theory.
The invention of the CCD sensor is also quite a story. The original intention of Boyle and Smith was to design a better memory device (not a sensing device). The outcome, called the charge bubble device, would have been a failure for information storage applications (flash memory had to wait until much later: it was invented by Fujio Masuoka in 1980). However, the genius of Boyle and Smith was to make a connection with Einstein’s photoelectric effect and turn a poor memory device into a fabulous information acquisition device. This story clearly shows the “relativity theory” of engineering inventions: a mediocre idea can turn into a miracle as long as you keep an open and curious mind.
In short, it is good to see pioneering work in EE get recognized at the highest level: not only can deep theories change our view of the world, but great inventions and designs can also have a huge impact on our lives. I think this award is a blessing to all engineering students: engineers can also be well respected as long as their work makes an impact.
Posted in ee565