I love mathematics – because it is the only “language” in which humans can communicate with each other without the barrier of their native tongues; because it is as beautiful as art, and mastering a mathematical topic brings genuine joy (just like playing a musical instrument or painting a picture); because it is both pure (“simple enough but not simpler” – to a mathematician) and applied (“anything you can do, I can do it better” – to a math-trained mind). I can only look up to the greatest mathematical minds in human history and thank them for revealing so many elegant concepts and theorems to their descendants. I do believe in what Erdős called “proofs from the Book” – mathematical reasoning represents the highest level of intellectual gymnastics, and any humble mind cannot help wondering whether the creator knows a simpler proof. I am grateful to my own creator that I am talented enough to appreciate the beauty of math, even though at a level lower than that of the professional players, because …

I also hate mathematics. It is a language abused by some to disguise the truth; it is often used as “the emperor’s new clothes” (indeed it works even better, because a mathematically decorated paper can more easily fool less mathematically trained minds); it deals with mentally reproducible objects only – if its connection with the physical world is cut off, one inevitably gets trapped in delusion. Abusing mathematical language is like pouring paint onto a canvas and calling it creative art; the legendary physicist Richard Feynman warned us a long time ago that the law of gravity can be derived from at least three different mathematical formulations, all of them equivalent. So no mathematical topic or tool is more fundamental or important than another (e.g., algebra vs. geometry). I still detest a probability course I took during my graduate study, in which the instructor claimed that I was “completely confused by the advanced notation of probability theory”. To me, if the depth of a topic has to be conveyed by the mastery of some specific symbol system, it is no better than dried meat or fish whose nutrition is already gone.

So how do I love and hate mathematics at the same time? For every piece of mathematics (e.g., a definition, theorem, or algorithm), I first find out its history and understand its context. I firmly believe good math is a long-lived meme with a character that distinguishes it from bad math. A mathematician needs no apology as long as he/she recognizes that no mathematics is intrinsically “better” than another (e.g., advanced probability theory does not outrank elementary probability theory). Meanwhile, for every scientific topic (e.g., in physics, chemistry, biology, sociology, etc.), I look for the simplest mathematical model or language for communication. I keep in mind that “all models are wrong, but some are useful” (George Box). I will not be fooled by better experimentally reproducible results (often on a highly limited test data set) alone, but will try to understand their implications (i.e., the underlying models or hypotheses) in a mentally reproducible manner. I might never be a good mathematician or a good scientist, but I will be happy to be the king of the middle ground – maybe that is what a good engineer is about.

This blog is a follow-up to my previous post: https://masterxinli.wordpress.com/2009/09/01/the-relationship-of-mathematics-to-image-processing/. In that post I sounded a cautionary note about the abuse of mathematics in image processing. The tone there might have overshot a little, so I will “correct” myself in this post. As I have stated in class, mathematics “turns art into science” – so even though I maintain my position that math is only a tool, it is always a good idea to learn to use more and different tools. Here is a list of tools relevant to image processing:

1. Euler-Lagrange equation in variational calculus.

2. EM algorithm in statistics.

3. Eigenvalues and Laplacian of a graph in spectral graph theory.

4. Convex optimization in optimization theory.

5. Finite difference method in numerical analysis.

6. Fixed-point theorem in dynamical systems.

7. The law of large numbers (LLN) and heavy-tailed distributions in probability theory.

8. The first fundamental form of a surface in differential geometry.

9. Manifold in topology.

10. Singular value decomposition in matrix theory.
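As a small illustration of how one of these tools touches images directly, here is a minimal MATLAB sketch of low-rank image approximation via the singular value decomposition (item 10); the test image ‘cameraman.tif’ and the rank k = 20 are assumptions for illustration only.

% Minimal sketch: rank-k approximation of an image via SVD (item 10 above)
x = im2double(imread('cameraman.tif'));   % assumed grayscale test image
[U, S, V] = svd(x);
k = 20;                                   % assumed number of singular values to keep
xk = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % best rank-k approximation in the Frobenius norm
fprintf('relative error: %.3f\n', norm(x - xk, 'fro') / norm(x, 'fro'));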

In the first week’s lectures, I emphasized the difference between mentally reproducible and experimentally reproducible research (more or less: theory vs. experiment). Mathematical theories are mentally reproducible objects – you can understand them if you have the required background and think hard enough. From this perspective, mathematics is no different from art such as Van Gogh’s paintings – you can appreciate its beauty because you can mentally reconstruct the mathematical or artistic sensation from the objects presented to you. This is the simple reason why the Gaussian distribution or Da Vinci’s Mona Lisa can become cultural heritage passed on from generation to generation (a “meme”, in Richard Dawkins’ words in his book The Selfish Gene).

Engineering deals with experimentally reproducible objects – the making of a car, a smartphone, a telescope, etc. According to Wikipedia, the ultimate goal of engineering is to “safely realize improvements to the lives of people”. Therefore, it is not surprising that engineering inventions often have much shorter life expectancies than scientific theories or artistic products. We have seen cassettes replaced by CDs, film cameras replaced by digital ones, dial-up modems replaced by high-speed internet connections. It is Darwin’s evolutionary law applied to the technological world. If your hero is someone like Bill Gates or Steve Jobs, their greatness simply lies in their vision in creating the right products. Experimental reproducibility is a necessary (but not sufficient) condition for a product’s commercial viability.

What is good engineering? There are various aspects – sometimes being first is important (think of the invention of the telephone by Bell – that is why patents and intellectual property are valued by industry); sometimes you do not need to lead the race (think of the iPhone – as long as the design is good, you can still catch up). I emphasize the importance of experimental reproducibility to good engineering because it is still a sad fact that research reproducibility has not become the standard norm for all technical communities (please refer to the supplementary reading I have posted to the course website). So it has become difficult (especially for young minds entering the field) to tell real progress from bogus progress among the bulk of papers published every year.

Recently I have given some thought to the concept of memory. I started from Rose’s award-winning book The Making of Memory: From Molecules to Mind and spent quite some time understanding the phenomenon of hysteresis. Then I came across Smolin’s entertaining book The Life of the Cosmos and, as I read through it, the idea that “to understand a quark or an electron, we may have to know something about the history or the organization of the universe.”

If the above hypothesis is true, the environment influencing elementary particles will not be bounded by stars or galaxies but by objects at a much larger spatial scale. The underlying reason is that the universe of today is the evolutionary result of interactions among elementary particles over billions of years. The subtle relationship between space and time was not recognized by humans until the last century. It is very likely that our understanding of nature will undergo even more dramatic changes in the future – maybe in centuries, maybe in millennia.

Toward this improved understanding, one fundamental question seems to be: does the universe have memory? Here by memory I mean whether the physical laws/principles of today are the same as those of a long, long time ago. Of course, such a question cannot be answered scientifically at present, only philosophically. But it is at least aesthetically appealing to believe that the universe has followed one or a few universal principles. If that is not the case, how would the universe or God decide which principle to use at a given time, and what would make him change his mind tomorrow?

1. Forgetting to convert to double. I mentioned that if you use ‘imread’ to read in an image, it will be in uint8 format. This format is troublesome for calculations because it does not support floating-point operations (uint8 arithmetic saturates at 0 and 255). It is a good idea to convert images to double format before any calculation of quantities such as MSE (see the sketch after this list).

2. Pitfall with imwrite. If x is of class double, imwrite expects its values to lie in [0,1], so imwrite(x,…) on a double image with values in [0,255] does not produce the correct result. You need imwrite(x/255,…) instead.

3. imrotate: there are two directions – clockwise (negative angle) and anti-clockwise (positive angle).

4. hough: “[H, THETA, RHO] = HOUGH(BW) computes the SHT of the binary image BW. THETA (in degrees) and RHO are the arrays of rho and theta values over which the Hough transform matrix, H, was generated.” If you read this help information carefully, you will see that peak detection operates on the matrix H itself; THETA and RHO merely index its axes and are only needed afterwards to convert detected peaks back into line parameters.

5. Noise power calculation: given a clean image x and a noisy image y (assuming impulse noise), the noise power is the percentage of pixels where x~=y (not x==y).
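A minimal MATLAB sketch tying tips 1, 2 and 5 together; the file names and the 10% impulse-noise level are assumptions for illustration only.

x = double(imread('clean.png'));     % tip 1: uint8 on read, convert before any arithmetic
y = x;
idx = rand(size(x)) < 0.10;          % corrupt roughly 10% of the pixels with impulses
y(idx) = 255 * (rand(nnz(idx), 1) > 0.5);
mse = mean((x(:) - y(:)).^2);        % MSE computed in floating point
noise_power = mean(x(:) ~= y(:));    % tip 5: fraction of pixels where x ~= y
imwrite(y/255, 'noisy.png');         % tip 2: rescale a double image to [0,1] for imwrite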

According to Wikipedia, “In physics, the principle of relativity is the requirement that the equations, describing the laws of physics, have the same form in all admissible frames of reference.” The importance of the reference frame has been less appreciated in other sciences, and the purpose of this post is to understand its relevance to visual perception.

Why do we need a frame of reference? A common example is motion-related illusion – sometimes one feels that a stationary train or airplane is moving because of the observer’s own motion. The underlying reason for such illusion appears to be a lack of coordination between the vestibular and visual systems. It also shows the relativity of motion perception – in the above example, the ambiguity is often easily resolved if the observer looks somewhere else (a change of reference frame).

The story does not end here. Pioneering studies by Nobel laureates Hubel and Wiesel in the 1950s showed an abundance of movement-sensitive cells in the visual cortex of the cat. It is easy to understand their role in the dorsal pathway from the perspective of motion detection for survival, but what about their role in the ventral pathway – how do movement-sensitive cells analyze a stationary landscape? The answer involves both saccades and microsaccades, which really echoes J. Gibson’s saying “We move because we see, we see because we move”. A more subtle implication lies in the scale of movement: if saccades (global motion) serve to accumulate local information into a holistic understanding, the role played by microsaccades (local motion) is more relevant to the functioning of the ventral pathway involved in object recognition.

In other words, our eyes seldom process “absolutely stationary” images (the cells simply won’t fire); the perception of even a stationary scene is the consequence of moving the eyes around (both globally and locally). Therefore, looking for the biological counterpart of still-image processing is really a lame approach, because there is none. A biologically inspired approach toward image processing is to cast images as a subspace of video and to understand color and texture along with motion and disparity. The principle of relativity implies that both saccades and microsaccades are important to the functioning of movement-sensitive cells, because these cells sense “relative” changes all the time.

This post is a continuation of my previous one on the relationship between mathematics and image processing, and it aims more at technical virtuosity than at conceptual understanding.

What are the most influential works by mathematicians on image processing research in the past three decades? I would say MRFs in the 1980s; wavelets and PDEs in the 1990s; still too early to tell for the 2000s. Before Geman and Geman’s PAMI paper in 1984, image processing was still an art with little scientific insight. It is the Gemans’ paper that showed image processing can also be tackled with tools from statistical mechanics. Even though the analogy between the pixels of an image and the spins of an Ising model is artificial, this work has had a long-lasting impact. It should be noted that the mathematics in this paper is not entirely new (several results, such as Gibbs sampling, had been established by other researchers before), but it is the first successful application of statistical physics in image processing. From a historical perspective, it is not surprising to see that the Ising model, which has had a dramatic impact on modern physics, is also applicable to other scientific fields (MRFs, the Hopfield network and the Boltzmann machine are all related to the Ising model).

If one had predicted the future of image processing in the 1980s based on the history of theoretical physics before the 1980s, he might have said: renormalization should be a cool idea for non-physicists to explore, because the decade of the 1970s belonged to the renormalization group (RG) – a mathematical apparatus that allows one to investigate the changes of a physical system as one views it at different distance scales. Indeed, the decade of the 1990s belonged to multiscale modeling of images: wavelet- and PDE-based approaches both address the issue of scale, though from different perspectives. Wavelets are local schemes whose effectiveness lies in the good localization property of wavelet bases; PDE-based models are global schemes which are often characterized by the minimization of some energy functional (note that they do admit local implementations based on diffusion). It is safe to say that both wavelet-based and PDE-based models have had a high impact in image processing, and their underlying connection has been established for certain specific cases. The unsettling issue is the varying role of locality – in physics, it is a fundamental assumption that interactions are local; but such an assumption does not hold for images, or more precisely for the visual perception of image signals, because of nonlocal interactions among neurons.

Since 1998, nonlocal image processing has been studied under many disguises – e.g., bilateral filtering, texture synthesis via nonparametric sampling, nonlocal means denoising (a more sophisticated version is BM3D) and nonlocal TV, among others. All of a sudden, lots of experimental findings seem to suggest something beyond the scope of wavelets and PDEs. What is it? I have been baffled by this question for many years and I am still groping for answers – one promising direction is to view images as fixed points of some dynamical system (an abstraction of neural systems) characterized by similitude and dissimilitude (an abstraction of excitatory and inhibitory neurons). The history of theoretical physics cannot help here, because as the complexity of a system increases, physics becomes chemistry and chemistry evolves into biology. The new breakthrough, if I can make it, will not come from mathematical virtuosity (I am simply not even close to Daubechies or Mumford) but from physical intuition. I think there exists a universal representation of the physical world and a universal cortical algorithm for neurons to encode sensory stimuli. From this perspective, image processing is really paving one possible path toward understanding fundamental phenomena such as biological memory and its implications for intelligence. Mathematics surely will still play the role of communicating my findings to others, but hopefully I will only need math skills at the level of Shannon or Ashby for this task.
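To make the nonlocal idea concrete, here is a minimal MATLAB sketch of the nonlocal means filter in its textbook form; the patch radius P, search radius S, filtering parameter h and the input file name are assumptions for illustration only, and no attempt is made at efficiency.

y = im2double(imread('noisy.png'));    % assumed noisy grayscale image
[H, W] = size(y);  P = 3;  S = 10;  h = 0.1;
yp = padarray(y, [P P], 'symmetric');  % pad so every pixel has a full patch
xhat = zeros(H, W);
for i = 1:H
  for j = 1:W
    ref = yp(i:i+2*P, j:j+2*P);        % patch centered at pixel (i,j)
    acc = 0;  wsum = 0;
    for m = max(1,i-S):min(H,i+S)      % search window around (i,j)
      for n = max(1,j-S):min(W,j+S)
        cand = yp(m:m+2*P, n:n+2*P);
        w = exp(-sum((ref(:) - cand(:)).^2) / h^2);   % patch-similarity weight
        acc = acc + w * y(m,n);  wsum = wsum + w;
      end
    end
    xhat(i,j) = acc / wsum;            % weighted average over similar pixels
  end
end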

Last, I happened to learn that the first Millennium Prize of the Clay Institute was awarded to Perelman for his resolution of the Poincaré conjecture. There is a very well-written report about this famous conjecture which I think even non-mathematicians like myself will enjoy reading. My instinct tells me someone might have already applied this fancy new tool of Ricci flow to image processing. Indeed, several researchers have pursued this direction, but from their preliminary findings I think they won’t go very far unless they can supply Ricci flow with some physical intuition first. It has been suggested that “an issues-directed approach is more fruitful than a method-directed approach to science”, which really echoes the point of this post.

What is memory? It describes a lagging effect which causes the state of a system to depend not just on the current input but also on its history. Such a lagging effect can be understood by considering ferromagnetism. Permanent magnets are unique objects with memory characteristics in the inanimate world. Can a piece of iron have memory? Yes. On one hand, ferromagnetic hysteresis implies that “the detailed domain pattern on a piece of iron is dependent on every magnetic influence that the iron has experienced since it first become ferromagnetic. Domain patterns may be very complex and large number of domain patterns are possible within a given assembly of atoms. In this sense a piece of iron has a memory of immense capacity” (Cragg & Temperley, 1955). On the other hand, shape-memory alloys represent another, artificial example. It is also important to recognize the connection between magnetic systems (e.g., the Ising model) and neural networks (e.g., the Hopfield network). From this connection it is not surprising to see how Dr. Hopfield came up with his influential model – he was inspired by the analogy between the memory of physical systems and that of neural systems.

However, the above analogy suffers from a pitfall – the inanimate and animate worlds belong to different complexity scales. Between physics and biology there is chemistry (including physical chemistry and biochemistry). Therefore, it will further help our understanding if we can identify the chemical basis of memory. In fact, chemical reaction and diffusion are among the primitive means of communication between early (single-celled) life forms and the physical environment. The need for faster communication had to wait for the evolution of multicellular organisms, in which the activities of cells in different parts must be coordinated. Since chemical diffusion is too slow and lacks directionality, nature discovered that the electrical properties of cells can be exploited for faster and directed communication. The change of membrane potential (around 110 mV) results in a wave of electrical activity called an action potential, which travels along axons at a speed of 1-100 m/s (note that axons are different from wires). The evolution of cells with action potentials and chemical signaling processes forms the chemical basis of biological memory.

Why action potentials and chemical signaling? The underlying physical interaction is the electromagnetic force – unlike the other forces, which operate at either astronomical or atomic scales, it acts directly at the scale of the physical environment and plays a direct role in supporting the evolution of life forms. What is the origin of the electromagnetic force and why does it propagate so fast? Nobody knows. But in view of the fundamental role of the electromagnetic force in forming chemical bonds, it is likely that this force is also crucial to the origin (e.g., the iron-sulfur world theory) as well as the evolution of life. From this perspective, it is not surprising that humans – the most intelligent life forms on earth – have harnessed this force to support communication at the next level: telecommunication (from individual beings to human society). Just like the evolution of biological memory, I think cyber-physical systems (CPS) consisting of artificial sensing, communication and control systems are going to follow a similar evolutionary pattern. Even though CPS is still in its infancy, its complexity or intelligence will be measured by how its memory is organized – a lesson we have learned from history.

In the frequentist framework, there is a well-known phenomenon called the bias-variance dilemma: a simple model tends to have lower variance but introduce higher bias; a more complex model can reduce the bias at the price of higher variance. Therefore, it is often necessary to find a tradeoff and avoid over-complicated models. The preference for simple models has also been given a fancy name – “Occam’s razor”.
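A minimal numerical sketch of the dilemma, with an assumed target curve, noise level and pair of polynomial degrees: across repeated noisy samples, the simple model typically shows larger bias while the complex one shows larger variance.

f = @(t) sin(2*pi*t);                          % assumed ground-truth curve
t = linspace(0, 1, 30)';  tt = linspace(0, 1, 200)';
pred1 = zeros(200, 100);  pred9 = zeros(200, 100);
for r = 1:100                                  % repeat over independent noisy samples
  y = f(t) + 0.2*randn(size(t));
  pred1(:, r) = polyval(polyfit(t, y, 1), tt); % simple model: degree-1 fit
  pred9(:, r) = polyval(polyfit(t, y, 9), tt); % complex model: degree-9 fit
end
bias1 = mean((mean(pred1, 2) - f(tt)).^2);  var1 = mean(var(pred1, 0, 2));
bias9 = mean((mean(pred9, 2) - f(tt)).^2);  var9 = mean(var(pred9, 0, 2));
fprintf('degree 1: bias^2 = %.3f, variance = %.3f\n', bias1, var1);
fprintf('degree 9: bias^2 = %.3f, variance = %.3f\n', bias9, var9);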

For Bayesians, it does not make sense to adjust the complexity of a model based on the training data. The matching between model complexity and data complexity is done automatically by Bayes’ formula. That is, if some model, simple or complex, does not explain the training data well, its posterior probability will decrease. In other words, Bayesians always work with a class of models instead of a single one. The result of future prediction is always an integration over the whole class of models. From this perspective, Bayesian approaches are more general than frequentist approaches because models are being averaged instead of selected.
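Here is a minimal sketch of working with a class of models instead of a single one; polynomial degrees 0 through 5 form the assumed model class, and exp(-BIC/2) is used as a crude stand-in for posterior model probabilities (a rough approximation for illustration, not a full Bayesian treatment).

t = linspace(0, 1, 25)';  y = sin(2*pi*t) + 0.2*randn(size(t));  % assumed training data
tt = linspace(0, 1, 200)';  degs = 0:5;  n = numel(t);
preds = zeros(numel(tt), numel(degs));  bic = zeros(size(degs));
for k = 1:numel(degs)
  p = polyfit(t, y, degs(k));
  rss = sum((y - polyval(p, t)).^2);
  bic(k) = n*log(rss/n) + (degs(k)+1)*log(n);   % BIC of the degree-degs(k) model
  preds(:, k) = polyval(p, tt);
end
w = exp(-(bic - min(bic))/2);  w = w / sum(w);  % crude posterior model weights
yhat = preds * w';                              % prediction averaged over the model class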

I have found that the perspective of Bayesian model averaging is useful for gaining a deeper understanding of many seemingly unconnected ideas scattered across different fields. For example, image representation based on overlapping blocks has been explored in motion-compensated video coding and patch-based image denoising. The basic idea is to recognize the weakness of committing to a single model; instead of assigning a single motion vector to each block or associating a single block with each pixel, we can average across a class of models.
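A minimal sketch of this aggregation step: every pixel collects estimates from all the overlapping blocks that cover it, and the estimates are averaged. The block size, the file name and the per-block “model” (here simply the block mean, a deliberately crude placeholder) are assumptions for illustration only.

y = im2double(imread('noisy.png'));      % assumed noisy grayscale image
B = 8;  [H, W] = size(y);
acc = zeros(H, W);  cnt = zeros(H, W);
for i = 1:H-B+1
  for j = 1:W-B+1
    blk = y(i:i+B-1, j:j+B-1);
    est = mean(blk(:)) * ones(B);        % one estimate per block (placeholder model)
    acc(i:i+B-1, j:j+B-1) = acc(i:i+B-1, j:j+B-1) + est;
    cnt(i:i+B-1, j:j+B-1) = cnt(i:i+B-1, j:j+B-1) + 1;
  end
end
xhat = acc ./ cnt;                       % each pixel averaged over all covering blocks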

A similar idea has also been exploited in Gaussian scale mixture denoising: instead of committing to a single variance estimate (e.g., in the ML sense), we can integrate over a class of models with varying variance parameters. Current experimental findings appear to suggest that Bayesian methods are more powerful for regression (where bias matters) than for classification (where variance matters). How to make Bayesian methods really work for classification (not just naive Bayes) is, in my opinion, a major challenge facing the vision and learning community. The leaders in this race include Bayesian batch learning (Fei-Fei Li et al., ICCV 2003) and incremental Bayesian learning (Fei-Fei Li et al., CVIU 2007).

For a long period I was baffled by the relationship between the mathematical and physical sciences (like Platonism vs. Aristotelianism). If mathematics and physics are about the study of mentally reproducible and experimentally reproducible objects respectively, which one is more fundamental? Especially in view of the increasing influence of mathematics on science and engineering (for better or for worse), I cannot help wondering why. In particular, it appears to me that not all mathematics is good or useful (e.g., see Hardy’s “A Mathematician’s Apology”), despite the unreasonable effectiveness of mathematics in the physical sciences. For instance, the roles played by pure mathematics and applied mathematics are different, even though their boundary is not always clear. The more I think about this, the more I feel it is related to the foundational issues of mathematics.

Fortunately, I think Harold Jeffreys’ Theory of Probability has clarified many of my doubts. In the beginning of that book (the chapter on fundamental notions), he pointed out that pure mathematics is developed from deductive logic, while inductive reasoning is a common technique used by scientists. He gave the example of a freely falling object under gravity: mathematically there exist infinitely many different solutions to the data-fitting problem, but physically we choose the simplest law. For pure mathematicians, inductive reasoning makes no sense because it goes against logic. Sometimes it is said that induction is only a peculiarity of human psychology (i.e., we trust our common sense). From this perspective, deductive logic and inductive reasoning really serve two connected but disparate objectives: communication of ideas (to other people) and discovery of principles (from nature).

In fact, Karl Pearson (one of the founding figures of the frequentist school) wrote this: “The unity of all science consists alone in its method, not in its material”. I think his view was echoed by Abdus Salam (a renowned physicist), who said “nature is not economical of structures but of principles”. As argued by Jeffreys, “any inductive inference involves in its very nature the possibility that the alternative chosen as the most likely may in fact be wrong” (in K. Popper’s words, it can be falsified). It is from this perspective that he introduced general rules (principles) as a priori propositions and stated that “induction is the application of the rules to observational data”. What kind of rules do we care about?

Jeffreys stated five rules: 1) all hypotheses must be explicitly stated and the conclusions must follow from the hypotheses; 2) the theory must be self-consistent; 3) any rule given must be applicable in practice; 4) the theory must provide explicitly for the possibility that inferences made by it may turn out to be wrong; 5) the theory must not deny any empirical proposition a priori. Rules 1 and 2 are already required in deductive logic; rules 3 and 5 enforce the distinction between a priori and empirical propositions; rule 4, highlighting the distinction between induction and deduction, appears to be aligned with Popper’s philosophy of science. Additionally, he states three more rules as useful guides: 6) the number of postulates should be reduced to a minimum; 7) the human mind is the only available, and a useful, reasoner; 8) we cannot hope to develop induction more thoroughly than deduction.

A little story from my drive to work this morning helps explain the subtle difference between deduction and induction. I was starting the car, and usually the NPR morning news would come from the radio. But this morning – nothing could be heard. My first impression was “damn, my new Honda CR-V – the radio system is broken.” I switched to CD and it worked just fine. Back to radio; still no sound. I tried switching between AM and FM; no use. After about a minute, I turned the tuning knob, and as the frequency went from 90.9 to 90.5, the familiar voice of NPR came along. What a relief! Then I started thinking: how come I did not think of the tuning-knob solution in the first place? Because I had never had a similar experience before. All the prior knowledge I had is that a car has a higher failure probability than a radio station. Why did I keep thinking my car’s radio system could be broken? Because radio systems these days are built so well that you don’t hear hissing white noise when the channel is not tuned in. If it were my old Camry97, I would have known the radio was still functioning from the white noise it produced. Apparently, I was doing all kinds of inductive reasoning, and the power of deductive logic is limited in such real-world scenarios because one simply cannot take all the relevant uncertainty factors into account. How is this story related to the theme of this post? As humans, we use common sense (inductive reasoning) all the time; if someone used deductive logic all the time, like serious mathematicians do in their jobs, he would be called too subtle (in Francis Bacon’s words).