April 27, 2009 by masterxinli
It is usually extremely difficult to solve a nonlinear differential equation A[u]=0 directly, but it may be much easier to find the minimum or maximum points of an appropriate energy functional I(u) whose derivative is A, i.e., A=I’.
Such a variational principle was advocated by Max Planck as a fundamental law of nature (e.g., in relativity theory a free particle follows a geodesic path in space-time, which is derived from this principle). Its connection to functional approximation is also interesting – it might be very difficult to obtain complete knowledge of some function f but much easier to work with its minima and maxima.
It should be noted how nonlinearity and its associated variational principle contrast with the more conventional reductionist approach. In linear systems, reduction works just fine; it is the nonlinearity in nature that calls for a holistic solution. The universality of the variational principle lies in its capability of handling a wide range of phenomena involving complex interactions among a large number of individual units.
Another closely related family of techniques is fixed-point methods – intuitively, the minima and maxima of a function are fixed points robust to local perturbations. The desirable properties of a nonlinear mapping, such as contraction and compactness, often require strong assumptions about the regularity of the solution (e.g., a global Lipschitz condition), which makes these methods less useful in practice.
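To make the idea concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration rather than something from a specific application: the toy equation -u'' + u^3 = f on (0, 1) with zero boundary values, the grid size, the constant forcing term, and the step size. Instead of attacking A[u]=0 directly, it runs plain gradient descent on a discrete energy I(u); the update u ← u − τ·A[u] is itself a fixed-point iteration whose fixed points are exactly the solutions of A[u]=0.

```python
def residual(u, f, h):
    """A[u]_i = (2*u_i - u_{i-1} - u_{i+1}) / h**2 + u_i**3 - f_i, with u = 0 at both ends."""
    n = len(u)
    padded = [0.0] + list(u) + [0.0]
    return [
        (2.0 * padded[i] - padded[i - 1] - padded[i + 1]) / h**2
        + padded[i] ** 3
        - f[i - 1]
        for i in range(1, n + 1)
    ]


def energy(u, f, h):
    """Discrete energy I(u); its gradient with respect to u_i is h * A[u]_i."""
    padded = [0.0] + list(u) + [0.0]
    stretch = sum(
        0.5 * ((padded[i + 1] - padded[i]) / h) ** 2 for i in range(len(padded) - 1)
    )
    potential = sum(0.25 * ui**4 - fi * ui for ui, fi in zip(u, f))
    return h * (stretch + potential)


def minimize_energy(f, h, steps=3000):
    """Gradient descent u <- u - tau * A[u]; a fixed point of this map solves A[u] = 0."""
    tau = 0.2 * h**2  # assumed step size, small enough to be stable on this toy problem
    u = [0.0] * len(f)
    for _ in range(steps):
        r = residual(u, f, h)
        u = [ui - tau * ri for ui, ri in zip(u, r)]
    return u


n = 9                # interior grid points on (0, 1), an arbitrary choice
h = 1.0 / (n + 1)
f = [10.0] * n       # constant forcing term, an arbitrary choice
u_star = minimize_energy(f, h)
max_residual = max(abs(r) for r in residual(u_star, f, h))
```

The descent never "solves" the nonlinear equation in the usual sense; it only walks downhill on I(u), and the residual of A[u]=0 shrinks as a side effect. The price is the step-size restriction, which is the kind of regularity assumption mentioned above.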
Posted in understanding
April 27, 2009 by masterxinli
I have personally come across the famous Heisenberg Uncertainty Principle many times in my life – when I was a college student, when I started to learn about the short-time Fourier transform, and when I prepared my courses on wavelets. I must admit it is a difficult concept to grasp, and I have often adopted the strategy of avoiding any elaboration of this principle in my teaching. Recently, after reading Feynman’s lecture notes on physics, I have finally made peace with this mind-bothering concept. I am writing this post to share my thought experience – not only as a reassurance for myself but also as an example of the subtle role that mathematics plays in understanding nature. My lesson is: to understand a profound concept such as the uncertainty principle, good intuition and logic are more important than fancy mathematical language.
The reasoning starts from the popular double-slit experiment designed to explain the particle-wave nature of electrons. In short, you can run this experiment for particles (e.g., idealized bullets) or for waves and conclude that interference is associated with waves (the interference term – determined by the distance between the two holes – is needed to properly account for the recorded experimental results).
Now consider the same experiment for electrons. The most interesting observation is that you get different mathematical results depending on whether or not you “look at” the electrons (so mathematics merely serves as a tool for describing the phenomenon, though it does provide useful hints about the underlying law of nature). In Feynman’s words, “If the electrons are not seen, we have interference”. One could therefore try harder to reduce the disturbance to the experiment by reducing the frequency (i.e., increasing the wavelength) of the light source (the device which helps us “see” the electrons). However, as the wavelength becomes comparable to the distance between the two holes, “visual inspection” of the electrons becomes impossible – what you will see is a big fuzzy flash. It is from this perspective that one reaches the conclusion that one cannot simultaneously determine the location (which hole the electron went through) and the frequency (to eliminate the interference).
Heisenberg’s Uncertainty Principle is the foundation of quantum mechanics. Even though wavelets are closely related to this principle, I tend to argue that wavelets, just like the Fourier transform, are just a tool invented for us to probe the fundamental properties of nature. We might obtain seemingly conflicting mathematical results by twisting the experiment slightly. What is more important is how a deeper result such as the uncertainty principle can be obtained by reconciling the conflicts at the surface.
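For readers who do want the formula, the standard textbook statements are short. The notation below is my own choice for this note, not Feynman’s: σ denotes a standard deviation, the first inequality is the position-momentum form from quantum mechanics, and the second is the time-frequency form that shows up with the short-time Fourier transform and wavelets (ω is angular frequency).

```latex
\sigma_x \,\sigma_p \;\ge\; \frac{\hbar}{2},
\qquad\qquad
\sigma_t \,\sigma_\omega \;\ge\; \frac{1}{2},
\quad\text{where}\quad
\sigma_t^2 = \frac{\int (t-\bar t)^2\,|f(t)|^2\,dt}{\int |f(t)|^2\,dt},
\quad
\sigma_\omega^2 = \frac{\int (\omega-\bar\omega)^2\,|\hat f(\omega)|^2\,d\omega}{\int |\hat f(\omega)|^2\,d\omega}.
```

Equality in the second inequality is attained by Gaussian windows, which is why they keep appearing in time-frequency analysis. The inequality only quantifies the trade-off; the double-slit reasoning is what explains why the trade-off has to exist.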
Posted in wavelets
April 21, 2009 by masterxinli
1. Be more selective in reading papers and books. The game of scientific research is a constrained optimization problem – maximize your productivity (in terms of both quality and quantity) subject to a time constraint. This time constraint simply dictates that you need to think more carefully about how you want to spend your time. Reading more papers might not be good (especially since your thinking becomes more and more bounded by existing ideas). I suggest you use Google Scholar as a sifter – if a paper has been cited fewer than 100 times, it is probably not worth your time.
2. Thinking is more important than reading. From your email, I noticed you have many ideas, which is good; but they lack a coherent train of thought. By train of thought I mean you need to start from some well-posed question – e.g., why do we need a nonlinear model for speech signals? To answer this question, one can start from linear models and convince oneself why the class of linear models is insufficient. For example, does a linear combination of two speech signals produce another valid speech signal? After you conclude that linearity is too limited to account for the complexity in speech, you will move on to what kind of nonlinear system is suitable for speech, what fundamental aspect of speech is captured by your nonlinear approach (but is beyond the reach of linear ones), how one can analyze such complex systems despite their nonlinearity (maybe some linear approximation is necessary) … The point is: without organizing ideas into a train of thought, those ideas bump into each other like random particles and help little in understanding the problem better.
3. Understand the logic of scientific reasoning. This point is likely to be the most important and the most difficult for Chinese students. Logical reasoning is not just what you do in GRE tests but a basic weapon for deepening your thoughts. For example, consider the same question I asked above – why nonlinearity? If you take a different train of thought, you might start from what kind of speech signals we are talking about. Chinese or English? Is the speech model we pursue language-independent or not? If speech is inherently related to natural language as well as acoustics, should we not understand the evolution of language as well as the physical modeling of the human vocal system? The key is that when you do such reasoning, the path is seldom unique – we are constantly facing multiple choices – e.g., deterministic vs. statistical, parametric vs. nonparametric, linear vs. nonlinear, bias vs. variance. It is an art to maintain a good balance between blindly following a single path and hesitating to take the risk of going down any path.
4. Understand the connection among different technical problems. As we have seen, speech is connected to language and acoustics. If you have heard about Mountcastle’s uniformity principle, which states that neurons handling visual information are no different from those handling auditory information, you might start to see the connection between speech and image. Both are sensory data containing complex patterns, yet the human brain can recognize them effortlessly. Therefore, speech recognition might not be that different from face recognition from a neuroscience perspective (of course they are different if you take an engineering standpoint). Speech signals are also naturally connected to time series (e.g., stock market price indexes) and turbulence. Therefore, the same tool developed for speech processing, if it is general enough, should also be applicable to time series and turbulence analysis. One might challenge this claim by saying that speech is different from stock prices, which are different from turbulence, which is different from sunspots – here is the defending argument from Nobel laureate Abdus Salam: “Nature is not economical of structural diversities but organizational principles”. What science has pursued is the few organizational principles underlying the richness and complexity of mother nature.
Posted in tips
April 16, 2009 by masterxinli
1. How to improve the efficiency of your MATLAB program?
1) Vectorize your code: get familiar with functions such as all, end, repmat, squeeze, any, find, reshape, sub2ind, ind2sub, permute, shiftdim, sum, diff, ipermute, prod, sort.
2) Avoid frequent file I/O: if you have to store/retrieve some intermediate results, use save/load instead of fwrite/fread because load and save have been optimized to run faster and reduce memory fragmentation.
3) Code loops in a MEX-file: avoid using loops as much as possible; if you have to use a loop, learn to implement it as a MEX function instead (for more information about MEX, please refer to http://www.csee.wvu.edu/%7Exinl/courses/ee465/apiext.pdf – pages 3-4 to 3-8 are most relevant).
4) Preallocate to improve performance: MATLAB allows you to increase the size of an existing matrix incrementally, usually within a for or while loop. However, this can slow a program down considerably, as MATLAB must continually allocate more memory for the growing matrix and also move data in memory whenever a contiguous block cannot be allocated. It is much faster to preallocate a block of memory large enough to hold the matrix at its final size.
2. Function management
1) Functions are faster than scripts: therefore, whenever your MATLAB code starts to get longer and longer, it is a good idea to consider converting some of it into functions. Functions provide a hierarchical structure suitable for both maintenance and extension.
2) Function name selection: it is a good idea to use “which -all xxx.m” to check whether the name xxx.m has already been used by MATLAB or others. To avoid conflicts, you might want to add a prefix to the names of all your own functions – e.g., if you create a Toolbox of Object Detection (TOD), you might name the functions TOD_hough.m, TOD_line_detection.m, TOD_harris.m and so on.
3) Function variable selection: if you have some parameters (e.g., threshold, block size, filter length) which need to be manually adjusted, it is a good idea to include them among the input variables of your MATLAB functions (instead of hard-coding them inside the function). To avoid a long list of parameter settings, you can use a vector to store all the parameters you want to pass to the function.
Posted in ee565
April 9, 2009 by masterxinli
This post contains some tips I shared with the students taking my EE465 (undergraduate image processing class). The objective of the lecture is to use circle detection (an elementary image analysis problem) as an example to demonstrate how to make better use of MATLAB to solve image processing problems:
1. Master the language – enrich your vocabulary, imitate what professionals do, read programming tips whenever you are free (they are posted on the course website);
2. Know how to search – understand the help topics provided by MATLAB (sometimes Googling a topic might be more efficient). These days, you can find almost anything on the internet;
3. Have a plan of attack – understand the known and the unknown, and devise a plan by divide-and-conquer. If you have not read Polya’s book “How to Solve It”, Google the title and at least get a rough idea of Polya’s problem-solving principles;
4. Know how to debug – when something does not work, you need to check whether it is a bug in your implementation or a pitfall in your idea. Debugging MATLAB code is very convenient now; if you don’t know how to set a breakpoint, Google it and learn basic debugging skills today.
5. Understand statistics or psychology. If you are research-oriented, statistics is important for properly interpreting the results you have obtained. If you are development-oriented, what you need next is an effective way of presenting your results (e.g., a GUI or a video). In the business world, good salesmanship is often more important than technical superiority (Windows vs. Unix is the best example).
Posted in ee565
February 9, 2009 by masterxinli
While at Sharp, I worked on the compound image coding problem – a compound image consists of a mixture of photographic pictures, graphics and text. DjVu and PDF have become the standard document image formats. In the past four years, especially due to the increasing popularity of YouTube, more and more video clips have become available online. I have noticed that there seems to be a need for the study of compound video coding – the counterpart of compound image coding.
The compound nature of the video source is particularly evident in applications related to distance learning (a mixture of PPT slides and classroom experience), multimedia presentation (a mixture of text slides and graphics/motion pictures) and gaming (screen captures of video games). But the definition of a compound source can be generalized to incorporate a more traditional view – e.g., the Foreman sequence is compound because it contains a mixture of slow and fast camera motion; Flower Garden is compound in the sense of mixing objects at varying scene depths (layered representation is the essential idea underlying the MRC (mixed raster content) model adopted by the DjVu image coding algorithm). Of course, segmentation will likely be the main technical challenge again in compound video coding. But from a system perspective, unifying coding with analysis is desirable because it supports both higher coding efficiency and content-based retrieval.
Posted in digital video processing
January 28, 2009 by masterxinli
Motion segmentation is a task essential to many video processing applications, from coding and tracking to recognition and restoration. Unlike image segmentation, however, research progress in motion segmentation has remained slow over the past decade. One way of justifying this claim is to use “motion segmentation” as the keyword in a Google Scholar search. You will find that no single article has received more than 250 citations (by contrast, leading image segmentation techniques are often cited thousands of times). Mixture-model-based and graph-cut-based approaches appear to be the most promising in the literature so far.
Why is motion segmentation so difficult? On one hand, motion segmentation is tangled with motion estimation in a chicken-and-egg fashion. Fundamental issues such as intensity uncertainty related to aliasing and shading remain poorly understood. The iso-intensity constraint along the motion trajectory is conceptually simple, but its scientific basis is weak. It is likely that the human visual system (HVS) exploits multiple visual cues in middle-level perception. Therefore, motion is intrinsically decoded along with shape, texture and color.
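For reference, the iso-intensity constraint can be written down in its usual textbook form. The symbols are my own choice here: I(x, y, t) is the image intensity and (u, v) is the displacement of a point between two consecutive frames.

```latex
I(x + u,\; y + v,\; t + 1) = I(x, y, t)
\quad\Longrightarrow\quad
I_x\,u + I_y\,v + I_t \approx 0 ,
```

where the right-hand side follows from a first-order Taylor expansion and I_x, I_y, I_t are partial derivatives. This is one equation in two unknowns at every pixel, so extra assumptions (smoothness, a parametric model, a segmentation) are always needed to pin the motion down, and the constraint itself fails whenever shading, occlusion or aliasing changes the intensity along the trajectory.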
On the other hand, there is still no systematic study of probability models for motion (i.e., the prior, from a Bayesian learning perspective). The limited results in the literature (e.g., spatial statistics of optical flow and motion synthesis) are based on a small number of sample videos and do not properly address the relationship between motion and vision (e.g., the modeling of ego-motion). There is another possibility deserving some investigation – i.e., motion vs. depth. Although one could claim that one can watch a movie with one eye closed, it has been known since 1963 that monocular deprivation has a severe impact on cats’ visual perception. Therefore, it is possible that motion segmentation should be studied for stereoscopic video instead of monocular video, not only because it is simpler but also because of its scientific plausibility.
Posted in digital video processing
January 27, 2009 by masterxinli
A fundamental difference between video and images is that video contains motion information. That is why video is also called motion pictures and MPEG stands for Moving Picture Experts Group. Well, if an MPEG expert explains the details of a block-based video coding algorithm to a neuroscientist, he might be questioned as soon as he speaks: why blocks? I have never seen a picture composed of blocks. Then the expert has to explain what motion compensation is, why we can only transmit motion information on a block-by-block basis (to keep the overhead low), and so on. The neuroscientist might nod and comment – “I see. It is an engineering trick that you have shown works for you. But I am afraid there must be a better solution to this because I know HVS clearly has not endorsed this trick in its long evolution process.”
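What the MPEG expert would be describing can be sketched in a few lines of Python. This is only a toy exhaustive-search block matcher, not the algorithm of any particular standard; the block size, search range, cost function (sum of absolute differences) and the synthetic test frames are all assumptions made for illustration.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    return sum(
        abs(cur[top + r][left + c] - ref[top + r + dy][left + c + dx])
        for r in range(size)
        for c in range(size)
    )


def block_motion(cur, ref, size=4, search=2):
    """Exhaustive-search block matching: one motion vector (dy, dx) per block of cur."""
    height, width = len(cur), len(cur[0])
    vectors = {}
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Skip candidates that would read outside the reference frame.
                    if not (0 <= top + dy and top + dy + size <= height):
                        continue
                    if not (0 <= left + dx and left + dx + size <= width):
                        continue
                    cost = sad(cur, ref, top, left, dy, dx, size)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            vectors[(top, left)] = (best[1], best[2])
    return vectors


# Synthetic 12x12 reference frame and a copy whose content is shifted by one pixel.
H = W = 12
ref = [[(37 * r + 101 * c + 17 * r * c) % 256 for c in range(W)] for r in range(H)]
cur = [[ref[(r + 1) % H][(c - 1) % W] for c in range(W)] for r in range(H)]
vectors = block_motion(cur, ref)
print(vectors[(4, 4)])  # (1, -1): this block is found one row down, one column left in ref
```

The 144 pixels of the frame are described by just 9 motion vectors, which is the "low overhead" argument. The neuroscientist’s objection is visible too: the block grid is imposed by the coder and has nothing to do with the objects in the scene.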
Then how is motion represented by HVS? Or, scientifically, we are interested in how motion is perceived by HVS. We know a little but not too much. JJ Gibson (http://huwi.org/gibson/index.php) – an influential American psychologist – says “We move because we see; we see because we move.” From his viewpoint, humans developed the capability of motion perception through the coordination between the motor and visual systems. Another important contribution is a computational model for motion perception by Adelson and Bergen (spatiotemporal energy models) in 1985. While working with Adelson, Simoncelli introduced distributed motion representation in his PhD thesis, which led to a Bayesian view of motion estimation or optical flow computation.
Motion is tangled with many other things such as depth (in stereo vision), shape (for segmentation), color and texture. All these visual cues contribute to the formidable complexity of video data. But if you truly believe that “God creates this world in a unified fashion, when you get stuck with a problem, seek your inspiration from around: nature, art and other sciences. Essentially, the principles are the same”, then the richness of structure in video could arise from simple organizational principles underlying the data (at least such simple principles can be effortlessly accommodated by HVS after two years’ training: 0-24 months).
Maybe the following easy experiment can help. If you have ever watched the slide show generated by the Wii (I am sure other software does similar tricks), you can get the sensation of motion pictures from a still one. We know our eyes are tricked because the source is a still image; the motion is artificially introduced by some random program – e.g., zooming in so that we see some object get bigger and bigger. Nothing mysterious. Now think about the real experience – if nobody tells you whether you are watching real video data or a faked version made by manipulating a single frame, do you think you will know the difference? I can’t, and I don’t think HVS is designed to tell such a difference. If this hypothesis is true, what can we learn about video and the tantalizing motion issues?
It is important to understand and model the prior of motion. The statistical law of motion in the natural world is not that complex because it is bounded by the laws of physics. We are more used to horses running, birds flying and dogs chasing than to Brownian motion, particle collisions and random jittering. Important types of motion such as ego-motion are learned when we are young, along with our motor capabilities. Even though HVS can occasionally be fooled by relative motion outside the window of a still train (it appears that the train is moving), we can easily pick out moving objects in our field of view whether we stand still or walk around.
Before we can solve the more challenging motion-related problems, I propose to think about: how do we separate ego-motion from object motion?
Posted in digital video processing
January 27, 2009 by masterxinli
The other poorly addressed issue in video acquisition is color. Yes, we have heard lots of technical terms such as YUV, YIQ, 4:2:0, 4:4:4 etc. Those convey little meaning in a scientist’s mind. The fact is: color is a notoriously difficult problem which we do not understand well. I once heard that at least 20 people first worked on color-related science problems, thought they were too hard, then switched to another topic and won a Nobel Prize (I can’t guarantee the number is accurate, but I do know Erwin Schrödinger is one). I also frequently experience color-related artifacts while watching TV or video clips on the internet – the poor YUV 4:2:0 format – another “failure” of video engineering effort.
If you probe into how nature solves this video acquisition problem, you will be amazed by the elegance of HVS – motion, color, texture, shape, depth – you name it, are all figured out (decoded) by some population of neurons organized in a particular way. Well, it might be unfair to compare the result of billions of years of evolution with some clever design of an engineer such as Bryce E. Bayer of Eastman Kodak. But can we learn something from nature? The spatial arrangement of L, M, S cones in the human retina is nothing like the Bayer pattern; the relationship between the ventral and dorsal systems (sensing what and where) is seldom exploited in machine vision systems.
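To see where the 4:2:0 artifacts come from, here is a small Python sketch. It is a simplification under stated assumptions: chroma is subsampled by averaging 2x2 blocks and restored by sample replication, whereas real codecs and displays use a variety of filters and sample positions. The test plane with a sharp color edge is made up for the example.

```python
def subsample_420(chroma):
    """Average each 2x2 block of a chroma plane (even dimensions assumed)."""
    return [
        [
            (chroma[r][c] + chroma[r][c + 1] + chroma[r + 1][c] + chroma[r + 1][c + 1]) / 4.0
            for c in range(0, len(chroma[0]), 2)
        ]
        for r in range(0, len(chroma), 2)
    ]


def upsample_nearest(small):
    """Replicate each chroma sample over its 2x2 block (the simplest possible upsampler)."""
    rows = []
    for row in small:
        wide = [value for value in row for _ in range(2)]
        rows.append(wide)
        rows.append(list(wide))
    return rows


# A 4x6 chroma plane with a sharp vertical color edge between columns 2 and 3.
chroma = [[0, 0, 0, 200, 200, 200] for _ in range(4)]
small = subsample_420(chroma)
restored = upsample_nearest(small)
print(restored[0])  # [0.0, 0.0, 100.0, 100.0, 200.0, 200.0]
```

Each chroma plane keeps only a quarter of its samples, so a 4:2:0 frame carries half as many samples as a 4:4:4 one. The price is that an edge which does not line up with the 2x2 grid gets smeared into an intermediate color, which is the kind of artifact one notices around saturated text and graphics.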
Posted in digital video processing
January 7, 2009 by masterxinli
Raw video data have two important attributes: spatial resolution and temporal resolution. Spatial resolution refers to the total number of pixels in a frame – as sensor technology advances, we have witnessed ever-increasing spatial resolution – e.g., from standard TV to HDTV (of course 1080p is a better and more expensive option than 720p). Temporal resolution is typically 24 or 30 frames per second, though the acquisition of so-called high-frame-rate (>300 fps) video is also an ongoing research topic.
When one looks back upon the development of video technology, many devices and systems were not well thought out or planned from the beginning. For instance, there is so-called interlaced video, which is a legacy of analog video transmission. To save bandwidth, the even and odd fields of video frames are transmitted in an alternating fashion. As video display devices become more and more dominated by progressive formats (e.g., LCD TVs or monitors) these days, conversion from interlaced to progressive (so-called deinterlacing) is an important problem in video processing. This is a concrete example of how technology evolves and adapts (conceptually similar to the evolution of biological systems).
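A rough Python sketch of the two operations involved is given below. It shows only the simplest intra-field method (line averaging), and the function names and the tiny test frame are assumptions for illustration. In real interlaced video the two fields are captured at different time instants, which is exactly what makes good deinterlacing a motion problem rather than a pure interpolation problem.

```python
def split_fields(frame):
    """Return (even_field, odd_field): the even-numbered and odd-numbered scan lines."""
    return frame[0::2], frame[1::2]


def deinterlace_line_average(field, height, parity):
    """Rebuild a full frame from one field by averaging the neighboring known lines.

    parity is 0 if the field holds lines 0, 2, 4, ... and 1 if it holds lines 1, 3, 5, ...
    """
    frame = [None] * height
    for i, line in enumerate(field):
        frame[2 * i + parity] = list(line)
    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:
            frame[y] = list(below)      # top border: copy the line below
        elif below is None:
            frame[y] = list(above)      # bottom border: copy the line above
        else:
            frame[y] = [(a + b) / 2.0 for a, b in zip(above, below)]
    return frame


# A 6-line test frame with a vertical intensity ramp (line y has value 10 * y).
frame = [[10 * y] * 3 for y in range(6)]
even_field, odd_field = split_fields(frame)
rebuilt = deinterlace_line_average(even_field, height=6, parity=0)
```

On this smooth ramp the interior missing lines are recovered exactly and only the last line is off, but any vertical detail finer than two lines is lost for good, since half of the samples were never transmitted.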
Well, if we spend some time studying the results of evolution in nature, we will be surprised by the striking difference between artificial sensors (e.g., video cameras) and biological ones (e.g., human eyes). There are two types of photoreceptors – cones (chrominance) and rods (luminance) – in the human retina. Their spatial distribution is not uniform at all; the temporal sampling of visual stimuli is continuous (30 fps is a magic number invented by movie industry engineers – why this rate is sufficient is a scientific question related to motion perception to which we do not yet have a complete answer). Therefore, the important point I want to make here is: despite the popularity of uniform sampling in space-time for representing video data, it is an engineering solution. Just as with the interlaced video format, we might eventually find that even the progressive video format is not the right choice when video technology evolves to a state comparable to that of the human vision system.
Posted in digital video processing