Invited papers
Paper 148: Advanced Analysis And Visualization Of Brain Imaging Data
The aim of Cognitive Neuroscience is to unravel the neural mechanisms that underlie higher levels of human mental activity, such as visual and auditory perception, language, motor control, attention, awareness, memory, imagery, decision making and emotion. The study of these mental processes was revolutionized by the advent of functional brain imaging techniques, most notably functional Magnetic Resonance Imaging (fMRI). This non-invasive technique created the possibility to scan the living human brain with high spatial and good temporal resolution while subjects or patients perform various cognitive tasks. The rich spatiotemporal data obtained from fMRI experiments presents a challenge for data analysis. I will present advanced tools for analyzing both anatomical and functional MRI data, which are based on methods from statistics, signal analysis, image processing and computer graphics. The developed tools have been successfully applied to create scientific visualizations that help cognitive neuroscientists and medical professionals interpret the recorded brain imaging data. More specifically, I will present advanced techniques to segment and visualize the human brain, to separate grey (neurons) from white (fiber tracts) matter, to measure cortical volume and cortical thickness (morphometry), to extract the cerebral cortex as a mesh representation, and to produce spherical and flattened cortex representations. The obtained representations not only provide novel ways to visualize functional data but also form the basis for the automatic alignment of brain structures across multiple subjects using non-linear warping. Finally, I will focus on the possibilities and challenges of real-time fMRI. With our real-time analysis tools, subjects can see their own brain activity during an ongoing measurement, which allows them to learn to activate specific brain regions at will (neurofeedback).
Paper 149: Vertebrate-type Dynamic Vision for Ground Vehicle Guidance
Efficient real-time visual perception on road networks and in cross-country driving has to take advantage of foveal-peripheral differentiation for data economy and of active gaze control for a number of benefits, such as gaze stabilization, visual tracking of fast-moving objects to reduce motion blur, a large field of view in the near range (coarse angular resolution), and high resolution in a small field of view that can be shifted rapidly at will. For the vehicle's own behavior decisions, the motion behavior of objects both in the wide field of view nearby and in several regions of special interest further away has to be understood in conjunction. In order to achieve this efficiently, three distinct visual processes with specific knowledge bases have to be employed. In the wide field of view, bottom-up feature extraction has to answer the question: 'Is there anything of special interest?' On initialization, feature extraction operators have to give indications of objects of interest all over the image. Stable feature aggregations over several cycles have to trigger object hypotheses for the second stage. This stage works on single objects, but on multiple of them in parallel. When looking almost parallel to the ground this is necessary for proper scaling, since each line in the image represents a different distance on the ground plane. For this reason, feature extractors and state estimators for each object have to be tuned specifically. Representing 3-D objects in 3-D space and time allows exploiting the first-order derivative matrix of the perspective mapping process (the so-called 'Jacobian') and enables spatial interpretation despite the fact that range has been lost in each single image point. The parallel processes of stage 2 yield best estimates of the relative state 'here and now' for all objects observed. These are symbolically represented in a scene tree as known from computer graphics. Stage 3 works on time series of these symbolic data in order to understand more deeply motion sequences on larger spatial and temporal scales (maneuvers, mission elements) affecting the vehicle's own decision making. Experimental results with the test vehicles VaMoRs (5-ton van) and VaMP (Mercedes 500 SEL) will be discussed.
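To make the role of the 'Jacobian' mentioned above concrete, the following minimal sketch (not the authors' vision system, just an illustration under a simple pinhole-camera assumption with focal length f) computes the first-order derivative matrix of perspective projection and uses it to predict how a small 3-D displacement of an object point shifts its image-plane position:

```python
import numpy as np

def project(point_cam, f=1.0):
    """Pinhole projection of a 3-D point (camera coordinates) to image coordinates."""
    X, Y, Z = point_cam
    return np.array([f * X / Z, f * Y / Z])

def projection_jacobian(point_cam, f=1.0):
    """First-order derivative matrix (Jacobian) of the perspective mapping,
    d(u, v)/d(X, Y, Z), relating 3-D state changes to image-plane feature motion."""
    X, Y, Z = point_cam
    return np.array([
        [f / Z, 0.0,   -f * X / Z**2],
        [0.0,   f / Z, -f * Y / Z**2],
    ])

# Example: a feature point 20 m ahead, 2 m to the right, 1 m below the camera axis.
p = np.array([2.0, -1.0, 20.0])
J = projection_jacobian(p)
delta_p = np.array([0.1, 0.0, -0.5])      # object moves slightly right and towards the camera
print(project(p + delta_p) - project(p))  # true image-plane shift
print(J @ delta_p)                        # first-order prediction via the Jacobian
```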
Paper 163: SIMD Vector Signal Processor for Wireless Embedded Systems
Digital signal processors (DSPs) have become a key component for the design of communications ICs, in particular for wireless solutions. However, why does it make sense to use them and for which target application area? How can the new processing power requirements be met?
The block nature of wireless communications makes SIMD (single instruction, multiple data) vector signal processors a natural fit to the problem. Novel methods of developing architectures make these "V-DSPs" small in size and power consumption, and flexible in programming.
Paper 165: Rate-distortion optimisation in audio coding using a perceptual distortion measure
In state-of-the-art low bit rate audio coding algorithms a substantial part of the coding efficiency is obtained by removing perceptually irrelevant information. The resulting error signal is masked by the remaining encoded signal and is therefore inaudible. In such algorithms the decision of what information is irrelevant is guided by a model of auditory masking that defines for each frequency region a threshold which may not be exceeded by the error signal. In this presentation a new approach for removing irrelevant information is discussed, which is based on a perceptual distortion measure that predicts the audibility of the error signal rather than a fixed threshold. This distortion measure, contrary to what is commonly assumed, takes into account recent psycho-acoustical findings that indicate that perceptual distortions are integrated across frequency. It will be discussed how this approach influences the rate-distortion optimisation process and how it can lead to more efficient audio coding.
Paper 166: Turbo estimation of channel and synchronization parameters
The invention of turbo codes a decade ago has led to an increased interest in the use of soft-information-based iterative processing in digital communications. The idea of iterative processing was first applied to decoding and later to detection. More recently, iterative joint source/channel decoding has also been proposed. Yet another application of this "turbo principle" is the use of soft information in iterative parameter estimation. In this presentation, a review will first be provided of the different techniques proposed for using soft information in iterative parameter estimation. It will then be shown how this problem can be structured in the light of the EM (expectation-maximization) algorithm. With respect to EM estimation, improvements can be made to account for bias and to improve the speed of convergence. Solutions will be proposed and discussed. A comparison of different methods will be provided for carrier phase and timing recovery, and for the estimation of frequency-selective and/or MIMO channels.
Paper 167: Speech Enhancement for Hands-free Voice Communication
Voice communication systems and voice controlled devices are frequently operated in noisy environments. To achieve satisfactory performance of such systems it is necessary to reduce the level of noise and reverberation in the microphone signal.
In this talk we will discuss algorithms for noise reduction which are based on statistics and optimal estimation techniques. The focus will be on single channel methods, especially on the estimation of the spectral coefficients of the clean speech signal and of the background noise power. Contrary to most other techniques we assume that the clean speech spectral coefficients obey a supergaussian distribution. We present analytic solutions for the Minimum Mean Square Error estimation of these coefficients and discuss their properties. The use of these estimators in the context of mobile telephony, hearing aids, and speech recognition will be outlined.
Furthermore, we will present multi-microphone techniques for speech enhancement and discuss some recent developments. The performance of these methods will be demonstrated with audio samples.
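As a rough illustration of the single-channel processing chain discussed above, the sketch below implements a conventional Wiener gain with a decision-directed a priori SNR estimate. Note that the estimators in the talk assume a super-Gaussian prior for the clean speech coefficients, whereas this simplified stand-in uses the Gaussian-prior gain; all frame and smoothing parameters are illustrative assumptions.

```python
import numpy as np

def wiener_gain_enhance(noisy, frame=256, hop=128, alpha=0.98, noise_frames=5):
    """Toy single-channel spectral enhancement: per-bin Wiener gain with a
    decision-directed a priori SNR estimate. Noise PSD is taken from the first
    few frames (assumed speech-free), which is a simplification."""
    win = np.hanning(frame)
    n_frames = 1 + (len(noisy) - frame) // hop
    spectra = np.array([np.fft.rfft(win * noisy[i*hop:i*hop+frame]) for i in range(n_frames)])
    noise_psd = np.mean(np.abs(spectra[:noise_frames])**2, axis=0)
    out = np.zeros(len(noisy))
    prev_clean_psd = noise_psd.copy()
    for i, X in enumerate(spectra):
        post_snr = np.abs(X)**2 / noise_psd
        prio_snr = alpha * prev_clean_psd / noise_psd + (1 - alpha) * np.maximum(post_snr - 1, 0)
        gain = prio_snr / (1 + prio_snr)          # Wiener gain (Gaussian-prior MMSE of the spectrum)
        S = gain * X
        prev_clean_psd = np.abs(S)**2
        out[i*hop:i*hop+frame] += win * np.fft.irfft(S, frame)
    return out
```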
Paper 168: Trends In 2D And 3D Face Recognition
In recent years, interest in automatic face recognition has been increasing strongly, driven mainly by improved and more convenient man-machine interfaces and by increasing security needs. Examples of the latter are access control and surveillance applications, database mining and the deployment of biometric-enabled E-Passports during the next years. From a broader perspective, each of these and almost every other biometrics scenario falls into one of the following categories: 1:1 verification (i.e., the verification of the identity of a particular person), 1:n identification (i.e., determining the identity of a particular person by means of a large database) and watchlist applications (i.e., deciding whether or not a particular person is on a list of wanted persons). This categorization is important as it determines the choice of suitable performance measures to compare biometric applications.
Depending on the application scenario, facial recognition is performed on one of two sources of information: 2D color or grayscale images, or 3D models of the shape of the head. While details of the current approaches depend on the sensor, the main processing steps are the same for all of them: face detection and tracking, generation of a model or template that describes discriminating features of the face, and a matching step that quantifies the similarity of two faces based on the generated model.
Among other approaches, Hierarchical Graph Matching is a universal approach to 2D and 3D object recognition. The main ingredient is an elastic graph that is automatically adapted to a single object and carries the shape and texture information of the object. The optimization of this technique with respect to facial recognition leads to one of the most powerful and most universal approaches for overcoming some well-known problems in facial recognition. Applications of facial recognition are known to suffer from unfavorable conditions in the environment of the application, such as the illumination in the scene and the quality and resolution of the acquired images. While these are rather general issues in computer vision, the algorithms also have to cope with problems more specific to facial recognition. Examples are changes in facial expression, the appearance of beards and glasses, and the pose of the face. A wide variety of investigations and approaches to cope with these problems have appeared in recent years. Here, the pose problem is one of the best studied, and successful solutions are available today. They are based on explicit or implicit models of the head and are capable of counterbalancing the influence of the pose on the facial features.
Today, the extension of facial recognition based on 2D images with respect to 3D shape information is under heavy discussion and experimental investigation. The hope is to increase the accuracy of facial recognition due to the additional shape features and the insensitivity of 3D facial recognition to pose and illumination. Furthermore, recent trends include the fusion of 2D and 3D strategies to combine the best of both worlds.
Our presentation describes the above-mentioned topics in detail. It starts with an overview of algorithms and applications of 2D and 3D face recognition and presents details of the Hierarchical Graph Matching approach. Furthermore, we discuss the aspect of robustness against influences such as pose and illumination. We conclude with recent benchmark results on the performance of facial recognition and the conclusions drawn from these results.
Paper 172: Programming and validation of domain-specific re-configurable processors
Silicon Hive owns a re-configurable processor generation methodology with which re-configurable processors can be generated for many application domains. In this talk I will use the example of a low-end video coding processor and show how the embedded software development is carried out. The architecture and design goals for this processor will also be briefly reviewed. During the development phase, multiple (mixed-level) simulation levels are deployed, each providing incrementally more insight into the design at hand. Finally, this procedure leads to detailed specification figures for energy consumption, area in a CMOS technology, and performance.
Regular papers
Paper 104: Using Embedded C For High Performance DSP Applications
Embedded C is a language extension to C that is the subject of the ISO technical report "Extensions for the programming language C to support embedded processors." It aims to provide portability and access to common performance-increasing features of processors used in the domain of DSP and embedded processing. Embedded C adds fixed-point data types, named address spaces and hardware I/O to C. Fixed-point primitives and named address spaces are performance-increasing features. They are motivated by a practical and economic need to program DSP processors in a high-level language instead of assembly. The presentation discusses Embedded C and its performance implications.
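For readers unfamiliar with fixed-point primitives, the following sketch emulates in Python the kind of saturating Q15 arithmetic that Embedded C's fixed-point types expose directly to the compiler; the helper names and the Q15 format choice are illustrative and are not taken from the technical report.

```python
Q15_MAX, Q15_MIN = 2**15 - 1, -2**15

def to_q15(x):
    """Quantize a real value in [-1, 1) to Q15 fractional format."""
    return max(Q15_MIN, min(Q15_MAX, int(round(x * 2**15))))

def q15_mul_sat(a, b):
    """Saturating Q15 multiply: roughly what a DSP's fractional multiplier
    (and saturated fixed-point arithmetic in Embedded C) does in one instruction."""
    p = (a * b) >> 15
    return max(Q15_MIN, min(Q15_MAX, p))

# FIR accumulation example: in plain C this is manual shift/saturate work; fixed-point
# types let the compiler map the same expression onto the DSP's MAC unit.
coeffs = [to_q15(c) for c in (0.25, 0.5, 0.25)]
samples = [to_q15(s) for s in (0.9, -0.7, 0.3)]
acc = sum(q15_mul_sat(c, s) for c, s in zip(coeffs, samples))
print(acc / 2**15)   # fixed-point result; compare with 0.25*0.9 - 0.5*0.7 + 0.25*0.3
```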
Paper 110: End-to-end Voice-over-IP system demonstration
This abstract proposes a demonstration of an end-to-end packet telephony system developed by InAccess Networks. The demonstration will exhibit many aspects of Voice-over-IP technology including voice coding, tone/DTMF generation/detection, echo cancellation, soft PBX functionality and voice quality monitoring.
The demonstration setup consists of a PC with an Intel processor operating as a high channel density H.323/SIP VoIP gateway (provider side) and a residential gateway (RG) with 4 telephony ports developed by Inaccess Networks (user side). These 2 systems are interconnected through an IP network.
The DSP algorithmic support executed on both sides is identical. The only difference is that the VoIP gateway code has been optimized using the Intel Performance Primitives (IPP) to take advantage of the Pentium Streaming SIMD extensions, while the RG employs a low-cost Agere DSP executing assembly-optimized code to carry out the same tasks. The demo shows that the same code is suitable for both PC-based and embedded systems.
Finally, an innovative end-to-end voice quality monitoring scheme developed by InAccess Networks will also be demonstrated. Such a system is a valuable aid to operators and users who are confronted with the voice quality issues of VoIP transport.
Paper 113: Performance Assessment Method for Speech Enhancement Systems
A new method to assess noise reduction algorithms with respect to their ability to enhance the perceived quality of speech is presented. Such algorithms cover both single-microphone and multiple-microphone systems. Tests of the presented method show a higher correlation with subjective assessments than any other objective system known to the authors. It is believed that this method is suitable for improving the comparability of noise reduction algorithms. Other areas of application could be the optimization of parameters in a noise reduction algorithm, as well as the optimization of the geometric positioning of the microphones.
Paper 114: Experimental Results of a Multi-Channel Speech Dereverberation Algorithm based on a Statistical Model of Late Reverberation
In general, acoustic signals radiated within a room are linearly distorted by reflections from walls and other objects. These distortions degrade the fidelity and intelligibility of speech, and the recognition performance of automatic speech recognition systems. We have investigated the application of signal processing techniques to the improvement of the quality of speech distorted in an acoustic environment.
One important effect of reverberation on speech is overlap-masking, i.e. previous phonemes overlap following phonemes. In [1] a multi-channel speech dereverberation method based on Spectral Subtraction was introduced to reduce this effect. The described method estimates the instantaneous power spectrum of the reverberation based on Polack's statistical model of late reverberation.
In this paper we present experimental results obtained using signals measured in a real acoustic environment. Additionally we will propose a modified gain function to reduce noise. Preliminary results are available for listening on the following web page: http://www.sps.ele.tue.nl/members/e.a.p.habets/sps05/sps05.html.
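As a rough single-channel illustration of the spectral-subtraction approach described above (the paper itself uses a multi-channel method), the sketch below predicts the late-reverberation power spectrum from an exponential-decay model with decay rate delta = 3 ln(10)/T60, as in Polack's statistical model, and subtracts it. The frame size, frame delay and gain floor are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def dereverb_spectral_subtraction(x, fs, t60=0.5, frame=512, hop=256, delay_frames=4, floor=0.1):
    """Simplified single-channel sketch of spectral-subtraction dereverberation.
    Polack's model: reverberant energy decays as exp(-2*delta*t), delta = 3*ln(10)/T60,
    so the late-reverberation power spectrum is predicted from an earlier frame."""
    delta = 3.0 * np.log(10) / t60
    win = np.hanning(frame)
    n = 1 + (len(x) - frame) // hop
    X = np.array([np.fft.rfft(win * x[i*hop:i*hop+frame]) for i in range(n)])
    out = np.zeros(len(x))
    for i in range(n):
        if i >= delay_frames:
            late_psd = np.exp(-2 * delta * delay_frames * hop / fs) * np.abs(X[i - delay_frames])**2
        else:
            late_psd = np.zeros(X.shape[1])
        gain = np.sqrt(np.maximum(1 - late_psd / np.maximum(np.abs(X[i])**2, 1e-12), floor))
        out[i*hop:i*hop+frame] += win * np.fft.irfft(gain * X[i], frame)
    return out
```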
References
[1] E.A.P. Habets, "Multi-Channel Speech Dereverberation based on a Statistical Model of Late Reverberation," ICASSP 2005, accepted for publication, 2005.
Paper 115: Better Spectral Analysis by Improved Preprocessing
The blind source separation problem is generally handled by ICA. In this paper, ICA is used as a feature extractor. After centering and whitening, preprocessing by filtering gives better results; here, LPC coefficients are used for this preprocessing. Applying the cepstral coefficients derived from the LPC analysis to ICA as a preprocessing step yields a new signal processing approach. Since the LPC-derived coefficients are linear and independent, the mixing matrix remains the same; hence, LP cepstra prove to be good candidates for preprocessing in linear ICA. This gives much better results than MFCC and ICA used separately, both for word and for speaker recognition. Matlab was used for all comparisons, and the database used consists of samples from ISOLET.
Paper 116: On Design-Time Performance Predictions of Object-Based MPEG-4 Video Applications
Development costs and time-to-market of software-intensive media processing systems can be reduced when correct decisions are made at an early design phase. However, the real-time requirements imposed on these systems, such as frame skipping and latency limitations, can only be validated after system implementation, e.g. at the test phase. To avoid system redesign for ensuring real-time performance, a system performance analysis should be done as soon as possible. This paper addresses a scenario simulation approach for performance predictions already at the design phase. This approach is presented through a real case study, for which we developed an advanced MPEG-4 coding application. The benefits of the approach are threefold: a) high accuracy of the predicted performance data; b) an efficient real-time software-hardware implementation of a system, because the generic computational costs become known in advance, and c) improved ease of use because of a high abstraction level of modelling. Experiments showed that the prediction accuracy of the system performance is within 10%, while the prediction accuracy of the time-detailed processor usage (performance) does not exceed 30%. However, the real-time performance requirements are sometimes not met, e.g. when other applications require intensive memory usage, thereby imposing delays on the decoding data retrieval from memory.
Paper 117: Binaural noise reduction for hearing aids: Preserving interaural time delay cues
This paper presents a binaural extension of a monaural multi-channel noise reduction algorithm for hearing aids based on Wiener Filtering. This algorithm provides the hearing aid user with a binaural output. In addition to significantly suppressing the noise interference, the algorithm preserves the interaural time delay (ITD) cues of the received speech, thus allowing the user to correctly localize the speech source. Unfortunately, binaural multi-channel Wiener Filtering distorts the ITD cues of the noise source. By adding a parameter to the cost function the amount of noise reduction performed by the algorithm can be controlled, and traded off for the preservation of the noise ITD cues.
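A minimal sketch of the underlying trade-off, assuming estimated correlation matrices are available: a speech-distortion-weighted multichannel Wiener filter in which a parameter mu controls how aggressively noise is reduced. The paper's binaural algorithm adds such a parameter to the cost function to trade noise reduction against preservation of the noise ITD cues; the toy scenario below is purely illustrative.

```python
import numpy as np

def sdw_mwf(Ryy, Rnn, mu=1.0, ref=0):
    """Speech-distortion-weighted multichannel Wiener filter for the reference microphone.
    mu = 1 gives the standard MWF; decreasing mu keeps the output closer to the
    unprocessed reference (less noise reduction, better cue preservation)."""
    Rxx = Ryy - Rnn                      # estimated speech correlation matrix
    e = np.zeros(Ryy.shape[0]); e[ref] = 1.0
    return np.linalg.solve(Rxx + mu * Rnn, Rxx @ e)

# Toy example: 4 microphones, a rank-1 speech component plus noise.
rng = np.random.default_rng(0)
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # speech steering vector
Rxx = np.outer(a, a.conj())
Rnn = np.eye(4) + 0.1 * rng.standard_normal((4, 4)); Rnn = Rnn @ Rnn.conj().T
w = sdw_mwf(Rxx + Rnn, Rnn, mu=1.0)
print(w)
```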
Paper 118: IDDX measurement solutions - Demonstration
This is a proposal for a demonstration.
Q-Star Test - Innovation in Test & Measurement.
Q-Star Test is the prime provider of advanced high-speed, high-resolution IDDX test and measurement solutions, providing highly repeatable results that allow product quality to be improved while cutting down on test time and costs.
Our products, which will be put on display and whose operation will be demonstrated, provide a complete solution for deploying a supply-current-based test strategy for analog, digital and mixed-mode circuits. They successfully enable the application of current tests to deep-submicron and nano technologies, offering improved defect detection combined with reduced test cost and test escape risk, as well as improved defect localization and identification.
In addition, we provide Design-for-Test and test optimization consulting services as well as creative engineering solutions. With Q-Star Test, you get more than just technology. Behind our solutions stands an experienced team who know your problems, needs and expectations!
Paper 119: Current testing for nanotechnologies: a demystifying application perspective
This paper addresses the challenges imposed on current testing by the advent of nanotechnologies. It shows why existing measurement solutions embedded in ATE systems are not adequate to meet these challenges, and it illustrates that alternative add-on solutions are available that can not only overcome these challenges but also help to improve screening efficiency and reduce test effort, time and cost without compromising test or product quality, and that even offer ways to improve the latter. The paper also considers the application requirements imposed by advanced add-on IDDQ measurement solutions and illustrates their application and achievable benefits on the basis of a number of real-life case studies. Finally, conclusions are drawn and guidelines for the future are presented.
Paper 120: Split Time Warping For Improved Automatic Time Synchronization Of Speech
In this paper, we propose an improvement to the robustness of automatic time synchronization of speech. The basic synchronization system consists of two parts. In the first part the timing relationship between two speech utterances is measured using the well-known technique of dynamic time warping (DTW). In the second part, the original signal (source) is time scaled to synchronize it with the utterance that is used as the timing reference (reference). Both for DTW and for the time-scaling of speech, numerous solutions are already known to exist, but automatic time-synchronization often yielded distorted results in realistic situations. This paper presents a dedicated time warping algorithm that overcomes many of the original problems and achieves a more robust automatic synchronization.
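For reference, the first part of such a system can be as simple as the classical DTW recursion sketched below; this is the textbook algorithm, not the dedicated split time warping proposed in the paper. The returned path is the measured timing relationship that the time-scaling stage then applies.

```python
import numpy as np

def dtw_path(source_feats, ref_feats):
    """Classical dynamic time warping between two feature sequences (e.g. frames of
    spectral features). Returns the warping path, i.e. the timing relationship that a
    time-scaling stage would use to synchronize the source with the reference."""
    n, m = len(source_feats), len(ref_feats)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(source_feats[i - 1] - ref_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal alignment from the end to the start.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
    return path[::-1]
```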
Paper 121: On Delay in Parametric Audio Coding with Adaptive Segmentation
This study concerns the trade-off between delay and audio quality using adaptive (variable-length) segmentation in a sinusoidal coding environment. The length of the frame in which the segmentation is adapted determines the delay. For shorter frames, the advantage of adaptive segmentation is expected to be smaller because of the lower flexibility for segmentation. Experiments show, however, that above 20 ms the frame length does not have an important influence on the sound quality: effects are small for frame lengths between 20 and 50 ms, and above 50 ms they are negligible. This suggests that low delay is feasible for parametric coding with adaptive segmentation when used in, for example, speech communication applications.
Paper 122: Performance of MIMO OFDM systems in fading channels with additive TX and RX impairments
The performance of OFDM-based systems is seriously affected by imperfections in the system implementation. To gain a better understanding of the influence of these impairments on the performance of multiple-antenna OFDM systems, this paper studies a zero-forcing based MIMO OFDM system with imperfections modeled as additive noise sources in both the transmitter (TX) and the receiver (RX). Based on this model, expressions are derived for the probability of error of uncoded impaired MIMO systems in fading and non-fading environments. These results allow for an insightful comparison between the influence of TX and RX impairments. It is concluded that the influence of RX imperfections decreases with an increasing number of RX branches, while this is not the case for TX deficiencies.
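A Monte Carlo sketch of the additive impairment model, under illustrative assumptions (QPSK, i.i.d. Rayleigh channel, impairments specified as EVM levels, no coding): with zero-forcing detection the TX impairment noise passes straight through, while the RX-side noise is suppressed as receive branches are added, which is the effect summarized in the last sentence above.

```python
import numpy as np

def ber_zf_mimo(n_tx=2, n_rx=2, snr_db=20, tx_evm_db=-25, rx_evm_db=-25, n_sym=5000, seed=1):
    """Monte Carlo sketch: QPSK over an i.i.d. Rayleigh MIMO channel, TX and RX impairments
    modeled as additive Gaussian noise, zero-forcing detection. Parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    sigma_awgn = np.sqrt(10 ** (-snr_db / 10) / 2)
    sigma_tx = np.sqrt(10 ** (tx_evm_db / 10) / 2)
    sigma_rx_tot = np.sqrt(10 ** (rx_evm_db / 10) / 2 + sigma_awgn ** 2)
    errors = 0
    for _ in range(n_sym):
        bits = rng.integers(0, 2, (n_tx, 2))
        x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
        H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
        n_t = sigma_tx * (rng.standard_normal(n_tx) + 1j * rng.standard_normal(n_tx))
        n_r = sigma_rx_tot * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
        x_hat = np.linalg.pinv(H) @ (H @ (x + n_t) + n_r)   # zero-forcing: TX noise passes unchanged
        errors += np.sum((np.real(x_hat) > 0) != bits[:, 0]) + np.sum((np.imag(x_hat) > 0) != bits[:, 1])
    return errors / (2 * n_sym * n_tx)

# Extra RX branches suppress RX-side impairment noise, but not the TX-side noise.
print(ber_zf_mimo(n_rx=2), ber_zf_mimo(n_rx=4))
```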
Paper 123: Why to do without Model Driven Architecture in embedded system codesign?
Model-Driven Architecture (MDA) is an initiative by the Object Management Group (OMG) to define an approach to software development based on modeling and the automated mapping of models to implementations. The basic MDA pattern involves the definition of a platform-independent model (PIM) and its automated mapping to one or more platform-specific models (PSMs).
By defining different PIMs and PSMs dedicated to embedded systems, we will show the benefits of using the MDA approach in System-on-Chip codesign. From UML 2.0 profiles to SystemC or VHDL code, the same model transformation engine is used with different rules expressed in XML.
Paper 124: Active Noise Control in Frontages of Buildings
Active noise control is an interesting alternative for frontage insulation at low frequencies, where passive solutions are less efficient. The major objective of this research project is to consider, in an integrated way, the contribution of active insulation methods to decreasing the harmful effects of the noise caused by traffic growth (aircraft, railway, road).
Measurements in the vicinity of Liège Airport have shown that the main weak parts of the building insulation are the windows (including the frame), the roof, the shutter boxes (if any) and, of course, the air supply system intended to bring in fresh air, for example into the bedrooms. Our project is presently focused on ventilation ducts and applications of active systems in a shutter box. Concerning the air supply system, a prototype was designed with the requirement of being incorporated in a frontage.
Acoustical transducers are used, and the techniques involved are based on feedforward (FXLMS) algorithms implemented on a dSPACE system.
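For readers unfamiliar with the control scheme, a minimal single-channel filtered-x LMS sketch is given below; the path lengths, step size and tonal reference are illustrative assumptions, and the real system is implemented on dSPACE hardware rather than in Python.

```python
import numpy as np

def fxlms_anc(x, d, sec_path, sec_path_est, n_taps=64, mu=0.002):
    """Minimal single-channel filtered-x LMS sketch. x: reference (noise) signal,
    d: disturbance at the error microphone, sec_path: true secondary path (loudspeaker
    to error mic), sec_path_est: its model used to filter the reference."""
    w = np.zeros(n_taps)                           # adaptive control filter
    x_buf = np.zeros(n_taps)
    xf = np.convolve(x, sec_path_est)[:len(x)]     # filtered-x signal
    xf_buf = np.zeros(n_taps)
    y_buf = np.zeros(len(sec_path))                # control output awaiting secondary-path filtering
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.concatenate(([x[n]], x_buf[:-1]))
        y = w @ x_buf                              # anti-noise sample sent to the loudspeaker
        y_buf = np.concatenate(([y], y_buf[:-1]))
        e[n] = d[n] + sec_path @ y_buf             # residual at the error microphone
        xf_buf = np.concatenate(([xf[n]], xf_buf[:-1]))
        w -= mu * e[n] * xf_buf                    # FXLMS update
    return e

# Tonal, traffic-like noise reaching the error mic through a short primary path.
t = np.arange(8000) / 2000.0
x = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 160 * t)
d = np.convolve(x, [0.0, 0.9, 0.4])[:len(x)]
sec = np.array([0.0, 0.7, 0.3])                    # assumed secondary path and (perfect) estimate
e = fxlms_anc(x, d, sec, sec)
print(np.var(d[:1000]), np.var(e[-1000:]))         # residual power should drop after convergence
```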
This work has been performed in the framework of the ISACBAT project funded by the Walloon Region of Belgium.
Paper 125: Acoustic Echo Cancellation in the Presence of Continuous Double-Talk
The use of a double-talk detector in acoustic echo cancellation cannot improve the adaptive algorithm's performance if near-end noise is continuously present. We propose a new way of dealing with such a continuous double-talk situation, which may occur e.g. in an automatic gain adjustment application. If the microphone and loudspeaker signals are prefiltered with the inverse near-end signal model, a minimum-variance room impulse response estimate can be obtained. However, the near-end signal model is unknown and time-varying and has to be estimated concurrently with the room impulse response. We apply three different prediction error identification algorithms to this problem that were originally developed for adaptive feedback cancellation. Simulation results indicate that only the prediction error method based adaptive filtering algorithm applying row operations (PEM-AFROW) outperforms standard RLS or NLMS adaptive algorithms.
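A simplified sketch of the prefiltering idea, assuming (unlike PEM-AFROW, which estimates it on the fly) that the near-end AR model is known: both signals are filtered with A(q), the inverse of the near-end signal model 1/A(q), so that the continuously present near-end component is whitened before NLMS adaptation. All signal and model choices below are illustrative.

```python
import numpy as np

def prefiltered_nlms(u, y, a_near, n_taps=8, mu=0.5, eps=1e-6):
    """Echo-path identification under continuous near-end activity: loudspeaker signal u
    and microphone signal y are both prefiltered with A(q) (the inverse near-end model),
    which whitens the near-end component, before a standard NLMS update."""
    u_f = np.convolve(u, a_near)[:len(u)]
    y_f = np.convolve(y, a_near)[:len(y)]
    h = np.zeros(n_taps)
    buf = np.zeros(n_taps)
    for n in range(len(u_f)):
        buf = np.concatenate(([u_f[n]], buf[:-1]))
        e = y_f[n] - h @ buf
        h += mu * e * buf / (buf @ buf + eps)
    return h

# Toy scenario: white loudspeaker signal, a short echo path, and a strongly coloured
# AR(2) near-end signal that is active all the time (continuous double-talk).
rng = np.random.default_rng(2)
room = np.array([0.6, -0.4, 0.2, 0.1])
a_near = np.array([1.0, -1.5, 0.7])                  # near-end signal model: 1 / A(q)
u = rng.standard_normal(20000)
w = 0.5 * rng.standard_normal(20000)
near = np.zeros(20000)
for n in range(2, 20000):                            # synthesize the AR(2) near-end signal
    near[n] = 1.5 * near[n - 1] - 0.7 * near[n - 2] + w[n]
y = np.convolve(u, room)[:len(u)] + near
h_hat = prefiltered_nlms(u, y, a_near, n_taps=8)
print(np.round(h_hat[:4], 2), "vs true path", room)  # estimate should approach the echo path
```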
Paper 126: Classifying white matter damage in non-compensated ultrasound images
In this article we present new results on the classification of neonatal white matter damage. One of the common diagnostic methods used in clinical practice today is the visual inspection of ultrasound images of the neonatal brain. Given the poor quality of ultrasound images and the different machine settings used in practice, the diagnosis depends strongly on the interpretation of the medical doctor and is subjective to some degree. In this paper we investigate whether the texture present in the ultrasound images could have prognostic implications for detecting affected tissue, and thus for creating semi-automatic tools to assist the experts. In contrast to our former experiments, we do not compensate for the machine settings, because this compensation is often machine dependent and quite tricky, since it requires guessing to some degree what goes on inside the ultrasound machine. We show that it is possible to obtain very high classification rates without this preprocessing, which is a great step forward in the quantitative analysis of these images.
Paper 127: Real Time Digital Emulation Of Copper Access Networks For VDSL Applications
The wide-scale deployment of fibre optic backbone networks is already a fact, yet, for the last mile(s) up to the subscriber, the reuse of current copper access networks is standard practice. This explains the success of Digital Subscriber Line technology (xDSL), which can support megabit data rates over the phone lines connecting nearly every home. Copper pair networks were designed for voice traffic with a limited bandwidth of 3.4 kHz, but HDSL, ADSL, T1, E1 and VDSL may exceed the voice bandwidth more than a thousandfold. The efficient exploitation of copper pair access networks at high bandwidth therefore presents a formidable challenge, and the capability of accurately analyzing and emulating specific access line cables and topologies has become quite important.
In this paper, a reconfigurable platform for the real-time digital emulation of copper access networks is presented. The module, using hard real-time DSP techniques on Xilinx Virtex II FPGAs, is capable of accurately reproducing the physical layer of an xDSL connection. The insertion and return loss of the access loop, consisting of cables of various lengths and characteristics, is digitally emulated over the full VDSL bandwidth in the instrument. The innovative character of the digital approach over conventional loop emulation techniques is explained and its performance is assessed using several test cases.
Paper 128: H.264 baseline profile codec on a multiprocessor dsp platform
The latest ITU-T/MPEG standard, H.264 (MPEG-4 AVC), is gaining popularity in several markets such as video surveillance, set-top boxes and video conferencing. This is mainly due to its ability to achieve high-quality video over IP links and to reduce storage requirements by 50% versus MPEG-2.
However, this latest compression codec requires more signal-processing computation than previous codecs. As a result, it has mostly been implemented with dedicated silicon solutions. Such dedicated chips offer little flexibility if specifications change; a redesign of the chip would prove costly and take many months. To overcome the performance limitations of a general-purpose DSP and provide a level of flexibility equal to that of a software-based solution, specialized multiprocessor systems-on-chip (SoCs) have entered the market. By combining multiple DSP engines on a single chip, algorithms can exploit the natural data parallelism inherent in video and imaging applications.
In this demonstration, an H.264 (MPEG-4 Part 10) full CIF, 30-frame/second baseline profile codec with a 256-kbit/s transmission rate, will be shown on a multiprocessor DSP platform.
Barco Silex, Barco's center of competence for micro-electronic design, has acquired worldwide expertise in advanced image and video processing applications such as digital motion pictures, surveillance, aerospace, printing, industrial and consumer electronics. The unique combination of high-end competence in image and video coding (JPEG, JPEG-LS, JPEG2000, MPEG-4, etc.) and custom design services (DSP/FPGA/ASIC/IP) provides optimal, tailored video processing implementations.
Paper 129: Array Interpolation applied to Conformal Array STAP
Space-time adaptive processing (STAP) is a well-suited technique to detect slow-moving targets with a moving radar in the presence of a strong interference background. Much of the work on STAP has been done for uniform linear antenna arrays (ULAs). However, there is a growing interest in doing STAP with more general antennas and, in particular, with conformal antenna arrays (CAAs) that match the surface of the carrying platform. In the case of a sidelooking (SL) ULA, the filter used to mitigate the interference is generally estimated by averaging data snapshots at neighboring ranges. However, to have an optimal estimate, these snapshots have to be independent and identically distributed (iid). If a CAA or a non-SL ULA is used instead of an SL ULA, applying this estimator degrades the detection performance due to the range dependence of the snapshots at neighboring ranges; these snapshots are in fact not iid. In this paper, we propose an enhanced array interpolation technique combined with a range-dependence compensation technique which allows STAP algorithms to be applied to CAAs. We apply our technique to a circular-arc array and we evaluate the performance of the proposed algorithm. We show that near-optimum performance can be achieved.
Paper 131: DSP Verification Using Word Level Symbolic Simulation
In this paper, a novel DSP validation approach which uses word-level symbolic simulation is presented. A word-level data flow graph is created from a description of the DSP core in synthesizable VHDL. This data flow graph contains arithmetic, relational and Boolean operators. Afterwards, a set of linear integer equations is generated from the data flow graph. A set of sequence-based properties is also defined for verifying the correctness of the DSP processor. These properties are mapped onto the word-level data flow graph in order to be checked against the linear integer equations generated from the design. Symbolic simulation is used to verify the design using these properties.
Paper 132: On the accuracy of EERs in face recognition and the importance of reliable registration
In face recognition the recognition performance is often quantified in terms of the equal error rate (EER). An indication of the reliability of these numbers is seldom given, though they depend strongly on the number of tests and training/test data. A single error rate without an estimate of its reliability is hardly informative. We show that the standard deviation is a suitable measure for the reliability of the EER in face recognition systems.
The second topic of this contribution is the registration (alignment) prior to feature extraction. We show that the recognition performance is very sensitive to proper registration by performing recognition tests while the registration is disturbed with noise. Alignment on many points is much more robust and also performs considerably better than alignment on only 2 points. Using a shape-free patch combined with the shape information outperforms those two methods for low noise levels, but is less robust to noise than simple alignment on 20 points.
Paper 133: Design of application-specific instruction-set processors for multi-media, using a retargetable compilation flow
Chess/Checkers is a retargetable tool-suite for the design of application- specific instruction-set processors (ASIPs), offered by Target Compiler Technologies. The tool-suite includes an optimising C compiler, assembler/disassembler, linker, instruction-set simulator generator, RTL hardware generator, test-program generator, and on-chip debugging infrastructure. Using the nML processor description language and Chess/Checkers' architectural exploration capabilities, optimised ASIPs can be designed quickly in conjunction with their embedded software. In the past, Chess/Checkers has been used successfully to design ASIPs in products like hearing instruments, MP3 players, car radio, xDSL modems, and 3G phones.
In this presentation, we will demonstrate how the concept of designing ASIPs with a retargetable tool-suite can be extended to the domain of video and image processing, both for mobile and home multi-media systems. To meet the high data-rate requirements for pixel processing, without burning the power of general purpose VLIW machines, specialised ASIPs can be designed that offer dedicated parallel functional units with single-instruction, multiple-data (SIMD) processing capabilities. These SIMD ASIPs use vector registers and memories, which can be supported well by the Chess/Checkers tools.
The effectiveness of the design methodology and tools will be illustrated by means of a reference design, implementing critical functions of MPEG4 video coding.
Paper 134: A New Transceiver Design for Ultra Wideband Impulse Radio Systems
A new transceiver design for M-ary bi-orthogonal modulation scheme is proposed for reducing the complexity caused by the large number of correlation functions at the receiver. M-ary bi-orthogonal modulation scheme for ultra wideband impulse radio systems provides high multiple access performance for high data rate transmissions. Large values of the modulation index increase the receiver complexity. The proposed receiver, based on a bank of RAKE units, reduces the number of correlation functions by means of a new definition of template signals. The results illustrate that the reduced complexity RAKE receiver outperforms the conventional RAKE receiver for low values of correlation operations and high modulation levels.
Paper 136: Relevant Feature Selection for Seizure Detection in Neonates
Automatic seizure detection has attracted interest as a method to obtain valuable information concerning the duration and timing of seizures. Methods currently in use to detect seizures in the EEG of adults are not well suited for the same task in neonates, because they lack information about specific age-dependent features of normal and pathological EEG and seizures. This paper seeks feature selection methods which improve the accuracy of conventional seizure detection systems in neonates. Three feature selection techniques, correlation-based, distance-based, and similarity-based, are applied to parameterized EEG data acquired from 6 neonates aged between 39 and 42 weeks. The effectiveness of these methods was compared with the detector performance in terms of the average detection rate of seizure and non-seizure segments. The ranked feature subsets were then fed into a multilayer neural network used as a detector. Distance-based feature selection turned out to be the best of the three methods and yielded an average seizure detection rate of 91%, an average non-seizure detection rate of 93%, an average false rejection rate of 93% and an average seizure detection rate of 92% using an optimal feature subset size of 30. This method allows a feature reduction of up to 75%.
Paper 137: FPGA Implementation of Finite Interval CMA
An FPGA implementation of the Finite Interval Constant Modulus Algorithm (FI-CMA) using Virtex-E and Virtex-II devices is presented in the paper. The algorithm consists of two parts: the first performs a batch QR decomposition of the data matrix, and the second performs an iterative equalizer optimization using the columns of the Q-matrix as input. Resource reuse and minimization of the total latency have been emphasized in the design. For the floating-point calculations required to achieve sufficient accuracy, a Logarithmic Number System (LNS) based library has been used.
Paper 138: Time Segmentation Algorithms for the Time-Varying MDCT
In this paper, several time segmentation algorithms for the time-varying MDCT are discussed and compared. A time-varying MDCT is employed in an audio coding system. Time segmentation optimization procedures based on fast tree pruning and dynamic programming are investigated. MDCT windows having fixed and variable overlapping tails are considered, and both entropy and rate-distortion based cost functions are applied. Experimental results in the form of SNR curves are presented. The obtained results show a clear trade-off between performance and computational complexity over a large range of bit rates, with a performance gap of 3 dB between low- and high-complexity systems.
Paper 139: Mobile High-resolution Direction Finding of multipath radio waves in azimuth and elevation
This paper presents a novel direction-finding (DF) system based on a 3-axis crossed antenna array in combination with three-dimensional Unitary ESPRIT. The application and design considerations are discussed and the implemented antenna array is shown. The 3-D Unitary ESPRIT algorithm is modified to solve a rank-deficiency problem in order to obtain better estimation results. The improved resolution capabilities are addressed and simulation results are presented that show the excellent DF capabilities of the new system.
Paper 141: Signal Processing Functions, Algorithms and Smurfs: the Need for Declarativity
The gap between modelling techniques for DSP functionality and those for software implementations is widening. This impedes unifying formalisms for analog, digital and software systems. Recovering these opportunities requires declarativity.
A suitable formalism is outlined, based on a mathematical rather than a programming language. Examples show how it unifies continuous and discrete mathematics, from analysis to formal program semantics. The formalism provides crucial advantages in reasoning by calculation about all aspects of SP and paves the way for software tools of the next generation.
Paper 142: Critically subsampled filterbanks implementing Reed-Solomon codes: An algebraic point of view
Filterbanks have long been known to be a powerful tool for image and audio processing. Their importance has also been recognized in communication systems: many modulation schemes, including OFDM, DMT, TDMA, and CDMA, can actually be viewed as filterbanks that build input diversity (add redundancy) at the transmitter. Recently, we unveiled a filterbank structure behind Reed-Solomon codes, showing that there also exists a strong relationship between filterbanks and error correcting codes. Using this filterbank decomposition, an RS code is broken into many smaller subcodes. This decomposition can be used to build a Soft-In Soft-Out (SISO) RS decoder. In addition, these filterbanks can be applied to e.g. CDMA, where the spreading and error correcting codes are jointly designed such that the overall code is an RS code.
A limitation of this previous work is that it is only applicable to RS codes where the codeword and dataword length are not coprime. In this paper, this constraint is eliminated. A purely algebraic method is presented to construct a filterbank decomposition for any RS code, as long as a subfield exists in the Galois field in which the RS code operates. This method gives a lot of insight into the algebraic structure of RS codes and their corresponding filterbanks.
Paper 143: Adaptive feedback cancellation in hearing aids based on the autoregressive modelling of the desired signal
In this paper, we propose adaptive feedback cancellation techniques that are based on direct closed-loop identification of the feedback path as well as the (autoregressive) modelling of the desired signal, with the aim of providing an unbiased feedback path estimate for unknown and time-varying desired signals. Simulation results demonstrate that the proposed techniques outperform the standard continuous-adaptation algorithm and the filtered-X algorithm that uses a fixed estimate of the average desired signal spectrum.
Paper 144: MMSE Estimation of Basis Expansion Models for Rapidly Time-Varying Channels
In this paper, we propose an estimation technique for rapidly time-varying channels. We approximate the time-varying channel using the basis expansion model (BEM). The BEM coefficients of the channel are needed to design channel equalizers. We rely on pilot symbol assisted modulation (PSAM) to estimate the channel (or the BEM coefficients of the channel). We first derive the optimal minimum mean-square error (MMSE) interpolation based channel estimation technique. We then derive the BEM channel estimation, where only the BEM coefficients are estimated. We consider a BEM with a critically sampled Doppler spectrum, as well as a BEM with an oversampled Doppler spectrum. It has been shown that, while the former suffers from an error floor due to a modeling error, the latter is sensitive to noise. A robust channel estimate can then be obtained by combining the MMSE interpolation based channel estimation and the BEM channel estimation technique. Through computer simulations, it is shown that the resulting algorithm provides a significant gain when an oversampled Doppler spectrum is used (an oversampling rate equal to 2 appears to be sufficient), while only a slight improvement is obtained when the critically sampled Doppler spectrum is used.
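To illustrate what the BEM approximation does, the sketch below fits complex-exponential basis functions to one synthetic time-varying channel tap by least squares, for a critically sampled (K = 1) and an oversampled (K = 2) Doppler spectrum. Block length, Doppler spread and the number of basis functions are illustrative choices, and the fit is noise-free, so it only exposes the modelling error, not the noise sensitivity discussed in the abstract.

```python
import numpy as np

def bem_basis(n_samples, q_max, oversample=1):
    """Complex-exponential BEM basis: columns exp(j*2*pi*q*n/(K*N)) for q = -Q..Q,
    where K is the oversampling factor (K = 1: critically sampled Doppler spectrum)."""
    n = np.arange(n_samples)[:, None]
    q = np.arange(-q_max, q_max + 1)[None, :]
    return np.exp(1j * 2 * np.pi * q * n / (oversample * n_samples))

def bem_fit(h, q_max, oversample=1):
    """Least-squares fit of the BEM coefficients to one time-varying channel tap h[n]."""
    B = bem_basis(len(h), q_max, oversample)
    c, *_ = np.linalg.lstsq(B, h, rcond=None)
    return B @ c                                  # BEM approximation of the tap

# Toy Rayleigh-like tap: a sum of Doppler-shifted paths over one block of N samples.
rng = np.random.default_rng(3)
N, f_dmax = 200, 0.004                            # normalized maximum Doppler frequency
n = np.arange(N)
h = sum(rng.standard_normal() * np.exp(1j * (2 * np.pi * f_dmax * np.cos(th) * n + ph))
        for th, ph in zip(rng.uniform(0, np.pi, 8), rng.uniform(0, 2 * np.pi, 8)))
for K in (1, 2):                                  # critically sampled vs oversampled BEM
    err = np.mean(np.abs(h - bem_fit(h, q_max=2, oversample=K))**2) / np.mean(np.abs(h)**2)
    print(f"K={K}: normalized modelling error {err:.2e}")
```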
Paper 145: Frequency offset and channel parameter estimation for a multi-user DS-CDMA system using the code-aware SAGE algorithm
We consider the problem of joint multi-user detection and channel parameter estimation in a space-time bit-interleaved coded modulation (ST-BICM) scheme for an asynchronous DS-CDMA uplink transmission over frequency-selective channels. The performance of standard coherent detectors relies on the availability of accurate estimates of the channel parameters and Doppler shifts. Conventionally, these are estimated using pilot symbols in the burst, a technique that reduces both the energy and the bandwidth efficiency. We derive an iterative estimation technique, based on the SAGE algorithm, that combines pilot symbols and information from the detector in an elegant and efficient manner. We show through computer simulations that the proposed receiver considerably outperforms conventional channel estimation schemes using the same number of pilot symbols.
Paper 146: Recognition Of Isolated Digits Using A Liquid State Machine
The Liquid State Machine (LSM) is a recently developed computational model with interesting properties. It can be used for pattern classification, function approximation and other complex tasks. Contrary to most common computational models, the LSM does not require information to be stored in some stable state of the system: the inherent dynamics of the system are used by a memoryless readout function to compute the output. We apply this framework to the practical task of isolated word speech recognition. We investigate two different speech front ends and different ways of coding the inputs into spike trains. The robustness against noise added to the speech is also briefly investigated. It turns out that a biologically realistic configuration of the LSM gives the best results, and that its performance rivals that of a state-of-the-art speech recognition system.
Paper 147: Evaluation of Spatial and Temporal Detection Algorithms for Interictal Epileptiform EEG Activity
The reliable detection of interictal epileptiform activity in the EEG (electroencephalogram) would be highly desirable, and many approaches have been proposed in the literature. However, it is difficult or even impossible to compare the detection performance reported for the different methods, because (i) the performance highly depends on the data set used, (ii) the performance depends on the labeling by an expert, but discrepancies between different experts exist, and (iii) some differences in the calculation of the evaluation measures exist.
This study evaluates different detection methods (temporal as well as spatial methods) on the same data set, in which the epileptiform spikes were labeled into two categories (definites and questionables) by two experts. We describe in detail how the performance measures sensitivity and selectivity were calculated. We also assessed the inter-reader agreement. We conclude that (i) a dichotomous labeling leads to a better agreement between the experts and better reflects the true nature of EEG spike detection, and (ii) for spike detection as such, a pure spatial approach is not sufficient, and a combination of temporal and spatial methods should be used.
Paper 150: Alleviating Memory Bottlenecks by Software-controlled Data Transfers in a Data-Parallel Wavelet Transform on a Multicore DSP
Users expect future hand-held devices to provide extended multimedia functionality with minimal battery drain. This type of application imposes heavy constraints on both performance and energy consumption, which can be met using a parallel implementation. An efficient parallelization should pay attention to the memory subsystem, which can become a severe bottleneck in a multiprocessor environment. In this paper we present a mapping of the wavelet transform to ADI's dual-core Blackfin 561. We perform a coarse data-level split of a previous implementation on a single-core Blackfin 533 with cache. Unfortunately, doing so we observe only a marginal speed-up due to memory access inefficiencies. This problem is addressed in three steps: first, we replace the local caches by a software-controlled scratchpad memory filled using DMA transfers. Next, we let one core schedule these transfers for both cores. Finally, we modify the transfers to avoid page misses to SDRAM. Together these steps result in a 1.86x speed-up over the single-core version, or equivalently a 40% power saving at the same performance, showing that efficient data-level parallelization on a multi-core DSP is possible when attention is paid to the needs of the memory subsystem.
Paper 151: Power Consumption Study of CoolFlux DSP, an Embedded Ultra Low Power C-Programmable DSP Core
CoolFlux DSP is a new licensable embedded DSP core from Philips designed for audio products, such as headsets, hearing aid devices and portable audio players. Design goals for the core were ultra low power consumption, a small core size and a small memory footprint. CoolFlux DSP is programmable in ANSI-C with a highly optimizing and very efficient C compiler, whose results are comparable to handcrafted assembly code, in terms of code size and clock cycles.
The hardware architecture of the CoolFlux DSP comprises a dual Harvard memory architecture, full 24/56-bit data paths, two 24x24-bit multipliers and 56-bit accumulators. The gate count of the core is 43k in a 0.18 µm CMOS process, and the maximum clock frequency is 135 MHz (WCCOM).
In this paper, we study the power consumption distribution over the different elements of the CoolFlux DSP (i.e. core blocks and memories) for a set of applications. For each of these applications, the power consumption is estimated using a pre-layout netlist. Using these results, the design decisions that were taken during the design of the processor are validated. This analysis has identified improvement opportunities for the next generations of the CoolFlux DSP.
Paper 152: Speaker Adaptation by Maximum Likelihood Linear Regression With Application to Computer Aided Learning
This paper presents an implementation of the acoustic adaptation of continuous density hidden Markov models using Maximum Likelihood Linear Regression (MLLR). We present a possible solution for the problem of updating Gaussians that are shared across multiple states. We use a tree-based partitioning of the different mixture components in regression classes. Evaluation is done on the CoGeN (Corpus Gesproken Nederlands) dataset and on a dyslexia reading diagnosis test for children (logopaedic data).
Paper 153: A Study of the Distribution of Time-Domain Speech Samples and Discrete Fourier Coefficients
We study the distribution of time-domain speech samples as well as the distribution of Discrete Fourier Transform (DFT) coefficients obtained from speech segments. We consider four possible pdf model types, namely Gaussian, Laplacian, Gamma, and a Generalized Gaussian density (GGD). Our time-domain results suggest that for segment lengths of 20-200 ms, the Laplacian density is the better choice, while for shorter segments the Gaussian model is more appropriate. For segments of 20 ms, the Gaussian model is advantageous for broad speech classes of fricatives, nasals and glides, but stop sounds are much better represented with the Laplacian model, making the latter model better on average. Finally, our study supports the often made assumption that DFT coefficients collected within short time intervals can be considered Gaussian distributed, across all types of speech sounds.
Paper 154: Improving Parallelism of a Hardware Friendly, Scalable Wavelet Entropy Codec
In the RESUME project we explore the use of reconfigurable hardware for the design of portable multimedia systems by developing a scalable wavelet-based video codec. A scalable video codec provides the ability to produce a smaller video stream with reduced frame rate, resolution or image quality, starting from the original encoded video stream, with almost no additional computation. For this codec we developed a new, hardware-friendly wavelet entropy codec, which is further optimized in this article. It was possible to increase the parallelism and reduce the memory footprint by a factor of three with only a minor decrease in compression ratio.
Paper 155: Improved Subspace Based Speech Enhancement Using an Adaptive Time Segmentation
Subspace based speech enhancement relies on the decomposition of the vector space spanned by the covariance matrix of the noisy speech into a noise subspace and a signal subspace, where the noise subspace is nulled and the signal subspace is modified by applying a gain function. This gain function is determined by the eigenvalues of the noise and noisy speech covariance matrices, which are typically estimated from the noisy data using a fixed segmentation. A fixed segmentation often leads to covariance matrix estimates with an unnecessarily high variance or a bias, because segments are respectively shorter or longer than the region where the noisy data is stationary. To overcome this problem we present an adaptive time-segmentation algorithm combined with subspace based speech enhancement. As a result, smearing of speech sounds and musical noise in the enhanced speech signal are reduced. Experiments show improvements of 0.6 dB in segmental SNR, as well as improvements in the symmetrical Itakura-Saito distortion measure, over the use of a fixed segmentation.
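For orientation, a toy version of the subspace gain itself is sketched below under a white-noise assumption with known variance; the paper's contribution lies in estimating the required covariance information over adaptively chosen segments rather than the fixed batch assumed here, and the gain rule shown is one simple choice among several.

```python
import numpy as np

def subspace_enhance_frame(noisy_vecs, noise_var, mu=2.0):
    """Toy signal-subspace enhancement for a batch of length-K noisy speech vectors,
    assuming white noise of known variance. Eigenvalues of the noisy covariance at or
    below the noise level define the noise subspace (nulled); the remaining components
    are weighted with a simple distortion-controlled gain."""
    R_noisy = noisy_vecs.T @ noisy_vecs / noisy_vecs.shape[0]
    eigval, eigvec = np.linalg.eigh(R_noisy)
    speech_eig = np.maximum(eigval - noise_var, 0.0)      # estimated clean-speech eigenvalues
    gain = speech_eig / (speech_eig + mu * noise_var)     # zero in the noise subspace
    H = eigvec @ np.diag(gain) @ eigvec.T                 # enhancement filter
    return noisy_vecs @ H.T

# Usage sketch with synthetic rank-1 "speech-like" vectors plus white noise.
rng = np.random.default_rng(4)
clean = np.sin(0.3 * np.arange(500))[:, None] * np.ones((1, 20))
noisy = clean + 0.5 * rng.standard_normal(clean.shape)
enhanced = subspace_enhance_frame(noisy, noise_var=0.25)
print(np.mean((noisy - clean) ** 2), np.mean((enhanced - clean) ** 2))
```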
Paper 156: Improved Passive Reconstruction Of Wavelet-coded Images Through Directional Image Correlation
In this paper, we present a novel passive error concealment algorithm for the reconstruction of wavelet-coded images which are damaged due to packet loss. The proposed interpolation scheme calculates a lost coefficient from its neighbours while adapting the interpolation weights to the image correlation in each direction. All subbands are processed independently, which allows a fast, parallel execution. This is interesting for real-time video applications such as two-way video communication.
We also propose a new packetisation scheme to spread neighbouring coefficients over different packets, which makes the quality of all restored images more equal, so there is less fluctuation from frame to frame. At a loss rate of 25% of the coefficients, the minimal quality (PSNR) of our restored images is circa 0.30 dB (up to 0.8 dB) higher than the minimal quality obtained with older restoration algorithms of the same complexity. For low loss rates, the gain in quality is even higher (up to 4 dB).
Paper 157: Flat maps rendering the complete outer surface of the brain: their reconstruction, their relevance and possibilities for visualization and study of brain anatomy, their usage for automatic recognition
Automatic recognition of the cortical sulci is an often investigated subject in the field of pattern recognition. In this paper we present and analyse a method for the identification of the sulci in a flat representation of the human brain cortex, which captures the cortical structure in one single image. The sulci are represented as two-dimensional linear structures which are decomposed into building blocks represented by short line segments. In the course of identification, the line segments are labelled according to the geometrical information extracted from the images, called flat maps. The technique is based on a statistical model describing spatial and contextual information of the structures of interest. The Ising model is used to extend the characteristics by modelling the local dependencies between building blocks, described by labels which represent the sulcus types. The paper emphasises the Ising model and its application to sulcus identification, and discusses the results of experiments and the benefits brought by the model in terms of statistical evaluation and runtime performance.
Paper 158: Trajectory Clustering Using Longer Length Units for Automatic Speech Recognition
One of the major deficiencies of conventional hidden Markov modelling (HMM) is known as the trajectory folding phenomenon. Multipath models can eliminate the trajectory folding problem by relying on the fact that the variation in the acoustic data can be classified and then modelled separately. In this paper, we present an approach based on Trajectory Clustering (TC) to automatically cluster multi-dimensional dynamic trajectories corresponding to speech data. We define multipath HMM topologies using the trajectory clusters found. The trajectories are described using Gaussian mixtures. Based on the hypothesis that the variation of speech is more systematic at the level of a segmental unit longer than a phone, we used modelling units defined in terms of Head-Body-Tail (HBT) models. Comparison experiments were carried out in the context of connected digit recognition for Dutch. We find that TC-based multipath HMM topologies outperform HMM topologies based on prior knowledge. These results suggest that Trajectory Clustering can identify important differences in pronunciation variation.
Paper 159: Refined Noise-Robust Motion Estimation For Noise Reduction in Video Sequences
In this paper, we propose a novel noise-robust motion estimation scheme in the wavelet domain for video denoising. We consider noisy image sequences with additive white Gaussian noise. The proposed method performs recursive motion estimation on the noisy sequence. Subsequently, recursive temporal filtering is performed along the estimated motion trajectory, after which adaptive spatial denoising is applied. To optimize the performance of the proposed denoising scheme, we adapt the method to the currently estimated noise variance.
The complete framework proposed in this paper operates in the wavelet domain. We use a nearly shift-invariant non-decimated wavelet transform, which enables accurate motion estimation and compensation. We have tested our algorithm on different grey-scale image sequences with white Gaussian noise of various variances. The results demonstrate that the proposed spatio-temporal filter outperforms other state-of-the-art filters both in terms of PSNR and visually.
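A minimal sketch of the recursive temporal filtering step is given below, assuming a user-supplied motion_compensate function that warps the previously filtered frame onto the current one along the estimated motion trajectory. The noise-adaptive blending weight is an illustrative choice made here, not the paper's filter.

import numpy as np

def temporal_filter(frames, motion_compensate, noise_var, alpha_max=0.9):
    """Motion-compensated recursive temporal filtering (illustrative sketch).
    'motion_compensate(prev_filtered, cur_frame)' is assumed to warp the
    previously filtered frame onto the current frame; the blending weight
    grows with the estimated noise variance so that noisier sequences are
    averaged more strongly over time."""
    filtered = frames[0].astype(float)
    out = [filtered]
    for frame in frames[1:]:
        frame = frame.astype(float)
        predicted = motion_compensate(filtered, frame)
        # Larger noise variance -> more weight on the temporal prediction.
        residual_var = np.var(frame - predicted)
        alpha = alpha_max * noise_var / (noise_var + residual_var + 1e-12)
        filtered = alpha * predicted + (1.0 - alpha) * frame
        out.append(filtered)
    return out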
Paper 160: On the reconstruction of strip-shredded documents
Until recently, the forensic reconstruction of shredded documents was dismissed as an unsolvable problem. Manual reassembly of the physical remnants can always be considered, but for large numbers of shreds this quickly becomes an intractable task. In this paper we propose and discuss several image processing techniques that enable the reconstruction of strip-shredded documents stored within a database of digital images. We discuss the use of feature-based matching and grouping methods for classifying the initial database of shreds, and the subsequent procedure for computing more accurate pairing results for the obtained classes of shreds. Finally, we briefly discuss the actual reassembly of the different shreds on a common image canvas, and point out some possibilities for further research.
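As a toy illustration of pairwise matching and reassembly, the sketch below scores candidate neighbours by comparing the pixel columns along the touching borders of two strips and then orders the strips greedily from left to right. The cost function and the greedy strategy are assumptions for illustration only; the paper relies on feature-based matching and grouping before pairing.

import numpy as np

def right_left_cost(strip_a, strip_b):
    """Dissimilarity between the right border of strip_a and the left border
    of strip_b (lower means the strips are more likely adjacent)."""
    return float(np.mean((strip_a[:, -1].astype(float) - strip_b[:, 0].astype(float)) ** 2))

def greedy_reassemble(strips):
    """Greedy left-to-right ordering of strip images (equal-height H x W arrays):
    repeatedly append the strip whose left border best matches the right border
    of the current partial reconstruction."""
    remaining = list(range(len(strips)))
    order = [remaining.pop(0)]                     # arbitrary starting strip
    while remaining:
        last = strips[order[-1]]
        best = min(remaining, key=lambda i: right_left_cost(last, strips[i]))
        order.append(best)
        remaining.remove(best)
    return np.hstack([strips[i] for i in order]), order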
Paper 161: Stability of energy-aware radio link control with imperfect path loss knowledge
Recently, cross-layer energy-aware radio link control has been applied to OFDM-based WLAN transceivers. It has been shown that this radio link control outperforms existing power management techniques in managing the trade-off between data rate and energy consumption. In indoor environments, mobile terminals provide high throughput under low-mobility conditions. The low-mobility constraint has been exploited by the cross-layer power management control: the average path loss is assumed to be constant over a relatively long time. However, in a real communication system the path loss is still subject to variation, for instance when the position of one or both communicating nodes changes. In all cases, the path loss is unknown a priori. In this paper, we study the impact of imperfect knowledge of the path loss on the stability and efficiency of the cross-layer energy-aware radio link control. To improve this control, a novel adaptation policy is proposed, which reduces the system power consumption by a factor of 2 compared to the non-adaptive version.
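The following sketch shows, under stated assumptions, the kind of adaptation one might apply when the path loss is only imperfectly known: an exponential moving average tracks per-packet path loss measurements, and the transmit power is chosen just high enough to close the link. All names, thresholds and the control rule are illustrative, not the policy proposed in the paper.

def update_path_loss_estimate(pl_est_db, tx_power_dbm, rx_power_dbm, alpha=0.1):
    """Track the average path loss from per-packet measurements with an
    exponential moving average (illustrative adaptation rule)."""
    measured_pl_db = tx_power_dbm - rx_power_dbm
    return (1.0 - alpha) * pl_est_db + alpha * measured_pl_db

def choose_tx_power(pl_est_db, rx_sensitivity_dbm=-70.0, margin_db=3.0,
                    p_min_dbm=0.0, p_max_dbm=20.0):
    """Pick the lowest transmit power that still closes the link for the
    current path loss estimate, within the transceiver's power range."""
    needed = rx_sensitivity_dbm + pl_est_db + margin_db
    return min(max(needed, p_min_dbm), p_max_dbm)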
Paper 164: A priori SNR estimation for speech enhancement using Kalman filter
In this paper we propose an a priori SNR estimation scheme based on Kalman filtering. In our previous work we proposed a Kalman filtering method for estimating the power spectral density (PSD) of nonstationary noise given a noisy speech signal. The Kalman filter developed for the noise PSD estimation problem also yields an estimate of the clean speech PSD, which we use in this paper for a priori SNR estimation. The proposed method is a recursive estimation scheme, which makes it well suited for real-time implementation. The method can be combined with any speech enhancement algorithm that requires an a priori SNR estimate.
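A minimal per-frequency-bin sketch of the idea, assuming the noise PSD estimate is already available: a scalar random-walk Kalman filter tracks the clean-speech PSD in each bin, and the a priori SNR follows as its ratio to the noise PSD. The state model and the noise parameters q and r are illustrative assumptions, not the filter derived in the paper (which also estimates the nonstationary noise PSD itself).

import numpy as np

def a_priori_snr(noisy_psd_frames, noise_psd, q=0.05, r=0.5):
    """Per-bin a priori SNR estimation with a scalar random-walk Kalman filter
    tracking the clean-speech PSD (illustrative sketch).
    noisy_psd_frames: (n_frames, n_bins) periodogram of the noisy speech.
    noise_psd:        (n_bins,) estimate of the noise PSD."""
    n_frames, n_bins = noisy_psd_frames.shape
    x = np.maximum(noisy_psd_frames[0] - noise_psd, 1e-10)    # state: clean PSD
    p = np.ones(n_bins)                                       # state variance
    snr = np.empty((n_frames, n_bins))
    for t in range(n_frames):
        p = p + q                                             # predict (random walk)
        obs = np.maximum(noisy_psd_frames[t] - noise_psd, 0)  # noisy clean-PSD observation
        k = p / (p + r)                                       # Kalman gain
        x = np.maximum(x + k * (obs - x), 1e-10)              # update
        p = (1.0 - k) * p
        snr[t] = x / noise_psd                                # a priori SNR per bin
    return snr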
Paper 170: Optimization of the driving signal of an ultrasonic transducer using a genetic algorithm
A method for reducing the response time of an ultrasonic transducer by optimizing the driving signal is presented. The optimization is performed with all hardware in the optimization loop. The driving signal is divided into two parts: a part used to excite the transducer and a part for damping its vibration. The latter part is optimized using the Genetic Algorithm Toolbox of Matlab in combination with an arbitrary waveform generator, transducers and an oscilloscope, which are controlled by the Instrument Control Toolbox of Matlab. The results show that this is an effective way of reducing the response time of the transducers, with an average response time reduction of approximately 25%.
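The sketch below mimics the optimization loop in Python with a small hand-rolled genetic algorithm. Since the real setup evaluates candidates on hardware (waveform generator, transducer, oscilloscope), the fitness function here drives a second-order oscillator model instead; the oscillator parameters, GA operators and all names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
DT, F0, ZETA = 1e-6, 40e3, 0.02          # sample time, resonance frequency, damping ratio
W0 = 2 * np.pi * F0

def simulate(drive):
    """Second-order oscillator stand-in for the transducer; returns the
    displacement trace for a given driving signal (semi-implicit Euler)."""
    x = v = 0.0
    trace = np.empty(len(drive))
    for i, u in enumerate(drive):
        a = u - 2 * ZETA * W0 * v - W0 ** 2 * x
        v += a * DT
        x += v * DT
        trace[i] = x
    return trace

def fitness(damping_segment, excitation, tail=400):
    """Residual vibration energy after the excitation plus candidate damping
    part; with real hardware this would be measured on the oscilloscope."""
    drive = np.concatenate([excitation, damping_segment, np.zeros(tail)])
    trace = simulate(drive)
    return float(np.sum(trace[-tail:] ** 2))

def genetic_algorithm(excitation, seg_len=40, pop_size=30, generations=50, mutation=0.2):
    """Minimal GA: truncation selection, uniform crossover, Gaussian mutation."""
    pop = rng.uniform(-1, 1, (pop_size, seg_len))
    for _ in range(generations):
        scores = np.array([fitness(ind, excitation) for ind in pop])
        parents = pop[np.argsort(scores)[: pop_size // 2]]      # keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(seg_len) < 0.5                    # uniform crossover
            children.append(np.where(mask, a, b) + mutation * rng.normal(size=seg_len))
        pop = np.vstack([parents] + children)
    scores = np.array([fitness(ind, excitation) for ind in pop])
    return pop[scores.argmin()]

excitation = np.concatenate([np.ones(20), -np.ones(20)])        # simple push-pull burst
best_damping = genetic_algorithm(excitation)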
Paper 171: SVEN - Scaleable Video Engine for the DTV market
Video decoders have very demanding processing power and data bandwidth requirements, while their power dissipation needs to be very low. This paper describes a new video processor architecture, the "Scaleable Video Engine" (SVEN), the first processor on the market capable of handling high-definition, multi-standard-compliant video codec implementations on a fully programmable core.
Future video decoders inside set-top boxes will have to handle H.264 video compression to benefit from increased compression ratios, as well as MPEG-2 video compression to remain compliant with the already established standard. Due to divergent market interests, other standards such as Windows Media Version 9 are emerging and will gain market share. It is obvious that only a software-programmable solution can handle all of these decoding standards within a single core. Nevertheless, video decoders capable of processing high-definition formats are still hardwired for a simple reason: no DSP available on the market is powerful enough to handle an HD-compliant video decoder. DSP-based solutions would need additional hardware accelerators (e.g. a DCT hardware accelerator) and external modules (e.g. for entropy decoding) to deliver a suitable solution. The architecture presented here is a fully programmable processor that handles multi-standard video decoders with HD formats.
The internal video processor is based on a VLIW fixed-point arithmetic architecture consisting of parallel computing entities, which makes it well suited for parallel algorithms such as motion estimation and transform algorithms (DCT, integer transform, etc.). The processor has been extended with additional capabilities to interface with the video buffer through a scaleable memory controller. For transport stream processing, frame parsing, entropy decoding and other tasks of the video decoder, there is an additional front-end processing unit optimized for bit-stream processing. The entire solution is programmed and debugged with a single programming platform, which also includes a host controller that handles all processes of a complex SoC implementation.
This paper describes the architecture of SVEN, a low-power real-time video processor. Power and chip-area figures for standard-definition and high-definition decoder implementations are presented to further demonstrate the high efficiency of this core for video processing applications.
Paper 174: About IMEC
IMEC is a world-leading independent research center in nanoelectronics and nanotechnology. Its research focuses on the next generations of chips and systems, and on the enabling technologies for ambient intelligence. IMEC's research bridges the gap between fundamental research at universities and technology development in industry. Its unique balance of processing and system know-how, intellectual property portfolio, state-of-the-art infrastructure and its strong network of companies, universities and research institutes worldwide, position IMEC as a key partner with which to develop and improve technologies for future systems. IMEC is headquartered in Leuven, Belgium and has representatives in the US, China and Japan. Its staff of more than 1300 people includes over 400 industrial residents and guest researchers. In 2004, its estimated revenues were EUR 159 million. Further information on IMEC can be found at www.imec.be
Paper 175: Automatic Music Mixing Technology
Automatically sorting content based on similarity criteria and playing it in a smooth, rhythmically consistent way is an emerging technology; the latter procedure is referred to as AutoDJ. In AutoDJ, a database of songs is first analyzed to extract parameters that represent the rhythmical and perceptual properties of the content. A playlist generator is then applied to create a suitable playlist using the database of extracted parameters. Finally, the player compares the parameters of consecutive songs to determine the most suitable transition between them. The result is streamed to the output-rendering device (e.g. a loudspeaker) in a rhythmically consistent and smooth way.
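As a small illustration of the transition decision, the sketch below compares the tempi of the outgoing and incoming songs and plans a beat-aligned crossfade if the required tempo stretch is small enough. The thresholds and returned fields are illustrative assumptions, not the player's actual logic.

def plan_transition(bpm_out, bpm_in, crossfade_beats=8, max_stretch=0.08):
    """Plan a rhythmically consistent transition between two songs.
    Returns the playback-rate factor applied to the incoming song and the
    crossfade length in seconds, or None if the tempi are too far apart."""
    stretch = bpm_out / bpm_in                       # rate needed to match the outgoing tempo
    if abs(stretch - 1.0) > max_stretch:
        return None                                  # tempo gap too large: fall back to a plain fade
    seconds_per_beat = 60.0 / bpm_out
    return {"rate_in": stretch,
            "crossfade_seconds": crossfade_beats * seconds_per_beat}

# Example: mixing a 126 BPM track into a 122 BPM set.
print(plan_transition(bpm_out=122.0, bpm_in=126.0))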
Paper 176: About TELiNDUS
Telindus [Euronext Brussels: Telindus Group, ticker: TEL] is a group of companies offering network-based ICT solutions that meet business and public sector needs. Telindus serves these markets as a solution and sourcing partner, delivering secure multimedia network solutions underpinned by management and support services.
The Telindus R&D department is part of the Telindus Access Products (TAP) organisation, a division within Telindus NV. The mission of the Telindus Access Products division is to develop, produce, market, sell and support high-quality data communication access products and related services for the professional market. The R&D team specialises in the in-house development of modems and other access products (multiplexers, switches, routers) to cover all needs within the professional access market (modems, switching and general data access products, customer premises equipment, central office concentrators and routers).
Telindus is one of the pioneers of voice-band modem technology and one of the few companies worldwide with technological competence across the complete PSTN modem stack for fax and data transmission. In addition to Telindus' own family of voice-band modems, this technology also finds its way to the global market through a close collaboration with Analog Devices Inc.
Paper 177: Anisotropic diffusion filtering steered by Bayesian framework and a Laplacian prior for ideal image gradient
The anisotropic diffusion filter introduced by Perona and Malik can be steered using a variety of diffusivity functions. In this work, we examine the relationship between diffusivity functions that affect edges and the probability of edge presence under a marginal prior on the ideal, noise-free image gradient. In particular, we assume a Laplacian prior for the ideal gradient and analyze the explicit specification of the diffusivity function in terms of edge probabilities under the assumed prior. The obtained probabilistic diffusivity function belongs to the class of diffusivity functions that actively enhance edges while flattening weak ones and, unlike most of its counterparts, it does not require the optimization of free parameters. A study of the topology of the proposed function indicates that edges can be enhanced over a broad range of gradients, and experimental results support this assessment. Furthermore, our results offer a new and interesting interpretation of some widely used diffusivity functions, which can now be compared to edge-stopping functions under a marginal prior for the ideal image gradient.
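For context, a minimal Perona-Malik iteration is sketched below using the classical exponential diffusivity. That diffusivity is only a stand-in: the paper replaces it with a probabilistic function derived from edge probabilities under a Laplacian prior, which removes the free contrast parameter kappa used here.

import numpy as np

def anisotropic_diffusion(image, n_iter=20, kappa=15.0, step=0.2):
    """Perona-Malik diffusion on a 2-D image (4-neighbour discretization,
    zero-flux borders). The exponential diffusivity g is the classical
    choice, not the probabilistic function derived in the paper."""
    img = image.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)        # stand-in diffusivity
    for _ in range(n_iter):
        # Differences toward the four neighbours (zero at the image borders).
        grad_n = np.pad(img[:-1, :] - img[1:, :], ((1, 0), (0, 0)))
        grad_s = np.pad(img[1:, :] - img[:-1, :], ((0, 1), (0, 0)))
        grad_w = np.pad(img[:, :-1] - img[:, 1:], ((0, 0), (1, 0)))
        grad_e = np.pad(img[:, 1:] - img[:, :-1], ((0, 0), (0, 1)))
        img += step * (g(grad_n) * grad_n + g(grad_s) * grad_s +
                       g(grad_w) * grad_w + g(grad_e) * grad_e)
    return img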
Paper 178: Consistency Checks for Particle Filters with Application to Image Stabilization
An inconsistent particle filter produces, in a statistical sense, larger estimation errors than predicted by the model on which the filter is based. Inconsistent behaviour of a particle filter can be detected online by checking whether the predicted measurements (derived from the particles that represent the one-step-ahead prediction pdf) comply, in a statistical sense, with the observed measurements.
This principle is demonstrated in an image stabilization application. We consider an image sequence of a scene consisting of a dynamic foreground and a static background. The motion of the camera (slow rotations and zooming) is modelled with an 8-dimensional state vector describing a projective geometric transformation that, applied inversely to the current frame, compensates the camera motion. The dynamics of the state vector are modelled as a first-order AR process.
The measurements of the system are corner points (detected in the first frame using the Harris operator) that are tracked with the Lucas-Kanade method. These measurements are nonlinearly related to the state vector. The purpose of the particle filter is to estimate the state vector from the measurements. However, the filter behaves inconsistently because a few corner points belong to the foreground. Using the consistency checks, these foreground points are detected and removed from the list of measurements.
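A minimal sketch of such a consistency check: for each tracked corner point, the observed position is gated against the mean and spread of the measurements predicted by the particles, and points that fail the test are flagged for removal. The diagonal-covariance statistic and the threshold are illustrative assumptions rather than the paper's exact test.

import numpy as np

def consistent_measurements(pred_measurements, weights, observed, meas_var,
                            chi2_threshold=9.0):
    """Check, per corner point, whether the observed 2-D measurement lies
    within the spread of the measurements predicted by the particles.
    pred_measurements: (n_particles, n_points, 2) predicted corner positions
    weights:           (n_particles,) normalized particle weights
    observed:          (n_points, 2) tracked corner positions
    meas_var:          scalar measurement noise variance (pixels^2)
    Returns a boolean mask of measurements judged consistent."""
    mean = np.einsum('p,pnd->nd', weights, pred_measurements)        # predicted mean
    diff = pred_measurements - mean
    var = np.einsum('p,pnd->nd', weights, diff ** 2) + meas_var      # predicted variance (diagonal)
    nis = np.sum((observed - mean) ** 2 / var, axis=1)               # ~ chi^2 with 2 dof
    return nis < chi2_threshold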