# Advances in Nonlinear Speech Processing

The authors aim to improve classification rates for automatic emotion recognition in speech.

The authors refer to prior work in affective computing and indicate that they use several machine learning techniques as the basis for their work.

The authors' main idea is to use Feature Subset Selection (FSS) to remove irrelevant or redundant features and thereby improve classification rates.

The authors conducted experiments on the RekEmoZio database. The classifiers are Instance-Based Learning (IB), Decision Trees (ID3, C4.5), Naive Bayes (NB) and the Naive Bayesian Tree learner (NBT). Experiments were carried out with and without FSS in order to compare the classification rates obtained with and without feature selection.

The authors claim that classifiers combined with FSS can improve accuracy by more than 15%.

Cantu-Paz, E. (2002), 'Feature subset selection by estimation of distribution algorithms', , 303--310.

The authors want to determine whether EDAs offer advantages over other GA methods in terms of accuracy or speed on this problem.

They begin by referring to the work of [Inza et al., 2000] and present experiments with four evolutionary algorithms: a simple GA and three estimation of distribution algorithms, namely the compact GA [Harik et al., 1998], the extended compact GA [Harik et al., 1999] and the Bayesian Optimization Algorithm [Pelikan et al., 1999].
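To make the simplest of these EDAs concrete, here is a minimal sketch of a compact GA on a bit string, where each bit would indicate whether a feature is kept. It is an illustration only, not the implementation used in the paper: the function name `compact_ga`, its parameters, and the toy fitness (`sum`, i.e. count the ones) are assumptions. In a wrapper setting the fitness would instead be an estimate of classifier accuracy on the selected features.

```python
import random

def compact_ga(fitness, n_bits, virtual_pop=50, max_steps=20000, seed=0):
    """Compact GA sketch: evolve a probability vector instead of a population.

    counts[i] / virtual_pop is the probability that bit i is 1.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = [virtual_pop // 2] * n_bits  # start near probability 0.5
    for _ in range(max_steps):
        probs = [c / virtual_pop for c in counts]
        # Sample two competitors from the current probability vector.
        a = [1 if rng.random() < p else 0 for p in probs]
        b = [1 if rng.random() < p else 0 for p in probs]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Shift each disputed bit one step (1 / virtual_pop) toward the winner.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                step = 1 if winner[i] == 1 else -1
                counts[i] = min(virtual_pop, max(0, counts[i] + step))
        # Stop once every probability has converged to 0 or 1.
        if all(c in (0, virtual_pop) for c in counts):
            break
    return [1 if 2 * c >= virtual_pop else 0 for c in counts]

# Toy usage: maximise the number of ones (a stand-in for a real wrapper score).
solution = compact_ga(sum, n_bits=10)
print(solution)
```

The integer counters avoid floating-point drift when the probabilities approach 0 or 1; this is a design choice of the sketch, not something stated in the notes above.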

The classifier induced in the experiments was Naive Bayes (NB). The experiments used the C++ implementations of the ecGA and BOA that are distributed by their authors on the web. The sGA and Naive Bayes were coded in C++. The compiler was g++ 2.96 with -O2 optimizations. The experiments were executed on a single processor of a machine with dual 1.5 GHz Intel Xeon processors and 512 MB of memory. The operating system was Red Hat Linux 7.1. The data sets used in the experiments are described below.

The first four data sets are available in the UCI repository [Blake and Merz, 1998]. The definition of Random21 is taken from [Inza et al., 2000]. The authors used 5 iterations of 2-fold cross-validation (5x2cv) and the combined F test proposed by [Alpaydin, 1999].
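As a reminder of how the combined 5x2cv F test works, the sketch below computes its statistic from the ten per-fold differences in error rate between two algorithms. The function name `combined_5x2cv_f` and the sample numbers are made up for illustration; they are not results from the paper.

```python
def combined_5x2cv_f(diffs):
    """Combined 5x2cv F statistic.

    diffs: five (p1, p2) pairs, where p1 and p2 are the differences in
    error rate between the two algorithms on fold 1 and fold 2 of one
    replication of 2-fold cross-validation.
    """
    if len(diffs) != 5:
        raise ValueError("expected 5 replications of 2-fold cross-validation")
    squared_sum = 0.0   # sum of all ten squared differences
    variance_sum = 0.0  # sum of the five per-replication variance estimates
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
        squared_sum += p1 ** 2 + p2 ** 2
    if variance_sum == 0.0:
        raise ValueError("variance estimate is zero; statistic is undefined")
    return squared_sum / (2.0 * variance_sum)

# Hypothetical error-rate differences, for illustration only.
example = [(0.02, 0.04), (0.03, 0.01), (0.05, 0.02), (0.01, 0.03), (0.04, 0.02)]
f_stat = combined_5x2cv_f(example)
# Under the null hypothesis the statistic is approximately F-distributed with
# (10, 5) degrees of freedom; the 5% critical value is about 4.74.
print(f"f = {f_stat:.3f}")
```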

The authors state that they did not find any evidence to support or reject the use of the sophisticated model-building EAs for this problem. They note that in (preliminary) experiments the simple GA with smaller populations was much faster than the other algorithms and found feature subsets of similar quality. This result differs from that of [Inza et al., 2000]. The authors suggest the discrepancy may be due to differences in the details of the experimental setup.

Chen, H.; Yuan, S. & Jiang, K. (2005), 'Fitness Approximation in Estimation of Distribution Algorithms for Feature Selection', Lecture notes in computer science 3809, 904.

The authors identify the following problem with FSS-EBNA: "In the wrappers for feature selection optimization, the time to run EDAs is dominated by the 'slow-to-compute' fitness function evaluation".

The authors begin by referring to the work of [Inza et al., 2000] and indicate that their paper improves on the FSS-EBNA method.

The authors claim that evolution control and surrogate approaches can, in some cases, reduce the computational cost by building an approximate model from past evaluations. In some extreme situations, however, such as high dimensionality, ill distribution and a limited number of training samples, surrogates do not work well.

The authors introduce a "Fast Estimation of Distribution Algorithm" (FEDA). It uses Bayesian networks to estimate the probability distribution of each generation, and the same networks are extended to serve as approximate models that assign approximate fitness values. The main idea is that highly probable solutions have a greater chance of being evaluated by the actual fitness function, while FEDA avoids repeatedly evaluating useless individuals. The authors also introduce a Random Control and Model Management Strategy to avoid local optima.
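The selective-evaluation idea can be illustrated with a heavily simplified sketch. This is not the authors' FEDA procedure: a vector of independent per-feature probabilities stands in for the Bayesian network, and the function names, the `top_fraction` and `random_fraction` parameters, and the cache are all assumptions made for illustration. Candidates the model considers most probable are sent to the expensive fitness function, a small random share of the rest is evaluated as well (a stand-in for random control), and results are cached so no individual is evaluated twice.

```python
import math
import random

def log_likelihood(individual, probs, eps=1e-6):
    """Log-probability of a bit string under independent per-bit probabilities."""
    total = 0.0
    for bit, p in zip(individual, probs):
        p = min(1.0 - eps, max(eps, p))  # guard against log(0)
        total += math.log(p if bit else 1.0 - p)
    return total

def evaluate_selectively(candidates, probs, true_fitness, cache,
                         top_fraction=0.3, random_fraction=0.1, rng=None):
    """Run the expensive fitness only on model-preferred and a few random candidates.

    Returns {candidate index: actual fitness} for the evaluated candidates.
    """
    rng = rng or random.Random(0)
    # Rank candidates from most to least probable under the model.
    order = sorted(range(len(candidates)),
                   key=lambda i: log_likelihood(candidates[i], probs),
                   reverse=True)
    n_top = max(1, int(top_fraction * len(candidates)))
    chosen = set(order[:n_top])
    # Also evaluate a random share of the remainder to keep exploring.
    rest = order[n_top:]
    n_random = min(len(rest), int(random_fraction * len(candidates)))
    chosen.update(rng.sample(rest, n_random))
    results = {}
    for i in sorted(chosen):
        key = tuple(candidates[i])
        if key not in cache:  # never pay twice for the same individual
            cache[key] = true_fitness(candidates[i])
        results[i] = cache[key]
    return results

# Toy usage with a made-up fitness (number of selected features).
cache = {}
candidates = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 1]]
evaluated = evaluate_selectively(candidates, [0.9, 0.8, 0.1], sum, cache)
print(evaluated)
```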

The datasets are German, soybean, chess, anneal and mushroom from the UCI repository. The populations have 2000 individuals. The classifier is Naive Bayes (NB).

The authors claim that FEDA obtains a more compact classifier with higher accuracy and fewer actual evaluations than FSS-EBNA.

Inza, I.; Larranaga, P.; Etxeberria, R. & Sierra, B. (2000), "Feature subset selection by Bayesian network-based optimization", Artificial Intelligence 123(1-2), 157--184.

The authors begin by referring to prior work on Feature Subset Selection and state that determining the crossover and mutation rates is a problem in GAs.

The authors first present a new search engine, FSS-EBNA, for feature subset selection. They introduce the concepts and terms of Feature Subset Selection (FSS), Estimation of Distribution Algorithms (EDAs) and Bayesian networks. EDAs are based on GAs but have no crossover or mutation operators. Instead, an EDA generates new solutions by sampling from a factorization of the probability distribution of the best individuals in each generation of the search.

The authors state that in EBNA, this factorization is carried out by a Bayesian network.
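Written out, a Bayesian network factorization has the following general form. The notation is an assumption chosen for illustration rather than taken from the paper: $x_i = 1$ means feature $i$ is included in the subset, $l$ indexes the generation, and $\mathbf{pa}_i^{l}$ denotes the values of the parents of $X_i$ in the network learned from the individuals selected at generation $l$.

$$
p_l(\mathbf{x}) = \prod_{i=1}^{n} p_l\left(x_i \mid \mathbf{pa}_i^{l}\right), \qquad x_i \in \{0, 1\}
$$

When the network has no arcs, every parent set is empty and the expression reduces to a product of independent marginals, $p_l(\mathbf{x}) = \prod_{i=1}^{n} p_l(x_i)$, which is the model used by simpler univariate EDAs.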

The two learning algorithms are Decision Tree (ID3) and Naive Bayes. Five iterations of 2-fold cross-validation were applied. Experiments were run on a SUN SPARC machine. The MLC++ software was used to execute the Naive Bayes and ID3 algorithms.

The authors claim "In the majority of real datasets, the accuracy maintenances with considerable dimensionality reductions are achieved for ID3, in the case of NB the dimensionality reduction is normally coupled with notable accuracy improvements".

The authors also claim "A reflection on the over fitting problem in FSS is carried out and inspired on this reflection the stop criteria of FSS-EBNA is determined so related with the number of instances of the domain".

Saeys, Y., Degroeve, S., Aeyels, D., Van de Peer, Y. and Rouze, P. "Fast feature selection using a simple estimation of distribution algorithm: a case study on splice site prediction," Bioinformatics (19:90002), 2003, pp. 179--188.

The authors want a fast way to detect relevant feature subsets, using the technique of constrained feature subsets.

The authors begin by referring to traditional Feature Subset Selection (FSS) methods, which are sequential and based on a greedy heuristic [Kohavi and John, 1997]. Some more advanced methods use heuristics to search the space of feature subsets, such as Genetic Algorithms [Kudo and Sklansky, 2000; Siedlecki and Sklansky, 1988; Vafaie and De Jong, 1993] and Estimation of Distribution Algorithms. In this paper, the authors present a simple EDA as a wrapper for feature subset selection in splice site prediction.

The authors compare greedy search with heuristic search using a simple EDA (UMDA, [Mühlenbein, 1998]). Both techniques are combined with the Naive Bayes Method (NBM), the Linear Support Vector Machine (LSVM) and the Polynomial SVM (PSVM) as classifiers. A q9 statistic [Zhang and Zhang, 2002] is used as the selection criterion.
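The general shape of a UMDA-style wrapper search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the population settings, the clamping of probabilities to [0.05, 0.95], and especially `toy_fitness` with its hypothetical set of relevant features are invented here. In the paper's setting the fitness of a feature mask would be a classifier score, not this toy function.

```python
import random

RELEVANT = {0, 2, 5}  # hypothetical "truly relevant" feature indices

def toy_fitness(mask):
    """Stand-in for a wrapper score: reward relevant features, penalise extras."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    extras = sum(1 for i, bit in enumerate(mask) if bit and i not in RELEVANT)
    return hits - 0.1 * extras

def umda(fitness, n_features, pop_size=40, n_select=20, generations=30, seed=0):
    """Univariate EDA: each feature has an independent inclusion probability."""
    rng = random.Random(seed)
    probs = [0.5] * n_features
    best_mask, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample a population of feature masks from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        scored = sorted(((fitness(m), m) for m in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]
        # Re-estimate each marginal from the best individuals only.
        selected = [m for _, m in scored[:n_select]]
        probs = [sum(m[i] for m in selected) / len(selected)
                 for i in range(n_features)]
        # Keep probabilities away from 0 and 1 so the search can still explore.
        probs = [min(0.95, max(0.05, p)) for p in probs]
    return best_mask, best_score, probs

best_mask, best_score, probs = umda(toy_fitness, n_features=8)
print(best_mask, best_score)
```

There are no crossover or mutation operators: new masks come only from sampling the re-estimated marginals.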

The authors claim that their method "performs a fast detection of relevant feature subsets using the technique of constrained feature subsets. Compared to the traditional greedy methods the gain in speed can be up to one order of magnitude, with results being comparable or even better than the greedy methods."

Saeys, Y., Degroeve, S., Aeyels, D., Rouzé, P. and Van de Peer, Y. "Feature selection for splice site prediction: a new method using EDA-based feature ranking," BMC bioinformatics (5:1), 2004, pp. 64.

The authors claim that the most common use of GAs/EDAs in feature selection, namely searching for a single optimal solution, loses information and is not sufficient to describe the whole elimination process.

In this paper, the authors present a new method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms. Practitioners may want more information about the underlying process in order to eliminate as many irrelevant features as possible without sharply decreasing classification performance. To address this, the authors derive a feature ranking method (EDA-R) from the distribution estimated by the algorithm. They also claim that a further advantage of EDA-R is that it can be used to evaluate feature weights, so that the importance of the features can be distinguished and visualized.
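One plausible reading of ranking features from the estimated distribution is sketched below, assuming a univariate model: the weight of a feature is the fraction of selected individuals that include it, and features are ranked by that weight. The function name `eda_feature_ranking` and the tiny example population are assumptions for illustration and may differ from the details of EDA-R in the paper.

```python
def eda_feature_ranking(selected_population):
    """Rank features by how often they appear in the selected individuals.

    selected_population: non-empty list of equal-length 0/1 feature masks.
    Returns (ranking, weights): feature indices from most to least
    frequent, and the per-feature inclusion frequencies.
    """
    n_individuals = len(selected_population)
    n_features = len(selected_population[0])
    weights = [sum(mask[i] for mask in selected_population) / n_individuals
               for i in range(n_features)]
    ranking = sorted(range(n_features), key=lambda i: weights[i], reverse=True)
    return ranking, weights

# Made-up selected population of four masks over three features.
population = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
ranking, weights = eda_feature_ranking(population)
print(ranking, weights)
```

Features can then be eliminated from the bottom of the ranking, and the weights themselves can be plotted to visualize feature importance.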

The authors use three datasets (400, 528 and 2096 features). A Naive Bayes classifier and a Support Vector Machine are used as classification methods in a wrapper approach. EDA-R feature ranking is compared with two other selection strategies: sequential backward elimination (SBE) and an advanced filter method described by [Koller and Sahami, 1996] (KS).

The authors state that "the feature subset selection using EDA-based ranking provides a robust framework for feature selection in splice site prediction." Another benefit is that the method can evaluate feature weights, which is shown to be useful for extracting knowledge from complex data.

### Reference:

- Koller, D. and Sahami, M. "Toward optimal feature selection," in Proceedings of the 13th International Conference on Machine Learning, 1996, pp. 284--292.