Preface
Neural Networks, or Artificial Neural Networks to be more precise, represent a technology that is rooted in many disciplines: neurosciences, mathematics, statistics, physics, computer science, and engineering. Neural networks find application in such diverse fields as modeling, time series analysis, pattern recognition, signal processing, and control by virtue of an important property: the ability to learn from input data with or without a teacher.
This book provides a comprehensive foundation of neural networks, recognizing the interdisciplinary nature of the subject.
The book consists of four parts, organized as follows:
Introductory material, chapters 1 and 2
Learning machines with a teacher, chapters 3 through 7
Learning machines without a teacher, chapters 8 through 12
Nonlinear dynamical systems, chapters 13 through 15
Chapter 1. Introduction
1.1. What is a Neural Network?
The brain routinely accomplishes perceptual recognition tasks (e.g., recognizing a familiar face embedded in an unfamiliar scene) in approximately 100-200 ms, whereas tasks of much lesser complexity may take days on a conventional computer. For another example, consider the sonar of a bat. In addition to providing information about how far away a target is, a bat sonar conveys information about the relative velocity of the target, its size, the size of its various features, and its azimuth and elevation. The complex neural computations needed to extract all this information from the target echo occur within a brain the size of a plum.
We offer the following definition of a neural network viewed as an adaptive machine:
A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
1.2. Human Brain
1.3. Models of a Neuron
We identify three basic elements of the neuronal model:
1. A set of synapses, or connecting links, each of which is characterized by a weight or strength of its own.
2. An adder for summing the input signals.
3. An activation function for limiting the amplitude of the output of the neuron.
The neuronal model also includes a bias, which has the effect of increasing or lowering the net input of the activation function.
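The three elements and the bias described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's own formulation: the function name `neuron_output` and its parameter names are hypothetical, and the logistic sigmoid is assumed here as one possible activation function, since the text only requires that the activation limit the output amplitude.

```python
import math


def neuron_output(inputs, weights, bias):
    """Output of a single model neuron (illustrative sketch).

    inputs  -- input signals, one per synapse
    weights -- synaptic weights, one per synapse
    bias    -- value added to the weighted sum before activation
    """
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one synaptic weight")

    # Synapses and adder: weight each input signal, then sum.
    # The bias raises or lowers the net input of the activation function.
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias

    # Activation function (assumed: logistic sigmoid), which limits the
    # output amplitude to the range 0..1. The two branches are algebraically
    # identical; splitting them avoids overflow in math.exp.
    if net_input >= 0:
        return 1.0 / (1.0 + math.exp(-net_input))
    e = math.exp(net_input)
    return e / (1.0 + e)
```

For example, `neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)` has a net input of zero and so sits at the midpoint of the sigmoid; raising the bias raises the output for the same inputs.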
1.4. Neural Networks Viewed as Directed Graphs
1.5. Feedback
1.6. Network Architectures
1.7. Knowledge Representation
1.8. Artificial Intelligence and Neural Networks
1.9. Historical Notes
Chapter 2. Learning Processes
2.1. Introduction The property that is of primary significance for a neural network is the ability to learn from its environment, and to improve its performance through learning. A neural network learns about its environment over time through an interactive process of adjustments applied to its synaptic weights and bias levels.
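As a minimal sketch of what "adjustments applied to its synaptic weights and bias levels" can look like in practice, the Python fragment below repeatedly presents examples from a toy environment to a single linear neuron and nudges its weights and bias after each one. The error-driven update rule, the function names train_neuron and mean_squared_error, the learning rate, and the toy data are all assumptions chosen for illustration; they are not taken from the text, and other learning rules would fit the same description.

```python
def neuron_output(weights, bias, inputs):
    """Weighted sum of the inputs plus the bias (linear neuron, no squashing)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def mean_squared_error(samples, weights, bias):
    """Average squared difference between target and neuron output."""
    total = 0.0
    for inputs, target in samples:
        total += (target - neuron_output(weights, bias, inputs)) ** 2
    return total / len(samples)


def train_neuron(samples, learning_rate=0.1, epochs=500):
    """Adjust weights and bias step by step from (inputs, target) pairs.

    Assumption: each adjustment is proportional to the current error
    times the input, which is one simple choice of update rule.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron_output(weights, bias, inputs)
            # Adjust each synaptic weight in proportion to its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            # The bias behaves like a weight on a constant input of 1.
            bias += learning_rate * error
    return weights, bias


# Hypothetical environment: targets follow 2*x + 1 exactly (no noise).
toy_samples = [([0.0], 1.0), ([0.5], 2.0), ([1.0], 3.0), ([1.5], 4.0)]
learned_weights, learned_bias = train_neuron(toy_samples)
```

On this toy data, performance is measured by the mean squared error, which shrinks as the adjustments accumulate; this is the sense in which the neuron improves its performance through learning.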
2.2. Error-Correction Learning
2.3. Memory-Based Learning
2.4. Hebbian Learning
2.5. Competitive Learning
2.6. Boltzmann Learning
2.7. Credit Assignment Problem
2.8. Learning with a Teacher
2.9. Learning without a Teacher
2.10. Learning Tasks
2.11. Memory
2.12. Adaptation
2.13. Statistical Nature of the Learning Process
2.14. Statistical Learning Theory
2.15. Probably Approximately Correct Model of Learning
2.16. Summary and Discussion
Chapter 3. Single Layer Perceptrons
3.1. Introduction
3.2. Adaptive Filtering Problem
3.3. Unconstrained Optimization Techniques
3.4. Linear Least-Squares Filters
3.5. Least-Mean-Square Algorithm
3.6. Learning Curves
3.7. Learning Rate Annealing Techniques
3.8. Perceptron
3.9. Perceptron Convergence Theorem
3.10. Relation Between the Perceptron and Bayes Classifier for a Gaussian Environment
3.11. Summary and Discussion
Chapter 4. Multilayer Perceptrons
4.1. Introduction
4.2. Some Preliminaries
4.3. Back-Propagation Algorithm
4.4. Summary of the Back-Propagation Algorithm
4.5. XOR Problem
4.6. Heuristics for Making the Back-Propagation Algorithm Perform Better
4.7. Output Representation and Decision Rule
4.8. Computer Experiment
4.9. Feature Detection
4.10. Back-Propagation and Differentiation
4.11. Hessian Matrix
4.12. Generalization
4.13. Approximation of Functions
4.14. Cross-Validation
4.15. Network Pruning Techniques
4.16. Virtues and Limitations of Back-Propagation Learning
4.17. Accelerated Convergence of Back-Propagation Learning
4.18. Supervised Learning Viewed as an Optimization Problem
4.19. Convolutional Networks
4.20. Summary and Discussion
Chapter 5. Radial-Basis Function Networks
5.1. Introduction
5.2. Cover's Theorem on the Separability of Patterns
5.3. Interpolation Problem
5.4. Supervised Learning as an Ill-Posed Hypersurface Reconstruction Problem
5.5. Regularization Theory
5.6. Regularization Networks
5.7. Generalized Radial-Basis Function Networks
5.8. XOR Problem (Revisited)
5.9. Estimation of the Regularization Parameter
5.10. Approximation Properties of RBF Networks
5.11. Comparison of RBF Networks and Multilayer Perceptrons
5.12. Kernel Regression and Its Relation to RBF Networks
5.13. Learning Strategies
5.14. Computer Experiment
5.15. Summary and Discussion
Chapter 6. Support Vector Machines
6.1. Introduction
6.2. Optimal Hyperplane for Linearly Separable Patterns
6.3. Optimal Hyperplane for Nonseparable Patterns
6.4. How to Build a Support Vector Machine for Pattern Recognition
6.5. Example: XOR Problem (Revisited)
6.6. Computer Experiment
6.7. Epsilon-Insensitive Loss Function
6.8. Support Vector Machines for Nonlinear Regression
6.9. Summary and Discussion
Chapter 7. Committee Machines
7.1. Introduction
7.2. Ensemble Averaging
7.3. Computer Experiment I
7.4. Boosting
7.5. Computer Experiment II
7.6. Associative Gaussian Mixture Model
7.7. Hierarchical Mixture of Experts Model
7.8. Model Selection Using a Standard Decision Tree
7.9. A Priori and A Posteriori Probabilities
7.10. Maximum Likelihood Estimation
7.11. Learning Strategies for the HME Model
7.12. EM Algorithm
7.13. Application of the EM Algorithm to the HME Model
7.14. Summary and Discussion
Chapter 8. Principal Components Analysis
8.1. Introduction
8.2. Some Intuitive Principles of Self-Organization
8.3. Principal Components Analysis
8.4. Hebbian-Based Maximum Eigenfilter
8.5. Hebbian-Based Principal Components Analysis
8.6. Computer Experiment: Image Coding
8.7. Adaptive Principal Components Analysis Using Lateral Inhibition
8.8. Two Classes of PCA Algorithms
8.9. Batch and Adaptive Methods of Computation
8.10. Kernel-Based Principal Components Analysis
8.11. Summary and Discussion
Chapter 9. Self-Organizing Maps
9.1. Introduction
9.2. Two Basic Feature-Mapping Models
9.3. Self-Organizing Map
9.4. Summary of the SOM Algorithm
9.5. Properties of the Feature Map
9.6. Computer Simulations
9.7. Learning Vector Quantization
9.8. Computer Experiment: Adaptive Pattern Classification
9.9. Hierarchical Vector Quantization
9.10. Contextual Maps