Machine Learning: What is Learning?

"Changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently the next time." --Herbert Simon

"Learning is constructing or modifying representations of what is being experienced." --Ryszard Michalski

"Learning is making useful changes in our minds." --Marvin Minsky

Upload: mudit-verma

Post on 02-Jun-2018


TRANSCRIPT

8/10/2019 Mudit Verma

http://slidepdf.com/reader/full/mudit-verma 1/34


Components of a Learning System

[diagram showing the components: Sensors, Critic, Learning Element, Performance Element, Problem Generator, Effector]


• Learning Element -- makes changes to the system based on how it is doing.
• Performance Element -- the part that chooses the actions to take.
• Critic -- tells the Learning Element how it is doing (e.g., success or failure) by comparing with a fixed standard of performance.
• Problem Generator -- suggests "problems" or actions that will generate new examples or experiences that will aid in training the system further.

• In designing a learning system, there are four major issues to be considered:
- components -- which parts of the performance element are to be improved
- representation of those components
- feedback available to the system
- prior information available to the system


Major Paradigms of Machine Learning

• Rote Learning
- One-to-one mapping from inputs to stored representation.
- "Learning by memorization"; association-based storage and retrieval.

• Analogy
- Determine correspondence between two different representations.
- It is inductive learning in which a system transfers knowledge from one domain into a different one.

• Inductive Learning
- Use specific examples to reach general conclusions.
- Extrapolate from a given set of examples so that we can make accurate predictions about future examples.


• Clustering
- Unsupervised, inductive learning in which "natural classes" are found for data instances, as well as ways of classifying them.

• Discovery
- Unsupervised learning; a specific goal is not given.
- It is both inductive and deductive learning in which the system learns without help from a teacher.
- It is deductive if it proves theorems and discovers concepts about those theorems.
- It is inductive when it raises conjectures (formulation of opinion using incomplete information).

• Reinforcement
- Only feedback (positive or negative reward) is given, at the end of a sequence of steps.
- Requires assigning reward to steps by solving the credit assignment problem, i.e., which steps should receive credit or blame for a final result?


• Learning from Examples (Concept Learning)
- Inductive learning in which concepts are learned from sets of labeled instances.
- Given a set of examples of some concept/class/category, determine whether a given example is an instance of the concept or not.
- If it is an instance, we call it a positive example; if it is not, it is called a negative example.

• Supervised Concept Learning by Induction
- Given a training set of positive and negative examples of a concept, construct a description that will accurately classify whether future examples are positive or negative.
- That is, learn a good estimate of the function f given a training set {(x1, y1), (x2, y2), ..., (xn, yn)} where each yi is either + (positive) or - (negative).


Inductive Bias

• Inductive learning is an inherently conjectural process because any knowledge created by generalization from specific facts cannot be proven true; it can only be proven false. Hence, inductive inference is falsity preserving, not truth preserving.
• To generalize beyond the specific training examples, we need constraints or biases on what f is best.
• That is, learning can be viewed as searching the hypothesis space H of possible f functions.
• A bias allows us to choose one f over another.
• A completely unbiased inductive algorithm could only memorize the training examples and could not say anything more about other unseen examples.
• Two types of biases are commonly used in machine learning:


• Restricted Hypothesis Space Bias
Allow only certain types of f functions, not arbitrary ones.

• Preference Bias
Define a metric for comparing fs so as to determine whether one is better than another.

Inductive Learning Framework

• Raw input data from sensors are preprocessed to obtain a feature vector X that adequately describes all of the relevant features for classifying examples. Each X is a list of (attribute, value) pairs. For example,

X = {(Person, Sue), (Eye_Color, Brown), (Age, Young), (Sex, Female)}

• The number of attributes (also called features) is fixed. Each attribute has a fixed, finite number of possible values.
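In code, such a feature vector can be represented as a simple mapping from attribute names to values. The sketch below (Python) reuses the slide's attribute names purely for illustration:

```python
# Hypothetical sketch: a feature vector as a fixed set of (attribute, value) pairs.
x = {
    "Person": "Sue",
    "Eye_Color": "Brown",
    "Age": "Young",
    "Sex": "Female",
}

# The attribute set is fixed: every example uses exactly these attributes.
ATTRIBUTES = ("Person", "Eye_Color", "Age", "Sex")
assert set(x) == set(ATTRIBUTES)
```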


Inductive Learning by Nearest-Neighbor Classification

• One simple approach to inductive learning is to save each training example as a point in feature space, and then classify a new example by giving it the same classification (+ or -) as its nearest neighbor in feature space.
• The problem with this approach is that it doesn't necessarily generalize well if the examples are not "clustered."
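A minimal sketch of this nearest-neighbor rule, assuming numeric features and Euclidean distance (the slides do not fix a distance metric):

```python
import math

def nearest_neighbor_classify(training, new_point):
    """Classify new_point with the label (+ or -) of its nearest
    training example under Euclidean distance in feature space."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, label = min(training, key=lambda ex: dist(ex[0], new_point))
    return label

# Two well-separated clusters of labeled points.
train = [((0.0, 0.0), "+"), ((0.2, 0.1), "+"),
         ((5.0, 5.0), "-"), ((4.8, 5.2), "-")]
print(nearest_neighbor_classify(train, (0.3, 0.2)))  # → +
print(nearest_neighbor_classify(train, (4.9, 4.9)))  # → -
```

If the examples were not clustered like this, the nearest neighbor of a query could easily carry the wrong label, which is the generalization problem the slide mentions.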

Inductive Concept Learning by Learning Decision Trees

• Goal: Build a decision tree for classifying examples as positive or negative instances of a concept.
• A decision tree is a simple inductive learning structure.


• Given an instance of an object or situation, which is specified by a set of properties, the tree returns a "yes" or "no" decision about that instance.
• In a decision tree, each non-leaf node has an associated attribute (feature), each leaf node has an associated classification (+ or -), and each arc has associated with it one of the possible values of the attribute at the node from which the arc is directed.
• Decision Tree Construction using a Greedy Algorithm
- The algorithm, called ID3 (with later refinements C4.5 and C5.0), was originally developed by Quinlan (1986).
- Top-down construction of the decision tree by recursively selecting the "best attribute" to use at the current node in the tree.
- Once the attribute is selected for the current node, generate children nodes, one for each possible value of the selected attribute.
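In ID3 the "best attribute" is conventionally the one with the highest information gain (reduction in entropy). The greedy selection step can be sketched as follows; the attribute names (Outlook, Windy) and examples are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels (+/-)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, attributes):
    """Greedy step of ID3: pick the attribute with the highest
    information gain over the given (feature-dict, label) examples."""
    base = entropy([label for _, label in examples])
    def gain(attr):
        remainder = 0.0
        for v in {x[attr] for x, _ in examples}:
            subset = [label for x, label in examples if x[attr] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return base - remainder
    return max(attributes, key=gain)

examples = [
    ({"Outlook": "Sunny", "Windy": "No"},  "+"),
    ({"Outlook": "Sunny", "Windy": "Yes"}, "+"),
    ({"Outlook": "Rain",  "Windy": "No"},  "-"),
    ({"Outlook": "Rain",  "Windy": "Yes"}, "-"),
]
# Outlook splits the labels perfectly (gain 1.0); Windy gives no gain.
print(best_attribute(examples, ["Outlook", "Windy"]))  # → Outlook
```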


Case Studies

• Many case studies have shown that decision trees are at least as accurate as human experts.
• For example, in one study on diagnosing breast cancer, humans correctly classified the examples 65% of the time, while the decision tree classified 72% correctly.
• British Petroleum designed a decision tree for gas-oil separation on offshore oil platforms; it replaced a rule-based expert system.
• Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.


• Deductive Learning
- Deductive learning works on existing facts and knowledge and deduces new knowledge from the old. Deductive learning or reasoning can be described as reasoning of the form "if A then B."
- Arguably, deductive learning does not generate "new" knowledge at all; it simply memorizes the logical consequences of what is known already.
- Deduction is in some sense the direct application of knowledge in the production of new knowledge.
- However, this new knowledge does not represent any new semantic information: the rule represents the knowledge completely, since any time the assertions (A) are true, the conclusion (B) is true as well.
- Purely deductive learning includes methods like explanation-based learning.


Explanation-Based Learning

• This is based on the deductive learning concept, which converts principles into usable rules.
• This kind of learning occurs when the system finds an explanation of an instance it has seen and generalizes the explanation.
• The general rule follows logically from the background knowledge possessed by the system.
• The basic idea is to construct an explanation of the observed result, and then generalize the explanation.
• Then a new rule is built in which the left-hand side is the leaves of the proof tree, and the right-hand side is the variabilized goal, up to any bindings that must be made with the generalized proof.
• Any conditions true regardless of the variables are dropped.


PAC Learning

Probably Approximately Correct learning. A hypothesis h is PAC if Pr[error(f, h) > ε] < δ, where ε is the accuracy parameter, δ is the confidence parameter, f is the target function, and h is the hypothesis.
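For a consistent learner over a finite hypothesis space H, a standard sufficient sample size (a classic PAC result, not stated on the slide) is m ≥ (1/ε)(ln|H| + ln(1/δ)). A small sketch of computing this bound:

```python
import math

def pac_sample_bound(h_size, epsilon, delta):
    """Sample size sufficient for a consistent learner over a finite
    hypothesis space of size h_size to be, with probability at least
    1 - delta, approximately correct (error <= epsilon):
    m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

# e.g. |H| = 2^10 boolean hypotheses, epsilon = 0.1, delta = 0.05
print(pac_sample_bound(2 ** 10, 0.1, 0.05))  # → 100
```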

Bayesian Learning

• Learning that treats the problem of building hypotheses as a particular case of the problem of making predictions.
• The probabilities of various hypotheses are estimated, and predictions are made using the posterior probabilities of the hypotheses to weight them.

Adaptive Dynamic Programming

Adaptive dynamic programming is any kind of reinforcement learning method that works by solving the utility equations using a dynamic programming algorithm.


Neural Networks

• A computational model somewhat similar to the human brain; it has many simple units that work in parallel with no central control. Connections between units are weighted, and these weights can be modified by the learning system.
• It is a form of connectionist learning in which the data structure is a set of nodes connected by weighted links, each node passing a 0 or 1 to other links depending on whether a function of its inputs reaches its activation level.

Relevance-Based Learning

This is a kind of learning in which background knowledge relates the relevance of a set of features in an instance to the general goal predicate.


• For example, if I see men in Rome speaking Latin, and I know that seeing someone in a city speaking a language usually means all people in the city speak that language, I can conclude that Romans speak Latin.
• In general, background knowledge, together with the observations, allows the agent to form a new, general rule to explain the observations.
• The entailment constraints for relevance-based learning are:

Hypothesis ^ Descriptions |= Classifications
Background ^ Descriptions ^ Classifications |= Hypothesis

• This is a deductive form of learning, because it cannot produce hypotheses that go beyond the background knowledge and observations.
• We presume that our knowledge base has a set of functional dependencies that support the construction of hypotheses.


• A new population is formed from the old population and the offspring, based on their fitness values.
• The "goodness" of an individual is measured by some fitness function.
• This is repeated until some condition (for example, no improvement of the best solution, or a finite number of repetitions) is satisfied.
• Search can take place in parallel, with many individuals in each generation.
• The approach is a hill-climbing one, since in each generation the offspring of the best candidates are preserved.
• In the standard genetic algorithm approach, each individual is a bit-string that encodes its characteristics.


Outline of the Basic Genetic Algorithm

1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
3. Repeat until the terminating condition is satisfied:
   3.1. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).
   3.2. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring is an exact copy of the parents.
   3.3. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome).
   3.4. [Accepting] Generate the new population by placing the new offspring in it.
4. [Return] Return the best solution in the current population.
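The outline above can be sketched in a few lines; the parameter values and the OneMax fitness function (count of 1-bits) are illustrative assumptions, not part of the slides:

```python
import random

def genetic_algorithm(fitness, n=20, length=16, generations=50,
                      p_cross=0.7, p_mut=0.01, seed=0):
    """Minimal GA sketch: fitness-proportional selection, single-point
    crossover with probability p_cross, per-locus bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(generations):
        weights = [fitness(c) for c in pop]          # step 2: fitness
        new_pop = []
        while len(new_pop) < n:
            p1, p2 = rng.choices(pop, weights=weights, k=2)  # 3.1 selection
            if rng.random() < p_cross:                       # 3.2 crossover
                point = rng.randrange(1, length)
                child = p1[:point] + p2[point:]
            else:
                child = p1[:]
            child = [b ^ 1 if rng.random() < p_mut else b    # 3.3 mutation
                     for b in child]
            new_pop.append(child)                            # 3.4 accepting
        pop = new_pop
    return max(pop, key=fitness)                             # step 4

# OneMax: fitness is simply the number of 1-bits in the chromosome.
best = genetic_algorithm(fitness=sum)
print(sum(best), "of 16 bits set")
```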


• The algorithm consists of looping through generations.
• In each generation, a subset of the population is selected to reproduce; usually this is a random selection in which the probability of choice is proportional to fitness.
• Selection is usually done with replacement (so a fit individual may reproduce many times).
• Reproduction occurs by randomly pairing all of the individuals in the selection pool, and then generating two new individuals by performing crossover, in which the initial n bits (where n is random) of the parents are exchanged.
• There is a small chance that one of the genes in the resulting individuals will mutate to a new value.
• We may worry that generating populations from only two parents may cause us to lose the best chromosome from the last population.


• To prevent this, at least one of a generation's best solutions is copied without changes to the new population, so the best solution can survive to the succeeding generation.
• Genetic algorithms are broadly applicable and have the advantage that they require little knowledge encoded in the system.
• However, as might be expected from a knowledge-poor approach, they give very poor performance on some problems.
• The outline of the basic GA is very general; there are many parameters and settings that can be implemented differently for various problems. The following questions need to be answered:

* How to create chromosomes, and what type of encoding to choose?
* How to perform crossover and mutation, the two basic operators of a GA?
* How to select parents for crossover? (This can be done in many ways, but the main idea is to select the better parents.)


Encoding of a Chromosome

A chromosome should in some way contain information about the solution that it represents. The most commonly used encoding is a binary string: each chromosome is represented by a binary string and could look like this:

Chromosome 1: 1101100100110110
Chromosome 2: 1101111000011110

Each bit in the string can represent some characteristic of the solution. There are many other ways of encoding; the encoding depends mainly on the problem.

Crossover

After we have decided what encoding we will use, we can proceed to the crossover operation.


Crossover operates on selected genes from parent chromosomes and creates new offspring. The simplest way is to choose a crossover point at random, copy everything before this point from the first parent, and then copy everything after the crossover point from the other parent.

Crossover can be illustrated as follows (| is the crossover point):

Chromosome 1: 11011 | 00100110110
Chromosome 2: 11011 | 11000011110
Offspring 1:  11011 | 11000011110
Offspring 2:  11011 | 00100110110

There are other ways to perform crossover; for example, we can choose more crossover points. Crossover can be quite complicated and depends mainly on the encoding of chromosomes. A crossover designed for a specific problem can improve the performance of the genetic algorithm.
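Single-point crossover on string-encoded chromosomes can be sketched as follows; fixing the crossover point at 5 reproduces the illustration above:

```python
import random

def single_point_crossover(parent1, parent2, point=None, rng=random):
    """Copy everything before the crossover point from one parent and
    everything after it from the other, producing two offspring."""
    assert len(parent1) == len(parent2)
    if point is None:
        point = rng.randrange(1, len(parent1))  # random point by default
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

# Reproduces the slide's illustration (crossover point after bit 5).
c1, c2 = single_point_crossover("1101100100110110", "1101111000011110", point=5)
print(c1)  # → 1101111000011110
print(c2)  # → 1101100100110110
```

(Here the offspring happen to equal the parents because both parents share the same first five bits.)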


Mutation

After a crossover is performed, mutation takes place. Mutation is intended to prevent all solutions in the population from falling into a local optimum of the problem. The mutation operation randomly changes the offspring resulting from crossover. In the case of binary encoding, we can switch a few randomly chosen bits from 1 to 0 or from 0 to 1.

Mutation can be illustrated as follows:

Original offspring 1: 1101111000011110
Original offspring 2: 1101100100110110
Mutated offspring 1:  1100111000011110
Mutated offspring 2:  1101101100110110

The technique of mutation (as well as crossover) depends mainly on the encoding of chromosomes.
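Bit-flip mutation can be sketched as follows; the mutation probability and the deterministic flip at position 3 (reproducing the slide's first example) are illustrative choices:

```python
import random

def mutate(chromosome, p_mut=0.05, rng=random):
    """Flip each bit of a binary-encoded chromosome independently
    with probability p_mut."""
    return "".join(b if rng.random() >= p_mut else str(1 - int(b))
                   for b in chromosome)

# Deterministic illustration: flip the bit at 0-based position 3,
# matching the slide's first mutation example.
original = "1101111000011110"
mutated = original[:3] + str(1 - int(original[3])) + original[4:]
print(mutated)  # → 1100111000011110
```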


Binary Encoding

1. Crossover

Two-point crossover - two crossover points are selected; the binary string from the beginning of the chromosome to the first crossover point is copied from the first parent, the part from the first to the second crossover point is copied from the other parent, and the rest is copied from the first parent again:

11 0010 11 + 11 0111 11 = 11011111

Arithmetic crossover - an arithmetic operation is performed to make a new offspring:

11001011 + 11011111 = 11001011 (AND)

Uniform crossover - bits are randomly copied from the first or from the second parent:

110010 11 + 110111 01 = 11011111

2. Mutation

Bit inversion - selected bits are inverted:

11001001 => 10001001


Permutation Encoding

1. Crossover

Single-point crossover - one crossover point is selected; the permutation is copied from the first parent up to the crossover point, then the other parent is scanned, and if a number is not yet in the offspring, it is added.
Note: there are more ways to produce the rest after the crossover point.

(1 2 3 4 5 6 7 8 9) + (4 5 3 6 8 9 7 2 1) = (1 2 3 4 5 6 8 9 7)
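This permutation-preserving crossover can be sketched as follows (the crossover point of 6 matches the example above):

```python
def permutation_crossover(parent1, parent2, point):
    """Copy parent1 up to the crossover point, then append parent2's
    genes in order, skipping any gene already present in the child."""
    child = list(parent1[:point])
    child += [g for g in parent2 if g not in child]
    return child

p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [4, 5, 3, 6, 8, 9, 7, 2, 1]
print(permutation_crossover(p1, p2, point=6))  # → [1, 2, 3, 4, 5, 6, 8, 9, 7]
```

Because every gene from the second parent is skipped when already present, the result is always a valid permutation.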

2. Mutation

Order changing - two numbers are selected and exchanged:

(1 2 3 4 5 6 8 9 7) => (1 8 3 4 5 6 2 9 7)


Value Encoding

1. Crossover

All crossover methods from binary encoding can be used.

2. Mutation

Adding (for real-value encoding) - a small number is added to (or subtracted from) selected values:

(1.29 5.68 2.86 4.11 5.55) => (1.29 5.68 2.73 4.22 5.55)

Tree Encoding

1. Crossover

Tree crossover - one crossover point is selected in both parents, and the parts below the crossover points are exchanged to produce new offspring.

2. Mutation

Changing operator/number - selected nodes are changed.


• Evolutionary programming is usually taken to mean a more complex form of genetic programming in which the individuals are more complex structures.
• Classifier systems are genetic programs that develop rules for classifying input instances.
• Each rule is weighted, and has some set of feature patterns as a condition and some pattern as an action.
• On each cycle, the rules that match make bids proportional to their weights.
• One or more rules are then selected for application, with probabilities based on their bids.
• Weights are adjusted according to the contribution of the rules to desired behavior; the bucket brigade algorithm [Holland, 1985] is an effective method of doing this.