GENETIC ALGORITHMS
Probable Not Impossible
By: Murugan Andezuthu Dharmaratnam
INTRODUCTION TO GENETIC ALGORITHM
What is genetic algorithm?
Genetic algorithms are inspired by Darwin's theory of evolution. Simply said,
problems are solved by an evolutionary process resulting in a best (fittest)
solution (survivor) - in other words, the solution is evolved
Short history of genetic algorithm
Evolutionary computing was introduced in the 1960s by I. Rechenberg in his work
"Evolution strategies" (Evolutionsstrategie in original). His idea was then
developed by other researchers. Genetic Algorithms (GAs) were invented by John
Holland and developed by him and his students and colleagues. This lead to
Holland's book "Adaption in Natural and Artificial Systems" published in 1975.
In 1992 John Koza has used genetic algorithm to evolve programs to perform
certain tasks. He called his method "genetic programming" (GP). LISP programs
were used, because programs in this language can expressed in the form of a
"parse tree", which is the object the GA works on.
Biological Background
All living organisms consist of cells. In each cell there is the same set of
chromosomes, chromosomes consist of blocks of DNA’s called the genes. Each gene
encodes a particular protein. Each gene encode a trait (Eg color of eye) . Each
gene has a position in the chromosome called the locus.
During reproduction recombination first occurs. Genes from parents combine to
form a whole new chromosome. The newly created chromosome can then be mutated.
In mutation the elements of DNA are bit changed. Changes are mainly caused by
errors in copying genes from parents.
Fitness of an organism is measured by success of organism in life (survival)
Why use genetic algorithm?
Purely analytical methods widely proved their efficiency. They nevertheless
suffer from a insurmountable weakness: Reality rarely obeys to those wonderful
differentiable functions your.
Other methods that arrived were to combining mathematical analysis and random
search.
This type of method works only with reduced search spaces.
EVOLUTION AND GENETIC ALGORITHMS
John Holland, from the University of Michigan began his work on genetic
algorithms at the beginning of the 60s. A first achievement was the publication
of Adaptation in Natural and Artificial System7 in 1975.
Holland had a double aim: to improve the understanding of natural adaptation
process, and to design artificial systems having properties similar to natural
systems8.
The basic idea is as follow: the genetic pool of a given population potentially
contains the solution, or a better solution, to a given adaptive problem. This
solution is not "active" because the genetic combination on which it relies is
split between several subjects. Only the association of different genomes can
lead to the solution.
Holland method is especially effective because he not only considered the role
of mutation (mutations improve very seldom the algorithms), but he also utilized
genetic recombination, (crossover) : these recombination, the crossover of
partial solutions greatly improve the capability of the algorithm to approach,
and eventually find, the optimum.
Search Space
If we are solving a problem, we are usually looking for
some solution which will be the best among others. The space of all feasible
solutions (the set of solutions among which the desired solution resides) is
called search space (also state space). Each point in the search space
represents one possible solution. Each possible solution can be "marked" by its
value (or fitness) for the problem. With GA we look for the best solution among
among a number of possible solutions - represented by one point in the search
space.
Looking for a solution is then equal to looking for some
extreme value (minimum or maximum) in the search space. At times the search
space may be well defined, but usually we know only a few points in the search
space. In the process of using GA, the process of finding solutions generates
other points (possible solutions) as evolution proceeds.
Example of a search space
The problem is that the search can be very complicated.
One may not know where to look for a solution or where to start. There are many
methods one can use for finding a suitable solution, but these methods do
not necessarily provide the best solution. Some of these methods are
hill climbing, tabu search, simulated annealing and the
genetic algorithm. The solutions found by these methods are often considered
as good solutions, because it is not often possible to prove what the optimum
is.
Functioning of a
Genetic Algorithm
As an example, we're going to enter a world of simplified
genetic. The "chromosomes" encode a group of linked features. "Genes" encode the
activation or deactivation of a feature.
Let us examine the global genetic pool of four
basilosaurus belonging to this world. We will consider the "chromosomes" which
encode the length of anterior members. The length of the "paw" and the length of
the "fingers" are encoded by four genes : the first two encode the "paw" and the
other two encode the fingers.
In our representation of the genome, the circle on blue
background depict the activation of a feature, the cross on green background
depict its deactivation. The ideal genome (short paws and long fingers) is:.
The genetic pool of our population is the following one:
We can notice that A and B are the closest to their
ancestors ; they've got quite long paws and short fingers. On the contrary, D is
close to the optimum, he just needs a small lengthening of his fingers.
This is such a peculiar world that the ability to move is
the main criteria of survival and reproduction. No female would easily accept to
marry basilosaurus whose paws would look like A's. But they all dream to meet D
one day.
The fitness is easy to compute : we just have to give one
point to each gene corresponding to the ideal. The perfect genome will then get
four points. The probability of reproduction of a given subject will directly
depend on this value. In our case, we'll get the following results :
Subject
|
Fitness
|
Reproduction probability
|
A
|
1
|
1/7 = 0.143
|
B
|
1
|
1/7 = 0.143
|
C
|
2
|
2/7 = 0.286
|
D
|
3
|
3/7 = 0.428
|
Total
|
7
|
7/7=1
|
We'll consider a cycle of reproduction with for
descendants, i.e. four mating concerning height subjects. D will be selected
four times and will then get four descendants. C will be selected twice and will
get two descendants. Finally A and B will only be selected once.
The reproduction pattern is the following :
Subject
|
Received genes
|
Genome
|
Fitness
|
Reproduction probability
|
A'
|
A : D :
|
|
2
|
2/10=0.2
|
B'
|
B : D :
|
|
2
|
2/10=0.2
|
C'
|
D : C :
|
|
3
|
3/10=0.3
|
D'
|
C :
D :
|
|
3
|
3/10=0.3
|
Total
|
|
|
10
|
10/10=1
|
During reproduction crossovers occur at a random place
(center of the genome for A', B' and C', just after the first gene for D'). The
link existing between the degree of adaptation and the probability of
reproduction leads to a trend to the rise of the average fitness of the
population. In our case, it jumps from 7 to 10.
During the following cycle of reproduction, C' and D' will
have a common descendant :
D' : + C' : =
The new subject has inherited the intended genome : his
paws have become flippers.
We can then see that the principle of genetic algorithms
is simple :
-
Encoding of the problem in a binary string.
-
Random generation of a population. This one includes a genetic pool representing
a group of possible solutions.
-
Reckoning of a fitness value for each subject. It will directly depend on the
distance to the optimum.
-
Selection of the subjects that will mate according to their share in the
population global fitness.
-
Genomes crossover and mutations.
-
And then start again from point 3.
The functioning of a genetic
algorithm can also be described in reference to genotype (GTYPE) and phenotype
(PTYPE) notions10.
-
Select pairs of GTYPE according to their PTYPE fitness.
-
Apply the genetic operators (crossover, mutation...) to create new GTYPE.
-
Develop GTYPE to get the PTYPE of a new generation and start again from 1.
Crossover is the basis of genetic algorithms, there is
nevertheless other operators like mutation. In fact, the desired solution may
happen not to be present inside a given genetic pool, even a large one.
Mutations allow the emergence of new genetic configurations which, by widening
the pool improve the chances to find the optimal solution. Other operators like
inversion are also possible, but we won't deal with them here.
D- Adaptation and Selection : the scaling problem
We saw before that in a genetic algorithm, the probability
of reproduction directly depends on the fitness of each subject. We simulate
that way the adaptive pressure of the environment.
The use of this method nevertheless set two types of
problems:
-
A "super-subject" being too often selected the whole population tends to
converge towards his genome. The diversity of the genetic pool is then too
reduced to allow the genetic algorithm to progress.
-
With the progression of the genetic algorithm, the differences between fitness
are reduced. The best ones then get quite the same selection probability as the
others and the genetic algorithm stops progressing.
In order to palliate these problems, it's possible to
transform the fitness values. Here are the four main methods :
1- Windowing: For each subject, reduce its fitness by the
fitness of the worse subject. This permits to strengthen the strongest subject
and to obtain a zero based distribution.
2- Exponential: This method,
proposed by S.R. Ladd11, consists in taking the square roots of the fitness plus
one. This permits to reduce the influence of the strongest subjects.
3- Linear Transformation: Apply a linear transformation to
each fitness, i.e. f ' = a.f + b. The strongest subjects are once again reduced.
4- Linear normalization: Fitness are linear zed. For
example over a population of 10 subjects, the first will get 100, the second 90,
80 ... The last will get 10. You then avoid the constraint of direct reckoning.
Even if the differences between the subjects are very strong, or weak, the
difference between probabilities of reproduction only depends on the ranking of
the subjects.
To illustrate these methods, let's consider a population
of four subjects to check the effect of scaling. For each subject, we give the
fitness and the corresponding selection probability.
Subjects
|
1
|
2
|
3
|
4
|
Rough Fitness
|
50/50%
|
25/25%
|
15/15%
|
10/10%
|
Windowing
|
40/66.7%
|
15/25%
|
5/8.3%
|
0/0%
|
Exponential
|
7.14/36.5%
|
5.1/26.1%
|
4.0/20.5%
|
3.32/16.9%
|
Linear transfo.
|
53.3/44.4%
|
33.3/27.8%
|
20/16.7
|
13.3/11.1%
|
Linear normalization
|
40/40%
|
30/30%
|
20/20%
|
10/10%
|
Windowing eliminates the weakest subject - the probability
comes to zero - and stimulates the strongest ones (the best one jumps from 50 %
to 67 %).
Exponential flattens the distribution. It's very useful
when a super-subject induces an excessively fast convergence.
Linear transformation plays slightly the same role than
exponential.
At last, linear normalization is neutral towards the
distribution of the fitness and only depends on the ranking. It avoids as well
super-subjects as a too homogeneous distribution.
What is Voronoi diagram?
Suppose that geometric objects are given in a space, a Voronoi diagram is
defined by a set of Voronoi regions which are closer to the corresponding object
than any other objects. Below figures show the Voronoi diagrams of point and
circle sets in a plane. Once such a Voronoi diagram is represented in an
efficient data structure, we can efficiently and exactly analyze various
structural characteristics of particles in the space.
Potential value of the research
Since the Voronoi diagram is one of the most efficient and effective tool for
analyzing spatial structure of particles, our understanding of atomic structure
for nature including both organisms and materials will be significantly
improved. In biology, for example, Voronoi diagrams will make very hard problems
such as identifying pockets or understanding protein dockings, which is core
problem in drug design, easier ones. In material science, it can help to
understand spatial distribution of the particles consisting material and the
spatial characteristics efficiently. Therefore, our research on Voronoi diagrams
will enhance the quality of human life and will contribute the competitiveness
of industry
Research objectives
Our research group study the exact and fast computation methodology to analyze
spatial structure of system which consists of particles, and implement the
software based on developed theory.
Based on these studies, our research target is solving important problems
efficiently in various applied fields include life engineering and material
engineering efficiently.
Development of the theories and algorithms for Euclidean Voronoi diagram of
spheres in 3D and higher dimensions
Development of spatial reasoning algorithms to solve various application
problems
Identifying and solving important application problems from domains of biology,
molecular chemistry, and material sciences.
General Algorithm for Genetic Algorithms
Genetic algorithms are not too hard to program or understand, since they are
biological based. Thinking in terms of real-life evolution may help you
understand. Here is the general algorithm for a GA:
Create a Random Initial State
An initial population is created from a random selection of solutions (which are
analagous to chromosomes). This is unlike the situation for Symbolic AI systems,
where the initial state in a problem is already given instead.
Evaluate Fitness
A value for fitness is assigned to each solution (chromosome) depending on
how close it actually is to solving the problem (thus arriving to the answer of
the desired problem). (These "solutions" are not to be confused with "answers"
to the problem, think of them as possible characteristics that the system would
employ in order to reach the answer.)
Reproduce (& Children Mutate)
Those chromosomes with a higher fitness value are more likely to reproduce
offspring (which can mutate after reproduction). The offspring is a product of
the father and mother, whose composition consists of a combination of genes from
them (this process is known as "crossing over".
Next Generation
If the new generation contains a solution that produces an output that is
close enough or equal to the desired answer then the problem has been solved. If
this is not the case, then the new generation will go through the same process
as their parents did. This will continue until a solution is reached.
Applications of GAs
The possible applications of genetic algorithms are immense. Any problem that
has a large search domain could be suitable tackled by GAs. A popular growing
field is genetic programming (GP).
Genetic Programming
In programming languages such as LISP and Scheme, the mathematical notation is
not written in standard notation, but in prefix notation. Some examples of this:
+ 2 1 : 2 + 1
* + 2 1 2 : 2 * (2+1)
* + - 2 1 4 9 : 9 * ((2 - 1) + 4)
Notice the difference between the left-hand side to the right? Apart from the
order being different, no parenthesis! The prefix method makes life a lot easier
for programmers and compilers alike, because order precedence is not an issue.
You can build expression trees out of these strings that then can be easily
evaluated, for example, here are the trees for the above three expressions.
+ * *
/ \ / \ / \
1 2 + 2 + 9
/ \ / \
1 2 - 4
/ \
2 1
|
You can see how expression evaluation is thus a lot easier. What this have to do
with GAs? If for example you have numerical data and 'answers', but no
expression to conjoin the data with the answers. A genetic algorithm can be used
to 'evolve' an expression tree to create a very close fit to the data. By
'splicing' and 'grafting' the trees and evaluating the resulting expression with
the data and testing it to the answers, the fitness function can return how
close the expression is. The limitations of genetic programming lie in the
huge search space the GAs have to search for - an infinite number of
equations. Therefore, normally before running a GA to search for an equation,
the user tells the program which operators and numerical ranges to search under.
Uses of genetic programming can lie in stock market prediction, advanced
mathematics and military applications.
Evolving Neural Networks
GAs have successfully been used to evolve various aspects of GAs - either the
connection weights, the architecture, or the learning function. You can see how
GAs are perfect for evolving the weights of a neural network - there are immense
number of possibilities that standard learning techniques such as
back-propagation would take thousands upon thousands of iterations to converge
to. GAs could (given the appropriate direction) evolve working weights within a
hundred or so iterations.
Evolving the architecture of neural network is slightly more complicated, and
there have
CONCLUSION
[1]
Genetic algorithms are original systems based on the supposed functioning of the
Living. The method is very different from classical optimization algorithms.
-
Use of the encoding of the parameters, not the parameters themselves.
-
Work on a population of points, not a unique one.
-
Use the only values of the function to optimize, not their derived function or
other auxiliary knowledge.
-
Use probabilistic transition function not determinist ones.
It's important to understand that the functioning of such an algorithm does not
guarantee success. We are in a stochastic system and a genetic pool may be too
far from the solution, or for example, a too fast convergence may halt the
process of evolution. These algorithms are nevertheless extremely efficient, and
are used in fields as diverse as stock exchange, production scheduling or
programming of assembly robots in the automotive industry.
Reference:
-
http://www.rennard.org/alife/english/gavintrgb.html
-
http://cs.felk.cvut.cz/~xobitko/ga/
-