Graph optimization¶
In this section we will define a couple optimizations on doubles.
Todo
This tutorial goes way too far under the hood, for someone who just wants to add yet another pattern to the libraries in tensor.opt for example.
We need another tutorial that covers the decorator syntax, and explains how to register your optimization right away. That’s what you need to get going.
Later, the rest is more useful for when that decorator syntax type thing doesn’t work. (There are optimizations that don’t fit that model).
Note
The optimization tag cxx_only is used for optimizations that insert Ops which have no Python implementation (so they only have C code). Optimizations with this tag are skipped when there is no C++ compiler available.
Global and local optimizations¶
First, let’s lay out the way optimizations work in Theano. There are
two types of optimizations: global optimizations and local
optimizations. A global optimization takes a FunctionGraph
object (a
FunctionGraph is a wrapper around a whole computation graph, you can see its
documentation
for more details) and navigates through it
in a suitable way, replacing some Variables by others in the process. A
local optimization, on the other hand, is defined as a function on a
single Apply node and must return either False
(to mean that
nothing is to be done) or a list of new Variables that we would like to
replace the node’s outputs with. A Navigator is a special kind
of global optimization which navigates the computation graph in some
fashion (in topological order, reverse-topological order, random
order, etc.) and applies one or more local optimizations at each step.
Optimizations which are holistic, meaning that they must take into account dependencies that might be all over the graph, should be global. Optimizations that can be done with a narrow perspective are better defined as local optimizations. The majority of optimizations we want to define are local.
Global optimization¶
A global optimization (or optimizer) is an object which defines the following methods:
-
class
Optimizer
¶ -
apply
(fgraph)¶ This method takes a FunctionGraph object which contains the computation graph and does modifications in line with what the optimization is meant to do. This is one of the main methods of the optimizer.
-
add_requirements
(fgraph)¶ This method takes a FunctionGraph object and adds features to it. These features are “plugins” that are needed for the
apply
method to do its job properly.
-
optimize
(fgraph)¶ This is the interface function called by Theano.
Default: this is defined by Optimizer as
add_requirement(fgraph); apply(fgraph)
.
-
See the section about FunctionGraph
to understand how to define these
methods.
Local optimization¶
A local optimization is an object which defines the following methods:
-
class
LocalOptimizer
¶ -
transform
(node)¶ This method takes an Apply node and returns either
False
to signify that no changes are to be done or a list of Variables which matches the length of the node’soutputs
list. When the LocalOptimizer is applied by a Navigator, the outputs of the node passed as argument to the LocalOptimizer will be replaced by the list returned.
-
One simplification rule¶
For starters, let’s define the following simplification:
We will implement it in three ways: using a global optimization, a local optimization with a Navigator and then using the PatternSub facility.
Global optimization¶
Here is the code for a global optimization implementing the simplification described above:
from theano.gof import toolbox
class Simplify(gof.Optimizer):
def add_requirements(self, fgraph):
fgraph.attach_feature(toolbox.ReplaceValidate())
def apply(self, fgraph):
for node in fgraph.toposort():
if node.op == div:
x, y = node.inputs
z = node.outputs[0]
if x.owner and x.owner.op == mul:
a, b = x.owner.inputs
if y == a:
fgraph.replace_validate(z, b)
elif y == b:
fgraph.replace_validate(z, a)
simplify = Simplify()
Todo
What is add_requirements? Why would we know to do this? Are there other requirements we might want to know about?
Here’s how it works: first, in add_requirements
, we add the
ReplaceValidate
FunctionGraph Features located in
toolbox – [doc TODO]. This feature adds the replace_validate
method to fgraph
, which is an enhanced version of replace
that
does additional checks to ensure that we are not messing up the
computation graph (note: if ReplaceValidate
was already added by
another optimizer, extend
will do nothing). In a nutshell,
toolbox.ReplaceValidate
grants access to fgraph.replace_validate
,
and fgraph.replace_validate
allows us to replace a Variable with
another while respecting certain validation constraints. You can
browse the list of FunctionGraph Feature List and see if some of
them might be useful to write optimizations with. For example, as an
exercise, try to rewrite Simplify using NodeFinder
. (Hint: you
want to use the method it publishes instead of the call to toposort!)
Then, in apply
we do the actual job of simplification. We start by
iterating through the graph in topological order. For each node
encountered, we check if it’s a div
node. If not, we have nothing
to do here. If so, we put in x
, y
and z
the numerator,
denominator and quotient (output) of the division.
The simplification only occurs when the numerator is a multiplication,
so we check for that. If the numerator is a multiplication we put the
two operands in a
and b
, so
we can now say that z == (a*b)/y
. If y==a
then z==b
and if
y==b
then z==a
. When either case happens then we can replace
z
by either a
or b
using fgraph.replace_validate
- else we do
nothing. You might want to check the documentation about Variable
and Apply to get a better understanding of the
pointer-following game you need to get ahold of the nodes of interest
for the simplification (x
, y
, z
, a
, b
, etc.).
Test time:
>>> x = double('x')
>>> y = double('y')
>>> z = double('z')
>>> a = add(z, mul(div(mul(y, x), y), div(z, x)))
>>> e = gof.FunctionGraph([x, y, z], [a])
>>> e
[add(z, mul(div(mul(y, x), y), div(z, x)))]
>>> simplify.optimize(e)
>>> e
[add(z, mul(x, div(z, x)))]
Cool! It seems to work. You can check what happens if you put many
instances of in the graph. Note that it sometimes
won’t work for reasons that have nothing to do with the quality of the
optimization you wrote. For example, consider the following:
>>> x = double('x')
>>> y = double('y')
>>> z = double('z')
>>> a = div(mul(add(y, z), x), add(y, z))
>>> e = gof.FunctionGraph([x, y, z], [a])
>>> e
[div(mul(add(y, z), x), add(y, z))]
>>> simplify.optimize(e)
>>> e
[div(mul(add(y, z), x), add(y, z))]
Nothing happened here. The reason is: add(y, z) != add(y,
z)
. That is the case for efficiency reasons. To fix this problem we
first need to merge the parts of the graph that represent the same
computation, using the merge_optimizer
defined in
theano.gof.opt
.
>>> from theano.gof.opt import merge_optimizer
>>> merge_optimizer.optimize(e)
>>> e
[div(mul(*1 -> add(y, z), x), *1)]
>>> simplify.optimize(e)
>>> e
[x]
Once the merge is done, both occurrences of add(y, z)
are
collapsed into a single one and is used as an input in two
places. Note that add(x, y)
and add(y, x)
are still considered
to be different because Theano has no clue that add
is
commutative. You may write your own global optimizer to identify
computations that are identical with full knowledge of the rules of
arithmetics that your Ops implement. Theano might provide facilities
for this somewhere in the future.
Note
FunctionGraph
is a Theano structure intended for the optimization
phase. It is used internally by function and is rarely
exposed to the end user. You can use it to test out optimizations,
etc. if you are comfortable with it, but it is recommended to use
the function frontend and to interface optimizations with
optdb
(we’ll see how to do that soon).
Local optimization¶
The local version of the above code would be the following:
class LocalSimplify(gof.LocalOptimizer):
def transform(self, node):
if node.op == div:
x, y = node.inputs
if x.owner and x.owner.op == mul:
a, b = x.owner.inputs
if y == a:
return [b]
elif y == b:
return [a]
return False
def tracks(self):
# This should be needed for the EquilibriumOptimizer
# but it isn't now
# TODO: do this and explain it
return [] # that's not what you should do
local_simplify = LocalSimplify()
Todo
Fix up previous example... it’s bad and incomplete.
The definition of transform is the inner loop of the global optimizer,
where the node is given as argument. If no changes are to be made,
False
must be returned. Else, a list of what to replace the node’s
outputs with must be returned. This list must have the same length as
node.ouputs. If one of node.outputs don’t have clients(it is not used
in the graph), you can put None in the returned list to remove it.
In order to apply the local optimizer we must use it in conjunction with a Navigator. Basically, a Navigator is a global optimizer that loops through all nodes in the graph (or a well-defined subset of them) and applies one or several local optimizers on them.
>>> x = double('x')
>>> y = double('y')
>>> z = double('z')
>>> a = add(z, mul(div(mul(y, x), y), div(z, x)))
>>> e = gof.FunctionGraph([x, y, z], [a])
>>> e
[add(z, mul(div(mul(y, x), y), div(z, x)))]
>>> simplify = gof.TopoOptimizer(local_simplify)
>>> simplify.optimize(e)
>>> e
[add(z, mul(x, div(z, x)))]
OpSub, OpRemove, PatternSub¶
Theano defines some shortcuts to make LocalOptimizers:
-
OpSub
(op1, op2)¶ Replaces all uses of op1 by op2. In other words, the outputs of all Apply involving op1 by the outputs of Apply nodes involving op2, where their inputs are the same.
-
OpRemove
(op)¶ Removes all uses of op in the following way: if
y = op(x)
theny
is replaced byx
. op must have as many outputs as it has inputs. The first output becomes the first input, the second output becomes the second input, and so on.
-
PatternSub
(pattern1, pattern2)¶ Replaces all occurrences of the first pattern by the second pattern. See
PatternSub
.
from theano.gof.opt import OpSub, OpRemove, PatternSub
# Replacing add by mul (this is not recommended for primarily
# mathematical reasons):
add_to_mul = OpSub(add, mul)
# Removing identity
remove_identity = OpRemove(identity)
# The "simplify" operation we've been defining in the past few
# sections. Note that we need two patterns to account for the
# permutations of the arguments to mul.
local_simplify_1 = PatternSub((div, (mul, 'x', 'y'), 'y'),
'x')
local_simplify_2 = PatternSub((div, (mul, 'x', 'y'), 'x'),
'y')
Note
OpSub
, OpRemove
and PatternSub
produce local optimizers, which
means that everything we said previously about local optimizers
apply: they need to be wrapped in a Navigator, etc.
Todo
wtf is a navigator?
When an optimization can be naturally expressed using OpSub
, OpRemove
or PatternSub
, it is highly recommended to use them.
WRITEME: more about using PatternSub (syntax for the patterns, how to
use constraints, etc. - there’s some decent doc at
PatternSub
for those interested)
The optimization database (optdb)¶
Theano exports a symbol called optdb
which acts as a sort of
ordered database of optimizations. When you make a new optimization,
you must insert it at the proper place in the database. Furthermore,
you can give each optimization in the database a set of tags that can
serve as a basis for filtering.
The point of optdb is that you might want to apply many optimizations to a computation graph in many unique patterns. For example, you might want to do optimization X, then optimization Y, then optimization Z. And then maybe optimization Y is an EquilibriumOptimizer containing LocalOptimizers A, B and C which are applied on every node of the graph until they all fail to change it. If some optimizations act up, we want an easy way to turn them off. Ditto if some optimizations are very CPU-intensive and we don’t want to take the time to apply them.
The optdb system allows us to tag each optimization with a unique name as well as informative tags such as ‘stable’, ‘buggy’ or ‘cpu_intensive’, all this without compromising the structure of the optimizations.
Definition of optdb¶
optdb is an object which is an instance of
SequenceDB
,
itself a subclass of DB
.
There exist (for now) two types of DB, SequenceDB and EquilibriumDB.
When given an appropriate Query, DB objects build an Optimizer matching
the query.
A SequenceDB contains Optimizer or DB objects. Each of them has a name, an arbitrary number of tags and an integer representing their order in the sequence. When a Query is applied to a SequenceDB, all Optimizers whose tags match the query are inserted in proper order in a SequenceOptimizer, which is returned. If the SequenceDB contains DB instances, the Query will be passed to them as well and the optimizers they return will be put in their places.
An EquilibriumDB contains LocalOptimizer or DB objects. Each of them has a name and an arbitrary number of tags. When a Query is applied to an EquilibriumDB, all LocalOptimizers that match the query are inserted into an EquilibriumOptimizer, which is returned. If the SequenceDB contains DB instances, the Query will be passed to them as well and the LocalOptimizers they return will be put in their places (note that as of yet no DB can produce LocalOptimizer objects, so this is a moot point).
Theano contains one principal DB object, optdb
, which
contains all of Theano’s optimizers with proper tags. It is
recommended to insert new Optimizers in it. As mentioned previously,
optdb is a SequenceDB, so, at the top level, Theano applies a sequence
of global optimizations to the computation graphs.
Query¶
A Query is built by the following call:
theano.gof.Query(include, require = None, exclude = None, subquery = None)
-
class
Query
¶ -
include
¶ A set of tags (a tag being a string) such that every optimization obtained through this Query must have one of the tags listed. This field is required and basically acts as a starting point for the search.
-
require
¶ A set of tags such that every optimization obtained through this Query must have all of these tags.
-
exclude
¶ A set of tags such that every optimization obtained through this Query must have none of these tags.
-
subquery
¶ optdb can contain sub-databases; subquery is a dictionary mapping the name of a sub-database to a special Query. If no subquery is given for a sub-database, the original Query will be used again.
-
Furthermore, a Query object includes three methods, including
,
requiring
and excluding
which each produce a new Query object
with include, require and exclude sets refined to contain the new [WRITEME]
Examples¶
Here are a few examples of how to use a Query on optdb to produce an Optimizer:
from theano.compile import optdb
# This is how the optimizer for the fast_run mode is defined
fast_run = optdb.query(Query(include = ['fast_run']))
# This is how the optimizer for the fast_compile mode is defined
fast_compile = optdb.query(Query(include = ['fast_compile']))
# This is the same as fast_run but no optimizations will replace
# any operation by an inplace version. This assumes, of course,
# that all inplace operations are tagged as 'inplace' (as they
# should!)
fast_run_no_inplace = optdb.query(Query(include = ['fast_run'], exclude = ['inplace']))
fast_run_no_inplace = fast_run.excluding('inplace')
Registering an Optimizer¶
Let’s say we have a global optimizer called simplify
. We can add
it to optdb
as follows:
# optdb.register(name, optimizer, order, *tags)
optdb.register('simplify', simplify, 0.5, 'fast_run')
Once this is done, the FAST_RUN mode will automatically include your optimization (since you gave it the ‘fast_run’ tag). Of course, already-compiled functions will see no change. The ‘order’ parameter (what it means and how to choose it) will be explained in optdb structure below.
Registering a LocalOptimizer¶
LocalOptimizers may be registered in two ways:
- Wrap them in a Navigator and insert them like a global optimizer (see previous section).
- Put them in an EquilibriumDB.
Theano defines two EquilibriumDBs where you can put local optimizations:
-
canonicalize
()¶ This contains optimizations that aim to simplify the graph:
- Replace rare or esoterical operations with their equivalents using elementary operations.
- Order operations in a canonical way (any sequence of
multiplications and divisions can be rewritten to contain at most
one division, for example;
x*x
can be rewrittenx**2
; etc.) - Fold constants (
Constant(2)*Constant(2)
becomesConstant(4)
)
-
specialize
()¶ This contains optimizations that aim to specialize the graph:
- Replace a combination of operations with a special operation that does the same thing (but better).
For each group, all optimizations of the group that are selected by
the Query will be applied on the graph over and over again until none
of them is applicable, so keep that in mind when designing it: check
carefully that your optimization leads to a fixpoint (a point where it
cannot apply anymore) at which point it returns False
to indicate its
job is done. Also be careful not to undo the work of another local
optimizer in the group, because then the graph will oscillate between
two or more states and nothing will get done.
optdb structure¶
optdb contains the following Optimizers and sub-DBs, with the given priorities and tags:
Order | Name | Description |
---|---|---|
0 | merge1 | First merge operation |
1 | canonicalize | Simplify the graph |
2 | specialize | Add specialized operations |
49 | merge2 | Second merge operation |
49.5 | add_destroy_handler | Enable inplace optimizations |
100 | merge3 | Third merge operation |
The merge operations are meant to put together parts of the graph that represent the same computation. Since optimizations can modify the graph in such a way that two previously different-looking parts of the graph become similar, we merge at the beginning, in the middle and at the very end. Technically, we only really need to do it at the end, but doing it in previous steps reduces the size of the graph and therefore increases the efficiency of the process.
See previous section for more information about the canonicalize and specialize steps.
The add_destroy_handler
step is not really an optimization. It is
a marker. Basically:
Warning
Any optimization which inserts inplace operations in the
computation graph must appear after the add_destroy_handler
“optimizer”. In other words, the priority of any such optimization
must be >= 50. Failure to comply by this restriction can lead
to the creation of incorrect computation graphs.
The reason the destroy handler is not inserted at the beginning is that it is costly to run. It is cheaper to run most optimizations under the assumption there are no inplace operations.