Comments on active information

By E. Tellgren

Posted October 08, 2007

William Dembski has been one of the most influential contributors to the Intelligent Design (ID) movement. Among other things, his work has added the terms specified information, specified complexity, and complex specified information to the basic vocabulary of the ID movement. These terms are all directly related to the logarithms of special types of probabilities, e.g. the probability of a pattern of interest given that it was produced in some way that excludes the foresight and guidance of an intelligent agent.

In a recent draft manuscript, Dembski and his coauthor Marks extend the vocabulary with three new terms [1]: endogenous information, exogenous information, and active information. They consider as given a search space and a fixed subset, called a target, that makes up some fraction p_s of the search space. An issue of interest to them is how to measure how well a search algorithm [2] exploits the structure of the search problem. Two possible candidates are the probability p that a search algorithm is successful and the ratio p/p_s. Readers of Dembski's previous writings will not be surprised to discover that Marks and Dembski prefer to log-transform their probabilities and rename them 'information'. In equations, their definitions are
endogenous information = -log(p_s),
exogenous information = -log₂(p),
active information = -log₂(p_s/p).

(More generally, it is indicated that p_s can be replaced by the probability that some reference algorithm finds the target. Since the draft manuscript focuses on the case when p_s is the fraction of points in the target, I will do the same.) It sounds a little less strange to say that active information has to come from a programmer than it does to say the same of a probability value larger than p_s. We are more prone to reify something called 'information' and expect it to come from somewhere in particular than to doing the same for a probability or a probability ratio. For a given search problem, the three types of information typically have a rather mundane origin. They typically arise from the formulation of the search problem. If the formulation of the search problem is informative enough to enable us to estimate the size of the target, then it usually also implies something about which regions of the search space that are likely to be in the target and which search algorithms are most efficient. Of course, Marks and Dembski are not only concerned with the mere mathematical existence of successful search algorithms, they also want to imply something about how such search algorithms are implemented or realized. Naturally, for those search algorithms that are implemented in man-made computers, human programmers tend to play a central role. Even in this case, however, a large ratio p/p_s cannot be entirely attributed to the intrinsic creativity and cleverness of a human programmer. In practice, programmers learn from each other and from experience, and many algorithms are directly inspired by observations of nature. Examples include simulated annealing, genetic algorithms, swarm optimization, etc. To the extent that search problems, targets and large ratios p/p_s also arise in nature (e.g. minimization of free energy in thermodynamic systems or maximization of reproductive success in evolving biological populations), a different answer seems to be required. For one thing it is implausible to think that a human programmer had the means and opportunity to be involved. Additionally, any process that is capable of reaching a long-lived equilibrium or stationary state can be viewed as optimizing something (e.g. minimizing a function that describes the rate of change squared or the distance to the equilbrium state).

Because of the use of information terminology and the appeal to the Maximum Entropy (MaxEnt) principle, it is interesting to compare Marks and Dembski's work to older work within the MaxEnt tradition. It has been shown that the MaxEnt principle can be used as a framework to derive the optimal way to distribute search efforts over different cells of a search space [3]. Although the motivation and assumptions behind that work are different, it is similar in that it led to definitions of quantities measuring the amount of prior knowledge about the target and the amount of knowledge utilized during the search. These measures are, in different ways, both more general and more restricted than Marks and Dembski's measures. They are much more general because there is no restriction to uniform probability and complete lack of knowledge about the target location. They are more restricted in that they presuppose that the target consists of a single point. In the special case of uniform probability and a single-point target, they differ only by trivial additive constants from the endogenous and exogenous information. However, the older measures are directly related to optimal distribution of search efforts and therefore have a clearer meaning and significance. In comparison, it is less clear which questions Marks and Dembski's three quantities can answer.

On another note, Marks and Dembski are mistaken when they refer to Wolpert and Macready's original No-Free-Lunch (NFL) theorem as if it were directly applicable also to the situation they are analyzing. In the current version of their draft, Marks and Dembski stipulate that (a) the target makes up some fixed fraction p_s of the search space and (b) no other assumptions about the search problem are to be made. The NFL scenario analyzed by Wolpert and Macready [4] differs in that stipulation (a) is not made. When one makes use of the Principle of Insufficient Reason or the MaxEnt principle to determine a Bayesian prior, it is important to be clear about what, precisely, it is that is totally unknown. In the NFL scenario it is a cost/objective/fitness function f defined on the search space that is taken to be totally unknown. Performance is evaluated as an average over all mathematically possible functions. If one wishes to define a target it must be defined in terms of the totally unknown function f, e.g. as the region {x | f(x) > c} of the search space with function values larger than some threshold c for what is satisficingly good. For some functions such a target is very small, for other functions it is very large (an example is the function which assigns the maximum value to every search space point). Because all functions contribute, the NFL-averaged performance will include contributions targets of varying sizes, in contrast to the scenario analyzed by Marks and Dembski. Arriving at the target in the NFL scenario is not prohibitively hard, not even for random i.i.d. sampling [5].

In some ways the mistake is not very serious. They don't really need the original NFL results for their own conclusions. Moreover, a generalization of the NFL theorem to the situation when the function is known to belong to a special class of functions (which is closed under permutation of the search space) is applicable to their own scenario [6]. If the function to be sampled is known to assign satisficing values to a fraction p_s of the search space, but is otherwise totally unknown, then both stipulations (a) and (b) are respected and the generalized NFL theorem is applicable.

What makes the difference between the NFL scenario and Marks and Dembski's stipulations significant is the far-reaching conclusions inspired by the latter. They write [1]:

"To have integrity, all search algorithms, especially computer simulations of evolutionary search, should make explicit (1) a numerical measure of the difficulty of the problem to be solved, i.e., the endogenous information, and (2) a numerical measure of the amount of problem-specific information resident in the search algorithm, i.e., the active information."

While there is nothing unreasonable about the general idea of quantifying problem difficulty and problem-specific information, the particular assumptions Marks and Dembski rely on are neither realistic enough nor general enough to become some sort of integrity requirement. The assumptions (a) and (b) correspond to the NFL scenario with an extra restriction of the function to be sampled. Why is the NFL scenario with this extra knowledge a privileged set of assumptions? Why give knowledge of the target size special treatment compared to, say, knowledge about the smoothness of the function? In reply to criticism of Dembski's older work on the basis that real-world fitness functions exhibit clustering of similar values, Marks and Dembski claim that taking such knowledge into account is to commit the crime of smuggling in unwarranted assumptions [7]. Given that, it seems that they are equally guilty in smuggling in an unwarranted assumption about the target size. Without the target size assumption (a) smuggled in, the NFL scenario is so benign that any search algorithm, random i.i.d. sampling included, is reasonably effective [5]. A characteristic of hard search problems is that the difficulty grows fast with the size of the search space. In contrast, the NFL-averaged performance is (almost completely) independent of the size of the search space. If this is counter-intuitive, it is only because the original NFL theorem is a formal version of the problem of induction, whereas the real-world search problems we build our experience and intution on seem to come from domains that enable inductive learning and generalizations.

Acknowledgement: Thanks to Pete Dunkelberg for commenting on earlier versions.

References and notes:

[1] R. Marks and W. Dembski. Conservation of Information in Search: Measuring the Cost of Success, http://web.ecs.baylor.edu/faculty/marks/T/ActiveInfo.pdf, accessed September 19, 2007.

[2] Note that, following Wolpert and Macready and others, a search algorithm in this context is just a sampling distribution of the type P(next sampling point | data about previously sampled points). This differs a bit from the more common notion of an algorithm as a list of instructions, input to a Turing machine, or an implementation in a programming language.

[3] E. T. Jaynes. Entropy and Search-Theory, 1985. Available online at http://bayes.wustl.edu/etj/articles/search.pdf (Note that Jaynes frequently referred to MaxEnt Bayesianism as 'information theory' and to work within this tradition as 'information-theoretic'. This is different from both Shannon's information theory and algorithmic information theory.)

[4] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67-82, 1997.

[5] T. M. English. Optimization is easy and learning is hard in the typical function. In Proceedings of the 2000 Congress on Evolutionary Computation CEC00, p. 924-931, July 2000. IEEE Press. Available online at http://citeseer.ist.psu.edu/english00optimization.html

[6] C. Igel and M. Toussaint. On classes of functions for which No Free Lunch results hold. Information Processing Letters, 86(6):317-321, 2003. (e-print: http://arxiv.org/abs/cs/0108011)

[7] R. Marks and W. Dembski. Active Information in Evolutionary Search, http://web.ecs.baylor.edu/faculty/marks/T/Hag2.pdf, accessed September 19, 2007.

* * *

Location of this article: http://talkreason.org/articles/active.cfm