

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 32, NO. 4, JULY 2002

one of the other parts in the assembly, in which only one sequence is possible.

VI. CONCLUSION

Virtual assembly is a pilot project of a much bigger vision of "manufacturing in the computer." The interactive approach presented in this paper creates a way of introducing human expertise into assembly planning and a mechanism for integrating robot programming with sequence planning. Virtual assembly helps to identify and resolve issues related to the construction of an integrated virtual manufacturing environment that could enhance all levels of manufacturing decision and control.


Encoding Probability Propagation in Belief Networks

Shichao Zhang and Chengqi Zhang

Abstract—Complexity reduction is an important task in Bayesian networks. Recently, an approach known as the linear potential function (LPF) model has been proposed for approximating Bayesian computations. The LPF model can effectively compress a conditional probability table into a linear function. This correspondence extends the LPF model to approximate propagation in Bayesian networks. The extension focuses on encoding probability propagation as a polynomial function for a class of tractable problems.

Index Terms—Approximate reasoning, Bayesian network, belief network, encoding technique, probabilistic reasoning.

I. INTRODUCTION

Probabilistic reasoning with Bayesian networks (or belief networks) [2] is based on conditional probability matrices. A conditional probability matrix $M_{Y|X}$ is used in belief networks to describe causality of the form $X \to Y$. Let the domain of $X$ be $R(X) = \{x_1, x_2, \ldots, x_m\}$ and the domain of $Y$ be $R(Y) = \{y_1, y_2, \ldots, y_n\}$. Then $M_{Y|X}$ is defined as $M_{Y|X} = M(P(y \mid x)) = (P(Y = y \mid X = x)) = [p(y_j \mid x_i)]_{m \times n}$, where $p(y_j \mid x_i) = p(Y = y_j \mid X = x_i)$ is the conditional probability of $Y = y_j$ given $X = x_i$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. Given an observation $X = a = (p(x_1), p(x_2), \ldots, p(x_m))$, we can obtain $Y = b = (p(y_1), p(y_2), \ldots, p(y_n))$ as

$$b = a M_{Y|X}. \tag{1}$$
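Computationally, (1) is a plain vector-matrix product. A minimal sketch in Python (the helper name `propagate` and the illustrative 3-state table are ours):

```python
def propagate(a, M):
    """Compute b = a * M_{Y|X}: push an observation vector through one arc X -> Y."""
    n = len(M[0])  # number of states of Y
    return [sum(a_i * row[j] for a_i, row in zip(a, M)) for j in range(n)]

# An illustrative 3x3 conditional probability table
# (rows: states of X, columns: states of Y; each row sums to 1).
M = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
a = (0.7, 0.1, 0.2)   # observation: a distribution over the states of X
b = propagate(a, M)   # -> approximately (0.59, 0.17, 0.24)
```

The result is again a probability distribution, so its components sum to 1.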

Equation (1) is the well-known probability propagation in Bayesian networks. Computation with Bayesian networks has been proven to be NP-hard [1], [6]. By building an encoding technique, Santos [4], [5] advocated an efficient approach, known as the linear potential function (LPF), to circumvent this problem for Bayesian networks. This approach compresses a conditional probability table into a linear function. The encoding technique provides a new way to perform probability computations in Bayesian networks. By extending Santos' approach, this correspondence presents a polynomial approximating function (PAF) model for approximating propagation in Bayesian networks. To demonstrate the extensibility of the LPF model, this correspondence focuses on propagation in Bayesian networks for only a class of tractable problems.

For the rule $X \to Y$, we can divide the causality between $X$ and $Y$ into two cases.

Case I: A polynomial relationship between $X$ and $Y$. For example, for a free-falling object, the total distance traveled is directly proportional to the square of the time of travel. In particular, Example 1 (shown in Section II) presents a typical linear relationship between $X$ and $Y$, which constitutes the simplest form of causality.

Case II: A nonpolynomial relationship between $X$ and $Y$.

Manuscript received February 8, 2001; revised August 22, 2001, December 10, 2001, and January 26, 2002. This paper was recommended by Associate Editor E. Santos.

S. Zhang is with the Faculty of Information Technology, University of Technology, Sydney, NSW 2007, Australia, and also with the School of Mathematics and Computing, Guangxi Normal University, Guilin, China (e-mail: zhangsc@its.uts.edu.au).

C. Zhang is with the Faculty of Information Technology, University of Technology, Sydney, NSW 2007, Australia (e-mail: chengqi@it.uts.edu.au).

Digital Object Identifier 10.1109/TSMCA.2002.804784

1083-4427/02$17.00 © 2002 IEEE


For example, let $X$ be years and $Y$ be the population of China. There is a nonpolynomial relationship between the population of China and years.

Case I certainly refers to a class of tractable problems in Bayesian networks. Also, there are potentially polynomial functions that can be used to approximate some nonpolynomial causal relations (Case II). These nonpolynomial causal relations refer to tractable problems in Bayesian networks as well. In this correspondence, we construct polynomial functions to approximate the propagation for these tractable problems. Descriptions of the concepts used in this correspondence are the same as those in [4]. Our goal is to develop techniques for

1) encoding the vector $a$ to an integer $E_X(a)$;
2) encoding the vector $b$ to an integer $E_Y(b)$, where $b = a M_{Y|X}$;
3) finding a polynomial function $E'_Y(b) = k_2 E_X^2(a) + k_1 E_X(a) + k_0$ close to (1), where $\|E_Y(b) - E'_Y(b)\| < \varepsilon$ and $\varepsilon\,(> 0)$ is small enough;
4) finding a unique decoding function to extract $b$ from $E'_Y(b)$.

To find the polynomial function $E'_Y(b) = k_2 E_X^2(a) + k_1 E_X(a) + k_0$, the key issues are how to construct the encoding and decoding functions for a pair of vectors, $a$ and $b$, and how to find $k_2$, $k_1$, and $k_0$. This correspondence tackles the aforementioned problems. Section II demonstrates the effectiveness of the encoding system. Section III proposes the encoding and decoding techniques. Section IV constructs the PAF model. Sections V and VI contain a performance evaluation and the conclusion, respectively.

II. EFFECTIVENESS OF ENCODING

In the quest for reducing the complexity of the networks, let us now examine existing models.

The first solution is the matrix model presented in (1), which has been proven to be NP-hard. We now illustrate the use of this solution with an example borrowed from [2, pp. 151–153].

Example 1: In a certain trial there are three suspects, one of whom has definitely committed a murder. The murder weapon, showing some fingerprints, has been found by the police. Let $X$ identify the last user of the weapon, namely, the killer. Let $Y$ identify the last holder of the weapon, i.e., the person whose fingerprints were left on the weapon, and let $Z$ represent the possible readings that may be obtained in a fingerprint laboratory examination. The relationship between these three variables would normally be expressed by the chain $X \to Y \to Z$. Let suspect1, suspect2, and suspect3 be the three suspects. To represent the common-sense knowledge ($X \to Y$) that the killer is normally the last person to hold the weapon, we use a $3 \times 3$ conditional probability matrix

$$M_{Y|X} = \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}.$$

Now, let the evidence be $x = (0.7, 0.1, 0.2)$. Then we have

$$[0.7 \;\; 0.1 \;\; 0.2] \begin{pmatrix} 0.8 & 0.1 & 0.1 \\ 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8 \end{pmatrix} = [0.59 \;\; 0.17 \;\; 0.24].$$

Therefore, $y = [0.59, 0.17, 0.24]$ is the result that we want: 0.59 is the probability that suspect1 is the last holder of the weapon, 0.17 is the probability that suspect2 is the last holder of the weapon, and 0.24 is the probability that suspect3 is the last holder of the weapon.

The second solution is the LPF model [4], which uses an approximation function to capture all the values in a conditional probability table of a Bayesian network. Consider a simple table consisting of only one random variable, say $A$. Suppose $R(A) = \{\text{red, green, yellow, blue, purple}\}$ and the table is $P(A = \text{red}) = 0.02$, $P(A = \text{green}) = 0.50$, $P(A = \text{yellow}) = 0.00$, $P(A = \text{blue}) = 0.37$, and $P(A = \text{purple}) = 0.11$. The LPF model encodes the colors to $E_A(\text{yellow}) = 0$, $E_A(\text{red}) = 0.2$, $E_A(\text{purple}) = 1.1$, $E_A(\text{blue}) = 3.7$, and $E_A(\text{green}) = 5$. Then, the probabilities are described by the points (0, 0.00), (0.2, 0.02), (1.1, 0.11), (3.7, 0.37), and (5, 0.50), which approximately fit a line. Hence, we can replace the information in the table with a simple continuous real function according to Santos' LPF model.

The third solution is our PAF proposal, which extends Santos' approach to encode propagation in belief networks for a class of tractable problems. Consider some probabilities of $X$ and the corresponding values of $Y$ in Example 1, plotted as a graph. Intuitively, there is no direct linear relationship between $a = (p(x_1), p(x_2), p(x_3))$ and $b = (p(y_1), p(y_2), p(y_3))$ in the above graph. Now we encode them using the following functions:

$$E_X(a) = p(x_1)10^2 + p(x_2)10^4 + p(x_3)10^6$$
$$E_Y(b) = p(y_1)10^2 + p(y_2)10^4 + p(y_3)10^6.$$

Then, the earlier graph can be transformed by taking $E_X(a)$ and $E_Y(b)$ as an ordered pair of the form $(E_X(a), E_Y(b))$. The concrete pairs (1090, 101 773), (307000, 315 910), (400060, 381 052), and (801010, 661 717) are fitted by the line

$$E_Y(b) = 0.7 E_X(a) + 101\,010.$$

From Example 1, for $a = (0.7, 0.1, 0.2)$, $E_X(a) = 0.7 \times 10^2 + 0.1 \times 10^4 + 0.2 \times 10^6 = 201\,070$. From the function $E_Y(b) = 0.7 E_X(a) + 101\,010$, we obtain $E_Y(b) = 0.7 \times 201\,070 + 101\,010 = 241\,759$. After decoding, $b = (0.59, 0.17, 0.24)$, which is exactly the same as the result from the first solution. (The encoding and decoding functions are described in Section III.)

From the above observations, probability propagation with matrices is NP-hard, the LPF model can effectively compress probability tables, and the PAF model can effectively reduce the complexity of propagation.
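The encoding walkthrough above can be reproduced in a few lines of Python; `encode` is our illustrative name for the stated $10^2$, $10^4$, $10^6$ weighting, and the line is the one fitted from the four sample pairs:

```python
def encode(p):
    """E(p) = p_1*10^2 + p_2*10^4 + p_3*10^6 for a three-state distribution."""
    return p[0] * 10 ** 2 + p[1] * 10 ** 4 + p[2] * 10 ** 6

a = (0.7, 0.1, 0.2)
Ex = encode(a)               # 201 070
Ey = 0.7 * Ex + 101_010      # the fitted line E_Y(b) = 0.7 E_X(a) + 101 010
print(round(Ex), round(Ey))  # 201070 241759
```

Decoding 241 759 block by block (two digits per block, as Section III describes) recovers (0.59, 0.17, 0.24).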

III. ENCODING AND DECODING TECHNIQUES

Given a rule $X \to Y$ with matrix $M_{Y|X}$ and an observation pair $X = a$ and $Y = b$, this correspondence endeavors to construct a polynomial function of the form $F(a) = k_n E_X^n(a) + \cdots + k_1 E_X(a) + k_0$ close to (1). Without loss of generality, we suppose $n = 2$ for the purpose of this correspondence. Ideally, this function should be expected to satisfy $F(a) = E_Y(b) = E_Y(a M_{Y|X})$. That is

$$k_2 E_X^2(a) + k_1 E_X(a) + k_0 = E_Y(a M_{Y|X}). \tag{2}$$

In (2), we need to determine the constants $k_2$, $k_1$, $k_0$ and the mappings (encoders) $E_X$ and $E_Y$. First, we construct the encoders $E_X$ and $E_Y$; then we construct the decoder. The techniques for determining $k_2$, $k_1$, and $k_0$ will be demonstrated in Section IV.

Let $R(X) = \{x_1, x_2, \ldots, x_k\}$, $R(Y) = \{y_1, y_2, \ldots, y_m\}$, and the state space of $X$ be

$$S(X) = \Big\{(p_1, p_2, \ldots, p_k) \;\Big|\; 1 \le i \le k,\; 0 \le p_i \le 1,\; \sum_{i=1}^{k} p_i = 1\Big\}.$$

The encoder $E_X$ is defined as

$$E_X(p_1, p_2, \ldots, p_k) = 10^d p_1 + 10^{2d} p_2 + \cdots + 10^{kd} p_k$$

where $d > 0$ is a positive integer.¹ Here, $d$ is determined as $d = r + 1$ if $r$ decimal places are demanded in an application; otherwise, $d = n + 1$ when $10^{n-1} < \mathrm{Max}\{|R(X)|, |R(Y)|\} \le 10^n$.

¹Note that this encoder only extends the LPF model to the propagation for a class of tractable problems. When the precision required is too large, the problem is intractable.

Theorem 1: The above encoder $E_X$ is a one-to-one mapping.

Proof: Reduction to absurdity. Let the encoder of one state $(p_1, p_2, \ldots, p_k)$ be equal to the encoder of a distinct state $(p'_1, p'_2, \ldots, p'_k)$. That is, $E_X(p_1, p_2, \ldots, p_k) = E_X(p'_1, p'_2, \ldots, p'_k)$, or

$$10^d p_1 + 10^{2d} p_2 + \cdots + 10^{kd} p_k = 10^d p'_1 + \cdots + 10^{kd} p'_k$$

or

$$10^d (p_1 - p'_1) + 10^{2d} (p_2 - p'_2) + \cdots + 10^{kd} (p_k - p'_k) = 0.$$

From the above suppositions, the differences $p_i - p'_i$ $(i = 1, 2, \ldots, k)$ must not all be equal to 0. Assume that $p_i - p'_i \ne 0$ when $i = j_1, j_2, \ldots, j_m$, where $j_1 < j_2 < \cdots < j_m$. Then, the above formula can be rewritten as $10^{j_1 d}(p_{j_1} - p'_{j_1}) + \cdots + 10^{j_m d}(p_{j_m} - p'_{j_m}) = 0$, or

$$10^{j_1 d}(p_{j_1} - p'_{j_1}) + \cdots + 10^{j_{m-1} d}(p_{j_{m-1}} - p'_{j_{m-1}}) = -10^{j_m d}(p_{j_m} - p'_{j_m}).$$

In other words

$$10^{j_m d}\,\|p_{j_m} - p'_{j_m}\| = \big\|10^{j_1 d}(p_{j_1} - p'_{j_1}) + \cdots + 10^{j_{m-1} d}(p_{j_{m-1}} - p'_{j_{m-1}})\big\|.$$

Because $0 \le p_i, p'_i \le 1$, $\sum_{i=1}^{k} p_i = \sum_{i=1}^{k} p'_i = 1$, and the probabilities carry at most $d - 1$ decimal places, we have $\|p_{j_i} - p'_{j_i}\| \le 1$ for every $i$ and $\|p_{j_m} - p'_{j_m}\| \ge 10^{1-d}$; thus, according to the encoder, $10^{j_m d}\|p_{j_m} - p'_{j_m}\| \ge 10^{j_m d} \cdot 10^{1-d} = 10^{(j_m - 1)d + 1}$. On the other hand, because the exponents $j_1 d < \cdots < j_{m-1} d$ are distinct multiples of $d$,

$$10^{j_1 d}\|p_{j_1} - p'_{j_1}\| + \cdots + 10^{j_{m-1} d}\|p_{j_{m-1}} - p'_{j_{m-1}}\| \le 10^{j_1 d} + \cdots + 10^{j_{m-1} d} < 2 \cdot 10^{j_{m-1} d}.$$

Because $j_{m-1} < j_m$, so $j_{m-1} \le j_m - 1$, the above inequation can be reduced to

$$10^{j_1 d}\|p_{j_1} - p'_{j_1}\| + \cdots + 10^{j_{m-1} d}\|p_{j_{m-1}} - p'_{j_{m-1}}\| < 2 \cdot 10^{(j_m - 1) d} \le 10^{(j_m - 1)d + 1}.$$

Hence

$$10^{j_1 d}\|p_{j_1} - p'_{j_1}\| + \cdots + 10^{j_{m-1} d}\|p_{j_{m-1}} - p'_{j_{m-1}}\| < 10^{j_m d}\,\|p_{j_m} - p'_{j_m}\|$$

which contradicts the equality above. Accordingly, the above encoder is a one-to-one mapping.

We now present the decoder of $E_Y(b)$ for (2). If we can obtain $F(a) = k_2 E_X^2(a) + k_1 E_X(a) + k_0$ as an approximation function of (1) for the propagation, we can solve $b_1, b_2, \ldots, b_m$ from $F(a)$ by decoding. That is

$$b_i = \frac{\mathrm{INT}\big(F(a)/10^{(i-1)d}\big) - \mathrm{INT}\big(F(a)/10^{id}\big) \times 10^d}{10^d}$$

for $i = 1, 2, \ldots, m$, where $\mathrm{INT}(\cdot)$ is the integer function. For example, for $d = 2$ in Example 2 (shown in Section IV), if $F(a) = 241759$, then $b_1 = (241759 - 241700)/100 = 0.59$, $b_2 = (2417 - 2400)/100 = 0.17$, and $b_3 = (24 - 0)/100 = 0.24$. To assure the probability significance level of the results, the final results are

$$b_1 := \mathrm{Max}\{0,\; 1 - (b_2 + b_3 + \cdots + b_m)\}$$
$$b_i := b_i/(b_1 + b_2 + \cdots + b_m), \quad \text{if } b_2 + \cdots + b_m > 1$$

where $i = 2, \ldots, m$. To account for computing errors, we assume the errors only impact the last digits. For our model, suppose that only the last $d$ digits are impacted; that is, only the probability $p(y_1)$ is changed by the computing errors. Therefore, $b_1 := \mathrm{Max}\{0, 1 - (b_2 + b_3 + \cdots + b_m)\}$ when we decode. The above decoder of $E_Y(b)$ is actually the reversion of the encoder.
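The decoder just described is the digit-block inverse of the encoder. A minimal sketch (the function name `decode` is ours; it works on the integer part of $F(a)$ and then applies the $b_1$ correction, rounding $b_1$ back to $d$ decimal places):

```python
def decode(F, d, m):
    """Recover (b_1, ..., b_m) from the encoded value F, d digits per block."""
    base = 10 ** d
    F = int(F)  # INT(.): errors are assumed to sit in the last d digits
    b = [(F // base ** (i - 1) - (F // base ** i) * base) / base
         for i in range(1, m + 1)]
    # re-derive b_1 so the result sums to 1 (only p(y_1) absorbs the error)
    b[0] = round(max(0.0, 1.0 - sum(b[1:])), d)
    return b

print(decode(241759, 2, 3))  # [0.59, 0.17, 0.24]
```

With the Example 4 value 393 749.66 (shown in Section IV), the same routine yields [0.24, 0.37, 0.39] after the correction.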

IV. PAF MODEL FOR THE PROPAGATION

In this section, we will first determine the parameters $k_2$, $k_1$, and $k_0$ of (2) and then illustrate the use of the PAF model by examples.

A. Determining Parameters of the Function

Our goal is to compress (1) into an approximation function; therefore, all states $a \in S(X)$ must satisfy $\|k_2 E_X^2(a) + k_1 E_X(a) + k_0 - E_Y(a M_{Y|X})\| < \varepsilon$, where $\varepsilon > 0$ is small enough, or

$$f'(k_2, k_1, k_0) = \sum_{a \in S(X)} \big(k_2 E_X^2(a) + k_1 E_X(a) + k_0 - E_Y(a M_{Y|X})\big)^2$$


where the value of $f'(k_2, k_1, k_0)$ must be the minimum. Or, for a set of sample states $\Omega(X) \subseteq S(X)$,

$$f(k_2, k_1, k_0) = \sum_{a \in \Omega(X)} \big(k_2 E_X^2(a) + k_1 E_X(a) + k_0 - E_Y(a M_{Y|X})\big)^2$$

where the value of $f(k_2, k_1, k_0)$ must be the minimum and $f(k_2, k_1, k_0)$ is the restriction of $f'(k_2, k_1, k_0)$ to $\Omega(X)$. Hence, we can obtain the following theorem.

Theorem 2: The minimal solutions to the above formula for the constants $k_2$, $k_1$, $k_0$ are

$$k_2 = \frac{\Delta_2(E_Y)\Delta_1(E_X) - \Delta_1(E_Y)\Delta_2(E_X)}{\Delta_1(E_X)\Delta_3(E_X) - \Delta_2^2(E_X)}$$

$$k_1 = \frac{\Delta_1(E_Y)\Delta_3(E_X) - \Delta_2(E_Y)\Delta_2(E_X)}{\Delta_1(E_X)\Delta_3(E_X) - \Delta_2^2(E_X)}$$

$$k_0 = \frac{1}{\|\Omega(X)\|}\Big(\sum_{a \in \Omega(X)} E_Y(a M_{Y|X}) - k_2 \sum_{a \in \Omega(X)} E_X^2(a) - k_1 \sum_{a \in \Omega(X)} E_X(a)\Big)$$

where, writing $N = \|\Omega(X)\|$ for the number of sample states,

$$\Delta_1(E_X) = N \sum_{a \in \Omega(X)} E_X^2(a) - \Big(\sum_{a \in \Omega(X)} E_X(a)\Big)^2$$

$$\Delta_2(E_X) = N \sum_{a \in \Omega(X)} E_X^3(a) - \Big(\sum_{a \in \Omega(X)} E_X(a)\Big)\Big(\sum_{a \in \Omega(X)} E_X^2(a)\Big)$$

$$\Delta_3(E_X) = N \sum_{a \in \Omega(X)} E_X^4(a) - \Big(\sum_{a \in \Omega(X)} E_X^2(a)\Big)^2$$

$$\Delta_1(E_Y) = N \sum_{a \in \Omega(X)} E_Y(a M_{Y|X}) E_X(a) - \Big(\sum_{a \in \Omega(X)} E_X(a)\Big)\Big(\sum_{a \in \Omega(X)} E_Y(a M_{Y|X})\Big)$$

$$\Delta_2(E_Y) = N \sum_{a \in \Omega(X)} E_Y(a M_{Y|X}) E_X^2(a) - \Big(\sum_{a \in \Omega(X)} E_X^2(a)\Big)\Big(\sum_{a \in \Omega(X)} E_Y(a M_{Y|X})\Big).$$

Proof: By the principle of extreme values in mathematical analysis, we can find the minimum by taking the partial derivatives of $f(k_2, k_1, k_0)$ with respect to $k_2$, $k_1$, and $k_0$ and setting these derivatives to 0. That is

$$\frac{\partial f}{\partial k_2} = 2 \sum_{a \in \Omega(X)} \big(k_2 E_X^2(a) + k_1 E_X(a) + k_0 - E_Y(a M_{Y|X})\big) E_X^2(a) = 0$$

$$\frac{\partial f}{\partial k_1} = 2 \sum_{a \in \Omega(X)} \big(k_2 E_X^2(a) + k_1 E_X(a) + k_0 - E_Y(a M_{Y|X})\big) E_X(a) = 0$$

$$\frac{\partial f}{\partial k_0} = 2 \sum_{a \in \Omega(X)} \big(k_2 E_X^2(a) + k_1 E_X(a) + k_0 - E_Y(a M_{Y|X})\big) = 0.$$

By solving the above equation group, we can obtain the solutions in Theorem 2.

Hence, we can take the above formula, $F(a) = k_2 E_X^2(a) + k_1 E_X(a) + k_0$, as an approximation function of (1) for the propagation in belief networks. Generally, for rules of the form $X_1 \wedge X_2 \wedge \cdots \wedge X_n \to Y$, PAFs can be constructed in the same way as above. For (2), when $k_2 = 0$, the causal relation is linear, which captures the simplest causality in Case I; when $k_2 \ne 0$, the function can be used for the polynomial causality in Case I, and it can also be used to approximate the causality in Case II. The use of the approximation function is illustrated by the following examples.

B. Examples

To demonstrate the use of the above approximation function model, we use two examples for causality Case I and one example for causality Case II.

Example 2: Example 1 demonstrates a typical linear causality between $X$ and $Y$. This means $k_2 = 0$. Suppose the encoders of $X$ and $Y$ are both the same as those in Section III, and $d = 2$. According to Theorem 2, $k_0$ and $k_1$ are determined by samples in $S(X)$ and $S(Y)$ as follows: $k_0 = 101\,010$ and $k_1 = 0.7$. Then, the approximating function is

$$F(a) = 0.7 E_X(a) + 101\,010.$$

In Example 2, we directly use the encoder method from Section III. We now present a different example, in which the order of the point values of the variables needs to be rearranged.

Example 3: For the rule in Example 2, let $d = 2$ and

$$M_{Y|X} = \begin{pmatrix} 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8 \\ 0.8 & 0.1 & 0.1 \end{pmatrix}.$$

For this matrix, before $X$ is encoded, the order of its point values needs to be rearranged so that the rearranged matrix coincides with the matrix of Example 1. We can rename the values as $z_1 = x_1$, $z_2 = x_3$, $z_3 = x_2$ for $X$ and, correspondingly, $w_1 = y_2$, $w_2 = y_1$, $w_3 = y_3$ for $Y$. Then, the state space is $S(X) = \{(p(z_1) = a_1, p(z_2) = a_2, p(z_3) = a_3) \mid a_1 + a_2 + a_3 = 1\}$, and the encoders are $E_X(a) = E_X(a_1, a_2, a_3) = 10^d a_1 + 10^{2d} a_2 + 10^{3d} a_3$, or $E_X(a) = 10^d p(x_1) + 10^{2d} p(x_3) + 10^{3d} p(x_2)$, and $E_Y(b) = 10^d p(y_2) + 10^{2d} p(y_1) + 10^{3d} p(y_3)$. Now we can solve the approximation function with the above encoders as follows:

$$F(a) = 0.7 E_X(a) + 101\,010.$$
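Under the least-squares reading of Theorem 2, the parameters are an ordinary quadratic fit in $E_X(a)$. The sketch below recovers Example 2's function from the Example 1 matrix; the grid sampling scheme and helper names are ours, for illustration:

```python
M = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]

def encode(p):  # E(p) with d = 2
    return p[0] * 1e2 + p[1] * 1e4 + p[2] * 1e6

def propagate(a):  # b = a M
    return [sum(a[i] * M[i][j] for i in range(3)) for j in range(3)]

# sample states of S(X) on a 0.1 grid of the probability simplex
samples = [(i / 10, j / 10, (10 - i - j) / 10)
           for i in range(11) for j in range(11 - i)]
xs = [encode(a) for a in samples]
ys = [encode(propagate(a)) for a in samples]

N = len(samples)
Sx, Sx2 = sum(xs), sum(x * x for x in xs)
Sx3, Sx4 = sum(x ** 3 for x in xs), sum(x ** 4 for x in xs)
Sy = sum(ys)
Sxy = sum(x * y for x, y in zip(xs, ys))
Sx2y = sum(x * x * y for x, y in zip(xs, ys))

# Delta quantities of Theorem 2 (normal equations of the quadratic fit)
D1, D2, D3 = N * Sx2 - Sx * Sx, N * Sx3 - Sx * Sx2, N * Sx4 - Sx2 * Sx2
E1, E2 = N * Sxy - Sx * Sy, N * Sx2y - Sx2 * Sy
den = D1 * D3 - D2 * D2
k2 = (E2 * D1 - E1 * D2) / den
k1 = (E1 * D3 - E2 * D2) / den
k0 = (Sy - k2 * Sx2 - k1 * Sx) / N
# For this matrix the causality is exactly linear: k2 ~ 0, k1 ~ 0.7, k0 ~ 101010
```

Because each $b_i = 0.1 + 0.7 p(x_i)$ for this matrix, $E_Y(b) = 0.7 E_X(a) + 101\,010$ holds exactly, and the fit returns $k_2 \approx 0$.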


Given an observation $a = (p(x_1) = 0.2, p(x_2) = 0.1, p(x_3) = 0.7)$, the probabilities of the point values of $Y_P$ can be gained by using the propagation for Bayesian networks:

$$p(y_1) = 0.59, \quad p(y_2) = 0.24, \quad p(y_3) = 0.17.$$

The corresponding rearranged state of this observation is (0.2, 0.7, 0.1), and the encoder of the state is $E_X(a) = 107\,020$. If it is substituted into the above approximation function, we have $F(a) = 175\,924$. According to the decoding method in Section III, we can obtain the probabilities of $Y_Z$ from $F(a)$ as follows:

$$p(y_1) = 0.59, \quad p(y_2) = 0.24, \quad p(y_3) = 0.17.$$

Examples 2 and 3 have shown that the causality in Case I can be perfectly fitted by the above function $F(a)$. We can now demonstrate the case in which $k_2 \ne 0$.

Example 4: Let the conditional probability matrix of a node be

$$M_{Y|X} = \begin{pmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.3 \\ 0.2 & 0.4 & 0.4 \end{pmatrix}.$$

Suppose the encoders of $X$ and $Y$ are both the same as those in Section III, and $d = 2$. According to the PAF model, $k_0$, $k_1$, and $k_2$ are determined by samples in $S(X)$ and $S(Y)$ as follows: $k_0 = 302\,693.63$, $k_1 = 0.10128629$, and $k_2 = -1.26717 \times 10^{-10}$. Then, the approximating function is

$$F(a) = -1.26717 \times 10^{-10} E_X^2(a) + 0.10128629 E_X(a) + 302\,693.63.$$

Given an observation $a = (p(x_1) = 0.1, p(x_2) = 0, p(x_3) = 0.9)$, the probabilities of the point values of $Y_P$ can be gained by the propagation for Bayesian networks as follows: $p(y_1) = 0.23$, $p(y_2) = 0.38$, $p(y_3) = 0.39$. The encoder of this observation is $E_X(a) = 900\,010$. If it is substituted into the above approximation function, we have $F(a) = 393\,749.66$. According to the decoding method in Section III, we can obtain the probabilities of the point values of $Y_Z$ from $F(a)$ as follows:

$$p(y_1) = 0.49, \quad p(y_2) = 0.37, \quad p(y_3) = 0.39.$$

In order to assure the probability significance level of the results, the final results are

$$p(y_1) = 0.24, \quad p(y_2) = 0.37, \quad p(y_3) = 0.39.$$

Thus, we have $\|P(Y_Z) - P(Y_P)\| = \|p_Z(y_1) - p_P(y_1)\| + \cdots + \|p_Z(y_n) - p_P(y_n)\| = 0.02$.

Again, we can construct an approximating function for $d = 3$ with the same samples: $k_0 = 300\,246\,064$, $k_1 = 0.1002169$, and $k_2 = -6.69927 \times 10^{-14}$. Then, the approximating function is

$$F(a) = -6.69927 \times 10^{-14} E_X^2(a) + 0.1002169 E_X(a) + 300\,246\,064.$$

Now, given the observed value $p(x_1) = 0.1$, $p(x_2) = 0$, $p(x_3) = 0.9$, the probabilities of the point values of $Y_P$ gained in Pearl's plausible inference model are $p(y_1) = 0.23$, $p(y_2) = 0.38$, $p(y_3) = 0.39$. The encoder of this observed value is $E_X(a) = 900\,000\,100$. If it is substituted into the above approximation function, we have $F(a) = 390\,386\,000$. According to the decoding method in Section III, we can obtain the probabilities of the point values of $Y_Z$ from $F(a)$ as follows:

$$p(y_1) = 0.00, \quad p(y_2) = 0.386, \quad p(y_3) = 0.39.$$

In order to assure the probability significance level of the results, the final results are

$$p(y_1) = 0.224, \quad p(y_2) = 0.386, \quad p(y_3) = 0.39.$$

Thus, we have $\|P(Y_Z) - P(Y_P)\| = 0.012$. These results show that $\|P(Y_Z) - P(Y_P)\|$ is decreased from 0.02 to 0.012 for the observation $(p(x_1) = 0.1, p(x_2) = 0, p(x_3) = 0.9)$ when $d$ is enlarged from 2 to 3. However, the influencing range of the computational error is not reduced as $d$ is enlarged.

V. PERFORMANCE EVALUATION

To study effectiveness and efficiency, we have performed several experiments with the proposed approach. Our algorithms are implemented on DELL Pentium III machines, using Java++. Using our PAF model, the storage space of a rule $X \to Y$ can be reduced from $O(|R(X)| \times |R(Y)|)$ to $O(|R(X)| + |R(Y)|)$, and its running time can be decreased from $O(|R(X)| \times |R(Y)|)$ to $O(|R(X)| + |R(Y)|)$. Our experiments in this section focus on the effectiveness and efficiency of the PAF model, compared with the matrix model [2].

A. Error

In our experiments, 20 random matrices for $|R(X)| = 2$ and $|R(Y)| = 3$ and 20 random matrices for $|R(X)| = 3$ and $|R(Y)| = 3$ are selected. For each matrix, given 50 random samples of $S(X)$, 50 states of $S(Y)$ are first generated by (1). Then, the approximating function of this matrix can be constructed by using the proposed method on the 50 groups of data. Let $Y_P = (p(y_1), p(y_2), \ldots, p(y_n))$ be the result of using (1), and let $Y_Z = (p'(y_1), p'(y_2), \ldots, p'(y_n))$ be the result of using the approximating function constructed. For the sake of simplicity, let $d = 2$ in our experiments. In practical applications, $d = 4$ is more appropriate in relation to the computing complexity and the accuracy of the results. For each approximating function constructed, we use 100 random samples in $\Omega(X)$ $(\Omega(X) \subseteq S(X))$ to check the effectiveness of the PAF model by $\|Y_Z - Y_P\|$, where $\|Y_Z - Y_P\| = \|p'(y_1) - p(y_1)\| + \|p'(y_2) - p(y_2)\| + \cdots + \|p'(y_n) - p(y_n)\|$. Our methods for checking the effectiveness of the PAF model by $\|Y_Z - Y_P\|$ in the experiments are as follows.

1) Maximum error (ME):

$$\mathrm{Max}_{a \in \Omega(X)}\{\|Y_Z - Y_P\|\}.$$

2) Average error (AE):

$$\frac{1}{100M} \sum_{a \in \Omega(X)} \|Y_Z - Y_P\|$$

where $M$ is the number of matrices checked, and 100 is the number of checks for each matrix.
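The two error measures can be sketched directly. The illustrative data below are the $Y_Z$/$Y_P$ pairs from Example 4 (the $1/100M$ averaging over all matrices is simplified here to an average over the checks at hand):

```python
def l1_error(yz, yp):
    """||Y_Z - Y_P|| = sum of componentwise absolute differences."""
    return sum(abs(u - v) for u, v in zip(yz, yp))

# (Y_Z, Y_P) pairs from Example 4: the d = 2 run and the d = 3 run
checks = [([0.24, 0.37, 0.39], [0.23, 0.38, 0.39]),
          ([0.224, 0.386, 0.39], [0.23, 0.38, 0.39])]
errors = [l1_error(yz, yp) for yz, yp in checks]
me = max(errors)                   # maximum error (ME)
ae = sum(errors) / len(errors)     # average error (AE) over these checks
print(round(me, 3), round(ae, 3))  # 0.02 0.016
```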


The effectiveness of the PAF model by $\|Y_Z - Y_P\|$ in the experiments is listed in Table I. In Table I, "Proc" is the procedure for either $k_2 = 0$ or $k_2 \ne 0$, "Ftype" is the type of function fitted, "LF" stands for linear function, "PF" stands for polynomial function, "NPF" indicates nonpolynomial function, "No1" is the number of matrices, "No2" is the number of checks, "ME" is the maximum error, and "AE" is the average error.

TABLE I. EFFECTIVENESS OF APPROXIMATING FUNCTIONS

B. Efficiency of the PAF Model

For comparison, we select four groups of data from real-world investment applications. For propagating probabilities, the data in each group form a belief network (a poly-tree). The main properties of the data sets are as follows. The first group consists of ten objects (nodes), in which the average range of the objects is 5 and the biggest range among the objects is 9. The second group consists of 15 objects, in which the average range of the objects is 8 and the biggest range among the objects is 16. The third group consists of 20 objects, in which the average range of the objects is 10 and the biggest range among the objects is 24. The fourth group consists of 25 objects, in which the average range of the objects is 7 and the biggest range among the objects is 14. The comparison of our PAF model with traditional Bayesian networks (TBNs) [2] on running time and space is illustrated in Figs. 1 and 2.

Fig. 1. Comparison on time.

Fig. 2. Comparison on space.

C. Analysis

It can be seen that the model based on encoding functions provides good approximations of probability propagation in Bayesian networks. The functions constructed for the Case I matrices are without error. The matrices in Case II also become simple by optimizing, and their errors are certainly affordable in applications. This correspondence has focused on a class of tractable problems in Bayesian networks. Most of the matrices in the above four groups for checking time and space were selected to match Case I of causality. Only two to four matrices in each group are in Case II, and the functions for these matrices contain low errors, not over 0.01. Therefore, the propagation errors in the poly-trees are apparently acceptable for applications. Because TBNs depend on operations on matrices to propagate probabilities, the computation is nonlinear. The proposed approach propagates probabilities by approximating polynomial functions. Therefore, the running time and space for the proposed approach are less than those of the TBN models.

VI. CONCLUSION

The LPF model has been proposed for compressing a conditional probability table in a Bayesian network into a linear function [4]. In this correspondence, we have extended the model to encode propagation in Bayesian networks for a class of tractable problems. We have developed techniques for 1) encoding the state $a$ of a variable $X$ to an integer $E_X(a)$ (shown in Section III); 2) finding a polynomial function $F(a) = k_2 E_X^2(a) + k_1 E_X(a) + k_0$ close to the propagation (shown in Section IV); and 3) finding a unique decoding function to extract $b$ from $E_Y(b)$ (shown in Section III). The experiments have shown that our PAF model can approximately fit propagation in Bayesian networks.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor, E. Santos, and the anonymous reviewers for their detailed constructive comments on the first version of this correspondence. They would also like to thank G. Webb for his comments and help.

REFERENCES

[1] G. Cooper, "Probabilistic inference using belief networks is NP-hard," Stanford Univ., Stanford, CA, Rep. TR KSL-87-27, 1987.
[2] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Mateo, CA: Morgan Kaufmann, 1988.
[3] E. Santos, Jr. and S. Shimony, "Belief updating by enumerating high-probability independence-based assignments," in Proc. UAI, 1994, pp. 506–513.
[4] E. Santos, Jr., "On linear potential functions for approximating Bayesian computations," J. ACM, vol. 43, pp. 399–430, 1996.
[5] E. Santos, Jr., "On multiple spline approximations for Bayesian computations," Ann. Math. Artif. Intell., vol. 20, no. 1–4, pp. 267–300, 1997.
[6] S. Shimony, "The role of relevance in explanation—Part I: Irrelevance as statistical independence," Int. J. Approx. Reason., vol. 6, pp. 281–324, 1993.