Resolution recipes

Propositional
1. Convert the axioms to clause form.
2. Convert the negated goal to clause form.
3. Apply the resolution rule over and over until you derive [] = empty clause = false.

Predicate Logic
Same as above, but use the recipes on pp. 294--298 of the text: skolemization (get rid of existential quantifiers) and unification.

Resolution Rule:
    M or S1 or ... or Sn
  and
    ~M or T1 or ... or Tp
  yields
    S1 or ... or Sn or T1 or ... or Tp

To prove things with SAT matrices, proceed as above: write the SAT matrix and then prove it is unsolvable.
----------------------------------------
Lecture 12

1. Exploit symmetry: statistically study the other agents.
   What percentage of the time is C played? By the winner? By the loser? By the opponent?
   What is the distribution of payoff plays played? Does it change over time?
2. Exploit diversity:
   1. Make your agent hard to predict (include randomness).
   2. Have a whole stable of agents!
   3. Don't be afraid to "borrow" useful routines from the monitor. Just rename them.
3. Exploit the "intelligence" of other agents.
   1. Mimic the leaders.
   2. Avoid playing like the losers.
4. Pick an appropriate LBTF (look back time frame) for calculating your statistics. If too small, we forget the past. If too large, we adapt slowly.
======================================
In payoff, try to determine what range of values lies in the inner 50%, and shoot to end in the top segment of this.
Look for scalar valued agents that do well, like: 6, 40-80, 60. Choose among them stochastically.
===========================================
Skolemization in Resolution:
everybody loves somebody: AxE(y)L(x,y)
  in clause form: L(x,y'(x)) where y' is a skolem function.
somebody loves everybody: ExA(y)L(x,y)
  in clause form: L(x',y) where x' is a skolem constant.
--------------------------------------
Lecture 13

We are studying: Resolution... Unification

Four instances of P[x,f(y),B], with the substitutions that produce them, are:
  P[z,f(w),B]       alphabetic variant                      {z/x, w/y}
  P[x,f(A),B]       more specific                           {A/y}
  P[g(z),f(A),B]    more specific yet                       {g(z)/x, A/y}
  P[C,f(A),B]       ground instance, since no variables     {C/x, A/y}

Two or more expressions unify if there is a substitution that makes them identical. The most general unifier (mgu) of a set of expressions is the "simplest" substitution that makes them all equal. The principle being employed is known as LEAST-COMMITMENT or MINIMAL ENTROPY. Do not create complexity that does not exist.

EXAMPLE: Consider unifying P[x,f(y),B] with P[x,f(B),B]. A possible unifier is {A/x, B/y}, but this is not most general, since {B/y} will suffice.

Unification can also be taken on a larger scale of arbitrary pattern matching, such as matching two representations of chess positions.

In 1965, J.A. Robinson provided a tremendous contribution to Traditional AI by inventing resolution, clausification, and unification, and by showing that resolution is complete for first order logic. Ironically, it is exactly this contribution which may have set AI back a few years, since it avoided understanding the structure of knowledge in terms of graphs and hypergraphs. In fact, clausification, as powerful as it is, removes exactly the high-level structure that is used by humans for efficient theorem-proving. Since Existential Graphs do not require a clausification stage, this structure is retained.

Clausification goes through nine stages:
1. Eliminate implication symbols.
2. Reduce the scope of negation.
3. Standardize variables (each quantifier gets a unique one).
4. Eliminate existential quantifiers through Skolemization, which creates a function that produces "the object that exists".
5. Move universal quantifiers to the front.
6. Convert to CNF.
7. Eliminate universals.
8. Eliminate "and" by breaking into separate clauses.
9. Rename variables: every variable must occur in at most one clause.
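The substitution and mgu ideas above can be made concrete with a small sketch. It is only an illustration under assumptions that are not in the notes: terms are encoded as Python values (a lowercase string is a variable, an uppercase string is a constant, and a tuple such as ("f", "y") is a function or predicate applied to arguments), and the names is_var, walk, occurs, unify and apply_subst are hypothetical helpers. A substitution is stored as a dict from variable to term, so the notes' {B/y} appears as {'y': 'B'}.

```python
def is_var(t):
    # Assumed convention: variables are strings starting with a lowercase letter.
    return isinstance(t, str) and t[0].islower()

def walk(t, subst):
    # Follow bindings until we reach an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v appear inside term t (under subst)?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst=None):
    """Return a most general unifier as {variable: term}, or None on failure."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return None if occurs(b, a, subst) else {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple):
        # Same symbol and same number of arguments, then left to right.
        if a[0] != b[0] or len(a) != len(b):
            return None
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # different constants, or a constant against a function

def apply_subst(t, subst):
    # Apply the substitution all the way through a term.
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_subst(arg, subst) for arg in t[1:])
    return t

p1 = ("P", "x", ("f", "y"), "B")   # P[x,f(y),B]
p2 = ("P", "x", ("f", "B"), "B")   # P[x,f(B),B]
mgu = unify(p1, p2)
print(mgu)                   # {'y': 'B'}
print(apply_subst(p1, mgu))  # ('P', 'x', ('f', 'B'), 'B')
```

Note that x is left unbound: binding it to A as well would still unify the two expressions, but would commit to more than is needed.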
Unification can be done with a straightforward algorithm which works on expressions in a left-to-right fashion and never requires backtracking...

We gave the basis of the Unification Algorithm:
Let x and y be variables.
Let A and B be constants.
Let f(x1,x2,...,xn) and g(x1,...,xm) be functions.
Then we consider the following situations:
  x with y: x/y or y/x.
  x with B: B/x.
  x with f(x1,...,xn): f(x1,...,xn)/x, provided x does not occur in any xi.
  A with B: No, unless A=B.
  A with f(x1,...,xn): No, in general. (But with some math a unification might be possible.)
  f(x1,...,xn) with g(y1,...,ym): No, unless f=g, n=m, and there is a unifier of (x1,...,xn) with (y1,...,yn). That is, unification is called recursively.

Moving left to right and propagating substitutions to the right (and to previous substitutions), this algorithm will find a most general unifier if one exists.

Examples:
Find the mgu of P(x,z,y), P(w,u,w), P(A,u,u).
  an answer: {A/x, A/w, A/y, A/u, A/z}, which makes all three P(A,A,A). (The third arguments force y = w = u, and the first arguments force w = A, so z = u = A as well.)
These do not unify:
  P(f(x,x),A) with P(f(y,f(y,A)),A)
  P(f(A),x) with P(x,A)

8 puzzle solvability (the cycle method):
=======================================================
8-puzzle. Distance to goal is odd or even depending on the location of the blank:
  eoe
  oeo
  eoe
Thus every move changes the parity (oddness or evenness) of the distance to the goal (and hence the distance never remains the same).

DETERMINING WHETHER TWO TILE PUZZLE STATES ARE SOLVABLE:

Lemma 1: The blank (0) requires an even number of moves to return home from the corners and center, and an odd number of moves from the sides.
proof: This can be seen by considering the number of right, left, up, and down moves required by the blank.

Consider this state:
  245
  198
  367
where 9 is the blank. Listing it as a permutation we get 245198367. Our method counts cycles in the permutation. These are formed by considering, for each tile, which tile is currently in its home. In the above state we have these cycles: (124) (3597) (68). The count is three, which is odd.
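The cycle count for the permutation 245198367 can be checked mechanically. The following is a minimal sketch; the function name cycles and the list encoding (the entry at index i-1 is the tile sitting in the home of tile i) are assumptions made for illustration.

```python
def cycles(perm):
    """Return the cycles of a permutation given as a list of 1-based values.

    perm[i - 1] is the tile currently sitting in the home of tile i.
    """
    seen = set()
    result = []
    for start in range(1, len(perm) + 1):
        if start in seen:
            continue
        cycle = []
        k = start
        while k not in seen:
            seen.add(k)
            cycle.append(k)
            k = perm[k - 1]  # follow: which tile is in k's home?
        result.append(tuple(cycle))
    return result

perm = [int(c) for c in "245198367"]
found = cycles(perm)
print(found)           # [(1, 2, 4), (3, 5, 9, 7), (6, 8)]
print(len(found) % 2)  # 1, i.e. an odd number of cycles
```

Tiles already at home show up as cycles of length one, so the goal state 123456789 yields nine cycles.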
It can be shown that every move of the blank either joins two cycles or breaks an existing cycle, thus changing the parity of the number of cycles. The blank is an even number of moves from home (Lemma 1), so the parity of the cycle count must match that of the goal. Since the goal state is (1) (2) (3) (4) (5) (6) (7) (8) (9), which has nine cycles and so is also odd, again we see that the state is solvable.

Note that this method will work on all nxn puzzles!
--------------------------------------------
Lecture 14

I. PEIRCE's INFERENCE RULES:
1. Double circles with nothing between them may be removed or added at will.
2. Any graph may be deleted from a negative context.
3. A copy of a graph may be removed from any context that is "inner" or equal to its copy.
4. Bob's rule: A blank circle can blow up its entire context, including itself.

NOTE: We will prove things by negating them and then deriving "circle", the empty context on the sheet. This differs from the logic handout, which goes from the empty sheet to the theorem. We prefer our method, as it seems easier to measure progress. Also, it makes it easier to learn how to prove things from experience: if I want to show that A is false, and I have already shown B is false, then I only need to derive B from A.

Suppose the existing database is:
3. ~S or P
4. ~U or S
5. U
6. ~W or R
7. W
We wish to prove: P and (Q or R).

IN EGS: Must derive a contradiction from:
          (S (P)) (U (S)) U (W (R)) W (P ((Q) (R)))
  kill w: (S (P)) (U (S)) U R W (P ((Q) (R)))
  kill u: (S (P)) S U R W (P ((Q) (R)))
  kill s: P S U R W (P ((Q) (R)))
  kill p: P S U R W (Q) (R)
  kill r: P S U R W (Q) ()
  Bob's rule: KABOOM!!!!
------------------------------------------
Lecture 15

Amount of surprise in event E = I(E) = log2(1/P(E)).
Expected amount of surprise in an event E with n possible outcomes Ei = summation of Pi*log2(1/Pi).
I(A|B) = uncertainty about A's state given that I know B's state.
Mutual Information between events A and B = I(A:B) = I(A) - I(A|B) = I(B) - I(B|A) = I(B:A).
Total Information of Parts = Diversity of System + Mutual Information Between the Parts.
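These definitions can be checked numerically with a short sketch. Several things here are assumptions rather than part of the lecture: a joint distribution is encoded as a dict from (a, b) pairs to probabilities, I(A|B) is computed through the standard chain rule I(A|B) = I(A,B) - I(B), and the two example distributions (a bit copied exactly, and two independent fair bits) are hypothetical.

```python
from math import log2

def entropy(dist):
    """Expected surprise: the sum of p * log2(1/p), in bits."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Distribution of one coordinate of a joint distribution over pairs."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out

def conditional_entropy(joint):
    """I(A|B), using the chain rule I(A|B) = I(A,B) - I(B)."""
    return entropy(joint) - entropy(marginal(joint, 1))

def mutual_information(joint):
    """I(A:B) = I(A) - I(A|B)."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint)

# Hypothetical example 1: B is an exact copy of a fair bit A.
copied = {(0, 0): 0.5, (1, 1): 0.5}
# Hypothetical example 2: A and B are independent fair bits.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

print(mutual_information(copied))       # 1.0
print(mutual_information(independent))  # 0.0

# Total information of parts = diversity of system + mutual information.
parts = entropy(marginal(copied, 0)) + entropy(marginal(copied, 1))
print(parts == entropy(copied) + mutual_information(copied))  # True
```

Knowing the copy removes all uncertainty about the original bit, so the full one bit is shared; for independent bits nothing is shared.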
Often one wants to compute the diversity of the system.

The expected amount of surprise in an event with n possible outcomes is:
  p1*log2(1/p1) + p2*log2(1/p2) + ... + pn*log2(1/pn)
where pi is the probability of outcome i.

****This number will be maximum when all outcomes are equally likely (as in a fair die versus an unfair one). We say an event is "random" if all outcomes are equally likely. This may reflect more our own ignorance than the actual state of affairs. [If we know nothing about a system we must assume its states occur randomly, else we are lying to ourselves.]

Mutual Information = I(U:V) = I(U) - I(U|V) = I(V) - I(V|U) = I(V:U).
Notice that Mutual Information is symmetric.

Consider a fair die {1,2,3,4,5,6}. Consider a fair coin {H,T}. But what if we know the coin shows H when the die is in {1,2,3} and T when it is in {4,5,6}? Then the mutual information between die and coin is
  I(die) - I(die|coin) = log2(6) - log2(3) = 1 = 1 - 0 = I(coin) - I(coin|die)

These relationships can be summarized by the following equation:

  SUM OF INFORMATION (surprise) OF PARTS = INFORMATION (diversity) OF A SYSTEM + SYMMETRY (mut. info)

For a closed system the left-hand side and right-hand side remain the same, and the symmetry term never decreases for a deterministic rule, which a closed system implies (this is the second law). The second law follows from the fact that "causality" is deterministic and a function, so that evolution is 1-1 or many-to-1, losing information (diversity) over time.
---------------------------------------------
Lecture 16

quiz4 is on Thursday... covers:
  propositional logic: SAT matrices, existential graphs, resolution
  predicate logic: resolution, clausification, skolemization, unification
  8-puzzle solvability
  Optimal NIM play
  symmetry, diversity, 1st and 2nd laws of systems
  Lisp functions (2)
  previous exams....
We also covered chapter 19 on nearest-neighbor (note: they are implemented using k-d tries) and we introduced version spaces (chapter 20).
---------------------------------
Lecture 17

Today we covered "version spaces", which are adequately covered in chapter 20 of your text. Note that the author makes the rather restrictive assumption that there is only one rule! This dramatically reduces the search space at the cost of lowering the chances of success.

We also mentioned the website random.org.
---------------------------------
Lecture 18

Today we covered decision trees (chapter 21). Note that finding a minimal decision tree is NP-complete, so we use the greedy information theory algorithm.

We also covered EMAs. Please see: http://www.investopedia.com/terms/e/ema.asp

Enjoy...
---------------------------------
Lecture 19

Today we covered perceptrons and neural nets, which are introduced in your text. Perceptrons are also covered here:
http://www.computing.dcu.ie/~humphrys/Notes/Neural/single.neural.html
which includes an online demo!

Things to remember:
  Linear functions can best be learned by the statistical method of linear regression.
  By adding non-linear terms (such as products), virtually any function can be learned.
  Threshold learning can be done by using just one more input weight.
  Perceptrons learn a weighted function of their input examples!
----------------------------------
Lecture 20

Today we talked about genetic algorithms and chess. You can read about the chess rating system at http://chess.about.com/library/weekly/aa03a25.htm