OXFORD UNIVERSITY COMPUTING LABORATORY

Minutes of 2003 Meetings of the Algebra of Programming Research Group

Michaelmas Term 2003 (all Fridays)
Trinity Term 2003 (all Fridays)
Hilary Term 2003 (all Fridays)

Friday, 28th November 2003
Present: RSB, SC(minute-taker), JG, GJ, CM, BO, BS

Another miscellany this week.

RSB started by talking about Critical Mass, Mike Spivey's programming competition for undergraduates.

Apparently it is played on a large grid of squares, where players (blue and red) take turns to put blobs on the board. You can only put your blob on a square that is unoccupied, or occupied only by your own pieces.

A square (without walls) reaches critical mass when it has 4 blobs. A square with one neighbouring wall reaches CM with 3 blobs. A corner square reaches CM with 2 blobs.

   _________________________
   |   |   |   |   |   |   |
   |___|___|___|___|___|___|
   |   |   |   |   |   |   |
   |___|___|___|___|___|___|
   |   |   |   |   |   |   |
   |___|___|___|___|___|___|
   |   |   |   |   |   |   |
   |___|___|___|___|___|___|
Once a square reaches critical mass, it explodes, and the blobs burst into the neighbouring cells (one each, any extras remain behind), and all the cells "burst upon" by a new blob turn the colour of that incoming blob. This can set off a chain reaction if it brings other cells to their critical mass in turn. To break any deadlock, cells nearer the top-left explode first.

Maybe the chain reaction settles down, maybe it goes on for ever! You win by either filling the whole board with your colour or by setting up a chain reaction that never stops.
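For concreteness, here is a tiny Haskell sketch (the editor's, not part of Mike's harness) of the critical-mass rule: a cell's threshold is just its number of orthogonal neighbours, so corners pop at 2, edge cells at 3 and interior cells at 4.

    type Cell = (Int, Int)

    -- orthogonal neighbours of a cell on a w-by-h board
    neighbours :: Int -> Int -> Cell -> [Cell]
    neighbours w h (x, y) =
      [ (x', y') | (x', y') <- [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
                 , x' >= 0, x' < w, y' >= 0, y' < h ]

    -- a cell explodes once it holds as many blobs as it has neighbours
    criticalMass :: Int -> Int -> Cell -> Int
    criticalMass w h c = length (neighbours w h c)

    -- criticalMass 6 4 (0,0) == 2;  criticalMass 6 4 (0,1) == 3;  criticalMass 6 4 (2,2) == 4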

JG tested a small 2 x 2 grid with five blobs in it, to demonstrate a never-ending chain reaction.

Mike will have a test harness to implement this, and will set the various programming strategies against each other and see who wins.

There was a discussion about how people in general go about programming this sort of thing.

---------------

The topic of discussion moved onto interview questions, and RSB revealed what his favourite interview question is:

   o  o  o  o   ....
   A  B  C  D
Which of these is good for dancing?

Disc-o.

---------------

RSB then turned to invariants, and how you might try and get an invariant from the desired postcondition of your program. (This is to help students find invariants.)

Looking at a sorting post-condition:

    (forall i,j : 0<=i<=j<=N : a[i] <= a[j] )
Doing the "replace constant by variable" manoeuvre like this
    (forall i,j : 0<=i<=j<=n : a[i] <= a[j] )
yields insertion sort.
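As an executable rendering of the manoeuvre (the editor's sketch, nothing shown at the meeting), the weakened predicate can be written directly as a Haskell test, which is handy for checking a candidate loop body on small examples:

    -- "sorted up to position n": the predicate obtained by replacing N by n
    sortedUpTo :: Ord a => Int -> [a] -> Bool
    sortedUpTo n a = and [ a !! i <= a !! j | i <- [0..n], j <- [i..n] ]

    -- the postcondition is sortedUpTo (length a - 1) a; insertion sort
    -- re-establishes sortedUpTo n a each time it increases n by one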

Alternatively, RSB wanted to have an invariant "true", with the variant being the number of inversions.

Rewriting the postcondition as

   (forall i: 0<=i<N : (forall j: i<=j<N : a[i]<=a[j]))
and then replacing a constant by a variable, you get
   (forall i: 0<=i<n : (forall j: i<=j<N : a[i]<=a[j]))
which leads you to either selection sort or bubble sort, depending on how you maintain the invariant.

Rewriting the original postcondition as

   (forall i: 0<=i<N-1 : a[i]<=a[i+1] )
you could get an insertion sort or a bubblesort from that.

Going in the direction of recursion (and not really using an invariant) you might get quicksort:

    up (0,N) <=>  (exists i: up(0,i) & up(i,N)
                             & a[0..i) <= a[i..N))
Heapsort? Needed to rewrite as
   (forall i,j : 0<=i,j<=N : i<=j => a[i]<=a[j] )
and then replace constant by variable
   (forall i,j : 0<=i,j<=n : i<=j => a[i]<=a[j] )
There then followed a discussion about the tree ordering of the elements for heapsort, and whether students would be capable of seeing how the heapsort algorithm worked without actually referring to the tree.
   hp(l,r) = (forall i,j : l<=i,j<=r : i <| j => a[i]<=a[j] )

    a:[true, hp(0,N)]

  ; a:[hp(0,N),up(0,N)]

---------------

RSB then changed the problem to the celebrities problem (see PSC minutes from 14th November) and offered a new definition of a celebrity from Julian Tibble, a 2nd year undergraduate at Worcester.

   Ci = Pi & (forall j : 0<=j<N : iKj => Cj)
P means popular.

JG wondered if it should be instead

   Ci = Pi & (forall j : 0<=j<N : iKj => Pj)
which was Julian's first suggestion, but popularity isn't enough! For example, consider when 0K2, 0K1, 1K2, 2K0, 2K1 (and 0K0, 1K1, 2K2).

There's no celebrity in-group there, yet C1 holds, as 1 is popular (known by everybody), and 2 is the only other person that 1 knows, and 2 is popular too; but 2 is not a celebrity, as 2 knows a non-popular person (0). Therefore that definition won't work, and we use

   Pi = (forall j : 0<=j<N : jKi )

   Ci = Pi & (forall j : 0<=j<N : iKj => Cj)
But this is recursive. Do we use the least fixpoint or the greatest fixpoint? (Or even some other fixpoint?)

Starting by looking at the greatest fixpoint,

   C^0 i = true

   C^1 i = Pi

   C^2 i = Pi & (forall j : iKj => Pj)

   C^3 i = Pi & (forall j :...: iKj => Pj)
              & (forall j,k :...: iKj & jKk => Pk)

   Ci    = (forall j :...: iK*j => Pj )
So the greatest solution refers to the reflexive transitive closure of K.
Looking instead at the least fixpoint,

    C^0 i = false

    C^1 i = Pi & (forall j : 0<=j<N & j=/=i : ~iKj )

    C^2 i = Pi & (forall j : 0<=j<N & j=/=i : iKj => Pj)
               & (forall j,k: 0<=j,k<N & k=/=j & j=/=i : iKj => ~jKk )
This refers to people who are popular, know only popular people, and who know only people who don't know anyone. This collapses to strong celebrities: people who are known by everyone but who know no one.
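A small Haskell sketch (the editor's reconstruction, with names of its own) of the greatest-fixpoint reading, iterating the definition over a finite party until it stabilises:

    -- party is [0..n-1]; `knows` is assumed reflexive (everyone knows themselves)
    celebrities :: Int -> (Int -> Int -> Bool) -> [Int]
    celebrities n knows = [ i | i <- people, gfp !! i ]
      where
        people    = [0 .. n-1]
        popular i = and [ knows j i | j <- people ]
        -- one unfolding of  Ci = Pi & (forall j : iKj => Cj)
        step c    = [ popular i && and [ c !! j | j <- people, knows i j ] | i <- people ]
        fixFrom c = let c' = step c in if c' == c then c else fixFrom c'
        gfp       = fixFrom (replicate n True)    -- start from "everyone is a celebrity"

With the three-person relation above (0K2, 0K1, 1K2, 2K0, 2K1, plus everyone knowing themselves) this yields the empty list, as argued.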

---------------

There was also talk of the future.

Mathematics of Program Construction is to be held in July 2004, at Stirling (submission date end of January)

The next two weeks see various people unable to attend on Fridays, so we will next meet in the new year.

Bruno booked 9th January to talk about iterators. Barney booked 16th January to talk about a simplification he's found in his PPM work for arithmetic coding.

---------------

SC wondered if people would like a Christmas pressie of a brand new problem? Yes.

This is a problem about displaying things compactly, which could be used, for example, with something like the Unix command "ls", or displaying links on a web page.

You are given a list of words, which are to be displayed in fixed-width font and arranged into (straight vertical) columns, kept in their original order.

Overall, it is desired to minimise the number of lines used (the maximum column height). The constraint is that the sum of the column widths has to be less than or equal to the page width.

         word1  |xxx   |xxxxxxxxxxxx|xxx
         wordtwo|xxxxxx|xxx         |xxxxxxx
         word3  |xxx   |xxx         |
         ...xxx |xxx   |xxxxxx      |
(Although in practice you'd worry about keeping a gap between words and adding 1 to the word lengths, you can just imagine that you've gone through and pre-processed the word lengths and added 1 already and adjusted the page width appropriately, so you can ignore gaps between columns.)

So, it's a partition problem where you want the minimum maximum-component-length.
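A brute-force sketch in Haskell (the editor's, exponential and only for small inputs) of the specification as stated, working on the list of pre-processed word lengths:

    import Data.List (minimumBy)
    import Data.Ord (comparing)

    -- all ways of splitting a list into consecutive non-empty segments (columns)
    splits :: [a] -> [[[a]]]
    splits [] = [[]]
    splits xs = [ take k xs : yss | k <- [1 .. length xs], yss <- splits (drop k xs) ]

    -- minimise the number of lines (tallest column), subject to the column
    -- widths (longest word in each column) summing to at most the page width;
    -- assumes a non-empty word list and that at least one split fits
    bestLayout :: Int -> [Int] -> [[Int]]
    bestLayout width ws =
      minimumBy (comparing (maximum . map length))
                [ cols | cols <- splits ws, sum (map maximum cols) <= width ]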

RSB was wondering why it needed to have a ragged bottom. JG produced a nice example of wordlengths with page width 31:

   3  10 2 12
   4  11 3 12
   3  12   12
      10   12
      11   12
If you then try to smooth out the ragged bottom, you ruin the fitting within the page width.

In general,

   1 n 1 n
     n   n
takes up 2n+2, but
   1 n n
   n 1 n
takes up 3n.

JG wondered if something similar to the solution for longest upsequences couldn't be used for this...

---------------

...which then reminded GJ of the two-dimensional free space problem.

This concerns a resource which is essentially two-dimensional, and has cells (integer coordinates). Some of this space is busy, so there is a pattern of used and unused space.

        ____________________________________
        |                                   |
        |              yyyyyyy              |
        |              yyyyyyy              |
        |              yyyyyyy              |
        |         xxxx yyyyyyy              |
        |         xxxx                      |
        |         xxxx                      |
        |                                   |
        |                                   |
        ____________________________________
Ultimately you want a representation for which space is used and which isn't, and you want to be able to answer the question "Can I manage to fit in a rectangle of this particular size in the free space?".

So, how to represent the rectangles? You could just store it by the used rectangles, but then it's quite awkward to calculate what's free in order to answer the question.

You could represent it by maximally-sized free rectangles, which makes it easier to answer the question, but already that's a quadratic-sized representation - think of rectangles like this:

   _____________
  |      xx     |
  |      xx     |
  |    xx       |
  |    xx       |
  |  xx         |
  |  xx         |
  |xx           |
  |xx___________|
It's an example of a real life problem which is difficult. It's easy to solve in 1D, but difficult in 2D.

Likewise the max seg sum is easy in 1D but difficult in 2D.

RSB suggested quadtree partitioning to represent the rectangles. SC suggested that if you were going to use quadtree partitioning, then HV partitioning (adaptive quadtree partitioning) would be much better. As GJ illustrated, that gives you a reasonable compact representation of the free/used space, but it's still a pain to compute whether there's space available for a new rectangle.

The minute taker was ordered to take down what GJ said:

GJ:
"Reconfigurable gate arrays are the answer to a maiden's prayer"
RSB:
"... for the right kind of maiden."

Happy Christmas, everyone!

Friday, 21st November 2003
Present: SC(minute-taker), JG, GJ, CM, BO, BS

JG began by telling us what he didn't want to talk about. He didn't want to tell us about how he can now do the normalisation process within his spigot algorithms (see recent PSC).

JG then talked a little bit about doing binary operations on streams and Peter Potts' work with homographies, but he didn't want to say much on that either.

He also didn't want to say much about how he was thinking of incorporating bell-ringing patterns into a Haskell practical, preferring to leave the explanations to BS.

BS explained that when bells ring, they start ringing in sequence like

    1 2 3 4 5 6 7 8   (1 is the treble, 8 the tenor)
and then they can move, but only adjacent bells can swap. So for example, if the conductor calls "3 to 4" then the order is now
    1 2 4 3 5 6 7 8
and if the conductor further calls "2 to 4" then it changes to
    1 4 2 3 5 6 7 8
BS tried to convince us how much exercise bell-ringing wasn't, but SC wasn't having any of it. GJ pointed out that conductors (who also wave their arms around a lot) live long.

BS showed us a simple bell-ringing change pattern, called "plain hunt"

   1 2 3 4 5 6
    \
   2 1 4 3 6 5
      \
   2 4 1 6 3 5
        \
   4 2 6 1 5 3
          \
   4 6 2 5 1 3
            \
   6 4 5 2 3 1
             |
   6 5 4 3 2 1
            /
   5 6 3 4 1 2
          /
   5 3 6 1 4 2
        /
   3 5 1 6 2 4
      /
   3 1 5 2 6 4
    /
   1 3 2 5 4 6
   |
   1 2 3 4 5 6
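The rows can be generated mechanically; a short Haskell sketch (the editor's, not BS's) alternately swaps every adjacent pair and then every adjacent pair except the first and last bell, for an even number of bells, until rounds come back:

    -- swap every adjacent pair ("cross")
    cross :: [a] -> [a]
    cross (x:y:zs) = y : x : cross zs
    cross xs       = xs

    -- swap every adjacent pair except the first and last bell
    holdEnds :: [a] -> [a]
    holdEnds (x:zs@(_:_)) = x : cross (init zs) ++ [last zs]
    holdEnds xs           = xs

    plainHunt :: Int -> [[Int]]
    plainHunt n = rounds : go (cycle [cross, holdEnds]) rounds
      where
        rounds = [1..n]
        go (f:fs) row = let row' = f row
                        in row' : if row' == rounds then [] else go fs row'

    -- plainHunt 6 reproduces the thirteen rows above, rounds to rounds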
He then showed us the website of the Oxford University Society of Change-Ringers [1], complete with animation of a bell ringing, and then showed us various examples of methods [2] (patterns of bell ringing permutations)

e.g. Plain Bob Minor can be written as X16X16X16-12

GJ wanted to know what the normal form was. BS said that they had two normal forms, and then realised what he'd said and explained that they weren't quite normal. Anyway, see [3] for more information.

Back to JG.

He mentioned a visit from Graham Hutton, who was talking about guarded coinduction.

To illustrate, JG defined

   from n = n : from (n+1)

   inc (x:xs) = (x+1) : inc xs
Suppose you want to prove that
   inc (from n) = from (n+1)
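Not a proof, of course, but a finite-prefix check in Haskell (the editor's) is reassuring:

    from :: Integer -> [Integer]
    from n = n : from (n+1)

    inc :: [Integer] -> [Integer]
    inc (x:xs) = (x+1) : inc xs

    -- compare the first k elements of both sides
    check :: Int -> Integer -> Bool
    check k n = take k (inc (from n)) == take k (from (n+1))
    -- e.g. check 1000 0 == True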
JG then reminded us what bisimulations are.
   R is a bisimulation iff
   xs R ys  =>   head xs = head ys  & (tail xs)R(tail ys)
We say that xs~ys if there exists a bisimulation R with xs R ys. So xs~ys iff xs=ys.

So, let

   R = {(inc (from n),from (n+1)) | n in |N }
Co-induction is like assuming it for all larger values, in order to prove it for n. JG and GJ wondered why it was co-induction (plainly not induction)

|N is the least solution of the equation

   {{}} |_| {x+1 | x in X}  subseteq X
So
   |N subseteq Y  <=  {{}} |_| {x+1 | x in Y}  subseteq Y
NatStream is the greatest X such that
  {x:xs | x in |N, xs in X} superseteq X
So
  NatStream superseteq Y
    <=     {x:xs | x in |N, xs in Y} superseteq Y
GJ wondered what that was useful for. JG said that it was useful in proving safety properties about streams, e.g. all reachable points from a starting state are safe
~ is the greatest R such that
   {(x:xs,x:ys) | (xs,ys) in R, x in |N}  superseteq R
To show ~ superseteq Y, it is sufficient to show that
   {(x:xs,x:ys) | (xs,ys) in Y, x in |N}  superseteq Y
GJ pointed out that separately you have to prove that it contains enough things.

So this is how to do a co-inductive proof. You need to invent this set, and then show that it's a bisimulation.

Doing the calculation

     inc (from n)
   =
     inc (n : from (n+1))
   =
     (n+1) : inc (from (n+1))   (*)
   =         <----- the magic step - is it safe to do this?
     (n+1) : from ((n+1)+1)      (*)
   =
     from (n+1)
Guarded co-induction claims that that step is valid. For example, try to prove
    inc (from n) = repeat (n+1)
and you soon find that you can't do it.

But replace the (*) lines with

     (n+1) : tail (inc (from n))

     (n+1) : tail (repeat (n+1))
and it looks like the hypothesis allows you to prove it! That's wrong, so it seems the hypothesis may only be used immediately inside the (:).

The induction principle we were trying to use was

    (cons . id x f = cons . id x g) =>  f=g
    ---------------------------------------
                 f = g
But that can't be right - if cons . id x f and cons . id x g differ, then the top line holds vacuously, but we certainly can't infer f=g.

We sat and thought about this for a while.

This harks back to a paper [4] of JG and GH's a while ago, which talked about ways you could do proofs for unfolds (approximation lemma, folds, fusion, bisimulation). JG and GH would like to add this principle to the list of proof methods, but it has to be correct!!

We all trooped off for lunch.

[1] Oxford University Society of Change-Ringers

[2] Adam Beer's website has the method archive

[3] and also a description of the place notation

[4] J. Gibbons and G. Hutton: Proof methods for structured corecursive programs

Friday, 14th November 2003
Present: RSB, SC(minute-taker), GJ, CM, BO, BS

RSB was being enthusiastic to talk about some problems he has been having with celebrities, related to an MSc course on Formal Program Design currently being given. This momentarily wrong-footed him, as BO is in the audience for that too!

1st Celebrity Problem

Suppose you have a party of N (N>0) people, representing the people by the integers [0..N) (Notation is Kaldewaij/Morgan style.) A celebrity is someone who knows no-one else, but is known by everyone else.

Trying to produce a program which finds an x, where IF there is a celebrity in this party, then x is it. Writing k for the 'knows' relation,

C(x,N) = (Forall y: 0<=y<N /\ y=/=x : ykx /\ ~xky)
There was some quibbling from the floor about whether N ought to be an n, and RSB said that this was exactly what he wanted to think about. What steps in a derivation are obvious, or reasonable, and what steps are rabbits?

After some discussion it was deemed reasonable to rewrite the above to

C(x,n) = (Forall y: 0<=y<n /\ y=/=x : ykx /\ ~xky)

   x:[true, (Exists z : 0<=z<N : C(z,N)) => C(x,N) ]

    {with invariant I yet to be written up..}
[=
   x,n:[true,I];
   do n =/= N -->
      x:[I /\ 0<=n<N, I(n:=n+1)];
      n := n + 1;
   od
Then, for the invariant:
   E(n) = (Exists z : 0<=z<n : C(z))
   C(x) = C(x,N)
   I = E(n) => C(x)
The invariant can be initialised by
   x,n := 0,1
GJ suggested an alternative initialization would be n := 0 but that would result in a silly x, which you don't want, so GJ insisted that RSB strengthen his invariant to include n >= 1.

For the invariant maintenance in the loop,

   I(n := n+1)
<=>
   E(n+1) => C(x)
<=>
   (E(n) \/ C(n)) => C(x)
At this point you can rearrange that around a bit according to the laws of propositional logic, but still - where to go from here? How do you help students do the derivation?

RSB wasn't wanting to consider the situation where you already know the algorithm roughly and do a derivation as a post-hoc verification, he wanted to concentrate on helping the students to get insight from the calculation itself.

Assume nkx. Then

   I(n := n+1)
  <=>
   (E(n) \/ C(n)) => C(x)
  <=> {since nkx => ~C(n) }
   E(n) => C(x)
  <=>
   I
Assume ~nkx. Then, for the branch that assigns x := n,
   I(n := n+1)(x := n)
  <=>
   E(n) \/ C(n) => C(n)
  <=>
   E(n) => C(n)
  <=
   ~E(n)
  <=> {~nkx => ~C(x) }
   E(n) => C(x)
  <=>
   I
Then
   I(n := n+1)
  <=>
   E(n) \/ C(n) => C(n)
  <=>
   E(n) => C(n)
  <=
   ~E(n)
At this point RSB was very surprised to be accused of using rabbits, and couldn't see why. GJ proceeded to produce a terrier to destroy the rabbit...
   ~nkx, so therefore ~C(x)

   I /\ ~C(x)
  =>
   ~E(n)
  <=>
   E(n) \/ C(n) = C(n)
  =>
   (E(n) \/ C(n) => C(x)) <=> C(n) => C(x)
Just to give closure, RSB wrote for us the program:
   x,n:= 0,1;
   do n =/= N -->
      if  nkx --> skip
      [] ~nkx --> x := n
      fi;
      n := n + 1;
   od
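A functional transliteration of the program (the editor's, in Haskell): scan the party keeping a single candidate; a newcomer that fails to know the candidate becomes the new candidate.

    -- returns an x such that IF anyone is a celebrity, x is;
    -- a final O(N) check of C(x,N) is still needed
    candidate :: Int -> (Int -> Int -> Bool) -> Int
    candidate nPeople knows = go 0 1
      where
        go x n | n == nPeople = x
               | n `knows` x  = go x (n+1)
               | otherwise    = go n (n+1)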

2nd Celebrity Problem

As RSB informed us (trying to show he was keeping up with popular culture), nowadays celebrities come in couples! Posh and Becks, Ant and Dec, Morecambe and Wise, Laurel and Hardy, ...

So we have the notion of a celebrity couple, who know each other, but no-one else, and everyone knows the celebrity couple.

A related definition

   CC(x,y) = (Forall z: 0<=z<N /\ z=/=x /\ z=/=y :
                          zkx /\ zky /\ ~xkz /\ ~ykz)
[The editor of these minutes thought this was meant to be the definition of a celebrity couple, but can't see reference in the above to xky and ykx. It is possible that this was overlooked during note-taking.]

Going to head for the same sort of program as before.

   I = 0<=n<=N /\ 0<=x,y<=N /\ .... what?
Trying to define what it meant for an individual to be part of a celebrity couple
   A(x) = (Forall z: 0<=z<N /\ z=/=x : zkx)
           /\ (#z: 0<=z<N /\ z=/=x : xkz) <= 1
<=1 ? GJ accused RSB of thinking ahead. SC and GJ ganged up on RSB and made him change it to = 1, like so:
   A(x) = (Forall z: 0<=z<N /\ z=/=x : zkx)
           /\ (#z: 0<=z<N /\ z=/=x : xkz) = 1
There then ensued discussion about an invariant, as RSB said that he'd had difficulty thinking of one. He offered to write the program up, but we didn't want to see the program. SC suggested
   I =  (Forall z : 0<=z<n : A(z) => (z=x \/ z=y))
which RSB didn't like, but GJ liked it, so RSB got outvoted, and then started the derivation with this invariant new to him...

Initialising was fine:

   x,y,n := 0,1,2
At this point, RSB said he was getting scared. SC: "Just like the students!" RSB: "Actually I'm getting excited!" GJ: "You need to get out more..."
   I(n := n+1)
  <=>
   (Forall z : 0<=z<n : A(z) => z=y \/ z=x)
     /\ A(n) => n=x \/ n=y
Assume nkx /\ nky, so ~A(n)

Then

     (Forall z : 0<=z<n : A(z) => z=y \/ z=x)
     /\ A(n) => n=x \/ n=y
   <=>
     (Forall z : 0<=z<n : A(z) => z=y \/ z=x)
     /\ True
   <=>
     (Forall z : 0<=z<n : A(z) => z=y \/ z=x)
Assume ~nkx, so ~A(x). Then replacing x by n maintains the invariant. The ~nky case is symmetrical.

The resulting program is:

   x,y,n:= 0,1,2;
   do n =/= N -->
      if  nkx /\ nky --> skip
      [] ~nkx        --> x := n
      [] ~nky        --> y := n
      fi;
      n := n + 1;
   od
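And the corresponding transliteration for couples (again the editor's sketch): the newcomer replaces whichever member of the candidate pair it fails to know.

    -- returns (x,y) such that IF there is a celebrity couple, this is it;
    -- a final check of CC(x,y) is still needed
    couple :: Int -> (Int -> Int -> Bool) -> (Int, Int)
    couple nPeople knows = go 0 1 2
      where
        go x y n | n == nPeople               = (x, y)
                 | n `knows` x && n `knows` y = go x y (n+1)
                 | not (n `knows` x)          = go n y (n+1)
                 | otherwise                  = go x n (n+1)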

3rd Celebrity Problem

"Don't say trios!"

Groups. "In-groups" of celebrities. An in-group of celebrities is a non-empty group such that everyone knows everyone else within the group, but they don't know anyone outside of it, and everyone outside the in-group knows everyone in the in-group.

BS suggested the term "weak celebrity" for referring to someone who everyone else knows. Then an in-group is a group of weak celebrities that don't know anyone else outside the group.

    G(S) =  S=/={} /\
            (Forall x,y: x in S /\ 0<=y<N /\ y=/=x: ykx)
            /\ (Forall x,y: x in S /\ y ~in S : ~xky)
Alternatively,
    G(S) =  S=/={} /\
            (Forall x,y: x in S /\ 0<=y<N /\ y=/=x:
                       ykx /\ (xky => y in S))
RSB has got much less far with this one. He hasn't got a program. He hasn't got an invariant.

One point noted by the assembled throng was a relationship between in-groups and supersets. If you have an in-group and you remove people from the party, then the original in-group is a superset of the new in-group (if it exists).

Looking at an example, using ---* for "knows" (* = arrowhead)

     2
     |*
     | \
     *  *
     0*-*1
In this case there's no in-group and no point in keeping any of them, as none of them can be part of an in-group overall.

Anyone not known by somebody is not worth knowing, cynically speaking, "just like real life".

Now looking at

     2
     |*
     | \
     *  *
     0*-*1     3
Not having looked at who 3 knows/is known by, if there *is* an in-group in that group, then {3} is it.

Now try this

     2
     |\
     | \
     *  *
     0*-*1     3
(again without looking at who 3 knows/is known by yet)

If ~3k0 then {3} is the only possible in-group.

Richard likened this to a cascade - if someone leaves the in-group, everyone does!

If 3 doesn't know any of the current in-group, then they all go, and {3} is the only candidate in-group.

If only one of the in-group knows 3, then they "all go away", ie they can't be a celebrity in-group

If 3k0,3k1,0k3,0k1, then we know that they all know each other, but we don't know whether 2 knows 3, for example, so we're not sure whether {0,1,3} is a suitable in-group...

SC suggested that it was useful to know what you know about the candidate for the in-group. For example, in the 2nd problem, you know where you are with a pair, and what it means to say that that's a candidate for a celebrity couple. With a group, it's less clear what you can deduce from "if there is an in-group, it is *this* candidate in-group"

Having actually solved a problem for once (the second one), everyone disappeared very fast in the direction of lunch.

Friday, 7th November 2003
Present: RSB, SC(minute-taker), JG, GJ, CM, BO, BS

It was a short meeting this week, as Erik Meijer was giving a talk at 12 noon.

Following on from mentions of programs to calculate the digits of π in previous PSCs (see Friday, 17th October 2003 (middle section) and Friday, 26th September 2003), JG told us more.

JG had gone to the trouble of preparing slides for this week, because he didn't want to have to write out all the programs and data tables. These minutes are shortish, to complement JG's slides. (There is also a draft paper.)

JG's talk was entitled Spigot Algorithms.

He started by showing us a slide of the short baffling obfuscated C program to calculate the digits of π, then showed us another slide which beautified it, laying it out properly, but didn't explain the program at all. He did, however, draw our attention to two key points:

NDIGITS is a constant set to 15000
LEN is another constant set to (NDIGITS/4 + 1)*14
NDIGITS is because you have to decide in advance the number of digits you want, and then you work out how many terms of the infinite series you're going to need in order to generate that quantity of digits.

Another slide gave a series to generate π (Wallis' formula). The key point of this slide is that, using funny mixed-radix bases

    π = 2.22222222222...

    π = 2 + (1/3)(2 + (2/5)(2 + (3/7)(2 + (4/9)(2 + ... ))))

      = 3 + (1/10)(1 + (1/10)(4 + (1/10)(1 + ... )))
So the idea of this spigot algorithm is that it converts π from the 2.22222 mixed-radix representation into decimal 3.141...

JG then explained what a spigot was, it being a rod you put into a tap, like on a barrel of beer. It's called a spigot algorithm because the digits drip out one by one. BUT, it's not streaming, as later became obvious.

The next slide explained the algorithm, in particular the bit where the fraction has to be repeatedly normalized.

On slide 4.1 we could see the inner workings of the spigot algorithm:

First line, it starts with 2 2 2 2 2 ... Next it scales by 10. Easy, just a map. Then it normalizes, and going from right to left, the carry pops out the left hand side as a 3 - the first digit!

Then it keeps repeating those steps, popping out more and more digits. It's also the case that as the algorithm progresses, it gets faster and faster as some terms may be discarded from the table (fewer and fewer digits contribute to the next digit of π to drip out)

JG explained why we actually need to buffer the bottom digits - because otherwise you might do the equivalent of printing out 0.9999 when you meant to print out 1.000
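For the record, here is a rough Haskell reconstruction of the bounded spigot (the editor's, not the code on JG's slides): keep a table representing the tail of the mixed-radix expansion; on each pass scale by 10, normalise right to left, and let the carry that pops out of the left end become the next predigit; buffer predigits of 9 in case a later carry of 10 turns them into 0s.

    import Data.List (mapAccumR)

    piDigits :: Int -> [Integer]
    piDigits nd = take nd (release (predigits (replicate len 2)))
      where
        len = (nd `div` 4 + 1) * 14          -- table size, as in the C program

        predigits xs = let (d, xs') = pass xs in d : predigits xs'

        -- one pass: scale by 10, normalise, emit the carry out of the left end
        pass xs = (v0 `div` 10, v0 `mod` 10 : rest)
          where
            scaled        = map (10 *) xs
            (carry, rest) = mapAccumR norm 0 (zip [1 ..] (tail scaled))
            norm c (i, x) = let v = x + c
                            in (i * (v `div` (2*i + 1)), v `mod` (2*i + 1))
            v0            = head scaled + carry

        -- hold back 9s until we know whether a carry turns them into 0s
        release (p : ps) = go p 0 ps
          where
            go held nines (q : qs)
              | q == 9    = go held (nines + 1) qs
              | q == 10   = (held + 1) : replicate nines 0 ++ go 0 0 qs
              | otherwise = held : replicate nines 9 ++ go q 0 qs
            go held nines [] = held : replicate nines 9

    -- piDigits 20 should produce 3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3,2,3,8,4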

Slide 4.3 showed the program translated into Haskell. "Eurrrgh!" was the reaction. JG said that he hadn't had time to beautify it yet.

JG also showed us another obfuscated C program, displayed in the shape of a π, which uses π-like names for its variables. And this program prints out the digits of .... e. JG got all these π things from [1].

JG then compared this program to his streaming program, and pointed out the link between the homographies and the formula for π, as doing

     2 + (1/3)x        is the same as doing  /1 6\x
                                             \0 3/
JG wants to try the spigot approach, but with an infinite list, not limiting the number of digits. The map (10*) bit is ok, it's the carries that are the problem.

JG produced a conjecture of his:

    0; 2,4,6,8,10,...    = 2
    0;20,40, ....        = 20

    0; 0,0,... ,0,2k,2(k+2),2(k+4),....   = 2k
JG is looking for bounds, wanting to find "What's the biggest carry that can come in?". He pointed out that he has a program where mostly it gets the right digits, but occasionally there's an error, and then a few digits later, a carry of 6000 comes in!!

The questions were posed: So, how far to the right can a carry happen? And how much can a carry affect the digits?

Then we all trooped off to hear Erik's talk.

[1] Jörg Arndt, Christoph Haenel, π Unleashed, Springer-Verlag, 2001. ISBN: 3540665722

[2] Jeremy Gibbons, An Unbounded Spigot Algorithm for the Digits of π.

Friday, 31st October 2003
Michael Abbott was speaking about dependent types today. A dependent type is a family of types. For example, the type of natural numbers less than n is dependent on n. This is written n:N |- Fin (n). Another example is that of arrays of size n: n:N |- Array (T, n) x Fin (n) -> T.

He went on to speak about how substitution works, and showed how Martin-Löf type theory can be seen as an example of dependent types. There followed some discussion of languages that use dependent types, including Lego and Cayenne.

Friday, 24th October 2003
Present: RSB, SC (minute-taker), JG, GJ, CM, BO, BS

RSB was talking about loopless algorithms (a follow-on from PSC June 13th 2003)

Suppose you want to enumerate some objects associated with a data structure (e.g. lists or trees), like permutations or subsequences, where typically you might have exponentially many to enumerate.

A loopless algorithm consists of first, a prologue (taking O(n) steps) and then a generation of the elements, each one being generated in constant time. So a loopless algorithm is of the form

          unfoldr step  . prologue
where prologue is O(n), and step is O(1).

SC wanted to know if the O(1) could be amortized? RSB said not. RSB also mentioned that he wasn't making use of laziness - in fact, laziness tended to get in the way.

RSB used an example

     concat :: [[a]] -> [a]
where the prologue has to be
     prologue = filter (not . null)
and
     step [] = Nothing
     step ((x:xs):xss) = Just (x, Cons xs xss)
RSB said that the last time he said this, GJ complained that you could just have prologue = concat. GJ denied all knowledge of this. RSB pointed out that that trick won't work for generating exponential quantities of output.

Another example - preorder traversal of a tree:

     preorder Null = []
     preorder (Node x l r) = x : preorder l ++ preorder r

     prologue = wrap

     step [] = Nothing
     step (Node x l r : ts) = Just (x,Constree l (Constree r ts))
Now an inorder traversal of a tree
     inorder Null = []
     inorder (Node x l r) = inorder l ++ x : inorder r
Its loopless algorithm is
     unfoldr step . mkSpine
For a tree looking like
           t1
            (_)
        t2 /   \
         (_)
     t3 /   \
      (_)
     /   \
RSB is going to put the spine in the order [t3,t2,t1]

It was pointed out that you can obviously do it in this case because the answer can be generated in the prologue. RSB: "Yes, yes, alright, sea lawyer!" [WordNet dictionary says SEA LAWYER Pronunciation: see'loyur Definition: [n] an argumentative and contentious seaman ]

    mkSpine t = addSpine t []

    addSpine Null ts = ts
    addSpine (Node x l r) ts =  addSpine l (Node x l r : ts)

    step [] = Nothing
    step (Node x l r : ts) = Just (x, addSpine r ts)
RSB then said that he wanted inorder traversals of *perfect* trees, perfect trees being complete trees, like this sort of shape:
           /\
          /  \
         /\  /\
        /\/\/\/\
RSB then introduced a datatype of spiny trees. JG: "Spinary trees?"
     data SpinyTree a = Fork a [SpinyTree a] [SpinyTree a]
JG suggested that maybe we don't need spiny trees, and maybe we should just use rose trees? The loopless algorithm is
     unfoldr step . mkSpiny

     mkSpiny :: Tree a -> [SpinyTree a]
     mkSpiny t = addSpiny t []

     addSpiny Null sts = sts
     addSpiny (Node x l r) sts = addSpiny l (Fork x sl sr : sts)
                               where sl = mkSpiny l
                                     sr = mkSpiny r
JG wanted the invariant put in, and RSB eventually did so:
     addSpiny t sts = mkSpiny t ++ sts
GJ was wondering why use spines if you're not going to need them, amid general wonderings why RSB was using spines.

Looking at the complexity for perfect trees,

     sl ++ Fork x sl sr : sts
RSB noted that it would be
     T(n) = 2T(n/2) + O(log n)
which is Theta(n).

GJ pointed out that it would also be linear on leftist trees.

There was then discussion about the second sl in

     sl ++ Fork x sl sr : sts
                  ^^
and whether we could just use Null instead? GJ wanted to replace the "sl = mkSpiny l" in the definition of addSpiny by "sl = take __ (addSpiny l (Fork x sl sr : sts))" RSB accused GJ of liking cyclic structures.

RSB said that step was the same step function as from the concat program.

Apparently that was just the warm-up. Then RSB got onto what he really wanted to say, about mix order and mixing.

    mix :: [a] -> [a] -> [a]

    mix xs [] = xs
    mix xs (y:ys) = xs ++ y : mix (reverse xs) ys

    e.g.  1 2 3    a b c
    would mix in the fashion
    1 2 3 a 3 2 1 b 1 2 3 c 3 2 1
RSB says that mix is associative, and the proof is left to the diligent reader.
    mixall = foldr mix [] . filter (not . null)
RSB said that as mix was associative, you could use it with foldl instead, but it was pointed out that you could only do that as long as you wanted to apply it to a finite list. RSB: "There's a sea lawyer in here..."

RSB wanted a loopless algorithm for mixall, and took us on a diversion through Gray codes:

    gray n = mixall [[i] | i <- [1..n]]
To illustrate:
    [[1],[2],[3],[4]]
    [1,2,1,3,1,2,1,4,1,2,1,3,1,2,1]
Why Gray? Well, this gives you the flipping instructions. The instructions for which bit to flip.
        1 2 3 4
        0 0 0 0
        1 0 0 0
        1 1 0 0
        0 1 0 0
           .
           .
           .
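The flipping instructions can be turned back into that table with a scan; a small sketch (the editor's, mirroring the definitions above):

    mix :: [a] -> [a] -> [a]
    mix xs []     = xs
    mix xs (y:ys) = xs ++ y : mix (reverse xs) ys

    gray :: Int -> [Int]
    gray n = foldr mix [] [ [i] | i <- [1..n] ]

    -- flip the i-th bit (counting from 1)
    flipAt :: Int -> [Int] -> [Int]
    flipAt i bs = [ if j == i then 1 - b else b | (j, b) <- zip [1..] bs ]

    grayTable :: Int -> [[Int]]
    grayTable n = scanl (flip flipAt) (replicate n 0) (gray n)

    -- take 4 (grayTable 4) == [[0,0,0,0],[1,0,0,0],[1,1,0,0],[0,1,0,0]]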
The Johnson-Trotter algorithm is an algorithm to produce permutations, where each permutation differs from the previous by only a single transposition. SC wanted to know whether Johnson and Trotter were bellringers.

The Johnson-Trotter code can be defined in terms of mixall, but RSB hadn't sufficient time to go into it right then.

      mixAll = unfoldr step . prologue

      prologue = map (\xs -> (xs,[]))

      step [] = Nothing
      step (((x:xs),sx):xss) = Just (x,(xs,x:sx):xss)
      step (([],sx):xss) = case  step xss  of
                             Nothing -> Nothing
                             Just (x,xss') -> Just (x,(sx,[]):xss')
This algorithm isn't loopless - it hasn't filtered out the [].

One way to make progress is to relate mixall to a mixorder traversal of binary trees.

      mixorder Null = []
      mixorder (Node [] l r) = mixorder l
      mixorder (Node (x:xs) l r) = mixorder l ++ x : mixorder (Node xs r l)
(This is one way of doing BS' booleans idea - swapping round the order of l and r instead)

Claim:

     mixAll = mixorder . mkTree

     mkTree = foldl op Null
              where op t xs = Node xs t (reflect t)

     reflect Null = Null
     reflect (Node xs l r) = if even (length xs) then Node (reverse xs) r l
                                                 else Node (reverse xs) l r
RSB then drew a tree to illustrate. There then followed a discussion about bananas, cats, mice and barbed wire, in an effort to find a suitable phrase. We ended up with
             Yes
             / \
            /   \
          we     ew
          | \    | \
          |  \   |  |
          |  _\__|_/
          | /  \ |
         are     era
          | \    | \
          |  \   |  |
          |  _\__|_/
          | /  \ |
       bananas  sananab
[bananas,are,we,yes]
    rev (mix xs ys)
        = mix (rev xs) (rev ys)   <= #xs even
        = mix xs (rev ys)         <= #xs odd
It was commented that using mixOrder in order to do Gray codes is a serious rabbit/sledgehammer.

RSB issued the challenge:

Is there any way to do a loopless version of mixAll?
Then the minute-taker trooped off for lunch.

Friday, 17th October 2003
Present: MA, RSB, SC (minute-taker), JG, GJ, CM, BO, BS (Welcome to Bruno, JG's new research student!)

BS was talking again today about PPM (Prediction by Partial Matching), and began by reminding us of what it is about.

In arithmetic coding to achieve data compression, PPM predicts the next letter in the input stream based on what input has already been seen. So if your prediction is wildly inaccurate, then you're not going to get very good compression. The better the model you have for predicting, the better the compression is going to be.

In what's known as the order-zero model for PPM, the prediction consists of simple probabilities based on counts of the number of occurrences of each letter (in the input stream seen so far).

The order-zero model can be improved on by looking at the letter immediately beforehand (just before the letter being predicted). So by counting the likes of

"How many times has an 'a' been followed by a 'b'?"
"How many times has an 'a' been followed by a 'c'?"
etc. you can get improved probabilities for predicting the next letter. This is called the order-one model, because it's based on one character beforehand.

The order-two model looks at the previous two characters, the order-three model looks at the previous three characters, and so on... until you have a model PPM* which is using the whole of the input stream so far for the prediction model.

BS then started to tell us about a particular feature of the encoder which involved escape characters, but at this point we wanted an example. Writing _ for a space char, BS wrote up the string

    THE_CAT_SAT_ON_THE_MAT
After a discussion about which was a suitable character to use as an example (which involved suggestions of dogs sitting on mats instead), we got to supposing that the encoder had just got up to the O character:
    THE_CAT_SAT_ ON_THE_MAT
                 ^
Now at this point, an 'O' has never been seen before (there's no context for it), so the encoder emits a special escape char, to tell the decoder to switch to a shorter context.

e.g. (in more detail) Suppose you were using context of length 3 (the order-three model). After "AT_" (the 3 chars just before the 'O'), an 'O' has never been seen before so far in the input, so the encoder emits an escape character, which says "never been seen before in this context", and then switches to a shorter context (the order-two model), and tries that. But 'O' has never been seen in the "T_" context either, so the encoder tries shorter contexts of just one letter (the order-one model), but that doesn't work either, and neither does the order-zero model (probabilities from letter counts), as 'O' is a new character that hasn't been seen before, so the encoder has to completely give up and revert to the "order-minus-one model", which is not having any clue whatsoever about the probability of 'O' occurring.

During the encoding, information has to be kept about the contexts seen so far in the input string, and this is kept in a suffix-tree. (A suffix-tree of a string is a tree that contains all the suffixes of that string.) PPM uses suffixes of the input stream so far for its context, and can use the tree to move easily to a shorter context (for situations such as the above), from the information stored in the nodes of the tree.

RSB wanted to see an example suffix tree, and after an abortive attempt to use the string "abracadabra", which doesn't work very well as an example, RSB declared that his favourite tree for these sort of things was "barbara".

                      /|\
                     / | \                  Suffixes:
                   a/  |b \r
                   /   |   @                barbara
                  /    |   |\               arbara
                r/     |a a| \b             rbara
                $      |   .  \             bara
               / |     |      |             ara
              /a |b    |r     |a            ra
             .   |     *      |             a
                 |    / \     |
                 |a  /a  \b   |r
                 |  .     \   |
                 |        |   |
                 |r       |a  |a
                 |        |   .
                 |        |
                 |a       |r
                 .        |
                          |a
                          .
JG thought it was usually a trie, not a tree, but BS said that that took quadratic space.

A discussion of suffix links then resulted in the above diagram being covered by little blue arrows.

Explanation & verbal description of blue arrows (can't do blue in ASCII so you have *$@*ing tree nodes instead):

Suffix links are indeed links to suffixes. In the above suffix-tree, the node * represents the string "bar" (as "bar" is the path followed from the root) and "ar" is a suffix of "bar" so there's a suffix link from "bar" to "ar", ie from the node * to the node $. Similarly, there's a suffix link from $ to @, because "r" is a suffix of "ar".

It was also noted that in the above tree structure, there can be some sharing done, because the subtree

            *
           / \
          /a  \b
         .     \
                \
                |a
                |
                |r
                |
                |a
                .
is repeated three times.

RSB was trying to think about this tree in the context of the PPM encoding algorithm, and wanted to know if you've just seen Barbara, where would you be? What if the next letter is an 's'?

BS explained that various nodes in the tree would be split and an 's' put in.

(Mutters of "free barabbas" "with every product" were heard from the back.)

Ukkonen's algorithm [1] is a linear-time algorithm for maintaining suffix-trees, character by character, and it uses various programming tricks for maintaining linearity (one of which is suffix links), so that when you see an 's', the suffix-tree updates without having to do any extra effort.

BS: "Shall I tell you about Ukkonen's algorithm?"

(unanimous) "No."

Instead, BS went through an example of what happens to the model during arithmetic encoding (see e.g. [3] for details of arithmetic encoding), using "barbara" to illustrate.

Initially, an order-minus-one model is used, with probabilities (don't ask why the alphabet begins with 'b', ok?)

       b   a   r
      33% 33% 33% ,  pictured as    [---b----][---a----][---r----]
First, we see a 'b'. The encoder outputs an interval representing a 'b'
   'b'  [0, 1/3 )
Next letter 'a'

Now we are in the order-zero model. We haven't seen an 'a' before, so we are going to emit an Escape character. In this model, Escape characters are assigned probability 1/2, like this:

                      [------------][------------]
                        alreadyseen     Escape
So the encoder outputs the interval representing the Escape char
   ESC  [1/2, 1)
(back to order-minus-one model [---b----][---a----][---r----] ) then outputs the interval representing 'a'
   'a'  [1/3,2/3)
Now we are back in the order-zero model [--b--][--a--][----ESC-----] and see an 'r'. Not seen before! So the encoder outputs the interval representing the Escape char
   ESC  [1/2, 1)
(back to order-minus-one model [---b----][---a----][---r----] ) then outputs the interval representing 'r'
   'r'  [2/3,1)
Now in the order-zero model [-b-][-a-][-r-][-----Esc-----]. We see a 'b', the encoder outputs the interval representing a 'b'
   'b'  [0,1/6)
Now in the order-one model [------a------][-----ESC-----] (as we have seen "ba" before but no other "b*") We see a 'a', the encoder outputs the interval representing an 'a'
   'a'  [0,1/2)
Now in the order-two model [------r------][-----ESC-----] (as we have seen "bar" before but no other "ba*") We see an 'r', the encoder outputs the interval representing an 'r'
   'r'  [0,1/2)
Now in the order-three model [------b------][-----ESC-----] (as we have seen "barb" but no other "bar*") We see an 'a', but we've never seen an 'a' in this context, so the encoder emits
   ESC  [1/2,1)
And now just looking at order-two, ie context "ar", for which the model is [------b------][-----ESC-----], no again, the encoder emits
   ESC  [1/2,1)
Now just looking at order-one, context "r", for which the model is [------b------][-----ESC-----], no again, the encoder emits
   ESC  [1/2,1)
Now in the order-zero model, in which you have seen an 'a', so from the letter counts, the model is [-b-][-a-][-r-][-----Esc-----] and the encoder outputs
   'a'  [1/6,2/6)
So that's basically what the PPM model is doing. The more complicated the model, the slower the decoder, as the decoder has to do the same work in order to make the model as the encoder does.

BS pointed out that in some situations this was fine, like for making backups, where decompression is needed very rarely. So it depends on the application.

BS was asked during the course of his example whether ESC was always given 1/2? He explained that there are various ways of deciding how to allocate probabilities to the ESC char, e.g. another way to do it was to assign it a small fixed probability.

BS then started talking about using Bayesian statistics for prediction (his way of doing the predicting, compared to the standard PPM way), and reminded us that

     P(A|B) = P(B|A)P(A)
              ----------
                 P(B)
You start off doing the encoding with no knowledge, and then use the data to update your assumptions.

BS then considered the probability of seeing an 'A', given that you've just seen 6 'A's and 7 'B's:

     P(A | seen 6As, 7Bs)   =     (6+1)
                               -------------
                               (6+1)+(7+1)+1


     P(B | seen 6As, 7Bs)   =     (7+1)
                               -------------
                               (6+1)+(7+1)+1

     P(? | seen 6As, 7Bs)   =      1     (where ? unseen char)
                                   --
                                   16
For some reason you start all the counts at 1, which is why you go around adding 1 to everything, but BS didn't really explain this.
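(This is the usual add-one, or Laplace, smoothing. A tiny Haskell sketch of the estimates, the editor's rather than BS's:)

    -- counts of the characters seen so far; one extra count is reserved
    -- for a never-before-seen character
    probNext :: [(Char, Int)] -> Char -> Double
    probNext seen c = fromIntegral (count + 1) / total
      where
        count = maybe 0 id (lookup c seen)
        total = fromIntegral (sum [ k + 1 | (_, k) <- seen ] + 1)

    -- probNext [('A',6),('B',7)] 'A' == 7/16
    -- probNext [('A',6),('B',7)] '?' == 1/16     (unseen character)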

RSB was wondering why this was different from PPM?

BS explained that the difference was that you don't need to emit an ESC char because it's got a non-zero probability.

RSB commented that this problem was interesting because it's not the correctness of data compression - the correctness is trivial. Efficiency is the thing, but we don't have a formalism for expressing that.

Now (getting closer to BS' current work), suppose so far in encoding the string, the current context is (tail xs), and the prediction probabilities for chars 'A' and 'B' being next are

      A   B
      n1  n2
Now suppose we take a slightly longer context of xs, and the prediction probabilities are
      A   B
      m1  m2
BS was worried that he was counting characters multiple times, and messing up the probabilities. To explain further, suppose that
     P(next char is 'A'| just seen tail xs) = p
Assume that somehow, we've calculated a distribution on p, so that
     p ~  Beta(alpha,beta)    (explanations of Beta,alpha,beta follow)
And suppose that
    P(next char is 'A'| just seen xs) = q

    q ~ Beta(alpha+m1,beta+m2)
"Where's the 'n's gone?" " 'm's ?" "Nose not mother"

BS explained where the 'n's had gone: if you take alpha',beta' from the previous context, then alpha = alpha' + n1 and beta = beta' + n2. He then sighed that at this point he was counting things more than once.

At this point we wanted to know what a Beta distribution[6] was, anyway.

It's this probability distribution

      x^(alpha-1) (1-x)^(beta-1) Gamma(alpha+beta) / ( Gamma(alpha) Gamma(beta) )
The Gamma function generalises the factorial (for integers, Gamma(n) = (n-1)!), and the Beta distribution is related to the binomial distribution (it is its conjugate prior).
        Gamma(n) = (n-1)!
BS then drew some pictures of distributions.

The big picture is that PPM* is the benchmark to beat. It has reasonable speed, and reasonable effectiveness.

The point about the Bayesian version is to try to beat PPM*. It is hoped that this will happen, but hasn't yet, as there are still bugs in BS's Bayesian version.

RSB wanted to know if BS' version (BPPM) can be regarded as PPM* with a particularly complicated version of dealing with ESC? Yes, says BS.

---------------

There followed some discussion of future sessions - next week RSB will be loopless, and the week after that, MA will be dependent upon types.

Mention was made of the two-line pi program that JG sent round this week. He doesn't understand it yet. It's written in C and was presumably a candidate for the obfuscated C competition.

---------------

RSB couldn't bear not to use the remaining half hour, and wanted to know why the following would be useful.

It's an extension to Haskell called Template Haskell, and it's a way of revealing the guts of the compiler. You do some transformation from a Haskell expression

    [[ haskell expression ]]  --->  Exp
and get back some Exp, in the same language, with lambdas, Vars, Apps, etc all over it. Basically it puts the "quote" procedure from Scheme and Lisp back into Haskell.
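For instance (the editor's example, not from the meeting), with GHC's Template Haskell extension you can quote an expression and print the resulting Exp:

    {-# LANGUAGE TemplateHaskell #-}
    import Language.Haskell.TH

    -- quoting hands you back the abstract syntax of the expression
    main :: IO ()
    main = runQ [| \x -> x + 1 |] >>= print
    -- prints something like (details vary with the compiler version):
    --   LamE [VarP x_0] (InfixE (Just (VarE x_0)) (VarE GHC.Num.+) (Just (LitE (IntegerL 1))))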

RSB only met this yesterday afternoon, apparently "...in a dark alley", muttered someone.

So what would you want this for?

Take pretty printers for various types; you don't need to use generic Haskell any more, because you can build pretty printers on the fly instead.

We finished with a discussion on soft typing and dynamic typing. (Soft typing = You try and type your program, and if it works, fine, if not, go back to dynamic typing).

Eventually we all trooped off for lunch.

[1] E. Ukkonen: "On-line Construction of Suffix Trees" Algorithmica, 14(3) pp249-260, 1995.

Editor's suggestions for further reading:

[2] Probability Estimation for PPM

[3] Explanation of Arithmetic Coding

[4] One explanation of suffix trees and another explanation of suffix trees

[5] Suffix link generator

[6] Beta/Gamma distributions

[7] Template Haskell

Friday, 10th October 2003
Present: MA, RSB, SC, GJ, CM (minute-taker), BS

SC talked about her work on the classification of greedy algorithms, which has so far culminated in a paper to appear in SCP later this year, and which she eventually intends to turn into a book. She said that she still has some fuzzy edges, and she would welcome other people's views on these.

She began by drawing a diagram on the board:

			Best-Global
			/       \
		       /	 \
		Best-Local 	Better-Global
		       \ 	 /
			\	/
			Better-Local
(At this point GJ walked in and asked if this was a talk about pubs.)

In fact, this diagram names the four principles she uses to classify greedy algorithms, and the lines (which were drawn as up-arrows), are implications. So, every greedy algorithm satisfies the Best-Global principle, and some satisfy one or more of the others too. She has around thirty examples of algorithms that can be classified by these four principles (together with the intersection of Best-Local and Better-Global.)

She then explained how optimisation problems that can be solved by greedy algorithms can be specified as:

		opt C . /\ (rep S)
where C is a preorder representing the global optimality criterion, S is a construction step that gets repeated until it can no longer be taken, and opt C produces the optimum with respect to C. A greedy algorithm is a repetition of a greedy step defined by
		G = opt L . /\ S
where L is a local optimality criterion. Apparently, many people assume that L=C, but this is not always true and excludes a large class of greedy algorithms. RSB said that he thought the following characterisation is more intuitive (it looked clearer on the white board):
		G \subseteq S
		G . S^o \subseteq L
		dom G = dom S
SC then explained each of the four classification principles in detail. This is all very clearly described in a draft version of her paper which is available at her website.

One of the fuzzy edges that SC was concerned about was that every greedy algorithm can be classified as Best-Local, by taking L to be adherence to the greedy algorithm. So the problem is that the same greedy algorithm can be characterised by different values of L and S. GJ suggested that she shouldn't classify greedy algorithms, but the values of L and S instead. (I think).

She then described the Tree Trunk problem. She had discovered this while chopping down trees in her garden and converting them into borders for raised beds. The problem was to cut as many logs of a fixed length l from the trunks by doing a minimum amount of sawing. She claimed that it was sufficient to consider one trunk. (This provoked some discussion about combining greedy algorithms, since it's not obvious that the result is itself a greedy algorithm.) The interesting thing about this problem is that the greedy algorithm is counter-intuitive: instead of beginning to saw from the thin end of the trunk, you must start from the thick end.

Finally, RSB pointed out that SC's book must explain how her system relates to the established classification in terms of matroids and greedoids, and she said that it would, and reminded us all of what they are. The problem with them is that they assume that all partial solutions can be completed in a fixed number of construction steps, and this is not necessarily the case.

Friday, 3rd October 2003
Present: MA, RSB, SC (minute-taker), GJ, BS

RSB said that he'd booked room 347 for the term, all weeks apart from next week, when we'd have to be peripatetic. Hedgerow, anyone?

BS was persuaded into telling us about an idea he'd recently had, and he wanted to know whether the idea was bonkers or not.

Suppose you've got a (Greibach) grammar, which, say, looks something like

    S -> aT
    T -> aS
If you take all the suffixes of a word of that grammar, it looks self-similar.

SC looked worried:
"You're not going to talk about fractal grammars now, are you?"
BS
grinned.
SC:
"Arrrrrrrrrrrrrgggggggggggggghhhhhhhhhhhh!!!!!!!!!"
RSB:
"Do they exist?"
SC:
"I don't know, he just said the magic keyword 'self-similar'!"
     aT
      |
     abS
       |
     abbS
       --
      ---          <- bS similar to bbS
BS defined a metric
                1
   d(xs,ys) =  ---
               n+1
where n = length (lcp xs ys) (lcp = longest common prefix) and said that this gave a complete metric space ('complete' means that every Cauchy sequence converges to a point in the space).
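In Haskell the metric is direct (the editor's transcription; names its own):

    -- longest common prefix
    lcp :: Eq a => [a] -> [a] -> [a]
    lcp (x:xs) (y:ys) | x == y = x : lcp xs ys
    lcp _ _                    = []

    d :: Eq a => [a] -> [a] -> Double
    d xs ys = 1 / fromIntegral (length (lcp xs ys) + 1)

    -- d "barbara" "bara" == 0.25   (their lcp is "bar")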

He went on to say that for any space on which you have a complete metric, you can have a space of the compact subsets, and then you get the Hausdorff metric, and that's where the fractals live!

There followed a flurry of discussion in which we remembered various topological terms and tried to figure out what Barney was doing with them:

A compact subset is one where every cover has a finite sub-cover. The real line is not compact, for example, because {(n,n+2)| n in Z} has no finite subcover.

A cover of some set is a set of open sets where every element in the set you're covering belongs to one of the open sets forming the cover.

Barney claimed that all finite sets of finite strings were both open and closed, under his metric.

To recap... Barney wants to:

  • take a file
  • take all the suffixes of that file
  • he then has a set of strings which are in a complete metric space
  • fractals can be found in spaces of compact sets of complete metric spaces

SC wanted to know why lcp? Why prefixes? Why not longest common subsequences? And why take the suffixes of the file? BS replied that he wanted to do it incrementally. Strings always grow to the right in this kind of grammar.

SC was worried about text versus images: with fractal compression of images, although an image is discrete (made up of pixels) it's approximately continuous, and you treat it like that, so when you compress the image to get a function that approximates it, the small differences between the decompressed image and the original don't matter. But what if you decompress text and get small spelling mistakes - that could change the whole meaning?

GJ pointed out that there's some recent research that says that so lnog as you keep the fsrit and last leertts of ecah wrod the same, you can jbmlue up the bag of lterets in the mdlide and slitl mkae snese of it. MA said taht terhe was a cteeurpxaomlne.

To put this in context, BS's main work is on arithmetic coding [1], and he was wondering whether he could use fractals for context guidance, where you want a good model to be able to predict the next letter.

It was wondered whether long distance has any effect, and RSB said that he had an example from the very first paper he wrote, to do with "if ...... , then" constructs.

SC suggested that BS should find a nice small little piece of text, and find a fractal for it, and that hopefully should give an idea as to whether this was worth pursuing or not (given the time limit of BS' thesis!)

RSB went through some of the main ideas behind arithmetic coding, as MA wasn't sure how a probabilistic grammar could give a compression. A model used in arithmetic coding is efficient if it can predict the next letter. The next step up from simple probability is PPM - prediction by partial matching. So for example after certain character strings in Polish you might expect a 'z'. As you eat the input, you're updating the probabilities. The expected size of the output file is minimized.

So, for example, with an {a,b} alphabet, suppose the probability of getting 'a' is 0.9, and 'b' 0.1. Start with the interval 0-1: if the first character is 'a', you're down to 0-0.9, and if the second character is also 'a' you're down to 0-0.81. So 0.1 encodes "aa" because it's in that interval. For "bb", on the other hand, you need a number between 0.99 and 1, which will take up more bits, but that's ok, because "bb" is rare compared to "aa". So rare files take up more bits to encode. Once you've got your coder, you can feed it in, to get a compression of your file. You get long codings for narrow intervals. The trick is to get the algorithm to communicate probabilities as cheaply as possible.
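A minimal Haskell sketch of that interval narrowing (the editor's), for the two-letter alphabet with P('a') = 0.9 and P('b') = 0.1:

    narrow :: (Double, Double) -> Char -> (Double, Double)
    narrow (lo, hi) c = case c of
        'a' -> (lo, lo + 0.9 * w)          -- 'a' takes the first 90% of the interval
        _   -> (lo + 0.9 * w, hi)          -- 'b' takes the final 10%
      where w = hi - lo

    encodeInterval :: String -> (Double, Double)
    encodeInterval = foldl narrow (0, 1)

    -- encodeInterval "aa" ~= (0.0, 0.81)   so e.g. 0.1 identifies "aa"
    -- encodeInterval "bb" ~= (0.99, 1.0)   rare strings get narrow intervals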

Thus arithmetic coding is a method which uses statistics to predict patterns in the text, compared to something like the LZ* family of algorithms, which don't use any statistics, just a dictionary of already-seen strings.

BS has been doing some modelling which involved solving a whole series of differential equations, but then went back to PPM, a Bayesian version of PPM.

SC wondered if that's where the self-similarity was coming from - a larger piece of text having a certain distribution of characters could have roughly the same distribution of characters as a smaller piece of text, e.g. aaaaaabbbbb similar to aabb. This led to an example of "cat", "mat", "the cat", "the mat", and ",but the cat", ",but the mat" being discussed, where the distances between the pairs of strings were smaller.

Apparently BS has a program that does PPM, but it's buggy and still in progress.

---------------

GJ then got up to explain what he'd been up to (a program for e similar to JG's program for pi last week [2]), and was worried whether MA was up to speed on the notation. MA explained he'd seen the notes from last week via e-mail, and GJ muttered something about it not being e-mail, but pi-mail.

Anyway! Continued fractions... [3] here's one for e:

   2 +      1
        ------------
         1  +   1
              ------
              2  + 1
                  ---
                  1+1
                   ---
                   1+1
                    ---
                    4+1
                     ---
                      ...       
  (pattern is 1,2,1,1,4,1,1,6,1,1,8,1,1,10,1,...)
Remembering that
 /p q\       px + q
 \r s/ means ------
             rx + s
The relevant infinite matrix product was
     /1 1\/2 1\/1 1\/1 1\/4 1\/1 1\...
     \1 0/\1 0/\1 0/\1 0/\1 0/\1 0/
     |_           _||_           _|
         groups of 3
And working out what the general matrix is...
  /1 1\/2n 1\/1 1\   is   /2n+2 2n+1\
  \1 0/\1  0/\1 0/        \2n+1  2n /
So you plug that into JG's program to get...... e !

However GJ's favourite continued fraction for e isn't that one, it's one not in canonical form:

  2 +      1
        ------------
         1  +   1
              ------
              2  + 2
                  ---
                  3+3
                   ---
                   4+4
                    ---
                    ...
for which the matrices are
    /n n\
    \1 0/
GJ: "Plug a sequence of those into JG's program, and blow me! Same digits jump out the end!"

Recall: JG's program for spitting out the digits works out whether the leading digit is already known, and spits it out when it is. So...if you can represent your fraction like this:

  x = a +     1
         ------------
          x  +   1
           0   -------
               x  +  1
                1  -----
                   x + 1
                    2 ---
                      ...
... then what's the test? How do we know when to spit out a digit?

In other words, if we're using the representation of an infinite sequence of coefficients, what's the next coefficient?

    px + q             1
    ------ =  c + -------------
    rx + s          p'x + q'
                    --------
                    r'x + s'
   x  + [1,infty)  ---->   x  +  1
    0                       0   -----
                                x +  1
                                 1  ---
                                    ...
 coeff /p q\  = if (p div r = (p+q) div (r+s))     (JG's test)
       \r s/    then
                   Just (p div r,  /0     1     \/p q\ )
                                   \1 -(p div r)/\r s/
                else
                   Nothing
And then you get your stream of coefficients.
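
For concreteness, here is a minimal Haskell sketch of that emit-or-absorb loop. The representation and the test follow the coeff above; the two starter matrices for the leading 2 and the first 1 are my own assumption, and (as noted) the [1,infty) test itself hasn't been fully justified:

   type M = (Integer, Integer, Integer, Integer)   -- /p q\ over \r s/

   mmult :: M -> M -> M
   mmult (p, q, r, s) (p', q', r', s') =
     (p*p' + q*r', p*q' + q*s', r*p' + s*r', r*q' + s*s')

   coeff :: M -> Maybe (Integer, M)
   coeff (p, q, r, s)
     | r /= 0 && r + s /= 0 && c == (p + q) `div` (r + s)
                 = Just (c, mmult (0, 1, 1, -c) (p, q, r, s))
     | otherwise = Nothing
     where c = p `div` r

   -- emit a coefficient whenever it is safe, otherwise absorb the next matrix
   coeffs :: M -> [M] -> [Integer]
   coeffs z ms = case coeff z of
                   Just (c, z') -> c : coeffs z' ms
                   Nothing      -> case ms of
                                     m : ms' -> coeffs (mmult z m) ms'
                                     []      -> []

   -- feeding in GJ's matrices /n n\ \1 0/ (after the two starters) gives back the
   -- canonical coefficients 2,1,2,1,1,4,... of e
   eCoeffs :: [Integer]
   eCoeffs = coeffs (1,0,0,1) ((2,1,1,0) : (1,1,1,0) : [ (n, n, 1, 0) | n <- [2 ..] ])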

GJ said that it was so pretty, he put it into Hugs and used it to generate encryption codes for a wireless router.

RSB remarked that it was the shortest pi algorithm in the west, and discussion veered off onto a side track about there being two films called pi, and SC was heard to mutter something about one of them being an American pi film.

Later, after a diversion from RSB, GJ returned to the floor, and posed the problem of doing this test as efficiently as possible:

                p div q = r div s
(Apparently divs are 5 times as expensive as multiplications)

After some calculation, it was reduced by one div, to

                r - s < (p div q)s <= r
But it was thought it ought to be possible to do it more cheaply than that.
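
For what it's worth, a quick brute-force check (a hypothetical helper, not something from the meeting) that the one-div test agrees with the original for positive arguments:

   checkDivTest :: Integer -> Bool
   checkDivTest m = and [ (p `div` q == r `div` s) == (r - s < d*s && d*s <= r)
                        | p <- [1 .. m], q <- [1 .. m], r <- [1 .. m], s <- [1 .. m]
                        , let d = p `div` q ]

   -- checkDivTest 20 == True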

---------------

RSB's diversion was something that he was asking Ross Paterson about. One of these occasions when he actually wanted a nested datatype. Binomial trees.

   BT(1,0) is a leaf            <-- (width,depth) are the parameters
   BT(w,1) is of the shape  ____.____
                           ///// \\\\\
and
   BT(w,d) is Bin [BT(w,d-1), BT(w-1,d-1),.... BT(1,d-1)]
which the minute taker flatly refused to draw in ASCII. RSB said he'd talk about that some other time.

After GJ's interlude with the div problem, MA took the floor and produced the datatype that RSB had been looking for:

    BT(0,d)     = Nil
    BT(w,0)     = Leaf w
    BT(w+1,d+1) = Cons (BT(w+1,d), BT(w,d+1))
After drawing a tree for BT(2,2), RSB was happy. Then he wanted MA to turn it into Haskell. MA offered instead to turn it into categorical type theory.
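
For what it's worth, a minimal (and unindexed) Haskell rendering of MA's three equations might look like this; the constructor names are mine, and the width/depth indices that a proper nested datatype would track in the types are here just arguments of a generator:

   data BT = Nil | Leaf Int | Cons BT BT
     deriving Show

   -- bt w d builds BT(w,d), following the three equations above
   bt :: Int -> Int -> BT
   bt 0 _ = Nil
   bt w 0 = Leaf w
   bt w d = Cons (bt w (d-1)) (bt (w-1) d)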

Then the minute-taker trooped off for lunch.

[1] Problem Solving Club Minutes, Friday 23rd May 2003

[2] Problem Solving Club Minutes, Friday 26th September 2003

[3] Problem Solving Club Minutes, Friday 25th April 2003

Friday, 26th September 2003
Present: RSB, SC (minute-taker), JG, GJ, BS

JG wanted to show us a small program for pi, and managed to throw the rest of us into a remarkably speedy state of confusion by putting up an infinite product of matrices (use fixed-width font):

       -|                                        2
   /0 4\|/1 1\ /3 4\ /5 9\  . . .   / 2n+1  (n+1)  \ . . .
   \1 0/|\1 0/ \1 0/ \1 0/          \  1      0    /
       _|

The matrices represent these homography things, with

                  \      qx + r
   /q r\ being    /\x -> ------
   \s t/                 sx + t

Barney said that he thought they looked like mobius transformations, which is exactly what they are.

JG then revealed that "everything from ] onwards is 4/pi "

SC wanted to check what this meant: as you start computing the matrix product from the left, you are producing a matrix with bigger and bigger numbers, which tends towards the function which returns the constant 4/pi

RSB wanted to justify that that really did produce 4/pi. JG didn't know how to. But apparently this is in Vuillemin's paper [1] about continued fractions.

GJ declared that it must be in the Chemical Rubber Book. [stunned silence]

RSB: "Oh that makes everything clear now!!"

By way of explanation, GJ went and got the Chemical Rubber Book. It turned out to be a thick book of mathematical & statistical tables, which RSB then quietly perused in the hope of finding something about pi and series.

JG posed the question: suppose we wanted the digits of this? In particular, a streaming version. GJ: "You would."

With

  /q r\ 
  \s t/

converging to 4/pi, we noted that q/s and r/t must both converge to 4/pi.

As

                  \      qx + r
   /q r\ being    /\x -> ------
   \s t/                 sx + t

suppose that

    |  q.0 + r  |     | q.infty + r |
    | --------- |  =  | ----------- |
    |_ s.0 + t _|     |_s.infty + t_|

i.e.

      |_r/t_| = |_q/s_|

Then the 1st digit of the series must be independent of the x

BS asked whether this whole sequence of functions is uniformly convergent.

The range of

  /q r\ 
  \s t/

was discussed, from [q/s,r/t) - but which is bigger? q/s or r/t? So we don't know which way round that is (nor which end is open).

JG's hypothesis: the range of the product xy is included in the range of x, so he believed that the range gets narrower and narrower.

So if the first digit is the same, you can stream!

Not being satisfied by pi, RSB wanted to see one to produce e. GJ: "ee by gum"

GJ did some calculations on a scrap of paper and pronounced that the range shrinks by more than a constant factor, so that it does shrink to a point.

JG wanted to talk about functions for the next bit. If d is the first digit produced, then

   /q r\             1   q'x + r'
   \s t/ x  =  d  +  --  --------
                     10  s'x + t'

So...

   q'x + r'       10(q-ds)x + 10(r-dt)
   --------   =   --------------------
   s'x + t'             sx+t

GJ noted that that kept the values positive.

    q |--> 10(q-ds)
    r |--> 10(r-dt)
    s,t unchanged

Step  /q r\d  = / 10(q-ds)  10(r-dt) \
      \s t/     \    s          t    /

JG said that in Vuillemin's paper, two streams are fused, so that the digits of pi are what you get by streaming:

  pidigits = stream  step digit mmult /0 4\  [ /2n+1 (n+1)^2 \ | n <- [0..] ]
                                      \1 0/    \  1     0    /

As far as JG knows, that's new. He has in mind to write a paper (maybe a pearl) about the digits of pi.

GJ was concerned about the time the program would take to produce a digit. After all, there are long sequences of 9s, where you'd have to wait to produce a digit (and have an unbounded amount of state)

JG thought that this program wasn't going to compete with those of Ramanujan, but if it wasn't the fastest, it might compete to be the shortest.

RSB commented that streaming was only useful if you can't predict how long you're going to have to wait for the next digit. JG wasn't sure he bought that argument. GJ pointed out that you might want to use a streaming program just because you couldn't be bothered to write a non-streaming program.

So the above was the program that JG got, apart from he gave inline definitions of digit and mmult.

GJ wanted to hang on a minute, because you need to put in a parameter to the stream to tell it whether it's safe to emit a digit. JG said that digit has a Maybe in it.
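
Putting the pieces together, a runnable sketch along these lines might look as follows; the concrete names, the hard-wired test and step (rather than passing digit and mmult as parameters) are a reconstruction, not necessarily JG's inline definitions:

   type M = (Integer, Integer, Integer, Integer)   -- /q r\ over \s t/

   mmult :: M -> M -> M
   mmult (q, r, s, t) (q', r', s', t') =
     (q*q' + r*s', q*r' + r*t', s*q' + t*s', s*r' + t*t')

   -- the digit no longer depends on x once floor(q/s) = floor((q+r)/(s+t))
   digit :: M -> Maybe Integer
   digit (q, r, s, t)
     | s /= 0 && s + t /= 0 && d == (q + r) `div` (s + t) = Just d
     | otherwise                                          = Nothing
     where d = q `div` s

   -- after emitting d, the remaining state is /10(q-ds) 10(r-dt)\ over \s t/
   step :: M -> Integer -> M
   step (q, r, s, t) d = (10*(q - d*s), 10*(r - d*t), s, t)

   stream :: M -> [M] -> [Integer]
   stream z ms = case digit z of
                   Just d  -> d : stream (step z d) ms
                   Nothing -> case ms of
                                m : ms' -> stream (mmult z m) ms'
                                []      -> []

   -- first the 3, then the decimal digits: 3,1,4,1,5,9,...
   piDigits :: [Integer]
   piDigits = stream (0, 4, 1, 0) [ (2*n + 1, (n + 1)^2, 1, 0) | n <- [0 ..] ]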

In a short diversion, RSB suggested a different starter matrix,

  /-3 4\
  \1  0/

then wondered if it was sensible. JG commented that it broke the non-negative quality of the matrix values, but GJ pointed out that it's just the bottom ones that you need to be non-negative anyway.

JG then presented the test he'd been using for whether a digit is sensible or not:

  | q+r |   | q |
  | --- | = | - |          (which covers the range x from 1 to infinity)
  |_s+t_|   |_s_|

This test was the one he'd been using rather than the more obvious

  | q |   | r |
  | - | = | - |     (which covers the full range of x from 0 to infinity)
  |_s_|   |_t_|

RSB complained that he didn't want 'div's in the test - they are computationally very expensive. And then of course there's division by zero. But JG wasn't worried about division by zero, he was worried about whether it was ok to use the above test with (q+r)/(s+t) or not - it's faster and it gives the right digits, but there isn't a justification!

  | q |   | r |
  | - | = | - |
  |_s_|   |_t_|

implies

  | r |   | q |   | q+r |
  | - | = | - | = | --- |
  |_t_|   |_s_|   |_s+t_|

but not vice versa. Probably it gives the wrong answer only rarely, but still!

GJ wanted to consider the infinite product of the tail (of the product of matrices)...but then wondered whether that was a meaningful question to ask.

There then followed discussion on the range that q/s,r/t can be in, and the range of values that x can take, for the homography

                  \      qx + r
   /q r\ being    /\x -> ------
   \s t/                 sx + t

in an attempt to see how the above test could be justified. We looked at the product of one matrix and the one following it

    / 2n+1  (n+1)^2\ / 2n+3  (n+2)^2 \
    \  1       0   / \  1       0    /

and deduced that at least for large n, you know you're in the range 1 to infinity, so using x in the range 1.. infinity will do.

RSB requested permission to troop off to lunch.

[1] J. Vuillemin: "Exact real computer arithmetic with continued fractions" IEEE Transactions on Computers, 39(8):1087-1105, August 1990.

Friday, 20th June 2003
Sharon was speaking about silicon wafers. She caused much amusement by walking in and proceeding to stick black sellotape all over Richard's desk. All would become clear, we were told.

In a recent post to rec.puzzles, Steve Terrel (is that spelt right?) posed the following puzzle. In a chip factory, there are some cassettes containing silicon wafers. A cassette has s slots, and w of these are filled with wafers. The wafers are numbered from 1 to w, and have been arranged at random in the cassette. A robot can pick up a wafer and move it to a vacant slot in the cassette. There is always at least 1 empty space in the cassette (otherwise a solution would only be possible if the wafers were already ordered).

The problem is to minimise the number of moves needed. It doesn't matter if the cassette has some empty spaces between the wafers, as long as the wafers are in order. For example, ..1.23...4. would be an acceptable solution.

She showed us an example of this process by putting paper wafers into the cassette that she'd marked on the table with sellotape.

If you know what the best final arrangement of the wafers is, then it is easy to find the best way to get them into that arrangement. In general, though, it is difficult to find the optimal solution, so Sharon has come up with a greedy algorithm that will solve the problem approximately.

Friday, 13th June 2003
Present: RSB, SC (minute-taker), JG, GJ, CEM, BS

RSB wanted us to consider the properties of the
following interesting function

   mix xs [] = xs
   mix xs (y:ys) = xs ++ [y] ++ mix (reverse xs) ys

...which arises in the construction of Gray codes.

Is it associative? Yes, apparently, with identity [].

   fun1 = foldr mix []

So that, for example,

   fun1 [[1],[2],[3],[4]]

is

   [1,2,1,3,1,2,1,4,1,2,1,3,1,2,1]

RSB is interested in computing fun1, so you are given a
list of lists, and you want to mix them. (This sort of
corresponds to Gray codes in a radix system not binary.)

JG was astonished, as only yesterday he'd been doing exactly
the same sort of up-and-down-the-lists mixing, in converting
a lot of colours from HSB (Hue in range 0-360, Saturation in
range 0-1, Brightness in range 0-1) to RGB values.

RSB pointed out that the length of such a list, if you
started with m lists each of length n, would be

        m
   (n+1)  - 1
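
As a quick sanity check of that formula (a hypothetical helper, using the mix defined above):

   lenCheck :: Int -> Int -> Bool
   lenCheck m n = length (foldr mix [] (replicate m [1..n])) == (n+1)^m - 1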


RSB suggested that the time required to produce the result
was going to be proportional to the length of the result,
and so amortized, this should be constant time.

But...
RSB was aiming for a "loopless" algorithm. By loopless, he
meant a loop with no inner loops, so there's a constant
time delay in between each output. He explained this a
bit further, and mentioned that you were allowed to do
a bit of pre-processing on the list, so long as that didn't
take too long.

He suggested that

    unfoldr step . launch

was a reasonable definition of loopless, but GJ wanted some
strictification and finiteness in there, and insisted that
step must also produce a fully evaluated next step.
JG said that this reminded him of Chris Okasaki's point about
"amortized" and lazy languages.

RSB produced another version of fun1:

   fun2 = unfoldr step . launch

   launch = map ( \xs -> (xs,[]))

   step []               = Nothing
   step (([],sx):xss)    =
                 case  step xss  of
                     Nothing --> Nothing
                     Just (x,xss') --> Just (x,(sx,[]):xss')
   step (((x:xs),sx):xss) = Just (x, (xs,x:sx):xss)

RSB then tried to explain the role of what started out as
(xs,[]), which wasn't actually xs and reverse xs, but the
pair of lists together represented some of xs and some of
the reverse of xs.
SC suggested it was like a little shuttle.
RSB agreed that it was exactly like a little shuttle, and
JG warned that SC would be talking about knitting next!

In considering whether fun2 was loopless, whilst this
seemed reasonable at first sight, actually it has the same
problem as carries in a counter, and is *not* loopless.

JG was worried about a productivity problem here, what if
you had an infinite list of []s?
GJ said that you wouldn't have that because you'd need an
infinite time to make it.

RSB: "I knew Geraint would like this, he a mix thin knitter....
.... butterfly man"
[?!?!]

RSB also commented on Knuth's recent work on volume 4,
remarking on Knuth's tendency to "just give the algorithm".
GJ pointed out that RSB had just given the algorithm, too.

So how do we make it loopless?

RSB paused here, in case we thought of anything, and
GJ did: he was wondering about separating the state into two
lists, one being an accumulating parameter for reversing the
state (like a Turing machine with something to the right of the
head), but eventually decided that that didn't work, as that
was just postponing the problem.

A further pause, but no more thoughts were forthcoming,
and RSB said that he was now going to go to a different
part of the forest entirely.

        __4__
       /     \
      3       3
     / \     / \
    2   2   2   2
   / \ / \ / \ / \
   1 1 1 1 1 1 1 1

And when we do an inorder traversal of that tree, what
have we got?

"Ah. Yes. *That* part of the forest"
(We get the mix order of [[1],[2],[3],[4]] )

RSB defined

   mixorder Null              = []
   mixorder (Node [] l r)     = mixorder l
   mixorder (Node (x:xs) l r) = mixorder l ++
                                [x] ++
                                mixorder (Node xs r l)
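
For reference, a minimal sketch of the tree type those equations assume, together with the tree drawn above as a value (the names here are mine):

   data Tree a = Null | Node [a] (Tree a) (Tree a)

   -- the complete tree of depth d whose level k (counting leaves as 1) is labelled [k]
   full :: Int -> Tree Int
   full 0 = Null
   full d = Node [d] (full (d-1)) (full (d-1))

   -- mixorder (full 4) == [1,2,1,3,1,2,1,4,1,2,1,3,1,2,1]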

RSB explained that he was thinking of a tree like

         _xs_
       /| | |\
      / | | | \
     l  r l r  l ....  up to infinity

SC wanted to know where the lrlrlrlrlrlr thing came from,
and RSB explained that he was heading towards Gray code.

To illustrate further, for the sequence

   [[1,2,3],[4,5],[6,7],[8,9]]

he drew its "grayTree"
[Thanks to JG for improving the clarity of the diagram,
and gee whiz thanks to JG for making it really hard to draw.]

             89
            / \
           /   \
         67     76
         | \    | \
         |  \   |  |
         |  _\__|_/
         | /  \ |
         45     54
         | \    | \
         |  \   |  |
         |  _\__|_/
         | /  \ |
         123   321


There was a collective shriek/howl/strange noise, as we
recognised that this was a nexus (see PSC 2nd May 2003).
BS wondered if this had something to do with that nexus
stuff that RSB had been talking about, but RSB reassured us
that it was just coincidence, another example of Nexus
programming.

RSB made a comment about the abstraction function he
absorbed into the mix order, and then claimed we should be
able to write grayTree from the diagram - apparently it
turns out to be a foldl (surprise)

In further explaining what had been going on in his
thought processes, RSB said that he'd gone to the fun2 alley,
had tried to optimise step, and couldn't.
JG was worried that it was premature to get rid of the
reverse early, and suggested an alteration to the definition
of fun2 (changes underlined)

   fun2 = unfoldr step . launch

   launch = map ( \xs -> (xs,rev xs))
                             ^^^^^^

   step []               = Nothing
   step (([],sx):xss)    =
              case  step xss  of
                  Nothing --> Nothing
                  Just (x,xss') --> Just (x,(sx,rev sx):xss')
                                                ^^^^^^
   step (((x:xs),sx):xss) = Just (x, (xs,sx):xss)
                                         ^^

RSB went on to fun3, making a spine:

   mixorder = unfoldr step . mkspine

   mkspine t = addspine t []

   addspine Null sp          = sp
   addspine (Node xs l r) sp = addspine l ((xs,r,l):sp)

and drew a picture to illustrate:


            (x1)
            /   \
        l1 /     \
         (x2)     r1
         /  \
     l2 /    \
      (x3)   r2
      /  \
     /    \
   l3     r3


So the left spine is just a list of the xs and rs, but as well,
he suggested using the ls as well.

   step [] = Nothing
   step (([],l,r):sp) = step sp
   step (([x],l,r):sp) = Just (x,addspine l sp)
   step ((x:xs,l,r):sp) = Just (x,addspine l ((xs,r,l):sp))

BS pointed out that there was still a recursive call to
step in it, and JG wondered why the stepping had got
more addspining in it?

RSB was worried about walking all over an exponentially
large tree, but decided that you have time to walk, but
you have *not* got the time to do it before you start,
nor in all the gaps you might do it in.

RSB asked why fun3 isn't loopless.
JG said that addspine wasn't helping, because it's recursive.
You might have to go the depth of the tree to find the next
element.

RSB had had an "Ah ha!" moment in his thought processes, when
he decided he wanted all the addspines to be there, and
use concatenation.
JG suggested consing elements onto the front of the list
and peeling off one by one, so concatentation ended up
getting represented by cons.

RSB had thought how to make fun3 loopless, and decided
to make the datatype of spine explicit, using a type
"forest of binary rose tree":

   data Spine a = Spine [([a],Spine a, Spine a)]

But he doesn't want to make a spine on a grayTree, he
wants to conflate them, to use sharing.

   step (Spine []) = Nothing
   step (Spine (([x],sl,sr):sp)) = Just (x,spcat sl sp)
   step (Spine ((x:xs,sl,sr):sp)) =
                   Just (x,spcat sl ((xs,sr,sl):sp))

   spcat (Spine sp) xs = Spine (sp++xs)

RSB claimed it was loopless, and JG agreed that it
certainly looked loopless...

GJ objected, saying that before, RSB had been carefully
using eagerness to convince us, and now he couldn't suddenly
start using laziness.

... actually it's not loopless, need to write right hand side
with an unfold?




Turning to another problem, RSB remarked on a paper by
Koda and Ruskey [1], which is about enumerating all ideals of
a finite poset.

         ()                  ()
        /  \                /
       ()  ()             ()
          /  \
         ()  ()


In evaluating the ideals of this, colour the nodes white
or black, following the rule that a descendant of a white
node must be white.

If there's a restriction to do it in Gray code order, then
the "koda" for the whole lot, is the mix of the kodas of
the trees.

RSB wants to make that algorithm loopless, too. Apparently
Knuth has an algorithm for it, but he wants to make a
better one.



Speaking of Knuth, there's another related problem that
Knuth talked about recently, concerning spider squishing,
similar to above except with directed edges, in an acyclic
graph (one which is still acyclic if you take the arrows
off the edges)

[ For *-----  read <----- , as angled arrows are illegible in ASCII.]


         ()     ()*---()
         /\    /
        /  \  /
       *    **
      ()    ()
      **
     /  \
    /    \
   ()    ()

Finding the kodas of these things is difficult; apparently
you need to launch the algorithm at the right place in the
structure.

RSB has a nice algorithm for this, based on mix, and
he wants to make it loopless.

Pottier/Filliâtre wrote a paper [2] about producing all the ideals
of a forest functionally, and their work was based on continuations,
and their algorithm was not loopless, but it beat that of Knuth by a
factor of 2.

Then we all trooped off to lunch.

[1] Y. Koda and F. Ruskey:
"A Gray Code for the Ideals of a Forest Poset"
Journal of Algorithms, 15 (1993) 324-340.
Also available at
http://www.csr.uvic.ca/~fruskey/Publications/


[2] Jean-Christophe Filliâtre and François Pottier:
"Producing All Ideals of a Forest, Functionally"
To appear in the Journal of Functional Programming, October 2002.
Also available at
http://pauillac.inria.fr/~fpottier/biblio/pottier.html

Friday, 6th June 2003
Present: MA, SC, JG, GJ, CM, BS
[Minutes taken by SC - use a fixed-width font to view.]

GJ gave us what he called a "hands in pocket" presentation.

Recalling a discussion from several weeks ago, GJ reminded
us of the notation "box", so that this

       a [] b

is an expression whose value is the common value of a and b.
So you only ever write that if a and b are equal.

GJ explained that you might come across it in a context like
writing the maximum, with

      a |_| b  | a >= b  = a
               | a <= b  = b

which is a little messy, so you rewrite it as

      a |_| b  | a > b  = a
               | a = b  = a[]b
               | a < b  = b
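
A hedged Haskell rendering of the two definitions (the names are mine, not GJ's): box is a partial operation, only defined when its two arguments agree.

   box :: Eq a => a -> a -> a
   box a b | a == b    = a
           | otherwise = error "box: arguments differ"

   -- the rewritten maximum from above
   maxB :: Ord a => a -> a -> a
   maxB a b | a > b  = a
            | a == b = a `box` b
            | a < b  = b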

GJ has a circuit which constructs the maximum. SC wondered if
it was GJ's only example of [], and GJ explained that it was
his only *nice* example.

GJ's motivation is something wicked that engineers have done
for centuries, concerning truth tables:
whilst some of the input lines are marked "don't care", the
wicked thing is that they also mark some of the outputs
"don't care", which means it is very awkward to reason with
the outputs.

GJ wanted to know whether you could take a (wicked) truth table
and rewrite it using []s.

Suppose f is a partial function, and we make some arbitrary
construction like

   a(x) = (x, x in dom f)
   b(x) = (x,true)

(a(x) [] b(x)) is safe to write when you're in the domain of f.

GJ then tried to write an expression to illustrate, and
came up with

    f(fst(a(x)[]b(x)))

which wasn't quite what he meant at first (see later on).

GJ produced an example truth table, corresponding to

   case s-t of
      -1 -> y
       0 -> x [] y
       1 -> x


      _s__t__x__y__|____
       0  0  0  0  |  0
       0  0  0  1  | ><
       0  0  1  0  | ><
       0  0  1  1  |  1
       0  1  0  0  |  0
       0  1  0  1  |  1
       0  1  1  0  |  0
       0  1  1  1  |  1
       1  0  0  0  etc.

[ >< stands for "don't care" ]

In the context of GJ's circuit, the situation where x =/= y
doesn't happen.

There was a short diversion about representing truth
tables by Karnaugh maps, using Gray code.

Much discussion ensued about how to represent such truth
tables:

    BS wanted to model this with a set of functions:
    each agreeing with the truth table but disagreeing on the >< s.

    MA wanted to rename the >< lines from the truth tables.
    JG pointed out that that's a problem with Karnaugh maps -
    doing that means you knock holes in your n-dimensional structure.

MA leapt up and added a truth table for []

      []_|_0_1_
       0 | 0 ><
       1 | >< 1

We agreed that we were happy with the semantics for [], but
that we wanted the algebra, too.

JG wanted further clarification from GJ about what he was
trying to do. GJ explained that if he claims that [] is useful,
then he has to say what it is useful *for*. So he is interested
in taking an arbitrary truth table and re-expressing it
involving [].

MA suggested a notation [] , as follows:
                           f

   [] (x)   = f(x)     for x in dom f
     f
            = ><       for x not in dom f


It was suggested that what was needed was an extension of f,
              ^
building an  f  from f.

           ^
    f(x) = f (fst (a(x)[]b(x)))


       ^
With  f a total function, we want f partial.


GJ had another go at explaining his question.
He has a refinement of a maximum circuit, into the function
that he gave the truth table of. This goes through naturally,
without having to think about domains.

                                         ^
Trying to think further about where the f comes from,
the previous expression was corrected, to

           ^
    f(x) = f (fst (a(x) [] b(x), x, .... ))

            ^
Where does f come from? Well, somehow, outputs have to
be produced, in the cases where you don't care.

There followed more thinking aloud and discussion,
in particular concerning refinement.

SC suggested writing down a random truth table, trying to
reverse engineer an expression involving [], and try and
learn something from this process.

JG and SC started writing down a random truth table:


    _s_t_u_|_f_
     0 0 0 | ><
     0 0 1 | 0
     0 1 0 | 1
     0 1 1 | ><
     1 0 0 | 0
     1 0 1 | 1
     1 1 0 | 1
     1 1 1 | ><

                                  ^
We started writing down possible fs.
One is  (s /\ (t \/ u)) \/ (~s /\ t); we found a shorter
one via a Karnaugh map, t \/ su.

BS came up with how to introduce the [] :
just take the [] of all the functions that give you the
                       ^
right answer (all the fs).

There was some further discussion about [].
You need

     (a,b) [] (c,d)  =  (a[]c,b[]d)

as an axiom, but the consequence of that is that
you don't have that
     __
     || (a,b) = a

anymore.

The discussion diverged into talk of relations and
non-determinism, and GJ pointed out that this wasn't
making it any simpler!


We ran out of things to say on GJ's topic, and so....


                  -------------------

...JG talked for a while about streaming, following on
from his work from a few weeks ago.

He was a bit worried that all his examples are arithmetic
coding/decoding or continued fractions in disguise!

Another example:
Suppose you want to convert fractions from one base
into another, e.g.

       0.125   =   0.001
            10          2

(converting "basimal" fractions, as someone put it)

Having noted that this involves streams, JG then considered
the problem of converting from base b to base c

     convert (b,c) = tobase c  . frombase b

     frombase :: Int -> [Int] -> Rat
     frombase b = foldr (step b) 0
                where step b n x = (n+x)/b


     tobase :: Int -> Rat -> [Int]
     tobase c = unfoldr (==0) (floor . (*c)) (frac . (*c))
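
For concreteness, a runnable (non-streaming) rendering of those two definitions, using Rational for Rat; the names are mine, and toBase terminates only when the fraction has a finite expansion in the target base, as in the example:

   fromBase :: Integer -> [Integer] -> Rational
   fromBase b = foldr step 0
     where step n x = (fromInteger n + x) / fromInteger b

   toBase :: Integer -> Rational -> [Integer]
   toBase _ 0 = []
   toBase c x = floor y : toBase c (y - fromInteger (floor y))
     where y = fromInteger c * x

   convert :: (Integer, Integer) -> [Integer] -> [Integer]
   convert (b, c) = toBase c . fromBase b

   -- convert (10,2) [1,2,5] == [0,0,1]    -- 0.125 in decimal is 0.001 in binary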

BS suddenly exclaimed that he could see the similarity to
arithmetic coding/decoding.
JG: "That's what I'm worried about!"

At this point MA and BS got fascinated by the hypercube
being drawn on RSB's screensaver, but JG carried on
regardless:

    tobase c = unfoldr (next c)
               where next c x = if x==0 then Nothing
                                else Just (floor y, frac y)
                                where y = c*x

    stream f g z xs
          = unfoldr f (foldl g z xs),

    ***provided that***
         f z = Just (y,z')
        =>  f (g z x) = Just (y, g z' x)

So, with the stream, if you can produce a y you do so, otherwise
you consume:

   stream f g z xs =
        case f z of
          Just (y,z')  -->  y : stream f g z' xs
          Nothing      -->  case xs of
                               x:xs' --> stream f g (g z x) xs'

BS wanted a flushing stream, so JG also gave that version:

   fstream f g h z xs =
        case f z of
          Just (y,z')  -->  y : fstream f g h z' xs
          Nothing      -->  case xs of
                               x:xs' --> fstream f g h (g z x) xs'
                                []   --> h z

JG:

      foldr (step b) 0
    =
      app 0 . foldr (.) id . map (step b)
    =
      app 0 . foldl (.) id . map (step b)
    =
      app 0 . foldl ( (+) ) id
           where u (+) n  = u . step b n
    = {below}
      appzero . foldl ( (+) ) (1,0)
      where (c,m) ( (+) ) (b,n) = (b*c, m*b+n)
            appzero (c,m) = m/c


Justification for the step above:

      ((/c) . (+m) . (/b) . (+n)) x
     =
       (x+n)/b + m
       -----------
            c

     =
       x + mb + n
       ----------
           bc

     = ((/ bc) . (+(mb+n))) x


There was discussion of an interval closing in on
a fraction that might be 0.399999999.... or 0.4
and we were worried that the stream might not be able
to produce an output, because it didn't know whether
to do a 3 or a 4.

JG mentioned that if you take the appzero bit, fuse with the
foldr, then try to write it as a stream, it doesn't work,
because the commitment condition isn't satisfied.
(It will only commit if it's safe to do so.)

It was pointed out that the use of limits turns the
strict inequalities into weak ones.

GJ then pondered on the word "strict", wondering why
the opposite of "strict" was "weak".
Why not "liberal"?

SC: "Because liberal doesn't *do* opposition."


The session ended with JG commenting that he quite likes
this example, because he can talk about streaming without
introducing arithmetic coding.

Friday, 30th May 2003
Present: MA, RSB, SC (minute-taker), JG, GJ, BS

MA has been thinking about holes in datatypes, and
talked to us about how you represent them.

For example, with a datatype

   Tree a = leaf a | tree (Tree a) (Tree a)

if you try and represent a hole o in a tree, 
like this, say (use fixed-width font)

                       .
                     // \
                    //   \
                   //     \
                  ./       \.
                 / \\     /  \
                /   \\  ./    \.
              ./     \.       / \
                     / \\   ./   \.
                    /   \\
                  ./     o
                        / \ 
                       /   \  
                     ./     \.

             Fig 1


you could use a path [ // in Fig 1], where

   Path = List (Dir)
   Dir = Left | Right

but MA says that's a bit tedious.

Another example of a hole in a datatype would be a
list with a hole in it, like

       _____________ _ ________
      |_____________|_|________|


                Fig 2

SC wanted to know the motivation behind this.

MA explained that you might, for example, be editing
a data structure (like some text, or a tree), and your 
cursor is somewhere in the middle of the data structure,
and you want to be able to do various operations, like
move left, move right, modify, etc.

This reminded GJ of Jeuring and Hinze's recent JFP paper 
Weaving a Web concerning a generic zipper. 
In addition, Conor McBride has written a syntactic 
description of this paper. 
MA has a semantic solution involving containers.

To illustrate, he drew some diagrams ( + meaning data and
o meaning a slot for a sub-data structure):

      /\
     /  \
    /    \     
   /      \
  /-+--o-o-\


     /\
    /  \
   /    \
  /-+-o-o\
     /\
    /  \
   /-o-+\               

  Fig 3


Either you have: a shape with two slots,
or you have: no slots for trees, but a slot for data

Categorically, such a tree is isomorphic to 

        TA ~= A + (TA)*(TA)

As a fixpoint, this is a fixpoint of

       mu Y . A + Y*Y


MA then talked about shapes and sets of positions;
BS wanted clarification. MA then started to talk about
formal languages and signatures, but this didn't seem helpful,
so we went back to 

      TX  ~=  X + TX*TX  ~=  mu Y. X+Y*Y


and restated the problem: 

How are we going to define the path [through the tree] so 
that all the operations are performable in constant time?

At each stage, you have a direction, and the rest of the
tree
                       .
                     // \
                    //   \
                   //    -\-------->8 t1
                  ./       \.
                 / \\     /  \
       t2 8<----/-  \\  ./    \.
              ./     \.       / \
               t3 8<-/ \\   ./   \.
                    /|  \\
                  ./ |  -o---->8 t4
                        / \ 
                       /   \  
                     ./     \.


                     Fig 4

[ 8<--- meant to represent scissors cutting off 
a portion of the tree (these were originally drawn 
with red chevrons but there ain't no red in ASCII) 
so starting from the top, first you cut off t1 and go 
left, then cut off t2 and go right, and so on.]

So, a path is a list of directions and a final tree

   Path  =  [(Tree,Dir)],Tree                     (1)


For the example in Fig 4, we have

  [(t1,left),(t2,right),(t3,right)], t4           (2)

Given (2), you can put the pieces together to get the
original tree, *and* you have a position in that tree.

If, for example, you wanted to implement the operation
"up", you'd go from (2) to

  [(t1,left),(t2,right)], Tree t3 t4
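
A minimal Haskell sketch of representation (1), with unlabelled trees as in the figures (the names are mine): the list holds the pieces cut off on the way down, newest first, together with the direction taken at each step.

   data Tree = Tip | Bin Tree Tree
   data Dir  = L | R                         -- the direction you went

   type Loc = ([(Tree, Dir)], Tree)

   down :: Dir -> Loc -> Loc
   down L (ps, Bin l r) = ((r, L) : ps, l)   -- going left cuts off the right subtree
   down R (ps, Bin l r) = ((l, R) : ps, r)   -- going right cuts off the left subtree
   down _ loc           = loc

   up :: Loc -> Loc
   up ((t, L) : ps, u) = (ps, Bin u t)       -- we went left, so t is the right sibling
   up ((t, R) : ps, u) = (ps, Bin t u)       -- we went right, so t is the left sibling
   up ([], u)          = ([], u)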


There then ensued a discussion concerning the lack of
data in the nodes of the tree, and what would happen if
there were. It was agreed that if there was, it would have
to be put in with the Dir.

RSB commented that this was a spine representation. Not
a left spine or a right spine but a mixed spine.

MA commented that he quite liked that representation, and
wanted to know whether that was the zipper, and apparently
yes, from what GJ remembers as the zipper! 

JG mentioned that he'd been trying to do this for 10 years,
and was absolutely gobsmacked at Conor's paper 
The Derivative of a Regular Type is its Type of One-Hole Contexts.
Analogous to left/right scans on trees are accumulations
on trees. This sort of generic scan on trees requires thinking
about paths in trees, and Conor's way was much nicer than
JG's way, apparently: JG had a complicated construction for
the paths, but Conor had a derivation that is like calculus,
and his paper about it is now languishing on his web page, 
unpublished.

Back to the data refinement that MA is trying to do, he
pointed out that the key thing in (1) is the (Tree,Dir) bit,
which isn't an isomorphism.

BS wanted to know about rose trees.

    Rose a  = Node  a  List (Rose a)

    RA = A * LRA




    a [  |     ]
        /
       |
       V
       a [    |    ]
             /
            |
            V
            a [           ]

             Fig 5

Another question then arose:
What is a Y*Y with a hole in it?
Y*Y  with a hole in is  Y*2.
Y*2? 
Yes, because the hole is either on the
right     Y*o
or left   o*Y
...so it's Y with a left-or-right label, ie 2*Y

RSB: "Oh my god! The penny's just dropped! 
      y squared goes to 2y!"  

Y*Y*Y?

RSB: "Obvious! 3Y2"    :-)

So what's a list with a hole in it (Fig 2)?
Two lists.

Looking at holes in other datatypes, writing
this as differentation, we have
[writing d for "differentiating" (surprise)]:

       n+1               n
    d(X   )   = (n+1) * X

                        2
    d(List X) = (List X)

    dX = 1
    dA = 0

[differentiating wrt X]

Short diversion:
It was noted that 

        + * 1 0    

on datatypes have all the properties of a ring but no negation,
which makes it a semi-ring, apparently.

Back to differentiating, RSB summed up progress so far:
"If you want a mint with a hole, you do a d(mint)"

More questions. What is a d(mu)? What about products?
For products,

     d( FX*GX )  gives

          (dF)X*GX + FX*(dG)X

(because the hole is either on the left or the right).
So the product rule works. Also the sum rule works.

It was also noted that holes are atomic, in the sense 
that you can't have half-holes.

So what about mu?  
In order to do mu, we need the chain rule.

   mu Y . F(X,Y)
       ~=  F(X,mu Y.F(X,Y))

We then started using a new notation for bifunctors, to 
distinguish differentating wrt the first parameter, and
the second.

  d F(X,Y)                  <-  "d by dx"
   1

  d F(X,Y)
   2

Now, what's a hole in a mu-type?
When we use F(X,Y) to define a datatype, and the X is the
data bit, and the Y is the sub-structure bit, then the hole
can either be in the data (the X bit) or it can be in the
Y bit. 

MA pointed out that this bit involved a certain amount of
hand-waving, but that he had carefully drawn a categorical
diagram, so that he had some justification for this

  (dT)X  =  (d F)(X,TX)  +  (d F)(X,TX)*(dT)X
              1               2

  (dT)X  =  mu Y .  (d F)(X,TX)  +  (d F)(X,TX)*Y   (3)
                      1               2

RSB wanted to know if it worked with nu, and MA said that
he hadn't checked.

So it was wondered what happens when you differentiate the
type of an infinite list:

      oo     
     L  (A)  =  nu X . 1 + X*A

as opposed to these lists (streams):

     S(A) = nu X . X*A

MA: "So a stream with a hole in it is......."
SC: "...a well"

It was decided that a hole in an infinite list was a real hole,
and you couldn't put it at the end, you had to put it a
finite distance from the beginning.

     oo             oo
   dL  (A)  = LA x L  A


Back to lists (finite ones).

   LA = mu Y . 1 + A*Y

So, we have that

   F(X,Y) = 1 + X*Y

   d F = Y
    1

   d F = X
    2

and (3) decrees that

  dLX =  mu Y . LX + X*Y

in other words, two lists of Xs, which is exactly what
you get when you punch a hole in a list! (See Fig 2)
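
Concretely (a tiny sketch; the zipper convention would keep the first list reversed, so that moving the hole is constant time):

   -- a one-hole context for a list is the part before the hole and the part after it
   type ListHole a = ([a], [a])

   plug :: ListHole a -> a -> [a]
   plug (before, after) x = before ++ [x] ++ after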

RSB wanted to do trees next.
Doing binary trees with labels at tips, we have

   F(X,Y) = X + Y*Y

   d F(X,Y) = 1
    1

   d F(X,Y) = 2*Y
    2

And (3) decrees that differentiating gives us

   mu Y . 1 + 2*TX*Y


RSB: "I want to do rose trees!
      3 examples are theory; 2 examples are coincidence."

Rose trees (see Fig 5)

   F(X,Y) = X*LY

   d F(X,Y) = LY
    1

   d F(X,Y) = X*LY*LY
    2


Short diversion:
  
  (dL)X ~= mu Y . LX + X*Y
        ~= LX * LX

Therefore (using (3))

  (dR)X = mu Y . L(RX) + X*L(RX)*L(RX)*Y           (5)

Hmm.

Recalling that

  (dT)X  =  mu Y .  (d F)(X,TX)  +  (d F)(X,TX)*Y   (3)
                      1               2

         =  L(d F(X,TX)) * d F(X,TX)
               2            1

This enables (5) to be rewritten as

   (dR)X = L(X*LRX*LRX) * LRX

This was compared with Fig 5.
There was then some confusion about d, wondering whether
d had put the hole in the wrong place, and whether the
only place a hole can be is where a label is.

We'd managed to get paths that lead to Xs, but we wanted
paths that led to Ys, so we wondered if we'd been differentiating
by the wrong variable.  

If you want a hole for X, you differentiate wrt X. 
If you want holes for Y, you differentiate wrt Y?

For the zipper, a path to a Y was wanted.

Thoughts about  d F(X,TX) flew around the room, but
                 2
no conclusion was reached.

MA believed that the fundamental law was right, that

       X*(dFX)  --------------------> FX
      \_______/     substitution
        path

A suggestion was made to unroll the mu.

    TX = mu Y . F(X,Y)

       = F(X, mu Y.F(X,Y))

so differentiating this wrt TX
turns out to be d F 
                 2

Applying the chain rule:

   d(F(GX))  = dF(GX)*(dG)X

   d(F(X,GX))
    = d F(X,GX) + d F(X,GX)*dGX
       1           2

So... 

   d  TX  = d F(X,TX) * dTX
  ---        2  
  dTX

? (Bluff)
Urgh! We got a stream, wanted a list.
General confusion.

To recap:
We were looking for a hole for an X, and
that's what the dT means.
New concept: path of T.
dTX is a datatype of X, less an X.
Useful concept, but diddly squat to do with paths.

  p X  = L(d F(X,TX))
   T        2

  TX  =  mu Y . F(X,Y)  = F(X,TX)


Somebody suggested opening up the mu, differentiating
wrt Y.

   LX  = mu Y . 1 + X*Y

                  __   __
   L  =  mu (1 +  || * || )
                    0    1

JG wrote

   (dL)X ~= LX * LX

   List Int = mu((1+) . (Int *))

   List A   = mu ((1+) . (A*))
              
              mu F (k  + (Id * F))
                     1

Bother. Still get streams.

GJ: "It's all suddenly got too abstract"

So, what have we done wrong?  
We tried to bluff differentiation and we keep getting streams.

We decided that the bluff didn't work. BUT, we liked
the differentiation, and found it fascinating.

Friday, 23rd May 2003
Present: MA, RSB, SC(minute-taker), GJ, BS

Following on from thinking about arithmetic coding [2],
BS has been looking at stochastic processes and grammars [1].
He reminded us of what grammars are, with their terminal and
non-terminal symbols, and production rules like

   N -> abcST

so you keep replacing non-terminal symbols with sequences of 
terminal/non-terminal symbols, according to the production rules,
and that's how you generate strings of the grammar.

With stochastic grammars, probabilities are assigned to the
production rules, so that, say (use fixed-width font)

  N ->  abcST      with probability p1
  N ->  pqr                         p2
  N ->  xy                          p3

            where   p1 + p2 + p3 = 1

BS emphasized that these are context-free, and said that these 
context-free grammars can be put into various normal forms, 
e.g. Chomsky, Greibach and Greibach quadratic

With Chomsky normal form, the production rules are of the form

   N -> a
   N -> AB

With Greibach normal form, production rules look like

                        N -> aBCbcD...
                             ^
the crucial thing being that the first symbol is always a terminal. So there's no
left recursion.  BS mentioned that there is also a Greibach quadratic
normal form, where production rules look like

   N -> aAB
   N -> aA
   N -> a

MA was worried about needing something to prevent looping, but
RSB pointed out, these grammars are for generating strings and 
you can't talk of looping. The language generated by a grammar 
is all strings that can be derived using finitely many steps.

e.g. the sole production rule   S -> S   would generate the empty
language (no words in it, not even the empty string).

Returning to the notion of stochastic grammars, BS said that you
can't arbitrarily assign probabilities to the production rules, because
things end up going pear-shaped.

The assembled throng wanted to know where BS was going at this
point, whereupon it was explained that he wants to look at how 
these grammars produce strings, and then see if we can use that 
to result in efficient data compression.

RSB found it helpful to think of a Pascal example - it's a great
help to know the grammar, if you're trying to compress a Pascal program.

BS is trying to aim for the situation where you don't know the
grammar in advance. 

He told us of the idea that you might, in this way, compress works 
by various composers, then take an unknown piece, and see which
compressor compresses the unknown piece the most, and then figure
out the composer that way.

BS wanted to investigate probabilities like

            P(zs = XS ++ YS)

Suppose you have a production rule like

            S -> XY

then X can generate strings with various probabilities
     Y  "    "        "       "     "          "
So what's the probabilities of strings generated by S?

BS says that he proved a lemma (dead easy to prove, he said):

  P(zs = XS ++ YS)  =  Sum_{xs ys xs++ys=zs} [P(xs = XS) P(ys = YS)]


BS then looked at a simple grammar: 

  S -> ""    q
  S -> aS    1-q

P(replicate n 'a' = xs) = q (1-q)^n
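
As a concrete (and entirely hypothetical) rendering of that little grammar in Haskell, one might sample from it like this, assuming System.Random:

   import System.Random (randomRIO)

   -- stop with probability q, otherwise emit an 'a' and carry on
   genS :: Double -> IO String
   genS q = do u <- randomRIO (0, 1 :: Double)
               if u < q then return "" else fmap ('a' :) (genS q)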

BS then relabelled the probabilities to be random variables,
rather than fixed values, Q1, Q2, etc.

  S -> ""    Q
  S -> aS    1-Q

Also needed are (independent, or so 'twas said) random variables
XS, YS

After you compress a string, you get an EOF. You have never seen this
character before in the string, so what probability do you assign it?
Apparently some arithmetic coders just assign it the smallest
probability they can, but you can get an estimate of the probability to
assign it, given the length of the string seen already...

Taking Q from the uniform distribution on (0,1),

  P(""=XS | Q=q) = q
  P('a':YS = XS | Q=q) = 1-q

GJ objected to the second line, Barney explained that
he was trying to say XS='a':_   really, and labelling the _ as YS,
but Geraint pointed out that this meant that XS,YS weren't
independent.
So, it could be rewritten in terms of head(XS)='a'.

  P(replicate n 'a' = XS | Q=q) = (1-q)^n q

                              /1
  P(replicate n 'a' = XS) =  |   P(replicate n 'a' = XS | Q=q) dq
                             /0
                

                          =   1/(n+1)  -   1/(n+2)

In other words, "If you've seen a long file already, it's likely to be.... 
longer" said BS.

More analogies were offered. If you land in a country and it's been
raining for 40 days, you'd expect rain the next day, whereas if
you land in a desert, you'd expect dry weather.

GJ pointed out that that's how the stock market works.

RSB was at this point trying to think what this could mean in
terms of data compression, but BS continued:

  P(end of file | seen n letters) = 1/(n+2)

  P(not end of file | seen n letters) = (n+1)/(n+2)
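
A quick numerical check of the 1/(n+2) claim (a hypothetical helper): with Q uniform on (0,1), P(length = n) is the integral of (1-q)^n q and P(length >= n) is the integral of (1-q)^n, so their ratio should be 1/(n+2).

   pEOF :: Int -> Double
   pEOF n = integrate (\q -> (1-q)^n * q) / integrate (\q -> (1-q)^n)
     where
       integrate f = sum [ f ((fromIntegral k + 0.5) * h) * h | k <- [0 .. steps - 1] ]
       steps = 100000 :: Int
       h     = 1 / fromIntegral steps

   -- pEOF 10 is roughly 0.0833, i.e. about 1/12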

BS then wanted to experiment with a different grammar, such as

  S -> ""      Q
  S -> aSS     1-Q

Q again from Uniform(0,1)

  P(""=XS) = q
  P('a':YS++ZS = XS) = 1-q      (same comments as before)

  p_n = P(replicate n 'a')

  p_0 = q
  p_{n+1} = P(replicate (n+1) 'a' = XS)
          = P(replicate (n+1) 'a' = XS | 'a':YS++ZS = XS)
          = P('a':YS ++ ZS = XS)
          = P(replicate n 'a' = YS ++ ZS) (1-q)
          = sum_{ys,zs : ys++zs = rep n 'a'} P(ys = YS) P(zs = ZS) (1-q)
          = sum_{n=y+z} P(rep y 'a' = YS) P(rep z 'a' = ZS) (1-q)
          = sum_{y=0..n} p_y p_{n-y} (1-q)

  p_n = C_n (1-q)^n q^{n+1}

  C_n is the nth Catalan number

Using a generating function for the Catalan numbers, call it g,

      g(x) = sum{n=0 .. oo} C_n x^n

           =   (1- sqrt(1-4x))/2x    if x =/= 0
               1                     if x=0

      sum_{n=0... oo}  p_n

      = sum{n=0... oo} C_n (1-q)^n q^{n+1}

      = q g((1-q)q)

      = q/(1-q)       if  0 <= q < 1/2
        1             if 1/2 <= q <= 1

If q is too small, you end up with a non-zero probability of
generating an infinite string.

So this illustrates that you have to be careful about choosing your
probabilities for the production rules.
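
A small numeric check of that claim (hypothetical helpers): the partial sums of p_n = C_n (1-q)^n q^(n+1) head towards q/(1-q) when q < 1/2 and towards 1 otherwise.

   catalans :: [Integer]
   catalans = 1 : [ sum (zipWith (*) (take n catalans) (reverse (take n catalans)))
                  | n <- [1 ..] ]

   totalProb :: Double -> Int -> Double
   totalProb q m = sum [ fromInteger (catalans !! n) * (1-q)^n * q^(n+1) | n <- [0 .. m] ]

   -- totalProb 0.25 200 is roughly 1/3 (= 0.25/0.75), totalProb 0.75 200 is roughly 1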

At this point, RSB was still trying to think what this could mean
in terms of data compression. He pretended he was hiding a text 
written in unicode, or binary, and wanted to know how BS was going 
to compress it. BS said that he was going to start with a grammar 
that could generate the text. 

Which grammar?
After all, it was pointed out that for binary, this grammar would
be sufficient to generate RSB's hidden text

  S -> ""
  S -> 0S
  S -> 1S

Sufficient, yes, but more rules could be added. BS wants to use a
big grammar with lots of possible production rules. Which rules? 
Lots of rules.  Then probabilities would be assigned to the rules 
(and so those with probability 0 would disappear).

GJ: "e.g. if there are no mismatched {}s, the rule for generating
mismatched {}s would disappear".

RSB still wasn't convinced.  

Barney then started talking about PPM[2], Prediction by Partial
Matching. This maintains a table of strings up to a fixed length.
For example, in going along a string

                abaaab
               ^

    after seeing     probabilities of next character being
    the string 
      ""              P(a) = ?    P(b) = ? 
      a
      b
      aa
      ab
      . 
      .
      .

a table is built up and this is fed to the arithmetic coder. 
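
A hypothetical sketch of such a table in Haskell: for every context of length <= k seen so far, count how often each character follows it (this is the raw material handed to the arithmetic coder).

   import qualified Data.Map as Map
   import Data.List (inits, tails)

   type Table = Map.Map (String, Char) Int

   buildTable :: Int -> String -> Table
   buildTable k s = Map.fromListWith (+)
     [ ((context, c), 1)
     | (prefix, c : _) <- zip (inits s) (tails s)
     , j <- [0 .. min k (length prefix)]
     , let context = drop (length prefix - j) prefix ]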

Can use a grammar that can generate all strings of a/bs. By
looking at the text, can assign probabilities to different 
production rules in the grammar.

   *

The meeting finished with some general discussion. It was clarified 
(to those of us who had not been at previous meetings discussing
arithmetic coding) that for the compression, an arithmetic coder
would be doing the compression, but managing to find a stochastic grammar to 
generate the text would provide a parameter to the arithmetic coding.
BS hoped that this could be used to prove that various existing
methods of data compression do indeed compress well; he felt that even
proving it for PPM would be useful.

The situation of a communications network was mentioned: imagine the
line set up with an encoder at one end, a decoder at the other,
and you don't know what sort of data you're going to get through in
advance, so being able to compress the data without knowing what 
grammar generated it would be useful.

BS wanted to know whether his idea was sensible or not. There was a
general feeling that at the moment, a lot of it was vague, so we
couldn't see whether we felt it would work or not.

Friday, 16th May 2003
Mike Spivey spoke on Deriving Henderson's SECD machine by calculation:
Abstract: In this talk, I will show how a Henderson-style SECD machine (where expressions are compiled into an abstract machine code, unlike Landin's original machine) can be derived by calculation from a continuation-passing interpreter for a simple ISWIM-like language.

The transformation is in 3 stages:

  1. Introducing stack and environment registers, and reducing the semantics of each construct to a composition of named actions.
  2. Introducing an explicit representation of actions as instructions, obtaining a compiler and an interpreter for object code.
  3. Introducing an explicit representation of continuations as dumps.
This work is not new, but it draws together a number of known ideas, some of which should be better known among functional programmers.

Friday, 9th May 2003
Sharon has been working on fractal image compression. To illustrate the principle, she drew a Sierpinski triangle on the board, and showed how the same image could be obtained by applying an iterative process to any image you can think of. She demonstrated this with a portrait of Jeremy.

Although the Sierpinski triangle is an extreme example, most images possess some amount of self-similarity, and this can be exploited for compression. The image to be compressed is split up into 8x8-pixel squares, and the compressor then looks for a 16x16-pixel square elsewhere in the image that looks sufficiently like the small square being compressed, after some transformations are applied. This was again illustrated by looking at Jeremy's portrait - bits of beard look much like bits of hair, for example.

As with the Sierpinski triangle, decompression occurs iteratively. You start out with any image at all, and repeatedly apply the transformations used on the original image until sufficient quality is achieved. To see this at work, we looked at a handout in which Lena was decompressed.

One observation made was that after decompression, Lena's eyes had become decidedly square. This is because the shape of her eyes doesn't recur in the image, so the nearest matching area wasn't quite the right shape. This problem can be overcome by using quad-tree partitioning.

The discussion moved on to implementation details in Haskell, and the use of Conal Elliott's image libraries.

Friday, 2nd May 2003
Present: RSB, SC (minute-taker), JG, GJ, CM, BS.

RSB wanted to talk about nexus programming (Ralf Hinze's problem)
and drew a picture [use fixed-width font]
        .
       / \
      .   .
     / \ / \
    .   .   .
   / \ / \ / \

and asked us to imagine a data structure, with sharing
( nexus = tree with shared data structure ).

For example, think of dynamic programming on subsequences.
If you can construct this with shared data, then you have
the information you need from the immediate descendants (of the 
head)

RSB pointed out that you can't observe the intermediate nodes,
and if you do a map, you destroy the sharing.
JG suggested that you'd have to fuse the map with sharing.

RSB identified two classes of problem:

Class A - the function you want to evaluate 
          is a function of its immediate children
Class B - the function is from the values along
          the spines of the substructure

It was noted that you can transform any class B problem into a
class A problem, so you then have a problem to construct the
two spines. A discussion then ensued as to which sorts of problems
were folds or not.

RSB offered the example of optimal bracketing, labelling the
tree
       abcd
       / \
     abc bcd
     / \ / \
    .   .   .
   / \ / \ / \

      Fig 1

and pointing out that the value at the top depended (amongst 
other things) on the abc node and the bottom d node, so it
was a class B problem.

GJ tried to argue that class A and B problems were the same 
thing.

RSB wanted to construct something different, more like
          ____._____      
         /    |     \
   (a,bcd) (ab,cd) (abc,d)
      / \   /\ /\   / \

             Fig 2

so that everything at the next level is all you need to compute
the value above.        


In RSB's original diagram, the immediate descendants were the
init and tail. Alternatively, you can draw a power lattice,

            abc
          /  |  \
         /   |   \
        ab   ac  bc
        | \ / \ / |
        |  X   X  |
        | / \ / \ |
         a   b   c
          \  |  /
           \ | /
             .

            Fig 3

where the immediate descendants are "drop one".

It seemed to be too complicated to construct the structure in
Fig 2, but maybe the structure in Fig 3 could be constructed,
..but how?

Upwards? Could equally construct it downwards.
Apparently Ralf constructed a circular structure.

RSB said that he had a way of constructing it. So did Ralf,
but Ralf's was simpler.

JG suggested using "dimensions" 
(everything from nothing,
then everything from c,
then everything from b,c
then everything from a,b,c)
so you went

             .

 ->
                c
               /
              /
            .

->

                 bc
                / |
               /  |
              /   |
             b   c
             |  /
             | /
             .

->

            abc
          /  |  \
         /   |   \
        ab   ac  bc
        | \ / \ / |
        |  X   X  |
        | / \ / \ |
         a   b   c
          \  |  /
           \ | /
             .


           Fig 4

RSB then drew a tree which drew comments

"What's that?"
"Is it a molecule?"
"Is it caffeine?"

but the tree looked much better once GJ had redrawn it 
rotated through an angle of 45 degrees:

            0
           /
          a
        /  \
      ab    b
     / \   / \
   abc ac bc  c

       Fig 5

Much disagreement ensued about where the 0 should go or even
whether it should be in there at all.

RSB suggested calling the empty nexus e, and then said that
in constructing a value associated with a nexus, you need three
ingredients:

  e :: b
  f :: a -> b
  g :: [b] -> b
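
One (non-authoritative) way to read these ingredients is as a
sharing-free specification of the value being computed: g combines
the values of the immediate descendants, f handles singletons, and e
is the value of the empty nexus. The names solve and dropAt below are
mine, and the recursion is the "drop one" one of Fig 3; it is
exponential precisely because it recomputes the shared subproblems
that the nexus is meant to share.

   solve :: b -> (a -> b) -> ([b] -> b) -> [a] -> b
   solve e _ _ []  = e
   solve _ f _ [x] = f x
   solve e f g xs  = g [ solve e f g (dropAt i xs) | i <- [0 .. length xs - 1] ]
     where dropAt i ys = take i ys ++ drop (i + 1) ys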

RSB then wrote up some nexus construction programs (in which
binary trees are built with Fork and Empty, and nexuses 
are built with Node):

 mkNexus e f g xs 
   = extract (Fork s (mktree xs s [] ) Empty)
   where s = Node e []

 extract :: Tree c -> c
 extract (Fork x Empty _ ) = x
 extract (Fork _   l   _ ) = extract l

GJ: "Ah! Leftmost non-terminal!"

At this point we were confused and wanted a spec. RSB said
that we are building Fig 5, and we want "the cube thing" (Fig 3),
and that ultimately he wants a better program.
We agreed he had to show us what was going on with the
existing program, which is apparently

  mktree     [] u ts = Empty
  mktree (x:xs) u ts = Fork s l r
                  where s = node x (u : map val ts)
                        l = mktree xs s (r : map left ts)
                        r = mktree xs u (map right ts)

  node x [u] = Node (f x) [u]
  node x  us = Node (g (map info us)) us

Much discussion ensued over the relationship of Fig 5 to Fig 3.

GJ suggested a different version of mkNexus:

 mkNexus' e f g xs 
  = last (s : leftspine (mktree xs s [] ))
  where s = Node e []


We tried thinking about a bigger example...
             _____a____
            /          \
          ab            b
         /  \        /    \
      abc   ac      bc     c
      / \   / \    /  \   / \
  abcd abd acd ad bcd bd cd  d

              Fig 6

.. but that didn't help (much), although it was noted that
abcd isn't just dependent on  abc, abd, acd, bcd but on

    abc,d   abd,c   acd,b   bcd,a
      ab,cd   ac,bd   ad,bc

Like spines, but using level-order traversal.

RSB compared this to what he'd used for the Countdown problem.
Discussing level-order traversals, it was suggested that

   levOrd ts
    = map info ts ++ concat (map (levOrd . subtrees) ts)

But JG thought this was too slow, because you're going to be
doing levOrd of each child, and then revisiting them to glue
the results together.  levOrd was redefined as

   levOrd ts
    = map info ts ++ levOrd (concat (map subtrees ts))
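
For concreteness, a minimal runnable version of this second
definition (the Rose type is mine, and an extra base case is needed
for the empty forest, otherwise the written equation loops):

   data Rose a = Rose { info :: a, subtrees :: [Rose a] }

   levOrd :: [Rose a] -> [a]
   levOrd [] = []
   levOrd ts = map info ts ++ levOrd (concat (map subtrees ts))

   -- e.g.  levOrd [Rose 1 [Rose 2 [], Rose 3 [Rose 4 []]]]  ==  [1,2,3,4]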

So the problem is to do this at every level, with the reversing, 
the zipping, gluing together... all through this nexus.

RSB has no idea how to prove it, derive it, or where that binary
tree (Fig 5) comes in.
JG: "But do you have a spec?"
RSB: "Yep. Build that [the nexus]"

GJ asked what the significance of [u] was, and rewrote part of 
the mktree definition as

     s | null ts   = Node (f x) [u]
       | otherwise = Node (g (map info (u : map val ts)))
                          (u : map val ts)

There followed a discussion of e. 
RSB didn't like e, why e in the nexus?
"It has to be there? Why?"
"You can't have a cube with 7 corners."

JG then attempted to briefly revisit his continued fractions and 
flushing streams (see last week), but this was over a glass of 
red wine each (very nice red wine, kindly provided by CM), and 
he didn't get very far.

Friday, 25th April 2003
Present: MA, RSB, SC (minute-taker), JG, GJ.
Apologies: CM.

After a short prelude in which GJ demonstrated a little
problem he had involving an identity operator [],
RSB listed topics from all of us for the forthcoming term:

GJ: Algebra of Identity
JG: Continued Fractions
SC/CM: Fractal Image Compression
RSB: Nexus Programming
BS: Arithmetic Coding
MA: - misc -
 
JG went first, reminding us that continued fractions are of
the form (use fixed-width font)

   x_0 +           1
         ------------------------
         x_1 +          1
                 ----------------
                 x_2 +      1
                        ---------
                           ...


e.g.  45/34 = 1 + 1/(3 + 1/11)

phi = (1+ sqrt(5))/2 = 1 + 1/(1 + 1/(1 + 1/( .... )))

JG thought that all rationals have representations that terminate,
that irrationals have non-terminating representations, and 
that reals have unique representations. 
Apparently "e is wonderful", and so we were treated to the
representation 1 + 1/(2 + 1/(2 + 1/(2 + ...)))
which is apparently sqrt(2), and e is represented by the sequence
[2,1,2,1,1,4,1,1,6,1,1,8,...]. It was also claimed that sqrts of rationals
have periodic representations.

Having been convinced that "continued fractions are fun", JG went
on to define two conversion functions, toCF and fromCF.

Defining toCF uses an unfold (a rational being a pair of ints):

  toCF :: Rational -> CF
  toCF = unfoldr next
       where next (_,0) = Nothing
             next (a,b) = Just (a `div` b, (b, a `mod` b))

MA: Unfold?
JG: "The unique homomorphism to the final co-algebra for lists"
RSB: "... and that's how we explain it to our first years!"  :-)

  unfoldr :: (b -> Maybe (a,b)) -> b -> [a]


And for the other conversion function:

  fromCF :: CF -> Rational   (a rational is a pair of ints)
  fromCF = foldr step (1,0)
            where step n (a,b) = (n*a + b,a)

GJ pointed out that step wasn't associative, otherwise you could
write it as a foldl.

Also

  fromCFi :: CF -> [Rational]
  fromCFi = map fromCF . inits
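
As a quick sanity check (assuming, as above, that a CF is a list of
Ints and a Rational a pair of Ints), the earlier example round-trips:

   toCF (45,34)      ==  [1,3,11]      -- since 45/34 = 1 + 1/(3 + 1/11)
   fromCF [1,3,11]   ==  (45,34)
   fromCFi [1,3,11]  ==  [(1,0),(1,1),(4,3),(45,34)]
                          -- the (1,0) for the empty prefix reads as "infinity"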


JG: "Do you know what homographies are? Cos I don't."
JG went on to define a homography as a function of the form

                 Lambda x ->  qx + r
                              ------
                              sx + t


To verify it's a foldr, JG considered

          q [[ x ++ [n] ]] + r
          --------------------
          s [[ x ++ [n] ]] + t

(by the semantic brackets [[ ]] he meant "fromCF of")

  fromCF' :: CF -> Homography

Why CF -> Homography? Well, writing fromCF as foldr is fine for
finite CFs (lists) but not for infinite ones. What we'd like is
something from left to right, with a foldl.


   lambda y -> qy + r
               ------    = fromCF (x ++ y)
               sy + t


   q*(n + 1/y) + r        (qn + r)y + q
   ---------------   =    -------------
   s*(n + 1/y) + t        (sn + t)y + s

MA: "Are you saying you can work out the homography semi-automatically?"
JG: "Yes. What do we start with? ... 
[temporary hush as we all try to figure it out]
JG: ... we need the identity function, so (1,0,0,1)"  (as (q,r,s,t))

  fromCF' = foldl stepl (1,0,0,1)
          where stepl (q,r,s,t) n = (q*n + r, q, s*n + t, s)
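
On the running example this agrees with the foldr version, assuming
we read off the first and third components q and s (the value of the
homography at y = infinity):

   foldl stepl (1,0,0,1) [1,3,11]  ==  (45,4,34,3)   -- i.e. 45/34 again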

JG said that apparently the paper said that q,s are in their lowest
possible terms. A general discussion ensued about taking out common
factors and whether JG's claim that the numbers involved tended to
be "small" was accurate, as examples in papers are selected by
the authors of said papers.

SC offered a metaphor concerning throwing things at a wastebasket,
and MA agreed that it was rare to land a throw minutely close to a
point target, so it was declared reasonable that
the numbers involved were small.

So we have another definition of fromCFi

  fromCFi = map fromCF . inits
          = map hom2rat . scanl stepl (1,0,0,1)

JG pointed out that if you do this, you get an infinite list of
rationals, each a better approximation to the number you were first
thinking of.
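
For instance, with phi = [1,1,1,...] (from the start of the meeting),
the approximants are the ratios of consecutive Fibonacci numbers:

   scanl stepl (1,0,0,1) [1,1,1,1,1, ...]
     = (1,0,0,1) : (1,1,1,0) : (2,1,1,1) : (3,2,2,1) : (5,3,3,2) : (8,5,5,3) : ...
   -- reading off q/s:  1/1, 2/1, 3/2, 5/3, 8/5, ...  ->  phi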


JG now asked us to "suppose that we have our conversions 
to/from rational, now....homographies are continued fractions:

        h(x) =  2
               ---         (0,2,-1,3)
               3-x

...can just do the same thing." JG wanted to have something,
call it a process, with type

  process :: Homography -> CF -> CF

One way is to do the foldl stepl (...)  with the homography you
first thought of, then convert -> rational -> CF again.
This is fine *provided* that x is a finite CF. Doesn't work if
x is infinite. So you *don't* want to consume all input, then
start producing output, you want to do streaming.

[echoes from advanced FP summer school were heard]

   stream :: (rho -> Maybe (b,rho)) -> (rho -> a -> rho)
                       -> rho -> [a] -> [b]

   stream f g z xs = 
      case f z of
        Just (n,z') -> n : stream f g z' xs
        Nothing     -> case xs of
                       [ ]    -> [ ]
                       (y:ys) -> stream f g (g z y) ys


You can give stream a spec, provided this condition (1) holds...

  f z = Just (y,z')  =>  f (g z x) = Just (y,g z' x) for all x,y,z,z' 
                                                                     (1)
then

  stream f g z xs = unfoldr f (foldl g z xs)

z is the initial state
g is the consumer
f is the producer

Translation of condition (1) on f,g:  
if f would produce some thing, then adding more 
information won't change that decision.


The aim here is to apply h to an infinite CF.

Trying to satisfy conditions for the streaming...

  (2)    qx + r      =  n +        1
         ------            -------------------
         sx + t            (q'x + r')/(s'x + t')

is implied by

   (q',r',s',t') = (s,t,q-ns,r-nt)
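
A small exact spot-check of this update (mine, using Haskell's
built-in Rational rather than the (Int,Int) pairs above), confirming
equation (2) pointwise:

   check :: Rational -> Rational -> Rational -> Rational
         -> Rational -> Rational -> Bool
   check q r s t n x =
       (q*x + r) / (s*x + t)  ==  n + 1 / ((q'*x + r') / (s'*x + t'))
     where (q', r', s', t') = (s, t, q - n*s, r - n*t)

   -- e.g.  check 0 2 (-1) 3 0 (1/2)  ==  True   (the 2/(3-x) example, with n = 0)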

But how to pick the  n?

JG asked us to suppose that s and t are both positive.
He wanted to say that if (2) has the same integer part
whatever x is, then output it; otherwise you're stuck and need to consume something.

JG pointed out that if x is in the range 0 ... infinity then
(2) lies between q/s and r/t, and made an argument for a flushing
stream.

So...new version:

  stream :: (rho -> Maybe (b,rho)) -> (rho -> a -> rho)
            -> (rho -> [b]) -> rho -> [a] -> [b]

  stream f g h z xs =
   case f z of
      Just (n,z')  -> n : stream f g h z' xs
      Nothing      -> case xs of
                        [ ]    -> h z
                        (y:ys) -> stream f g h (g z y) ys


  evolve f z = (unfoldr f z, loop f z)
  loop :: (a -> Maybe a) -> a -> a


(1) =>
  stream f g h z xs = ys ++ h z'
                    where (ys,z') = evolve f (foldl g z xs)

GJ: "It ought to take a continuation... ought to take a cat 
elimination on unfoldr"
RSB: "You need this?"
JG: "You need this.
I'm guessing the reason is that you can't simply write it as an
unfold after a fold."

GJ was worried that when writing, rather than calculating, the program,
it was going to want to output a pair of 0s, which you don't want;
so it's effectively saying "no, I'm not going to output anything at this
point".  (This apparently came from some work with an MSc student one
summer.)

"It's a mapping from what you've not yet consumed, to what you've
not yet produced, represented by 4 numbers."

RSB asked about the context for this work. JG said that it didn't
contribute anything new about CFs, and commented that other
researchers used a different representation for CFs, not using floors.

GJ suggested that JG was being too conservative in looking at the
range 0 ... infinity, and suggested JG might like to look at 
0 ... 1 and 1 ... infinity, which might allow it to produce more often.

  produce (q,r,s,t)
   | s > 0 && t > 0 && q `div` s == r `div` t  = Just (n, (s, t, q - n*s, r - n*t))
   | otherwise                                 = Nothing
   where n = q `div` s

Actually the n should be  q div s [] r div t  according to GJ's
algebra of identities :-)

GJ wondered/worried about getting into a state where 
produce (q,r,s,t) = Nothing, and asked to be convinced, but not today.

JG further explained that the reason we need flushing is that if you
run it on an infinite list, then take finite prefixes, that does
not correspond to running it separately on all the prefixes.
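
For the record, a hedged sketch of how the pieces might be wired up
(consume and flush are my own names; the minutes only name produce and
stream, and Homography is assumed to be the 4-tuple (q,r,s,t) with
CF = [Int]): the state is the homography, consume absorbs one input
term exactly as stepl does, and flush converts the remaining
homography's value at x = infinity back to a CF.

   process :: Homography -> CF -> CF
   process = stream produce consume flush
     where consume (q,r,s,t) n = (q*n + r, q, s*n + t, s)   -- absorb one term
           flush   (q,_,s,_)   = toCF (q,s)                 -- value at x = infinity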

After homographies.... quadratics!!!
RSB escaped for lunch.

Friday, 14th March 2003
Michael Abbott came to speak to the group last week. He is working on problems to do with "shapes" and "positions" in datatypes.

Michael expanded:

A "container" is a kind of generalised polynomial datatype of the form
	Sigma s:S. (P(s) => X) 
(where X is a type parameter). Here S is a set (or object) of "shapes", and for each s:S, P(s) is the object of "positions" for that shape: the intuition is that an element of this datatype consists of (1) a choice of "shape", and (2) for each position in the chosen shape, an element of the parameter type X.
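
A hedged Haskell rendering of this definition (the encoding is mine,
not Michael's): a container extension is a shape paired with a payload
function from positions, with the dependency of positions on the shape
left informal. Lists arise by taking shapes to be natural numbers and
positions for shape n to be the indices below n.

   data Ext s p x = Ext { shape :: s, payload :: p -> x }

   -- Lists as a container: shape = length, positions = indices 0..n-1.
   toExt :: [x] -> Ext Int Int x
   toExt xs = Ext (length xs) (xs !!)

   fromExt :: Ext Int Int x -> [x]
   fromExt (Ext n f) = map f [0 .. n - 1]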

Container types are closed under sum, product, covariant exponentiation (ie, K=>- for K a constant type) and both least and greatest fixed points (mu and nu types).

Container types are closely related to Barry Jay's shapely types. See http://www.cs.nott.ac.uk/~txa/publ/fossacs03.pdf for details.

An interesting application of containers is the computation of the "derivative" of a datatype: http://www.cs.nott.ac.uk/~txa/publ/tlca03.pdf

Friday, 7th March 2003
Shin has now returned from Cambridge, and he gave a much shorter proof of his Converse of a Function Theorem. The new proof is only 6 steps long, compared to several pages for the original one.

Barney has been thinking about some statistical problems connected with data compression, and has proved that the intuitive way of avoiding the zero-frequency problem (by starting the counts at 1 instead of 0) is correct.

He has also produced a fast approximation to the interval-shrinking operation used in arithmetic coding.

Friday, 14th February 2003
Richard has been studying the problem of sharing items of data in Haskell. Specifically, he's been looking at ways of generating the lattice of subsets of a finite set. Each subset is contained in many sets, and so space can be saved by storing each subset just once and referring to it many times.

Friday, 7th February 2003
Jeremy has come up with conditions under which, for a simple relation R, there exists a simple S such that R = fold S.


Sharon Curtis (sharoncurtis@brookes.ac.uk), from Trinity Term.

Barney Stratford (Barney.Stratford@comlab.ox.ac.uk), for Hilary Term



