Raku Solutions Weekly Review

Task #1: Longest Common Substrings

This is derived from my blog post made in answer to the Week 18 of the Perl Weekly Challenge organized by Mohammad S. Anwar as well as answers made by others to the same challenge.

The challenge reads as follows:

Write a script that takes 2 or more strings as command line parameters and print the longest common substring. For example, the longest common substring of the strings “ABABC”, “BABCA” and “ABCBA” is string “ABC” of length 3. Other common substrings are “A”, “AB”, “B”, “BA”, “BC” and “C”. Please check this wiki page for details.

I can see at least two ways to tackle the problem (to simplify, let’s say, for the time being, between two strings). One is to have two nested loops, one on the letters of the first string and the second one on the letters of the second string, and to store the substrings (or, possibly, the longest so far). The other is to generate all the substrings of each word and then to compare them. I used the first approach for solving the challenge in Perl 5 and the second one in Perl 6 (because P6 has some functionalities making the second approach easy and interesting, and probably quite efficient). Since this blog is about Perl 6, I’ll detail only the second approach.

Note that the program below will consider only extended ASCII strings for simplicity. A couple of very minor changes might be needed for dealing properly with full Unicode strings.

My Solution

To generate all the substrings of a given string, we could use the regex engine with the :exhaustive adverb, to get all the overlapping matches. For example, consider this Perl 6 one-liner:

$ perl6 -e 'say ~$_ for sort "ABC" ~~ m:exhaustive/.+/
A
AB
ABC
B
BC
C

So this seems to be dead simple.

But I’ll rather use the rotor built-in subroutine, which isn’t mentioned very often although it is very powerful and expressive, because I wanted to use the opportunity experiment a bit with it.

These are two examples using rotor under the REPL:

> 'abcd'.comb.rotor(1);
((a) (b) (c) (d))
> 'abcd'.comb.rotor(2);
((a b) (c d))

In these examples, rotor groups the elements of the invocant into groups of 1 and 2 elements. We’re a long way from generating all the substrings of a given string. But we can do better:

> say 'abcd'.comb.rotor($_) for 1..4;;
((a) (b) (c) (d))
((a b) (c d))
((a b c))
((a b c d))

This is already much better, but we’re still missing some of the desired substrings such as bc and bcd.

The rotor method can take as parameter a key-value pair, whose value (the second item) specifies a gap between the various matches:

> (1..10).rotor(2 => 1)
((1 2) (4 5) (7 8))

As you can see, we obtain pairs of values, with a gap of 1 between the pairs (item 3, 6 and 9 are omitted from the list). Now, the gap can also be negative and, in that case, we get all successive pairs from the range:

> (1..10).rotor(2 => -1)
((1 2) (2 3) (3 4) (4 5) (5 6) (6 7) (7 8) (8 9) (9 10))

The rotor subroutine can in fact do much more than that (check the rotor documentation), but I’ve basically shown the features that we’ll use here.

The other Perl 6 functionality that we will use here is the the Set type and the associated intersection (∩ or (&)) operator. This operator does exactly what set intersection does in the mathematical set theory: it returns the elements that are common to the two sets.

We can now code the largest common substring in Perl 6:

use v6;
use Test;

sub substrings (Str $in) {
    my @result = $in.comb;
    append @result,  map { .join('') }, $in.comb.rotor: $_ => 1-$_ for 2..$in.chars;
    return set @result;
}
sub largest-substring (@words) {
    my Set $intersection = substrings shift @words;
    while (my $word = shift @words) {
        $intersection ∩= substrings $word;
    }
    return $intersection.keys.max({.chars});
}
multi MAIN (*@words where *.elems > 1) {
    say largest-substring @words;
}
multi MAIN () {
    plan 2;
    my @words = <ABABC BABCA ABCBA>;
    cmp-ok largest-substring(@words), 'eq', 'ABC', "Testing 3 strings";
    @words = 'abcde' xx 5;
    cmp-ok largest-substring(@words), 'eq', 'abcde', "Testing identical strings";
}

Launching the program with no argument to run the tests produces this:

$ perl6  substrings.p6
1..2
ok 1 - Testing 3 strings
ok 2 - Testing identical strings

And with three strings, we get the longest substring:

perl6  substrings.p6 ABABCTO BABCTO ABCTBA
ABCT

My solution returns only one longest substring, even when there are two (or more) distinct longest substrings. After all, the challenge specification said “print the longest common substring,” not “print the longest common substrings.” Anyway, the program would require just one additional code line to return several longest substrings if needed.

Alternative Solutions

Arne Sommer decided to generate all possible substrings from each input string and then to incrementally build the set of common substrings using the ∩ set intersection operator (and find the longest common substring at the end). So, something quite similar to what I did above, except that, to generate all the substrings, he manually implemented two nested loops on the letter sequences of a given string.

Mark Senn also used nested loops to find all the substrings of each of the input strings and stored them in an array @set of sets (well actually of SetHash objects). That’s a nice idea, but the true beauty of Mark’s solution lies in the way he uses the set intersection operator within the reduction metaoperator and sorts the result by substring length to find the LCS and print it, all in one single code line:

([(&)] @set).keys.sort({.chars}).tail.say;

I am really impressed. Congratulations, Mark, very good job! One possible minor improvement, though: you didn’t really need to sort the substrings and could have used the max function, for example .max({.chars}), which should presumably be more efficient.

Simon Proctor also wrote a subroutine to generate all substrings of a given string. And he also cleverly thought about using the set intersection operator within the reduction metaoperator to find all the common substrings. Simon’s final step is a bit more complex than Mark’s, because his program finds all the longest common substrings when there is more than one:

    my @word-subs = @words.map( &all-substrings );
    .say for ([(&)] @word-subs).keys.sort( { $^b.codes <=> $^a.codes } ).grep( { state $len = $_.codes; $_.codes == $len });

Joelle Maslak also used a subroutine to find all substrings of a given string, and she also used the magical [∩] combination of the set intersection operator and reduction metaoperator to find all common substrings. And she also made sure to display several longest substrings when there is more than 1. Clearly one of the best contributions on this challenge.

Athanasius also wrote a subroutine to generate all substrings of a given string, using two nested loops. He also populated an array of sets, but he performed the intersection in a for loop.

Veesh Goldman also reduced the intersection operator, but with a different syntax. He is the only one who used a regex with the :ex or :exhaustive adverb. Veesh is clearly the winner in terms of the most concise syntax:

sub MAIN ( *@strings where .elems > 1 ) {
  @strings.map( { m:ex/.+/>>.Str } ).reduce( { $^a ∩ $^b } ).keys.max(*.chars).say
}

And also certainly one of the best and most perl-sixish solutions, IMHO. I wish Veesh took part more often to the Perl 6 part of the challenge.

Francis Whittle also wrote a subroutine to generate all substrings of a given string, but he used the hyper method to accelerate his nested loops:

sub all-substrings(Str $in) {
  gather for (^($in.chars - 1)).hyper -> $i {
    for (1..^$in.chars).hyper -> $j {
      take $in.substr($i..$j) if $i <= $j;
    }
  }
}

This is quite clever, and I must admit that don’t think about this easy possible performance enhancement most of the time. His MAIN code is also original, interesting and worth looking at.

Noud used two nested loops to find the longest substring or substrings between two strings. Noud’s solution also covers the case where there are several longest substrings. The only problem is that the challenge said “2 or more strings” and Noud’s solution can process only two input strings. And, as we’ll see below, an LCS subroutine between two strings cannot properly find a LCS between 3 input strings.

Ruben Westerberg‘s solution suffers of the same syndrome: it can process only two input strings.

Ozzy orighinally implemented two nested loops to find the LCS between two strings. But this solution also can process only two input strings. However, he implemented another solution which uses three nested forloops to find all the substrings of the input words (two or more) and then uses the intersection operator to find trhe common substrings. This second solution displays all the longest common substrings when there is more than one.

Fench Chang interestingly created an infix LCS operator. The good thing about creating such an operator is that you can then use it within the reduction operator [] to process more than two input strings. I’m afraid, though, that this approach will fail on some input strings. Suppose for example that you want to compare 3 strings, ABCDEFUVWXY, ABCDEFGHUVWX and ABUVWXY. If I understand correctly, Feng’s program first looks for the LCS between the first two strings, and finds ABCDEF; then, the script looks for the LCS between ABCDEF and ABUVWXY, and finds AB. But, in reality, UVWX was a longer substring common to the three input strings. Well, after having written the preceding sentences, I decided that I should better test to check. So, I copied Feng’s LCS infix operator definition and tested it with the input strings of my example just above:

say [LCS] <ABCDEFUVWXY ABCDEFGHUVWX ABUVWXY>.flat;

and the program displayed “AB”. So it seems that my analysis is correct.

Enter Damian Conway

Damian Conway did not take part to the challenge, but wrote Chopping substrings, a blog post, that says everything you’ve ever wanted to know about the subject, and probably even more than that.

I will not try to summarize Damian’s master piece, you should really follow the link and read it, but I will only highlight a few points.

First, Damian says that the “best practice” solution for LCS is a relatively complex technique known as suffix tree, but we can get very reasonable performance for strings up to hundreds of thousands of characters long using a much simpler approach.

The idea is to get sets of all substrings of each input word and then find the intersection of those sets. Damian further notes that there can be several longest substrings and insists on finding them all.

His first solution uses a regex with the :ex (exhaustive) adverb to find all the substrings and the ∩ set intersection operator, together with the reduction metaoperator, to find all the common substrings:

keys [∩] @strings».match(/.+/, :ex)».Str

We could then use the builtin max function (as I did in my solution), but that returns only one longest substring, whereas Damian wants to find them all (when there is more than one LCS). So he decided to augment the max function so that it takes a new adverb, :all to indicate that we want all the maxima, not just one:

# "Exhaustive" maximal...
multi max (:&by = {$_}, :$all!, *@list) {
    # Find the maximal value...
    my $max = max my @values = @list.map: &by;

    # Extract and return all values matching the maximal...
    @list[ @values.kv.map: {$^index unless $^value cmp $max} ];
}

So, with this revised version of max, finding the longest common substrings now looks like this:

max :all, :by{.chars}, keys [∩] @strings».match(/.+/, :ex)».Str

So, problem nicely solved! Except that this won’t work when some bioinformaticist will try to compare DNA strands with 10,000 bases (or more). Finding all the substrings of a 10,000-letter string becomes highly unpractical. Damian goes on saying that it would be much easier if we knew what the length of the longest substring(s) was, because the number of possible substring would be much smaller. Of course, we don’t know this length, but it can be found with a binary search algorithm. And this new algorithm scales incredibly better. I’ll not describe further Damian’s findings (and his additional optimization), as it would be much better for you to read directly what Damian wrote. So, please, follow the link.

Task #2: Priority Queues and Binary Heaps

This is derived from my blog post made in answer to the Week 18 of the Perl Weekly Challenge organized by Mohammad S. Anwar as well as answers made by others to the same challenge.

The challenge reads as follows:

Write a script to implement Priority Queue. It is like regular queue except each element has a priority associated with it. In a priority queue, an element with high priority is served before an element with low priority. Please check this wiki page for more information. It should serve the following operations:

is_empty: check whether the queue has no elements.
insert_with_priority: add an element to the queue with an associated priority.
pull_highest_priority_element: remove the element from the queue that has the highest priority, and return it. If two elements have the same priority, then return element added first.

My solutions

There are numerous ways to design simple priority queues (at least when performance is not an issue, for instance if the data set isn’t very large). For example, it might be sufficient to maintain an array of arrays (AoA), where each of the arrays is a pair containing the item and associated priority. Or an array of hashes (AoH) based on the same idea. This means that each time we want to pull the highest priority element, we need to traverse the whole data structure to find the item with the highest priority. This may be quite slow when there are many items, but this may not matter if our data structure only has a few dozen items.

Another way is to build a hash of arrays (HoA), where the hash keys are the priorities and the hash values are references to arrays. When the number of priorities is relatively small (compared to the number of items in the queues), this tends to be more efficient, but note that we still need to traverse the keys of the hash until we find the highest priority. An AoA with the index being the priority and the sub-hashes the item might be more efficient (because the priorities remain sorted), but this requires the priorities to be relatively small positive integers. We still have to traverse the top data structure until we find the first non-empty sub-array. This could be done as follows:

use v6;

sub new-queue {
    my @queue;  # an AoA
    sub is_empty {
        for @queue -> $item {
            next unless defined $item;
            return False if $item.elems > 0;
        }
        True;
    }
    sub insert_with_prio ($item, Int $prio) {
        push @queue[$prio], $item;
    }
    sub pull_highest_prio {
        for reverse @queue -> $item {
            next unless defined $item;
            return shift $item if $item.elems > 0;
        }
    }
    return &is_empty, &insert_with_prio, &pull_highest_prio;
}

my (&is-empty, &insert, &pull-prio) = new-queue;
# Testing the above code
for 1..20 -> $num {
    insert($num,
        $num %% 10 ?? 10 !!
        $num %% 5  ?? 5  !!
        $num %% 3  ?? 3  !!
        $num %% 2  ?? 2  !!
        1);
}
for 1..20 -> $num {
    say pull-prio;
}
say "Empty queue" if is-empty();

We’re using functional programming to implement a pseudo-object system, in which the new-queue subroutine is acting as an object constructor, although it is implemented as a function factory. The @queue object is limited to the scope of the new-queue constructor, but the three methods returned to the caller are closures that can access to the content of their shared object, @queue. The tests insert 20 numerical items (numbers between 1 and 20) with a priority of 10 (highest) for numbers evenly divided by 10, of 5 for numbers evenly divided by 5 (but not by 10), of 3 for numbers evenly divided by 3 (but not by 5), etc.

This script displays the following output:

$ perl6  queues.p6
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Another possibility is to use a heap, a data structure that usually has better performance (when it matters). This what we will look into now.

Background on Binary Heaps

A binary heap is a binary tree that keeps a partial order: each node has a value less than its parent and larger than either of its two children; there is no specific order imposed between siblings. (You may also do it the other way around: you can design heaps in which any node has a value larger than its parent, you basically only need to reverse the comparison.)

Because there is no order between siblings, it is not possible to find a particular element without potentially searching the whole heap. Therefore, a heap is not very good if you need random access to specific nodes. But if you’re interested in always finding the largest (or smallest) item, then a heap is a very efficient data structure.

Heaps are used for solving a number of computer science problems, and also serve as the basis for an efficient and very popular sorting technique called heap sort.

For a human, it is useful to represent a heap in a tree-like form. But a computer can store a heap as a simple array (not even a nested array). For this, the index of an element is used to compute the index of its parent or its two children. Roughly speaking, the children of an element are the two locations where the indexes are about double its index; conversely, the parent of a node is located at about half its index. If the heap starts at index 0, the exact formulas for a node with index $n are commonly as follows:

Parent: int( ($n-1)/2 )
Left child: 2*$n + 1
Right child: 2*$n + 2

The root node is at index 0. Its children are at positions 1 and 2. The children of 1 are 3 and 4 and the children of 2 are 5 and 6. The children of 3 are 7 and 8, and so on.

Suppose we build a heap (in ascending order) from an array of all letters between a and v provided in any pseudo-random order, for example:

my @input =  <m t f l s j p o b h v k n q g r i a d u e c>;

The resulting @heap might be something like this:

[a b g d c k j l f h e m n q p t r o i u s v]

We will see below on how to build such a heap from an unordered array, but let’s concentrate for now on the heap properties.

The order in the @heap above may not be immediately obvious, but a is the smallest letter, and its two children, b and g, are larger than a. The children of b are d and c and are larger than their parent b. Similarly, the children of g are k and j and are larger than their parent. And so on. But it is rather inconvenient to manually check that we have a valid heap. So, we may want to write a helper subroutine to display the heap in a slightly more graphical way:

    sub print-heap (@heap) {
        my $start = 0;
        my $end = 0;
        my $last = @heap.end;
        my $step = 1;
        loop {
            say @heap[$start..$end];
            last if $end == $last;
            $start += $step;
            $step *= 2;
            $end += $step;
            $end = $last if $end > $last;
        }
    }

This subroutine will not be used in the final code, but it proved to be very useful for debugging purposes.

If we pass the letter heap as an argument to this subroutine, it will be displayed in the following format:

(a)
(b g)
(d c k j)
(l f h e m n q p)
(t r o i u s v)

With a little bit of reformatting we can now see its structure in a tree-like format:

                (a)
        (b              g)
    (d       c      k        j)
  (l  f    h  e   m  n     q   p)
(t r o i  u s v ...)

And from that, we can now easily draw the tree:

alt text

The important thing to notice is that there is no particular order between siblings, but children are always larger than their parent.

How to build a Binary Heap

Since we’ll be dealing later with integers (priorities) in descending order, we will abandon our ascending order letter heap. Let’s suppose we have this heap example taken from the implementation section of the Wikipedia page on heaps: https://en.wikipedia.org/wiki/Heap_(data_structure)

my @heap = 100, 19, 36, 17, 12, 25, 5, 9, 15, 6, 11;

For the time being, we will consider it is a global variable accessible anywhere in the file.

Our print-heap helper subroutine would display it as:

(100)
(19 36)
(17 12 25 5)
(9 15 6 11)

We can see it’s a valid heap (the children are always smaller than their parent).

Let’s now add a new item, say 45, at the end of this array (for example with the push function). Of course, this item is not at its right place and the array is no longer a valid heap, but we can now use the following subroutine to move items around in order to obtain again a valid heap:

sub add-in-heap ($index is rw) {
    my $index-val = @heap[$index];
    while ($index) {
        my $parent-idx = Int( ($index - 1) /2);
        my $parent-val = @heap[$parent-idx];
        last if $parent-val >    $index-val;
        @heap[$index] = $parent-val;
        $index = $parent-idx;
    }
    @heap[$index] = $index-val;
}

The parameter passed is the index of the item that has just been added at the end of the array (11 in this example). This subroutine looks at the value of the parent of this new item. If the parent is larger than the new item, then we’re done: the new array happens to be a valid heap (which is not the case in our example). If not, then we move the parent value to the position where we’ve just added the new element. Then we change the index of interest to the parent and iterate this way until either the elements are in the right place (the parent value is larger than the current index value) or the index become 0 (we’ve reached the root node). At this point, the loop ends and we can put the value we’ve added in the right place. If you think about it in terms of the binary tree shown above, we’re really exploring the single path from the added element to the root (although we may not have to go all the way up to the root), the rest of the heap remains untouched.

Note that this subroutine is not designed to do anything special when fed with duplicate values. Here, duplicates will he handled gracefully and returned in the correct order. So, that’s OK, it works fine, but we’ll have to do something special about it when we will implement priority queues (if we had two priorities with the same value in the heap, we would be unable to predict the order in which items having the same priority will be pulled).

This subroutine will move around items from parent to child from the end to the beginning of the array (or at least until the new added value finds its right place), so that we get a new valid heap:

[100 19 45 17 12 36 5 9 15 6 11 25]

Using the print-heap helper subroutine to display the new heap outputs this:

(100)
(19 45)
(17 12 36 5)
(9 15 6 11 25)

I’ll leave it to you to draw the tree to check that it is a proper heap.

We now know how to add an item to a existing heap, we can of course use that subroutine to add an item to an empty heap, and we can use that subroutine repeatedly to place each item in its proper place in order to create a heap from an input list in any order:

for @array.keys -> $i {
    my $idx = $i;
    add-in-heap $idx;
}

At the end of this loop, the @array will have been turned into a heap.

Removing One Element from the Heap

If we’re looking for the largest element, it will be the root of the tree, i.e. the first item of the array.

Now, if we want to use this data structure to manage a priority queue, we will need at some point to delete the value in the root node and to reorganize the array so that it becomes again a legitimate heap. When we remove (100) from the above array, we have to choose the largest item between the two children, i.e. 45 in our example, and promote it as a new root node. And we can then propagate similarly the needed changes until the end of the array.

But the thermometer outside my house now shows 43.6 °C (in the shade), and it is more than 37°C inside. So, I’ll be a bit lazy for a moment and, rather than writing such a new subroutine (which should be done if you want to be efficient and will be done below), I’ll consider the array with the root node removed as an array in no particular order and use the code already written (the add-in-heap subroutine) to build a new heap from it:

sub take-from-heap {
    my $result = shift @heap;
    for @heap.keys -> $i {
        my $idx = $i;
        add-in-heap $idx;
    }
    return $result;
}

If we run that subroutine on our existing heap, it will return the largest item (the root node, i.e. 100) to the caller and reorganize the rest of the array into a new heap:

[45 36 17 15 25 5 9 12 6 11 19]

(45)
(36 17)
(15 25 5 9)
(12 6 11 19)

OK, this works, but reconstructing the full heap each time we remove an item is somewhat inefficient, which goes against the very purpose of heaps. What should a proper take-from-heap subroutine do? Take a look again at the binary tree displayed above. If we take off the root node value (a), we should replace it by b which is larger than g. It should be clear that we won’t need to change anything in the g sub-tree. And we can recursively replace b by c, and then c by e and finally e by v. Nothing else needs to be changed. So basically we have to move up one step each of the nodes on the path of the smallest nodes in the b sub-tree. And, by the way, it is thanks to the fact that, whether we add a new item or remove an item from the heap, we only need to traverse one single path through the heap that insertion and deletion operations have a 0(log n) complexity and are therefore fairly fast. Implementing the ideas just described is not too difficult, but, for each visited node, we need to take into account three possible cases: this node may have 0, 1 or 2 children.

sub take-from-heap {
    my $result = @heap[0];
    my $index = 0;
    loop {
        my $left-index = 2 * $index + 1;
            # right-index is $left-index + 1
        unless (defined @heap[$left-index] or
            defined @heap[$left-index + 1]) {
                @heap.splice($index, 1);
                last;
        }
        unless defined @heap[$left-index + 1] {
            @heap[$index] = @heap[$left-index]:delete;
            last;
        }
        unless defined @heap[$left-index] { # probably not happening
            @heap[$index] = @heap[$left-index + 1]:delete;
            last;
        }
        # both children are defined if we get here
        my $next-index = ($left-index,
            $left-index + 1).max({@heap[$_]});
        @heap[$index] = @heap[$next-index];
        $index = $next-index;
    }
    return $result;
}

If we run this new subroutine on our previous heap, we obtain this new heap:

[45 19 36 17 12 25 5 9 15 6 11]
(45)
(19 36)
(17 12 25 5)
(9 15 6 11)

Note that this is not the same heap as the one obtained before (same data but not in the same order), but this is another valid heap for such data. Using this subroutine repeatedly, we’ll get the nodes in the same order: 45, 36, 25, 19, 17 etc. For example, let’s run the new take-from-heap 10 times on our original heap and print out each time the removed first item and the resulting heap:

say "First item = ", take-from-heap, "; Heap: ", @heap for 1..10;

We can see that we have a valid heap at each iteration and pull the values in the right order:

First item = 100; Heap: [45 19 36 17 12 25 5 9 15 6 11]
First item = 45; Heap: [36 19 25 17 12 5 9 15 6 11]
First item = 36; Heap: [25 19 9 17 12 5 15 6 11]
First item = 25; Heap: [19 17 9 11 12 5 15 6]
First item = 19; Heap: [17 12 9 11 5 15 6]
First item = 17; Heap: [12 11 9 5 15 6]
First item = 12; Heap: [11 15 9 5 6]
First item = 11; Heap: [15 6 9 5]
First item = 15; Heap: [9 6 5]
First item = 9; Heap: [6 5]

So, it seems that we have a working algorithm to manage heaps. Let’s turn now to priority queues.

A Priority Queue as a Heap

Basically, we want to manage our priorities with a heap, and each priority will be associated with an array containing the individual items in the order in which they were inserted. To give you immediately an idea of the data structure, the queue will look like this at a certain point during the execution of the tests in the script below:

[[10 [10 20]] [5 [5 15]] [2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]] [3 [3 6 9 12 18]]]

The first item in the queue displayed above, [10 [10 20]], is the data structure for priority 10, which contains two elements, 10 and 20. The next one ([5 [5 15]]) is for priority 5. And so on.

When we are inserting elements (item and priority), we first call insert_with_prio to check whether there is already an array for the given priority. If it already exists, we just add the item to the array of elements associated with this priority. If there no array with such priority, then we call add-to-queue to add a priority data structure into the heap (and reorganize the heap as we’ve done before). Similarly, when we call pull_highest_prio, we just pick up and return the first element from the data array of the first priority item. In the event that the data array of a given priority becomes empty, then we call take-from-heap to remove the priority data structure from the heap (and reorganize the heap as we’ve done before).

use v6;
sub new-queue {
    my @queue;  # an AoA
    sub is_empty {
        @queue.elems == 0;
    }
    sub insert_with_prio ($item, Int $prio) {
        my $index = first {@queue[$_][0] == $prio}, @queue.keys;
        if (defined $index) {
            push @queue[$index][1], $item;
        } else {
            push @queue, [$prio, [$item]];
            my $idx = @queue.end;
            add-to-queue($idx);
        }
    }
    sub pull_highest_prio {
        return Nil if is-empty;
        my $result = shift @queue[0][1];
        take-from-heap if @queue[0][1].elems == 0;
        return $result;
    }
    sub add-to-queue ($index is rw) {
        my $index-val = @queue[$index];
        while ($index) {
            my $parent-idx = Int( ($index - 1) /2);
            my $parent-val = @queue[$parent-idx];
            last if $parent-val[0] > $index-val[0];
            @queue[$index] = $parent-val;
            $index = $parent-idx;
        }
        @queue[$index] = $index-val;
    }
    sub take-from-heap {
        my $index = 0;
        loop {
            my $left-index = 2 * $index + 1;
                # right-index is $left-index + 1
            unless (defined @queue[$left-index] or
                defined @queue[$left-index + 1]) {
                    @queue.splice($index, 1);
                    last;
            }
            unless defined @queue[$left-index + 1] {
                @queue[$index] = @queue[$left-index]:delete;
                last;
            }
            unless defined @queue[$left-index] {
                @queue[$index] = @queue[$left-index + 1]:delete;
                last;
            }
            # both children are defined if we get here
            my $next-index = ($left-index,
                $left-index + 1).max({@queue[$_][0]});
            @queue[$index] = @queue[$next-index];
            $index = $next-index;
        }
    }

    return &is_empty, &insert_with_prio, &pull_highest_prio;
}
my (&is-empty, &insert, &pull-prio) = new-queue;
# Testing the above code: 20 insertions and then trying 30 deletions
for 1..20 -> $num {
    insert($num,
        $num %% 10 ?? 10 !!
        $num %% 5  ?? 5  !!
        $num %% 3  ?? 3  !!
        $num %% 2  ?? 2  !!
        1);
}
for 1..20 -> $num {
    last if is-empty;
    say pull-prio;
}
say "Empty queue" if is-empty();

This program displays more or less the same as our previous implementation:

$ perl6  heap_queue.p6
10
20
5
15
3
6
9
12
18
2
4
8
14
16
1
7
11
13
17
19
Empty queue

Adding some additional print statements shows how the priority queue is evolving when we pull elements from it:

[[10 [10 20]] [5 [5 15]] [2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]] [3 [3 6 9 12 18]]]

[ ... lines omitted for brevity ...]

Pulled 18; New queue: [[2 [2 4 8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  2; New queue: [[2 [4 8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  4; New queue: [[2 [8 14 16]] [1 [1 7 11 13 17 19]]]
Pulled  8; New queue: [[2 [14 16]] [1 [1 7 11 13 17 19]]]
Pulled 14; New queue: [[2 [16]] [1 [1 7 11 13 17 19]]]
Pulled 16; New queue: [[1 [1 7 11 13 17 19]]]
Pulled  1; New queue: [[1 [7 11 13 17 19]]]
Pulled  7; New queue: [[1 [11 13 17 19]]]
Pulled 11; New queue: [[1 [13 17 19]]]
Pulled 13; New queue: [[1 [17 19]]]
Pulled 17; New queue: [[1 [19]]]
Pulled 19; New queue: []
Empty queue

The code is quite long and is certainly not worth the effort if we’re going to manage only 20 data elements and 5 priorities, as in our test cases above. But with much larger datasets and wider ranges of priority, it should be more efficient than our other implementation. If we’re going to use many priority queues, the code of the add-to-queue and take-from-heap subroutines could be stored separately in a module, making the new-queue code much smaller and more manageable.

Note that the insert_with_prio subroutine is traversing sequentially the heap to figure out whether the priority data structure already exists in the heap. Depending on the number of priorities, this might become time consuming. It would be easy to add and maintain a hash keeping track of the existing priorities and their position in the heap, to avoid sequential search. I did not do it because I considered this to be an implementation detail that may be or may not be useful depending on the exact circumstances. I would probably do it if I were to write a heap priority library for a CPAN module.

Alternative Solutions

Several of the challengers used the sort built-in function to find the highest priority element in an array or hash. Let me remind them that Perl 6 has built-in min and max routines to do the job more efficiently (they have an O(n) complexity).

Arne Sommer used a PriorityQueue Perl 6 module, which he wrote for the purpose of the challenge. This module is a nice and concise OOP package that maintains a %!queue attribute, which is implemented as a hash of arrays: the hash as one item per priority and each such item if an array of tasks to be run. When pulling the highest priority element, the hash entry for that priority is deleted if the corresponding array becomes empty, so that each existing hash priority entry always have at least one task lined up (there is never an empty array). One possible improvement might be to maintain an additional scalar attribute recording the maximal priority within the queue, as Joelle Maslak’s solution does (see below), so that the pull_highest_priority_element method does not have to scan the keys of %!queue to find the highest priority (which would be useful only if the number of priorities can become large). For some reason, Arne’s module did not make it into the Github repository, please check his blog to see the details.

Feng Chang took an OOP approach and created two classes, task and PriorityQueue. The former is very simple and implements objects with two attributes, a priority and a task ID (and no method). The latter is using two data structures, an array of priorities and a hash of arrays for the tasks, and does all the work. It can be noted that the add-new-task method implements a binary search algorithm on the priority array, to speed up the search.

Martin Barth also wrote a PriorityQueue class, with a nested Item class. Martin uses an array of items which he keeps sorted by priority (by inserting new items in the right place). His !find-pos private method also implements a binary search algorithm (with a recursive approach).

Noud‘s PriorityQueue class uses an ordered array of arrays.

Francis J. Whittle also took an OOP approach and created a Priority-Queue class, with a hash of arrays attribute. Francis’s code is really concise: only 13 code lines for the Priority-Queue class definition.

Kevin Colyer also took an OOP approach and created a PriorityQueue class, with an $!lol array of arrays data structure. Kevin’s class also has a $.items counter of tasks.

Randy Lauen wrote a ÜberNaïvePriorityQueue class implementing an array of hashes. Any new item is just simply pushed at the end of the @!elements. His pull-highest-priority-element uses the maxpair and first built-in routines to find the highest priority item. Perhaps a “naive” solution, but a very concise one: his class definition has 15 code lines (excluding empty lines and comments).

Simon Proctor‘s solution defines two roles, a SingleQueue role implementing an array or items, and a OrderedQueue role implementing a hash of SingleQueue objects. Quite an interesting solution, although I wonder why Simon chose to use roles rather than classes. It seems to me that his roles are really classes (in spirit) and that you could just replace the two occurrences of the word role by the word class and get exactly the same functionality.

Athanasius defines a MyPriorityQueue class in a separate module. This module in turn uses the Perl 6 Heap module, which provides push and pop operations.

Jaldhar H. Vyas created a Data::PriorityQueue with a nested Element class. Jaldhar’s queue is an array of Element objects ordered by priority and creation order. The hard work is done in the insert_with_priority method, which has to find the right place where to insert any new element.

Joelle Maslak defined a Priority-Queue class with a hash of arrays attribute. Her class also has a $!max scalar attribute, which keeps track of the highest priority in the data structure and makes it possible to pull the highest priority element without having to traverse the keys of the hash. Quite clever!

Mark Senn is one of the only two persons (well, three including myself) who did not use OOP. He implemented two parallel arrays, @priority and @value. Mark`s code is quite simple and relatively short.

Ruben Westerberg also did not use OOP. He created a hash of arrays.

Wrapping up

Please let me know if I forgot any of the challengers or if you think my explanation of your code misses something important.

If you want to participate to the Perl Weekly Challenge, please connect to this site.

Laurent Rosenfeld Weekly Review: Challenge - 018

Raku Solutions Weekly Review

Task #1: Longest Common Substrings

My Solution

Alternative Solutions

Enter Damian Conway

Task #2: Priority Queues and Binary Heaps

My solutions

Background on Binary Heaps

How to build a Binary Heap

Removing One Element from the Heap

A Priority Queue as a Heap

Alternative Solutions

SEE ALSO

Wrapping up

SO WHAT DO YOU THINK ?