Raku Solutions Weekly Review
Task #1: Anagrams of a Word
This is derived in part from my blog post made in answer to the Week 5 of the Perl Weekly Challenge organized by Mohammad S. Anwar as well as answers made by others to the same challenge.
The challenge reads as follows:
Write a program which prints out all anagrams for a given word. For more information about anagrams, please check this Wikipedia page.
I’m not sure I fully grasp the question, but my understanding is that we are looking for existing words which are anagrams of each other. For example, “pots”, “spot”, “tops”, and “post” are anagrams of each other because they have exactly the same letters rearranged in a different order.
For the purpose of this challenge, I will use a words.txt
file containing 113,809 lower-case English words usually accepted for crossword puzzles and other word games. The words.txt
file can be found on my Github repository. The original list was contributed to the public domain by Internet activist Grady Ward in the context of the Moby Project. This word list is also mirrored at Project Gutenberg.
For the purpose of testing the programs below, the words.txt
file is located in my current directory.
My Solutions
Normalized Strings: the is-anagram
Subroutine
We can start by creating an is-anagram
subroutine that takes two words as parameters and return a true value if the words are anagrams of each other and false otherwise.
sub is-anagram (Str $word1, Str $word2) {
return False if $word1.chars != $word2.chars;
return $word1.comb.sort eq $word2.comb.sort;
}
for <ban bane post stop pots stop post pots pots taps> -> $w1, $w2 {
say "$w1 $w2:\t", is-anagram $w1, $w2;
}
The is-anagram
subroutine first returns a false value if the input words don’t have the same length. Words of different lengths cannot be anagrams and it is my belief that checking the length of the input words is significantly faster than the subsequent processing of their individual letters (a couple of tests seemed to confirm that idea, but I must admit that haven’t seriously benchmarked that). Therefore, it should speed up processing when we will use this subroutine repeatedly for a large number of word combinations. When words have the same length, then we just split the words into individual letters, sort the letters, and produce new “normalized” or “canonical form” strings with the input letters; if the new normalized strings are equal, then the words were anagrams of each other.
This displays the following output:
$ perl6 is-anagram.p6
ban bane: False
post stop: True
pots stop: True
post pots: True
pots taps: False
Note that this works correctly because the eq
operator in the is-anagram
subroutine coerces its arguments into strings, so we don’t need to explicitly stringify the letter lists generated by the comb
function after they have been sorted.
Using a Bag
Another approach is to use one of a variety of Perl 6 data structure types called Set
, Bag
and Mix
. They are immutable collections of unique and weighed items.
You might construct a Set
as follows:
> my $s = set <banana apple orange orange banana pear apple>;
set(apple banana orange pear)
As you can see, duplicates have been removed. Sets only tell us whether at least one item of a given name has been encountered.
A bag, by contrast, also keeps track of how many of each item have been seen, as shown in these tests under the REPL:
> my $b = bag <banana apple orange orange banana pear apple orange>;
bag(banana(2), orange(3), pear, apple(2))
> say $b{'banana'}
2
Mixes are similar to bags, except that the elements’ weights don’t have to be integers.
One interesting thing about these collections is that they can use many set operators commonly used in mathematics, such as the (elem)
(or ∈
) set membership operator or the (<)
(or ⊂
) subset operator, illustrated here under the REPL:
> say "Found it!" if 'apple' `∈` $s;
Found it!
> say "It is a subset" if qw/orange banana/ ⊂ $s
It is a subset
> say "Found several oranges" if qw/orange orange/ ⊂ $b
Found several oranges
We can now try the following alternate is-anagram
subroutine using bags:
sub is-anagram (Str $word1, Str $word2) {
return $word1.comb.Bag === $word2.comb.Bag;
}
for <ban bane post stop pots stop post pots pots taps> -> $w1, $w2 {
say "$w1 $w2:\t", is-anagram $w1, $w2;
}
The ===
value identity operator used between two bags returns True
if the bags are identical. This program displays the same output as before.
Creating an Anagram Operator
Just a bit of fun: rather than creating an is-anagram
subroutine, we could create the infix ana
operator:
sub infix:<ana> (Str $word1, Str $word2) {
return $word1.comb.Bag === $word2.comb.Bag;
}
for <ban bane post stop pots stop post pots pots taps> -> $w1, $w2 {
say "$w1 $w2:\t", $w1 ana $w2;
}
This prints the same result. Note, however that this feels significantly slower. My guess is that it is mainly compilation time. Adding the following line at the end of the previous script:
say now - INIT now;
shows that the run time is less than 0.02 seconds:
perl6 is-anagram.p6
ban bane: False
post stop: True
pots stop: True
post pots: True
pots taps: False
0.0156261
I might undertake a serious benchmark one day, but it really seems that the perceived response time really has to do with compile time.
Finding All Anagrams of a Word From a Word List
The other possible understanding of the challenge is that we are given a word and a word list, and should output the words of the list that are anagrams of the input word. We can use the same is-anagram
subroutine as follows and use directly the words.txt
file mentioned earlier:
my $post-bag = "post".comb.Bag;
sub is-anagram (Str $word) {
return $word.comb.Bag === $post-bag;
}
for "words.txt".IO.lines -> $line {
say $line if is-anagram $line;
}
This program displays the words of the word list that are anagrams of the word “post”:
perl6 is-anagram.p6
opts
post
pots
spot
stop
tops
Alternative Solutions
Arne Sommer understood the challenge to mean that we were supposed to look up anagrams of a given word in a list or dictionary (what I did in the last section of my solutions). His approach is to read all the words of the dictionary and store them in a Set
(done in the get-dictionary
subroutine). Then, the program uses the permutations built-in function to find all the letter permutations of the input word:
unit sub MAIN (Str $word is copy where $word !~~ /\W/,
:$dictionary where $dictionary.IO.r = "/usr/share/dict/british-english");
$word .= lc;
my $dict = get-dictionary($dictionary);
print "Anagrams:";
for $word.comb.permutations>>.join.unique -> $candidate
{
# next if $candidate eq $word;
print " $candidate" if $dict{$candidate};
}
print "\n";
sub get-dictionary ($file where $file.IO.r) is export
{
return $file.IO.lines.grep(* !~~ /\W/)>>.lc.Set;
}
Doug Schrag understood the challenge the same way, also used a set to store the words of the dictionary, and also used the permutations
built-in function:
subset Filename of Str;
sub MAIN(Str $word, Filename :$word-file) {
my $file = .IO with $word-file;
my Set $words = Set.new(.lines.sort) with $file;
if $words.defined {
.say for anagrams($word, -> $w { $w (elem) $words });
}
else {
note 'Please supply word dictionary using --word-file option';
say 'All permutations:';
.say for anagrams($word);
}
}
sub anagrams ($word, &is-word = -> $w { True }) {
gather
for $word.comb.permutations.unique(:with(&[eqv])) {
with .join {
.take if .&is-word
}
}
}
Francis J. Whittle also understood that the task was to find anagrams of a word in given file. He used Bag
s of letters to check whether words are anagrams of each other.
unit sub MAIN(
Str $file #= file containing list of words
where { given .IO { .r && ( .l || .f) or die "Cannot read from $_" } },
$word #= word to find anagrams of
);
my $word-bag := $word.lc.comb(/ \w /).Bag;
my @words = $file.IO.lines.unique.hyper.grep(*.chars > 2)
.map: { .lc.comb(/ \w /).Bag => $_ };
@words.race.grep(*.key === $word-bag)».value.unique(with => *.lc eq *.lc).join(', ').put;
Jaldhar H. Vyas normalized the target word and the words of the dictionary by rearranging their letters in alphabetical order much in the same way as in my original is-anagram
subroutine. His normalized word dictionary is stored in a hash.
my $search = $word.comb.sort.join;
my %dictionary = $list.IO.lines.map({ $_ => $_.comb.sort.join });
my %anagrams;
for %dictionary.kv -> $key, $value {
%anagrams{$value}.push($key);
}
if (%anagrams{$search}:exists) {
say %anagrams{$search}.grep({ $_ !~~ $word }).sort.join(' ');
}
Jo-Christian Oterhals stored the words of the /usr/share/dict/words
standard Unix dictionary into a Set
and used the permutations
of the input words to check for anagrams:
my $dict = "/usr/share/dict/words".IO.lines()>>.lc.Set;
for @*ARGS ?? @*ARGS !! ! $*IN.t ?? lines() !! '' -> $w {
$w.lc.comb.permutations>>.join.grep({ $dict{$_} and $_ ne $w }).map({ "$w\t$_\n" }).unique.join.say;
}
Joelle Maslak used Bag
s to store the letters of the input word and analyze the dictionary words.
sub MAIN(Str:D $letters, Str:D $filename = '/usr/share/dict/words') {
my $matchbag = Bag.new($letters.comb);
my SetHash $dedupe = SetHash.new; # To store matches we gave back
for $filename.IO.lines -> $word {
my $fcword = $word.fc;
my $bag = Bag.new($fcword.comb);
if $bag ~~ $matchbag {
next if $fcword ∈ $dedupe;
$dedupe{$fcword}++;
say $fcword;
}
}
}
Mark Senn normalized words of the word list much in the same way as my original is-anagram solution (i.e. sorting the letters) and stored the result in a hash. He then used a hash
my Array %hash;
# Construct the array.
for (@word)
{
# The key for "family" is "amfily".
my $key = .comb(/./).sort.join;
# Add the current word to the hash.
%hash{$key}.push($_);
}
my $word = @word.pick;
# Convert $word to $key.
my $key = $word.comb(/./).sort.join;
for (%hash{$key}.Array.sort)
{
($word eq $_) or print " $_";
}
print "\n";
Rob4t used Bag
s to check for anagrams:
sub MAIN(Str $word, Str $file where *.IO.r = '/usr/share/dict/words') {
my $word_bag = $word.lc.comb.Bag;
my @found_words = $file.IO.lines.grep: {
# not the same word
.lc ne $word.lc
and
# look for words AND phrases
.lc.words.map({.comb}).flat.Bag eqv $word_bag
};
.say for @found_words;
}
Ruben Westerberg wrote an anagram module implementing a findAnangrams
(sic) subroutine:
unit module anagram;
sub findAnangrams(@letters, @words) is export {
my @a=sort @letters;
my @anagrams;
my $test;
while * {
$test=@a.join("");
@anagrams.append(@words.hyper(degree=>4,batch=>50000).grep: {$_ eq $test});
my int $k=0;
my int $l=0;
my $tmp;
my $flag=False;
my int $maxK=-1;
my int $maxL=-1;
while ($k < @a.elems-1) {
$maxK= $k if ((@a[$k] lt @a[$k+1]) && ($k > $maxK));
$k++;
}
last unless $maxK >= 0;
$maxL=$maxK+1;
while ($l < @a.elems) {
$maxL= $l if ((@a[$maxK] lt @a[$l]) && $l>$maxL);
$l++;
}
$tmp=@a[$maxK];
@a[$maxK]=@a[$maxL];
@a[$maxL]=$tmp;
my @b=@a.splice($maxK+1).reverse;
@a.=append(@b);
}
@anagrams;
}
His program using this module then looks like this:
use lib $?FILE.IO.dirname;
use anagram;
my $wordsFile= $?FILE.IO.dirname ~ "/../words_alpha.txt";
my @word-list=$wordsFile.IO.lines;
my $input= @*ARGS[0];
my @letters=$input.trim.lc.split("")[1..*-2];
say "Anagrams: ",findAnangrams(@letters,@word-list).join: ' ';
This probably works well, but, although the idea of creating a module for anagrams is probably good, I have the feeling that this implementation is a bit over-engineered. My solution held in 7 code lines.
Simon Proctor also wrote an Anagrams module:
unit package Anagrams;
sub normal ( Str \word ) is pure {
samemark( lc( word ), "a" )
}
sub order-string ( Str \word ) is export is pure {
normal( word ).comb.sort( { $^a cmp $^b } ).join;
}
multi sub is-anagram-of( Str \target, Str \word where * eq target ) is export is pure { False; }
multi sub is-anagram-of( Str \target, Str \word where *.codes != target.codes ) is export is pure { False; }
multi sub is-anagram-of( Str \target, Str \word ) is export is pure {
normal( target ) ne normal( word ) && order-string( target ) ~~ order-string( word );
}
His program using this module then looks like this:
use v6;
use lib 'lib';
use Anagrams;
my %*SUB-MAIN-OPTS = :named-anywhere;
sub USAGE { say $*USAGE }
subset FilePath of Str where *.IO.f;
#| Display Help file
multi sub MAIN ( Bool :h($help) where *.so ) { USAGE(); }
#| Find the anagrams for a given word
multi sub MAIN (
Str $word, #= Word to check for. Case insensitive
FilePath :$dict = "/etc/dictionaries-common/words" #= Dictionary file to use. Defaults to "/etc/dictionaries-common/words"
) {
$dict.IO.words.grep( { is-anagram-of( $word, $_ ) } )>>.say;
}
Again, this seems to be a bit over-engineering to me.
SEE ALSO
Not less than six blog posts this time:
-
Arne Sommer: https://perl6.eu/anagrams.html;
-
Francis J. Whittle: https://rage.powered.ninja/2019/04/22/anagramming-max.html;
-
Joelle Maslak: https://digitalbarbedwire.com/2019/04/27/anagrams-in-on/;
-
Jo Christian Oterhals: https://medium.com/@jcoterhals/perl-6-small-stuff-18-applying-permutations-to-an-anagram-challenge-65eb2ff64367
-
Simon Proctor: https://www.khanate.co.uk/blog/2019/04/26/perl-weekly-week-5/
Wrapping up
Please let me know if I forgot any of the challengers or if you think my explanation of your code misses something important (send me an e-mail or just raise an issue against this GitHub page).