Kian-Meng Ang Weekly Review: Challenge - 024

Sunday, Sep 15, 2019 | Tags: Perl

Continued from the previous week.

Feel free to submit a merge request or open a ticket if you find any issues with this post. We highly appreciate and welcome your feedback.

For a quick overview, go through the original tasks and recap of the weekly challenge.


Task #1


CPAN modules used: B, strict, warnings

Read the excellent blog post by Arne Sommer on his investigation to find the shortest solution in both Perl 5 and Perl 6.

Quite a few participants (including the reviewer) were taken aback by this task. After re-reading the task and reviewing the submitted solutions, it seems this was one of those tasks where the solution depends solely on the participant's interpretation, and along the way we all learn something about the Perl interpreter itself (see Arne Sommer's post). Nevertheless, we will look at the different ways participants solved this task.

First off is the empty file solution, first submitted by Joelle Maslak and followed by E. Choroba, Simon Proctor, and Ruben Westerberg. Joelle Maslak compared how a few programming languages parse an empty file, and Perl 5 had the fastest startup time at “doing nothing” without throwing any errors.

An equivalent one-liner was also submitted by Colin Crain, E. Choroba, Laurent Rosenfeld, and Yet Ebreo.

Some participants (Dave Cross and Roger Bell West) held the alternative opinion that an empty file is not really a Perl script. They are partially correct: according to the file command, an empty file is just an empty file.

$ file challenge-024/joelle-maslak/perl5/ch-1.pl
challenge-024/joelle-maslak/perl5/ch-1.pl: empty

Hence, once a shebang is added as an interpreter directive, the file command recognizes the file as an executable Perl script.

$ file challenge-024/roger-bell-west/perl5/ch-1.pl
challenge-024/roger-bell-west/perl5/ch-1.pl: Perl script text executable

Does that mean that a text file without a Perl shebang interpreter directive is not a Perl script?

Not really.

As shown in the solutions submitted by Lubos Kolouch, Steven Wilson, and Andrezgz, we can run the script just fine even though the file command identifies the file as plain ASCII text.

$ perl challenge-024/andrezgz/perl5/ch-1.pl
This script is the smallest in terms of size that on execution doesn't throw
any error, doesn't do anything special and explains what it does

$ file challenge-024/andrezgz/perl5/ch-1.pl
challenge-024/andrezgz/perl5/ch-1.pl: ASCII text

For those who use a code linter in their development environment, perlcritic will raise some concerns even at its gentlest setting. Although this is not a requirement of the task, it’s good to know how perlcritic evaluates a basic Perl script.

$ perlcritic --gentle challenge-024/lubos-kolouch/perl5/ch-1.pl
challenge-024/lubos-kolouch/perl5/ch-1.pl: [TestingAndDebugging::RequireUseStrict]
Code before strictures are enabled at line 1, column 1 (Severity 5).
    Using strictures is probably the single most effective way to improve
    the quality of your code. This policy requires that the `'use strict''
    statement must come before any other statements except `package',
    `require', and other `use' statements. Thus, all the code in the entire
    package will be affected.

    There are special exemptions for Moose, Moose::Role, and
    Moose::Util::TypeConstraints because they enforces strictness; e.g.
    `'use Moose'' is treated as equivalent to `'use strict''.  The maximum
    number of violations per document for this policy defaults
    to 1.

In short, an empty file is probably the shortest answer to this task and the one that comes closest to fulfilling its requirements.


Task #2


CPAN modules used: Carp, DBI, Data::Dumper, English, File::ByLine, File::Find::Rule, Lingua::Stem, List::MoreUtils, List::Util, Storable, Syntax::Construct, Test::More, Text::Table::Tiny, autodie, boolean, constant, feature, strict, warnings

If you haven’t done this task and want to learn more about implementing full-text search using an Inverted Index, start with the submission by Laurent Rosenfeld. The solution is concise but comprehensive enough to demonstrate a working implementation of an Inverted Index. Next, move on to something similar, with a test case, by Lubos Kolouch. If the regex used to extract words for the index confuses you, read the submission by Andrezgz, which has well-written comments on the regex. If you still cannot grok how an Inverted Index works, look at the output of the solution by Yet Ebreo, which lets you visualize how the index works, as shown below.

perl .\ch-2.pl "i sing eat and love" .\file1.txt .\file2.txt .\file3.txt .\file4.txt .\file5.txt
+-------+--------------------------------+
| Words | File(s)                        |
+-------+--------------------------------+
| and   | file1.txt file2.txt            |
| eat   | file4.txt                      |
| i     | file1.txt file2.txt file4.txt  |
| love  | file2.txt file5.txt            |
| sing  | (N/A)                          |
+-------+--------------------------------+

By going through all four submissions, you’re now equipped with a good fundamental overview of how an Inverted Index is implemented.

The four solutions mentioned above serve as good working prototypes to get things started. You must be wondering, can we improve or extend these solutions?
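To make the idea concrete, here is a minimal sketch of an Inverted Index in Perl. It is not taken from any participant's submission: the sub names build_index and lookup, the sample documents, and the simple \w+ word regex are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Build an inverted index from a list of document name => text pairs.
# The document names and contents used below are made-up sample data.
sub build_index {
    my (%docs) = @_;
    my %index;
    for my $name (sort keys %docs) {
        # Lowercase the text and pull out runs of word characters.
        for my $word (lc($docs{$name}) =~ /\w+/g) {
            $index{$word}{$name} = 1;    # hash keys de-duplicate documents
        }
    }
    return \%index;
}

# Return the sorted list of documents that contain a word.
sub lookup {
    my ($index, $word) = @_;
    return sort keys %{ $index->{ lc $word } // {} };
}

my $index = build_index(
    'file1.txt' => 'I sing and dance',
    'file2.txt' => 'I love and laugh',
);

for my $word (qw(and love eat)) {
    my @files = lookup($index, $word);
    printf "%-5s %s\n", $word, @files ? "@files" : '(N/A)';
}
```

The index maps each word to the set of documents containing it, so a search is a single hash lookup instead of a scan over every file.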

Yes, there were quite a few.

Interestingly, two participants (E. Choroba and Joelle Maslak) used their own CPAN modules in their solutions. This is the first time we’ve seen this, and it caught us by surprise! First, the solution by E. Choroba used the Syntax::Construct CPAN module as an alternative way to manage the feature pragma. Second, the solution by Joelle Maslak used the File::ByLine CPAN module to process a single file in parallel.

How about storage? Both Duane Powell and E. Choroba used the Storable CPAN module to persist the index. Guillermo Ramos was the only participant who used a database to store the index.
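As a rough sketch of the persistence idea, the core Storable module can write a Perl data structure to disk and read it back later. The sub names save_index and load_index, the sample index, and the temporary file are assumptions for illustration, not code from the submissions.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempfile);

# Write the index (a hash reference) to a file.
sub save_index {
    my ($index, $path) = @_;
    store($index, $path);
    return;
}

# Read a previously saved index back into memory.
sub load_index {
    my ($path) = @_;
    return retrieve($path);
}

# Made-up sample index: word => list of files containing it.
my %index = (
    and  => [ 'file1.txt', 'file2.txt' ],
    love => [ 'file2.txt' ],
);

# Hypothetical storage location; a temp file keeps the sketch self-cleaning.
my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

save_index(\%index, $path);
my $restored = load_index($path);
print "love: @{ $restored->{love} }\n";
```

Saving the index this way means later searches can skip re-reading and re-tokenizing every document.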

What if we want to refine the words we’ve extracted to build the index? Roger Bell West solved this by stemming them with the Lingua::Stem CPAN module.

How about counting the word frequency for each document? Randy Lauen was the only participant who implemented this approach.

Now, for some other miscellaneous things we noticed while reviewing these solutions.

If you don’t like using glob to filter and get the list of files, you can look into File::Find::Rule, as seen in the submission by Duane Powell. For formatting the output, Adam Russell demonstrated that you can use Perl formats (a rather old-school approach) to achieve that.
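Counting frequencies is a small change to a plain Inverted Index: store a counter per document instead of a flag. The following is a sketch under that assumption only; the sub name build_frequency_index and the sample documents are hypothetical and not taken from Randy Lauen's solution.

```perl
use strict;
use warnings;

# Count how often each word occurs in each document.
# Returns a hash reference of word => { document => count }.
sub build_frequency_index {
    my (%docs) = @_;
    my %index;
    for my $name (keys %docs) {
        $index{$_}{$name}++ for lc($docs{$name}) =~ /\w+/g;
    }
    return \%index;
}

# Made-up sample documents.
my $freq = build_frequency_index(
    'file1.txt' => 'I sing and I dance',
    'file2.txt' => 'I love and laugh',
);

# List the documents containing "i", most frequent first.
my $hits = $freq->{i};
for my $name (sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits) {
    print "$name: $hits->{$name}\n";
}
```

With counts available, search results can be ranked so that documents mentioning a word more often come first.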
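For readers who have never used formats, here is a small sketch of the picture-line syntax. It uses the built-in formline function and the $^A accumulator, the same machinery that format and write rely on, so that each row comes back as a string. The sub name format_row, the column widths, and the sample index are assumptions for illustration, not Adam Russell's code.

```perl
use strict;
use warnings;

# Render one table row from a format picture line.
# '@<<<' is a left-justified field whose width is its character count.
sub format_row {
    my ($word, $files) = @_;
    $^A = '';    # clear the format accumulator
    formline('@<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<' . "\n", $word, $files);
    my $row = $^A;
    $^A = '';
    return $row;
}

# Made-up sample index: word => list of files containing it.
my %index = (
    and  => [ 'file1.txt', 'file2.txt' ],
    love => [ 'file2.txt' ],
);

print format_row('Words', 'File(s)');
print format_row($_, join(' ', @{ $index{$_} })) for sort keys %index;
```

The picture line acts as a visual template: what you type is roughly the shape of the output, which is why formats remain handy for fixed-width reports.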


See Also


(1) Small Inversions with Perl 6 by Arne Sommer. Recommended read of the week.

(2) Inverted Index Formatting by Adam Russell

(3) Perl Weekly Challenge # 24: Smallest Script and Inverted Index by Laurent Rosenfeld

(4) Perl Weekly Challenge 24 by Jaldhar H. Vyas

(5) RogerBW’s Blog: Perl Weekly Challenge 24 by Roger Bell West

(6) Perl Weekly Challenge W024 - Smallest Script, Inverted Index by Yet Ebreo

(7) Perl Weekly Challenge 024: Inverted Index and Shortest Oneliner by E. Choroba

(8) Perl is Good for Nothing by Joelle Maslak
