Decode Hexdump

Saturday, Jun 18, 2022 | Tags: Perl

Those who know me personally are aware that my background is in Mathematics, as I did a Degree in Mathematics Honours in 1993-95. During my early days with COBOL and Pascal, I never dealt with Unicode or played with bytes. Even when I moved to C, I was still nowhere near it. Those with a degree in Computer Science always stay ahead in understanding these low-level aspects of programming.

Fortunately or unfortunately, I never got the opportunity to work with Unicode. Having said that, it was always in the back of my mind that one day I would win this battle. A few weeks ago, I had a conversation with a senior member of the Perl community regarding an issue in one of the CPAN modules that I currently maintain. I was pleasantly surprised to see how comfortable he was playing with Unicode and debugging with hexdump. I decided to get hold of it rather than delay it any further.

But the fact of the matter is that my plate is always full at any given point in time. So adding anything to the overloaded plate is going to tip over something already in the pipeline. As is always the case, I went back to my Twitter handle and asked for help. I did get some nice suggestions.

In this post, I am going to share my experience with you all.


So what exactly is the problem?


My initial blocker was that I was unable to decode the output of hexdump.


So for the purpose of this blog post, I created a sample plain text file, sample.txt.


Sample File


Now it was time to get hexdump to dump some garbage (garbage to me, at least).


Hexdump output


Here comes the trouble: how does the output relate to the actual text in the file?


My Twitter friends again helped me with the decoding.



    6548  eH
    6c6c  ll
    206f  <s>o
    6f57  oW
    6c72  lr
    2064  <s>d
    2121  !!
    0a21  <l>!


Here <s> means space and <l> means linefeed. That was all I needed.
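To double-check the decoding by machine rather than by eye, here is a minimal Perl sketch (it assumes Perl 5.36 or later, the same as the detection script further down). It takes the eight four-digit groups from the table above and packs each one back as a little-endian 16-bit value, which puts the two bytes of every group back in file order. The helper name decode_words is my own, hypothetical choice.

```perl
#!/usr/bin/perl
use v5.36;    # enables strict, warnings, say and signatures

# Turn 16-bit hexdump words (as printed on a little-endian machine)
# back into the original bytes. pack 'v' writes each number as a
# little-endian 16-bit value (LSbyte first), which undoes the swap.
sub decode_words (@words) {
    return pack('v*', map { hex } @words);
}

my @words = qw(6548 6c6c 206f 6f57 6c72 2064 2121 0a21);
my $text  = decode_words(@words);
print $text;    # prints "Hello World !!!" followed by a newline
```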


But then why is it the other way around?


I was told again by my Twitter friends that endianness is behind the order.

For me, this is another blocker that I had to deal with.

With a little searching on Google, I found this post that explains the subject in detail.

In summary, Big Endian (BE) stores data MSbyte first, whereas Little Endian (LE) stores data MSbyte last.
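Perl's pack makes the difference easy to see, because its 'N' and 'V' templates force big-endian and little-endian 32-bit order regardless of the machine it runs on. A small sketch, using the arbitrary example value 0x12345678:

```perl
#!/usr/bin/perl
use v5.36;

my $n = 0x12345678;    # 0x12 is the MSbyte, 0x78 the LSbyte

# 'N' = 32-bit big-endian, 'V' = 32-bit little-endian, independent of
# the host's own byte order. unpack 'H*' shows the bytes in memory order.
my $be = unpack('H*', pack('N', $n));
my $le = unpack('H*', pack('V', $n));

say "BE: $be";    # BE: 12345678
say "LE: $le";    # LE: 78563412
```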


Now what is MSbyte?


Endianness is most commonly defined in terms of the Most Significant Byte (MSbyte).

The byte in the highest-value position is called the MSbyte. Similarly, the bit in the highest-value position is called the MSbit.

The byte in the lowest-value position is called the LSbyte. Similarly, the bit in the lowest-value position is called the LSbit.
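In code, the MSbyte/LSbyte and MSbit/LSbit of a 32-bit value can be picked out with shifts and masks. The value 0x81234567 below is just an arbitrary example I chose so that both end bits are set. Note that this is about the value itself; it says nothing yet about how the value is laid out in memory.

```perl
#!/usr/bin/perl
use v5.36;

my $n = 0x81234567;    # arbitrary 32-bit example value

my $msbyte = ($n >> 24) & 0xFF;    # highest-value byte
my $lsbyte = $n & 0xFF;            # lowest-value byte
my $msbit  = ($n >> 31) & 1;       # highest-value bit
my $lsbit  = $n & 1;               # lowest-value bit

printf "MSbyte=0x%02x LSbyte=0x%02x MSbit=%d LSbit=%d\n",
    $msbyte, $lsbyte, $msbit, $lsbit;
# MSbyte=0x81 LSbyte=0x67 MSbit=1 LSbit=1
```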


Going back to the original question: why is the hexdump output not in the correct order?


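The answer is that my machine is configured/built as LE. By default, hexdump groups the bytes in pairs and prints each pair as a 16-bit number in the machine's native byte order. On an LE machine the first byte of each pair is the LSbyte, so it is printed on the right, which is why "He" shows up as 6548.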
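To convince myself, I tried to mimic hexdump's default pairing in Perl. This is a rough sketch, not hexdump itself: it assumes sample.txt contains "Hello World !!!\n" (reconstructed from the decoding table), the sub name words is hypothetical, and padding an odd trailing byte with a zero byte is my assumption for the odd-length case. Reading each pair with 'v' (little-endian) reproduces my hexdump words, while 'n' (big-endian) shows what a BE machine would print. If you just want the bytes in file order, hexdump -C prints them one at a time alongside the ASCII.

```perl
#!/usr/bin/perl
use v5.36;

# Group a byte string into 16-bit words, similar to hexdump's default
# format. 'v' reads each pair as little-endian (first byte = LSbyte),
# 'n' as big-endian (first byte = MSbyte). An odd trailing byte is
# padded with a zero byte (an assumption of this sketch).
sub words ($bytes, $template) {
    $bytes .= "\0" if length($bytes) % 2;
    return map { sprintf '%04x', $_ } unpack("$template*", $bytes);
}

my $bytes = "Hello World !!!\n";    # assumed contents of sample.txt

say 'LE: ', join ' ', words($bytes, 'v');
say 'BE: ', join ' ', words($bytes, 'n');
```

On this sample, the LE line reads 6548 6c6c 206f 6f57 6c72 2064 2121 0a21, and the BE line reads 4865 6c6c 6f20 576f 726c 6420 2121 210a, i.e. the bytes in file order.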


So how do I know which endianness my system is built upon?


I started looking for a Perl way of figuring it out.

In no time, I found this solution. The original had a typo, which I fixed here: detect-endian-ness.pl.


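#!/usr/bin/perl

use v5.36;    # enables strict, warnings and say

# Size in bytes of a native unsigned int ('I'), usually 4.
my @b = unpack('C*', pack('I', 0));
my $sizeof_int = scalar(@b);

# Build an integer whose bytes are 1..N, with N in the most significant
# position, e.g. 0x04030201 for a 4-byte int, and pack it natively.
my @c = (1 .. $sizeof_int);
my $i = pack('I', hex('0x0' . join('0', reverse @c)));

# The byte strings we would expect for each byte order.
my $big = pack('C' . $sizeof_int, reverse @c);    # 04 03 02 01
my $lit = pack('C' . $sizeof_int, @c);            # 01 02 03 04

if ( $i eq $big ) {
    say 'big';
}
elsif ( $i eq $lit ) {
    say 'little';
}
else {
    say 'strange';
}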
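For the record, Perl also exposes the build's byte order through the core Config module, so a shorter check is possible. A minimal sketch, cross-checking $Config{byteorder} against a native 'L' (32-bit unsigned) pack compared with the fixed-order 'V' and 'N' templates:

```perl
#!/usr/bin/perl
use v5.36;
use Config;

# $Config{byteorder} describes the byte order of a native unsigned
# integer; larger digits mean more significance, so '1234'/'12345678'
# is little-endian and '4321'/'87654321' is big-endian.
my $byteorder = $Config{byteorder};

# Run-time cross-check: the native packing of 1 matches either the
# little-endian 'V' or the big-endian 'N' packing.
my $native = pack('L', 1);
my $endian = $native eq pack('V', 1) ? 'little'
           : $native eq pack('N', 1) ? 'big'
           :                           'strange';

say "byteorder: $byteorder";
say "endian:    $endian";
```

On a typical 64-bit little-endian build, I would expect byteorder to print 12345678 and endian to print little.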

Time to find out the endianness of my current system.


Endian


Before I end this discussion, I would like to share another post that explains the Byte Order Mark (BOM), if you are interested.

Last but not least, I would like to thank everyone who helped me with this.

That’s it for now.

SO WHAT DO YOU THINK?

If you have any suggestions or ideas then please do share with us.
