Sunday 31 January 2016

Compiling a C Program

In this lab, we compared the source code of a C program against the output of the C compiler. We started by compiling hello.c as is:
gcc -o hello hello.c -g -O0 -fno-builtin

The options do the following:
  • -g:                  Enables debug information
  • -O0:               Disables GCC optimizations
  • -fno-builtin:   Disables GCC's built-in function optimizations (such as replacing printf() with puts())

The resulting binary file was just a few kilobytes in size. In the lab we were to recompile the code with each of the following changes:
  1. Add the compiler option -static
  2. Remove the compiler option -fno-builtin
  3. Remove the compiler option -g
  4. Add additional arguments to the printf()
  5. Move the printf() call to a separate function named output(), and call that function from main()
  6. Remove the -O0 and add -O3 to the gcc options

 

Add the Compiler Option -static

The -static option links all of the program's library dependencies into the binary itself. The library code the program needs is copied in at link time, alongside the compiled source code. This avoids having to resolve library calls through the dynamic linker at run time, but it increases the size of the binary considerably. Without the option, the program locates its shared libraries dynamically when it runs.

 

Remove the Compiler Option -fno-builtin

The -fno-builtin option tells the compiler not to use its built-in versions of standard library functions, which it would otherwise be free to optimize. Once the option is removed, GCC changes the printf() call to puts(). The difference between the two is that printf() scans its string for format specifiers, while puts() simply writes the string, plus a newline, to the output buffer without any parsing. Other functions used in the background also changed, but I was unable to tell why or for what purpose. I assume most of those changes make the operations faster and more direct, just like printf() becoming puts().
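To make the printf()/puts() relationship concrete, here is a minimal sketch. The helper names hello_printf() and hello_puts() are made up for illustration and are not part of the lab's hello.c. Both helpers write the same bytes to standard output, which is why the compiler can swap one call for the other.

```c
#include <stdio.h>

/* Formatted path: printf() scans the string for '%' conversions
 * before writing it. There are none here, so the text goes out as-is. */
int hello_printf(void)
{
    return printf("Hello World!\n");   /* number of characters written */
}

/* Plain path: puts() writes the string unchanged and appends the
 * trailing newline itself. */
int hello_puts(void)
{
    return puts("Hello World!");       /* non-negative on success */
}
```

The return values are only exposed here so the two calls can be checked. As far as I know, GCC only makes the substitution when the result of printf() is discarded, as it would be in a plain hello.c.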

 

Remove the Compiler Option -g

The -g option adds debugging information to the binary, such as symbol names and source line numbers. Without it, the output of a debugger or disassembler is much less human readable. The -g option makes tracking down problems easier for the developer, but the public binary should not normally be released with it.

 

Add Additional Arguments to the printf()

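When additional arguments are added to printf(), where they go depends on the platform's calling convention. On x86_64 Linux, the first six integer or pointer arguments are loaded into registers, while the rest are written onto the stack. In the listing below, the format string and the values 0 through 40 go into %edi, %esi, %edx, %ecx, %r8d and %r9d, and the values 50 through 100 are stored at offsets from %rsp.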

int main() {
  400500:       55                      push   %rbp
  400501:       48 89 e5                mov    %rsp,%rbp
  400504:       48 83 ec 30             sub    $0x30,%rsp
    printf("Hello World!\n%d%d%d%d%d%d%d%d%d%d%d", 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100);
  400508:       c7 44 24 28 64 00 00    movl   $0x64,0x28(%rsp)
  40050f:       00
  400510:       c7 44 24 20 5a 00 00    movl   $0x5a,0x20(%rsp)
  400517:       00
  400518:       c7 44 24 18 50 00 00    movl   $0x50,0x18(%rsp)
  40051f:       00
  400520:       c7 44 24 10 46 00 00    movl   $0x46,0x10(%rsp)
  400527:       00
  400528:       c7 44 24 08 3c 00 00    movl   $0x3c,0x8(%rsp)
  40052f:       00
  400530:       c7 04 24 32 00 00 00    movl   $0x32,(%rsp)
  400537:       41 b9 28 00 00 00       mov    $0x28,%r9d
  40053d:       41 b8 1e 00 00 00       mov    $0x1e,%r8d
  400543:       b9 14 00 00 00          mov    $0x14,%ecx
  400548:       ba 0a 00 00 00          mov    $0xa,%edx
  40054d:       be 00 00 00 00          mov    $0x0,%esi
  400552:       bf 00 06 40 00          mov    $0x400600,%edi
  400557:       b8 00 00 00 00          mov    $0x0,%eax
  40055c:       e8 7f fe ff ff          callq  4003e0 <printf@plt>
}
  400561:       c9                      leaveq
  400562:       c3                      retq
  400563:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40056a:       00 00 00
  40056d:       0f 1f 00                nopl   (%rax)
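The register order in the listing above can be summarized in a small lookup. This is only a sketch of the integer and pointer argument rule from the System V AMD64 calling convention. The function name arg_location() is hypothetical, and floating-point arguments (which use separate registers) are ignored.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in the order the System V
 * AMD64 calling convention assigns them. */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Returns where the integer argument at the given zero-based position
 * is passed: a register name for the first six, "stack" after that. */
const char *arg_location(size_t index)
{
    size_t nregs = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];
    return index < nregs ? INT_ARG_REGS[index] : "stack";
}
```

For the printf() call above, position 0 is the format string (%edi), positions 1 through 5 are the values 0 through 40, and positions 6 through 11 are the values 50 through 100 that end up on the stack.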

 

Move the printf() Call to a Separate Function Named output(), and Call that Function From main()

Moving the call is straightforward. All that seems to happen is that the output() function gets its own section in the disassembly, which main() calls.

 

Remove the -O0 and Add -O3 to the GCC options

The -O options control how much optimization is applied to the program. The character after -O is the level, where 0 is the minimal level of optimization and 3 is the most optimized. -O3 optimizes aggressively and, in some cases, has the potential to break your program. From -O0 to -O3, different optimization flags are turned on automatically for the developer, but they can also enable these flags themselves for better control and choice.

Here are the important differences in main() (-O3 first, then -O0):
int main() {
    printf("Hello World!\n");
  400410:       bf a0 05 40 00          mov    $0x4005a0,%edi
  400415:       31 c0                   xor    %eax,%eax
  400417:       e9 c4 ff ff ff          jmpq   4003e0 <printf@plt>
and
int main() {
  400500:       55                      push   %rbp
  400501:       48 89 e5                mov    %rsp,%rbp
    printf("Hello World!\n");
  400504:       bf b0 05 40 00          mov    $0x4005b0,%edi
  400509:       b8 00 00 00 00          mov    $0x0,%eax
  40050e:       e8 cd fe ff ff          callq  4003e0 <printf@plt>
}
  400513:       5d                      pop    %rbp
  400514:       c3                      retq
  400515:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40051c:       00 00 00
  40051f:       90                      nop

As you can see, -O3 accomplishes the same task in less code than -O0: it skips the frame pointer setup and replaces the call and return with a single jump to printf().
 

Assembly

During Lab 3 in class we were assigned the task of learning more about how assemblers work and what they are used for. First, though, we had to learn the differences between the x86_64 and aarch64 registers. It's interesting to see the different design philosophy that each platform follows.

The important things to note are the differences and the similarities. For example, ARM instructions are RISC (reduced instruction set computer) style instructions. The benefit is that the instructions are all very simple and fixed-length. x86 has more complicated, variable-length instructions, as it is CISC (complex instruction set computer).

For the lab itself, the task was to build a simple program that ran on both x86_64 and aarch64. The program first looped from 0 to 9, and then had to be extended to loop from 0 to 30. I'm just going to focus on the x86_64 portion of the lab, as the logic for one was the same for the other, with minor tweaks I'll elaborate on later.

The main differences between assembly and higher level languages for me are:
  1. Symbols
  2. Keeping Track of Registers
  3. A Different Way of Thinking

 

Symbols

Although the program is simple in nature, it still taught me the basic building blocks that more complicated programs will need. It is important to allocate memory up front so that it does not have to be moved later. Symbols make sure all of this corresponds to the correct memory addresses, registers or other values.


Keeping Track of Registers

I found the way registers are handled very interesting. They are storage locations inside the processor that help speed up its operations. The only problem and challenge is that there are so few registers to use. I am assuming that later on you will need to point to memory storage dynamically instead of just using registers, but that is yet to be seen. It will be interesting to compare these thoughts later down the road as I learn more in SPO600.


A Different Way of Thinking

If the task had been set in a higher level language, it could be finished in a few lines, as this pseudocode suggests:
for count in range(0, 31):
        print("Loop: " + str(count))

But this was a bit more complicated. For example, although counting past 10 in your head is simple, doing it in assembly with ASCII output isn't. Instead we had to take the number and divide it by 10 using an instruction, then convert the quotient and the remainder into the two digit characters and display them.
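The divide-by-10 step can be sketched in C before translating it to assembly. This is an illustration rather than the lab solution: the function names are invented, the values are assumed to stay between 0 and 99, and printing a leading zero for single-digit numbers is my own choice.

```c
#include <stdio.h>

/* Quotient of n / 10 is the tens digit; adding '0' (ASCII 48)
 * turns the value 0..9 into the character '0'..'9'. */
char tens_digit(int n)
{
    return (char)('0' + n / 10);
}

/* Remainder of n / 10 is the ones digit, converted the same way. */
char ones_digit(int n)
{
    return (char)('0' + n % 10);
}

/* Prints "Loop: NN" for every value from 0 through last (inclusive)
 * and returns the number of lines printed. Assumes 0 <= last <= 99. */
int print_loop(int last)
{
    int lines = 0;
    for (int count = 0; count <= last; count++) {
        printf("Loop: %c%c\n", tens_digit(count), ones_digit(count));
        lines++;
    }
    return lines;
}
```

Calling print_loop(30) covers the 0 to 30 range from the lab. On x86_64 a single div instruction produces both the quotient and the remainder, so the two helper functions correspond to one division in the assembly version.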


Sunday 17 January 2016

Code Building

GNU Units

The first software I will be compiling is GNU Units, which uses the GNU General Public License (GPL). After I downloaded the file through FTP, unzipped it and untarred it, I found there was no makefile at all in the directory. I wasn't sure what the issue was:










I thought maybe I had to run ./install-sh, but was perplexed why it wouldn't work (I assumed it should at least have done something, as it was an executable). I then noticed the INSTALL readme file and opened it up to see what to do.

After reading through the documentation, everything worked and I was able to run units correctly. The configure and make steps both finished very quickly, in less than a few seconds, even though the documentation said otherwise.

Lua

The next software I installed and compiled is Lua, which uses the MIT License. The installation method was easier, as the makefile was already inside the directory. The compiling had some issues.


After googling the fatal error, I discovered the problem was that I had to install the readline development library. After doing so, the project was able to build and install correctly.


Open Source & Contributing To Software

This post will briefly speak of two open source software packages that anybody can contribute to and how it can be done.

parsedatetime Library

The parsedatetime library is an open source Python library that parses human-readable date/time strings and can be obtained at https://github.com/bear/parsedatetime. It supports many different date formats. I've used the library before for personal projects, and I've always found that, no matter the string, it comes out ahead with the correct parse. The license for the project is the Apache License. To contribute:
  1. Go to the issues tab on GitHub to see any problems with ongoing development. https://github.com/bear/parsedatetime/issues
  2. To contribute a patch, make a "pull request" through git for the issue.
  3. Wait for the author to comment and review your proposal.
  4. If it is thoroughly reviewed and approved, the project author will merge the pull request.
The author usually responds quickly, and the pull request can be merged quickly if he approves, sometimes even on the same day. Other times, bugs or problems will be cataloged for others to go through and fix themselves. Although the project has been developed thoroughly over the years, there are still edge cases here and there that the author needs help fixing.

jQuery

jQuery is a JavaScript library designed to simplify the client-side scripting of HTML. jQuery is the most popular JavaScript library in use today. I have personally used it many times while working for clients who required intricate details for their web projects. It can be obtained at https://jquery.com/. It uses the MIT License. To contribute:
  1. Go to https://contribute.jquery.org/ 
  2. Through the side menu decide whether you want to contribute to:
    • Bug Triage
    • Code
    • Community
    • Documentation
    • Support
  3. Establish the need for a fix or feature.
  4. Discuss the ticket with the team.
  5. Create a pull request.
  6. Respond to code reviews and wait for it to be accepted.
When contributing, it's important to follow the style guide found here https://contribute.jquery.org/style-guide/ and to make sure unit tests are added. The community itself offers support in many ways, through forums or even Meetups. This is a quick and easy way to join an open source community.