Project 2 EECS 370

Starting from:

$29.99

Project 2 EECS 370

Project 2 EECS 370

Worth: 100 points

**NOTE** : No late days for 2z

1. Purpose

The purpose of this project is to help you understand the assembling and linking process, which we can utilize to create multi-file LC-2K projects, and to help you understand how procedure calls work in assembly language.

2. Problem

[[assembly files]] -assembler--> [[object files]] -linker--> [executable]

This project has three parts. In the first part, you will write a program
that assembles an assembly file into an object file. In the second, you will write a program to link object files into one executable consisting of machine code, which your project 1 simulator will be able to run. In the third, you will write an assembly-language function to recursively compute combination(n,r)(i.e., n choose r).

In Project 1a, you wrote an assembler which took an LC-2K assembly file as input and produced an executable file as output. This approach is fine if all the code needed is contained in one file, but what happens if we want to use other pieces of code? Libraries contain functions that make coding easier. Splitting code into multiple files encourages modularity and organization. Multiple files are also important for large projects; if you modify one file, you only need to re-assemble that one and then link everything together. Now that we have a better understanding of translation software, we can create a separate assembler and linker.

3. Assembler (30%)

Your new assembler will take in a single assembly file (see 3.1) as input and output a single object file (see 3.2).

3.1.1 Assembly File Format

Assembly language programs will be of the same format as those from Project 1, with a few extra restrictions.

The first part of the assembly file must contain only assembly instructions. The second part should contain only .fill assembler directives. For example, suppose an assembly file is composed of M instructions and N .fill’s. Lines 0 to (M-1) contain actual instructions, and lines M to (M+N-1) contain .fill's, with no mixing between them.

LC-2K files may now use global symbolic addresses, which means we must now distinguish between local and global labels. The scope of a local label is the file the label is defined in. The scope of a global label is all object files linked together (more on this in part 2l). Because of this, different object files can use local labels with the same name and still be linked together.

Local symbolic addresses must be defined at assembly time. However, a global symbolic address can be undefined at assembly time. It is assumed that undefined global labels are defined in another file to be linked at compile time, so they should be temporarily resolved as address 0 in the text and data segments. Defined symbolic addresses should be resolved exactly as they were in Project 1.

You can assume assembly files max out at 65536 total instructions and data, although we’ll test you on much, much less than that.

3.1.2 LC-2K Peculiarities Part 1

Firstly, local labels will start with a lowercase letter {a…z} while global labels start with a capital letter {A…Z}. This is unique to LC-2K as a way to distinguish between local and global labels.

Secondly, if a branch instruction contains a symbolic address, the label it refers to must be a locally defined label. This label can be either a local or global label. Branching to another file (undefined global label) is bad style and makes linking needlessly difficult. A programmer should use jalr in this case. Keep this in mind when considering what to add to the Symbol and Relocation Tables.

Thirdly, in LC-2K, loading or storing to an absolute address no longer makes sense. The locations of data and text within the final executable file will likely be different than in the original object file, leading to unintended execution. While this isn’t something we will enforce with error checking, it is recommended that labels are used when dealing with loads and stores. In reality, there are reasons to use absolute addressing: memory mapped IO for example (if you’re curious about this, take EECS 373 **shameless plug**). If you come across a label with a constant offset, just assemble as you would in Project 1.

Fourthly, local labels do not require symbol table entries. However, a local symbolic address does need a relocation table entry as the address of the local label might change. These addresses can be fixed by calculating the new local label location during linking.

3.1.3 Assembly File Format Summary

In summary, assembly file formatting rules are:
1.) Do not mix instructions with directives (.fills)
2.) Instructions come first
3.) Directives come second
4.) Defined symbolic addresses are resolved exactly as they were in
the Project 1 assembler
5.) Undefined global symbolic addresses are temporarily resolved as
address 0
6.) Local labels start with {a…z} and must be defined at assembly
7.) Global labels start with {A…Z} and can be undefined at assembly
8.) Branches cannot use undefined global symbolic addresses
9.) Loads and stores should use symbolic addresses (but are not
required to)

3.2 Object File Format

Object files will contain the following sections in the following order: Header, Text, Data, Symbol table, and Relocation table.

*** Refer to lecture for a detailed explanation of each section. ***

--------------------------------------------------------------------------
Table 1: Object file sections
--------------------------------------------------------------------------
Section Name Number of lines Description
--------------------------------------------------------------------------
Header Fixed: 1 The Header contains the size, in
lines, of the sections to follow.
Sizes are listed in the following order, each separated by a space: Text, Data, Symbol table, Relocation table.

Text Variable: t Each line in the Text segment
t = # of instr. consists of a single machine
code instruction, assembled in
the same way as instructions in
Project 1.

Data Variable: d The Data segment contains data
d = # of .fills stored by assembler directives,
one word of data per line.

Symbol table Variable: s Each line in the Symbol table
s = # of global consists of a global label, one
labels + # of letter (T/D/U) corresponding to
Unresolved global Text, Data, and Undefined
symbolic addresses respectively, and a line offset
from the start of the T/D section
(0 if the letter was ‘U’). Each
value separated by a space, in
that order. Each symbol should
only appear once in the symbol
table, even if it is used multiple
Times. Entries can appear in any order.

Relocation Variable: r Each line in the Relocation table
Table r = # of symbolic consists of a line offset from the
addresses used start of the T/D section
(whichever section the symbol was
used in), an opcode, and a label.
Each separated by a space, in
that order. Entries can appear in
any order.
--------------------------------------------------------------------------

IMPORTANT FORMATTING NOTES:

1.) Assembly code in text should be assembled EXACTLY as it was for
project 1. This means symbolic addresses are resolved the same, with the exception of undefined global symbolic addresses which are temporarily assembled as 0.
2.) Offsets in the Symbol and Relocation Tables indicate the
line offset of the label from the start of the either the Text or Data section (whichever the section the label appears in).

For example, the symbol table entry “negOne D 0” indicates the label negOne appears on the zeroth line in the Data section. The relocation table entry “4 lw negOne” indicates the symbolic address negOne is used on the fourth line (zero indexed) of the Text section by a lw instruction.

3.3 Error Checking

Your assembler should catch the following errors in assembly files:
1.) duplicate defined labels (same local or global label within one
assembly file)
2.) undefined local symbolic address
3.) beq using an undefined global symbolic address
4.) offsetFields that don't fit in 16 bits
5.) unrecognized opcodes

Your assembler should exit(1) if it detects an error and exit(0) if it finishes without detecting any errors. Your assembler should NOT catch simulation-time errors, i.e. errors that would occur at the time the assembly-language program executes (e.g. branching to address -1, infinite loops, etc.).

3.4 Assembly example

Please see section 9 example 2a.

3.5 Running Your Assembler

Write your program to take two command-line arguments. The first argument is the filename where the assembly-language program is stored, and the second argument is the filename where the output (the object file) is written. For example, with a program name of "assemble", an assembly-language program in "program.as", the following would generate an object file "program.obj":

./assemble program.as program.obj

Note that the format for running the command must use command-line arguments for the file names (rather than standard input and standard output). Your program should store only object files in the format specified above. Any deviation from this format (e.g. extra spaces or empty lines) will render your machine-code file ungradable. Any other output that you want the program to generate (e.g. debugging output) can be printed to standard output.

3.6 Test Cases

The test cases for the assembler part of this project will be short assembly-language programs that serve as input to an assembler. You will
submit your suite of test cases together with your assembler, and we will grade your test suite according to how thoroughly it exercises an assembler. Each test case may be at most 50 lines long, and your test suite may contain up to 20 test cases. These limits are much larger than needed for full credit (the solution test suite is composed of 5 test cases, each < 10 lines long). See section 7 for how your test suite will be graded.

Hints: the example assembly-language program above is a good case to include in your test suite, though you'll need to write more test cases to get full credit. Remember to create some test cases that test the ability of an assembler to check for the errors in section 3.3.

4. Linker (40%)

Now that you’ve written an assembler to create object files, you need a way to link these files together. In this part of the project, you will write a linker to combine multiple object files into a single executable.
This final executable will be backwards compatible with the simulator from project 1.

4.1 LC-2K linker description

Your linker should be able to take an arbitrary number of object files as input. It will concatenate all text and data segments within each object file, creating one unified executable. Segments should be combined in the order they appear as arguments. The combined text section should be placed before the combined data section. Then, for each object file, the linker iterates through their relocation table. For each relocation entry, the linker iterates through all symbol table entries to locate the label and fix the reference. The final executable will be a machine code file.

4.2 What about main()?

You might be asking yourself, what will be executed first? Shouldn’t there be a main() function or label?

To simplify the process of linking and simulating, LC-2K code is executed starting at the first line of in a machine code file. In order to specify what object file should execute first, ordering of the linker’s arguments is needed.

./linker file_0.obj file_1.obj machine_code.mc

In the above example, file_0.obj contains the first instructions to be executed. The resulting machine code file should be laid out as follows:

-------- machine_code.mc --------

<file_0.obj TEXT>
<file_1.obj TEXT>
<file_0.obj DATA>
<file_1.obj DATA>

For more information on the linker’s command line arguments, please see section 4.7. For more information on how linker’s actually handle this, see section 4.4.

4.3 Stack Label

As discussed in lecture, programs build up stack frames as they execute. The stack is important for storing data that can’t fit within a machine’s registers, such as stack frames and local data. You’ll implement a recursive LC-2K program in part 2c that will make use of the stack, but we first must provide support for this functionality. As shown in example 2c, this is done by using a global label, “Stack”.

In LC-2K, the “Stack” label is a special label that should not be defined by any object file, but it can be used as a symbolic address. This label is inserted by the linker and should refer to the line after the last piece of data in the data segment. For example, if there are M instructions and N pieces of data in the final executable, the linker should resolve the symbolic address, “Stack”, as (M + N). This allows the stack to grow without affecting the instructions or data.

4.4 LC-2K Peculiarities Part 2

Programming languages often specify where to begin executing. In reality, a linker typically inserts an object file into the linking process. This inserted code appears first and jumps to a specified function to begin executing the program, among doing other things. The LC-2K method of ordering files during the linking process to indicate what to execute first is a simplification.

LC-2K also lacks a proper function call instruction that jumps to labels. It instead jumps to registers that hold function addresses (so the register is a function pointer). This means that a function can have a local label, yet still be accessible from other files, so long as the function pointer is global. Linking should still succeed in this case.

Additionally, LC-2K’s use of the “Stack” label doesn’t reflect how all assembly languages use the stack. ARM, for example, has special instructions such as push and pop that directly interface with the stack, providing a layer of abstraction to assembly programmers. The stack is typically allocated by an operating system that passes the stack pointer to an executing program.

4.5 Error Checking

Your linker should catch the following errors:
1.) duplicate defined global labels
2.) undefined global labels
3.) “Stack” label defined by an object file

Your linker can assume that any object file used as input is properly formatted.

4.6 Tip - Local Labels

Fixing local symbolic addresses during linking can be tricky, since we don’t have symbol table entries associated with them. It might help to store certain data for each file read in: text size, data size, text starting location (in final mc), and data starting location (in final mc). By also storing which file each relocation table entry is in, you should have all the data needed to adjust each local symbolic address.

Actually fixing a local symbolic address in the relocation involves several steps. First, identify which section of the file the label is in, either text or data. Second, parse the original symbolic address value from the instruction referenced by the relocation entry. Fix this value by adding an offset to the address, to account for the new location of the local label.

4.7 Linker Example (Please see section 9 example 2l.)

4.8 Running Your Linker

Write your program to take N command-line arguments, where N >= 2. The first argument is the object file to execute first, arguments 2 through N-1 are additional object files (these are not required), and the Nth argument is the filename where the machine code output is written. For example, with a program name of "linker" and an assembly-language program in "prog_1.obj" and “prog_2.obj”, the machine code file "prog.mc" will be generated:

./assemble prog_1.obj prog_2.obj prog.mc

The number of object files your linker must be able to link together is between 1 and 6. If a program is self-contained within one object file, your linker should still be able to translate it into a machine code file. We will not test you on linking more than 6 object files.

Note that the format for running the linker must use command-line arguments for file names (rather than standard input and standard output). Your program should store only machine-code in the format specified above. Any deviation from this format (e.g. extra spaces or empty lines) will render your machine-code file ungradable. Any other output that you want the program to generate (e.g. debugging output) can be printed to standard output.

4.9 Test Cases

Test cases for the linker part of this project will be short, valid assembly-language programs that, after being assembled into object files, serve as input to a linker. You will submit a suite of test cases together with your linker, and we will grade your test suite according to how thoroughly it exercises an LC-2K linker. Each test assembly file may be at most 50 lines long, and your test suite may contain up to 20 test cases. A test can contain no more than 6 assembly files to be linked together. These limits are much larger than needed for full credit. See Section 7 for how your test suite will be graded.

A naming scheme is needed to specify what test assembly files should be linked together. A single “test” refers to a group of 1 or more assembly files to be linked together. The naming scheme is as follows.

<test name>_<{0, …, N}>.as

All tests with the same test name will be assembled and linked together (do not include angled brackets in the test name). An underscore character, ‘_’, separates the test name and the assembly file’s number (do not include angled brackets or curly brackets in the number). Assembly files within the same test should be numbered starting at zero, with the zeroth assembly file being the first code to be executed.

The following testcases:

test_0.as test_1.as test_2.as anotherTest_0.as anotherTest_1.as

Will be assembled and then linked by the autograder as follows:

./linker test_0.obj test_1.obj test_2.obj test.mc
./linker anotherTest_0.obj anotherTest_1.obj anotherTest.mc

DO NOT use more than one underscore in your test case names. We will not grade your test case if you do.

Remember to create some test cases that test the ability of a linker to check for the errors in Section 4.5.

5. Assembly-Language Function to Compute Combination(n,r) (30%)

Write an assembly-language function to recursively compute combination(n,r), defined as:

combination(n,0) = 1
combination(n,n) = 1
combination(n,r) = combination(n-1,r) + combination(n-1,r-1);

0 <= r <= n

Here's a C function to do this; your assembly-language program may follow this logic closely.

int combination(int n, int r)
{
if (r==0 || n==r) {
return 1;
} else {
return combination(n-1,r) + combination(n-1,r-1);
}
}

Note that the definition of combination(n,r) is recursive. Your assembly-language function must also use a recursive function call. That is, your function should call itself twice (to compute combination(n-1,r) and combination(n-1,r-1)) and add the results together. This isn't the most efficient solution (e.g., n > 9 will take a LONG time), but using recursion will help you understand assembly-language procedure calls and stacks. Your program will be passed inputs n and r in registers 1 and 2 respectively. Your program should store the result in register 3, which acts as the function’s return value. Straightforward solutions are possible in about 45 lines. If your program is significantly longer than that, you are probably doing something wrong. Your program must be reasonably efficient, i.e., it must execute at most 5000 instructions for n=7, r=3 (this is several times slower than a good solution).

Passing parameters to subroutines can be confusing in assembly language. It's easiest if you make most registers callee-save so the caller can assume the subroutine leaves those registers unchanged. These callee-saved registers are typically pushed on the stack at the beginning of the subroutine and popped off the stack just before the subroutine returns.

Not all registers can or should be callee-save, however. For example, one
register will contain the subroutine's return value. Also, the jalr instruction used to return from the subroutine will leave a register with a different value than it held when the subroutine was called.

Here is a small LC-2K program that uses a subroutine call. It takes
an argument in register 1 and calls a subroutine to compute the quantity 4*input. Register 1 is used to pass input to the subroutine; register 3 is used by the subroutine to pass the result back. The current top-of-stack (first empty location) is given by “Stack” + register 5.

-------- main.as ----------

lw 0 1 input r1 = memory[input]
lw 0 4 SubAdr prepare to call sub4n. r4=addr(sub4n)
jalr 4 7 call sub4n; r7=return address; r3=answer
halt
input .fill 10

-------- sub4n.as ---------

sub4n lw 0 6 pos1 r6 = 1
sw 5 7 Stack save return address on stack
add 5 6 5 increment stack pointer
sw 5 1 Stack save input on stack
add 5 6 5 increment stack pointer
add 1 1 1 compute 2*input
add 1 1 3 compute 4*input into return value
lw 0 6 neg1 r6 = -1
add 5 6 5 decrement stack pointer
lw 5 1 Stack recover original input
add 5 6 5 decrement stack pointer
lw 5 7 Stack recover original return address
jalr 7 4 return. r4 is not restored.
pos1 .fill 1
neg1 .fill -1
SubAdr .fill sub4n contains the address of sub4n

The stack array starts at the implicit label "Stack" and extends to larger addresses, which is why the linker inserts the “Stack” label as the last line in the final executable (see section 4.3 for more).

The combination function you submit must be an assembly file. Your function will be called by the following main function:

-------- combination_main.as ---------

lw 0 1 n
lw 0 2 r
lw 0 4 Caddr load combination function address
jalr 4 7 call function
halt
n .fill 7
r .fill 3

Your combination function must follow these guidelines:
1.) Accept n as an argument passed in register 1
2.) Accept r as an argument passed in register 2
3.) Store the starting line of your combination function by
including the following line in data:
Caddr .fill <start_of_function>
<start_of_function> is a label at the start of your function
4.) Use recursion by building a stack frame for each function called
5.) Return the result in register 3
6.) Use register 7 as the return address
7.) DO NOT use a global register
8.) DO NOT use a global register
seriously, we’ll detect it

A global register is one that all recursive calls to a function use. For example, one could simply increment reg3 at every leaf function. While this would compute a correct result, it does not accurately simulate how a recursive function works.

Make sure your assembly-language function works when assembled and linked with combination_main.as. Use the Project 2 assembler and linker and the Project 1 simulator, since that's how we'll test it. Programs that don't work with the solution assembler/linker/simulator will receive a 0.

5.1 Tips

One difficult aspect of this program will be allocating variables
in the 8 LC-2K registers. To make your job easier, here are the
register assignments we used--some are enforced, others are not:

r0 value 0
r1 n input to function - ENFORCED
r2 r input to function - ENFORCED
r3 return value of function - ENFORCED
r4 local variable for function
r5 stack pointer
r6 temporary value (can hold different values at different times,
e.g., +1, -1, function address)
r7 return address - ENFORCED

5.2. Competition

There is an optional competition this semester for designing an efficient solution to the combination function. The fastest average across numerous inputs will win a prize. The only rule is as follows:

* You cannot return pre-computed values *

Anyone who can beat a cycle count of *less than 110,000* for C(14,7) will receive 5 bonus points to the project. Please note the competition will test more than just C(14,7), this bar is just to encourage you to attempt this efficient solution.

Submissions for the competition will be accepted one week after the submission deadline for the other portions of the project. Late days may NOT be used to submit competition functions -- we will only accept submissions that come before the deadline listed at the beginning of this document.

Please submit to 2z instead of 2c or 2s.

6. Compiling the Project

Your code will be compiled with the GCC compiler using the C99 standard. The following bash command compiles program.c and writes the executable into program.

gcc -std=c99 program.c -o program

7. Grading, Auto-Grading, and Formatting

We will grade primarily on functionality, including error handling, correct assembly, and comprehensiveness of the test suites.

To help you validate your project, your submission will be graded automatically, and the result will be mailed back to you. You may then continue to work on the project and re-submit. To deter you from using the autograder as a debugger, you will receive feedback from the autograder only for the first THREE SUBMISSIONS for each project part on any given day. That is, you will receive an e-mail with your score for any part of the project only three times on any given day. All subsequent submissions will be silently graded. Your final score will be derived from your overall best submission to the autograder.

The feedback from the autograder will not be very illuminating; it won't tell you where your problem is or give you the test programs. The purpose of the autograder is to let you know that you should keep working on your project (rather than thinking it's perfect and ending up with a 0). The best way to debug your program is to generate your own test cases, figure out the correct answers, and compare your program's output to the correct answer. This is also one of the best ways to learn the concepts in the project.

The student suites of test cases will be graded according to how thoroughly they test both the assembler (for part 2a) and linker (for part 2l). We will judge thoroughness of the test suites by how well it exposes potential bugs.

For the assembler test suite, the auto-grader will use each test case as input to a set of buggy assemblers. A test case exposes a buggy assembler by causing it to generate a different answer from a correct assembler.

For the linker test suite, the auto-grader will first assemble the test files and use them as input to a set of buggy linkers. A test case exposes a buggy linker by causing it to generate a different answer from a correct linker. Test cases must use the naming scheme specified in section 4.9.

The test suites are graded based on how many of the buggy assemblers / linkers were exposed by at least one test case. This is known as "mutation testing" in the research literature on automated testing.

8. Turning in the Project

Use the submit370 program to submit your files. The full name is /afs/umich.edu/class/eecs370/bin/submit370, but you could add /afs/umich.edu/class/eecs370/bin/ to your path so you don't have to keep typing in the full name. submit370 submits the set of files associated with a project part (assembler, ...), and is called as follows:

submit370 <project-part> <file1> <file2> <file3> ... <filen>

Here are the files you should submit for each project part:
1) assembler (part 2a)
a. C program for your assembler (name should end in ".c")
b. suite of test cases (each test case is an assembly-language
program in a separate file)

example:
submit370 2a assemble.c test1.as test2.as test3.as

2) linker (part 2l)
a. C program for your linker (name should end in “.c”)
b. Suite of test cases (each test case is a set of assembly-language
programs using the naming scheme specified in section 4.9.

example:
submit370 2l linker.c test0_0.as test1_0.as test1_1.as

3) combination (part 2c)
a. assembly-language function for computing combination(n,r)

example:
submit370 2c comb.as

9. Sample Test Cases

** Example 2a **
Here is a multi-file assembly-language program that counts down from 5, stopping when it hits 0, and then halts:

-------- main.as ----------

lw 0 1 five
lw 0 4 SubAdr
start jalr 4 7
beq 0 1 done
beq 0 0 start
done halt
five .fill 5

-------- subone.as --------

subOne lw 0 2 neg1
add 1 2 1
jalr 7 6
neg1 .fill -1
SubAdr .fill subOne

And here are the corresponding object files after running the following lines of code:

./assembler main.as main.obj
./assembler subone.as subone.obj

NOTE: Text within parenthesis should not be included in your assembler’s or linker’s output.

-------- main.obj ---------

6 1 1 2 (Header)
8454150 (Text)
8650752
23527424
16842753
16842749
25165824
5 (Data)
SubAdr U 0 (Symbol Table)
0 lw five (Relocation Table)
1 lw SubAdr

-------- subone.obj -------

3 2 1 2 (Header)
8519683 (Text)
655361
25034752
-1 (Data)
0
SubAdr D 1 (Symbol Table)
0 lw neg1 (Relocation Table)
1 .fill subOne

** Example 2l **
This example uses the object files from example 2a. Here is the machine code produced after the linking process:

./linker main.obj subone.obj count5.mc

-------- count5.mc -------

8454153 (main.as TEXT)
8650763
23527424
16842753
16842749
25165824
8519690 (subone.as TEXT)
655361
25034752
5 (main.as DATA)
-1 (subone.as DATA)
6

This code can be simulated using your project 1 simulator.

10. Linker Starter code

The following starter code is meant to help you read in and parse object files. It is probably a good idea to break it up into different functions, but this should get you started.

NOTE: Please see starter file.