Friday, February 15, 2008

Finally Here

When I first got to Southern I had my heart set on taking Compiler Construction. So I set out on a mission to go full boar taking as many CS classes as possible to do all I can to make sure I've got the classes I need, and the knowledge I need to be prepared to take it when it gets offered again. So this semester, my opportunity showed up.

We had to do lots of work last semester to get it offered. We don't have a large department, so we had an option of having AI or Computer Construction offered this semester. At first, it seemed as though AI was going to be offered, but I was going to do everything I could do to swing the vote the other way. So a fellow classmate and I started campaigning for it. We spoke to as many students as possible to get the vote up. All we needed were 6 students for the class, and it would be enough to swing it. Well, we got our six. After a month of class, only 6 remain...well, there are still 6, but one guy only comes to class like every two weeks....I doubt he'll last much longer.

So far we've had got two pieces built. You can monitor my progress here. The goal of the class is to compile Decaf, a subset of Java to MIPS. But MIPS isn't very useful, so I thought of compiling to some type of intermediate language. The two options that come to mind are the .NET IL and Java bytecode. I decided I was interested in doing it in Java bytecode. So my prof told me to go get Inside the Java VM at the library. After reading it for a few hours, I realized that a Java class file isn't that complicated to understand. So last weekend I wrote a data structure to store the contents of a Java class file (class.[ch]). Next, I have to implement a few more methods to work with it. Then I need to figure out how the instruction set works. My prof tells me it's a stack based language, so I'll have to check that out some more. I've never worked with one of those before. Should be fun.

In any case, we've got a symbol table and a lexer build already for class. My symbol table, I cheated and just wrote a glib GHashTable wrapper. So that ran fast and reliably. Some are still working on theirs. My lexer on the other hand...that was a pain. Lex is annoying, and quite unpredictable from time to time. Take for instance the following lines from the Rules section of my lex file.



. {BEGIN ERROR1;unput(*yylext);}

If you want to completely understand what is going on, I suggest you find a lex tutorial, but the basic gist is simple. The first line is in the state. If it matchs a character that has been unmatched by all the other regexes from above, then one character is matched, we go into the state and put the character matched back in the buffer. THen the second line will match any single character followed by as many non-operator/whitespace characters to follow. the following is a sample matching

ie: bob?julie;

bob identifier
?julie error

THe use of states would seems to imply that the order of thsoe rules doesn't matter, hence why we use states, but if I switched those two rules, it wouldn't work properly. I think it's because the first line doesn't explicitly state the state, it assumes it...who knows. My lex and yacc book told me that if no state is specified, the INITIAL state is assumed. Who knows. I hate programming by coincidence, but for now, I think I have to deal with it. Cause I don't have time to figure this out.

Anyhow, I hope to be able to post here from time to time to give updates on my progress.