What I learned in Boating School: Fall 18 School Recap

I find what I learn in school very exciting and I want to share it with the world. I will be maintaining this series to recap on my favorite topics at the end of each semester.

Two of my favorite and most challenging classes this fall were Computer Architecture and Programming Language concepts and I'm excited to talk about some of the techniques I learned this semester.

Programming Language Concepts

In this class we started with the differences and similarities behind programming language grammars. We also went over the history of early programming langauges such as FORTRAN and COBOL as well as the creation of C.

We learned how Just-In Time compiling works in comparison to machine compiling and the differences in how popular programming languages function.

Our assignments guided us through the creation of a simple compiler for a scripting language that supported integers, strings, if statements, logic operations and arithmetic operations. This included not only the parsing of code into lexemes (lexical analysis) but also involved syntax analysis and constructing a parsetree of the code within the confines of an EBNF Grammar.

In order to complete the compiler I exercised a few new features of C++.

Derivation of a Programming Language

Derivation refers to the repeated application of a defined set of rules called a grammar. A Context-Free Grammar is used to define the proper syntactic structure of a program. A grammar breaks down sentences into tokens or lexemes and also defines the proper order of operations within a statement. Using the grammar, we can define how to correctly parse a programs source code.

When following a grammar you iterate one token at a time and compare it with the previous token to determine if it is a valid sentence.

At some points, you may need to compare one token ahead in order to correctly interpret a statement, one example is when assigning a variable.

For example, when you see a = you will have to peek at the next token or right-hand side in order to determine the value assigned to that operator.

Example Derivation

For the following grammar:

<E> -> <T> { PLUS <T> }
  <T> -> <P> { * <P> }
  <P> -> VAR | NUM

Creates the following derivation for: 8 + a * 4

<E>
  <T> { PLUS <T>}
  <P> { STAR <P> } { PLUS <T> }
  NUM { STAR <P> } { PLUS <T> }
  8 { STAR <P> } { PLUS <T> }
  8 { PLUS <T> }
  8 PLUS <T> { PLUS <T> }
  8 + <T> { PLUS <T> }
  8 + <P> { STAR <P> } { PLUS <T> }
  8 + VAR { STAR <P> } { PLUS <T> }
  8 + a { STAR <P> } { PLUS <T> }
  8 + a STAR <P> { STAR <P> } { PLUS <T> }
  8 + a * <P> { STAR <P> } { PLUS <T> }
  8 + a * NUM { STAR <P> } { PLUS <T> }
  8 + a * 4 { STAR <P> } { PLUS <T> } //{} are optional and can be tossed
  8 + a * 4

Extending Classes by Overloading Operators

When writing classes you might want to use one of the standard operators when working with them. The most relevant example is with c++'s std::string class enabling you to concatenate two strings by adding them.

You can implement the operator override in the class for the object you wish to use with it.

class Score {
    ...
public:
    Score operator+(const Score& other) {
        return this.getScore() + other->getScore();
    }
    Score operator++(const int other) {
        return this.getScore() + other;
    }
};

In this example I have a Score object which I would like to increment with the ++ operator. The ++ operator by default only works with integers but you may be familiar with how it is overloaded in std::map and std::vector so that you can add an element with myMap[value]++.

You instantiate an operator overload by calling the operator function. It takes one parameter which will refer to the right-hand side of the operator.

Operator Overloads are a nice way of providing syntactic sugar however for some solutions it may be necessary. Such as when building the parse tree and having it handle arithmetic:

class Value {
    bool    bval;
    int     ival;
    string  sval;
    enum VT { isBool, isInt, isString, isTypeError } type;
    //enum somewhere else, just return type

public:
    Value() : bval(false), ival(0), type(isTypeError) {}
    Value(bool bval) : bval(bval), ival(0), type(isBool) {}
    Value(int ival) : bval(false), ival(ival), type(isInt) {}
    Value(string sval) : bval(false), ival(0),
    	sval(sval), type(isString) {}

    // in case of an error, I use the value to hold error message
    Value(string sval, bool isError) : bval(false),
    	ival(0), sval(sval), type(isTypeError) {}

...
getters and setters
...

And the code for overloading + towards the bottom of the Value class.

Value operator+(const Value& v) { //compare the l/rhs
    //add two ints with the standard +
    if(this->isIntType() && v.isIntType()) {
      int result = this->getInteger() + v.getInteger();
      return result;
    }
    //adding two strings to concatenate them
    if(this->isStringType() && v.isStringType()) {
        string result = this->getString() + v.getString();
        return result;
    } else {
        RunTimeError("Cannot add these two values");
        return Value();
    }   
}

This is a snippet of the value class which was used as a container for each element in the parsetree. This value has to implement it's own arithmetic because + doesn't know how to add two Value objects, nor does it know how to add an integer and a Value object.

In the operator+ overload we can take the Value object as a parameter, use it's member functions to get the integer value from both the left and right hand sides of the +, add them using a normal + and then return them back as the result of the overloaded +.

Restrictions

You can overload almost any c++ operator. You can even overload bitshift operators such as <<. However, we cannot overload the following:

:: (scope resolution)
. (member access)
.* (member access through pointer)
?: (ternary conditional)

Additionally, if you wish to overload the pointer reference overload (->) you will have to return to it a raw pointer address or an entire object.

Sadly we cannot create new operators. For example you cannot make =^= an operator as cool as it looks.

Conclusion

The EBNF notation defines not only the syntactical rules of a programming language, it also dictates the order or operations within a languages statement. Programming language grammars make it easier for us to identify and eliminate ambiguity in an language's rules.

Operator overloads expands how we can define syntax for member functions in a class. This helps us make Object Oriented code much more readable because we can use small statements instead of entire functions.

Understanding how programming languages are interpreted by the compiler helps to understand their syntax and nuances much faster, and with more depth, than just memorizing the apis through practice.