Assertions

Should I use assert to validate parameters?

assert is meant for debugging, as explained on the official site.

Typically, programs can run in two modes:

  • Debug mode: extra checks are performed, making the application run slower but making it easier to find bugs.
  • Release mode: the code is optimized, making it run faster, but if something goes wrong, it’s harder to know why.

An assertion only runs in debug mode. Therefore, your code should never depend on assert for important things, such as parameter validation.
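
In Python specifically, release mode corresponds to running the interpreter with the -O flag, which strips all assert statements. As a minimal sketch of the alternative, the function below validates its parameter with an explicit check that keeps working in both modes. The name set_age is a hypothetical example of ours, not part of any library.

```python
def set_age(age):
    """Hypothetical example: validate a parameter without relying on assert."""
    # An explicit check is never stripped, not even under `python -O`
    if age < 0:
        raise ValueError(f"age must be non-negative, got {age}")
    return age
```

Had the check been written as assert age >= 0, a negative age would pass through silently in release mode.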

When should I use assert?

assert is often used in tests: these run in debug mode (it wouldn’t make any sense to run them in release mode) so we are certain that assertions will be checked.

You can also use assert for sanity checks. Say you are writing a sort function. Sorting is a nontrivial algorithm and things can easily go wrong. It can therefore be helpful to have the sort function check its own results:

def sort(lst):
    # Algorithm that sorts lst and stores it in result

    # Check result
    assert is_sorted(result)
    assert contain_same_elements(lst, result)

    # Return result
    return result

This way, every time you call sort, the code performs a “self check”. If your code fails to sort lst correctly, you’ll immediately get a big AssertionError thrown at you. This is A Good Thing: Fail-Fast truly is your friend.

While these self-checks can slow down your program considerably, remember that you can turn off assertions by running your program in release mode.

Assertions work well because it turns out that checking a solution is often much simpler than finding a solution. For example, we can implement the two checks as

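from collections import Counter

def is_sorted(xs):
    # Every element must be less than or equal to its successor
    return all(x <= y for x, y in zip(xs, xs[1:]))

def contain_same_elements(xs, ys):
    # Same elements with the same multiplicities, regardless of order
    return Counter(xs) == Counter(ys)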
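
Putting the pieces together, a runnable sketch could look as follows. The sorting step is an assumption on our part: a simple insertion sort stands in for whatever algorithm you are implementing, and the function is named my_sort purely for illustration.

```python
from collections import Counter


def is_sorted(xs):
    return all(x <= y for x, y in zip(xs, xs[1:]))


def contain_same_elements(xs, ys):
    return Counter(xs) == Counter(ys)


def my_sort(lst):
    # Stand-in algorithm (insertion sort); any sorting algorithm could go here
    result = []
    for elt in lst:
        i = len(result)
        while i > 0 and result[i - 1] > elt:
            i -= 1
        result.insert(i, elt)

    # Self-checks: skipped when Python runs with -O
    assert is_sorted(result)
    assert contain_same_elements(lst, result)

    return result


print(my_sort([3, 1, 2, 1]))  # [1, 1, 2, 3]
```

If a bug sneaks into the sorting step, the assertions fail at the point where the wrong result is produced, rather than somewhere far away in the calling code.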

Classes

How should I implement __eq__?

__eq__ is the dunder method that corresponds to the lhs == rhs operator. It is a binary operator, i.e., it takes two operands.

Since __eq__ is always defined in a class, say C, you always know something about the type of the left operand: it is an object of type C, or of a child class of C.

The right operand is trickier though: it can be anything. The first thing you’ll probably want to do is to determine its type. When you’re implementing a class C, you should have an idea which other object a C object could be equal to. Generally, you’ll only want to compare your C object with other C objects. For example, a Person object can only be equal to other Person objects.

The first thing you’ll do then is to check the type of the right operand:

class C:
    def __eq__(self, rhs):
        if isinstance(rhs, C):
            # We know rhs has type C, we can use this information to compare self with rhs
        else:
            ???

Your __eq__ could also be able to check equality with other types of objects, so feel free to go through a list of other types:

class C:
    def __eq__(self, rhs):
        if isinstance(rhs, C):
            # ...
        elif isinstance(rhs, str):
            # ...
        elif isinstance(rhs, list):
            # ...
        else:
            ???

Don’t overdo this, however. The meaning of your __eq__ should be intuitive. You should definitely not try to make it overly flexible by allowing comparison with all kinds of types. Sometimes being strict and rigid is the best way to go.

What to do if rhs has a type you don’t support? For example, it makes little sense to compare a Person with a list. You could simply return False. There is a better solution though, and that is to return NotImplemented.

Let’s do a little experiment:

class Foo:
    def __eq__(self, rhs):
        if isinstance(rhs, Foo):
            return True
        else:
            return NotImplemented

>>> foo = Foo()
>>> foo == 5
False

This should surprise you: we compare a Foo object with 5, for which your __eq__ method returns NotImplemented, not False. Why does foo == 5 not evaluate to NotImplemented?

As explained on the official Python pages, when you evaluate x == y, Python will call x.__eq__(y). If this returns NotImplemented, Python will instead try y.__eq__(x). If this again returns NotImplemented, Python falls back to comparing identities (x is y), so x == y evaluates to False for two distinct objects.

The documentation is not completely accurate: it claims that if both x.__eq__(y) and y.__eq__(x) return NotImplemented, an exception will be raised. This is not the case, as is pointed out by this discussion. It is true however for other binary operators.
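
The difference between == and the other binary operators can be checked with a small experiment. The class Stubborn below is a hypothetical example of ours whose comparison methods always give up:

```python
class Stubborn:
    def __eq__(self, rhs):
        return NotImplemented

    def __lt__(self, rhs):
        return NotImplemented


s = Stubborn()

# Both sides give up on ==, so Python falls back to comparing identities
print(s == 5)  # False
print(s == s)  # True: same object

# For <, there is no fallback: Python raises a TypeError
try:
    s < 5
except TypeError as error:
    print("TypeError:", error)
```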

Why does Python operate like this? Why would y.__eq__(x) yield a different result? Wouldn’t that simply be inconsistent?

Consider the following code:

class Foo:
    def __eq__(self, rhs):
        if isinstance(rhs, Foo):
            return True
        else:
            return NotImplemented


class Bar:
    def __eq__(self, rhs):
        if isinstance(rhs, Bar):
            return True
        if isinstance(rhs, Foo):
            return True
        return False

In this case, Foo() == Bar() would return True. But if this is what we want, why doesn’t Foo.__eq__ simply return True instead of NotImplemented when comparing to a Bar?

The Foo and Bar classes are not necessarily defined at the same time. When Foo was written, Bar may not have existed yet, so there was no reason for Foo.__eq__ to contain code for it. Maybe Bar was added only much later, and it was then decided that Foos and Bars should be considered equal.

Maybe you’re wondering if it wouldn’t be better to simply update Foo.__eq__ when Bar was added. This would indeed be a cleaner solution and we wouldn’t need this NotImplemented trickery, but updating Foo might not be an option. Maybe it’s part of a library, maybe the company doesn’t like modifying well-tested code, etc.

Say you develop a Fraction class. The fraction 2/3 would be written Fraction(2, 3) in Python code. You would probably want to allow Fractions to be compared to ints and floats. For example, you would like 1 to be considered equal to Fraction(2, 2).

Having Fraction(2, 2) == 1 is easy to achieve, as this calls Fraction.__eq__, which is under your control. However, for 1 == Fraction(2, 2) to be True, you’d have to somehow update int.__eq__, which is not possible. Thanks to NotImplemented, this is not necessary: (1).__eq__(Fraction(2, 2)) returns NotImplemented, causing Fraction(2, 2).__eq__(1) to be evaluated next, which can return True.
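
As a rough sketch of this idea, the minimal Fraction class below only implements __eq__, comparing by cross-multiplication. It is our own simplified illustration, not the standard library’s fractions.Fraction, and it assumes a nonzero denominator.

```python
class Fraction:
    """Simplified illustration; assumes a nonzero denominator."""

    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def __eq__(self, rhs):
        if isinstance(rhs, Fraction):
            # a/b == c/d if and only if a*d == c*b
            return self.numerator * rhs.denominator == rhs.numerator * self.denominator
        if isinstance(rhs, (int, float)):
            # a/b == n if and only if a == n*b
            return self.numerator == rhs * self.denominator
        return NotImplemented


print(Fraction(2, 2) == 1)      # True: calls Fraction.__eq__ directly
print(1 == Fraction(2, 2))      # True: int.__eq__ gives up, Fraction.__eq__ takes over
print(Fraction(2, 3) == "2/3")  # False: neither side knows how to compare
```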

Should I check parameter types using isinstance?

Short answer: no.

Now for the long answer… Say we want to write our own sum function. (Yes, we know it already exists, but it makes for a good example.) A possible definition would be

def sum(lst):
    # Start from 0 and accumulate every element exactly once
    result = 0
    for elt in lst:
        result += elt
    return result

Now, of course, it makes no sense to call sum on a string, so we can try to impose limitations on lst’s type. We insist lst must be a list:

def sum(lst):
    if not isinstance(lst, list):
        raise TypeError('lst must be a list')
    result = 0
    for elt in lst:
        result += elt
    return result

Okay, this prevents us from passing sum strings, Persons or other weird things as argument. However, we can still pass a list of, say, Persons… Those are not particularly summable. Maybe we should check the elements’ type too.

def sum(lst):
    if not isinstance(lst, list):
        raise TypeError('lst must be a list of ints')
    if not all(isinstance(elt, int) for elt in lst):
        raise TypeError('lst must be a list of ints')
    result = 0
    for elt in lst:
        result += elt
    return result

There. Now sum should be foolproof. Only lists of ints.

But what if we have a tuple of ints? Should we really rewrite a separate sum function to deal with tuples? It would be exactly the same code!

Okay, let’s add some flexibility to our existing sum:

def sum(lst):
    if not isinstance(lst, list) and not isinstance(lst, tuple):
        raise TypeError('lst must be a list or tuple of ints')
    if not all(isinstance(elt, int) for elt in lst):
        raise TypeError('lst must be a list or tuple of ints')
    result = 0
    for elt in lst:
        result += elt
    return result

So, it can be a list or a tuple of ints. But what about sets? Okay, here we go again…

def sum(lst):
    if not isinstance(lst, list) and not isinstance(lst, tuple) and not isinstance(lst, set):
        raise TypeError('lst must be a list or tuple or set of ints')
    if not all(isinstance(elt, int) for elt in lst):
        raise TypeError('lst must be a list or tuple or set of ints')
    result = 0
    for elt in lst:
        result += elt
    return result

Surely we’re done now. Well, not really. What about a list of floats?

def sum(lst):
    if not isinstance(lst, list) and not isinstance(lst, tuple) and not isinstance(lst, set):
        raise TypeError('lst must be a list or tuple or set of ints or floats')
    if not all(isinstance(elt, int) or isinstance(elt, float) for elt in lst):
        raise TypeError('lst must be a list or tuple or set of ints or floats')
    result = 0
    for elt in lst:
        result += elt
    return result

But then there are also Fractions that are addable. And complex numbers. And vectors. And matrices. And quaternions!

This is becoming ridiculous. We need something… saner.

Right now we are checking that our arguments have specific types, but this is actually the wrong approach. We want to focus on the operations available on the arguments instead of their types.

For example, we need to be able to iterate over lst using a for loop. We don’t care if it’s a list, or a set, or a tuple, or anything else: we just want it to be iterable. Something can be looped over if it has an __iter__ method, so that’s what we need to look for.

def sum(lst):
    if not hasattr(lst, '__iter__'):
        raise TypeError()
    if not all(isinstance(elt, int) or isinstance(elt, float) for elt in lst):
        raise TypeError()
    result = 0
    for elt in lst:
        result += elt
    return result

The elements of lst should be addable, so they need an __add__ method.

def sum(lst):
    if not hasattr(lst, '__iter__'):
        raise TypeError()
    if not all(hasattr(elt, '__add__') for elt in lst):
        raise TypeError()
    result = 0
    for elt in lst:
        result += elt
    return result

But what if lst contains a mix of matrices and ints? Both have an __add__ method, but that method will not be happy with its argument types: matrices only want to be added to matrices, not to ints.

As you can see, it becomes quite complex. We need many checks, which makes the code both unwieldy and inefficient. Added to this, if we didn’t check, we’d still be receiving an error message down the road. For example, using a for loop on lst will call __iter__, so the check does happen, just a little bit later. It’s not perfect (e.g., it’s not fail-fast), but it’s something.
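
Dropping the checks altogether leaves a function that accepts anything iterable whose elements can be added to each other. A minimal sketch, named my_sum here to avoid shadowing the built-in sum:

```python
from fractions import Fraction


def my_sum(values):
    # No type checks: iteration and += raise TypeError on their own
    result = 0
    for elt in values:
        result += elt
    return result


print(my_sum([1, 2, 3]))                         # 6
print(my_sum((1.5, 2.5)))                        # 4.0
print(my_sum({1, 2, 3}))                         # 6
print(my_sum([Fraction(1, 2), Fraction(1, 2)]))  # 1

try:
    my_sum(5)
except TypeError as error:
    print("TypeError:", error)  # raised by the for loop itself
```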

There are actually programming languages that support static type checking. These languages check all types before the program runs, at zero runtime cost. However, Python is a dynamically typed language. It’s less robust, but more flexible. It’s a choice.

There is actually a way to add static type checking to Python. It’s not perfect and sometimes clumsy, but it does help.
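
One such approach uses type hints, which an external checker such as mypy can verify before the program runs. A minimal sketch, in which the function name total is our own illustrative choice:

```python
from typing import Iterable


def total(values: Iterable[float]) -> float:
    # The annotations are ignored at runtime; a static checker reads them
    result = 0.0
    for elt in values:
        result += elt
    return result


print(total([1, 2.5]))  # 3.5
# A static checker would reject total("abc") without running the code
```

Note that the hint describes an operation (being iterable) rather than a concrete type such as list, in line with the reasoning above.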

How should I build strings?

There are multiple ways to construct a string.

  • Using addition: "Greetings, " + name. Avoid it: it is the least readable and least efficient approach.
  • Using the % operator: "Greetings, %s" % name. Quirky and limited. Best to avoid.
  • Using str.format: "Greetings, {}".format(name).
  • String interpolation: f"Greetings, {name}". This is the preferred solution. See also PEP 498.
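
Here are the four approaches side by side, using a hypothetical name variable; all of them produce the same string:

```python
name = "Alice"

by_addition = "Greetings, " + name
by_percent = "Greetings, %s" % name
by_format = "Greetings, {}".format(name)
by_fstring = f"Greetings, {name}"

# All four produce the same string
assert by_addition == by_percent == by_format == by_fstring
print(by_fstring)  # Greetings, Alice
```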

What’s the difference between __str__ and __repr__?

Both are methods that are meant to convert an object into a string. You should never call these methods directly, but instead use str and repr:

print(str(some_object))    # Internally calls some_object.__str__
print(repr(some_object))   # Internally calls some_object.__repr__

  • __str__ should return a human-friendly, readable representation of the object.
  • __repr__ should return a string that is actually Python code which allows you to recreate the object.

For example,

class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __str__(self):
        return f'{self.title} written by {self.author}'

    def __repr__(self):
        # {self.title!r} is shorthand for {repr(self.title)}
        return f"Book(title={self.title!r}, author={self.author!r})"
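
To see the difference in practice, here is the same Book class in use. The book chosen is just an example; the eval call illustrates that the output of repr is valid Python code that rebuilds the object.

```python
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

    def __str__(self):
        return f'{self.title} written by {self.author}'

    def __repr__(self):
        return f"Book(title={self.title!r}, author={self.author!r})"


book = Book("Dune", "Frank Herbert")
print(str(book))   # Dune written by Frank Herbert
print(repr(book))  # Book(title='Dune', author='Frank Herbert')

# The repr is valid Python code that recreates an equivalent object
copy = eval(repr(book))
print(copy.title, copy.author)
```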

Git

Why do you inflict Git on us? What did we do to deserve this?

You deserve this because you chose to study in IT…

Admittedly, Git can be difficult to work with at first, but we assure you, once you understand how it works, it holds few surprises and using it will become second nature.

Git is by far the most used VCS (Version Control System). Git was developed by Linus Torvalds. He wanted a decent VCS for the development of Linux but couldn’t find one, so he decided to write his own. Since then, Git has made its way into the industry: Google, Microsoft, Amazon, Twitter, Netflix, … all use it.

How do I get better at Git?

Chapter 2 of the freely available book Pro Git will help out a lot. Chapter 3 is also very useful.

On machine B, create a new directory and clone your repository there.

# Replace URL by your GitHub URL
$ git clone URL

In order to be able to receive updates on machine B, you might want to also add a link to the lecturer’s repo:

$ git remote add upstream https://github.com/UCLL-PR2/exercises.git

Say you work on machine A. You need to store your changes in the repository. This is done as explained on the workflow page:

# Store changes in local repository on A
$ git add FILES
$ git commit -m "MESSAGE"

# Upload changes to GitHub
$ git push

You can then download your changes on machine B:

# Download changes from GitHub
$ git pull

So, in short, you push your changes on one machine and pull them onto the other machine.