Primitive types are not your friends

Really. Stop (ab)using just strings and integers!

Yes, they're the building blocks of whatever you do. But if you abuse them, you're not taking full advantage of your object-oriented programming language.

Let me explain with a concrete example. For the sake of conciseness, I'll use Python here, but the concepts fit most OO languages.

Classes are your friends

Really. Classes and type systems weren't invented to make you miserable and bloat your code. They were invented to help you write correct code, letting the machine verify that you're doing things properly and consistently.

But if you don't use them, the machine can't help you. And no, using a dynamically (yet strongly) typed language isn't a good reason to skip correctness and proper OO design.

Let's start with some examples:

def retrieve_web_resource(http_url):  
    #[...]
    requests.get(...)

What is that http_url? In many codebases it will be a string, but that is not a best practice. Primitive types like strings carry very little semantics and do not enforce domain boundaries. What does that mean?

It means that somewhere in your code you have something like

    my_http_url = somewhere.create_url_from()

(with my_http_url still being of type str) and then you carry it around as a str. Maybe, at some point, you manipulate it. And at some point, because of a bug or because some unexpected data was thrown in, it ceases to be a valid HTTP URL.

This means that, in many parts of your code, whenever you use such a value, you may be tempted to check some preconditions:

def retrieve_web_resource(http_url):
    assert http_url.startswith("http"), "not an http url"

But then, you need to check those preconditions everywhere!

And if you pass your variable around and at some point pick a less-than-meaningful variable name, it becomes hard to tell what the variable was supposed to contain, either when debugging or after a serious error has mangled its content.
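To make the problem concrete, here is a minimal sketch of how a plain str can stop being a valid URL without anyone noticing. The helper name `add_search_term` and the example.com address are made up for illustration.

```python
def add_search_term(http_url, term):
    # Naive string manipulation: assumes the URL has no query string yet
    # and that the term needs no escaping. Both assumptions are wrong below.
    return http_url + "?q=" + term


url = "http://example.com/search?lang=en"
broken = add_search_term(url, "cats & dogs")

# The value is still a str and still starts with "http", so the usual
# precondition check passes even though the URL now has two "?" and an
# unescaped "&".
assert broken.startswith("http")
print(broken)
```

Nothing in the type of `broken` says that something went wrong, which is why the checks end up repeated in every function that receives it.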

So, what is a much better solution? Create the right type, always, and use those types in your application. Example:

class HTTPURL(object):
    def __init__(self, scheme, host, path, query=None):
        if scheme != "http":
            raise ValueError("not an http url")
        # validate host
        # validate path
        # validate query - should be a param-value dict
        self._scheme = scheme
        self._host = host
        self._path = path
        self._query = query if query is not None else {}

    @classmethod
    def parse_from_str(cls, http_url_as_str):
        # parse the url into scheme, host, path and query, then
        return cls(scheme, host, path, query)

    def add_query_param(self, param, value):
        # add param and value to dictionary, return a new
        # HTTPURL object with that additional value. Raise
        # an error if such value is already defined
        raise NotImplementedError  # left as a sketch

    def append_path(self, additional_path):
        # properly append additional_path, checking for
        # missing or duplicate initial slashes, etc, then
        # return a new HTTPURL object.
        raise NotImplementedError  # left as a sketch

    def __str__(self):
        # create the url from current variables, and return it as
        # a str
        raise NotImplementedError  # left as a sketch

(this is simplified - you might want to add auth data, port, etc)
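For readers who want something they can run, here is one possible way to fill in the outline above in Python 3. It is only a sketch under a few assumptions of mine: parsing is delegated to the standard library's `urllib.parse`, the validation rules are deliberately minimal, and port, auth data and fragments are ignored.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


class HTTPURL:
    """Value object for plain http URLs (no port, auth data or fragment)."""

    def __init__(self, scheme, host, path="/", query=None):
        if scheme != "http":
            raise ValueError("not an http url")
        if not host:
            raise ValueError("missing host")
        if not path.startswith("/"):
            raise ValueError("path must start with '/'")
        self._scheme = scheme
        self._host = host
        self._path = path
        # Copy the dict so later changes by the caller cannot affect us.
        self._query = dict(query) if query is not None else {}

    @classmethod
    def parse_from_str(cls, http_url_as_str):
        parts = urlsplit(http_url_as_str)
        query = dict(parse_qsl(parts.query))
        return cls(parts.scheme, parts.hostname, parts.path or "/", query)

    def add_query_param(self, param, value):
        if param in self._query:
            raise ValueError("query param already defined: %s" % param)
        new_query = dict(self._query)
        new_query[param] = value
        return HTTPURL(self._scheme, self._host, self._path, new_query)

    def append_path(self, additional_path):
        # Join with exactly one slash, whatever the two pieces look like.
        new_path = self._path.rstrip("/") + "/" + additional_path.lstrip("/")
        return HTTPURL(self._scheme, self._host, new_path, self._query)

    def __str__(self):
        url = "%s://%s%s" % (self._scheme, self._host, self._path)
        if self._query:
            url += "?" + urlencode(self._query)
        return url


url = HTTPURL.parse_from_str("http://example.com/search?lang=en")
url2 = url.append_path("/images").add_query_param("q", "cats & dogs")
print(url2)  # http://example.com/search/images?lang=en&q=cats+%26+dogs
```

Every method that changes something returns a new object through the constructor, so the validation in `__init__` runs again and the original instance is left untouched.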

See the point? You're making sure that no invalid HTTPURL instance can exist, and you're providing proper methods to manipulate such instances. This way you're encapsulating your domain knowledge about URLs in a specific class instead of scattering and duplicating it everywhere, and you're making it really hard for users to make accidental mistakes - they would be caught quickly.

The HTTPURL we've seen here is a composite type, and for it this approach makes a lot of sense. But in Python you can also use Tiny Types in a very, VERY efficient fashion when you just need to add some semantics and some validation, without a full-fledged class, because you don't need that kind of manipulation:

class Angle(int):
    def __new__(cls, v):
        if isinstance(v, Angle):
            return v
        # check the type first, so the range check only ever sees ints
        if not isinstance(v, int):
            raise TypeError("invalid angle type, must be int")
        if v < 0 or v > 360:
            raise ValueError("invalid angle, must be within 0 and 360")
        return super(Angle, cls).__new__(cls, v)

See? The instances you create are of type Angle, but they're still ints! So you can pass them straight to other APIs and libraries that know nothing about your custom types, and yet you get semantics and validation!
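A small sketch of what that looks like in practice, in Python 3. The class is repeated so the snippet runs on its own, and `math.radians` and the `labels` list simply stand in for "code that knows nothing about Angle".

```python
import math


class Angle(int):
    def __new__(cls, v):
        if isinstance(v, Angle):
            return v
        if not isinstance(v, int):
            raise TypeError("invalid angle type, must be int")
        if v < 0 or v > 360:
            raise ValueError("invalid angle, must be within 0 and 360")
        return super().__new__(cls, v)


right_angle = Angle(90)

# An Angle passes any isinstance check written for plain ints.
assert isinstance(right_angle, int)

# The standard library accepts it like any other number.
radians = math.radians(right_angle)

# It also works wherever an int is required, such as list indexing.
labels = ["N", "E", "S", "W"]
heading = labels[Angle(180) // 90]
print(radians, heading)
```

Invalid values never get this far: `Angle(400)` raises ValueError and `Angle("90")` raises TypeError at construction time.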

Of course, if you need more operations you'll need to redefine more methods, because arithmetic inherited from int returns a plain int rather than an Angle:

class Angle(int):
    def __new__(cls, v):
        if isinstance(v, Angle):
            return v
        # check the type first, so the range check only ever sees ints
        if not isinstance(v, int):
            raise TypeError("invalid angle type, must be int")
        if v < 0 or v > 360:
            raise ValueError("invalid angle, must be within 0 and 360")
        return super(Angle, cls).__new__(cls, v)

    def __add__(self, other):
        # build the result through Angle() so it gets validated again
        return Angle(int(self) + int(Angle(other)))

angle = Angle(10)
angle2 = Angle(2)
angle3 = Angle(angle2)

print(angle + angle2)
print(type(angle + angle2))

12
<class '__main__.Angle'>
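To see why each operation has to be redefined, here is a short Python 3 sketch comparing the overridden `__add__` with operations that are still inherited from int. The variable names are only for illustration.

```python
class Angle(int):
    """An int restricted to the 0..360 range."""

    def __new__(cls, v):
        if isinstance(v, Angle):
            return v
        if not isinstance(v, int):
            raise TypeError("invalid angle type, must be int")
        if v < 0 or v > 360:
            raise ValueError("invalid angle, must be within 0 and 360")
        return super().__new__(cls, v)

    def __add__(self, other):
        return Angle(int(self) + int(Angle(other)))


total = Angle(10) + Angle(2)   # goes through Angle.__add__
diff = Angle(10) - Angle(2)    # __sub__ is inherited from int
flipped = 5 + Angle(10)        # the left operand's int.__add__ is used

print(type(total).__name__, type(diff).__name__, type(flipped).__name__)
# Angle int int

# Because __add__ builds its result through Angle(), an out-of-range
# sum is rejected instead of silently becoming an invalid angle.
try:
    Angle(350) + Angle(20)
except ValueError as exc:
    overflow_error = str(exc)
```

If you want subtraction or reflected addition to stay within the type as well, `__sub__` and `__radd__` need the same treatment as `__add__`.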

Do you think all that is too verbose, and you'd just prefer to stick to strings and ints? Well... I suggest you try this approach, check how many bugs you prevent and/or how your code becomes more expressive and cohesive, and then think again about verbosity. If your code is more compact at the expense of correctness and automated checking, maybe that compactness isn't worth it.

Alan Franzoni

Trieste, Italy