wtorek, 26 marca 2013

SuperFib - an example of using lexical closure in Python

This story begins with simple implementation of a function that calculates Fibonacci numbers.

def fib(n):
    return 1 if n == 1 or n == 2 else fib(n-1) + fib(n-2)


Assuming that we are using 1-based indices, we can run a couple of tests and see that it works. Of course, fib(35) takes some time to calculate - about 15 seconds on my laptop.

Now, if you don't know what lexical closure is, I recommend reading about it on Wikipedia) before going further. Let's try to cache the results, but also be better than using conventional memoization - not only will we cache the final result of each call, but also intermediate results.

def fib():
    _dict = dict()
    def inner(n):

        nonlocal _dict
        if n == 1 or n == 2:
            return 1
        else:
            try:
                return _dict[n]
            except KeyError:
                _dict[n] = inner(n-1) + inner(n-2)
                return _dict[n]
    return inner


f = fib()
f(1000) returns instantly, with:
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875

However, if I call f(1500) without calling f(1000) first, the calculation will not finish and my interpreter will crash or at least restart because of stack evaluation overflow.

But if I ran f(1000) and then f(1500) and then f(2000) and so on and so forth, advancing by 500 each time, I can very quickly get to quite far elements of Fibonacci sequence. Calculating 10 000th element is quick and easy, compared to conventional approach. We can quickly get to the point where printing the numbers to the console takes more time than actually calculating them.


piątek, 22 marca 2013

Unit Testing code with IO file operations (in Python)

We may often come across a piece of code that was written without Unit Tests at all. In addition, the piece of code may be dealing with IO like file writing and reading, which makes it more difficult to Unit Test it when we are trying to refactor and modify. Let's suppose the code in question looks like this:

def writeInitialsToFile(filename, name, surname):
    initials = name[0] + '.' + surname[0] + '.'
    with open(filename, 'w') as file:
        file.write(initials)


def readInitials(filename):
    initials = None
    with open(filename, 'r') as file:
        initials = file.readline()
    return initials


A straightforward and bad idea would be to write a couple of Unit Tests that make use of a real file and simply test the reading and writing. Is therea a better way to test this code?

First of all, we need a way to replace the real file with something else. For both reading and writing we will now have a couple of functions, one that expects a stream for reading or writing and the other that creates the stream and calls the corresponding reading or writing function:

def readInitialsFromFileStream(fileStream):
    return fileStream.readline()


def readInitialsFromFile(filename):
    initials = None
    with open(filename, 'r') as fileStream:
        initials = readInitialsFromFileStream(fileStream)
    return initials


def writeInitialsToFileStream(fileStream, name, surname):
    initials = name[0] + '.' + surname[0] + '.'
    fileStream.write(initials)


def writeInitialsToFile(filename, name, surname):
    with open(filename, 'w') as fileStream:
        writeInitialsToFileStream(fileStream, name, surname)






Now, we can test at least the pair: readInitialsFromFileStream and writeInitialsToFileStream. In order to test these functions, we don't even need to create a file in the file system - we can just pass something that has similar characteristic, but is not a real file: io.StringIO.

A quasi - Unit Test piece (without the full unittest.TestCase class, to shorten things a bit) would look like this:

testReadingOfInitialsFromFileStream:
testStream = io.StringIO()
testStream.write('T.M.')
testStream.seek(0)
assert('T.M.', readInitialsFromFileStream(testStream))

testWritingOfInitialsToFileStream:
testStream = io.StringIO()
writeInitialsToFileStream(testStream, 'Thomas', 'Mann')
testStream.seek(0)
assert('T.M.', testStream.readline())



Why is it better than using a real file? Some of the reasons (probably not all) are:
  • unit tests should work flawlessly regardless of the environment; when using real file, we may have different permissions depending on the computer and operating system where the tests are run
  • some tests (although this was not necessarily clear from the example) may require that the file has certain name or even certain content - otherwise they'll fail; by using a real file with pre-set name and content we are creating an unnecessary dependency that may impact the maintainability and, in the long run, usefullness of the tests
  • in order to avoid using a real file we had to separate the creation of a stream from writing to / reading from the stream; this forced us to make a first step towards fulfilling the Single Responsibility Principle

niedziela, 17 marca 2013

Long if-else statement without if (in Python)

Let's suppose we find ourselves in a situation where we have no other option than to write a lengthy if-else statement (definitely not something that would make cleaner by polimorphism).

def inevitable_long_if (n):
    if n == 0:
        print('Zero')
    elif n == 1:
        print('One')
    elif n == 2:
        print('Two')
    else:
        print('Manu-manu')


Can we re-write it so that we do not use IFs?

import collections
import functools

d = collections.defaultdict(lambda : functools.partial(print, 'Manu-manu'))
d[0] = functools.partial(print, 'Zero')
d[1] = functools.partial(print, 'One')
d[2] = functools.partial(print, 'Two')



Now, let's call some to see how it works:


d[0]()  ==>  Zero
d[1]()  ==>  One
d[2]()  ==>  Two
d[7]()  ==> Manu-manu

The first version is 9 lines long, the second one (without imports) is 4 lines long. To me, the second version is also much more readable, but you decide.

czwartek, 14 marca 2013

Scrum? Yes, I'll have one, please!

There is one of those soft skills training like "Conflict management" or "Team building". The trainer is a woman with psychological background. After the usual part that is delivered during every soft skills training (and, surprisingly, this part is almost always the same on every training) there is a break. The trainer asks people what they do in their job and it may well be that she is really interested in what people say, or maybe she's just pretending. Regardless, she goes like this:

- Could you tell me more about this Scrum thing you mentioned? How do you do it exactly?

One of the trainees feels he's the one who must speak up:

- Well, we meet every day at 9:00 and we talk briefly what we did yesterday and what we have to do today and whether anyone has any obstacles. This is called "stand-up". Also, the whole project is divided into a number of short periods called iterations and there is the planning at the beginning of each and (...)

The trainer is impressed:

- Wow, that sounds like a great idea! I just wonder... could that Scrum thing be used in another sort of business, not necessarily software?

And then there is the key turn of action: the trainee ponders for a while... some more... OK, the neurons have worked out the answer:

- Hmmm.... in general yes, you can. I think you could well adapt it to another domain.

- Oh my, that is so great! I have to sell this idea to the marketing team I have a training with this week!

Let's stop the story here. Scrum is a framework for producing software. Agile Manifesto has been written with software in mind. There is NO evidence that these things would effectively work in, or could be applied to, any other kind of business. Sorry, no. As professionals working in the software business we should not be deceiving people from other businesses that this is something they can use as well.

wtorek, 12 marca 2013

Pure functions in Object Oriented World

Let's suppose we have a class called Time with method convertToGMT(...). The method has one parameter called time, it converts the parameter to the GMT and returns the result.

For some reason (and please let's don't try to question this reason for the sake of this short exercise) the actual time zone of the caller is an instance variable of the Time object. It may be set when the object is constructed or re-set afterwards.

convertToGMT simply checks the time zone by accessing the instance variable timezone, performs calculation and based on that returns the GMT result.

When writing a couple of unit tests to test how convertToGMT works for different time zones, we would have to repeatedly:
  • set the time zone by calling the presumed Time.setTimeZone(...) method
  • call convertToGMT and check the result
We would do that for each time zone - time-to-be-converted pair we want to test. Clearly, this function is not a pure function, since its result depends on the state of the object. Programmers with no functional background would probably not notice the impurity. However, we could legitemately ask the question: can we make our tests look cleaner in this case?

We could add a helper function _convertToGMT (with underscore) that would take two parameters instead of one: actual time and timezone. The function would be a pure function. We could write all our test cases without meddling at all with the state of the Time object, just checking the result of calling _convertToGMT(time, timezone) each time for different pair of parameters. Then, after covering all desired test cases, we would add just one or a couple to test that convertToGMT correctly makes use of _convertToGMT by passing the instance variable timezone as the second parameter.

Of course, this is a very simple example with just one parameter and one instance variable, but in your real code you can have more interesting cases where this technique could be useful.

How is it called? I have no idea. If you have seen this already elsewhere, please let me know.

środa, 6 marca 2013

Clojure is great!

I've been playing with Clojure for a couple of months now and I want to share with you some thoughts why I think it is a great language.

Clojure is a Lisp

Many wise people have written great things on Lisp, so I'm not even trying to present arguments for it; just try out these:
  • http://www.paulgraham.com
  • http://www.gigamonkeys.com/book
Clojure integrates easily with Java libs

Everybody knows it. Nothing interesting to add here.

Clojure implements modern concurrency techniques

Clojure implements STM(http://en.wikipedia.org/wiki/Software_transactional_memory). It features Atoms (abstractions for managing state synchronously), Agents (abstractions for managing state asynchronously, by applying function to agent's state), and lock-free handling of scenarios that would have to be handled with mutexes in other programming languages.

Don't be deceived that Clojure is "nice, because it revitalised Lisp or JVM". The key point is that it features modern approach to concurrency. Read this excellent explanation by Rich Hickey:
http://clojure.org/state


poniedziałek, 4 marca 2013

Code Kata: Eating habits and Markov chains

The Wikipedia entry:
http://en.wikipedia.org/wiki/Markov_chain#Introduction
contains nice and clear explanation of the idea of Markov chains. Based on the paragraph about eating habits, we can form an interesting coding exercise:
  • A person eats lunch every day
  • We form a set of rules similar to the one presented in Wikipedia - one part of the exercise is how to represent these rules in a program. What would be the convenient way to represent this internally and how would you translate a possible textual representation (such as one entered from standard input) into the internal format? Would the two format be quite similar or quite different from each other?
  • We run a simulation for a number of days - say, 365 days - and we get statistical distribution of the number of each type of meals that would be eaten
Does it seem like a long program to write? My initial solution in Python was 23 lines long and then refined solution took just 14 lines.