Thursday, May 12, 2016

Better algorithms



Computers are getting faster and faster everyday. Hard disk drives have more capacity now that I ever dreamt thirty years ago, when I started programming. Memories have more capacity and are much faster. Now I don't need to turn my computer on and go to have some coffee while it gets ready to work. This happens in seconds.

This is all very good, but nothing is perfect. You know what they say about that old Chinese symbol called Yin-Yang. There is always something bad inside good things, and always something good inside bad things.



The good part of older computers was the fact we, programmers, had to deal in too much optimization. We had to be aware of every single byte, of every single millisecond spend.

The bad part of new computers is the fact we don't need to do the same. Many people say "Oh, I don't need to optimize that much, because computers are faster now", or then "If the memory is not enough to run this system we may buy some more", or even  "We may upgrade the processor if this one is too slow to run this".

As a general result, many algorithms and data structures are not so thought about as they could be. I'll give you an example in this post.

A student of mine was trying to solve this simple problem: "Given a list of n integer numbers, how many divisors of these numbers are also divided by 2 themselves?"

In order to solve this problem he wrote this Ruby code and asked me to evaluate it.

divisors01.rb

def divide(d,n)
    ((n % d) == 0)
end

def even_divisors(n)
    cnt = 0
    (2..n).each { |d|
        cnt = cnt + 1 if (divide(d,n) && divide(2,d))    
    }
    cnt
end

_n = $stdin.gets.to_i

(1.._n).each { |r|
    _prov = $stdin.gets.to_i 
    puts even_divisors(_prov)
}

As you may easily see, this does the job. It reads the number of elements in the list then loops reading the elements and invoking the method even_divisors who prints the number of divisors of the parameter received who are also even numbers.

But this method, as you may see, may take a lot to run for each number. It loops from 2 to n one by one, which is not necessary, since we don't need to count the odd divisors. Then, we test the divisibility of the parameter n given by (n-1) numbers, but half of the tests are not necessary.

Under my suggestion he changed the code this way.

divisors02.rb

def divide(d,n)
    ((n % d) == 0)
end

def even_divisors(n)
    cnt = 0
    d = 2
    while (d <= n) do
        cnt = cnt + 1 if divide(d,n)
        d = d + 2
    end    
    cnt
end

_n = $stdin.gets.to_i

(1.._n).each { |r|
    _prov = $stdin.gets.to_i 
    puts even_divisors(_prov)
}

Now we are much better. Instead of that range block (2..n) we are looping from 2 to n in steps of two, testing only the even numbers. This, of course, will reduce running time. But I insisted with him it is not optimal yet.

The fact is, we have a method named divide with a single line and we may move this line to our looping block, avoiding an unnecessary method call, this way:

divisors03.rb

def even_divisors(n)
    cnt = 0
    d = 2
    while (d <= n) do
        cnt = cnt + 1 if ((n % d) == 0)
        d = d + 2
    end    
    cnt
end

_n = $stdin.gets.to_i

(1.._n).each { |r|
    _prov = $stdin.gets.to_i 
    puts even_divisors(_prov)
}

We run a simple benchmark to see the differences of performances of these three codes. This is the result:



As you may see, the first code takes 2.5937 times the time taken by the second. And when compared to the third it takes 3.8604 times the time the third method takes.

You may also see that when you compare the second method with the third, it takes 1,48837 times the time the third takes. This is due to the method call. We seldom think about what happens when we call a method. There are a lot of tasks to be performed and these tasks take time.

Now you may ask me: "How may I know when I should create a separate method or when I must keep the method code inside the loop?"

It is easy to know. If the code is simple enough not to make the logic inside the loop too complex and it will be executed just from one point, keep it inside the loop. Create a new method only when this is needed to keep the DRY principle.