I’m not trying to turn this into language A vs. B debate, its just that something interesting happened last night. I’m trying to learn both Scala and Ruby. I’m a bit more enthused by Scala at this point, mostly because I somewhat prefer static typing and I’m a long time Java guy. Now, I am also a long time Perl guy, so I’m not necessarily choosing Scala over Ruby because of lack of experience with dynamic languages.
So, with this said, the best way to learn language idioms is to start actually implementing something. A sample application is good, but due to frameworks and relatively low business logic/algorithm ration to these sample apps, at least the ones I can conceive, you mostly spend time learning the framework idioms vs. actual languages ones.
So I thought I’d start by implementing some algorithms in these languages. I decided on Merge Sort. It’s a O(n log n) divide and conquer algorithm. See the link for more info.
Let’s start off with the code. Again, this is not done in an OO and/or reusable manner, this is mostly procedural/functional code to test mostly functional idioms.
Scala code:
import java.math._
import scala.util._
import scala.collection.mutable._
object MergeSort extends Application {
println("Starting...");
val list = initListWith(1000)
val start = System.currentTimeMillis()
println("Sorted " + merge_sort(list.toList).length + " elements.")
val end = System.currentTimeMillis()
println("Total time: " + (end - start))
def merge_sort(list:List[Int]): List[Int] = {
if (list.length <= 1) {return list}
var (left: List[Int], right: List[Int]) = list splitAt Math.round(list.length/2.0).toInt
left = merge_sort(left); right = merge_sort(right)
if(left.last > right.head)
merge(left, right);
else
left ::: right
}
def merge(l:List[Int], r:List[Int]):List[Int] = {
var result:ListBuffer[Int] = new ListBuffer()
var left = l; var right = r;
while(left.length > 0 && right.length > 0) {
if(left.head <= right.head) {
result += left.head
left = left.tail
} else {
result += right.head
right = right.tail
}
}
if(left.length > 0)
result.toList ::: left
else
result.toList ::: right
}
def initListWith(limit:Int):List[Int] = {
val list:ListBuffer[Int] = new ListBuffer()
val randGen = new Random()
(0 until limit).foreach( i => list += randGen.nextInt(1000000) )
return list.toList
}
}
And Ruby code:
def merge_sort(*list)
return list if list.size <= 1
result = []
middle = (list.size/2.0).round
left = list[0, middle]
right = list[middle..-1]
left = merge_sort(*left)
right = merge_sort(*right)
if left[-1] > right[0]
result = merge(left, right);
else
result = left + right
end
return result
end
def merge(left, right)
result = Array.new()
while left.size > 0 && right.size > 0
if left.first <= right.first
result << left.slice!(0)
else
result << right.slice!(0)
end
end
if left.size > 0
result.concat(left)
else
result.concat(right)
end
return result
end
list = (1..1000).collect {|i| rand(1000000); }
puts "Starting..."
start = (1000 * Time.now.to_f).to_i
puts "Sorted " + merge_sort(*list).size.to_s + " elements."
end_time = (1000 * Time.now.to_f).to_i
total_time = end_time - start
puts "Total time: #{total_time}"
So all works great, they both perform the operation correctly. But here is the weird part. JVM bytecode (Scala is very similar to Java) is supposed to be super fast. That’s the assumption I’ve always made on various readings as well as empirical data. Now, check out these benchmarks…
$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 10000 elements. Total time: 374 $ ruby merge_sort.rb Starting... Sorted 10000 elements. Total time: 138
I’m not timing the compilation/startup phase, as you can see from the code. In the case of 10K records, ruby performs almost 3 times as fast. It the case of 100K records:
$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 100000 elements. Total time: 31213 $ ruby merge_sort.rb Starting... Sorted 100000 elements. Total time: 7681
Ruby is 4 times faster than Scala in this case, of course the growth is relative to the (n log n) performance.
I’m not focusing on Ruby here, since Scala is what I’m concerned about. I used a List for most operations that didn’t require in place modifications. I used a ListBuffer for mutable lists, which advertises constant time append/prepend operations. I’m not sure what I’m missing. I’m going to play with it a bit more to see where the culprit is, but again, I’m not a Scala expert, though not very familiar with its collections library and the ins/outs of each implementation.
Any ideas?
** Update **
So after Maxim’s comment, I updated Scala to use
while(left.lengthCompare(0) > 0 && right.lengthCompare(0) > 0)
vs.
while(left.length > 0 && right.length > 0)
All I can say, Scala #$@%^ing rocks. So it seems like the length calculation on a mutable collection is the culprit, which makes perfect sense.
Using lengthCompare(l:Int):Int gets rid of the brute force length calculation. Here is what I get from the docs…
Result of comparing length with operand l. This method is used by matching streams against right-ignoring (…,_*) patterns.
I’m going to dig into the source to figure out how it does this comparison in a bit, but I can only guess that if say, we’re comparing it to length 0, all it needs is to get to the first element in the collection to figure out that the result is false, no need to actually get the full length.
Did I say culprit, well, let me say, BIG culrpit. Here are the benchmark results now…
Scala:
$ scalac merge_sort.scala && scala MergeSort Starting... Sorted 100000 elements. Total time: 387
Ruby:
$ ruby merge_sort.rb Starting... Sorted 100000 elements. Total time: 7667
Wow, what a drastic improvement.
*** Disclaimer: This means nothing in terms of benchmarking! Please don’t waste your time disqualifying this benchmark, because I know it really means nothing in this small context as well as it means nothing without _power of a test measurement. Benchmarks are crap unless properly and fairly compared. This post is not about benchmarking languages, rather I wanted to see how the various idioms and libraries perform while learning both languages. It turned out to be very useful, as a small recommended tuning in Scala’s collections lib completely turned things inside out_ ***
Any ruby folks out there care to comment?
Tags: algorithms, ruby, scala