Sunday, February 25, 2018

A small adventure into Julia macro land

The Julia Manual, in its Noteworthy differences from Python section, teaches us that:
Julia evaluates default values of function arguments every time the method is invoked, unlike in Python where the default values are evaluated only once when the function is defined.

However, sometimes you want a value to be evaluated only once, when the function is defined. Recently a probably obvious fact has dawned on me: this can be conveniently achieved using macros. Here is a simple example:

macro intvec()
    println("Hey!")
    Int[]
end

function f(x)
    v = @intvec()
    push!(v, x)
    v
end

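When you run this code you can observe that Hey! is printed exactly once: at the moment @intvec is expanded, which happens while the definition of f is being processed, before f is ever called.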

Now let us check how the function works. Running:

for i in 1:5
    println(f(i))
end

produces:

[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]

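and we can see that @intvec was not run again (no Hey! is printed) and that every call pushes to the same vector. This is natural - a macro is expanded only once, before the code containing it is executed, and here the macro returns the array itself rather than an expression that creates one, so that single array is embedded in the body of f.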
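For contrast, here is a minimal sketch of two macros that differ only in what they return. The names @sharedvec and @freshvec, and the functions shared and fresh, are made up for this illustration. A macro that returns a value splices that very object into the surrounding code, while a macro that returns a quoted expression makes the surrounding code evaluate that expression every time it runs:

```julia
# Hypothetical macros, introduced only for this comparison.
macro sharedvec()
    Int[]        # a value: the array is created once, at macro expansion time
end

macro freshvec()
    :(Int[])     # an expression: evaluated each time the enclosing code runs
end

shared() = @sharedvec()
fresh() = @freshvec()

push!(shared(), 1)   # mutates the single array embedded in shared
push!(fresh(), 1)    # mutates a temporary array that is then discarded

println(shared() === shared())   # identity check: same object on every call?
println(fresh() === fresh())     # identity check: a new array on every call?
println(length(shared()), " ", length(fresh()))
```

So if you want the usual "fresh value on every call" behavior from a macro, return an expression; return a value only when sharing is what you are after.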

Another small example using comprehensions:

a = [Int[] for i in 1:3]
b = [@intvec() for i in 1:3]
push!(a[1], 1)
push!(b[1], 1)

Now let us compare the contents of a and b:

julia> a
3-element Array{Array{Int64,1},1}:
 [1]
 Int64[]
 Int64[]

julia> b
3-element Array{Array{Int64,1},1}:
 [1]
 [1]
 [1]

And we see that in the case of b every element refers to the same array, so a push! through b[1] is visible through b[2] and b[3] as well.
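One way to confirm the aliasing without mutating anything is to compare elements with ===, which tests object identity. The sketch below repeats the definitions so that it runs on its own (the println call inside the macro is left out for brevity):

```julia
macro intvec()
    Int[]    # one array, created when the macro is expanded
end

a = [Int[] for i in 1:3]        # Int[] is evaluated on every iteration
b = [@intvec() for i in 1:3]    # the macro is expanded once for the whole comprehension

println(a[1] === a[2])              # are the first two elements of a the same object?
println(all(x -> x === b[1], b))    # are all elements of b the same object?
```

The first check prints false and the second prints true, matching the contents shown above.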

One might ask whether this is only a special case or whether it actually matters in daily Julia usage. A situation where the distinction is important came up recently when writing the documentation of the @threads macro. If you check out the definition of the f_fix function there, you will find:

function f_fix()
    s = repeat(["123", "213", "231"], outer=1000)
    x = similar(s, Int)
    rx = [Regex("1") for i in 1:nthreads()]
    @threads for i in 1:3000
        x[i] = findfirst(rx[threadid()], s[i]).start
    end
    count(v -> v == 1, x)
end

where we use Regex("1") instead of the more natural r"1" precisely because the latter would create only one Regex object, which all threads would then share. (The names nthreads, threadid and @threads come from Base.Threads.)
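The single-instance behavior of r"1" can be checked directly, since r"..." is itself a string macro (@r_str) that builds the Regex when it is expanded. This is only a small sketch, and the function names lit and ctor are hypothetical:

```julia
lit() = r"1"           # string macro: the Regex is built once, at expansion time
ctor() = Regex("1")    # constructor call: a new Regex is built on every call

println(lit() === lit())      # do two calls return the very same object?
println(ctor() === ctor())    # or a distinct object each time?
```

The first line prints true and the second prints false, which is why the comprehension in f_fix has to call the constructor to obtain one regex per thread.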

So the question is: what is the benefit of r"1" then? The answer is performance - the regex has to be compiled only once. This saves time if a function containing it is called many times, e.g.:

julia> f() = match(r"1", "123")
f (generic function with 2 methods)

julia> g() = match(Regex("1"), "123")
g (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark f()
BenchmarkTools.Trial:
  memory estimate:  240 bytes
  allocs estimate:  4
  --------------
  minimum time:     139.043 ns (0.00% GC)
  median time:      143.627 ns (0.00% GC)
  mean time:        170.929 ns (12.91% GC)
  maximum time:     2.854 μs (90.67% GC)
  --------------
  samples:          10000
  evals/sample:     916

julia> @benchmark g()
BenchmarkTools.Trial:
  memory estimate:  496 bytes
  allocs estimate:  9
  --------------
  minimum time:     5.754 μs (0.00% GC)
  median time:      6.687 μs (0.00% GC)
  mean time:        7.313 μs (0.00% GC)
  maximum time:     97.039 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     6

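In this benchmark the r"1" version is roughly 40 times faster than the one that constructs the regex on every call. The lesson is typical for Julia - you can squeeze out extra performance, but there are consequences that you should be aware of.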