Sunday, February 25, 2018

A small adventure into Julia macro land

The Julia Manual teaches us that
Julia evaluates default values of function arguments every time the method is invoked, unlike in Python where the default values are evaluated only once when the function is defined.
in Noteworthy differences from Python section.

However, sometimes you want a value to be evaluated only once when the function is defined. Recently a probably obvious fact has downed on me that this can conveniently be achieved using macros. Here is a simple example:

macro intvec()

function f(x)
    v = @intvec()
    push!(v, x)

When you run this code you can observe that Hey! is printed once (when @intvec is evaluated).

Now let us check how the function works. Running:

for i in 1:5


[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]

and we can see that @intvec was not run (no Hey! is printed). This is natural - macros are evaluated only once before the program is actually executed.

Another small example using comprehensions:

a = [Int[] for i in 1:3]
b = [@intvec() for i in 1:3]
push!(a[1], 1)
push!(b[1], 1)

Now let us compare the contents of a and b:

julia> a
3-element Array{Array{Int64,1},1}:

julia> b
3-element Array{Array{Int64,1},1}:

And we see that in case of  b each index points to the same array.

One might ask if it is only a special case or it does actually mater in daily Julia usage. The situation where this distinction is important came up recently when writing documentation of @threads macro. If you check out a definition of f_fix function there you will find:

function f_fix()
    s = repeat(["123", "213", "231"], outer=1000)
    x = similar(s, Int)
    rx = [Regex("1") for i in 1:nthreads()]
    @threads for i in 1:3000
        x[i] = findfirst(rx[threadid()], s[i]).start
    count(v -> v == 1, x)

where we use Regex("1") instead of a more natural r"1" exactly because the latter would create only one instance of regex object.

So the question is what is the benefit of r"1" then? The answer is performance - we have to compile the regex only once. This saves time if a function containing it would be called many times, e.g.:

julia> f() = match(r"1", "123")
f (generic function with 2 methods)

julia> g() = match(Regex("1"), "123")
g (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark f()
  memory estimate:  240 bytes
  allocs estimate:  4
  minimum time:     139.043 ns (0.00% GC)
  median time:      143.627 ns (0.00% GC)
  mean time:        170.929 ns (12.91% GC)
  maximum time:     2.854 μs (90.67% GC)
  samples:          10000
  evals/sample:     916

julia> @benchmark g()
  memory estimate:  496 bytes
  allocs estimate:  9
  minimum time:     5.754 μs (0.00% GC)
  median time:      6.687 μs (0.00% GC)
  mean time:        7.313 μs (0.00% GC)
  maximum time:     97.039 μs (0.00% GC)
  samples:          10000
  evals/sample:     6

The lesson is typical for Julia - you can squeeze out a performance but there are consequences that you should be aware of.


  1. The manual says "A macro maps a tuple of arguments to a returned expression...". I found your results a bit confusing until I realized `@intvec` isn't returning an expression. If you use `macro intvec2() quote Int[] end end` you get a distinct array on each invocation. Why do repeated calls to `push!(@intvec,1)` always return `[1]` and not return Vectors of increasing size? I expect the latter from your `b = [@intvec() for i in 1:3]` example.

    1. This is exactly the point - macro does not have to return an expression. And, for instance, this is what regex macro does. The reason why repeated calls to `push!(@intvec,1)` return `[1]` is that the macro is invoked every time you make this call, but if you would write for example `for i in 1:5 global x = push!(@intvec, i) end` at the end `x` will contain five elements as `@intvec` is invoked only once when the loop is parsed.