Erlang for Python programmers: Part III

September 24, 2007

Previous parts: Intro, Part I and Part II

Today we are going to take a look at functions.
Erlang has no direct counterpart to the following Python features:

1. default values, keyword arguments and formal parameters of the form **name

def foo(key, name='ruslan', type='unknown', **kw):
    print 'key = %s, name = %s, type = %s' % (key, name, type)
    for key, val in kw.items():
        print '%s = %s' % (key, val)
>>> foo(1)
key = 1, name = ruslan, type = unknown
>>> foo(1, type='known', x=1, y=10)
key = 1, name = ruslan, type = known
y = 10
x = 1

2. formal parameters of the form *name

def double_sum(*args):
    return sum(x * 2 for x in args)
>>> double_sum(2, 4, 5)
22

Let’s take a look at Erlang’s function declaration syntax (slightly adapted from the Erlang Reference Manual):

  1. A function declaration consists of function clauses separated by semicolons (;) and terminated by a period (.).
  2. A function clause consists of a head and a body separated by ->.
  3. A clause head consists of the function name (which is an atom), an argument list (each argument is a pattern) and an optional guard sequence beginning with the keyword when.
  4. A clause body consists of a sequence of expressions separated by commas (,).

Name(Pattern11,…,Pattern1N) [when GuardSeq1] ->
Body1;
…;
Name(PatternK1,…,PatternKN) [when GuardSeqK] ->
BodyK.

where each Body has the form:
Expr1,
…,
ExprN

The number of arguments of a function is called its arity. A function is uniquely identified by the combination of module name, function name and arity, often written as m:f/N. Two functions in the same module with the same name but different arities are completely different functions; remember that.

Enough dry theory, let’s define our Erlang factorial function in the module mymath (file mymath.erl):

-module(mymath).
-export([factorial/1]).

factorial(0) ->
    1;
factorial(N) ->
    N * factorial(N-1).

What we have here:

1) a declaration of a single function called factorial
2) the function has two clauses
- the first clause, terminated by a semicolon (;)

factorial(0) ->
    1;

- the second and final clause, terminated by a period (.)

factorial(N) ->
    N * factorial(N-1).

3) Each clause has a head and a body. The head consists of the function’s name, in our case factorial, followed by the argument list, which contains patterns. In the first clause the pattern is 0, in the second it is N.

How this works:

When the function is called, its clauses are pattern matched against the passed arguments. Matching proceeds in the order the clauses are written in the module. So when you enter

1> mymath:factorial(5).
120

first 5 is matched against the clause factorial(0) and fails as 5 != 0, then 5 is matched against N in the second clause and the match succeeds. As a result the unbound variable N in the pattern becomes bound to 5. This leads to execution of the body of the second clause, namely N * factorial(N-1). This repeats recursively until N becomes 0 and the first clause, factorial(0), is selected, terminating the recursion.
Without pattern matching, the same factorial function looks like this in Python:

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n-1)

In the Python version the branching is “hardcoded” in the function body.

The benefit of pattern matching is that when the conditions inside a function become complex, you can often express them more clearly and with less code. And in a big function, handling a new case may not require modifying the existing body at all: you just add another clause with the corresponding pattern.

Earlier I mentioned that function clauses are matched in the order they are defined in the module. For our function this is important. If we swap the clauses we get a problem, because the clause that terminates the recursion is never reached:

-module(mymath).
-export([factorial/1]).

factorial(N) ->
    N * factorial(N-1);
factorial(0) ->
    1.

When compiling we get a warning (use C-c C-k in Emacs or c(mymath). in the Erlang shell to compile):

2> c("/home/alienoid/dev/erlang/mymath", [{outdir, "/home/alienoid/dev/erlang/"}]).
/home/alienoid/dev/erlang/mymath.erl:6: Warning: this clause cannot match because a previous clause at line 4 always matches
{ok,mymath}

indicating that the second clause will never match. You can still call the function, but the recursion never terminates, so it will most likely crash after eating all your available memory.
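
The first-match rule is easy to model in Python (Python 3 here), which may make the compiler warning more tangible. This is only a rough sketch: the select_clause helper and the clause lists are made up for illustration and say nothing about how Erlang actually implements dispatch.

```python
# Each "clause" is a (label, pattern_test) pair tried top to bottom,
# like the clauses of an Erlang function. All names are illustrative.
def select_clause(clauses, arg):
    """Return the label of the first clause whose pattern accepts arg."""
    for label, matches in clauses:
        if matches(arg):
            return label
    raise ValueError("function_clause: no clause matches %r" % (arg,))

good_order = [
    ("factorial(0)", lambda n: n == 0),
    ("factorial(N)", lambda n: True),  # an unbound variable matches anything
]
bad_order = list(reversed(good_order))

print(select_clause(good_order, 0))  # factorial(0)
print(select_clause(bad_order, 0))   # factorial(N)
```

With the clauses swapped, the catch-all pattern wins even for 0, so the base case is unreachable.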

You can easily avoid that problem by adding a guard, introduced by the keyword when, to the clause head:

-module(mymath).
-export([factorial/1]).

factorial(N) when N > 0 ->
    N * factorial(N-1);
factorial(0) ->
    1.

Now you can put those two clauses in any order you like. I should also add that if no clause matches during pattern matching, a function_clause run-time error occurs:

3> mymath:factorial(-4).

=ERROR REPORT==== 24-Sep-2007::01:53:25 ===
Error in process <0.29.0> with exit value: {function_clause,[{mymath,factorial,[-4]},{erl_eval,do_apply,5},{shell,exprs,6},{shell,eval_loop,3}]}

** exited: {function_clause,[{mymath,factorial,[-4]},
                             {erl_eval,do_apply,5},
                             {shell,exprs,6},
                             {shell,eval_loop,3}]} **

What is a guard anyway? Guards make pattern matching more powerful: they can perform various tests and comparisons on the variables in a pattern, as you have just seen. A guard can consist of several guard expressions separated by commas (,). Such a guard evaluates to true only if every guard expression evaluates to true. An example of a guard containing several guard expressions:

-module(mymath).
-export([factorial/1]).

factorial(N) when is_integer(N), N > 0 ->
    N * factorial(N-1);
factorial(0) ->
    1.

With the above definition, calling mymath:factorial(5) reads as: choose the clause factorial(N) when (N is an integer AND N > 0).

There are also guard sequences, which have the form guard1; guard2; …; guardN. A guard sequence evaluates to true if at least one of its guards evaluates to true:

-module(mymath).
-export([factorial/1]).

factorial(N) when is_integer(N), N > 0; is_list(N) ->
    N * factorial(N-1);
factorial(0) ->
    1.

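With the above definition, calling mymath:factorial(5) reads as: choose the clause factorial(N) when (N is an integer AND N > 0) OR N is a list. (The is_list test is here only to show the syntax: actually passing a list would fail with a badarith error in the arithmetic of the body.)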
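
The AND/OR reading of commas and semicolons can be sketched in Python 3 as nested all/any. The helper name guard_sequence_holds and the list-of-lists layout are assumptions made for this illustration only.

```python
def guard_sequence_holds(guard_sequence, value):
    """A guard sequence is true if at least one guard is true;
    a guard is true only if all of its guard expressions are true."""
    return any(all(test(value) for test in guard) for guard in guard_sequence)

# Mirrors: when is_integer(N), N > 0; is_list(N)
factorial_guards = [
    [lambda n: isinstance(n, int), lambda n: n > 0],  # comma = AND
    [lambda n: isinstance(n, list)],                  # semicolon = OR
]

print(guard_sequence_holds(factorial_guards, 5))       # True
print(guard_sequence_holds(factorial_guards, -4))      # False
print(guard_sequence_holds(factorial_guards, [1, 2]))  # True
```

Because all() stops at the first failing test, the n > 0 comparison is never attempted for a list.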

You have probably noticed that the way we defined factorial makes it call itself recursively, with a multiplication still pending after each call returns. Every such call uses stack space, and the stack is not an infinite resource. The fix is to make the function tail recursive. When the last expression in a function body is a function call, that call is a tail call and needs no extra stack space; this is how even an infinite loop can be written without consuming the stack. So here is our new tail-recursive factorial, which uses an accumulator to pass along the result computed so far:

-module(mymath).
-export([factorial/1]).

%% this one is exported, arity is 1
factorial(N) ->
    factorial(N, 1).

%% helper function, not exported, arity is 2
%% tail-recursive, uses accumulator to store
%% calculation between invocations
factorial(N, Acc) when N > 0 ->
    factorial(N-1, N*Acc);
factorial(0, Acc) ->
    Acc.

In the foregoing example you can see guards, tail recursion, the use of an accumulator and functions with different arities. Experiment with it.
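
For comparison, here is the same accumulator idea as a Python 3 sketch. CPython does not optimize tail calls, so the recursive form still grows the stack there; the loop below it is what the tail-recursive Erlang version effectively amounts to. The function names are mine, not part of the Erlang module.

```python
def factorial_acc(n, acc=1):
    """Accumulator version, shaped like factorial/2 in the Erlang module."""
    if n > 0:
        return factorial_acc(n - 1, n * acc)
    if n == 0:
        return acc
    raise ValueError("no clause matches %r" % (n,))

def factorial_loop(n):
    """The same computation written as a loop: constant stack usage."""
    if n < 0:
        raise ValueError("no clause matches %r" % (n,))
    acc = 1
    while n > 0:
        n, acc = n - 1, n * acc  # right-hand side uses the old n
    return acc

print(factorial_acc(5))   # 120
print(factorial_loop(5))  # 120
```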


comment-dwim and comment-style

September 20, 2007

In my previous post I mentioned that if you use Emacs to edit Erlang source code you can easily comment out a whole marked region by pressing M-;, which by default is bound to the interactive Lisp function comment-dwim. This works in many modes and is very handy, especially if your language of choice has no multiline comments.

So far so good, but I must admit that sometimes I get the blues when I comment out nested code blocks with M-;. Well, maybe I’m blowing my sadness out of proportion, but in any case I recently came across a really nice short blog post called comment-style, and since then I’m happy with the comments produced in nested blocks. Let’s see why.

When you comment out a marked region in an Emacs buffer with M-;, the comment-style variable defines how the region is commented. The default value is plain.

Buffer with Python code before commenting:

def binsearch(seq, key, start, end):
    if end < start:
        return -1
    mid = (start + end) / 2
    if key == seq[mid]:
        return mid
    if key < seq[mid]:
        return binsearch(seq, key, start, mid-1)
    if key > seq[mid]:
        return binsearch(seq, key, mid+1, end)

Buffer with Python code after commenting:

def binsearch(seq, key, start, end):
    if end < start:
        return -1
    mid = (start + end) / 2
    # if key == seq[mid]:
#         return mid
    if key < seq[mid]:
        return binsearch(seq, key, start, mid-1)
    if key > seq[mid]:
        return binsearch(seq, key, mid+1, end)

As you can see, the comment markers of the commented block do not line up in the same column.

Example with Erlang code before commenting (a somewhat far-fetched example, but our main concern here is the comments, not the code):

all(Pred, [Hd|Tail]) ->
    case Pred(Hd) of
        true -> all(Pred, Tail);
        false ->
            io:format("Hd = ~p~n", [Hd]),
            true,
            false
    end;
all(Pred, []) when is_function(Pred, 1) -> true.

After commenting:

all(Pred, [Hd|Tail]) ->
    case Pred(Hd) of
        true -> all(Pred, Tail);
        false ->
            %% io:format("Hd = ~p~n", [Hd]),
%%             true,
            false
    end;
all(Pred, []) when is_function(Pred, 1) -> true.

So the visual issue with comments is the same.

After reading that blog post about comment-style I added this line to my .emacs:

(setq comment-style 'indent)

Easy as pie. Here is what it means (taken from Emacs help): “…`comment-start' markers should not be put at the left margin but at the current indentation of the region to comment.”

With that setting, the commented block above looks like this in Python mode:

def binsearch(seq, key, start, end):
    if end < start:
        return -1
    mid = (start + end) / 2
    # if key == seq[mid]:
    #     return mid
    if key < seq[mid]:
        return binsearch(seq, key, start, mid-1)
    if key > seq[mid]:
        return binsearch(seq, key, mid+1, end)

in Erlang mode:

all(Pred, [Hd|Tail]) ->
    case Pred(Hd) of
        true -> all(Pred, Tail);
        false ->
            %% io:format("Hd = ~p~n", [Hd]),
            %% true,
            false
    end;
all(Pred, []) when is_function(Pred, 1) -> true.
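
To make the rule quoted from Emacs help concrete, here is a small Python 3 model of what the indent style does to a region. It is a simplified sketch under my own assumptions (spaces only, one marker per line), not Emacs’s actual implementation, and comment_region_indent is a made-up name.

```python
def comment_region_indent(lines, marker="# "):
    """Comment out lines, placing the marker at the region's own
    indentation (the smallest indentation among its non-blank lines)."""
    indents = [len(l) - len(l.lstrip(" ")) for l in lines if l.strip()]
    col = min(indents) if indents else 0
    return [l[:col] + marker + l[col:] if l.strip() else l for l in lines]

region = [
    "    if key == seq[mid]:",
    "        return mid",
]
for line in comment_region_indent(region):
    print(line)
```

For the two binsearch lines this prints the markers in one column, as in the Python mode example above.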

Now I am happy. Thank you, Edward, for the tip.


Erlang for Python programmers: Part II

September 16, 2007

In this short tutorial we will take a look at comparison operations, some arithmetic operations and modules. You may also want to skim over the Intro and Part I.

Comparison operations

 Python   Erlang   Description             Erlang Example
 --------+--------+-----------------------+----------------
  <        <        strictly less than
 --------+--------+-----------------------+----------------
  <=       =<       less than or equal
 --------+--------+-----------------------+----------------
  >        >        strictly greater than
 --------+--------+-----------------------+----------------
  >=       >=       greater than or equal
 --------+--------+-----------------------+----------------
  !=       /=       not equal
 --------+--------+-----------------------+----------------
  ==       ==       equal                   1> 1 == 1.
                                            true
                                            2> 1 == 1.0.
                                            true
 --------+--------+-----------------------+----------------
           =:=      exactly equal to        1> 1 =:= 1.
                                            true
                                            2> 1 =:= 1.0.
                                            false
 --------+--------+-----------------------+----------------
           =/=      exactly not equal to
 --------+--------+-----------------------+----------------
  is                object identity
 --------+--------+-----------------------+----------------
  is not            negated obj identity

Python notes

The comparison operations above are supported by all objects. In Python you can chain comparisons:

>>> x, y, z = 1, 3, 7
>>> x < y <= z
True
>>> x < y and y <= z
True

The two examples are equivalent, except that in the second one y is evaluated twice.
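
The difference only shows when evaluating y has a cost or a side effect. A small Python 3 sketch, where y is replaced by a hypothetical function that records each call:

```python
calls = []

def y():
    """Stand-in for an expression with a visible side effect."""
    calls.append("y")
    return 3

x, z = 1, 7

chained = x < y() <= z             # y() runs once
after_chained = len(calls)

explicit = x < y() and y() <= z    # y() runs twice
after_explicit = len(calls) - after_chained

print(chained, after_chained)      # True 1
print(explicit, after_explicit)    # True 2
```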

Erlang notes

The following order between terms of different types is defined:

number < atom < reference < fun < port < pid < tuple < list < binary

1> 5 < erlang.
true
2> erlang < make_ref().
true

Both the Erlang comparison operators (<, =<, >, >=, /=, ==) and the Python comparison operators (<, <=, >, >=, !=, ==) perform type coercion when comparing numbers: the “narrower” type is widened to that of the other, i.e. when comparing an integer with a float, the integer is first converted to a float.
Erlang also has two operators that do no type coercion: =:= and =/=.
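
Python has no operator like =:=, but for plain numbers a rough analogue is easy to write. The helper exactly_equal below is a hypothetical name for this Python 3 sketch; it compares the type as well as the value and makes no attempt to handle nested containers.

```python
def exactly_equal(a, b):
    """Rough analogue of Erlang's =:= for numbers: same type AND equal value."""
    return type(a) is type(b) and a == b

print(1 == 1.0)               # True
print(exactly_equal(1, 1.0))  # False
print(exactly_equal(1, 1))    # True
```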

Arithmetic operations

I’ll show only a few operations; for more, take a look at the corresponding language references.

 Python   Python Desc.         Erlang    Erlang Desc.   Example
 --------+--------------------+---------+--------------+--------------------
  x % y    remainder            X rem Y   integer        >>> 7 % 3
           of x / y                       remainder      1
                                          of X / Y       >>> 7.0 % 3
                                                         1.0

                                                         1> 7 rem 3.
                                                         1
                                                         1> 7.0 rem 3.
                                                         =ERROR REP..
 --------+--------------------+---------+--------------+--------------------
  x / y    quotient             X / Y     floating       >>> 7 / 3
           of x and y                     point          2
                                          division       >>> 7.0 / 3
                                                         2.3333333333333335

                                                         1> 7 / 3.
                                                         2.33333
 --------+--------------------+---------+--------------+--------------------
  x // y   (floored) quotient   X div Y   integer        >>> 7 // 3
           of x and y,                    division       2
           integer division                              >>> 7.5 // 3
           (result type is                               2.0
           not forced to be
           int)                                          1> 7 div 3.
                                                         2
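
One difference the table does not show appears with negative operands: Python’s // and % floor, while Erlang’s div and rem truncate toward zero, so rem takes the sign of the dividend. (Also note that the table shows Python 2; in Python 3, 7 / 3 is true division and gives a float.) A Python 3 sketch of the Erlang behaviour, with erl_div and erl_rem as made-up helper names:

```python
def erl_div(x, y):
    """Integer division truncating toward zero, like Erlang's X div Y."""
    q = abs(x) // abs(y)
    return q if (x >= 0) == (y >= 0) else -q

def erl_rem(x, y):
    """Remainder with the sign of the dividend, like Erlang's X rem Y."""
    return x - y * erl_div(x, y)

print(-7 // 3, -7 % 3)                 # -3 2   (Python floors)
print(erl_div(-7, 3), erl_rem(-7, 3))  # -2 -1  (Erlang truncates)
print(erl_div(7, 3), erl_rem(7, 3))    # 2 1
```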

Modules

Code in Erlang is organized into units called modules, a word familiar to any Python programmer.

Let’s define a sample module with one function declaration, stored in the file mymath.erl:

-module(mymath).
-export([fact/1]).

fact(0) -> 1;
fact(N) -> N * fact(N-1).

What we see here:

  1. the module’s source code is stored in a file with the .erl extension
  2. a module consists of attributes and function declarations, each terminated by a period (.)
  3. we must provide a module declaration defining the name of the module
  • the module name must be an atom
  • the module name is the same as the file name minus the .erl extension
  • the module declaration is mandatory
  • the module declaration must be the first attribute in the file.

To make functions defined in a module accessible from outside the module we need to export them; the -export module attribute exists for this. We list exported functions inside square brackets in the form func/N, where N is the number of arguments of the function, called its arity. I’ll repeat: functions not listed in -export are not accessible outside the module.

Before code can be run, the module must be compiled. The resulting compiled file has the extension .beam.

If you use Emacs you can compile the module with C-c C-k and see the result in the Erlang shell:

1> c("/home/alienoid/dev/erlang/mymath", [{outdir, "/home/alienoid/dev/erlang/"}]).
{ok,mymath}

Or you can compile it directly in the shell. Make sure the shell’s current directory is where your mymath.erl lives; if it is not, use the cd command in the Erlang shell, cd("/path/to/dir/with/mymath.erl"):

1> c(mymath).
{ok,mymath}

To invoke our function fact we use the syntax mod:func:

2> mymath:fact(4).
24

Erlang also has an -import attribute, which imports functions into a module so that you don’t need the fully qualified name mod:func to invoke them; again, behaviour and naming familiar to a Python programmer.

Erlang can also insert the contents of a file as-is with the -include attribute, at the point where the -include appears; this is used to include record and macro definitions, for example.

Comments in an Erlang module begin with the character “%”, continue up to the end of the line and may be placed anywhere except inside strings and quoted atoms.
Like Python, Erlang has no multiline comments.
If you use Emacs you can easily comment out a whole region with the M-; command after marking it.

-module(mymath).
-export([fact/1, print_double/1]).
-import(lists, [foreach/2]).
-include("my_records.hrl").

%% sum(L) ->
%%     sum(L, 0).

%% sum([H|T], Acc) ->
%%     sum(T, H+Acc);
%% sum([], Acc) -> Acc.

fact(0) -> 1;
fact(N) -> N * fact(N-1).
print_double(L) ->
    foreach(fun(X) -> io:format("Double of ~p = ~p~n", [X, X*2]) end, L).

Fin.

The next tutorial will be devoted to a thorough exploration of functions in Erlang.


How to say big numbers in English: Common Lisp

September 12, 2007

When I was describing notations and representations of numbers in Erlang (and Python) I forgot to mention a very cool feature of the format function in Common Lisp, which I got acquainted with while exploring the highly recommended book Practical Common Lisp.

This feature is the ~R directive, which knows how to say really big numbers in English words.

Here is an example (I use SBCL + SLIME):

CL-USER> (format nil "~R" 1604872898756737459538)
"one sextillion six hundred four quintillion eight hundred seventy-two
quadrillion eight hundred ninety-eight trillion seven hundred fifty-six
billion seven hundred thirty-seven million four hundred fifty-nine
thousand five hundred thirty-eight"
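
Python has no built-in counterpart to ~R, but the idea is simple enough to sketch by hand. The Python 3 function say below is my own illustration, limited to non-negative integers below 10**24; it is not a port of the Common Lisp implementation and only mimics the output style shown above.

```python
ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "_ _ twenty thirty forty fifty sixty seventy eighty ninety".split()
SCALES = ["", "thousand", "million", "billion", "trillion",
          "quadrillion", "quintillion", "sextillion"]

def below_thousand(n):
    """Spell an integer in the range 1..999."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    elif rest:
        words.append(ONES[rest])
    return " ".join(words)

def say(n):
    """Spell a non-negative integer below 10**24 in English words."""
    if not 0 <= n < 10 ** 24:
        raise ValueError("out of range for this sketch")
    if n == 0:
        return ONES[0]
    groups = []
    scale = 0
    while n:
        n, group = divmod(n, 1000)  # peel off three digits at a time
        if group:
            groups.append((below_thousand(group) + " " + SCALES[scale]).strip())
        scale += 1
    return " ".join(reversed(groups))

print(say(1604872898756737459538))
```

For the number used in the SBCL session it produces the same words, on a single line.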

Erlang for Python programmers: Part I

September 9, 2007

Let’s skim over data types in Erlang today. Check the previous tutorial for an introduction.

Numbers

In Erlang there are two types of numeric literals: integers and floats.
In Python there are four of them: plain integers (usually called just integers), long integers, floating point numbers, and imaginary numbers.

In addition to the conventional notation Erlang has its own specific notations:
1) $char
Gives the ASCII value of char
2) base#value
Produces an integer written in the given base. Base is in the range [2,36]

1> $A.
65
2> 2#111.
7
3> 16#1A.
26
4> 5.
5
5> 2.55.
2.55000

In Python we can write octal and hexadecimal integer literals (octinteger and hexinteger) with:

>>> 017, 0x1A
(15, 26)

To produce an integer from a string in a given base in Python we can use the built-in int([x[, radix]]), where radix is in the range [2,36]:

>>> int("111", 2), int("1A", 16)
(7, 26)
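
Putting the two notations side by side, here is a tiny Python 3 sketch that evaluates Erlang-style integer literals with ord() and int(). The function erlang_int is a made-up helper; it ignores escapes such as $\n, signs and anything else the real Erlang scanner handles.

```python
def erlang_int(literal):
    """Evaluate the Erlang notations $char and base#value, plus plain decimals."""
    if literal.startswith("$") and len(literal) == 2:
        return ord(literal[1])            # $A -> character code
    if "#" in literal:
        base, digits = literal.split("#", 1)
        return int(digits, int(base))     # 16#1A -> int("1A", 16)
    return int(literal)

print(erlang_int("$A"), erlang_int("2#111"), erlang_int("16#1A"))  # 65 7 26
```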

Atom

An atom is a literal, a constant with a name. An atom should start with a lowercase letter; otherwise, or if it contains characters other than alphanumeric characters, _, or @, it must be enclosed in single quotes (').
Atoms cannot hold values like variables do; they are simply names (constants with a name).

1> erlang.
erlang
2> 'hello world'.
'hello world'
3> 'Erlang'.
'Erlang'

Python’s language reference also uses the word “atoms” (for identifiers and literals), but nothing in Python corresponds to Erlang’s constant-with-a-name kind of atom.

More examples with atoms will come when we get acquainted with other data types and parts of Erlang.

Binary

From the reference manual:
“A binary is used to store an area of untyped memory.
Binaries are expressed using the bit syntax.”

This data type lets you store large raw chunks of memory efficiently, much more space-efficiently than in lists or tuples.

Binaries are written between double less-than and greater-than brackets and are printed in that form as well.
They can be constructed from a set of constants or a string literal:

1> <<1,2,3>>.
<<1,2,3>>
2> <<"erlang">>.
<<"erlang">>

Or from bound variables:

3> A = 1, B = 2, C = 3.
3
4> Bin = <<A, B, C>>.
<<1,2,3>>

Remember pattern matching?
A binary (Bin for short) can also be used in pattern matching:

5> <<X, Y, Z>> = Bin.
<<1,2,3>>
6> X.
1
7> Y.
2
8> Z.
3

There is more to it, namely the bit syntax, which lets you pack and unpack sequences of bits in a Bin. The bit syntax is actually an extension to pattern matching.
Let’s pack three variables into a 16-bit memory area in a variable Mem, then unpack it into three other variables using the bit syntax:

1> X = 2.
2
2> Y = 40.
40
3> Z = 30.
30
4> Mem = <<X:3, Y:7, Z:6>>.
<<74,30>>
5> <<X1:3, Y1:7, Z1:6>> = Mem.
<<74,30>>
6> X1.
2
7> Y1.
40
8> Z1.
30

In the foregoing output you can see that after packing, the shell prints the three packed variables as <<74,30>>, which is two bytes, i.e. 3 + 7 + 6 = 16 bits.
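
To see where 74 and 30 come from, here is the same packing done by hand with shifts and masks in Python 3. This is only a sketch of the arithmetic, assuming unsigned big-endian fields whose widths add up to whole bytes; pack_fields and unpack_fields are hypothetical helpers, not a substitute for the bit syntax.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs, most significant first, into bytes."""
    total_bits = sum(width for _, width in fields)
    if total_bits % 8:
        raise ValueError("total width must be a whole number of bytes")
    packed = 0
    for value, width in fields:
        # make room for the field, then keep only its low `width` bits
        packed = (packed << width) | (value & ((1 << width) - 1))
    return packed.to_bytes(total_bits // 8, "big")

def unpack_fields(data, widths):
    """Inverse of pack_fields: split bytes into unsigned fields."""
    packed = int.from_bytes(data, "big")
    values = []
    for width in reversed(widths):        # the last field sits in the low bits
        values.append(packed & ((1 << width) - 1))
        packed >>= width
    return list(reversed(values))

mem = pack_fields([(2, 3), (40, 7), (30, 6)])
print(list(mem))                      # [74, 30]
print(unpack_fields(mem, [3, 7, 6]))  # [2, 40, 30]
```

In binary the three fields are 010, 0101000 and 011110; glued together they give 01001010 00011110, i.e. the bytes 74 and 30.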

In Python 2.5 there is no built-in data type to support binary data, but in Python 3000 binary data is represented by a separate mutable “bytes” data type.

Fun

A Fun is an anonymous function, i.e. a function without a name.

In Python this is the infamous lambda, the subject of many debates about whether it should remain in the language. Personally I like lambda and am glad it survived and will live on in Python 3000.

But OK, back to Erlang’s Fun. Here is how we can define one in the shell:

1> Double = fun(X) -> X * 2 end.
#Fun<erl_eval.6.56006484>
2> Double(4).
8

In contrast to Python’s lambda, an Erlang Fun can contain a sequence of expressions, including ones with side effects such as printing, and is “more advanced” in general:

1> Print_val = fun(X) -> io:format("X = ~p~n", [X]), X*2 end.
#Fun<erl_eval.6.56006484>
2> Print_val(3).
X = 3
6

>>> print_val = lambda x: print 'x = %d' % x; x*2
  File "<stdin>", line 1
    print_val = lambda x: print 'x = %d' % x; x*2
                              ^
SyntaxError: invalid syntax
>>> print_val = lambda x: x*2
>>> print_val(3)
6

More advanced examples of Funs, such as Funs with several clauses, I’ll show when discussing functions in detail in upcoming tutorials.

Tuple

A tuple is a compound data type with a fixed number of terms (a term is a piece of data of any data type). The number of elements in a tuple is called the size of the tuple.
The syntax of tuples is:

1> Tuple = {ruslan, 28}.
{ruslan,28}
2> Info = {data, Tuple}.
{data,{ruslan,28}}

In Python tuples are defined with or without enclosing parentheses:

>>> tup = "ruslan", 28
>>> tup
('ruslan', 28)
>>> info = ('data', tup)
>>> info
('data', ('ruslan', 28))

To get the number of elements in a tuple:

Erlang:

3> size(Info).
2

Python:

>>> len(info)
2

List

A list is a compound data type with a variable number of terms. The number of elements in a list is called the length of the list.

A list is written in square brackets, like in Python.

1> List = [1, 2, 3, "erlang"].
[1,2,3,"erlang"]

Now the Erlang-specific part, which may be familiar if you know a bit of Lisp and its CAR/CDR stuff:
A list is either the empty list [] or a head (the first element, CAR in Lisp) followed by a tail (the remainder of the list, CDR in Lisp), which is itself a list. You will often see this written as [H|T].
This representation is recursive:
[]
[c | []]
[b | [c | []]]
[a | [b | [c | []]]]
are all lists actually.
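
Python lists are arrays rather than chains of cells, but the recursive shape is easy to imitate with nested pairs. In this Python 3 sketch a (head, tail) tuple stands for [H|T] and None for []; both conventions and the helper names are my own choices for illustration.

```python
def to_cons(items):
    """Build a nested (head, tail) structure; None plays the role of []."""
    cell = None
    for item in reversed(items):
        cell = (item, cell)
    return cell

def from_cons(cell):
    """Flatten a (head, tail) chain back into a Python list."""
    items = []
    while cell is not None:
        head, cell = cell
        items.append(head)
    return items

print(to_cons(["a", "b", "c"]))   # ('a', ('b', ('c', None)))
```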

Let’s define a list in the shell:

1> List = [a, 1, {b, 2}].
[a,1,{b,2}]

and unpack it into head and tail with pattern matching:

2> [H|T] = List.
[a,1,{b,2}]
3> H.
a
4> T.
[1,{b,2}]

As you can see, T (the tail) above is also a list.

We can also construct a list using the head and tail syntax:

5> List2 = [c|T].
[c,1,{b,2}]

Working with Erlang you’ll see and use this syntax quite often.
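
The closest thing in Python is star-unpacking, available in Python 3 (not in the Python 2 used elsewhere in this post). A short sketch mirroring the shell session above; note that Python copies the tail into a new list, whereas Erlang’s [c|T] shares T.

```python
lst = ["a", 1, ("b", 2)]

head, *tail = lst      # like [H|T] = List
print(head)            # a
print(tail)            # [1, ('b', 2)]

lst2 = ["c"] + tail    # like List2 = [c|T]
print(lst2)            # ['c', 1, ('b', 2)]
```

Unpacking an empty list this way raises ValueError, much as [H|T] = [] fails to match in Erlang.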

To get the number of elements in a list:

Erlang:

6> length(List2).
3

Python:

>>> len([1, 2, 3, "python"])
4

String

In Python the string type is an immutable sequence of characters. There is no separate character type; a character is represented by a string of one item. String literals are written in single or double quotes:

>>> 'xyz', "xyz"
('xyz', 'xyz')

In Erlang strings are enclosed in double quotes but, unlike in Python, a string is not actually a separate data type. The string "erlang" is just shorthand for the list [$e, $r, $l, $a, $n, $g].
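
In Python terms, an Erlang string is what you get by mapping ord() over the characters. A two-line Python 3 illustration:

```python
codes = [ord(ch) for ch in "erlang"]
print(codes)                           # [101, 114, 108, 97, 110, 103]
print("".join(chr(c) for c in codes))  # erlang
```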

Adjacent string literals are concatenated at compile time:

1> "hello" ", world" " in " "erlang".
"hello, world in erlang"

In Python the behaviour of adjacent string literals is the same:

>>> 'hello' ', world' ' in ' 'python'
'hello, world in python'

Boolean

Again unlike Python, Erlang has no built-in Boolean type; it uses the atoms true and false to denote Boolean values:

1> 1 < 2.
true
2> true or false.
true

In Python Boolean is a separate data type:

>>> type(True), type(False)
(<type 'bool'>, <type 'bool'>)
>>> 1 < 2
True
>>> True or False
True

Record

A record is a data structure that holds a fixed number of elements and provides a way to associate a name with each element of a tuple. It resembles a struct in C. A record is not a true data type; it is a “tuple in disguise”(TM).
Its syntax is:
-record(Name, {key1=Default1, key2, …}).

But in the shell we cannot use the -record syntax, and we do not yet know about modules, where it can be defined, so we’ll use the rd shell command to define a record:

1> rd(info, {name="ruslan", age=28, height, weight}).
info

Now let’s create an instance of the record:

2> X = #info{}.
#info{name = "ruslan",age = 28,height = undefined,weight = undefined}
3> X.
#info{name = "ruslan",age = 28,height = undefined,weight = undefined}

As you can see, height and weight got the value undefined.

To extract fields of a record we use pattern matching:

4> #info{name=Name, age=Age} = X.
#info{name = "ruslan",age = 28,height = undefined,weight = undefined}
5> Name.
"ruslan"
6> Age.
28

We can also access a field of a record with “dot syntax”:

7> X#info.name.
"ruslan"
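
The nearest Python relative of a “tuple in disguise” is probably collections.namedtuple. The sketch below is Python 3 (the defaults argument needs 3.7 or later); the class name Info is chosen for this example and None stands in for Erlang’s undefined. It is an analogy only: an Erlang record also carries its name as the first element of the underlying tuple, which a namedtuple does not.

```python
from collections import namedtuple

# Fields and defaults copied from the rd(info, ...) call above.
Info = namedtuple("Info", ["name", "age", "height", "weight"],
                  defaults=["ruslan", 28, None, None])

x = Info()
print(x)               # Info(name='ruslan', age=28, height=None, weight=None)

name, age, _, _ = x    # plain tuple unpacking still works
print(x.name)          # ruslan
print(tuple(x))        # ('ruslan', 28, None, None)
```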

Erlang has more data types, like Pid, Port identifier and Reference, which will be described later in more advanced topics. There is no direct counterpart to Python’s built-in dictionaries and sets, but Erlang has modules which provide the same semantics.

That’s it for today.