Rust updates

bearophile bearophileHUGS at lycos.com
Sun Jul 8 06:49:49 PDT 2012


On Reddit they are again discussing the Rust language and the 
browser prototype written in Rust, named "Servo" 
(https://github.com/mozilla/servo ):
http://www.reddit.com/r/programming/comments/w6h7x/the_state_of_servo_a_mozilla_experiment_in/


So I've taken another look at the Rust tutorial:
http://dl.rust-lang.org/doc/tutorial.html

and I've seen that Rust is much more defined than the last two 
times I read about it. So below I put more extracts from the 
tutorial, with a few comments of mine (but most of the text you 
find below is from the tutorial).

By default, types in Rust are immutable. If you want the mutable 
type you need to annotate it with "mut" in some way.
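
A minimal sketch of the "mut" annotation, written in current Rust 
syntax (which differs from the 2012 tutorial syntax quoted in this 
post). The helper name sum is hypothetical, chosen only for 
illustration:

```rust
// Sums a slice. `total` is reassigned in the loop, so it must be
// declared with `mut`; a plain `let` binding would be rejected.
fn sum(values: &[i32]) -> i32 {
    let mut total = 0; // mutability is opt-in
    for v in values {
        total += *v;
    }
    // let frozen = 1;
    // frozen += 1; // would not compile: `frozen` is immutable
    total
}
```

Calling sum(&[1, 2, 3]) gives 6.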

Rust designers seem to love really short keywords; this is in my 
opinion a bit silly. On the other hand, in D you have keywords 
like "immutable" that are rather long to type. So I'd prefer a 
middle way between those two.

Rust has type classes from Haskell (with some simplifications for 
higher kinds), uniqueness typing, and typestates.

In Haskell typeclasses are very easy to use.

From my limited study, the Rust implementation of uniqueness 
typing doesn't look hard to understand and use. It is statically 
enforced, it doesn't require a lot of annotations, and I think its 
compiler implementation is not too hard, because it's a pure 
type system test. Maybe D designers should take a look, maybe for 
D3.

Macros are planned, but I think they are not fully implemented.

I think in Rust the function stack is segmented and growable, as 
in Go. This saves RAM if you need a small stack, and avoids stack 
overflows where a lot of stack is needed.

-------------------------

Instead of the 3 char types of D, Rust has 1 char type:

char  A character is a 32-bit Unicode code point.

-------------------------

And only one string type:

str  String type. A string contains a UTF-8 encoded sequence of 
characters.

For algorithms that do really need to index by character, there's 
the option to convert your string to a character vector (using 
str::chars).
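
A small sketch of that conversion in current Rust syntax (not the 
2012 syntax of the tutorial). The function name nth_char is 
hypothetical; chars() and collect() are the standard library calls 
I assume play the role of str::chars today:

```rust
// Returns the n-th character of a UTF-8 string. A `str` cannot be
// indexed by character directly, so the characters are first
// collected into a vector of 32-bit `char` values.
fn nth_char(s: &str, n: usize) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    chars.get(n).copied() // None when n is out of range
}
```

With the string "h\u{e9}llo" the byte length is 6 but there are 
only 5 characters, so indexing by character and by byte differ.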

-------------------------

Tuples are rightly built-in. Single-element tuples are not 
supported (empty tuples are kind of supported with ()):


(T1, T2)  Tuple type. Any arity above 1 is supported.

-------------------------

Although Walter said that having more than one type of pointer is 
bad, both Ada and Rust have several pointer types. Rust has three 
of them (plus their mutable variants).


Rust supports several types of pointers. The simplest is the 
unsafe pointer, written *T, which is a completely unchecked 
pointer type only used in unsafe code (and thus, in typical Rust 
code, very rarely). The safe pointer types are @T for shared, 
reference-counted boxes, and ~T, for uniquely-owned pointers.

All pointer types can be dereferenced with the * unary operator.

Shared boxes never cross task boundaries.
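
A rough sketch of a shared, reference-counted box in current Rust. 
I am assuming here that the library type Rc<T> is the closest 
present-day analogue of the @T syntax described above; the function 
name shared_box_demo is hypothetical:

```rust
use std::rc::Rc;

// Creates one reference-counted box with two handles and returns
// the sum read through both handles plus the reference count.
fn shared_box_demo() -> (i32, usize) {
    let a = Rc::new(5);    // one handle so far
    let b = Rc::clone(&a); // second handle to the same box
    // Both handles are dereferenced with the unary * operator.
    (*a + *b, Rc::strong_count(&a))
}
```

The function returns (10, 2): the value is shared, not copied, and 
the count reflects the two live handles.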

-------------------------

This seems a bit overkill to me:

It's also possible to avoid any type ambiguity by writing integer 
literals with a suffix. The suffixes i and u are for the types 
int and uint, respectively: the literal -3i has type int, while 
127u has type uint. For the fixed-size integer types, just suffix 
the literal with the type name: 255u8, 50i64, etc.

-------------------------

This is very strict, maybe too strict:

No implicit conversion between integer types happens. If you are 
adding one to a variable of type uint, saying += 1u8 will give 
you a type error.
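
A sketch of what this strictness means in practice, in current 
Rust syntax (where uint is spelled differently, so I use u32 as a 
stand-in). The function name add_byte is hypothetical:

```rust
// Adds a u8 step to a u32 total. The conversion must be written
// out, because integer types are never converted implicitly.
fn add_byte(total: u32, step: u8) -> u32 {
    // total + step         // type error: expected u32, found u8
    total + u32::from(step) // explicit, lossless widening
}
```

Literal suffixes still select the type, for example 
add_byte(10u32, 255u8) gives 265.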

-------------------------

Rust goes even further than Go (where ++ and -- are only 
statements):

++ and -- are missing


And it fixes a C problem:

the logical bitwise operators have higher precedence. In C, 
x & 2 > 0 comes out as x & (2 > 0), in Rust, it means (x & 2) > 0, 
which is more likely to be what you expect (unless you are a C 
veteran).
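
A tiny check of this precedence rule in current Rust. The function 
name bit1_set is hypothetical:

```rust
// Tests whether bit 1 of x is set. No parentheses are needed:
// Rust parses the expression as (x & 2) > 0.
fn bit1_set(x: u32) -> bool {
    x & 2 > 0
}
```

bit1_set(6) is true (binary 110) and bit1_set(5) is false (binary 
101).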

-------------------------

Enums are datatypes that have several different representations. 
For example, the type shown earlier:

enum shape {
     circle(point, float),
     rectangle(point, point)
}

A value of this type is either a circle, in which case it 
contains a point record and a float, or a rectangle, in which 
case it contains two point records. The run-time representation 
of such a value includes an identifier of the actual form that it 
holds, much like the 'tagged union' pattern in C, but with better 
ergonomics.

The above declaration will define a type shape that can be used 
to refer to such shapes, and two functions, circle and rectangle, 
which can be used to construct values of the type (taking 
arguments of the specified types). So circle({x: 0f, y: 0f}, 10f) 
is the way to create a new circle.

Enum variants do not have to have parameters. This, for example, 
is equivalent to a C enum:

enum direction {
     north,
     east,
     south,
     west
}

-------------------------

This is probably quite handy:

A powerful application of pattern matching is destructuring, 
where you use the matching to get at the contents of data types. 
Remember that (float, float) is a tuple of two floats:

fn angle(vec: (float, float)) -> float {
     alt vec {
       (0f, y) if y < 0f { 1.5 * float::consts::pi }
       (0f, y) { 0.5 * float::consts::pi }
       (x, y) { float::atan(y / x) }
     }
}
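
For comparison, a sketch of the same function in current Rust 
syntax, where alt has become match and float is spelled f64. I use 
guards instead of float literal patterns as a cautious choice of 
mine, not something the tutorial requires:

```rust
// Angle of a 2D vector, destructuring the tuple in each match arm.
fn angle(vec: (f64, f64)) -> f64 {
    match vec {
        (x, y) if x == 0.0 && y < 0.0 => 1.5 * std::f64::consts::PI,
        (x, _) if x == 0.0 => 0.5 * std::f64::consts::PI,
        (x, y) => (y / x).atan(),
    }
}
```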

- - - - - - - -

Records can be destructured in alt patterns. The basic syntax is 
{fieldname: pattern, ...}, but the pattern for a field can be 
omitted as a shorthand for simply binding the variable with the 
same name as the field.

alt mypoint {
     {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
     {x, y}             { /* Simply bind the fields */ }
}

The field names of a record do not have to appear in a pattern in 
the same order they appear in the type. When you are not 
interested in all the fields of a record, a record pattern may 
end with , _ (as in {field1, _}) to indicate that you're ignoring 
all other fields.

- - - - - - - -

For enum types with multiple variants, destructuring is the only 
way to get at their contents. All variant constructors can be 
used as patterns, as in this definition of area:

fn area(sh: shape) -> float {
     alt sh {
         circle(_, size) { float::consts::pi * size * size }
         rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
     }
}
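
A sketch of the shape and area example in current Rust syntax. The 
record type becomes a named struct here (Point is my own name for 
it), and variants are capitalized by today's convention:

```rust
struct Point {
    x: f64,
    y: f64,
}

// A tagged union: either a circle (center, radius) or a rectangle
// (two opposite corners).
enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

// Destructuring in the match arms is the way to reach the contents
// of each variant.
fn area(sh: &Shape) -> f64 {
    match sh {
        Shape::Circle(_, size) => std::f64::consts::PI * size * size,
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => {
            (x2 - x) * (y2 - y)
        }
    }
}
```

A rectangle from (0, 0) to (2, 3) has area 6.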

-------------------------

This would be quite desirable in D too:

To a limited extent, it is possible to use destructuring patterns 
when declaring a variable with let. For example, you can say this 
to extract the fields from a tuple:

let (a, b) = get_tuple_of_two_ints();
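
A self-contained version of that line in current Rust. The body of 
get_tuple_of_two_ints is invented for illustration (the tutorial 
does not show it), and sum_pair is a hypothetical caller:

```rust
// Stand-in for the tutorial's function; the values are arbitrary.
fn get_tuple_of_two_ints() -> (i32, i32) {
    (3, 4)
}

// Binds both tuple fields in a single let.
fn sum_pair() -> i32 {
    let (a, b) = get_tuple_of_two_ints();
    a + b
}
```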

-------------------------

Stack-allocated closures:

There are several forms of closure, each with its own role. The 
most common, called a stack closure, has type fn& and can 
directly access local variables in the enclosing scope.

let mut max = 0;
[1, 2, 3].map(|x| if x > max { max = x });

Stack closures are very efficient because their environment is 
allocated on the call stack and refers by pointer to captured 
locals. To ensure that stack closures never outlive the local 
variables to which they refer, they can only be used in argument 
position and cannot be stored in structures nor returned from 
functions. Despite the limitations stack closures are used 
pervasively in Rust code.
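
A sketch of the max example in current Rust, where the closure 
kinds are no longer written fn& or fn~. The function name largest 
is hypothetical, and I use for_each instead of map since the 
closure is run only for its side effect:

```rust
// Finds the largest value (0 for an empty slice). The closure
// borrows the local `max` from the enclosing scope and mutates it;
// nothing is allocated on the heap for its environment.
fn largest(values: &[i32]) -> i32 {
    let mut max = 0;
    values.iter().for_each(|&x| if x > max { max = x });
    max
}
```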

-------------------------

Unique closures:

Unique closures, written fn~ in analogy to the ~ pointer type 
(see next section), hold on to things that can safely be sent 
between processes. They copy the values they close over, much 
like boxed closures, but they also 'own' them—meaning no other 
code can access them. Unique closures are used in concurrent 
code, particularly for spawning tasks.


There are also heap-allocated closures (so there are 3 kinds of 
closures).

- - - - - - - -

In contrast to shared boxes, unique boxes are not reference 
counted. Instead, it is statically guaranteed that only a single 
owner of the box exists at any time.

let x = ~10;
let y <- x;

This is where the 'move' (<-) operator comes in. It is similar to 
=, but it de-initializes its source. Thus, the unique box can 
move from x to y, without violating the constraint that it only 
has a single owner (if you used assignment instead of the move 
operator, the box would, in principle, be copied).

Unique boxes, when they do not contain any shared boxes, can be 
sent to other tasks. The sending task will give up ownership of 
the box, and won't be able to access it afterwards. The receiving 
task will become the sole owner of the box.
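
A sketch of moving and sending a uniquely-owned box in current 
Rust. I am assuming Box<T> as the present-day spelling of ~T and 
an OS thread as a stand-in for a task; plain = now moves, so there 
is no separate <- operator. The function name send_box is 
hypothetical:

```rust
// Moves a box between two bindings, then into another thread.
fn send_box() -> i32 {
    let x = Box::new(10);
    let y = x; // ownership moves to y; x is de-initialized
    // println!("{}", x); // would not compile: use of moved value
    // The `move` closure takes ownership of y, so the spawning
    // thread can no longer access the box afterwards.
    let handle = std::thread::spawn(move || *y + 1);
    handle.join().unwrap()
}
```

send_box() returns 11, computed by the receiving thread.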

-------------------------

In D you control this by adding "private" before names, but I 
think a centralized control point at the top of the module is 
safer and cleaner:

By default, a module exports everything that it defines. This can 
be restricted with export directives at the top of the module or 
file.

mod enc {
     export encrypt, decrypt;
     const super_secret_number: int = 10;
     fn encrypt(n: int) -> int { n + super_secret_number }
     fn decrypt(n: int) -> int { n - super_secret_number }
}
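
For comparison, a sketch of the same module in current Rust, which 
no longer has the export directive: items are private by default 
and each exported item is marked pub, closer to the D approach:

```rust
mod enc {
    // Not marked pub, so invisible outside the module.
    const SUPER_SECRET_NUMBER: i32 = 10;

    pub fn encrypt(n: i32) -> i32 {
        n + SUPER_SECRET_NUMBER
    }

    pub fn decrypt(n: i32) -> i32 {
        n - SUPER_SECRET_NUMBER
    }
}
```

Outside the module, enc::encrypt(1) gives 11, while any reference 
to enc::SUPER_SECRET_NUMBER is rejected by the compiler.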

-------------------------

This is needed by the uniqueness typing:

Evaluating a swap expression neither changes reference counts nor 
deeply copies any unique structure pointed to by the moved rval. 
Instead, the swap expression represents an indivisible exchange 
of ownership between the right-hand-side and the left-hand-side 
of the expression. No allocation or destruction is entailed.

An example of three different swap expressions:

x <-> a;
x[i] <-> a[i];
y.z <-> b.c;
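
A sketch of the same exchange in current Rust, where I assume the 
library function std::mem::swap takes the place of the <-> 
operator. The function name swap_first is hypothetical:

```rust
// Exchanges the first elements of two vectors in place. The String
// buffers are neither copied nor freed, only ownership changes.
// Panics if either vector is empty.
fn swap_first(a: &mut Vec<String>, b: &mut Vec<String>) {
    std::mem::swap(&mut a[0], &mut b[0]);
}
```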

-------------------------

For some info on the typestate system, from the Rust manual:

http://dl.rust-lang.org/doc/rust.html#typestate-system

This description is simpler than I thought. It seems possible to 
create an experimental D compiler with just a similar typestate 
system; it looks like a purely additive change (but maybe it's 
not a small change). It seems not even to require new syntax, 
besides an assert-like check() that can't be disabled and that 
uses a pure expression/predicate.

Bye,
bearophile


More information about the Digitalmars-d mailing list