Rust updates
bearophile
bearophileHUGS at lycos.com
Sun Jul 8 06:49:49 PDT 2012
On Reddit they are again discussing the Rust language and the
browser prototype written in Rust, named "Servo"
(https://github.com/mozilla/servo ):
http://www.reddit.com/r/programming/comments/w6h7x/the_state_of_servo_a_mozilla_experiment_in/
So I've taken another look at the Rust tutorial:
http://dl.rust-lang.org/doc/tutorial.html
and I've seen that Rust is much better defined than it was the
last two times I read about it. So below I put more extracts from
the tutorial, with a few comments of mine (but most of the text
you find below is from the tutorial).
By default in Rust types are immutable. If you want the mutable
type you need to annotate it with "mut" in some way.
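
A small sketch of what this looks like for local variables, written
in present-day Rust syntax rather than the 2012 syntax of the
tutorial (the function name sum_to is made up for this example):

```rust
// Bindings are immutable unless they are marked `mut`.
// `sum_to` is a hypothetical helper, not something from the tutorial.
fn sum_to(n: u32) -> u32 {
    let limit = n;     // immutable: `limit = 5;` would not compile
    let mut total = 0; // mutable because of `mut`
    let mut i = 1;
    while i <= limit {
        total += i;
        i += 1;
    }
    total
}
```

Calling sum_to(4) adds 1 + 2 + 3 + 4.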
Rust designers seem to love really short keywords; in my opinion
this is a bit silly. On the other hand in D you have keywords
like "immutable" that are rather long to type. So I prefer a
middle way between those two.
Rust has type classes from Haskell (with some simplifications for
higher kinds), uniqueness typing, and typestates.
In Haskell typeclasses are very easy to use.
From my limited study, the Rust implementation of uniqueness
typing doesn't look hard to understand and use. It is statically
enforced, it doesn't require a lot of annotations, and I think
its compiler implementation is not too hard, because it's a pure
type system test. Maybe D designers should take a look, maybe for
D3.
Macros are planned, but I think they are not fully implemented.
I think in Rust the function stack is segmented and growable as
in Go. This saves RAM if you need a small stack, and avoids stack
overflows where a lot of stack is needed.
-------------------------
Instead of the 3 char types of D, Rust has 1 char type:
char A character is a 32-bit Unicode code point.
-------------------------
And only one string type:
str String type. A string contains a UTF-8 encoded sequence of
characters.
For algorithms that do really need to index by character, there's
the option to convert your string to a character vector (using
str::chars).
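
A rough sketch of that conversion in present-day Rust syntax, where
the tutorial's str::chars has become an iterator that you collect
into a vector (the helper name nth_char is an assumption of mine):

```rust
// Index a UTF-8 string by character by first collecting its
// characters into a vector. `nth_char` is a hypothetical helper.
fn nth_char(s: &str, n: usize) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    chars.get(n).copied()
}
```

The string "añb" is four bytes long in UTF-8 but holds three
characters, so nth_char("añb", 1) gives the 'ñ' and not half of it.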
-------------------------
Tuples are rightly built-in. Tuple singletons are not supported
(empty tuples are kind of supported with ()):
(T1, T2) Tuple type. Any arity above 1 is supported.
-------------------------
Although Walter said that having more than one type of pointer is
bad, both Ada and Rust have several pointer types. Rust has three
of them (plus their mutable variants).
Rust supports several types of pointers. The simplest is the
unsafe pointer, written *T, which is a completely unchecked
pointer type only used in unsafe code (and thus, in typical Rust
code, very rarely). The safe pointer types are @T for shared,
reference-counted boxes, and ~T, for uniquely-owned pointers.
All pointer types can be dereferenced with the * unary operator.
Shared boxes never cross task boundaries.
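
For readers who know later versions of Rust, here is a sketch of
the rough present-day analogues. The mapping (~T to Box<T>, @T to
Rc<T>, *T to a raw pointer) is my own approximation, not something
the tutorial states, and the function name is made up:

```rust
use std::rc::Rc;

// Hypothetical demo of three pointer kinds in present-day Rust.
// Returns the three pointed-to values and the Rc reference count.
fn pointer_kinds() -> (i32, i32, i32, usize) {
    let unique: Box<i32> = Box::new(10); // uniquely owned, roughly ~T
    let shared: Rc<i32> = Rc::new(20);   // reference counted, roughly @T
    let alias = Rc::clone(&shared);      // second owner, count becomes 2
    let value = 30;
    let raw: *const i32 = &value;        // unchecked pointer, roughly *T
    // Dereferencing a raw pointer is only allowed in unsafe code.
    let from_raw = unsafe { *raw };
    (*unique, *alias, from_raw, Rc::strong_count(&shared))
}
```

Rc<T> cannot be sent to another thread, which matches the rule that
shared boxes never cross task boundaries.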
-------------------------
This seems like a bit of overkill to me:
It's also possible to avoid any type ambiguity by writing integer
literals with a suffix. The suffixes i and u are for the types
int and uint, respectively: the literal -3i has type int, while
127u has type uint. For the fixed-size integer types, just suffix
the literal with the type name: 255u8, 50i64, etc.
-------------------------
This is very strict, maybe too strict:
No implicit conversion between integer types happens. If you are
adding one to a variable of type uint, saying += 1u8 will give
you a type error.
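
A sketch of the same rule in present-day Rust syntax, where uint
has become types like u32 and the widening has to be written out
(the function name add_one is made up for this example):

```rust
// Adding a u8 to a u32 needs an explicit conversion.
// `add_one` is a hypothetical helper.
fn add_one(n: u32) -> u32 {
    let one = 1u8; // literal with a type suffix
    // `n + one` would be a type error: u32 and u8 do not mix.
    n + u32::from(one)
}
```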
-------------------------
This goes even further than Go:
++ and -- are missing
And it fixes a C problem:
the logical bitwise operators have higher precedence. In C, x & 2
> 0 comes out as x & (2 > 0), in Rust, it means (x & 2) > 0,
which is more likely to be what you expect (unless you are a C
veteran).
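
The precedence rule can be checked with a tiny function; this is a
sketch in present-day Rust syntax and the name bit_is_set is mine:

```rust
// In Rust `&` binds tighter than `>`, so no parentheses are needed.
// `bit_is_set` is a hypothetical helper.
fn bit_is_set(x: u32) -> bool {
    x & 2 > 0 // parsed as (x & 2) > 0
}
```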
-------------------------
Enums are datatypes that have several different representations.
For example, the type shown earlier:
enum shape {
    circle(point, float),
    rectangle(point, point)
}
A value of this type is either a circle, in which case it
contains a point record and a float, or a rectangle, in which
case it contains two point records. The run-time representation
of such a value includes an identifier of the actual form that it
holds, much like the 'tagged union' pattern in C, but with better
ergonomics.
The above declaration will define a type shape that can be used
to refer to such shapes, and two functions, circle and rectangle,
which can be used to construct values of the type (taking
arguments of the specified types). So circle({x: 0f, y: 0f}, 10f)
is the way to create a new circle.
Enum variants do not have to have parameters. This, for example,
is equivalent to a C enum:
enum direction {
    north,
    east,
    south,
    west
}
-------------------------
This is probably quite handy:
A powerful application of pattern matching is destructuring,
where you use the matching to get at the contents of data types.
Remember that (float, float) is a tuple of two floats:
fn angle(vec: (float, float)) -> float {
    alt vec {
        (0f, y) if y < 0f { 1.5 * float::consts::pi }
        (0f, y) { 0.5 * float::consts::pi }
        (x, y) { float::atan(y / x) }
    }
}
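
For comparison, a sketch of the same function in present-day Rust
syntax, where alt has become match and float has become f64. I use
guards instead of float literals in the patterns; that is my own
choice, not something taken from the tutorial:

```rust
use std::f64::consts::PI;

// Destructure the tuple and pick a branch with pattern guards.
fn angle(vec: (f64, f64)) -> f64 {
    match vec {
        (x, y) if x == 0.0 && y < 0.0 => 1.5 * PI,
        (x, _) if x == 0.0 => 0.5 * PI,
        (x, y) => (y / x).atan(),
    }
}
```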
- - - - - - - -
Records can be destructured in alt patterns. The basic syntax is
{fieldname: pattern, ...}, but the pattern for a field can be
omitted as a shorthand for simply binding the variable with the
same name as the field.
alt mypoint {
    {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
    {x, y} { /* Simply bind the fields */ }
}
The field names of a record do not have to appear in a pattern in
the same order they appear in the type. When you are not
interested in all the fields of a record, a record pattern may
end with , _ (as in {field1, _}) to indicate that you're ignoring
all other fields.
- - - - - - - -
For enum types with multiple variants, destructuring is the only
way to get at their contents. All variant constructors can be
used as patterns, as in this definition of area:
fn area(sh: shape) -> float {
    alt sh {
        circle(_, size) { float::consts::pi * size * size }
        rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
    }
}
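
A sketch of the shape type and the area function in present-day
Rust syntax. The record type has become a named struct (Point is my
assumption for how the tutorial's point record would be declared)
and the variants are capitalized by convention:

```rust
// Assumed declaration of the tutorial's point record.
struct Point {
    x: f64,
    y: f64,
}

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

// Every variant is handled; leaving one out is a compile error.
fn area(sh: &Shape) -> f64 {
    match sh {
        Shape::Circle(_, size) => std::f64::consts::PI * size * size,
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => {
            (x2 - x) * (y2 - y)
        }
    }
}
```

A rectangle from (0, 0) to (2, 3) has area 6.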
-------------------------
This is quite desirable in D too:
To a limited extent, it is possible to use destructuring patterns
when declaring a variable with let. For example, you can say this
to extract the fields from a tuple:
let (a, b) = get_tuple_of_two_ints();
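
A self-contained sketch in present-day Rust syntax; the body of
get_tuple_of_two_ints and the function sum_pair are made up here,
only the let line comes from the tutorial:

```rust
// Hypothetical body: the tutorial only shows the call.
fn get_tuple_of_two_ints() -> (i32, i32) {
    (3, 4)
}

fn sum_pair() -> i32 {
    let (a, b) = get_tuple_of_two_ints(); // destructuring let
    a + b
}
```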
-------------------------
Stack-allocated closures:
There are several forms of closure, each with its own role. The
most common, called a stack closure, has type fn& and can
directly access local variables in the enclosing scope.
let mut max = 0;
[1, 2, 3].map(|x| if x > max { max = x });
Stack closures are very efficient because their environment is
allocated on the call stack and refers by pointer to captured
locals. To ensure that stack closures never outlive the local
variables to which they refer, they can only be used in argument
position and cannot be stored in structures nor returned from
functions. Despite the limitations stack closures are used
pervasively in Rust code.
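
The max example above can be sketched in present-day Rust syntax
like this. I use iter().for_each instead of map, and the function
name max_of is made up; the closure still borrows the local max
directly from the enclosing stack frame:

```rust
// The closure captures `max` by mutable reference, with no heap
// allocation. `max_of` is a hypothetical helper.
fn max_of(values: &[i32]) -> i32 {
    let mut max = 0;
    values.iter().for_each(|&x| if x > max { max = x });
    max
}
```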
-------------------------
Unique closures:
Unique closures, written fn~ in analogy to the ~ pointer type
(see next section), hold on to things that can safely be sent
between processes. They copy the values they close over, much
like boxed closures, but they also 'own' them—meaning no other
code can access them. Unique closures are used in concurrent
code, particularly for spawning tasks.
There are also heap-allocated closures (so there are 3 kinds of
closures).
- - - - - - - -
In contrast to shared boxes, unique boxes are not reference
counted. Instead, it is statically guaranteed that only a single
owner of the box exists at any time.
let x = ~10;
let y <- x;
This is where the 'move' (<-) operator comes in. It is similar to
=, but it de-initializes its source. Thus, the unique box can
move from x to y, without violating the constraint that it only
has a single owner (if you used assignment instead of the move
operator, the box would, in principle, be copied).
Unique boxes, when they do not contain any shared boxes, can be
sent to other tasks. The sending task will give up ownership of
the box, and won't be able to access it afterwards. The receiving
task will become the sole owner of the box.
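
A sketch of the same two steps in present-day Rust syntax, where
plain = already moves a unique box and tasks have become threads
(the mapping of ~10 to Box::new(10) and the function name send_box
are my assumptions):

```rust
use std::thread;

// Move a box to a new owner, then hand it to another thread.
fn send_box() -> i32 {
    let x = Box::new(10);
    let y = x; // move: using `x` after this line is a compile error
    // The spawned thread becomes the sole owner of the box.
    let handle = thread::spawn(move || *y + 1);
    handle.join().unwrap()
}
```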
-------------------------
In D you control this by adding "private" before names, but I
think a centralized control point at the top of the module is
safer and cleaner:
By default, a module exports everything that it defines. This can
be restricted with export directives at the top of the module or
file.
mod enc {
    export encrypt, decrypt;
    const super_secret_number: int = 10;
    fn encrypt(n: int) -> int { n + super_secret_number }
    fn decrypt(n: int) -> int { n - super_secret_number }
}
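
For comparison, a sketch of the same module in present-day Rust
syntax. Later versions of the language went the other way from the
2012 design quoted above: items are private by default and each
exported item is marked with pub, much like the D approach:

```rust
mod enc {
    // Private: not visible outside the module.
    const SUPER_SECRET_NUMBER: i32 = 10;

    pub fn encrypt(n: i32) -> i32 {
        n + SUPER_SECRET_NUMBER
    }

    pub fn decrypt(n: i32) -> i32 {
        n - SUPER_SECRET_NUMBER
    }
}
```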
-------------------------
This is needed by the uniqueness typing:
Evaluating a swap expression neither changes reference counts nor
deeply copies any unique structure pointed to by the moved rval.
Instead, the swap expression represents an indivisible exchange
of ownership between the right-hand-side and the left-hand-side
of the expression. No allocation or destruction is entailed.
An example of three different swap expressions:
x <-> a;
x[i] <-> a[i];
y.z <-> b.c;
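
In present-day Rust the <-> operator no longer exists; a rough
equivalent is the library function std::mem::swap. A minimal
sketch, with a made-up function name:

```rust
// Exchange ownership of two boxes without copying or freeing
// what they point to. `swap_demo` is a hypothetical helper.
fn swap_demo() -> (Box<i32>, Box<i32>) {
    let mut x = Box::new(1);
    let mut a = Box::new(2);
    std::mem::swap(&mut x, &mut a);
    (x, a)
}
```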
-------------------------
For some info on the typestate system, from the Rust manual:
http://dl.rust-lang.org/doc/rust.html#typestate-system
This description is simpler than I had thought. It seems
possible to create an experimental D compiler with just a similar
typestate system; it looks like a purely additive change (but
maybe it's not a small change). It seems to not even require new
syntax, besides an assert-like check() that can't be disabled and
that uses a pure expression/predicate.
Bye,
bearophile