More on Rust language

Thu Nov 3 20:14:29 PDT 2011

Through Reddit I've found two introductions to the system language Rust being developed by Mozilla. This is one of them:

http://marijnhaverbeke.nl/rust_tutorial/

This is an alpha-state tutorial, so some parts are unfinished and some parts will probably change, in the language too.

Unfortunately this first tutorial doesn't discuss typestates and syntax macros (yet), two of the most significant features of Rust. The second tutorial does discuss typestates a bit.

Currently the Rust compiler is written in Rust and is based on the LLVM back-end. This allows it to eat its own dog food (there are few descriptions of typestate usage in the compiler itself), and the back-end is efficient enough. Compared to DMD the Rust compiler is at an earlier stage of development: it works and it's able to compile itself, but I think it's not yet usable for practical purposes.

On its GitHub page the Rust project has 547 "Watch" and 52 "Fork", while DMD has 159 and 49, even though Rust is a much younger compiler than DMD. So it seems enough people are interested in Rust.

Most of the text below is quotations from the tutorials.

---------------------------

http://marijnhaverbeke.nl/rust_tutorial/control.html

Pattern matching

Rust's alt construct is a generalized, cleaned-up version of C's switch construct. You provide it with a value and a number of arms, each labelled with a pattern, and it will execute the arm that matches the value.

alt my_number {
  0       { std::io::println("zero"); }
  1 | 2   { std::io::println("one or two"); }
  3 to 10 { std::io::println("three to ten"); }
  _       { std::io::println("something else"); }
}

There is no 'falling through' between arms, as in C—only one arm is executed, and it doesn't have to explicitly break out of the construct when it is finished.

The part to the left of each arm is called the pattern. Literals are valid patterns, and will match only their own value. The pipe operator (|) can be used to assign multiple patterns to a single arm. Ranges of numeric literal patterns can be expressed with to. The underscore (_) is a wildcard pattern that matches everything.

If the arm with the wildcard pattern was left off in the above example, running it on a number greater than ten (or negative) would cause a run-time failure. When no arm matches, alt constructs do not silently fall through—they blow up instead.
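For comparison, here is a minimal sketch of the same construct in the syntax of later Rust releases, which is not the 2011 syntax quoted above: alt is spelled match, arms use =>, and inclusive ranges are written 3..=10. The function name classify is made up for illustration. Later compilers also reject a match that doesn't cover every value at compile time, instead of failing at run time.

```rust
// Sketch in later Rust syntax; `classify` is a hypothetical name,
// not something taken from the tutorial.
fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",
        1 | 2 => "one or two",          // `|` joins several patterns
        3..=10 => "three to ten",       // inclusive range pattern
        _ => "something else",          // wildcard: removing it is a compile error
    }
}
```

Calling classify(7) gives "three to ten", and any negative number falls into the wildcard arm.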

A powerful application of pattern matching is destructuring, where you use the matching to get at the contents of data types. Remember that (float, float) is a tuple of two floats:

fn angle(vec: (float, float)) -> float {
    alt vec {
      (0f, y) when y < 0f { 1.5 * std::math::pi }
      (0f, y) { 0.5 * std::math::pi }
      (x, y) { std::math::atan(y / x) }
    }
}

A variable name in a pattern matches everything, and binds that name to the value of the matched thing inside of the arm block. Thus, (0f, y) matches any tuple whose first element is zero, and binds y to the second element. (x, y) matches any tuple, and binds both elements to a variable.

Any alt arm can have a guard clause (written when EXPR), which is an expression of type bool that determines, after the pattern is found to match, whether the arm is taken or not. The variables bound by the pattern are available in this guard expression.
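A rough equivalent of the angle function in later Rust syntax might look like the sketch below. It assumes f64 in place of float and an if guard in place of when; I have also moved the zero test into the guard instead of using a float literal as a pattern, which is my choice and not the tutorial's.

```rust
use std::f64::consts::PI;

// Sketch only: later Rust syntax, with `f64` standing in for `float`.
fn angle(vec: (f64, f64)) -> f64 {
    match vec {
        // The guard sees the variables bound by the pattern.
        (x, y) if x == 0.0 && y < 0.0 => 1.5 * PI,
        (x, _) if x == 0.0 => 0.5 * PI,
        // No guard: matches every remaining tuple and binds both elements.
        (x, y) => (y / x).atan(),
    }
}
```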

Record patterns

Records can be destructured in alt patterns. The basic syntax is {fieldname: pattern, ...}, but the pattern for a field can be omitted as a shorthand for simply binding the variable with the same name as the field.

alt mypoint {
    {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
    {x, y}             { /* Simply bind the fields */ }
}

The field names of a record do not have to appear in a pattern in the same order they appear in the type. When you are not interested in all the fields of a record, a record pattern may end with , _ (as in {field1, _}) to indicate that you're ignoring all other fields.
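In later Rust releases records became named structs, and the trailing , _ became .. as far as I can tell. The sketch below assumes a hypothetical Point3 struct with integer fields (so that a literal can be used as a sub-pattern) and a made-up describe function.

```rust
// Hypothetical type and function, written in later Rust syntax.
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

fn describe(p: &Point3) -> String {
    match *p {
        // Sub-pattern for x, a new name for y, and `..` to ignore the rest.
        Point3 { x: 0, y: y_name, .. } => format!("x is zero, y = {}", y_name),
        // Shorthand bindings; field order need not follow the declaration.
        Point3 { z, y, x } => format!("x = {}, y = {}, z = {}", x, y, z),
    }
}
```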

Tags

Tags [FIXME terminology] are datatypes that have several different representations. For example, the type shown earlier:

tag shape {
    circle(point, float);
    rectangle(point, point);
}

A value of this type is either a circle, in which case it contains a point record and a float, or a rectangle, in which case it contains two point records. The run-time representation of such a value includes an identifier of the actual form that it holds, much like the 'tagged union' pattern in C, but with better ergonomics.

Tag patterns

For tag types with multiple variants, destructuring is the only way to get at their contents. All variant constructors can be used as patterns, as in this definition of area:

fn area(sh: shape) -> float {
    alt sh {
        circle(_, size) { std::math::pi * size * size }
        rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
    }
}
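The shape and area example translates fairly directly to later Rust releases, where tag is spelled enum. This is only a sketch under that assumption: the capitalised names and the f64 type follow later conventions and are not from the tutorial.

```rust
use std::f64::consts::PI;

// Sketch in later Rust syntax: `tag` became `enum`, records became structs.
struct Point {
    x: f64,
    y: f64,
}

enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

fn area(sh: &Shape) -> f64 {
    match sh {
        // Variant constructors double as patterns; `_` skips the centre point.
        Shape::Circle(_, size) => PI * size * size,
        // Nested record patterns pull the corner coordinates out directly.
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => (x2 - x) * (y2 - y),
    }
}
```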

------------------------------

// The type of this vector will be inferred based on its use.
let x = [];

// Explicitly say this is a vector of integers.
let y: [int] = [];
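The same inference can be sketched in later Rust syntax, assuming the growable Vec type in place of the [int] vectors shown above; build_vectors is a made-up name.

```rust
// Sketch in later Rust syntax; `build_vectors` is a hypothetical helper.
fn build_vectors() -> (Vec<u8>, Vec<i32>) {
    // No annotation: the element type is inferred from the push below.
    let mut x = Vec::new();
    x.push(7u8);

    // Explicitly say this is a vector of 32-bit integers.
    let y: Vec<i32> = Vec::new();

    (x, y)
}
```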

---------------------------

Tuples

Tuples in Rust behave exactly like records, except that their fields do not have names (and can thus not be accessed with dot notation). Tuples can have any arity except for 0 or 1 (though you may see nil, (), as the empty tuple if you like).

let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
  (a, b, c) { log a + b + (c as int); }
}
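In later Rust releases a tuple can also be taken apart with a plain let, without a match. A small sketch, assuming i32 and f64 in place of int and float, with a made-up function name:

```rust
// Sketch in later Rust syntax; `sum_tuple` is a hypothetical name.
fn sum_tuple(t: (i32, i32, f64)) -> i32 {
    // A `let` with a tuple pattern binds all three fields at once.
    let (a, b, c) = t;
    a + b + (c as i32)
}
```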

---------------------------

Pointers

Rust supports several types of pointers. The simplest is the unsafe pointer, written *TYPE, which is a completely unchecked pointer type only used in unsafe code (and thus, in typical Rust code, very rarely). The safe pointer types are @TYPE for shared, reference-counted boxes, and ~TYPE, for uniquely-owned pointers.

All pointer types can be dereferenced with the * unary operator.
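In later Rust releases the ~TYPE and @TYPE sigils were replaced by library types: Box for uniquely-owned pointers and Rc for reference-counted ones. The sketch below assumes that later syntax; pointer_demo is a made-up name.

```rust
use std::rc::Rc;

// Sketch in later Rust syntax: `Box<T>` plays the role of ~T, `Rc<T>` of @T.
fn pointer_demo() -> (i32, i32, usize) {
    let owned: Box<i32> = Box::new(10); // uniquely owned heap value
    let shared: Rc<i32> = Rc::new(20);  // shared, reference counted
    let shared2 = Rc::clone(&shared);   // bumps the count, copies no data

    // Both kinds are dereferenced with the unary `*` operator.
    (*owned, *shared2, Rc::strong_count(&shared))
}
```

The third element of the result is the reference count, which is 2 while both shared handles are alive.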

---------------------------

When inserting an implicit copy for something big, the compiler will warn, so that you know that the code is not as efficient as it looks.

---------------------------

Argument passing styles

...

Another style is by-move, which will cause the argument to become de-initialized on the caller side, and give ownership of it to the called function. This is written -.

Finally, the default passing styles (by-value for non-structural types, by-reference for structural ones) are written + for by-value and && for by(-immutable)-reference. It is sometimes necessary to override the defaults. We'll talk more about this when discussing generics.
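Later Rust releases dropped the -, + and && markers: passing a value moves it by default, and a reference is requested explicitly with &. A sketch of by-move versus by-reference under that later syntax, with made-up function names:

```rust
// Sketch in later Rust syntax; all names here are hypothetical.

// By-move: the function takes ownership of the vector.
fn total_by_move(v: Vec<i32>) -> i32 {
    v.into_iter().sum()
}

// By immutable reference: the caller keeps ownership.
fn total_by_ref(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn passing_demo() -> (i32, i32) {
    let data = vec![1, 2, 3];
    let a = total_by_ref(&data); // `data` is still usable afterwards
    let b = total_by_move(data); // `data` is de-initialized on the caller side
    // Using `data` past this point would be a compile error.
    (a, b)
}
```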

==============================================

The second introduction I have found:
https://github.com/graydon/rust/wiki/

---------------------------

https://github.com/graydon/rust/wiki/Unit-testing

Rust has built in support for simple unit testing. Functions can be marked as unit tests using the 'test' attribute.

#[test]
fn return_none_if_empty() {
   ... test code ...
}

A test function's signature must have no arguments and no return value. To run the tests in a crate, it must be compiled with the '--test' flag: rustc myprogram.rs --test -o myprogram-tests. Running the resulting executable will run all the tests in the crate. A test is considered successful if its function returns; if the task running the test fails, through a call to fail, a failed check or assert, or some other means, then the test fails.

When compiling a crate with the '--test' flag '--cfg test' is also implied, so that tests can be conditionally compiled.

#[cfg(test)]
mod tests {
  #[test]
  fn return_none_if_empty() {
    ... test code ...
  }
}

Note that attaching the 'test' attribute to a function does not imply the 'cfg(test)' attribute. Test items must still be explicitly marked for conditional compilation (though this could change in the future).

Tests that should not be run can be annotated with the 'ignore' attribute. The existence of these tests will be noted in the test runner output, but the test will not be run.
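Put together, a test module using these attributes might look like the following sketch, written in later Rust syntax. The function first_word and the module name unit_tests are made up for illustration; only the test, cfg(test) and ignore attributes come from the wiki text.

```rust
// Hypothetical function under test.
fn first_word(s: &str) -> Option<&str> {
    s.split_whitespace().next()
}

// Compiled only when the crate is built with --test.
#[cfg(test)]
mod unit_tests {
    use super::first_word;

    #[test]
    fn return_none_if_empty() {
        assert_eq!(first_word(""), None);
    }

    #[test]
    #[ignore] // noted in the runner output, but not run by default
    fn long_input() {
        let text = "word ".repeat(1_000_000);
        assert_eq!(first_word(&text), Some("word"));
    }
}
```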

A test runner built with the '--test' flag supports a limited set of arguments to control which tests are run: the first free argument passed to a test runner specifies a filter used to narrow down the set of tests being run; the '--ignored' flag tells the test runner to run only tests with the 'ignore' attribute.

Parallelism

By default, tests are run in parallel, which can make interpreting failure output difficult. In these cases you can set the RUST_THREADS environment variable to 1 to make the tests run sequentially.

Examples
Typical test run

> mytests

running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... ok

result: ok. 28 passed; 0 failed; 2 ignored

Test run with failures

> mytests

running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... FAILED

result: FAILED. 27 passed; 1 failed; 2 ignored

Running ignored tests

> mytests --ignored

running 2 tests
running driver::tests::mytest2 ... failed
running driver::tests::mytest10 ... ok

result: FAILED. 1 passed; 1 failed; 0 ignored

Running a subset of tests

> mytests mytest1

running 11 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest10 ... ignored
... snip ...
running driver::tests::mytest19 ... ok

result: ok. 11 passed; 0 failed; 1 ignored

---------------------------

https://github.com/graydon/rust/wiki/Error-reporting

Incorrect use of numeric literals.

auto i = 0u;
i += 3; // suggest "3u"

Use of for where for each was meant.

for (v in foo.iter()) // suggest "for each"

This is something I'd like in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6638

---------------------------

https://github.com/graydon/rust/wiki/Attribute-notes

Crate Linkage Attributes

A crate's version is determined by the link attribute, which is a list meta item containing metadata about the crate. This metadata can, in turn, be used in providing partial matching parameters to syntax extension loading and crate importing directives, denoted by the syntax and use keywords respectively.

All meta items within a link attribute contribute to the versioning of a crate, and two meta items, name and vers, have special meaning and must be present in all crates compiled as shared libraries.

An example of a typical crate link attribute:

#[link(name = "std",
       vers = "0.1",
       uuid = "122bed0b-c19b-4b82-b0b7-7ae8aead7297",
       url = "http://rust-lang.org/src/std")];

==============================================

Regarding different kinds of pointers in D, I have recently found this:
http://herbsutter.com/2011/10/25/garbage-collection-synopsis-and-c/

From what I understand of this comment by Herb Sutter, I was right when, about three years ago, I asked for a second pointer type in D:

>Mark-compact (aka moving) collectors, where live objects are moved together to make allocated memory more compact. Note that doing this involves updating pointers’ values on the fly. This category includes semispace collectors as well as the more efficient modern ones like the .NET CLR’s that don’t use up half your memory or address space. C++ cannot support this without at least a new pointer type, because C/C++ pointer values are required to be stable (not change their values), so that you can cast them to an int and back, or write them to a file and back; this is why we created the ^ pointer type for C++/CLI which can safely point into #3-style compacting GC heaps. See section 3.3 of my paper (http://www.gotw.ca/publications/C++CLIRationale.pdf ) A Design Rationale for C++/CLI for more rationale about ^ and gcnew.<

Tell me if I am still wrong. How do you implement a moving GC in D if D has raw pointers? D semantics don't allow the GC to automatically modify those pointers when it moves the data.
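The stable-value requirement Sutter describes can be sketched with a raw pointer that is cast to an integer and back. I write it in Rust only to stay with the language of this post; the same round trip is what C, C++ and D raw pointers allow. The function name is made up.

```rust
// Sketch of the pointer -> integer -> pointer round trip.
// `round_trip_through_integer` is a hypothetical name.
fn round_trip_through_integer() -> i32 {
    let value: i32 = 42;
    let p: *const i32 = &value;

    // Once the address is stored as a plain integer, a collector has no way
    // to know it is a pointer, so it could not update it after moving `value`.
    let addr: usize = p as usize;

    // Turning it back into a pointer only works if the object never moved.
    let q = addr as *const i32;
    unsafe { *q }
}
```

This works here only because nothing relocates value between the two casts, which is the guarantee a moving collector would break.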

--------------------------

As you can see, this post of mine discusses neither typestates nor syntax macros. I have not found enough info about them in the Rust docs.

Even if Rust does not become widespread, it will either introduce typestates into the cauldron of features known by future language designers (and maybe future programmers too), or it will show why typestates are not a good idea. In any case Rust will be useful.

Some comments regarding D:
- I'd like the better error messages I have discussed in bug 6638.
- Tuple de-structuring syntax would be good to have in D too. There is a patch for this. If the ideas of the patch are not developed enough, then I suggest presenting the design problems, then discussing and solving them.
- I'd like a bit more flexible switch in D, discussion: http://d.puremagic.com/issues/show_bug.cgi?id=596
  This is just an additive change, I think it causes no breaking changes.
- Tag patterns used inside the switch-like "alt": syntax-wise this looks less easy to implement in D.
- I think unit testing in D needs more improvements. Rust is in a less developed state compared to D, yet its unit testing features already seem better designed. I think this is not complex stuff to design and implement.

Bye,
bearophile