Basic Types
Integer Types
u8
is used to represent single-byte values.
Characters are distinct from the numeric types (unlike C++); a char is neither a u8 nor an i8.
Values used as array access indices must be usize
. The same applies to values that represent the size of arrays or vectors.
Integer literals can take a suffix indicating their type. The suffix can optionally be separated from the digits by an underscore, e.g.:
- 42u8 is a u8 value
- 1729isize and 1729_isize are both isize
Compiler behavior: When nothing else in the code constrains the type of an integer literal, the compiler defaults to i32.
The following prefixes can be used with numeric literals to specify their radix:
- 0x hexadecimal
- 0o octal
- 0b binary
Long numeric literals may be segmented by underscores for readability, e.g. 4_295_923_000_010 or 0xffff_0f0f.
Rust provides byte literals, which are character-like literals for u8
values: b'X'
represents the ASCII code for the character X, but as a u8
value.
You can convert from one integer type to another using the as
(type-cast) operator: 65535_u16 as i32
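As a rough illustration of the literal forms and casts described in this section, the sketch below collects them in one place. The helper name cast_examples is made up for this example. The values in the comments rely on the fact that as keeps only the low-order bits when the destination type is too small.

```rust
// Hypothetical helper: returns the results of three `as` casts.
fn cast_examples() -> (i32, u8, i16) {
    let widened = 65535_u16 as i32;       // value fits, stays 65535
    let truncated = 1000_i32 as u8;       // keeps the low 8 bits: 1000 % 256 = 232
    let reinterpreted = 65535_u16 as i16; // same 16 bits read as signed: -1
    (widened, truncated, reinterpreted)
}

fn main() {
    // Type suffixes, with and without the separating underscore.
    assert_eq!(1729isize, 1729_isize);
    // Radix prefixes.
    assert_eq!(0xff, 255);
    assert_eq!(0o17, 15);
    assert_eq!(0b1010, 10);
    // Underscores inside a literal are ignored by the compiler.
    assert_eq!(0xffff_0f0f, 0xffff0f0f_u32);
    // A byte literal is a u8 holding the ASCII code.
    assert_eq!(b'X', 88_u8);
    println!("{:?}", cast_examples());
}
```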
Floating-Point Types
The fractional part of a floating-point literal may consist of a lone decimal point: 5. is a valid float literal.
Compiler behavior: Given a floating-point literal with no other type constraints, the compiler will infer a type of f64.
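A small sketch of both points: a literal with an empty fractional part, and the f64 default. The helper half is hypothetical and exists only to show a context that forces f32 instead.

```rust
// Hypothetical helper: the parameter type forces the literal to be an f32.
fn half(x: f32) -> f32 {
    x / 2. // `2.` is a float literal with an empty fractional part
}

fn main() {
    let x = 5.; // no other constraint, so this is an f64 (8 bytes)
    assert_eq!(std::mem::size_of_val(&x), 8);
    assert_eq!(half(5.), 2.5);
}
```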
The bool
Type
bool values can be converted to integer types using the as operator:

```rust
fn main() {
    assert_eq!(false as i32, 0);
    assert_eq!(true as i32, 1);
}
```
The reverse does not hold: the as operator can't convert numeric types to bool. You have to be explicit and write a comparison, such as x != 0.
Rust uses an entire byte for a bool
value in memory, so you can create a pointer to it.
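The sketch below shows the explicit comparison standing in for the missing cast, and checks the one-byte size. The helper to_bool is a name invented for this example.

```rust
// Hypothetical helper: an explicit comparison replaces the unavailable cast.
fn to_bool(x: i32) -> bool {
    x != 0
}

fn main() {
    assert!(to_bool(7));
    assert!(!to_bool(0));
    // A bool occupies one byte, so taking a reference to it works as usual.
    assert_eq!(std::mem::size_of::<bool>(), 1);
    let flag = true;
    let r: &bool = &flag;
    assert_eq!(*r as u8, 1);
}
```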
Characters
Rust's character type char
represents a single Unicode character, as a 32-bit value.
A char represents a single character in isolation, whereas strings and streams of text use UTF-8 encoded bytes. This means the String type represents a sequence of UTF-8 bytes, not chars.
A char
literal is just a single Unicode character wrapped in single quotes, e.g. '©'
.
The as operator can be used to convert a char to an integer type (i32, u16, etc.). In the other direction, as only works for u8. For a u32, use std::char::from_u32, which returns an Option<char> because not every u32 is a valid Unicode code point.
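A minimal sketch of the conversions in both directions. The wrapper char_from_code is hypothetical; std::char::from_u32 is the standard-library function it calls.

```rust
// Hypothetical helper wrapping the checked conversion.
fn char_from_code(code: u32) -> Option<char> {
    std::char::from_u32(code)
}

fn main() {
    assert_eq!('*' as i32, 42);     // char to integer
    assert_eq!(97_u8 as char, 'a'); // u8 is the only integer type that casts to char
    assert_eq!(char_from_code(0xA9), Some('©'));
    assert_eq!(char_from_code(0xD800), None); // surrogate code points are not chars
}
```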
Tuples
Tuple elements cannot be accessed using dynamic indices. That is to say, given a tuple t, you can't use a variable i to access the i-th element; only constant indices such as t.0 or t.1 are allowed.
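For illustration, here is a short sketch of the two ways to get at tuple elements: constant indices and destructuring. The function swap is a hypothetical helper.

```rust
// Hypothetical helper: swaps the two fields using constant indices.
fn swap(pair: (i32, &str)) -> (&str, i32) {
    (pair.1, pair.0)
}

fn main() {
    let t = (7, "seven");
    assert_eq!(t.0, 7);
    // Destructuring binds every element at once.
    let (number, name) = t;
    assert_eq!((name, number), swap(t));
}
```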
Term: The type definition
()
is called the unit type.
Rust uses the unit type where there's no meaningful value to carry but the context still requires a type, e.g. a function that returns no value has a return type of ().
Shorthand: A function declaration whose return type is omitted is shorthand for returning the unit type, e.g. fn my_fn(); is shorthand for fn my_fn() -> ();.
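The equivalence can be checked directly. Both function names below are invented for this sketch.

```rust
// Return type omitted, so the function returns ().
fn implicit_unit() {}

// The same signature written out in full.
fn explicit_unit() -> () {}

fn main() {
    let a: () = implicit_unit();
    let b: () = explicit_unit();
    assert_eq!(a, b);
}
```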
Trailing commas are acceptable in tuples. They're acceptable pretty much anywhere in Rust.
Pointer Types
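In most garbage-collected languages, nearly every value is a heap-allocated object reached through a pointer. Rust instead stores values inline by default and uses pointers only where you ask for them, which keeps allocations and memory overhead down.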
References
&
is the immutable reference operator. It creates the reference.
&mut
is the mutable reference operator.
*
is the dereference operator. It accesses the value being referred to.
The type &T is pronounced "ref T", meaning "reference to a value of type T".
The expression &x creates a reference to the value x. In words, we'd say that it "borrows a reference to x".
The expression *x (given that x is of type &T) refers to the value that x is a reference to.
References are immutable by default. For a reference to be mutable, it must have type &mut T
.
References in Rust can never be null, so there are no null-pointer exceptions. (Raw pointers can be null, but they can only be dereferenced in unsafe code.)
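The three operators can be seen together in a short sketch. The helpers double and increment are hypothetical names chosen for this example.

```rust
// Hypothetical helpers showing shared and mutable borrows.
fn double(n: &i32) -> i32 {
    *n * 2 // read through a shared reference
}

fn increment(n: &mut i32) {
    *n += 1; // write through a mutable reference
}

fn main() {
    let mut x = 5;
    increment(&mut x);
    assert_eq!(x, 6);
    assert_eq!(double(&x), 12);
}
```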
Boxes
A Box is an owning pointer to a value allocated on the heap.
When a Box
is created, enough memory is allocated on the heap to contain its value:
```rust
fn main() {
    let v = vec![1, 2, 3, 4];
    let b = Box::new(v); // allocates space on the heap and moves v into it
}
```
When a Box goes out of scope, both the Box itself and the heap value it points to are freed, unless the value has been moved elsewhere first.
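A brief sketch of using a Box and then moving its contents back out. The helper unbox is a made-up name.

```rust
// Hypothetical helper: dereferencing a Box by value moves the contents out.
fn unbox(b: Box<Vec<i32>>) -> Vec<i32> {
    *b
}

fn main() {
    let b = Box::new(vec![1, 2, 3, 4]);
    assert_eq!(b.len(), 4); // method calls see through the Box
    assert_eq!((*b)[0], 1); // explicit dereference
    let v = unbox(b);       // b is moved here and can't be used afterwards
    assert_eq!(v, [1, 2, 3, 4]);
} // v goes out of scope here and its heap buffer is freed
```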
Raw Pointers
Raw pointers (*const T and *mut T) can be created in safe code, but they can only be dereferenced in unsafe code.
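The split between creating and dereferencing a raw pointer might look like this. The helper read_through_raw is hypothetical, and the unsafe block is sound only because the pointer was derived from a live reference.

```rust
// Hypothetical helper: reads an i32 through a raw pointer.
fn read_through_raw(x: &i32) -> i32 {
    let p: *const i32 = x; // creating the raw pointer needs no unsafe
    // Dereferencing does. It is sound here because p came from a live reference.
    unsafe { *p }
}

fn main() {
    assert_eq!(read_through_raw(&42), 42);
    // Unlike references, raw pointers may be null.
    let nothing: *const i32 = std::ptr::null();
    assert!(nothing.is_null());
}
```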
Arrays, Vectors, and Slices
Rust has 3 types for representing a sequence of values.
Name | Type | Description | Size | Memory |
---|---|---|---|---|
Array | [T; N] | Array of N values, each of type T | Fixed | Stack |
Vector | Vec<T> | Vector of Ts | Dynamic | Heap |
Slice | &[T] | Shared slice of Ts | Fixed | Stack (as a fat pointer to elements stored elsewhere) |
Given any of the above types as value v
, the expression v.len()
gives the number of elements in v
, and v[i]
refers to the i
'th element of v
. i
must be of type usize
; no other integer types will work as an index.
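To make the usize requirement concrete, the sketch below indexes with a value that starts out as a u32. The helper nth is a hypothetical name; it takes a slice reference so that it accepts arrays and vectors alike.

```rust
// Hypothetical helper: a u32 position must be cast before it can index.
fn nth(values: &[i32], position: u32) -> i32 {
    values[position as usize]
}

fn main() {
    let array = [10, 20, 30];
    let vector = vec![10, 20, 30];
    let slice = &vector[..];
    assert_eq!(array.len(), 3);
    assert_eq!(vector.len(), 3);
    assert_eq!(slice.len(), 3);
    assert_eq!(nth(&array, 1), 20);
    assert_eq!(nth(&vector, 2), 30);
}
```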
Arrays
An array's length is built into its type and is fixed at compile time.
Implicit behavior: When working with an array value and accessing its methods, Rust implicitly converts a reference to an array to a slice. So if you need to know the methods for an array, go look at the methods for slices.
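For example, sort, contains, and first are all defined on slices, yet they can be called on an array directly. The helper sorted is invented for this sketch.

```rust
// Hypothetical helper: sort is a slice method, called here on an array.
fn sorted(mut values: [i32; 4]) -> [i32; 4] {
    values.sort();
    values
}

fn main() {
    let a = [3, 1, 4, 1];
    assert_eq!(sorted(a), [1, 1, 3, 4]);
    // Other slice methods are available the same way.
    assert!(a.contains(&4));
    assert_eq!(a.first(), Some(&3));
}
```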
Vectors
A vector is allocated on the heap.
There are 5 main ways to create a vector:
- Use the vec! macro (simplest).
- Build a vector by repeating a given value a certain number of times, using a syntax that imitates array literals:
```rust
fn main() {
    let rows = 100;
    let cols = 100;
    // 0u8 makes every element a single byte, so len() really is a byte count
    let pixel_buffer = vec![0u8; rows * cols];
    println!("Buffer is {} bytes long.", pixel_buffer.len());
}
```
- Use Vec::new to create a new, empty vector, and push elements onto it:

```rust
fn main() {
    let mut v = Vec::new();
    v.push("hello");
    v.push("vector");
    println!("{:?}", v);
    println!("capacity: {}", v.capacity());
}
```
- Collect the values produced by an iterator into a vector using its .collect() method:

```rust
fn main() {
    let v: Vec<i32> = (1..4).collect();
    assert_eq!(v, [1, 2, 3]);
}
```
- If you know the size of the vector in advance, you can use Vec::with_capacity to create the vector, instead of new:

```rust
fn main() {
    // with_capacity takes the number of elements to reserve room for
    let mut v = Vec::with_capacity(2);
    v.push("hello");
    v.push("vector");
    println!("{:?}", v);
}
```
Using Vec::with_capacity
instead of Vec::new
is more performant because it can prevent costly heap reallocations when a vector grows beyond its current capacity.
A vector's capacity()
method returns the number of elements the vector could hold without reallocation.
```rust
fn main() {
    // Track the length and capacity of a vector as values are added to it
    let mut v: Vec<i32> = Vec::with_capacity(2);
    println!("length/capacity: {}/{}", v.len(), v.capacity());
    v.push(1);
    v.push(2);
    println!("length/capacity: {}/{}", v.len(), v.capacity());
    v.push(3);
    println!("length/capacity: {}/{}", v.len(), v.capacity());
}
```
As with arrays, slice methods can be used on vectors.
In stack memory, a Vec<T>
consists of three values:
Stack cell | Stack cell | Stack cell |
---|---|---|
Pointer to heap-allocated buffer | The capacity of the buffer | The current occupied size of the buffer |
Inserting and removing elements anywhere but the end of a vector is expensive, because every element after that position has to be shifted.
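The difference shows up in which methods are involved: insert and remove take a position and shift the elements after it, while push and pop only touch the end. The helper push_front is a hypothetical name used for this sketch.

```rust
// Hypothetical helper: inserting at index 0 shifts every existing element.
fn push_front(mut v: Vec<i32>, value: i32) -> Vec<i32> {
    v.insert(0, value);
    v
}

fn main() {
    let mut v = push_front(vec![20, 30], 10);
    assert_eq!(v, [10, 20, 30]);
    assert_eq!(v.remove(0), 10);   // shifts 20 and 30 back to the left
    v.push(40);                    // cheap: works at the end
    assert_eq!(v.pop(), Some(40)); // cheap as well
    assert_eq!(v, [20, 30]);
}
```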
Slices
A slice, written [T]
(without specifying the length), is a region of an array or vector.
Since a slice can be any length, slices can't be stored directly in variables or passed as function arguments; they are always passed by reference.
A reference to a slice is a fat pointer.
Term: A fat pointer is a two-word value on the stack comprised of
- A pointer to the slice's first element
- The number of elements in the slice
Whereas an ordinary reference is a non-owning pointer to a single value, a reference to a slice is a non-owning pointer to several values.
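The word counts can be checked with std::mem::size_of. This is only a sketch: the helper words_in is invented here, and the three-word size of Vec reflects the pointer, capacity, and length layout described earlier.

```rust
use std::mem::size_of;

// Hypothetical helper: size of a type measured in machine words.
fn words_in<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // An ordinary reference is one word.
    assert_eq!(words_in::<&i32>(), 1);
    // A slice reference carries a pointer and a length.
    assert_eq!(words_in::<&[i32]>(), 2);
    // A Vec carries a pointer, a capacity, and a length.
    assert_eq!(words_in::<Vec<i32>>(), 3);
}
```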
A slice reference works as a common view over any contiguous sequence: a function that takes a &[T] can be called with an array, a vector, or part of either.
You can get a reference to a slice of an array, vector, or another slice by indexing it with a range:

```rust
fn main() {
    let v: Vec<f64> = vec![1., 2., 3.];
    let first_two: &[f64] = &v[0..2]; // reference to a slice of the vector
    println!("{:?}", first_two);
}
```
The term slice is often used for reference types like &[T]
or &str, but that's just shorthand. Those types are called references to slices.
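One practical consequence is that a single function can serve arrays, vectors, and sub-ranges of both. The sketch below assumes a hypothetical helper named sum.

```rust
// Hypothetical helper: accepts any contiguous run of f64 values.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

fn main() {
    let array = [1., 2., 3.];
    let vector = vec![1., 2., 3., 4.];
    assert_eq!(sum(&array), 6.);        // whole array
    assert_eq!(sum(&vector), 10.);      // whole vector
    assert_eq!(sum(&vector[1..3]), 5.); // elements 1 and 2 only
}
```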
String Types
String Literals
String literals are enclosed in double quotes.
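Term: Rust offers raw strings, written with an r prefix and optional # marks around the quotes. Escape sequences are not processed inside them, so backslashes and (with the # form) double quotes can appear as-is. Like any Rust string literal, a raw string can span several lines, and its line breaks and indentation are kept verbatim. In that respect they loosely resemble template strings in JavaScript.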
```rust
fn main() {
    let paragraph = r#"
        I'm just a regular paragraph
        with the appropriate spacing.
    "#;
    println!("{}", paragraph);
}
```
Byte Strings
A string literal with the b prefix is a byte string. A byte string is a slice of u8 values, that is, bytes rather than Unicode text.
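A short sketch of a byte string in use. The helper is_get and the request text are made up for illustration.

```rust
// Hypothetical helper: checks a request line against a byte-string prefix.
fn is_get(request: &[u8]) -> bool {
    request.starts_with(b"GET ")
}

fn main() {
    let method = b"GET";
    // Each element is a u8, equal to the matching byte literal.
    assert_eq!(method, &[b'G', b'E', b'T']);
    assert_eq!(method.len(), 3);
    assert!(is_get(b"GET /index.html"));
}
```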
Strings in Memory
Rust strings are stored in memory using UTF-8 (not as arrays of char
s).
A String
is stored on the heap as a resizable buffer of UTF-8 text. You can think of a String
as a Vec<u8>
that is guaranteed to hold well-formed UTF-8.
Pronunciation: A &str is called a "stir" or "string slice".
A &str
is a reference to a sequence of UTF-8 text owned by someone else.
A &str
is a slice, so it is therefore a fat pointer. You can think of a &str
as being nothing more than a &[u8]
that is guaranteed to hold well-formed UTF-8.
A string literal is a &str that refers to preallocated text stored in read-only memory.
Any string type's length (returned by .len()
) is measured in bytes, not characters.
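The gap between bytes and characters appears as soon as the text leaves ASCII. In this sketch, byte_and_char_len is a hypothetical helper and the sample words are arbitrary.

```rust
// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Pure ASCII: one byte per character.
    assert_eq!(byte_and_char_len("hello"), (5, 5));
    // U+00EF (i with diaeresis) takes two bytes in UTF-8.
    assert_eq!(byte_and_char_len("na\u{ef}ve"), (6, 5));
}
```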
It is impossible to modify a &str:

```rust
fn main() {
    let mut s = "hello";
    s[0] = 'c'; // compile error: a str can't be indexed by an integer, let alone written through
}
```
String
Ways to create a String
:
- Given a &str, the .to_string() method will copy it into a String.
- The format!() macro works just like println!(), except that it returns a new String instead of writing text to stdout, and it doesn't automatically add a newline at the end.
- Arrays, slices, and vectors of strings have two methods that form a new String from many strings: .concat() and .join(sep).
```rust
fn main() {
    let elves = vec!["snap", "crackle", "pop"];
    println!("{:?}", elves.concat());
    println!("{:?}", elves.join(", "));
}
```
A &str can refer to either a string literal or a String, so it's the most appropriate type for function arguments when the caller should be allowed to pass either kind of string.
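The sketch below ties these points together: .to_string() makes an owned copy, format!() builds a new String, and a &str parameter accepts both kinds of caller. The function greet and the sample name are hypothetical.

```rust
// Hypothetical helper: borrowing a &str lets callers pass either kind of string.
fn greet(name: &str) -> String {
    format!("Hello, {}!", name) // builds a new String, no newline added
}

fn main() {
    let literal: &str = "Ferris";
    let owned: String = literal.to_string(); // copies the text to the heap
    assert_eq!(greet(literal), "Hello, Ferris!");
    assert_eq!(greet(&owned), "Hello, Ferris!"); // &String coerces to &str
}
```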
Unlike strings in some other languages, Rust strings are strictly Unicode. This means that they're not always the appropriate choice for string-like data. The table below shows which type to use in each situation:
When you have | Use |
---|---|
Unicode text | String or &str |
Filename | std::path::PathBuf and &Path |
Binary data | Vec<u8> and &[u8] |
Environment variables | OsString and &OsStr |
Strings from an FFI | std::ffi::CString and &CStr |
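A few of these types in action, as a rough sketch. The helper extension_of and the file path are invented for the example; the standard-library calls (Path::extension, std::env::var_os, CString::new) are real.

```rust
use std::ffi::{CString, OsStr};
use std::path::Path;

// Hypothetical helper: extracts a file extension without assuming UTF-8.
fn extension_of(path: &Path) -> Option<&OsStr> {
    path.extension()
}

fn main() {
    let path = Path::new("notes/todo.txt");
    assert_eq!(extension_of(path), Some(OsStr::new("txt")));

    // Environment variables arrive as OsString and may not be valid Unicode.
    if let Some(value) = std::env::var_os("PATH") {
        println!("PATH is {:?}", value);
    }

    // A CString is NUL-terminated for C; interior NUL bytes are rejected.
    let c = CString::new("hi").expect("no interior NUL");
    assert_eq!(c.as_bytes_with_nul(), b"hi\0");
}
```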