Input and Output

Concept: All I/O in Rust is organized around three traits, provided by std::io:

  1. Read: Defines methods for byte-oriented input. Implementers are called readers.
  2. BufRead: Includes Read methods, plus methods for reading lines of text and so forth. Implementers are called buffered readers.
  3. Write: Defines methods for both byte-oriented and UTF-8 text output. Implementers are called writers.

Shortcut: These three traits, along with Seek, are so commonly used that there's a prelude module containing only those four. Just add:


#![allow(unused)]
fn main() {
use std::io::prelude::*;
}

Readers and Writers

One of the simplest, lowest-level uses of Read and Write is a function that copies data from any reader to any writer, much like the standard library's std::io::copy:


#![allow(unused)]
fn main() {
use std::io::{self, Read, Write, ErrorKind};

const DEFAULT_BUF_SIZE: usize = 8 * 1024;

fn copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W)
    -> io::Result<u64>
    where R: Read, W: Write
{
    let mut buf = [0; DEFAULT_BUF_SIZE];
    let mut written = 0;

    loop {
        let len = match reader.read(&mut buf) {
            Ok(0) => return Ok(written),
            Ok(len) => len,
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..len])?;
        written += len as u64;
    }
}
}

Shortcut: The self in an import such as use std::io::{self, Read}; brings the std::io module itself into scope as io, which means we can write things like std::io::Result as just io::Result.
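The copy function above can be tried without touching any files, because a byte slice is a reader and a Vec<u8> is a writer. The following is a minimal sketch that calls the standard library's std::io::copy; the helper name copy_to_vec is hypothetical and exists only for illustration.

```rust
use std::io;

/// Copies everything from an in-memory reader into a growable byte vector.
/// `copy_to_vec` is a hypothetical helper used only for this sketch.
fn copy_to_vec(mut input: &[u8]) -> io::Result<(u64, Vec<u8>)> {
    let mut output = Vec::new();
    // `&[u8]` implements `Read`, and `Vec<u8>` implements `Write`.
    let copied = io::copy(&mut input, &mut output)?;
    Ok((copied, output))
}
```

The returned u64 is the number of bytes copied, matching the written counter in the hand-written version.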

Readers

All of the main methods defined by Read take the reader itself by mut reference. There are also several adapter methods that take the reader by value and transform it into an iterator or a different reader.

Note that there is no method for closing a reader. Readers and writers typically implement Drop, so they are closed automatically.


Reader Method reader.read(buffer)


#![allow(unused)]
fn main() {
fn read(&mut self, buf: &mut [u8]) -> Result<usize>
}

Reads some bytes from the data source and stores them in the given buffer, buf. The usize success value is the number of bytes read, which may be less than buf.len() even if there's still more data to read.

If read returns Ok(0), there's no more input to read.

On error, read returns Err(err), where err is an io::Error value. An io::Error is printable, for the benefit of humans. For programs, it has a .kind() method that returns an error code of type io::ErrorKind.

io::ErrorKind is an enum with many variants. Most of them indicate real problems and shouldn't be ignored, but not all. io::ErrorKind::Interrupted corresponds to the Unix error code EINTR, which means the read was interrupted by a signal. In almost all scenarios the right response is simply to retry the read.


Reader Method reader.read_to_end(&mut byte_vec)


#![allow(unused)]
fn main() {
fn read_to_end(&mut self, buf: &mut Vec<u8>) -> Result<usize>
}

Reads all remaining input from the reader into a vector.

There's no limit on the amount of data that read_to_end will append to the vector, so it's usually a good idea to impose a limit using .take(), especially when the source is untrusted.
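As a minimal sketch of that advice, the function below caps the input with .take() before calling read_to_end. The name read_limited and its limit parameter are assumptions made for this example.

```rust
use std::io::{self, Read};

/// Reads at most `limit` bytes from `reader` into a new vector.
/// `read_limited` is a hypothetical helper used only for this sketch.
fn read_limited<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    // `take` wraps the reader so that it reports end-of-input after `limit` bytes.
    reader.take(limit).read_to_end(&mut bytes)?;
    Ok(bytes)
}
```

Even an endless source such as io::repeat(b'a') is safe to pass to this function, because the Take wrapper ends the stream once the limit is reached.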


Reader Method reader.read_to_string(&mut string)


#![allow(unused)]
fn main() {
fn read_to_string(&mut self, buf: &mut String) -> Result<usize>
}

Reads all remaining input from the reader into a string. If the source provides data that isn't valid UTF-8, read_to_string will return an ErrorKind::InvalidData error.
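The sketch below wraps read_to_string in a small helper (read_text is a hypothetical name) so the InvalidData case can be observed with an in-memory byte slice.

```rust
use std::io::{self, Read};

/// Reads the whole input as UTF-8 text.
/// `read_text` is a hypothetical helper used only for this sketch.
fn read_text<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    // Fails with ErrorKind::InvalidData if the bytes are not valid UTF-8.
    reader.read_to_string(&mut text)?;
    Ok(text)
}
```

Calling read_text on the bytes 0xff 0xfe, which are not valid UTF-8, yields an error whose .kind() is ErrorKind::InvalidData.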


Reader Method reader.read_exact(&mut buf)


#![allow(unused)]
fn main() {
fn read_exact(&mut self, buf: &mut [u8]) -> Result<()>
}

Reads exactly enough data to fill the given buffer. If the reader runs out of data before reading buf.len() bytes, read_exact returns an ErrorKind::UnexpectedEof error.
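A typical use of read_exact is pulling a fixed-size field out of a binary stream. The following sketch assumes a 4-byte big-endian integer; the name read_u32_be and the field layout are illustrative assumptions, not part of any particular file format.

```rust
use std::io::{self, Read};

/// Reads exactly four bytes and interprets them as a big-endian u32.
/// `read_u32_be` is a hypothetical helper used only for this sketch.
fn read_u32_be<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    // Returns ErrorKind::UnexpectedEof if fewer than four bytes remain.
    reader.read_exact(&mut buf)?;
    Ok(u32::from_be_bytes(buf))
}
```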


Adapter reader.bytes()


#![allow(unused)]
fn main() {
fn bytes(self) -> Bytes<Self> where Self: Sized
}

Converts a reader into an iterator over the bytes of the input stream. The item type is io::Result<u8>, so an error check is required for every byte. It calls reader.read() one byte at a time, so this method is very inefficient if the reader isn't buffered.
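To illustrate both the per-byte error check and the buffering advice, here is a small sketch that counts occurrences of one byte. The helper count_byte is hypothetical; it wraps whatever reader it receives in a BufReader before calling .bytes().

```rust
use std::io::{self, BufReader, Read};

/// Counts how many times `target` occurs in the input.
/// `count_byte` is a hypothetical helper used only for this sketch.
fn count_byte<R: Read>(reader: R, target: u8) -> io::Result<usize> {
    let mut count = 0;
    // BufReader serves most bytes from memory instead of issuing
    // one read call on the underlying source per byte.
    for byte_result in BufReader::new(reader).bytes() {
        // Each item is an io::Result<u8>, so every byte needs a `?`.
        if byte_result? == target {
            count += 1;
        }
    }
    Ok(count)
}
```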


Adapter reader.chars()


#![allow(unused)]
fn main() {
fn chars(self) -> Chars<Self> where Self: Sized
}

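Converts a reader into an iterator over the input stream as UTF-8 characters. Note that this adapter was never stabilized and has since been removed from the standard library. On current Rust, read the input into a String (for example with read_to_string) and call .chars() on that instead.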


Adapter reader.chain(reader2)


#![allow(unused)]
fn main() {
fn chain<R: Read>(self, next: R) -> Chain<Self, R> where Self: Sized
}

Creates a new reader that produces all of the input from reader, followed by all of the input from reader2.
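A minimal sketch of chaining, using two in-memory readers. The helper name read_both is an assumption made for this example.

```rust
use std::io::{self, Read};

/// Reads all of `first`, then all of `second`, into one string.
/// `read_both` is a hypothetical helper used only for this sketch.
fn read_both<A: Read, B: Read>(first: A, second: B) -> io::Result<String> {
    let mut text = String::new();
    // The chained reader switches to `second` once `first` reports end-of-input.
    first.chain(second).read_to_string(&mut text)?;
    Ok(text)
}
```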


Adapter reader.take(n)


#![allow(unused)]
fn main() {
fn take(self, limit: u64) -> Take<Self> where Self: Sized
}

Creates a new reader that reads from the same source as reader, but is limited to n bytes of input.

Buffered Readers

Buffered readers implement both Read and BufRead, which provides three main methods.

Come back and add the type signatures of the following methods.


Buffered Reader Method reader.read_line(&mut line)

Reads a line of text and appends it to line, which is of type String.

The method returns an io::Result<usize>, where the usize is the number of bytes read, including the line ending, if any.

If the reader is at the end of the input, line will be unchanged and the method will return Ok(0).
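Because read_line appends rather than overwrites, a loop that reuses one String has to clear it on each pass. The sketch below records the byte count returned for each line; line_lengths is a hypothetical helper name.

```rust
use std::io::{self, BufRead};

/// Returns the number of bytes in each line, line endings included.
/// `line_lengths` is a hypothetical helper used only for this sketch.
fn line_lengths<R: BufRead>(mut reader: R) -> io::Result<Vec<usize>> {
    let mut lengths = Vec::new();
    let mut line = String::new();
    loop {
        line.clear(); // read_line appends, so reset the buffer first
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break; // Ok(0) means end of input
        }
        lengths.push(n);
    }
    Ok(lengths)
}
```

Reusing a single String this way avoids allocating a new one per line, which is the main reason to prefer read_line over lines() in hot loops.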


☆ Buffered Reader Method reader.lines()

Returns an iterator over the lines of the input.

The item type is io::Result<String>. Newline characters are not included in the strings.


Buffered Reader Methods reader.read_until(stop_byte, &mut byte_vec) and reader.split(stop_byte)

Byte-oriented versions of .read_line() and .lines(). They produce Vec<u8>s instead of Strings, and they use stop_byte as the delimiter instead of a newline.
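The two methods differ in one detail that is easy to miss: read_until keeps the delimiter in the output, while split strips it. A short sketch, with the hypothetical helpers first_field and split_on:

```rust
use std::io::{self, BufRead};

/// Reads up to and including the first `delimiter` byte.
/// `first_field` is a hypothetical helper used only for this sketch.
fn first_field<R: BufRead>(reader: &mut R, delimiter: u8) -> io::Result<Vec<u8>> {
    let mut field = Vec::new();
    reader.read_until(delimiter, &mut field)?;
    Ok(field)
}

/// Splits the whole input on `delimiter`, dropping the delimiter bytes.
/// `split_on` is a hypothetical helper used only for this sketch.
fn split_on<R: BufRead>(reader: R, delimiter: u8) -> io::Result<Vec<Vec<u8>>> {
    // Collecting into io::Result stops at the first error.
    reader.split(delimiter).collect()
}
```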

Reading Lines

We can use .lines() to write a function that implements a simple version of the Unix grep utility. Our function receives a generic reader (i.e., anything that implements BufRead).


#![allow(unused)]
fn main() {
use std::io;
use std::io::prelude::*;

fn grep<R>(target: &str, reader: R)
    -> io::Result<()>
    where R: BufRead
{
    for line_result in reader.lines() {
        let line = line_result?;
        if line.contains(target) {
            println!("{}", line);
        }
    }
    Ok(())
}
}

To use stdin as our source of data, we call its .lock() method. Stdin itself doesn't implement BufRead, but the StdinLock guard returned by .lock() does:


#![allow(unused)]
fn main() {
let stdin = io::stdin();
grep(&target, stdin.lock())?; // ok
}

If we wanted to use our function with the contents of a file, we could do so like this:


#![allow(unused)]
fn main() {
let f = File::open(file)?;
grep(&target, BufReader::new(f))?; // also ok
}

Collecting Lines

Look at this.
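One common way to gather every line into a vector is to let collect() build an io::Result<Vec<String>> directly. This is a sketch under the assumption that the whole input fits comfortably in memory; collect_lines is a hypothetical helper name.

```rust
use std::io::{self, BufRead};

/// Collects all lines of the input, stopping at the first I/O error.
/// `collect_lines` is a hypothetical helper used only for this sketch.
fn collect_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    // An iterator of io::Result<String> can be collected into a single
    // io::Result<Vec<String>>: Ok with every line, or the first Err.
    reader.lines().collect()
}
```

This works because Result implements FromIterator, so there is no need to write the loop and the error check by hand.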

Writers

To send output to a writer, use the write!() and writeln!() macros.


#![allow(unused)]
fn main() {
writeln!(io::stderr(), "error: world not helloable")?;
writeln!(&mut byte_vec, "The greatest common divisor of {:?} is {}", numbers, d)?;
}

The write macros are the same as the print macros except for two differences:

  1. The write macros take an extra first argument, a writer.
  2. The write macros return a Result, so errors must be handled. When the print macros experience an issue, they simply panic.
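Both differences show up in the following sketch, which writes into any writer and hands the Result back to the caller. The function name write_gcd_report and its parameters are assumptions made for this example.

```rust
use std::io::{self, Write};

/// Writes a one-line report to any writer.
/// `write_gcd_report` is a hypothetical helper used only for this sketch.
fn write_gcd_report<W: Write>(writer: &mut W, numbers: &[u64], d: u64) -> io::Result<()> {
    // The writer is the extra first argument, and the macro's value is an
    // io::Result<()> that we return instead of panicking on failure.
    writeln!(writer, "The greatest common divisor of {:?} is {}", numbers, d)
}
```

Passing a &mut Vec<u8> captures the output in memory, which is handy in tests; passing &mut io::stderr() sends it to the terminal.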

The Write trait has these methods:


Writer Method writer.write(&buf)

Writes some of the bytes in the slice buf to the underlying stream.

Returns an io::Result<usize>.

On success, gives the number of bytes written, which may be less than buf.len() if the underlying stream accepts only part of the data.

This is the lowest-level method and is rarely called directly in practice.


Writer Method writer.write_all(&buf)

Writes all the bytes in the slice buf.

Returns Result<(), io::Error>.


Writer Method writer.flush()

Flushes any buffered data to the underlying stream.

Returns Result<(), io::Error>.


Warning: When a BufWriter is dropped, all remaining buffered data is written to the underlying writer. However, if an error occurs during this write, the error is ignored. To make sure errors don't get swallowed, always call .flush() on all buffered writers before dropping them.
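A minimal sketch of the flush-before-drop habit described in the warning. The helper write_lines is hypothetical; it borrows the underlying writer so the caller keeps ownership.

```rust
use std::io::{self, BufWriter, Write};

/// Writes each line through a BufWriter and reports any flush failure.
/// `write_lines` is a hypothetical helper used only for this sketch.
fn write_lines<W: Write>(writer: &mut W, lines: &[&str]) -> io::Result<()> {
    let mut out = BufWriter::new(writer);
    for line in lines {
        writeln!(out, "{}", line)?;
    }
    // Flush explicitly: an error here is returned to the caller, whereas
    // an error during the implicit flush on drop would be ignored.
    out.flush()
}
```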

Files

We've got two main ways to open a file:


File Method File::open(filename)

Opens an existing file for reading. It's an error if the file doesn't exist.

Returns an io::Result<File>.


File Method File::create(filename)

Creates a new file for writing. If a file exists with the given filename, it gets truncated.

Returns an io::Result<File>.


There is an alternative that uses OpenOptions to specify the exact open behavior we want.


#![allow(unused)]
fn main() {
use std::fs::OpenOptions;

// Create a file if none exists, or append to an existing one
let log = OpenOptions::new()
    .append(true)
    .create(true) // without this, opening a missing file is an error
    .open("server.log")?;

// Create a file, or fail if one with the specified name already exists
let new_file = OpenOptions::new()
    .write(true)
    .create_new(true)
    .open("new_file.txt")?;
}
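The append-or-create combination can be packaged as a small function. This is a sketch only: append_line is a hypothetical helper, and the path it receives is whatever the caller chooses.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends one line to the file at `path`, creating the file if needed.
/// `append_line` is a hypothetical helper used only for this sketch.
fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .append(true) // writes go to the end of the file
        .create(true) // create the file if it doesn't exist yet
        .open(path)?;
    writeln!(file, "{}", line)
}
```

Calling it twice on the same path leaves both lines in the file, in order, because the second call appends rather than truncates.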

Just like any other reader or writer, a File can be given a buffer if needed, by wrapping it in a BufReader or BufWriter.

Term: The method-chaining pattern seen with OpenOptions is called a builder in Rust.

Seeking

Files also implement the Seek trait, which means you can hop around within a File rather than reading or writing in a single pass from the beginning to the end.

Seek is defined like this:


#![allow(unused)]
fn main() {
pub trait Seek {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64>;
}

pub enum SeekFrom {
    Start(u64),
    End(i64),
    Current(i64),
}
}

Seeking within a file is relatively slow, so prefer sequential access where possible.
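The sketch below uses seek to read only the tail of a source. It is generic over Read + Seek, so it works with a File as well as with an in-memory io::Cursor; read_tail is a hypothetical helper name.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Returns the last `n` bytes of the input (or all of it, if shorter).
/// `read_tail` is a hypothetical helper used only for this sketch.
fn read_tail<R: Read + Seek>(reader: &mut R, n: u64) -> io::Result<Vec<u8>> {
    // Seeking to the end returns the new position, which is the total length.
    let len = reader.seek(SeekFrom::End(0))?;
    // Clamp at zero so short inputs don't produce a negative offset.
    reader.seek(SeekFrom::Start(len.saturating_sub(n)))?;
    let mut tail = Vec::new();
    reader.read_to_end(&mut tail)?;
    Ok(tail)
}
```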

Other Reader and Writer Types

Add notes about common types of readers and writers.

Handy Readers and Writers

The std::io module offers a few functions that return trivial readers and writers.


io::sink()

No-op writer. All the write methods return Ok and the data is discarded.


io::empty()

No-op reader. Reading always succeeds and returns end-of-input.


io::repeat(byte)

Creates a reader that repeats the given byte endlessly.
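A quick sketch exercising all three helpers at once. The function trivial_io_demo and the tuple it returns are assumptions made purely for illustration.

```rust
use std::io::{self, Read};

/// Exercises io::empty, io::repeat, and io::sink.
/// `trivial_io_demo` is a hypothetical helper used only for this sketch.
fn trivial_io_demo() -> io::Result<(usize, Vec<u8>, u64)> {
    // io::empty() is immediately at end-of-input, so nothing is read.
    let mut nothing = Vec::new();
    let empty_len = io::empty().read_to_end(&mut nothing)?;

    // io::repeat(0) can fill a buffer of any size with zero bytes.
    let mut zeros = vec![1u8; 4];
    io::repeat(0).read_exact(&mut zeros)?;

    // io::sink() accepts and discards whatever is written to it.
    let discarded = io::copy(&mut io::repeat(b'x').take(3), &mut io::sink())?;

    Ok((empty_len, zeros, discarded))
}
```

Note the .take(3) on the repeating reader: without a limit, copying from io::repeat would never finish.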

Binary Data, Compression, and Serialization

Go here for some crate recommendations.

Files and Directories

OsStr and Path

Rust strings are always valid Unicode. Filenames are almost always Unicode in practice, but Rust has to cope with the rare cases where they aren't.

To solve the Unicode issue, Rust provides std::ffi::OsStr and std::ffi::OsString.

std::ffi::OsStr

OsStr is a string type that's a superset of UTF-8. Its sole purpose is to represent all filenames, CLI arguments, and environment variables on any system, whether or not they're valid Unicode.

std::path::Path

Path is exactly like OsStr, but it provides a bunch of handy filename-related methods.

When to use which?

For absolute and relative paths, use Path. For an individual component of a path, use OsStr.


Owning types

For each string type, there's always a corresponding owning type that owns heap-allocated data.

String type | Owning type | Conversion method
--|--|--
str | String | .to_string()
OsStr | OsString | .to_os_string()
Path | PathBuf | .to_path_buf()

All three of these string types implement a common trait, AsRef<Path>, which makes it easy to declare a generic function that accepts "any filename type" as an argument.


#![allow(unused)]
fn main() {
use std::path::Path;
use std::io;

fn open_file<P>(path_arg: P)
    -> io::Result<()>
    where P: AsRef<Path>
{
    let path = path_arg.as_ref(); // `path` is now a plain &Path
    // ...
    Ok(())
}
}
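To see the "any filename type" claim in action, the following sketch defines one generic function and accepts several argument types. The helper has_extension is hypothetical.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Reports whether the path's extension equals `ext`.
/// `has_extension` is a hypothetical helper used only for this sketch.
fn has_extension<P: AsRef<Path>>(path_arg: P, ext: &str) -> bool {
    // as_ref() gives a &Path no matter which string type was passed in.
    path_arg.as_ref().extension() == Some(OsStr::new(ext))
}
```

The same function accepts a &str, a String, a PathBuf, or an &OsStr, for example has_extension("notes.txt", "txt") or has_extension(PathBuf::from("dir/notes.txt"), "txt").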

Path and PathBuf Methods

Path Method Path::new(str)


#![allow(unused)]
fn main() {
fn new<S: AsRef<OsStr> + ?Sized>(s: &S) -> &Path
}

Converts a &str or &OsStr to a &Path. The string doesn't get copied; the new &Path points to the same bytes as the original argument.


Path Method path.parent()


#![allow(unused)]
fn main() {
fn parent(&self) -> Option<&Path>
}

Returns the path's parent directory, if any. The path doesn't get copied; the parent directory of path is always a substring of path.


Path Method path.file_name()


#![allow(unused)]
fn main() {
fn file_name(&self) -> Option<&OsStr>
}

Returns the last component of path, if any.


Path Methods path.is_absolute() and path.is_relative()


#![allow(unused)]
fn main() {
fn is_absolute(&self) -> bool
fn is_relative(&self) -> bool
}

Tells you whether the path is absolute or relative.


Path Method path1.join(path2)


#![allow(unused)]
fn main() {
fn join<P: AsRef<Path>>(&self, path: P) -> PathBuf
}

Joins two paths. If path2 is an absolute path, it just returns a copy of path2.

Use Case: The path join method can be used to turn any path into an absolute path.


#![allow(unused)]
fn main() {
let abs_path = std::env::current_dir()?.join(any_path);
}
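The two behaviors of join can be checked side by side. This sketch assumes Unix-style paths, and resolve_against is a hypothetical helper name.

```rust
use std::path::{Path, PathBuf};

/// Resolves `path` against `base`; an absolute `path` replaces `base`.
/// `resolve_against` is a hypothetical helper used only for this sketch.
fn resolve_against(base: &Path, path: &Path) -> PathBuf {
    // join allocates a new PathBuf and leaves both inputs untouched.
    base.join(path)
}
```

With base /home/user, the relative path notes.txt resolves to /home/user/notes.txt, while the absolute path /etc/hosts comes back unchanged.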

Path Method path.components()


#![allow(unused)]
fn main() {
fn components(&self) -> Components
}

Creates an iterator over the components of the given path, from left to right. The Item type of the iterator is std::path::Component, which is an enum:


#![allow(unused)]
fn main() {
pub enum Component<'a> {
    Prefix(PrefixComponent<'a>),
    RootDir,
    CurDir,
    ParentDir,
    Normal(&'a OsStr),
}
}
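A short sketch that walks the iterator and labels each variant. The helper describe_components and its label strings are assumptions made for illustration, and the example path assumes Unix-style separators.

```rust
use std::path::{Component, Path};

/// Produces a human-readable label for each component of `path`.
/// `describe_components` is a hypothetical helper used only for this sketch.
fn describe_components(path: &Path) -> Vec<String> {
    path.components()
        .map(|component| match component {
            // Prefix only occurs on Windows (drive letters, UNC shares).
            Component::Prefix(prefix) => format!("prefix {:?}", prefix.as_os_str()),
            Component::RootDir => "root".to_string(),
            Component::CurDir => "current dir".to_string(),
            Component::ParentDir => "parent dir".to_string(),
            Component::Normal(name) => format!("normal {}", name.to_string_lossy()),
        })
        .collect()
}
```

For /usr/lib/../bin this gives root, normal usr, normal lib, parent dir, normal bin. The iterator does not resolve the .. for you.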

Converting Paths to Strings

Path Method path.to_str()


#![allow(unused)]
fn main() {
fn to_str(&self) -> Option<&str>
}

If path isn't valid UTF-8, this method returns None.


Path Method path.to_string_lossy()


#![allow(unused)]
fn main() {
fn to_string_lossy(&self) -> Cow<str>
}

Basically the same as to_str, but it always returns a string, whether or not the path is valid UTF-8. If the path isn't valid, each invalid byte sequence is replaced with the Unicode replacement character, U+FFFD (�).

Path Method path.display()


#![allow(unused)]
fn main() {
fn display(&self) -> Display
}

Doesn't return a string. Instead, it returns a value that implements Display, so it can be used with the print! macro and friends.
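The three conversions can be compared on a single path. In this sketch, path_strings is a hypothetical helper that returns the result of each approach as an owned value.

```rust
use std::path::Path;

/// Converts a path to text in three ways: strict, lossy, and via Display.
/// `path_strings` is a hypothetical helper used only for this sketch.
fn path_strings(path: &Path) -> (Option<String>, String, String) {
    // to_str yields None when the path is not valid UTF-8.
    let strict = path.to_str().map(str::to_owned);
    // to_string_lossy returns a Cow<str>; into_owned makes it a String.
    let lossy = path.to_string_lossy().into_owned();
    // display() is meant for formatting macros such as format! and println!.
    let shown = format!("{}", path.display());
    (strict, lossy, shown)
}
```

For a valid UTF-8 path all three agree; they differ only when the path contains bytes that cannot be decoded.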

Filesystem Access Functions

Go here for some reference.

Reading Directories

To list the contents of a directory, use std::fs::read_dir, or the .read_dir() method of a Path:


#![allow(unused)]
fn main() {
use std::path::Path;

let path = Path::new("."); // the directory to list
for entry_result in path.read_dir()? {
    let entry = entry_result?;
    println!("{}", entry.file_name().to_string_lossy());
}
}

The std::fs::read_dir function has the following type signature:


#![allow(unused)]
fn main() {
fn read_dir<P: AsRef<Path>>(path: P) -> Result<ReadDir>
}

The ReadDir value is an iterator whose items are of type io::Result<DirEntry>. A DirEntry is a struct with a few methods that have the following signatures:


#![allow(unused)]
fn main() {
struct DirEntry(_);

fn path(&self) -> PathBuf
fn metadata(&self) -> Result<Metadata>
fn file_type(&self) -> Result<FileType>
fn file_name(&self) -> OsString
}
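Putting read_dir and the DirEntry methods together, the sketch below lists only regular files and sorts their names. The helper list_file_names is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the sorted names of the regular files directly inside `dir`.
/// `list_file_names` is a hypothetical helper used only for this sketch.
fn list_file_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry_result in fs::read_dir(dir)? {
        let entry = entry_result?; // each entry can fail independently
        if entry.file_type()?.is_file() {
            names.push(entry.file_name().to_string_lossy().into_owned());
        }
    }
    names.sort(); // read_dir makes no promise about ordering
    Ok(names)
}
```

Subdirectories are skipped here because of the is_file() check; swapping it for is_dir() would list directories instead.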

Platform-Specific Features

The std::os module contains a bunch of platform-specific features, like symlink.

If you want code to compile on all platforms, with support for symbolic links on Unix, for instance, you must use #[cfg] in the program as well. In such cases, it's easiest to import symlink on Unix, while defining a symlink stub on other systems:


#![allow(unused)]
fn main() {
use std::io;
use std::path::Path;

#[cfg(unix)]
use std::os::unix::fs::symlink;

// Stub implementation of symlink for platforms that don't have it
#[cfg(not(unix))]
fn symlink<P: AsRef<Path>, Q: AsRef<Path>>(src: P, _dst: Q) -> std::io::Result<()> {
    Err(io::Error::new(
        io::ErrorKind::Other,
        format!("can't copy symbolic link {}", src.as_ref().display())
    ))
}
}

There's a prelude module that can be used to enable all Unix extensions at once:


#![allow(unused)]
fn main() {
use std::os::unix::prelude::*;
}

Networking

For low-level networking code, start with the std::net module.

Go here for networking crate recommendations.