Understanding deserializer lifetimes

The Deserialize and Deserializer traits both have a lifetime called 'de, as do some of the other deserialization-related traits.

trait Deserialize<'de>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>;
}

This lifetime is what enables Serde to safely perform efficient zero-copy deserialization across a variety of data formats, something that would be impossible or recklessly unsafe in languages other than Rust.

use serde::Deserialize;

#[derive(Deserialize)]
struct User<'a> {
    id: u32,
    name: &'a str,
    screen_name: &'a str,
    location: &'a str,
}

Zero-copy deserialization means deserializing into a data structure, like the User struct above, that borrows string or byte array data from the string or byte array holding the input. This avoids allocating memory to store a string for each individual field and then copying string data out of the input over to the newly allocated field. Rust guarantees that the input data outlives the period during which the output data structure is in scope, meaning it is impossible to have dangling pointer errors as a result of losing the input data while the output data structure still refers to it.
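The borrowing relationship described above can be sketched without Serde at all. The following is a minimal, standard-library-only illustration, not how Serde itself is implemented: the comma-separated line format and the parse_user and borrows_from helpers are made up for this example.

```rust
#[derive(Debug, PartialEq)]
struct User<'a> {
    id: u32,
    name: &'a str,
    screen_name: &'a str,
    location: &'a str,
}

// Hypothetical parser for a made-up "id,name,screen_name,location" line.
// Every &str field is a slice of `input`; no string data is copied.
fn parse_user<'a>(input: &'a str) -> Option<User<'a>> {
    let mut fields = input.splitn(4, ',');
    Some(User {
        id: fields.next()?.parse::<u32>().ok()?,
        name: fields.next()?,
        screen_name: fields.next()?,
        location: fields.next()?,
    })
}

// Returns true if `field` points into the memory occupied by `input`.
fn borrows_from(input: &str, field: &str) -> bool {
    let start = input.as_ptr() as usize;
    let ptr = field.as_ptr() as usize;
    ptr >= start && ptr + field.len() <= start + input.len()
}
```

Because the returned User<'a> is tied to the lifetime of the input, the compiler rejects any attempt to drop the input string while the User is still in use.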

Trait bounds

There are two main ways to write Deserialize trait bounds, whether on an impl block or a function or anywhere else.

  • <'de, T> where T: Deserialize<'de>

    This means "T can be deserialized from some lifetime." The caller gets to decide what lifetime that is. Typically this is used when the caller also provides the data that is being deserialized from, for example in a function like serde_json::from_str. In that case the input data must also have lifetime 'de; for example, it could be a &'de str.

  • <T> where T: DeserializeOwned

    This means "T can be deserialized from any lifetime." The callee gets to decide what lifetime. Usually this is because the data that is being deserialized from is going to be thrown away before the function returns, so T must not be allowed to borrow from it. For example, consider a function that accepts base64-encoded data as input, decodes it from base64, deserializes a value of type T, and then throws away the result of the base64 decoding. Another common use of this bound is functions that deserialize from an IO stream, such as serde_json::from_reader.

    To say it more technically, the DeserializeOwned trait is equivalent to the higher-rank trait bound for<'de> Deserialize<'de>. The only difference is that DeserializeOwned is more intuitive to read. It means T owns all the data that gets deserialized.

Note that <T> where T: Deserialize<'static> is never what you want. Also Deserialize<'de> + 'static is never what you want. Generally writing 'static anywhere near Deserialize is a sign of being on the wrong track. Use one of the above bounds instead.
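The two bounds above can be modeled with a small standard-library-only sketch. FromInput and FromInputOwned are hypothetical stand-ins for Deserialize and DeserializeOwned, and from_str and from_cleaned are invented for illustration; none of these are Serde's actual definitions, though the supertrait and blanket impl follow the higher-rank bound described above.

```rust
// Hypothetical stand-in for Deserialize<'de>; not the real trait.
trait FromInput<'de>: Sized {
    fn from_input(input: &'de str) -> Option<Self>;
}

// Stand-in for DeserializeOwned: holds for every T that can be built
// from input of any lifetime.
trait FromInputOwned: for<'de> FromInput<'de> {}
impl<T> FromInputOwned for T where T: for<'de> FromInput<'de> {}

// Borrows from the input, so 'de must outlive 'a.
struct Name<'a>(&'a str);

impl<'de: 'a, 'a> FromInput<'de> for Name<'a> {
    fn from_input(input: &'de str) -> Option<Self> {
        Some(Name(input.trim()))
    }
}

// Owns its data, so there is no bound on 'de.
struct Count(u32);

impl<'de> FromInput<'de> for Count {
    fn from_input(input: &'de str) -> Option<Self> {
        input.trim().parse::<u32>().ok().map(Count)
    }
}

// The caller supplies the input and therefore picks 'de.
fn from_str<'de, T>(input: &'de str) -> Option<T>
where
    T: FromInput<'de>,
{
    T::from_input(input)
}

// The cleaned-up buffer is dropped before this function returns, so T
// must not borrow from it. Hence the owned bound.
fn from_cleaned<T>(raw: &str) -> Option<T>
where
    T: FromInputOwned,
{
    let cleaned: String = raw.chars().filter(|c| *c != '_').collect();
    T::from_input(&cleaned)
}
```

In this sketch from_str::<Name> and from_cleaned::<Count> both compile, while from_cleaned::<Name> is rejected because Name does not satisfy the bound for every possible 'de.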

Transient, borrowed, and owned data

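The Serde data model has three flavors of strings and byte arrays during deserialization. They correspond to different methods on the Visitor trait: visit_str and visit_bytes receive transient data, visit_borrowed_str and visit_borrowed_bytes receive borrowed data, and visit_string and visit_byte_buf receive owned data.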

Transient data is not guaranteed to last beyond the method call it is passed to. Often this is sufficient, for example when deserializing something like an IP address from a Serde string using the FromStr trait. When it is not sufficient, the data can be copied by calling to_owned(). Deserializers commonly use transient data when input from an IO stream is being buffered in memory before being passed to the Visitor, or when escape sequences are being processed so the resulting string is not present verbatim in the input.

Borrowed data is guaranteed to live at least as long as the 'de lifetime parameter of the Deserializer. Not all deserializers support handing out borrowed data. For example when deserializing from an IO stream no data can be borrowed.

Owned data is guaranteed to live as long as the Visitor wants it to. Some visitors benefit from receiving owned data. For example the Deserialize impl for Rust's String type benefits from being given ownership of the Serde string data that has been deserialized.
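As a rough illustration of the three flavors, here is a simplified, standard-library-only sketch. StrVisitor is a hypothetical cut-down model of the string methods on Serde's Visitor trait (the real trait returns a Result and has many more methods), and CowVisitor is invented for this example.

```rust
use std::borrow::Cow;

// Hypothetical, simplified model of the string methods of a visitor.
trait StrVisitor<'de>: Sized {
    type Value;

    // Transient: the slice is only valid for the duration of this call.
    fn visit_str(self, v: &str) -> Self::Value;

    // Borrowed: the slice lives at least as long as 'de.
    // By default, treat it as transient data.
    fn visit_borrowed_str(self, v: &'de str) -> Self::Value {
        self.visit_str(v)
    }

    // Owned: the visitor receives the allocation itself.
    // By default, treat it as transient data.
    fn visit_string(self, v: String) -> Self::Value {
        self.visit_str(&v)
    }
}

// A visitor that borrows when it can and copies only when it must.
struct CowVisitor;

impl<'de> StrVisitor<'de> for CowVisitor {
    type Value = Cow<'de, str>;

    fn visit_str(self, v: &str) -> Cow<'de, str> {
        // Transient data does not outlive the call, so copy it.
        Cow::Owned(v.to_owned())
    }

    fn visit_borrowed_str(self, v: &'de str) -> Cow<'de, str> {
        // Borrowed data outlives the visitor, so keep the reference.
        Cow::Borrowed(v)
    }

    fn visit_string(self, v: String) -> Cow<'de, str> {
        // Owned data can be kept without another copy.
        Cow::Owned(v)
    }
}
```

Only the transient case pays for a copy here; the borrowed and owned cases reuse memory that already exists.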

The Deserialize<'de> lifetime

This lifetime records the constraints on how long data borrowed by this type must be valid.

Every lifetime of data borrowed by this type must be a bound on the 'de lifetime of its Deserialize impl. If this type borrows data with lifetime 'a, then 'de must be constrained to outlive 'a.

struct S<'a, 'b, T> {
    a: &'a str,
    b: &'b str,
    bb: &'b str,
    t: T,
}

impl<'de: 'a + 'b, 'a, 'b, T> Deserialize<'de> for S<'a, 'b, T>
where
    T: Deserialize<'de>,
{
    /* ... */
}

If this type does not borrow any data from the Deserializer, there are simply no bounds on the 'de lifetime. Such types automatically implement the DeserializeOwned trait.

struct S {
    owned: String,
}

impl<'de> Deserialize<'de> for S {
    /* ... */
}

The 'de lifetime should not appear in the type to which the Deserialize impl applies.

- // Do not do this. Sooner or later you will be sad.
- impl<'de> Deserialize<'de> for Q<'de> {

+ // Do this instead.
+ impl<'de: 'a, 'a> Deserialize<'de> for Q<'a> {

The Deserializer<'de> lifetime

This is the lifetime of data that can be borrowed from the Deserializer.

struct MyDeserializer<'de> {
    input_data: &'de [u8],
    pos: usize,
}

impl<'de> Deserializer<'de> for MyDeserializer<'de> {
    /* ... */
}

If the Deserializer never invokes visit_borrowed_str or visit_borrowed_bytes, the 'de lifetime will be an unconstrained lifetime parameter.

use std::io;

struct MyDeserializer<R> {
    read: R,
}

impl<'de, R> Deserializer<'de> for MyDeserializer<R>
where
    R: io::Read,
{
    /* ... */
}
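The difference between the two kinds of deserializer can be sketched with the standard library alone. SliceInput and ReadInput below are hypothetical helper types, not Serde APIs, and next_bytes is an invented method name. The point of the sketch is that a slice-backed input can hand out data that lives for 'de, while a reader-backed input can only hand out freshly allocated buffers.

```rust
use std::io::{self, Read};

// Input held in memory: chunks can be borrowed for the full 'de lifetime.
struct SliceInput<'de> {
    input_data: &'de [u8],
    pos: usize,
}

impl<'de> SliceInput<'de> {
    // The returned slice borrows from the input data ('de), not from
    // `&mut self`, so several chunks can be held at the same time.
    fn next_bytes(&mut self, len: usize) -> Option<&'de [u8]> {
        let data: &'de [u8] = self.input_data;
        let end = self.pos.checked_add(len)?;
        let chunk = data.get(self.pos..end)?;
        self.pos = end;
        Some(chunk)
    }
}

// Input pulled from an IO stream: nothing outlives the read call, so
// each chunk has to be returned as an owned buffer.
struct ReadInput<R> {
    read: R,
}

impl<R: Read> ReadInput<R> {
    fn next_bytes(&mut self, len: usize) -> io::Result<Vec<u8>> {
        let mut buf = vec![0u8; len];
        self.read.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

A real Deserializer built on the first type would pass such chunks to visit_borrowed_bytes, whereas one built on the second could only offer transient or owned data.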

Borrowing data in a derived impl

Fields of type &str and &[u8] are implicitly borrowed from the input data by Serde. Any other type of field can opt in to borrowing by using the #[serde(borrow)] attribute.

use serde::Deserialize;

use std::borrow::Cow;

#[derive(Deserialize)]
struct Inner<'a, 'b> {
    // &str and &[u8] are implicitly borrowed.
    username: &'a str,

    // Other types must be borrowed explicitly.
    #[serde(borrow)]
    comment: Cow<'b, str>,
}

#[derive(Deserialize)]
struct Outer<'a, 'b, 'c> {
    owned: String,

    #[serde(borrow)]
    inner: Inner<'a, 'b>,

    // This field is never borrowed.
    not_borrowed: Cow<'c, str>,
}

This attribute works by placing bounds on the 'de lifetime of the generated Deserialize impl. For example the impl for the struct Outer defined above looks like this:

// The lifetimes 'a and 'b are borrowed while 'c is not.
impl<'de: 'a + 'b, 'a, 'b, 'c> Deserialize<'de> for Outer<'a, 'b, 'c> {
    /* ... */
}
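To see what such a bound buys, consider this hand-written, standard-library-only sketch. FromInput is a hypothetical stand-in for Deserialize, and the Borrowing and NotBorrowing structs are invented for illustration; this is not the code that the derive macro actually generates.

```rust
use std::borrow::Cow;

// Hypothetical stand-in for Deserialize<'de>; not the real trait.
trait FromInput<'de>: Sized {
    fn from_input(input: &'de str) -> Self;
}

// Analogous to a field marked #[serde(borrow)]: the impl is bounded by
// 'de: 'a, so the Cow may point straight into the input.
struct Borrowing<'a> {
    comment: Cow<'a, str>,
}

impl<'de: 'a, 'a> FromInput<'de> for Borrowing<'a> {
    fn from_input(input: &'de str) -> Self {
        Borrowing { comment: Cow::Borrowed(input) }
    }
}

// Analogous to a Cow field without the attribute: 'de and 'c are
// unrelated, so the only option is to copy the data.
struct NotBorrowing<'c> {
    not_borrowed: Cow<'c, str>,
}

impl<'de, 'c> FromInput<'de> for NotBorrowing<'c> {
    fn from_input(input: &'de str) -> Self {
        NotBorrowing { not_borrowed: Cow::Owned(input.to_owned()) }
    }
}

// Because nothing is borrowed, the value can outlive its input.
fn owned_from_temporary() -> NotBorrowing<'static> {
    let temp = String::from("scratch");
    NotBorrowing::from_input(&temp)
}
```

The equivalent of owned_from_temporary for Borrowing would not compile, since the result would refer to a string that is dropped at the end of the function.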

The attribute may specify explicitly which lifetimes should be borrowed.

use std::marker::PhantomData;

// This struct borrows the first two lifetimes but not the third.
#[derive(Deserialize)]
struct Three<'a, 'b, 'c> {
    a: &'a str,
    b: &'b str,
    c: PhantomData<&'c str>,
}

#[derive(Deserialize)]
struct Example<'a, 'b, 'c> {
    // Borrow 'a and 'b only, not 'c.
    #[serde(borrow = "'a + 'b")]
    three: Three<'a, 'b, 'c>,
}