I saved 475 MB out of the 895 MB used by a real-world Rust program by changing the layout of some structs and the way I was deserializing JSON files.
The real use case
My program deserializes all the JSON files of https://github.com/awslabs/aws-sdk-rust/tree/main/aws-models into "Smithy Shape" structs.
Those files contain thousands of structures similar to this one:
"com.amazonaws.iam#EnableOrganizationsRootSessionsResponse": {
    "type": "structure",
    "members": {
        "OrganizationId": {
            "target": "com.amazonaws.iam#OrganizationIdType",
            "traits": {
                "smithy.api#documentation": "<p>The unique identifier (ID) of an organization.</p>"
            }
        },
        "EnabledFeatures": {
            "target": "com.amazonaws.iam#FeaturesListType",
            "traits": {
                "smithy.api#documentation": "<p>The features you have enabled for centralized root access.</p>"
            }
        }
    },
    "traits": {
        "smithy.api#output": {}
    }
},
As is common in Rust, my program uses the very convenient serde.
I won't go into every detail, but part of the structure needs to be shown at this point for clarity.
Don't read it entirely, just note that it's a bunch of structs containing structs, some optional, with serde attributes:
#[derive(Clone, Deserialize, Serialize)]
pub struct SmithyShape {
    #[serde(rename = "type")]
    pub shape_type: SmithyShapeType,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub operations: Vec<SmithyReference>,
    #[serde(default)]
    pub members: FxHashMap<String, SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub key: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub value: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub member: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub input: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub output: Option<SmithyReference>,
    #[serde(default)]
    pub traits: SmithyTraits,
}

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SmithyReference {
    pub target: ShortShapeId,
    #[serde(default)]
    pub traits: SmithyTraits,
}

#[derive(Debug, Clone, Default, Deserialize, Serialize)]
pub struct SmithyTraits {
    #[serde(rename = "smithy.api#title", skip_serializing_if = "Option::is_none")]
    pub title: Option<String>,
    #[serde(rename = "aws.api#service", skip_serializing_if = "Option::is_none")]
    pub service: Option<SmithyServiceTrait>,
    #[serde(
        rename = "smithy.api#sensitive",
        skip_serializing_if = "Option::is_none"
    )]
    pub sensitive: Option<SmithySensitiveTrait>,
    #[serde(
        rename = "smithy.api#documentation",
        skip_serializing_if = "Option::is_none"
    )]
    pub documentation: Option<String>,
    #[serde(rename = "smithy.api#pattern", skip_serializing_if = "Option::is_none")]
    pub pattern: Option<String>,
    #[serde(rename = "aws.iam#iamAction", skip_serializing_if = "Option::is_none")]
    pub iam_action: Option<SmithyIamAction>,
}

#[derive(Debug, Clone, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct SmithyServiceTrait {
    pub sdk_id: Option<String>,
    pub arn_namespace: Option<String>,
    pub cloud_formation_name: Option<String>,
    pub cloud_trail_event_source: Option<String>,
    pub endpoint_prefix: Option<String>,
}
This is standard-looking code that follows current practice, but we can also call it naïve. Deserialized this way, the structures were taking 895 MB in memory.
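To get a feel for where that memory goes, here is a minimal sketch that measures the inline size of structs shaped like the ones above. The types `ServiceTrait`, `Traits` and `Reference` and the function `inline_reference_size` are hypothetical, simplified stand-ins (plain `String` fields, fewer traits), not the program's real definitions, so the numbers it prints are only indicative. It uses nothing but the standard library.

```rust
use std::mem::{size_of, size_of_val};

// Hypothetical, simplified stand-in for SmithyServiceTrait.
#[allow(dead_code)]
#[derive(Default)]
struct ServiceTrait {
    sdk_id: Option<String>,
    arn_namespace: Option<String>,
    cloud_formation_name: Option<String>,
    cloud_trail_event_source: Option<String>,
    endpoint_prefix: Option<String>,
}

// Hypothetical, simplified stand-in for SmithyTraits:
// every optional trait is stored inline, whether it is set or not.
#[allow(dead_code)]
#[derive(Default)]
struct Traits {
    title: Option<String>,
    service: Option<ServiceTrait>,
    documentation: Option<String>,
    pattern: Option<String>,
}

// Hypothetical, simplified stand-in for SmithyReference.
#[allow(dead_code)]
#[derive(Default)]
struct Reference {
    target: String,
    traits: Traits,
}

/// Inline size of one `Reference` in bytes, not counting the heap
/// buffers owned by its strings.
fn inline_reference_size() -> usize {
    size_of::<Reference>()
}

fn main() {
    println!("String:         {} bytes", size_of::<String>());
    println!("Option<String>: {} bytes", size_of::<Option<String>>());
    println!("ServiceTrait:   {} bytes", size_of::<ServiceTrait>());
    println!("Traits:         {} bytes", size_of::<Traits>());
    println!("Reference:      {} bytes", inline_reference_size());

    // A reference with no traits at all still occupies the full inline size.
    let empty = Reference::default();
    println!("empty Reference: {} bytes", size_of_val(&empty));
}
```

The point of the sketch is that a `None` field is not free: an `Option<String>` reserves its space inline even when empty, and a struct nested by value (here `Traits` inside `Reference`) adds its whole size to every instance. With thousands of shapes, each holding a map of references, that fixed per-instance cost adds up quickly.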