About AWS Step Functions
AWS Step Functions is a powerful visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines. This is done by defining steps within a state machine definition.
In this article, we will look at how to set up default parameters for your state machine inputs. This is helpful when you invoke your state machine from a pipeline, another application, a web user interface, or any other caller that may override some of the expected inputs but not all of them.
State Machine Initial Input
The initial input of an AWS Step Functions state machine is provided when you start an execution, either through the AWS Console or through another method such as the AWS CLI or an AWS SDK.
The initial data is passed to the state named in the “StartAt” field of its definition. If no input is provided, the default is an empty object ({}).
Learn by example: Ice Maker Factory
Let’s say that our state machine needs to produce several flavors for an ice cream factory, with the following inputs:
System Inputs:
- flavors [string]: List of flavors to process. If not given, defaults to processing all flavors.
- maxConcurrency (integer): Number of flavors the machine can process at the same time. If not specified, defaults to 5.
List of flavors:
- vanilla
- chocolate
- strawberry
- bubblegum blast
- cookies and cream
- rocky road
- butter pecan
- coffee
- pistachio
- neapolitan
Here is a picture of what our initial state machine definition looks like:
A Map state can run a set of steps for each item in a dataset, in our case a list of flavors. The Map state’s iterations run in parallel, which speeds up the whole process. However, because throughput is tied to the capacity of our factory machinery, we cannot process too many flavors at the same time.
Using the Amazon States Language, it would look something like this:
{
"StartAt": "Process Flavors",
"States": {
"Process Flavors": {
"Type": "Map",
"MaxConcurrency": 5,
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "Process Flavor",
"States": {
"Process Flavor": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:{{AWS_REGION}}:{{AWS_ACCOUNT}}:function:process-flavor-lambda:$LATEST"
},
"End": true
}
}
},
"End": true
}
}
}
Notice that we have manually set the “MaxConcurrency” property to a fixed value of 5, meaning it will process at most 5 flavors at the same time.
To provide the maximum concurrency value dynamically from the state input using a reference path, use “MaxConcurrencyPath” instead.
In our case, we want to take the value from the initial data, for instance when one of the ice maker machines is under maintenance, but fall back to a default when it is not given so that we operate at maximum capacity.
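As a minimal sketch of what a path-driven Map state could look like, the fragment below reads both the item list and the concurrency limit from the state input. It assumes the state input is an object with a flavors array and a maxConcurrency integer. The “ItemsPath” field is an addition that the definition above does not use; it tells the Map state where to find the array to iterate over.

```json
{
  "Process Flavors": {
    "Type": "Map",
    "ItemsPath": "$.flavors",
    "MaxConcurrencyPath": "$.maxConcurrency",
    "ItemProcessor": {
      "ProcessorConfig": { "Mode": "INLINE" },
      "StartAt": "Process Flavor",
      "States": {
        "Process Flavor": {
          "Type": "Task",
          "Resource": "arn:aws:states:::lambda:invoke",
          "Parameters": {
            "Payload.$": "$",
            "FunctionName": "arn:aws:lambda:{{AWS_REGION}}:{{AWS_ACCOUNT}}:function:process-flavor-lambda:$LATEST"
          },
          "End": true
        }
      }
    },
    "End": true
  }
}
```

A state uses either “MaxConcurrency” or “MaxConcurrencyPath”, not both, so the fixed value is dropped here.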
Intrinsic Functions
In September 2022, AWS released new intrinsic functions that help perform more data transformation tasks, such as formatting JSON strings, creating arrays, generating UUIDs, and encoding data.
In our use case, we will be using the States.JsonMerge function to merge two JSON objects into a single object.
This function takes three arguments. The first two are the JSON objects that you want to merge. The third is a boolean that determines whether deep merging mode is enabled.
Currently, Step Functions only supports shallow merging mode; therefore, you must specify this value as false.
For example, you can use the States.JsonMerge function to merge the following JSON objects that share the key a.
{
"json1": { "a": {"a1": 1, "a2": 2}, "b": 2 },
"json2": { "a": {"a3": 1, "a4": 2}, "c": 3 }
}
You can specify the json1 and json2 objects as inputs in the States.JsonMerge function to merge them together:
"output.$": "States.JsonMerge($.json1, $.json2, false)"
The States.JsonMerge function returns the following merged JSON object. In the output, the json2 object’s key a replaces the json1 object’s key a. The nested object under the json1 object’s key a is discarded because shallow mode doesn’t merge nested objects.
{
"output": {
"a": {"a3": 1, "a4": 2},
"b": 2,
"c": 3
}
}
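To try this merge on its own, a one-state machine along the following lines should be enough. This is only a sketch: the state name “Merge Example” is hypothetical, and the execution is assumed to be started with the json1 and json2 input shown above.

```json
{
  "StartAt": "Merge Example",
  "States": {
    "Merge Example": {
      "Type": "Pass",
      "Parameters": {
        "output.$": "States.JsonMerge($.json1, $.json2, false)"
      },
      "End": true
    }
  }
}
```

The “.$” suffix on the key tells Step Functions to evaluate the value as a path or intrinsic function rather than treat it as a literal string.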
Merging Defaults
Alright, this is the plan:
We will have defaults as:
{
"maxConcurrency": 5,
"flavors": ["vanilla","chocolate","strawberry","bubblegum blast","cookies and cream","rocky road","butter pecan","coffee","pistachio","neapolitan"]
}
The initial input can optionally override them by passing an object with a “maxConcurrency” property set to a non-negative integer and a “flavors” property set to an array of strings taken from the list of possible flavors.
To do that, we will introduce a Pass state at the beginning of our state machine. A Pass state passes its input to its output without performing work. Pass states are useful when constructing and debugging state machines, and they can perform data transformations because they support intrinsic functions in the Parameters field.
Putting it all together, our state machine definition ends up like this:
{
"StartAt": "Merge Defaults",
"States": {
"Merge Defaults": {
"Type": "Pass",
"Next": "Process Flavors",
"Parameters": {
"merged.$": "States.JsonMerge(States.StringToJson('{\"maxConcurrency\": 5, \"flavors\": [\"vanilla\",\"chocolate\",\"strawberry\",\"bubblegum blast\",\"cookies and cream\",\"rocky road\",\"butter pecan\",\"coffee\",\"pistachio\",\"neapolitan\"]}'), $$.Execution.Input, false)"
},
"OutputPath": "$.merged"
},
"Process Flavors": {
"Type": "Map",
"ItemsPath": "$.flavors",
"MaxConcurrencyPath": "$.maxConcurrency",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "Process Flavor",
"States": {
"Process Flavor": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:{{AWS_REGION}}:{{AWS_ACCOUNT}}:function:process-flavor-lambda:$LATEST"
},
"End": true
}
}
},
"End": true
}
}
}
Which is described in the following diagram:
Notice that we also introduced another intrinsic function, States.StringToJson, which, as the name implies, converts a string into a JSON object. This lets us merge our hardcoded default values with $$.Execution.Input, the Context object field that holds the initial input of the running execution.
See the AWS Step Functions documentation for what else the Context object includes.
Therefore, if we define an input like:
{
"maxConcurrency": 2,
"flavors": ["vanilla", "chocolate"]
}
The Merge Defaults state output will be exactly the same, since the input overrode both fields.
But if the input only defines one of them, like:
{
"maxConcurrency": 2
}
Then the output will override only the maxConcurrency field:
{
"flavors": ["vanilla", "chocolate", "strawberry", "bubblegum blast", "cookies and cream", "rocky road", "butter pecan", "coffee", "pistachio", "neapolitan"],
"maxConcurrency": 2
}
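Because the merge is shallow, an array passed in the input replaces the default array wholesale rather than being appended to it. As an illustration, the hypothetical input below only narrows the flavors, so maxConcurrency keeps its default. The wrapper keys “input” and “mergeDefaultsOutput” are just labels for this example and are not part of any Step Functions payload.

```json
{
  "input": {
    "flavors": ["coffee", "pistachio"]
  },
  "mergeDefaultsOutput": {
    "flavors": ["coffee", "pistachio"],
    "maxConcurrency": 5
  }
}
```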
Extra Bonus
If we wanted to validate that maxConcurrency is not greater than 5 and that flavors only includes allowed values, we could add Choice states that use comparison rules like NumericLessThanEquals and IsPresent, along with the intrinsic function States.ArrayContains.
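As a rough sketch of the maxConcurrency check only, the two states below could sit inside the “States” object between the merge step and the Map state. The state names “Validate Max Concurrency” and “Invalid Input”, as well as the error name and cause text, are hypothetical. Validating the flavors list is left out of this sketch.

```json
{
  "Validate Max Concurrency": {
    "Type": "Choice",
    "Choices": [
      {
        "And": [
          { "Variable": "$.maxConcurrency", "IsPresent": true },
          { "Variable": "$.maxConcurrency", "IsNumeric": true },
          { "Variable": "$.maxConcurrency", "NumericLessThanEquals": 5 }
        ],
        "Next": "Process Flavors"
      }
    ],
    "Default": "Invalid Input"
  },
  "Invalid Input": {
    "Type": "Fail",
    "Error": "InvalidInput",
    "Cause": "maxConcurrency must be a number no greater than 5"
  }
}
```

With this in place, the state that merges the defaults would point its “Next” field at the Choice state instead of going straight to the Map state.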
Conclusion
When you need to set up default values for an AWS Step Functions state machine, you can leverage intrinsic functions to merge those defaults with the values coming from the execution input, taking advantage of the Pass state’s ability to transform data.