Nexus Services and Operations
Nexus Services
A Nexus Service is a named collection of arbitrary-duration Nexus Operations that provide an API contract suitable for sharing across team boundaries. Nexus Services are registered with a Temporal Worker that is listening on the target Namespace and Task Queue for an Endpoint.
For example, a Nexus Service is often registered in the same Worker as the underlying Workflows it abstracts:
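func main() {
	// Connect to the Temporal Service using the default client options.
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("Unable to create client", err)
	}
	defer c.Close()
	// The Worker polls taskQueue (defined elsewhere), which the Endpoint targets.
	w := worker.New(c, taskQueue, worker.Options{})
	// Group the Operations into a named Nexus Service.
	service := nexus.NewService(service.HelloServiceName)
	err = service.Register(handler.EchoOperation, handler.HelloOperation)
	if err != nil {
		log.Fatalln("Unable to register operations", err)
	}
	// Register the Service alongside the Workflow it abstracts.
	w.RegisterNexusService(service)
	w.RegisterWorkflow(handler.HelloHandlerWorkflow)
	// Run the Worker until it is interrupted.
	err = w.Run(worker.InterruptCh())
	if err != nil {
		log.Fatalln("Unable to start worker", err)
	}
}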
Operations
Nexus Operations are not intrinsically durable, but can be made durable by backing them with Temporal primitives like a Workflow, Update, or Signal. The Temporal SDK provides helper functions to create Operations that may be registered with a Service in a Temporal Worker.
Calling a Nexus Operation from a caller Workflow presents as a non-blocking, arbitrary duration operation. This schedules the Operation with the caller’s Temporal Service and does not block caller Workflow execution, unless the caller Workflow chooses to do so by waiting for the Nexus Operation to be started or in a final state (completed, failed, timed out, canceled). The caller’s Nexus Machinery is responsible for making the underlying Nexus RPC calls and updating the caller’s Workflow history with the result.
Operation Lifecycle
Nexus Operation handlers are created with the Temporal SDK:
- New-Workflow-Run-Operation
  - Start a Workflow as an asynchronous operation.
- New-Sync-Operation
  - Invoke an underlying Query or Signal as a synchronous operation.
  - Invoke an Update as a synchronous operation.
  - Execute arbitrary code as a synchronous operation.
Caller Workflows use the Temporal SDK to execute a Nexus Operation.
Asynchronous Operation Lifecycle
In Temporal Cloud, an asynchronous Nexus Operation may take up to 60 days to complete. This limit allows callback URL tokens to be validated while asymmetric keys are rotated at that interval.
The lifecycle of an asynchronous Nexus Operation:
- Caller’s Workflow executes a Nexus Operation using the Temporal SDK.
- NexusOperationScheduled is added to the caller’s Event History.
- The Nexus Operation is added to the Nexus Machinery in the caller’s Temporal cluster.
- Caller Workflow doesn’t automatically block on a Nexus Operation.
- Caller’s Workflow can interact with the Nexus Operation execution:
- Wait for the Nexus Operation to start.
- Wait for the Operation result.
- Cancel the Nexus Operation.
- The caller’s Temporal Service attempts to start the Nexus Operation on the target cluster through the Nexus Endpoint.
- Nexus Machinery in the target cluster sync matches the Nexus request to a Worker polling the endpoint's target Namespace and Task Queue.
- Retries are processed by the Nexus Machinery in the caller’s Temporal cluster and tracked as part of the caller Workflow, similar to Activities.
- Nexus Operation handler processes the Nexus Operation.
- Handler Worker gets a Nexus Task from Task Queue.
- Operation handler starts a Workflow in the target cluster with attached Nexus Callback.
- Nexus Operation ID is returned since the Nexus Operation is asynchronous.
- NexusOperationStarted is added to the caller’s Event History.
- The Operation ID is associated with the caller's NexusOperationStarted event, and the caller Workflow is unblocked if it is waiting for NexusOperationStarted using the Temporal SDK.
- Handler’s Nexus Machinery sends a Nexus completion Callback when the Workflow reaches a terminal state.
- Operation result is returned via callback since the operation is asynchronous.
- NexusOperationCompleted is added to the caller’s Event History.
- Caller’s Workflow code is returned the result if requested.
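The ordering of the steps above can be sketched with a toy model in plain Go. This is not Temporal SDK code: `schedule`, `operationHandle`, `history`, and the operation ID `op-1` are hypothetical names, and a goroutine stands in for the Nexus Machinery, the handler, and the completion callback. It only illustrates that scheduling returns immediately and that the caller chooses whether to wait for the start or for the result.

```go
package main

import "fmt"

// operationHandle is a toy stand-in for what the caller Workflow gets back
// after scheduling an Operation.
type operationHandle struct {
	started chan string // delivers the operation ID once the handler has started
	done    chan string // delivers the result once the completion callback arrives
}

// history is a toy stand-in for the caller's Event History.
type history struct{ events []string }

// schedule records NexusOperationScheduled and returns without waiting.
// The goroutine plays the Nexus Machinery, the handler, and the callback.
func schedule(h *history, input string) *operationHandle {
	op := &operationHandle{started: make(chan string, 1), done: make(chan string, 1)}
	h.events = append(h.events, "NexusOperationScheduled")
	go func() {
		// The handler starts the backing Workflow and returns an operation ID.
		h.events = append(h.events, "NexusOperationStarted")
		op.started <- "op-1"
		// Later, the completion callback delivers the result.
		h.events = append(h.events, "NexusOperationCompleted")
		op.done <- "hello " + input
	}()
	return op
}

func main() {
	h := &history{}
	op := schedule(h, "nexus") // returns immediately
	id := <-op.started         // optional: wait for the Operation to start
	result := <-op.done        // optional: wait for the Operation result
	fmt.Println(id, result, h.events)
}
```

In a real caller Workflow these waits would go through the Temporal SDK rather than raw channels.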
Synchronous Operations Lifecycle
A synchronous Nexus Operation handler must complete in less than 10 seconds, as measured from the caller's Nexus Machinery; otherwise the request times out with a retryable error. The handler should stay within the context deadline to avoid timing out. The caller's Nexus Machinery retries the Nexus Operation until the operation's Schedule-to-Close timeout has been exceeded.
The lifecycle of a synchronous Nexus Operation, for example to do a Query or a Signal:
- Caller’s Workflow executes a Nexus Operation using the Temporal SDK.
- NexusOperationScheduled is added to the caller’s Event History.
- The Nexus Operation is added to the Nexus Machinery in the caller’s Temporal cluster.
- Caller Workflow doesn’t automatically block on a Nexus Operation.
- Caller’s Workflow can interact with the Nexus Operation execution:
- Wait for the Nexus Operation to start (not needed for synchronous operations).
- Wait for the Operation result.
- Cancel the Nexus Operation (not allowed for synchronous operations).
- The caller’s Temporal cluster attempts to start the Nexus Operation on the target cluster via the Nexus Endpoint.
- Nexus Machinery in the target cluster sync matches the Nexus request to a Worker polling the endpoint's target Namespace and Task Queue.
- Retries are processed by the Nexus Machinery in the caller’s Temporal cluster and tracked as part of the caller Workflow, similar to Activities.
- Nexus Operation handler processes the Nexus Operation.
- Handler Worker gets a Nexus Task from Task Queue.
- Operation handler starts a Workflow in the target cluster.
- Operation result is returned inline, since the operation is synchronous.
- NexusOperationCompleted is added to the caller’s Event History.
- Caller’s Workflow code is returned the result if requested.
Executing Arbitrary Code from a Synchronous Nexus Operation Handler
Synchronous Nexus Operation handlers can execute arbitrary code, but unlike Activities they should be short-lived. As mentioned above, a synchronous Nexus Operation handler has less than 10 seconds to process a Nexus start operation request and should stay within the context deadline provided by the Temporal SDK.
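One way to stay within the context deadline is to check the context between units of work. The following is a minimal sketch in plain Go, not Temporal SDK code: `doWork` and `runWithBudget` are hypothetical helpers, and the millisecond budgets stand in for the real handler deadline so the example runs quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// doWork is a hypothetical handler body. It performs its work in small steps
// and stops as soon as the context is done, returning the context's error.
func doWork(ctx context.Context, steps int, stepCost time.Duration) (int, error) {
	done := 0
	for i := 0; i < steps; i++ {
		select {
		case <-ctx.Done():
			return done, ctx.Err()
		case <-time.After(stepCost): // simulates one unit of work
			done++
		}
	}
	return done, nil
}

// runWithBudget calls doWork under a deadline of budgetMs milliseconds. It
// reports the completed steps and whether the deadline was exceeded.
func runWithBudget(budgetMs, steps, stepMs int) (int, bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetMs)*time.Millisecond)
	defer cancel()
	n, err := doWork(ctx, steps, time.Duration(stepMs)*time.Millisecond)
	return n, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(runWithBudget(500, 3, 1))   // short work finishes inside the budget
	fmt.Println(runWithBudget(20, 1000, 5)) // long work is cut off by the deadline
}
```

Work that cannot reliably finish inside the deadline is usually a better fit for an asynchronous Operation backed by a Workflow.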
Automatic Retries
Once the caller Workflow schedules an Operation with the caller’s Temporal cluster, the caller’s Nexus Machinery keeps trying to start the Operation, with automatic retries and exponential backoff. If a Nexus Operation returns a retryable error when attempting to start, it is retried up to the default Retry Policy’s max attempts and expiration interval.
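To make the retry behavior concrete, here is a minimal sketch of exponential backoff with a cap and a maximum number of attempts. It is an illustration in plain Go, not the Nexus Machinery's implementation: `retryPolicy`, `startWithRetries`, and the values in `examplePolicy` are assumptions chosen for the example and are not Temporal's defaults. For simplicity it treats every error as retryable.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryPolicy holds hypothetical settings; these are not Temporal's defaults.
type retryPolicy struct {
	initialInterval    time.Duration
	backoffCoefficient float64
	maximumInterval    time.Duration
	maximumAttempts    int
}

var examplePolicy = retryPolicy{
	initialInterval:    time.Second,
	backoffCoefficient: 2,
	maximumInterval:    10 * time.Second,
	maximumAttempts:    6,
}

var errStart = errors.New("retryable start failure")

// backoff returns the delay before retry number `retry` (1-based): the
// initial interval multiplied by the coefficient each time, capped at the maximum.
func (p retryPolicy) backoff(retry int) time.Duration {
	d := float64(p.initialInterval)
	for i := 1; i < retry; i++ {
		d *= p.backoffCoefficient
		if d >= float64(p.maximumInterval) {
			break
		}
	}
	if d > float64(p.maximumInterval) {
		return p.maximumInterval
	}
	return time.Duration(d)
}

// skipSleep lets callers run the retry loop without actually waiting.
func skipSleep(time.Duration) {}

// startWithRetries calls start until it succeeds or the attempts run out,
// sleeping for the backoff delay between attempts.
func startWithRetries(p retryPolicy, start func() error, sleep func(time.Duration)) (int, error) {
	var err error
	for attempt := 1; attempt <= p.maximumAttempts; attempt++ {
		if err = start(); err == nil {
			return attempt, nil
		}
		if attempt < p.maximumAttempts {
			sleep(p.backoff(attempt))
		}
	}
	return p.maximumAttempts, err
}

func main() {
	calls := 0
	start := func() error { // fails twice, then succeeds
		calls++
		if calls < 3 {
			return errStart
		}
		return nil
	}
	logDelay := func(d time.Duration) { fmt.Println("waiting", d) }
	attempts, err := startWithRetries(examplePolicy, start, logDelay)
	fmt.Println(attempts, err)
}
```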
Circuit Breaker
The circuit breaker engages when requests fail consecutively with a retryable error, which may indicate that the destination (for example, the Nexus Service when starting an Operation, or the caller when delivering a callback) is down or unable to process requests. By default, the circuit breaker opens after 5 consecutive failed requests. While it is open, Nexus tasks fail early and requests are not sent to the destination. After one minute in the open state, it moves to the half-open state, which allows only a single request through. If that request succeeds, the circuit breaker returns to the closed state and allows all requests to pass through.
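The state transitions described above can be sketched as a small state machine. This is a simplified, single-goroutine model in plain Go, not Temporal's implementation: `breaker`, `fakeClock`, and the error values are hypothetical names. Reopening the circuit when the half-open probe fails is an assumption, since the text only describes the successful case.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

const (
	failureThreshold = 5           // consecutive failures before the circuit opens
	openDuration     = time.Minute // time spent open before one probe is allowed
)

var (
	errRetryable = errors.New("retryable failure")
	errOpen      = errors.New("circuit open: request not sent")
)

// fakeClock lets the example move time forward without sleeping.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

// breaker is not safe for concurrent use; a real one would need locking.
type breaker struct {
	state    state
	failures int // consecutive failures seen while closed
	openedAt time.Time
	now      func() time.Time
}

func newBreaker(now func() time.Time) *breaker { return &breaker{now: now} }

// call sends the request unless the circuit is open, then updates the state.
func (b *breaker) call(request func() error) error {
	if b.state == open {
		if b.now().Sub(b.openedAt) < openDuration {
			return errOpen // fail early without contacting the destination
		}
		b.state = halfOpen // let a single probe request through
	}
	err := request()
	if err == nil {
		b.state, b.failures = closed, 0
		return nil
	}
	b.failures++
	// Assumption: a failed probe reopens the circuit.
	if b.state == halfOpen || b.failures >= failureThreshold {
		b.state, b.failures, b.openedAt = open, 0, b.now()
	}
	return err
}

func main() {
	clk := &fakeClock{}
	b := newBreaker(clk.now)
	fail := func() error { return errRetryable }
	for i := 0; i < failureThreshold; i++ {
		b.call(fail)
	}
	fmt.Println(b.call(fail)) // rejected early while the circuit is open
	clk.advance(openDuration)
	err := b.call(func() error { return nil }) // the half-open probe succeeds
	fmt.Println(err, b.state == closed)
}
```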
Execution Semantics
At-least-once Execution Semantics and Idempotency
Because the caller's Nexus Machinery may attempt to start the Operation multiple times, an Operation, like an Activity, has at-least-once execution semantics. The Nexus Operation handler should therefore be idempotent. This is not strictly required in every case, but it is highly recommended as a general rule.
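A common way to make a handler idempotent is to remember the outcome of each request under a deduplication key. The sketch below shows that idea in plain Go. It is not Temporal SDK code: `idempotentHandler`, the `requestID` key, and the in-memory map are assumptions for illustration, and a real handler would need durable storage.

```go
package main

import (
	"fmt"
	"sync"
)

// idempotentHandler remembers the result for each request ID so that a
// retried delivery returns the earlier result instead of repeating the work.
type idempotentHandler struct {
	mu          sync.Mutex
	results     map[string]string
	sideEffects int // counts how many times the side effect actually ran
}

func newIdempotentHandler() *idempotentHandler {
	return &idempotentHandler{results: map[string]string{}}
}

// handle performs the side effect at most once per requestID.
func (h *idempotentHandler) handle(requestID, input string) string {
	h.mu.Lock()
	defer h.mu.Unlock()
	if r, ok := h.results[requestID]; ok {
		return r // duplicate delivery: reuse the stored result
	}
	h.sideEffects++ // stands in for the real work, such as charging a card
	r := "processed:" + input
	h.results[requestID] = r
	return r
}

func main() {
	h := newIdempotentHandler()
	fmt.Println(h.handle("req-1", "order-42"))
	fmt.Println(h.handle("req-1", "order-42")) // retry of the same request
	fmt.Println(h.sideEffects)
}
```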
At-most-once Execution Semantics through an Underlying WorkflowIDReusePolicy
To deduplicate work and get at-most-once execution semantics, an Operation can start a Workflow with a WorkflowIDReusePolicy of RejectDuplicates, which allows only one Workflow Execution per Workflow ID within a Namespace for the Retention Period.
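The effect of rejecting duplicates can be illustrated with a toy model in which a handler derives a stable Workflow ID from a business key. This is plain Go, not the Temporal SDK: `namespace`, `startWorkflow`, `handleStart`, and the `payment-` ID prefix are hypothetical, and the model ignores the Retention Period.

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyStarted = errors.New("workflow execution already started")

// namespace is a toy model of the Workflow IDs known to one Namespace.
type namespace struct {
	started map[string]bool
	runs    int // number of Workflow Executions actually started
}

func newNamespace() *namespace { return &namespace{started: map[string]bool{}} }

// startWorkflow mimics a reject-duplicates policy: a second start with the
// same Workflow ID returns an error instead of starting another execution.
func (n *namespace) startWorkflow(workflowID string) error {
	if n.started[workflowID] {
		return errAlreadyStarted
	}
	n.started[workflowID] = true
	n.runs++
	return nil
}

// handleStart is a hypothetical Operation handler. Deriving the Workflow ID
// from the order ID makes retried deliveries collide on the same ID.
func handleStart(n *namespace, orderID string) error {
	return n.startWorkflow("payment-" + orderID)
}

func main() {
	ns := newNamespace()
	fmt.Println(handleStart(ns, "42")) // first delivery starts the Workflow
	fmt.Println(handleStart(ns, "42")) // retried delivery is rejected
	fmt.Println(ns.runs)
}
```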
Versioning
Task Routing is the simplest way to version your service code.
If you have a new backward-incompatible Nexus Operation Handler, for example due to a wire-level incompatible breaking change, start by using a different Service and Task Queue.
The version may be part of the service name, for example prod.payments.v2.
Callers can then migrate to the new version in their normal deployment schedule.