iwf

module v1.0.0-final
Published: Dec 18, 2022 License: MIT

README

iWF project - main & server repo

iWF is a platform providing all-in-one tooling for building long-running business applications. It provides abstractions for persistence (database, ElasticSearch) and more! It aims to provide a clean, simple, and easy-to-use interface, like an iPhone.

It will not make you a 10x developer...but you may feel like one!

We call a long-running application a Workflow.

It's a simple and powerful WorkflowAsCode general-purpose workflow engine.

The server is backed by Cadence/Temporal as an interpreter.

Related projects:

Table of contents

Community & Help

What is iWF

Architecture

An iWF application hosts a set of iWF workflow workers. Using the iWF SDKs, the workers host the two REST APIs of WorkflowState: start and decide. The application calls the iWF server to interact with workflow executions -- start, stop, signal, get results, etc. -- also using the iWF SDKs.

The iWF server hosts those APIs (also REST) as an iWF API service. The API service calls the Cadence/Temporal service as the backend.

The iWF server also hosts Cadence/Temporal workers, which host an interpreter workflow. Any iWF workflow is interpreted into this Cadence/Temporal workflow. The interpreter workflow invokes the two iWF APIs of the application's workflow workers. Internally, the two APIs are executed as Cadence/Temporal activities. Therefore, all the REST API requests/responses with the worker are stored in history events, which are useful for debugging/troubleshooting.

architecture diagram

Basic Concepts

Workflow and WorkflowState definition

iWF lets you build long-running applications by implementing the workflow interface, e.g. the Java Workflow interface. An instance of the interface is a WorkflowDefinition. User applications use iwfWorkflowType to differentiate WorkflowDefinitions.

A WorkflowDefinition contains several WorkflowStates, e.g. the Java WorkflowState interface. A WorkflowState is implemented with two APIs: start and decide.

  • The start API is invoked immediately when a WorkflowState is started. It returns some Commands to the server. When the requested Commands are completed, the decide API is triggered.
  • The decide API decides the next states to execute. There can be multiple next states, and states can be re-executed as different StateExecutions.
Workflow execution and WorkflowState execution

An application can start a workflow instance with a workflowId for any workflow definition. A workflow instance is called a WorkflowExecution. The iWF server returns a runId (a UUID) as the identifier of the WorkflowExecution. The runId is globally unique.

WorkflowId uniqueness: at any time, there can be at most one WorkflowExecution running with a given workflowId. However, after a previous WorkflowExecution finishes running (in any closed status), an application may start a new WorkflowExecution with the same workflowId using an appropriate IdReusePolicy.

A running WorkflowExecution must have at least one WorkflowState being executed. An instance of a WorkflowState is called a StateExecution.

⚠ Note:

Depending on the context, the word workflow on its own may mean WorkflowExecution (most commonly), WorkflowDefinition, or both.

Commands

These are the three command types:

  • SignalCommand: waits for a signal to be sent to the workflow signal channel from outside. External applications can use the SignalWorkflow API to signal a workflow.
  • TimerCommand: waits for a durable timer to fire.
  • InterStateChannelCommand: waits for a value to be published from another state execution (internal to the same workflow execution).

Note that the start API can return multiple commands and choose among different DeciderTriggerTypes for triggering the decide API:

  • AllCommandCompleted: waits for all commands to complete
  • AnyCommandCompleted: waits for any command to complete
Persistence

iWF provides a super simple persistence abstraction for workflows to use. Developers don't need to touch any database system to register/maintain schemas -- the only schema is defined in the workflow code.

  • DataObject is for:
    • sharing data values across the workflow
    • retrieval by external applications using the GetDataObjects API
    • viewing in the Cadence/Temporal WebUI, in the QueryHandler tab
  • SearchAttribute is similar, and is for:
    • sharing data values across the workflow
    • retrieval by external applications using the GetSearchAttributes API
    • searching for workflows from external applications using the SearchWorkflow API
    • searching for workflows in the Cadence/Temporal WebUI, in the Advanced tab
    • note that a search attribute type must be registered in the Cadence/Temporal server before it can be used for searching, because it is backed by ElasticSearch
    • the supported data types are limited, as the server has to understand the values for indexing
    • see the Temporal doc and Cadence doc to learn more about SearchAttribute
  • StateLocal is for:
    • passing data values from the start API to the decide API within the same WorkflowState execution
  • RecordEvent is for:
    • recording events within a state execution. They are useful for debugging via the workflow history. Usually you may want to record the input/output of dependency RPC calls.

Logically, each workflow type has a persistence schema like the one below:

+-------------+-------+-----------------+-----------------+----------------------+----------------------+-----+
| workflowId  | runId | dataObject key1 | dataObject key2 | searchAttribute key1 | searchAttribute key2 | ... |
+-------------+-------+-----------------+-----------------+----------------------+----------------------+-----+
| your-wf-id1 | uuid1 | value1          | value2          | keyword-value1       | 123(integer)         | ... |
+-------------+-------+-----------------+-----------------+----------------------+----------------------+-----+
| your-wf-id1 | uuid2 | value3          | value4          | keyword-value2       | 456(integer)         | ... |
+-------------+-------+-----------------+-----------------+----------------------+----------------------+-----+
| your-wf-id2 | uuid3 | value5          | value5          | keyword-value3       | 789(integer)         | ... |
+-------------+-------+-----------------+-----------------+----------------------+----------------------+-----+
| ...         | ...   | ...             | ...             | ...                  | ...                  | ... |
+-------------+-------+-----------------+-----------------+----------------------+----------------------+-----+
Communication

There are two major communication mechanisms in iWF:

  • SignalChannel: for receiving input from external sources asynchronously. Used with SignalCommand.
  • InterStateChannel: for interaction between state executions. Used with InterStateChannelCommand.

Client APIs

Client APIs are hosted by the iWF server for user workflow applications to interact with their workflow executions.

  • Start workflow: start a new workflow execution
  • Stop workflow: stop a workflow execution
  • Signal workflow: send a signal to a workflow execution
  • Search workflow: search for workflows using an SQL-like query language with search attributes
  • Get workflow: get basic information about a workflow, like its status and results (if completed, or optionally waiting for completion)
  • Get workflow data objects: get the dataObjects of a workflow execution
  • Get workflow search attributes: get the search attributes of a workflow execution
  • Reset workflow: reset a workflow to previous states

Why iWF

If you are familiar with Cadence/Temporal

If you are not

  • Check out this doc to understand some history

iWF is an application platform that provides comprehensive tooling:

  • WorkflowAsCode for highly flexible/customizable business logic
  • Parallel execution of multiple threads of business logic
  • Persistent storage for intermediate states, stored as "dataObjects"
  • Persistent searchable attributes that can be used for flexible searching, even full-text searching, backed by ElasticSearch
  • Receiving data from external systems via Signal
  • Durable timers and cron-job scheduling
  • Reset workflow, letting you recover workflows from bad states easily
  • Highly testable and easy to maintain
  • ...

How to run this server

Using docker image & docker-compose

Check out this repo, go to the docker-compose folder, and run it:

cd docker-compose && docker-compose up

By default this runs a Temporal server alongside iWF, and registers a default namespace plus the search attributes required by iWF. Link to the WebUI: http://localhost:8233/namespaces/default/workflows

By default, the iWF server serves on port 8801; the server URL is http://localhost:8801/

NOTE:

Use docker pull iworkflowio/iwf-server:latest to update to the latest image. Or update the docker-compose file to specify a version tag.

How to build & run locally

  • Run make bins to build the binary iwf-server
  • Then run ./iwf-server start to run the service. This defaults to serving workflow APIs with the Temporal interpreter implementation, and requires a local Temporal setup. See Run with local Temporal.
  • Alternatively, run ./iwf-server --config config/development_cadence.yaml start to run with local Cadence. See below for instructions on setting up local Cadence.
  • Run make integTests to run all integration tests. By default this requires both local Cadence and local Temporal to be set up.

How to use in production

You can customize the docker image, or just use the API and interpreter, which are exposed as the API service and workflow service.

For more info, contact qlong.seattle@gmail.com

Development

Any contribution is welcome.

Here is the repository layout if you are interested in learning about it:

  • cmd/ the code to bootstrap the server -- loading config, connecting to the Cadence/Temporal service, and starting the iWF API and interpreter services
  • config the config to start the server, and also config template to start the Docker image
  • docker-compose the docker compose file to start a full iWF server with Temporal dependency
  • gen the generated code from iwf-idl (Open API definition/Swagger)
  • integ the end to end integration tests.
    • workflow the iWF workflows written without an SDK (just implementing the REST APIs)
    • *.go the tests
  • iwf-idl the idl submodule
  • script some scripts
    • http some example HTTP scripts for calling the server's REST APIs
    • start-server.sh the script to start iWF server in Docker image
  • service iWF implementation
    • api API service implementation
      • cadence the Cadence abstraction of UnifiedClient
      • temporal the Temporal abstraction of UnifiedClient
      • *.go the implementation of API service using UnifiedClient so that it works for both Cadence and Temporal
    • interpreter interpreter worker service implementation
      • cadence the Cadence abstraction of ActivityProvider and WorkflowProvider
      • temporal the Temporal abstraction of ActivityProvider and WorkflowProvider
      • *.go the implementation of interpreter workflow service using ActivityProvider and WorkflowProvider so that it works for both Cadence and Temporal
        • workflowImpl.go the core workflow implementation
    • common some common libraries between api and interpreter
    • *.go some common definitions between api and interpreter
How to update IDL and the generated code
  1. Install openapi-generator using Homebrew if you haven't. See more documentation
  2. Check out the idl submodule by running the command: git submodule update --init --recursive
  3. Run the command git submodule update --remote --merge to update IDL to the latest commit
  4. Run make idl-code-gen to refresh the generated code
Run with local Temporalite
  1. Run a local Temporalite following the instructions. If you see the error error setting up schema, try the command temporalite start --namespace default -f my_test.db to start instead.
  2. Register a default namespace:
tctl --ns default n re
  3. Go to http://localhost:8233/ for the Temporal WebUI

NOTE: alternatively, go to Temporal-dockercompose to run with docker

  4. Register the system search attributes required by the iWF server:
tctl adm cl asa -n IwfWorkflowType -t Keyword
tctl adm cl asa -n IwfGlobalWorkflowVersion -t Int
tctl adm cl asa -n IwfExecutingStateIds -t Keyword

  5. For the attribute_test.go integTests, you also need to register these search attributes:

tctl adm cl asa -n CustomKeywordField -t Keyword
tctl adm cl asa -n CustomIntField -t Int
Run with local Cadence
  1. Run a local Cadence server following the instructions:
docker-compose -f docker-compose-es-v7.yml up
  2. Register a new domain if you haven't: cadence --do default domain register
  3. Register the system search attributes required by the iWF server:
cadence adm cl asa --search_attr_key IwfGlobalWorkflowVersion --search_attr_type 2
cadence adm cl asa --search_attr_key IwfExecutingStateIds --search_attr_type 0
cadence adm cl asa --search_attr_key IwfWorkflowType --search_attr_type 0
  4. Go to the Cadence WebUI: http://localhost:8088/domains/default/workflows?range=last-30-days

How to migrate from Cadence/Temporal

Migrating from Cadence/Temporal is simple and easy. However, it's only possible to migrate new workflows: for existing workflows running in Cadence/Temporal, you will need to keep the Cadence/Temporal workers until they are finished.

Activity

Wait, what? There are no activities at all in iWF? Correct: iWF workflows are essentially a REST service, and all the activity code from Cadence/Temporal can simply move into the iWF workflow code -- the start or decide API of a WorkflowState.

One main reason many people use Cadence/Temporal is to take advantage of the history showing input/output in the WebUI, which is handy for debugging/troubleshooting. iWF provides a RecordEvent API to mimic this. You can call it with any arbitrary data, and the data will be recorded into the history just for your debugging/troubleshooting.

Signal

Depending on the SDK, Cadence/Temporal provides different APIs like SignalMethod/SignalChannel/SignalHandler, etc. In iWF, just use SignalCommand as the equivalent.

In some use cases, you may have multiple signal commands and use the AnyCommandCompleted decider trigger type to wait for any of them to complete.

Timer

There are different timer APIs in Cadence/Temporal, depending on the SDK:

  • workflow.Sleep(duration)
  • workflow.Await(duration, condition)
  • workflow.NewTimer(duration)
  • ...

In iWF, just use TimerCommand as the equivalent.

Again, in some use cases you may combine signal/timer commands and use the AnyCommandCompleted decider trigger type to wait for any command to complete.

Query

Depending on the SDK, Cadence/Temporal provides different APIs like QueryHandler/QueryMethod, etc. In iWF, use DataObjects as the equivalent. Unlike in Cadence/Temporal, DataObjects must be explicitly defined in the WorkflowDefinition.

Note that by default, all DataObjects and SearchAttributes are loaded into every WorkflowState with the LOAD_ALL_WITHOUT_LOCKING persistence loading policy. This can become a performance issue if there are too many large items. Consider using a different loading policy, like LOAD_PARTIAL_WITHOUT_LOCKING, by changing the WorkflowStateOptions.

Also note that DataObjects are not just for returning data through the API; they are also for sharing data across different StateExecutions. But if you just need to share data from the start API to the decide API, StateLocal is preferred for efficiency reasons.

Search Attribute

iWF has the same concept of Search Attributes. Unlike in Cadence/Temporal, SearchAttributes must be explicitly defined in the WorkflowDefinition.

Versioning and change compatibility

There is no versioning in iWF! There are no non-deterministic errors in iWF applications, because there is no replay at all for iWF workflows: all workflow state executions are stored in Cadence/Temporal activities.

Workflow code changes always apply to both running and new workflow executions. This gives you superpowers and flexibility for maintaining long-running business applications.

However, workflow code changes can still cause backward-compatibility issues, as in any other microservice application. You just need to apply the standard techniques to address them:

  1. It's rare, but if you don't want old workflows to execute the new code, the standard way is to use a flag in new executions to branch. For example, if you want to change StateA->StateB to StateA->StateC only for new workflows, set a new flag in the new workflows so that the StateA decide API implementation knows whether to go to StateB or StateC.
  2. Removing state code can cause errors if any state execution is still running it. For example, if you have changed StateA->StateB to StateA->StateC, you may want to delete StateB. However, a StateExecution could still be at StateB (most commonly, waiting for commands to complete). Deleting StateB will cause a not-found error when StateB is executed.
    1. If you want to delete StateB as early as possible, use the IwfWorkflowType and IwfExecutingStateIds search attributes to confirm whether any workflows are still running at that state. These are built-in search attributes of the iWF server.
    2. The error will go away if you add StateB back, because by default all State APIs are backoff-retried forever.

Parallel execution with synchronization

In Cadence/Temporal, multi-threading is useful for complicated applications. But the APIs are hard to understand, use, and debug, and each language/SDK has its own APIs without much consistency.

In iWF, there are just a few concepts, and they are very straightforward:

  1. The decide API can go to multiple next states, which are executed in parallel.
  2. It can go back to any previous StateId to form a loop. The StateExecutionId is the unique identifier.
  3. Use InterStateChannel for synchronization/communication. It's just like a signal channel that works internally.

Notes:

  1. Any state can decide to complete or fail the workflow, or just reach a dead end (no next state).
  2. Because of the above, more than one state can complete with data as workflow results. The client API provides a special way to retrieve these results.

Non-workflow code

Check Client APIs for all the APIs that are equivalent to Cadence/Temporal client APIs.

Features like IdReusePolicy, CronSchedule, RetryPolicy are also supported in iWF.

What's more, iWF provides features that are impossible in Cadence/Temporal, like resetting a workflow by StateId or StateExecutionId. Because WorkflowStates are explicitly defined, the reset API is a lot friendlier to use.

Anything else

Is that all? For now, yes. We believe this is all you really need to migrate to iWF from Cadence/Temporal. One philosophy of iWF is to provide simple, easy-to-understand APIs to users (as a minimalist), as opposed to the complicated and enormous number of APIs in Cadence/Temporal.

So what about everything else?

  • Timeout and backoff retry: the State start/decide APIs have a default timeout and infinite backoff retry. You can use StateOptions to customize them.
  • ChildWorkflow can be replaced with a regular workflow + signal. See this StackOverflow answer for why.
  • SignalWithStart: using the start + signal APIs is the same, except for some extra exception-handling work. We may consider providing it in the future, because we have seen many people use it incorrectly in Cadence/Temporal.
  • ContinueAsNew: this is missing in iWF for now. But following the philosophy of hiding internal details, we will implement it in a way that user workflow code doesn't have to know about: internally, the workflow execution can do a continueAsNew without the user workflow knowing.
  • Long-running activity with stateful recovery (heartbeat details): this is indeed a good one that we want to add, but we don't see it used very commonly. Please leave a message if you need it.

But we may be wrong. If you believe there is something else you really need, open a ticket or join us in the discussion.

Monitoring and Operations

iWF server

There are two components for iWF server: API service and interpreter worker service.

For API service, you need to set up monitors/dashboards:

  • API availability
  • API latency

The interpreter worker service is just a standard Cadence/Temporal workflow application. Follow the developer guides:

iWF application

As you may have realized, an iWF application is just a standard REST microservice. Therefore, you just need the standard ways of setting up monitoring.

Usually, you need to set up monitors/dashboards:

  • API availability
  • API latency

When something goes wrong in your application, here are some tips for troubleshooting:

  • Have your worker service return the error stacktrace as the response body to the iWF server, e.g. like this example for Spring Boot using ExceptionHandler.
  • Use the Cadence/Temporal WebUI to debug your application. If you return the full stacktrace in the response body, the pending-activity view will show it to you!
  • All the input/output of your workflow is stored in the activity input/output of the history events. The input is in ActivityTaskScheduledEvent; the output is in ActivityTaskCompletedEvent, or in the pending-activity view if there are errors.

Development Plan

1.0
  • Start workflow API
  • Executing start/decide APIs and completing workflow
  • Parallel execution of multiple states
  • Timer command
  • Signal command
  • SearchAttributeRW
  • DataObjectRW
  • StateLocal
  • Signal workflow API
  • Get DataObjects/SearchAttributes API
  • Get workflow info API
  • Search workflow API
  • Stop workflow API
1.1
  • Reset workflow API
  • Command type(s) for inter-state communications (e.g. internal channel)
  • AnyCommandCompleted Decider trigger type
  • More workflow start options: IdReusePolicy, cron schedule, retry
  • StateOption: Start/Decide API timeout and retry policy
  • Reset workflow by stateId or stateExecutionId
  • StateOption.PersistenceLoadingPolicy: LOAD_PARTIAL_WITHOUT_LOCKING
  • More workflow start options: initial search attributes/memo
1.2
  • Decider trigger type: AnyCommandClosed
  • WaitForMoreResults in StateDecision
  • Skip timer API for testing/operation
  • LongRunningActivityCommand
  • Failing workflow details
  • Auto ContinueAsNew
  • StateOption.PersistenceLoadingPolicy: LOAD_ALL_WITH_EXCLUSIVE_LOCK and LOAD_PARTIAL_WITH_EXCLUSIVE_LOCK

Some history

AWS published SWF in 2012 and then moved to Step Functions in 2016 because they found SWF too hard to support. Cadence & Temporal continued the idea of SWF and became much more powerful. However, AWS was right that the SWF/Cadence/Temporal programming model is hard to adopt because it leaks too many internals. Inspired by Step Functions, iWF was created to provide the equivalent power of Cadence/Temporal while hiding all the internal details and providing a clean and simple API.

Read this doc for more.

history diagram
