Local Cloud Workflows
This tutorial is from the MBrace Starter Kit.
This tutorial describes local cloud workflows and how they help avoid common errors when developing for MBrace.
Motivation
Cloud workflows are computations that often span multiple machines. This means variables in scope often need to be serialized and sent to a remote machine for resumption of the computation. This leads to a new class of potential errors: for example, a cloud workflow that uses a global WebClient instance fails with an error when run.
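The failure described above can be sketched as follows. This is a minimal illustration rather than the starter kit's own listing: it assumes MBrace.Core is referenced, that cluster is a handle obtained as in the starter kit scripts (for example via Config.GetCluster()), and the URL is a placeholder.

```fsharp
open System.Net
open MBrace.Core

// Global client, created once on the machine that builds the workflow.
let wc = new WebClient()

// The workflow body refers to the global wc, so wc would have to be
// serialized and shipped to whichever worker executes the workflow.
let download (uri : string) : Cloud<string> = cloud {
    return wc.DownloadString uri
}

// Expected to fail, since WebClient is not serializable
// (cluster is assumed to come from the starter kit configuration):
// cluster.Run (download "http://example.com/")
```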
The obvious fix here is to remove the global WebClient instance, which is not serializable, and replace it with localized instantiations.
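A sketch of that fix, under the same assumptions (MBrace.Core referenced, placeholder URL), might look like this. The client is created inside the workflow body, so it is instantiated on the worker that runs the code.

```fsharp
open System.Net
open MBrace.Core

// Each execution creates its own client on the worker that runs it,
// so no WebClient instance has to cross machine boundaries.
let download (uri : string) : Cloud<string> = cloud {
    let wc = new WebClient()
    return wc.DownloadString uri
}

// With a cluster handle from the starter kit configuration:
// cluster.Run (download "http://example.com/")
```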
The example does, however, illustrate a more general problem. Suppose we have a black-box cloud computation, comp, whose implementation we cannot see.
Depending on the implementation, this could either introduce distribution or involve no distribution at all.
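The two possibilities can be sketched as below. The helper f and the concrete numbers are hypothetical; the point is only that both definitions have type Cloud<int>, while just the first uses Cloud.Parallel.

```fsharp
open MBrace.Core

// Hypothetical helper shared by both variants.
let f (x : int) : Cloud<int> = cloud { return x + 1 }

// Variant 1: the two calls are scheduled in parallel and may run on different workers.
let comp : Cloud<int> = cloud {
    let! results = Cloud.Parallel [ f 17 ; f 25 ]
    return Array.sum results
}

// Variant 2: the two calls run one after the other, with no distribution.
let comp' : Cloud<int> = cloud {
    let! a = f 17
    let! b = f 25
    return a + b
}
```

Nothing in the type signatures tells a caller which of the two variants they have been handed.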
The two computations carry identical types, yet their execution patterns are very different. This can often lead to unanticipated errors: for instance, a workflow that keeps a non-serializable value in scope while running comp would fail with a runtime serialization error only if comp entails distribution.
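A minimal sketch of such a workflow is given below. Here comp is passed in as a parameter to keep the block self-contained, and the URL is a placeholder.

```fsharp
open System.Net
open MBrace.Core

// comp stands for the black-box computation; the caller does not know its implementation.
let test (comp : Cloud<int>) : Cloud<string> = cloud {
    let wc = new WebClient()   // non-serializable value in scope
    let! n = comp              // if comp distributes, the rest of the workflow (holding wc)
                               // may need to be serialized, which fails at run time
    return wc.DownloadString (sprintf "http://example.com/%d" n)
}
```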
Local Cloud Workflows
For the reasons outlined above, MBrace comes with local cloud workflows. These are a special type of cloud computation that is constrained to execute within a single worker. They can be defined using the special local builder, which creates workflows of type LocalCloud<'T>, a subtype of Cloud<'T>.
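As a small illustration, assuming MBrace.Core is referenced, a local workflow is written with the same syntax as a cloud workflow but using the local builder. The names and values here are hypothetical.

```fsharp
open MBrace.Core

// A local workflow: same syntax as cloud { ... }, but built with local.
let localWorkflow : LocalCloud<int> = local { return 42 }

// Local workflows compose with other local workflows in the usual way.
let addOne (w : LocalCloud<int>) : LocalCloud<int> = local {
    let! x = w
    return x + 1
}
```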
Local workflows can safely be used in place of cloud workflows, but the opposite is not possible. In other words, a cloud workflow that uses a local workflow in its body type checks.
However, attempting to distribute inside the body of a local workflow yields a compile-time error.
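Both directions can be sketched as follows, again with hypothetical names. The rejected definition is left commented out so that the block still compiles. The exact compiler message is not reproduced here.

```fsharp
open MBrace.Core

let localWorkflow : LocalCloud<int> = local { return 42 }

// Type checks: a LocalCloud<int> can be bound wherever a Cloud<int> is expected.
let usesLocal : Cloud<int> = cloud {
    let! x = localWorkflow
    return x + 1
}

// Does not type check (left commented out): Cloud.Parallel yields a Cloud<int []>,
// which cannot be bound inside local { ... }.
// let distributes : LocalCloud<int> = local {
//     let! results = Cloud.Parallel [ cloud { return 1 } ; cloud { return 2 } ]
//     return Array.sum results
// }
```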
Local workflows thus provide a compile-time guarantee that they will never execute beyond the context of a single machine. This allows the MBrace library author to enforce a certain degree of sanity with respect to serialization, for example by accepting only local workflows as arguments.
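One way to picture this, as a sketch with a placeholder URL, is a function whose parameter is typed LocalCloud<int> rather than Cloud<int>. A caller can then no longer pass in a computation that distributes.

```fsharp
open System.Net
open MBrace.Core

// Accepting only LocalCloud<int> rules out distribution inside comp at compile time.
let testLocal (comp : LocalCloud<int>) : Cloud<string> = cloud {
    let wc = new WebClient()
    let! n = comp
    return wc.DownloadString (sprintf "http://example.com/%d" n)
}

// Accepted: the argument is a local workflow.
let ok : Cloud<string> = testLocal (local { return 42 })

// Rejected by the compiler (left commented out): the argument is a Cloud<int>.
// let rejected = testLocal (cloud { return 42 })
```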
Applications
The MBrace core APIs already make heavy use of local workflows; most store primitive operations are of type LocalCloud<'T>, since they usually do not entail distribution.
The same holds for many library implementations, which create a distributed cloud computation from user-supplied computations that must be constrained to a single machine. This API restriction enables the library author to schedule computation across the cluster efficiently, based on the assumption that user code will never escape the scope of a single machine per input.
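The shape of such an API can be sketched with a hypothetical helper, here called mapLocal. It is not an MBrace library function. It is built on Cloud.Parallel, and the naive one-job-per-input scheduling is an assumption made for brevity.

```fsharp
open MBrace.Core

// Hypothetical library function: f is applied to every input, and each
// application is constrained to a single machine by its LocalCloud return type.
// The overall result is still a distributed Cloud computation.
let mapLocal (f : 'T -> LocalCloud<'S>) (inputs : seq<'T>) : Cloud<'S []> =
    Cloud.Parallel (Seq.map f inputs)

// Usage: the user-supplied function must be written with the local builder.
let squares : Cloud<int []> =
    mapLocal (fun x -> local { return x * x }) [ 1 .. 10 ]
```

A real implementation would typically batch inputs per worker. The signature is what matters here: distribution is owned by the library, not by the user-supplied function.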
Gotchas
It is important to clarify that even though local workflows do not introduce distribution inside their implementation, this does not imply that they are devoid of any serialization issues. Two examples illustrate this. First, running a local workflow fails with an error if the workflow returns a result that cannot be serialized. Second, a local workflow also fails with an error if its closure contains an instance of type WebClient, since that instance renders the closure nonserializable.
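The two gotchas can be sketched as follows. The definitions are hypothetical illustrations, the URL is a placeholder, and cluster is assumed to be a handle from the starter kit configuration. The submissions are left commented out because they are expected to fail.

```fsharp
open System.Net
open MBrace.Core

// Gotcha 1: nothing is distributed, but the result must still be serialized
// in order to be returned, and WebClient is not serializable.
let returnsClient : LocalCloud<WebClient> = local { return new WebClient() }

// Gotcha 2: wc is created outside the workflow and captured by its closure,
// which makes the closure itself nonserializable.
let capturesClient (uri : string) : LocalCloud<string> =
    let wc = new WebClient()
    local { return wc.DownloadString uri }

// Both are expected to fail when submitted:
// cluster.Run returnsClient
// cluster.Run (capturesClient "http://example.com/")
```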
Summary
In this tutorial, you've learned how to use local workflows to avoid common errors when developing in MBrace. Continue with further samples to learn more about the MBrace programming model.
Note, you can use the above techniques from both scripts and compiled projects. To see the components referenced by this script, see ThespianCluster.fsx or AzureCluster.fsx.