Example: Training in the Cloud
This example demonstrates Norvig's Spelling Corrector (http://norvig.com/spell-correct.html). It is a prototypical workflow for training and learning in the cloud: you use the cloud to extract statistical information from a body of text, then use the resulting statistical summary locally in your client application.
This example is from the MBrace Starter Kit.
Part 1 - Extract Statistics in the Cloud
Step 1: download the text file from its source and save it to blob storage, chunked into smaller files of 10,000 lines each.
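The chunking part of this step can be sketched locally as follows. This is a minimal sketch under stated assumptions, not the sample's own cloud code: the helper names `chunkLines` and `writeChunks` are hypothetical, and the chunks are written to a local directory rather than to cloud blob storage.

```fsharp
open System.IO

// Split the text into lines and group them into chunks of at most
// `chunkSize` lines each (the example uses 10000).
let chunkLines (chunkSize: int) (text: string) : string [] [] =
    text.Split('\n')
    |> Array.chunkBySize chunkSize

// Write each chunk to "<dir>/<index>.txt" and return the paths written.
// Assumption: `dir` is a local directory standing in for blob storage.
let writeChunks (dir: string) (chunks: string [] []) : string [] =
    Directory.CreateDirectory(dir) |> ignore
    chunks
    |> Array.mapi (fun index lines ->
        let path = Path.Combine(dir, sprintf "%d.txt" index)
        File.WriteAllLines(path, lines)
        path)
```

For instance, `chunkLines 2 "a\nb\nc"` produces two chunks, the first holding `"a"` and `"b"` and the second holding `"c"`. Smaller files let the later counting step be spread across workers.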
Step 2: use a cloud data flow to perform a parallel word frequency count on the stored text.
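The shape of the word count can be sketched with ordinary sequence functions. This is a local, single-machine sketch; a cloud data flow would distribute the same collect, map and count-by stages over the cluster. The word pattern `[a-zA-Z]+` and the helper name `countWords` are assumptions made for illustration.

```fsharp
open System.Text.RegularExpressions

// Assumption: a "word" is a maximal run of ASCII letters.
let wordRegex = Regex("[a-zA-Z]+", RegexOptions.Compiled)

// Count how often each lower-cased word occurs across all lines.
let countWords (lines: seq<string>) : (string * int) [] =
    lines
    |> Seq.collect (fun line -> wordRegex.Matches(line) |> Seq.cast<Match>)
    |> Seq.map (fun m -> m.Value.ToLowerInvariant())
    |> Seq.countBy id
    |> Seq.toArray
```

Passing the result through `Map.ofArray` gives a lookup table from word to count, which is the form the client-side corrector needs.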
Part 2 - Use the Frequency Counts in our Application
In the final step, use the calculated frequency counts to compute suggested spelling corrections in your client.
At this point, you've finished using the cluster and no longer need it.
We have computed the frequency table; the rest of this example runs locally.
The statistics could be saved to disk for use in an application. We will use them directly in the client.
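The local correction logic can be sketched along the lines of Norvig's algorithm: prefer the word itself if it is known, then known words one edit away, then known words two edits away, and pick the most frequent candidate. This is a hedged sketch rather than the sample's exact listing; the tiny `NWORDS` table below is made-up stand-in data, where the real table would come from the cloud word count.

```fsharp
// Stand-in frequency table (hypothetical data for illustration only).
let NWORDS : Map<string, int> =
    Map.ofList [ ("spelling", 10); ("the", 100); ("they", 20); ("then", 15) ]

let isKnown (word: string) = NWORDS.ContainsKey word

// All strings one edit away: deletes, transposes, replaces and inserts.
let edits1 (word: string) : Set<string> =
    let alphabet = [ 'a' .. 'z' ]
    let splits = [ for i in 0 .. word.Length -> (word.Substring(0, i), word.Substring(i)) ]
    let deletes =
        [ for (a, b) in splits do
            if b.Length > 0 then yield a + b.Substring(1) ]
    let transposes =
        [ for (a, b) in splits do
            if b.Length > 1 then
                yield a + b.Substring(1, 1) + b.Substring(0, 1) + b.Substring(2) ]
    let replaces =
        [ for (a, b) in splits do
            if b.Length > 0 then
                for c in alphabet do
                    yield a + string c + b.Substring(1) ]
    let inserts =
        [ for (a, b) in splits do
            for c in alphabet do
                yield a + string c + b ]
    Set.ofList (deletes @ transposes @ replaces @ inserts)

// Known words one edit away from `word`.
let knownEdits1 (word: string) : Set<string> =
    edits1 word |> Set.filter isKnown

// Known words two edits away from `word`.
let knownEdits2 (word: string) : Set<string> =
    edits1 word
    |> Seq.collect edits1
    |> Seq.filter isKnown
    |> Set.ofSeq

// Pick the most frequent candidate, falling back to the input word.
let findBestCorrection (word: string) : string =
    let candidates =
        if isKnown word then Set.singleton word
        else
            let e1 = knownEdits1 word
            if not (Set.isEmpty e1) then e1
            else
                let e2 = knownEdits2 word
                if not (Set.isEmpty e2) then e2 else Set.singleton word
    candidates
    |> Seq.sortBy (fun w -> -(defaultArg (Map.tryFind w NWORDS) 0))
    |> Seq.head
```

With the stand-in table, `findBestCorrection "speling"` yields `"spelling"`, and a string with no known neighbours is returned unchanged. Generating all two-edit candidates is expensive for long words, so a real client may want to cap word length or cache results.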
In this example, you've seen how cloud tasks can be used to extract statistical information and return it to the client. Continue with further samples to learn more about the MBrace programming model.
Note that you can use the above techniques from both scripts and compiled projects. To see the components referenced by this script, see ThespianCluster.fsx or AzureCluster.fsx.