EK9 Standard Types
EK9 has a number of Built in Types and Collection Types which are designed to be extended and reused. The Standard Types described below, however, are more like utility API classes.
The Types
- Stdin
- Stdout
- Stderr
- TextFile
- EnvVars
- GetOpt
- FileSystem
- FileSystemPath
- Signals
- SignalHandler
- OS
- Mutex Lock and Key
These types are particularly useful for writing CLI (command line) applications (though they can be used in any sort of application). As mentioned in the introduction - purpose; EK9 is aimed at a range of different sized applications. The CLI has come back into favour for a variety of reasons, having been sidelined in preference to GUI based applications for many years. EK9 has several types and constructs that support the development of CLI applications. The approach is quite Unix/Linux/MacOS based to some degree; especially with Signal handling.
What's needed
To write an effective CLI application it is necessary to understand and embrace stdin, stdout, stderr; together with text file processing and command line options.
Picking up values from the environment is also quite important for many applications and provides a mechanism to supply runtime configuration (in a limited and controlled manner).
Finally, dealing with signals (not very viable if you are on a Windows platform, see Windows limited signal support) and setting exit codes when the application is complete is also critical for shell programming. If you want to write CLI applications that can work in scripts and in conjunction with other CLI applications, setting the exit code is important as it enables scripts to determine whether the processing worked or failed (and how it failed).
Some may consider this anachronistic/backwards or retrograde (too Unix?); but for many in 'DevOps' roles, 'knocking up a quick script' to accomplish small one-off tasks is essential.
But note that existing tooling (if you are on a Unix/Linux platform) may provide 50-60% of the functionality you need. KSH/BASH will get you quite a long way, but can become complex at scale, and as it is interpreted (much like Python) you have to test every path. sed, awk, cut etc. are very powerful and capable; use these tools and shell scripting (if you know them) before writing any EK9.
Depending on your background (coder first or sys admin first) you might adopt EK9 sooner rather than later. But once you get past a certain size or complexity you actually need a programming language. For very high performance you could use C, GoLang or Rust; for highly mathematical or very scientific functionality maybe use Python.
This is not intended to put you off using EK9, but to ensure that you make an informed choice of programming language. EK9 is designed to be used as an intermediate programming language; while it may not have the same level of performance as C/Golang/Rust, it will be performant.
You will find EK9 provides you with a very quick, reliable, readable language to create CLI solutions. With EK9 the code is strongly typed and compiled; therefore many errors will be found before execution. You will also get much more reuse by using EK9. Remember computer resources are much more powerful/capable/flexible and cheaper in comparison to human resources. Developing something that is good/performant enough in a shorter time with fewer defects can be preferred in many cases. Cloud flexibility in running up powerful servers for one-off jobs can make the use of other languages viable.
Creating the tooling to create an entire cloud infrastructure like the AWS CLI shows the power of adopting the CLI approach. The fact that the CLI approach is still in use 50 years after the creation of the concept; should tell you it is probably here to stay!
It is very likely that any CLI application will need to read, write and process text files; to locate and access these files it is important to understand paths. Unix/Linux/MacOS use '/' as the directory separator and Windows uses '\'; EK9 helps with the processing of these paths by providing FileSystemPath.
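The separator difference above is worth a concrete illustration. This is a minimal Python sketch (not EK9's FileSystemPath API) showing the general idea of path objects that know their own separator, so code never hard-codes '/' or '\':

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical paths for illustration only; the point is that the path
# object, not the calling code, supplies the correct separator.
posix = PurePosixPath("/home/user/data") / "input.txt"
windows = PureWindowsPath(r"C:\Users\user\data") / "input.txt"

print(posix)    # uses '/' separators
print(windows)  # uses '\' separators
```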
Worked Example
Unlike some of the other sections prior to this where each type or bit of capability was described in isolation, this section uses a worked example to show how the standard types can be used together.
This example has been coded in a single source file. It is quite a short example and none of the code will be reused in other projects; this is the simplest option (for this explanation).
The set of utility classes listed above are all demonstrated below in this worked example. The example has quite a simple purpose (and is about 400 lines long). Most of the examples up until this point have been quite short, this example is the first that shows a full application (albeit a very simple one). The basic requirement of the application is as follows:
Main Requirements
- To accept a stream of comma separated String values in via 'Standard Input', the first value of each line is an identifier, the rest are just data items.
- To open a named file that contains additional information (in colon separated format - but with the same identifier).
- The application must correlate the two identifiers.
- If the identifier from the 'Standard Input' is not found in the named file; the identifier or line must be printed out to 'Standard Error'.
- If the identifier is found then the two records must be merged and output to 'Standard Output' in comma separated format.
- Date fields in the incoming data streams are of the form YYYYMMDD and must be converted to iso format 'Zulu' zone. i.e. 20201230 must be converted to 2020-12-30T00:00:00Z i.e. the start of the day on the 30th.
- The first line of the output to 'Standard Output' must be a comment (starting with a # character) and must show the name of the user (from the environment) that produced the output and the date and time it was produced.
- If any of the data in the named file is invalid (or the file cannot be accessed) - The application must print the identifier/line (or name of the file) to 'Standard Error' and exit with an exit code of 2.
- If any of the data from the 'Standard Input' is invalid - The identifier/line must be printed out to 'Standard Error' - but the process must continue.
Note that blank lines or lines starting with a # should be discarded.
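The date conversion requirement above (YYYYMMDD to ISO 'Zulu') can be sketched in Python to make the expected transformation unambiguous; the function name here is just for illustration:

```python
from datetime import datetime, timezone

def yyyymmdd_to_zulu(raw: str) -> str:
    # Parse the compact form, then render as ISO-8601 at the start of the day, UTC.
    parsed = datetime.strptime(raw, "%Y%m%d").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")

print(yyyymmdd_to_zulu("20201230"))  # 2020-12-30T00:00:00Z
```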
Additional Requirements
- The application should accept a command line option of -v for verbose mode.
- It should be possible to set debug level logging to 'Standard Error' with a range of 1-3, with 0 being no debugging, 1 being minimal debugging information and 3 being the maximum. -d 2 for example would be debug level 2.
- The named file should be supplied using the command line option of -f filename.txt - note that the file name can be a fully qualified path or a relative path to the current working directory (both absolute and relative paths must be supported). Clearly the file must be a file and must also be readable.
- On starting the application should print the 'process id' to 'Standard Error' if in verbose mode.
For runtime platforms that can fully support signals (MacOS, Unix and all Linux - but not Windows)
- It must be possible to alter the debug level while the program is running, this should be done by sending Signals to the running application. Specifically SIGUSR1 to increase debug level and SIGUSR2 to decrease the debug level.
- Finally it must be possible to terminate the processing early by sending a Signal to the running application (SIGABRT and SIGHUP). When the application is terminated in this way it must exit with an exit code of 1.
Please remember support for Signals varies significantly from operating system to operating system.
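To make the USR1/USR2 requirement concrete, here is a minimal Python sketch of the same idea (POSIX only; this is not EK9's Signals API). A mutable holder is used so the handler can adjust the level that the rest of the program sees:

```python
import os
import signal

debug_level = [1]  # single-element list so the handler can mutate the shared value

def adjust_debug(signum, frame):
    # SIGUSR1 raises the debug level (capped at 3), SIGUSR2 lowers it (floor of 0).
    if signum == signal.SIGUSR1 and debug_level[0] < 3:
        debug_level[0] += 1
    elif signum == signal.SIGUSR2 and debug_level[0] > 0:
        debug_level[0] -= 1

signal.signal(signal.SIGUSR1, adjust_debug)
signal.signal(signal.SIGUSR2, adjust_debug)

# Simulate an external `kill -USR1 <pid>` by signalling ourselves.
os.kill(os.getpid(), signal.SIGUSR1)
```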
Sample Inputs
There are two inputs, the 'Standard Input' and the 'Named File'. Examples shown below.
- #Standard Input as follows
- #id, firstName, lastName, dateOfBirth
- AB-900011, John, Doe, 19601220
- #An invalid example - missing first name
- AB-900012,, Tonks, 19631220
- #Named File Input as follows
- #id: emailAddress: dateCustomer: dateLastPurchase
- AB-900011: jdoe@example.com: 20201220: 20210101
- #An invalid example - dateCustomer not valid
- AB-900012: tonks@example.com: 2001220: 20210101
Desired Output
The desired output format is shown below (note dateCustomer is not required):
- #Created by S.Limb on 1 Jan 2021, 19:43:21
- #id, firstName, lastName, emailAddress, dateOfBirth, dateLastPurchase
- AB-900011, John, Doe, jdoe@example.com, 1960-12-20T00:00:00Z, 2021-01-01T00:00:00Z
The example above is fairly typical of a data migration processing application. There are always times when extracting data, cleaning it up, validating it and/or transforming it is needed. As stated before this could be done in any programming language, but below is a demonstration of how it can be accomplished with EK9.
How the solution is structured
The example below makes use of a number of constructs that are available in EK9. These are:
- Functions - abstract, dynamic and standard
- Records - to hold the structure to output
- Classes - to handle command line options and named file processing
- Program - to trigger the whole processing
- Stream pipelines - to complete the processing
Note that it does not define any:
- Constants
- Types
- Packages
- Traits
- Components
- Texts
- Applications
- Services
Just because EK9 has a rich set of constructs does not mean we have to use them all. Like tools in a tool box or ingredients for cooking - you won't always need to use everything. The example below is coded in a pragmatic blend of functional and object-oriented techniques.
Finally, the standard types listed at the top of this page will be utilised. The example is broken down into sections and each section has an explanation that discusses the design decisions. There are any number of different ways the application could have been designed, from single monolithic program, to just functional or just Object-Oriented (classes).
For example; we could have employed an application with the program and a component to hold the OptionsHandler and command line flag and another component to hold the NamedFileProcessor - those components could then have been injected into functions. This would have reduced the parameter passing and employed IOC (inversion of control). See Components And Applications for this approach to the same example problem.
As with most problems to be solved with software, there is a range of different solutions. These different architectural solutions tend to vary in nature from business area to business area and with the experience of the team members. A solution from a finance team would be different to one from a team with a telecommunications or military software background. Teams with strong mathematics or science backgrounds will produce solutions in one form and those with engineering backgrounds another. It is hoped that EK9 provides the constructs that allow various solution architectures to be employed.
The aim here is not to show the only one right way; as such a way does not exist.
Starting at the entry point into the main program:
The Main Program
#!ek9
defines module introduction

  defines program

    DataCorrelation()
      -> argv as List of String

      stdin <- Stdin()
      stdout <- Stdout()
      stderr <- Stderr()

      optionsHandler <- OptionsHandler()
      options <- optionsHandler.processCommandLine(argv)

      verboseMode <- options contains "-v"
      debugLevel <- optionsHandler.processDebugLevel(verboseMode, options.get("-d"))

      namedFileContents <- NamedFileProcessor(verboseMode, debugLevel).processNamedFile(options.get("-f"))

      if verboseMode
        stderr.println(`Process id is ${OS().pid()}`)

      if verboseMode or debugLevel > 0
        stderr.println(`Loaded ${length namedFileContents} records from named file`)

      setupSignalHandling(verboseMode, debugLevel)

      validLine <- createLineValidator(stderr, debugLevel)
      toEntry <- createStdinLineHandler(stderr, debugLevel)
      byMerging <- createCustomerRecordMerger(stderr, debugLevel, namedFileContents)

      if debugLevel > 1
        stderr.println("Ready to start processing Standard Input")

      outputHeader(verboseMode, debugLevel, stdout)

      cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout

      if debugLevel > 1
        stderr.println("Standard Input processing complete")
...
Incoming Parameters
If a program is defined that declares a single incoming parameter of type List of String then the set of command line arguments the user entered will be automatically populated in that object. In this case the name of that object is argv (C tradition). See programs on how EK9 can map command line arguments directly to typed parameters. But in this example we want full control of a range of flexible and optional arguments, and to explain the GetOpt class.
Stdin
To access the 'Standard Input' i.e. the content that is 'piped' into the application just declare a variable of type Stdin. You can use Stdin a bit like an iterator with hasNext() and next(), or as a source for EK9 Stream pipelines as you can see towards the end of the program above.
Stdout
'Standard Out' can again be accessed by declaring a variable of Stdout. You can then use methods like print and println to send String values to the 'Standard Out'. In the program above it is just used as a sink at the end of a Stream pipeline.
Stderr
'Standard Error' is almost the same as 'Standard Out' but just sends content to the error channel. In the example above you can see stderr is used quite widely to provide verbose and debug information.
The Developed Classes
OptionsHandler is a class that will be shown later; it just deals with handling the options from the command line. It produces a Dictionary of the options.
NamedFileProcessor is also a class; it processes the named file and produces a Dictionary of the CustomerRecords (a record shown later).
OS - Operating System
To obtain the process id of the program to print to 'Standard Error'; the provided class OS and method pid() is used.
Standard Functions
setupSignalHandling is a standard function that registers the signal handlers. The implementation of which is shown later. The other standard function outputHeader just outputs the header (commented csv) - the implementation is also shown later.
Higher Order Functions
There are a number of higher order functions that are used to create the functions validLine, toEntry, byMerging that are used in the Stream pipeline that processes stdin; see below.
Main Processing Pipeline
The pipeline below is the main driver of processing for the program. There are alternative ways to implement this functionality using for/while loops. But EK9 provides the Stream pipeline construct to be able to join processing steps together in a readable, reusable and testable manner.
For some this might be too 'functional', but having used it for a while; it seems to bring clarity to stages of processing that seem to get lost in lots of 'nested loops'. The decision logic that was nested in a loop can be pulled out and used separately (and tested in isolation).
Additionally, it is the same pattern of development everywhere (if you use it). To start with it looks a little strange (unless you are from a Unix shell background - then it looks familiar); but after a short while it becomes quite natural to start thinking in the pipeline way.
- cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout
Cat
The single line above encapsulates the stages of processing the incoming data in an abstract but readable way. cat is the command to catenate some type of collection or iterator. In this case stdin is used as a collection of Strings.
Filter
The first stage of the processing is to filter out any blank lines or lines that start with #. The higher order function createLineValidator creates the validator function (and is shown later). For now we can just use this in an abstract manner (i.e. just accept for now that it only allows valid lines through).
Map
The next stage is to accept the incoming String and map it to a DictEntry of (String, CustomerRecord). The higher order function createStdinLineHandler returns a function that is an extension of the abstract function lineToCustomerRecord. This function signature accepts a String and returns a DictEntry of (String, CustomerRecord). The implementation of createStdinLineHandler is shown later; for now accept it is capable of converting a String to a dictionary entry that has a String as key and a value that is a CustomerRecord (shown later).
Filter
The previous stage will have attempted to convert a String into a valid DictEntry of (String, CustomerRecord) but there might have been something wrong with the data. The requirement was to continue processing but not output invalid data. So this next stage filters out invalid DictEntry of (String, CustomerRecord)s.
Map
The penultimate stage is to map by merging the DictEntry of (String, CustomerRecord) from stdin and those correlated entries from the named file and then produce a valid output String. To accomplish this a mapping function is made by the higher order function createCustomerRecordMerger.
Collect
The final stage is just to send the String created to stdout. If the String is not valid (i.e. not set) then Stdout will just ignore the String and it won't be output.
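The five pipeline stages just described can be sketched as chained Python generators for illustration (the function and variable names here mirror the EK9 example but the record layout is a simplified assumption; None stands in for an 'unset' value):

```python
def valid_line(line):          # filter by validLine
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")

def to_entry(line):            # map toEntry: (id, fields) or None on bad data
    parts = [p.strip() for p in line.split(",")]
    return (parts[0], parts[1:]) if len(parts) == 4 and all(parts) else None

def by_merging(entry, named):  # map byMerging: correlate against the named-file dict
    ident, fields = entry
    return ", ".join([ident] + fields + named[ident]) if ident in named else None

named_file = {"AB-900011": ["jdoe@example.com"]}
stdin_lines = ["#header", "AB-900011, John, Doe, 19601220", "AB-900012,, Tonks, 19631220"]

# cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout
stage1 = (l for l in stdin_lines if valid_line(l))
stage2 = (to_entry(l) for l in stage1)
stage3 = (e for e in stage2 if e is not None)
output = [s for s in (by_merging(e, named_file) for e in stage3) if s is not None]
```

Note how each stage is a small, independently testable step, which is the point the pipeline style is making.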
Discussion
So that's the main part of the program. Why was this approach taken?
- Command line parameters are required - so argv as a List of String is needed
- stdin, stdout and stderr are needed throughout - so declare up front
- Dealing with the command-line parameters is an ancillary operation - so it is wrapped up into a class
- Named file processing is a defined operation - so it too is wrapped up into a class
- Signal handling is important but can be completed in a single function - so use a standard function
- As the main processing will be done via a pipeline all the functionality must be in functions
- As the functionality in each pipeline function is significant it needs to be encapsulated in a higher order function.
- Some output strings have been interpolated but could have been pulled out to Text constructs.
As stated before there are different ways to code the requirements. Indeed, first time through only a program was used, then bigger blocks of code were pulled out of the program into functions. Then those functions were broken down, creating smaller functions. So all functions and no classes at all.
But when those functions were broken down further, it became simpler to pull those functions into classes as methods. These classes are shown later and many of the methods are hidden as private. It is possible to go further and encapsulate some of the classes/data into components.
The next phase would be to pull all the output Strings into interpolated Strings and then refactor them into Text blocks. This would then enable the application to be ported to different spoken languages.
Development Process
There is an important point here; 'Just do something, start writing code'. This may be (probably is) counter to what you have been taught (sit there for days designing and procrastinating - do some UML, PowerPoint decks etc).
Ideally I'd like to have shown the example driver code (Test Cases) in this example - but it would be too large.
But, just do bits you know need doing, come at it from different angles and points of view - then worry about how it will all fit later. Don't be afraid to delete stuff, don't get emotionally attached to code (strangely don't invest too much effort too early). Then when you see complexity or potential reuse; refactor, 'pull apart', encapsulate, use abstractions. In general move 'stuff' about or delete it. Then the solution will drop into place (and you will have enjoyed it).
There is a reason EK9 has so many constructs, use them when you 'feel' this time is right (if time is never right; then don't use them!). Fluidity and freedom to relocate processing is why EK9 has evolved in the way it has.
Development with EK9 has been designed to be enjoyable and fluid.
The Functions
...
  defines function

    lineValidator() as pure abstract
      -> line as String
      <- rtn as Boolean?

    lineToCustomerRecord() as pure abstract
      -> line as String
      <- rtn as DictEntry of (String, CustomerRecord)

    customerRecordMerger() as pure abstract
      -> entry as DictEntry of (String, CustomerRecord)
      <- rtn as String?

    createLineValidator() as pure
      ->
        stderr as Stderr
        debugLevel as Integer
      <-
        validator as lineValidator

      validator: (stderr, debugLevel) is lineValidator as pure function
        rtn: line is not empty and #<line != '#'
        if debugLevel > 1 and not rtn
          stderr.println("Discarding [" + line + "]")

    createStdinLineHandler() as pure
      ->
        stderr as Stderr
        debugLevel as Integer
      <-
        processor as lineToCustomerRecord

      processor: (stderr, debugLevel) is lineToCustomerRecord as pure function
        rtn: DictEntry()
        splitLine <- line.split(/,/)
        if debugLevel == 3
          stderr.println("About to split stdin line [" + line + "]")
        if length of splitLine == 4
          id <- splitLine.get(0).trim()
          firstname <- splitLine.get(1).trim()
          lastname <- splitLine.get(2).trim()
          dobStr <- splitLine.get(3).trim()
          if id is not empty and firstname is not empty and lastname is not empty and dobStr is not empty
            dob <- $getAsDateTime(id, false, dobStr)
            if dob?
              entryValue <- CustomerRecord(id, firstname, lastname, String(), dob, String())
              if debugLevel == 3
                stderr.println("Line [" + line + "] processed")
              rtn: DictEntry(id, entryValue)
        if not rtn?
          stderr.println("Invalid line [" + line + "]")

    getAsDateTime() as pure
      ->
        id as String
        throwException as Boolean
        aDateInput as String //expecting YYYYMMDD
      <-
        rtn as DateTime: DateTime()

      group <- aDateInput.group(/(\d{4})(\d{2})(\d{2})/)
      if length group == 3
        dateStr <- cat group | join with dashSeparated | collect as String
        rtn: DateTime(Date(dateStr))
      else
        if throwException
          throw Exception("Id [" + id + "] Invalid date [" + aDateInput + "]", 2)
        else
          Stderr().println("Id [" + id + "] Invalid date [" + aDateInput + "]")

    validEntry() as pure
      -> entry as DictEntry of (String, CustomerRecord)
      <- rtn as Boolean: entry?

    dashSeparated() as pure
      ->
        firstPart String
        secondPart String
      <- rtn as String: firstPart? and secondPart? <- firstPart + "-" + secondPart : String()

    outputFormatSeparated() as pure
      ->
        firstPart String
        secondPart String
      <- rtn as String: firstPart? and secondPart? <- firstPart + ", " + secondPart : String()

    setupSignalHandling()
      ->
        verboseMode as Boolean
        debugLevel as Integer

      terminationHandler <- (verboseMode, debugLevel) of SignalHandler
        override handleSignal()
          -> signal as String
          <- rtn as Integer: 1 //Process will exit with code of one
          if verboseMode or debugLevel > 0
            Stderr().println("Handled Terminal Signal [" + signal + "]")

      terminations <- Signals().register(Strings("HUP", "ABRT"), terminationHandler)
      if terminations not contains "HUP"
        Stderr().println("HUP Signal not supported")
      if terminations not contains "ABRT"
        Stderr().println("ABRT Signal not supported")

      debugHandler <- (verboseMode, debugLevel) of SignalHandler
        override handleSignal()
          -> signal as String
          <- rtn as Integer: Integer() //Note the Integer is not set so process will not terminate
          if signal == "USR1" and debugLevel < 3
            debugLevel++
          else if signal == "USR2" and debugLevel > 0
            debugLevel--
          if verboseMode or debugLevel > 0
            Stderr().println("Handled Info Signal [" + signal + "] Debug Level now [" + $debugLevel + "]")

      debugs <- Signals().register(Strings("USR1", "USR2"), debugHandler)
      if debugs not contains "USR1"
        Stderr().println("USR1 Signal not supported")
      if debugs not contains "USR2"
        Stderr().println("USR2 Signal not supported")

    outputHeader()
      ->
        verboseMode as Boolean
        debugLevel as Integer
        stdout as Stdout

      enGB <- Locale("en_GB")
      if verboseMode or debugLevel > 2
        Stderr().println("About to create output header with locale " + $enGB)
      envVars <- EnvVars()
      user <- envVars contains "USER" <- envVars.get("USER") else envVars.get("USERNAME")
      dateTime <- enGB.mediumFormat(SystemClock().dateTime())
      stdout.println("#Created by " + user + " on " + dateTime)
      stdout.println("#id, firstName, lastName, emailAddress, dateOfBirth, dateLastPurchase")

    createCustomerRecordMerger()
      ->
        stderr as Stderr
        debugLevel as Integer
        namedFileContents as Dict of (String, CustomerRecord)
      <-
        merge as customerRecordMerger

      merge: (stderr, debugLevel, namedFileContents) is customerRecordMerger
        rtn: String() //Output will be unset by default and therefore not output
        id <- entry.getKey()
        if debugLevel == 3
          stderr.println("Merging id [" + id + "]")
        if namedFileContents contains id
          stdinEntry <- entry.getValue()
          namedFileEntry <- namedFileContents.get(id).get()
          stdinEntry :~: namedFileEntry
          rtn := $stdinEntry
        else
          stderr.println("Not merging [" + id + "] as named file does not contain id")
...
Abstract Functions
Well that's quite a few functions; the first three are just function signatures to be used by dynamic functions. createLineValidator is the first of the higher order functions; it just creates and returns a dynamic function validator which is of type lineValidator. As you can see it just checks whether the line is empty or starts with #.
createStdinLineHandler is really the main higher order function; it creates the function that parses each incoming line from stdin, validates the content and creates a CustomerRecord that is added into a DictEntry (which can in turn be further processed). Note that if processing fails an unset DictEntry is returned and a note of the line that failed is output to stderr.
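The 'fail soft' behaviour described above (return an unset value, note the bad line on stderr, keep processing) can be sketched in Python; the record layout and function name are illustrative assumptions, with None playing the role of the unset DictEntry:

```python
import sys
from datetime import datetime

def stdin_line_to_record(line):
    # Split, validate, convert the date; on failure report to stderr and
    # return None (the analogue of an unset DictEntry).
    parts = [p.strip() for p in line.split(",")]
    if len(parts) == 4 and all(parts):
        try:
            dob = datetime.strptime(parts[3], "%Y%m%d")
        except ValueError:
            dob = None
        if dob is not None:
            ident, first, last = parts[0], parts[1], parts[2]
            return ident, {"firstName": first, "lastName": last,
                           "dateOfBirth": dob.strftime("%Y-%m-%dT%H:%M:%SZ")}
    print(f"Invalid line [{line}]", file=sys.stderr)
    return None

good = stdin_line_to_record("AB-900011, John, Doe, 19601220")
bad = stdin_line_to_record("AB-900012,, Tonks, 19631220")  # missing first name
```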
EnvVars
The function outputHeader deals with outputting the header comment and uses a class as shown below. Environment variables are not standard across platforms and can be altered and manipulated before your program runs (this can be both good and bad). Note that EnvVars looks and behaves much like a dictionary in many ways (indeed you can call keys() to get the names of all the entries).
- envVars ← EnvVars()
- user ← envVars contains "USER" ← envVars.get("USER") else envVars.get("USERNAME")
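The same USER/USERNAME fallback can be sketched in Python using the standard os.environ mapping (the helper name is illustrative; passing the environment as a parameter just makes the example testable):

```python
import os

def current_user(env=os.environ):
    # USER on Unix-like systems, USERNAME on Windows; fall back accordingly.
    return env["USER"] if "USER" in env else env.get("USERNAME", "unknown")

print(current_user({"USER": "slimb"}))
print(current_user({"USERNAME": "slimb"}))
```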
Most of the rest of the functions are fairly obvious, but setupSignalHandling and createCustomerRecordMerger need more explanation.
setupSignalHandling
This function sets up two SignalHandlers. The first is for terminating the application when signals HUP/ABRT are received; as that handler returns an Integer value of 1 it will cause the whole program to exit with a code of 1. Remember signal handling is a function of the platform you are running on (Windows has limited support for this).
Signal Handler
The second SignalHandler is for the non-terminating signals USR1/USR2; as you can see, they increment and decrement the debugLevel.
Now the observant will be asking how can altering the debugLevel like this actually affect the whole application. If you think of debugLevel as a primitive type this approach just won't work. But EK9 only has Objects, so from the main program line:
- debugLevel ← optionsHandler.processDebugLevel(...)
- //And is passed in by reference to the function below
- setupSignalHandling(verboseMode, debugLevel)
- //The debugLevel is further 'captured' by the dynamic class implementation of SignalHandler
- debugHandler ← (verboseMode, debugLevel) of SignalHandler
Because debugLevel is an Object it is passed by reference everywhere. The debugHandler method handleSignal() has 'captured' debugLevel, so all references actually point to the same object; hence debugLevel++ and debugLevel-- will operate on the same value.
This is very unlike an int primitive; in addition, we are allowing modification of data in various functions (not very pure).
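The shared-reference behaviour can be illustrated in Python. Python integers are immutable (unlike EK9's Integer objects), so this sketch uses a small hypothetical wrapper class to get the same effect of a captured, mutable value:

```python
class Level:
    # A tiny mutable holder: everything that captures the same instance
    # sees updates, analogous to an EK9 Integer captured by reference.
    def __init__(self, value):
        self.value = value

def make_handler(level):
    def handle():
        level.value += 1   # mutates the shared object, not a local copy
    return handle

debug_level = Level(0)
handler = make_handler(debug_level)
handler()
handler()
# debug_level.value is now 2: the closure and the caller share one object.
```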
createCustomerRecordMerger
This is the second significant higher order function. It creates the merge function that looks up and correlates ids and merges the two CustomerRecords; one from 'Standard Input' and the other from the 'Named File', both of which are partial. The merge is done via the operator :~:, the conversion to a String is done by the $ operator - both on the CustomerRecord record (shown later).
- //The Merge
- stdinEntry :~: namedFileEntry
- //The Conversion to a String
- rtn := $stdinEntry
The namedFileContents object (which is the dictionary of the 'named file' contents) is also 'captured' from the main program down through the higher order function createCustomerRecordMerger and into the dynamic function merge as customerRecordMerger.
Clearly if you had several applications with a need for dealing with comma separated or colon separated data, you would pull the appropriate abstract and concrete functions into a separate utility module so they could be reused.
The same could be said for the signal handling; ideally all your applications would use the same signal processing mechanism and so could reuse the setupSignalHandling function.
By creating a range of reusable and small functions you can build an internal library of small reliable and reusable software 'chunks'.
The Classes
There are only two classes used in this implementation and OptionsHandler is shown below. This class has the responsibility of processing the command line options that the user entered when running the application.
...
  defines class

    OptionsHandler

      processCommandLine() as pure
        -> arguments as List of String
        <- options as Dict of (String, String): Dict()

        getopts <- setupGetOpt(":")
        options := getopts.options(arguments)

      private setupGetOpt() as pure
        -> rqParam as String
        <- getopts as GetOpt of String?

        supportedOptions <- setupSupportedOptions(rqParam)
        usage <- setupUsage()
        getopts: GetOpt(String()).make(rqParam, supportedOptions, usage)

      private setupSupportedOptions() as pure
        -> rqParam as String
        <- rtn as Dict of (String, String): Dict()

        supportedOptions <- {
          "-v": String(),
          "-f": rqParam,
          "-d": rqParam
          }
        rtn: supportedOptions

      private setupUsage() as pure
        <- rtn as String: "Invalid option, only those listed below are supported:\n"
        rtn += "-v, verbose\n"
        rtn += "-f filename, use file of filename (mandatory option)\n"
        rtn += "-d level, use of debugging"

      processDebugLevel()
        ->
          verboseMode as Boolean
          level as Optional of String
        <- rtn as Integer: 0 //default to zero

        debugLevel <- cat level | collect as Integer
        rtn := debugLevel in 0 ... 3 <- debugLevel else 0
        if verboseMode and debugLevel not in 0 ... 3
          Stderr().println("Debug level " + $debugLevel + " not supported - reverting to debug level '0'")
...
GetOpt
The processCommandLine method on the class OptionsHandler uses private methods to set up and configure the GetOpt generic class. Each of the private methods performs a specific task: configuring the supported options and configuring the usage/help text.
The final public method, processDebugLevel, is called from the main program to access the debug level option the user entered on the command line (if it was entered). The ternary operator tests the result of cat level | collect as Integer (it may not even be set, or it could be less than zero or greater than 3) and ensures the result is between 0 and 3.
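For comparison, the same option handling (a -v flag, -f and -d with required arguments, and the debug level clamped to 0-3) can be sketched with Python's standard getopt module; the parse_options helper is just for this illustration:

```python
import getopt

def parse_options(argv):
    # "vf:d:" means: -v is a flag, -f and -d each require an argument.
    opts, _ = getopt.getopt(argv, "vf:d:")
    options = dict(opts)
    try:
        level = int(options.get("-d", "0"))
    except ValueError:
        level = 0
    debug_level = level if 0 <= level <= 3 else 0   # revert out-of-range values
    return options, debug_level

options, debug = parse_options(["-v", "-f", "customers.txt", "-d", "2"])
```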
TextFile
The second class, NamedFileProcessor, has the single responsibility of loading the data from the 'named file' into a Dict of (String, CustomerRecord). But it is much stricter about errors.
...
    NamedFileProcessor

      verboseMode as Boolean: false
      debugLevel as Integer: Integer()

      NamedFileProcessor()
        ->
          verboseMode as Boolean
          debugLevel as Integer
        this.verboseMode = verboseMode
        this.debugLevel = debugLevel

      processNamedFile()
        -> filename as Optional of String
        <- rtn as Dict of (String, CustomerRecord): Dict()

        stderr <- Stderr()
        if not filename?
          throw Exception("Filename of 'named file' is required", 2)
        namedFile <- getNamedFile(filename.get())
        validLine <- createLineValidator(stderr, debugLevel)
        toEntry <- createNamedFileLineHandler(stderr, debugLevel)
        if verboseMode or debugLevel > 0
          stderr.println("About to start processing [" + $namedFile + "]")
        cat namedFile | filter by validLine | map toEntry > rtn
        if debugLevel > 1
          stderr.println("Processing [" + $namedFile + "] complete")

      private getNamedFile()
        -> filename as String
        <- rtn as TextFile: TextFile()

        filePath <- FileSystemPath(filename)
        rtn := filePath.isAbsolute() <- TextFile(filePath) else TextFile(FileSystem().cwd() + filePath)
        if not rtn.isReadable()
          throw Exception($rtn + " is not readable", 2)
        if not rtn.isFile()
          throw Exception($rtn + " is not a file", 2)

      private createNamedFileLineHandler() as pure
        ->
          stderr as Stderr
          debugLevel as Integer
        <- processor as lineToCustomerRecord?

        processor: (stderr, debugLevel) is lineToCustomerRecord as pure function
          //The regular expression /:/ is significant; should it also be a constant?
          splitLine <- line.split(/:/)
          if debugLevel == 3
            stderr.println("About to split line [" + line + "]")
          if length of splitLine != 4
            throw Exception("Invalid line [" + line + "]", 2)
          id <- splitLine.get(0).trim()
          if id is empty
            throw Exception("Invalid line [" + line + "] empty ID", 2)
          emailAddress <- splitLine.get(1).trim()
          //A significant regular expression buried deep in processing! Maybe use a constant.
          if emailAddress not matches /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$/
            throw Exception("Id [" + id + "] Invalid email address [" + emailAddress + "]", 2)
          dateLastPurchase <- $getAsDateTime(id, true, splitLine.get(3).trim())
          entryValue <- CustomerRecord(id, String(), String(), emailAddress, String(), dateLastPurchase)
          if debugLevel == 3
            stderr.println("Named file line [" + line + "] processed")
          rtn: DictEntry(id, entryValue)
...
Firstly, in the public method processNamedFile(), a check is made to ensure a file name has been provided; an Exception with exit code 2 is thrown if not. The next step is to get a TextFile using the method getNamedFile(). Then a couple of pipeline functions are created and the main pipeline processing is started.
private getNamedFile()
This private method first creates a FileSystemPath from the String filename. This is followed by a ternary operation to determine whether the filename is a relative or an absolute path to a file. There is then a check on whether the file is readable and actually is a file. Finally the TextFile is returned.
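The same resolve-then-validate logic can be sketched in Python (a hedged analogy, not EK9; the function name and use of ValueError are illustrative - the EK9 code throws an Exception carrying exit code 2):

```python
import os

def get_named_file(filename: str) -> str:
    # Absolute paths are used as-is; relative ones are resolved against the
    # current working directory (the EK9 code uses FileSystem().cwd() for this)
    path = filename if os.path.isabs(filename) else os.path.join(os.getcwd(), filename)
    if not os.access(path, os.R_OK):
        raise ValueError(path + " is not readable")   # EK9: Exception(..., 2)
    if not os.path.isfile(path):
        raise ValueError(path + " is not a file")     # EK9: Exception(..., 2)
    return path
```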
Current Working Directory
When processing the path it is important to get the user's current working directory; this is done by using the FileSystem class.
createLineValidator
This high order function has already been covered.
createNamedFileLineHandler
This is the high order function/method that creates the function that accepts each line and splits it on the ':' separator. It then does some checks on the data and throws Exceptions if any of the data is invalid. It uses the standard function getAsDateTime to process the date.
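The split/validate/build-entry shape of that handler can be sketched as a Python higher-order function (an analogy only; the names and the dict used in place of CustomerRecord are illustrative, though the email regular expression is the one from the EK9 source):

```python
import re

# The EK9 code inlines this expression; hoisting it to a constant is the
# improvement the source comments suggest
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")

def create_named_file_line_handler(debug_level: int = 0):
    """Returns a handler that turns one 'named file' line into an (id, record) entry."""
    def to_entry(line: str):
        parts = [p.strip() for p in line.split(":")]
        if len(parts) != 4:
            raise ValueError(f"Invalid line [{line}]")
        id_, email = parts[0], parts[1]
        if not id_:
            raise ValueError(f"Invalid line [{line}] empty ID")
        if not EMAIL_RE.search(email):
            raise ValueError(f"Id [{id_}] Invalid email address [{email}]")
        # Only a partial record: the other fields arrive from stdin and are merged later
        return id_, {"email": email, "lastPurchase": parts[3]}
    return to_entry
```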
Processing Pipeline
As you can see from the processing pipeline below, its structure and pattern are almost the same as the main processing pipeline in the main program. In fact it uses the same standard high order function createLineValidator to create the function to validate lines. The main difference is in the final part of the processing, which just collects the DictEntry into the return variable rtn.
- cat namedFile | filter by validLine | map toEntry > rtn
By employing functions rather than just classes it is possible to create much more reuse, as shown with createLineValidator and the abstract function signatures. This also reduces the need for excessive class hierarchies and inheritance. Basically, the smaller and tighter the 'chunk' of code, and the fewer side effects it has, the more reusable and reliable the code is.
As you can see, the same pattern of processing can be applied over and over again, and the pipeline is simpler to understand and read than nested loops. The final part of the overall program is the data item at the heart of the pipeline merging and output - the record.
The Record
The CustomerRecord is mainly used as a data object, but does have a couple of key operators for processing the data it holds.
...
defines record CustomerRecord
  id <- String()
  firstName <- String()
  lastName <- String()
  email <- String()
  dob <- String()
  lastPurchase <- String()

  operator ? as pure
    <- rtn as Boolean: id?

  operator :~:
    -> arg as CustomerRecord
    <- rtn as CustomerRecord: this
    id :=? String(arg.id)
    firstName :=? String(arg.firstName)
    lastName :=? String(arg.lastName)
    email :=? String(arg.email)
    dob :=? String(arg.dob)
    lastPurchase :=? String(arg.lastPurchase)

  operator $ as pure
    <- rtn as String: cat [id, firstName, lastName, email, dob, lastPurchase]
      | join with outputFormatSeparated
      | collect as String
//EOF
The data this record holds is pretty obvious as it is the output data that must be converted to a single comma separated String.
operator ?
The 'is set' operator is used to check whether the record is valid for filtering and processing.
operator :~:
The merge operator is used to 'merge' the stdin partial record and the named file partial record into one fully populated record. The assignment coalescing operator is used in the merge; this only assigns the property/field if it is currently unset, i.e. it coalesces the field and the argument field.
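The assign-only-if-unset semantics can be sketched in Python (a hedged analogy: None stands in for EK9's 'unset' state, and the field names are a cut-down version of the record):

```python
from dataclasses import dataclass, fields

@dataclass
class CustomerRecord:
    id: str = None
    first_name: str = None
    email: str = None

    def merge(self, other: "CustomerRecord") -> "CustomerRecord":
        """Analogue of EK9's :~: merge built on :=? (assign only when unset)."""
        for f in fields(self):
            if getattr(self, f.name) is None:                  # field currently unset...
                setattr(self, f.name, getattr(other, f.name))  # ...so take the argument's value
        return self
```

Fields that already hold a value are left untouched, which is exactly why merging two partial records produces one fully populated record.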
operator $
The string conversion operator takes each of the properties/fields on the record and places them in a List of Strings; this is then used in another pipeline to join the fields with a 'comma' and collect them into the rtn String. This could be written long hand like this.
- fields ← [id, firstName, lastName, email, dob, lastPurchase]
- cat fields | join with outputFormatSeparated > rtn
This final pipeline could have been accomplished by just using the + operator on each of the fields as shown below.
- rtn := id + ", " + firstName + ", " + lastName + ", " + email + ", " + dob + ", " + lastPurchase
The alternative would be to use String interpolation.
- rtn := `${id}, ${firstName}, ${lastName}, ${email}, ${dob}, ${lastPurchase}`
While the code above is slightly shorter, it is a bit more intricate. If we needed to add more fields or change the order of the fields it is much easier and more reliable to just use the List approach. The other advantage is that if the output format needs to be altered, it has been encapsulated within a single function, outputFormatSeparated.
Summary
This has been a long section, but it has shown a more concrete example of how EK9 can be used. There is little doubt the functionality could have been written in a single program without any other constructs. The code would be (and was) much shorter, but decision logic was mixed in with format logic. There was also quite a bit of repetition and duplication.
You might argue the above example code is over engineered; depending on your background and experience you may or may not be right. When developing a solution to the problem set out above, you have to find the right balance between sound engineering and bloat/complexity or just plain 'dirty' code.
Those from a functional background would probably argue it is under engineered: it should have made much more use of pure functions and immutability, and has missed the opportunity to make it more robust.
Justification of Approach
To provide some justification for the approach above, it should be obvious that the following are all encapsulated:
- Command Line processing
- Named file processing
- Signal handling and setup
- Customer record merging and output formatting
There are also several improvements that could still be made:
- pure could have been used in many of the functions/methods for immutability
- Strings could have been refactored out to text constructs to enable spoken language portability
- String interpolation could have been used more than it has
- constants could be used for values like the splitting on colon/comma or checking email addresses
So if there were defects or a need to improve/augment the software it would be really obvious where to look in the code. Moreover by breaking a monolithic program into functions and classes it is now possible to make those pure and also refactor them out to separate modules for reuse.
It should also be clear that what can be re-used has been re-used:
- createLineValidator
- getAsDateTime
- CustomerRecord
Again if new fields were required or a variation on date formats needed to be accepted it would be obvious where to look.
Common design patterns are used and re-used:
- cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout
- dateStr ← cat group | join with dashSeparated | collect as String
- debugLevel ← cat level | collect as Integer
- cat namedFile | filter by validLine | map toEntry > rtn
- cat fields | join with outputFormatSeparated > rtn
You may or may not like the Stream pipeline approach, but if you needed to alter the processing it would again be simple to look at where and how to alter that processing - rather than looking in deep nested loops.
Sample of the Looping approach
As a contrast; the 'named file processing' could have been implemented directly in the program in the following way. For some developers this may feel more natural (especially if you are from a C or Python background). Personally I find it distracting to have all that low level utility code high up in the main part of the program.
#!ek9
defines module introduction

  defines program

    DataCorrelation()
      -> argv as Strings
      stdin <- Stdin()
      stdout <- Stdout()
      stderr <- Stderr()

      optionsHandler <- OptionsHandler()
      options <- optionsHandler.processCommandLine(argv)
      verboseMode <- options contains "-v"
      debugLevel <- optionsHandler.processDebugLevel(verboseMode, options.get("-d"))

      //So lets comment out the use of the class and method for processing
      //namedFileContents <- NamedFileProcessor(verboseMode, debugLevel).processNamedFile(options.get("-f"))
      //Now inline all the code that is needed
      filename <- options.get("-f")
      if not filename?
        throw Exception("Filename of 'named file' is required", 2)

      filePath <- FileSystemPath(filename.get())
      namedFile <- filePath.isAbsolute() <- TextFile(filePath) else TextFile(FileSystem().cwd() + filePath)
      if not namedFile.isReadable()
        throw Exception($namedFile + " is not readable", 2)
      if not namedFile.isFile()
        throw Exception($namedFile + " is not a file", 2)
      if verboseMode or debugLevel > 0
        stderr.println("About to start processing [" + $namedFile + "]")

      namedFileContents as Dict of (String, CustomerRecord): Dict()
      try
        -> input <- namedFile.input()
        while input?
          line <- input.next()
          if line is not empty and #<line != '#'
            splitLine <- line.split(/:/)
            if debugLevel == 3
              stderr.println("About to split line [" + line + "]")
            if length of splitLine != 4
              throw Exception("Invalid line [" + line + "]", 2)
            id <- splitLine.get(0).trim()
            if id is empty
              throw Exception("Invalid line [" + line + "] empty ID", 2)
            emailAddress <- splitLine.get(1).trim()
            if emailAddress not matches /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$/
              throw Exception("Id [" + id + "] Invalid email address [" + emailAddress + "]", 2)
            dateLastPurchase <- $getAsDateTime(id, true, splitLine.get(3).trim())
            entryValue <- CustomerRecord(id, String(), String(), emailAddress, String(), dateLastPurchase)
            if debugLevel == 3
              stderr.println("Named file line [" + line + "] processed")
            namedFileContents += DictEntry(id, entryValue)
          else
            if debugLevel > 1
              stderr.println("Discarding [" + line + "]")
      //End of inlining the code

      if verboseMode
        stderr.println("Process id is [" + $OS().pid() + "]")
      if verboseMode or debugLevel > 0
        stderr.println("Loaded " + $ length namedFileContents + " records from named file")

      setupSignalHandling(verboseMode, debugLevel)

      validLine <- createLineValidator(stderr, debugLevel)
      toEntry <- createStdinLineHandler(stderr, debugLevel)
      byMerging <- createCustomerRecordMerger(stderr, debugLevel, namedFileContents)

      if debugLevel > 1
        stderr.println("Ready to start processing Standard Input")
      outputHeader(verboseMode, debugLevel, stdout)

      cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout

      if debugLevel > 1
        stderr.println("Standard Input processing complete")
...
Now I feel the need to explain the code with comments in the code, whereas before, breaking it down into classes and functions gave me the opportunity to create a meaningful name to encapsulate each bit of functionality.
As an aside, the above does demonstrate the alternative loop and 'try with resource' approach to handling resources like the TextFile, as shown below.
- try
- → input ← namedFile.input()
- while input?
- line ← input.next()
By creating the 'input' within the incoming try parameter using 'namedFile.input()', the try block will automatically close the input once processing is finished (much like Java does). The input is then used like an iterator to get the next line. For many this will be a very familiar concept (though the syntax is different).
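Python's `with` statement gives the same close-on-exit guarantee, so the EK9 try/while shape can be sketched like this (an analogy only; the function name and the returned list of lines are illustrative):

```python
def load_lines(path):
    """Read a file line by line, skipping blanks and comment lines.

    The 'with' block plays the role of EK9's try with an incoming resource:
    the file is closed automatically once the block is finished.
    """
    entries = []
    with open(path) as input_file:        # resource acquired here, released on exit
        for line in input_file:           # iterate line by line, like input.next()
            line = line.rstrip("\n")
            if line and not line.startswith("#"):   # skip blanks and '#' comments
                entries.append(line)
    return entries
```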
Interestingly (having written the same functionality in two different ways) you can see patterns of how to convert the above 'procedural code' into a more 'functional pipeline' (should you want to). Note the check for the line content (shown below).
- if line is not empty and #< line != '#'
If there is no meaningful 'else' (other than the output to stderr) then it can be pulled into a filter function! You can then see that the next bit of the code above really just takes the incoming String and processes and validates its parts before making a partial DictEntry of (String, CustomerRecord). It looks like it maps one type of content into another type - so a map function.
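That refactoring from procedural loop to filter/map pipeline can be sketched directly with Python's built-in filter and map (a hedged analogy of `cat namedFile | filter by validLine | map toEntry > rtn`; the sample data and the two-field entry are illustrative):

```python
def is_valid_line(line: str) -> bool:
    # The loop's 'if line is not empty and first char != #' becomes a filter predicate
    return bool(line) and not line.startswith("#")

def to_entry(line: str):
    # The body of the loop becomes a map function: one line in, one (key, value) out
    parts = [p.strip() for p in line.split(":")]
    return parts[0], parts[1]

lines = ["# comment", "", "id1:a@b.com:x:y", "id2:c@d.com:x:y"]

# cat lines | filter by is_valid_line | map to_entry > entries
entries = dict(map(to_entry, filter(is_valid_line, lines)))
```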
An additional class is shown below (not used in the example above) that is useful in a range of applications.
Mutex Lock and Key
Really only to be used as a matter of necessity, where the application has multiple concurrent Threads running and read/write data must be shared between the threads. Typically this is the case where async is used in pipelines, TCP, UDP or other HTTP server type constructs. Here be Dragons - as they say.
Below is a simple example of the syntax and design pattern to be used with MutexLocks.
#!ek9
defines module introduction

  defines function

    createProtectedName()
      <- lock as MutexLock of String: MutexLock("Steve")

  defines program

    LockExample()
      stdout <- Stdout()
      lockableItem <- createProtectedName()

      accessKey <- (stdout) extends MutexKey of String as function
        stdout.println("Accessing [" + $value + "]")
        //Now update the value - by copying data into it.
        value :=: "Stephen"

      //Now try access via key and wait on mutex
      lockableItem.enter(accessKey)

      //As there are no other threads, access will be granted here.
      //If another thread held the lock this would return false and key.access() would not be called.
      assert lockableItem.tryEnter(accessKey)
//EOF
While on the surface the code above looks fairly straightforward, multi-threaded access to a single data structure is fraught with difficulties: race conditions and even deadlock. Avoid it if at all possible. But sometimes you cannot, and that is what the Mutex lock/key are for.
I'll say it one last time - rack your brains for a solution that does not involve concurrent access to data structures before going down that solution path.
The EK9 MutexLock gives you the following characteristics/conditions:
- The data structure you are protecting is associated with the lock itself
- The data held in the lock can only be accessed via get() when the lock is held
- Block and wait to gain access enter(key) or tryEnter(key) if you don't want to wait
- Check if the lock is owned so that you can access the data
- There is a very specific defined scope where you have access to the protected data
- Do the minimum amount of processing in the access() method
- Avoid calling out of process or long running operations in the enter() method
- The lock is guaranteed to be released at the end of the enter() method
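The characteristics above - data bound to the lock, access only via a key/callback, blocking enter versus non-blocking tryEnter, guaranteed release - can be sketched in Python with threading.Lock (a hedged analogy of the EK9 design; the class and method names mirror the EK9 ones but this is not the real implementation):

```python
import threading

class MutexLock:
    """Sketch of EK9's MutexLock: the protected value is only reachable
    inside a callback, and only while the lock is held."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def enter(self, access):
        """Block until the lock is acquired, then run the access callback."""
        with self._lock:                        # release is guaranteed on exit
            self._value = access(self._value)   # callback may replace the value

    def try_enter(self, access) -> bool:
        """Run the callback only if the lock is free; never waits."""
        if not self._lock.acquire(blocking=False):
            return False                        # lock held elsewhere, no access
        try:
            self._value = access(self._value)
            return True
        finally:
            self._lock.release()
```

The key point of the design is that there is no public getter: the only scope in which the value is visible is the callback, so access outside the lock is simply not expressible.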
You may be wondering why create such a big object/process around access to data; why not just use something like synchronized? Multi-threaded access to shared data is a big deal. So just like remote calls to other systems via TCP/UDP/HTTP, don't try and hide any of the nasty details; get them out in the open. These are not normal method calls; they are costly/expensive and risky calls that need 'focus'.
What the MutexLock does not do is stop a developer taking a reference to the protected item and passing it around to be modified outside of a lock. If and when you build complex data structures like trees of lists you have to be meticulous in copying data and not just holding references. If you hold references and some other part of the application also has a reference to the same data, then that underlying data can be altered without the locks being obtained. This requires extreme discipline. You must give that data to the MutexLock like in the example above with "Steve" and "Stephen", the references to those values are lost to everything but the MutexLock.
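The aliasing hazard described above is easy to demonstrate in Python (a contrived illustration, not EK9; the variable names are made up): a retained reference lets the 'protected' data be mutated with no lock ever taken, whereas handing over a copy keeps outside references harmless.

```python
# A mutable value handed to a hypothetical lock...
shared = ["Steve"]
lock_protected = shared     # the 'lock' holds this reference
alias = shared              # ...but another part of the code kept a reference too

alias[0] = "changed"        # mutation without any lock held!
assert lock_protected[0] == "changed"   # the 'protected' data changed underneath

# The safe approach: give the lock a copy, so later outside mutation is harmless
protected_copy = list(shared)
alias[0] = "changed again"
assert protected_copy[0] == "changed"   # the copy is unaffected
```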
If you have more than one data structure protected by a MutexLock you've just made your life very hard indeed. It is highly likely that over a prolonged period of development with long lived code you will obtain locks in different orders; deadlock will occur from time to time in a seemingly random manner.
Avoid MutexLocks of data if at all possible, if you have to use MutexLocks; employ pure with rigour. Your mindset has to see modification of a MutexLocked variable as a very rare and very significant event!
When estimating work, multiply your estimate by 2 if you have to do any multi-threaded work. If the data structures are shared, now multiply your estimate by 5. If there are multiple data structures being shared, multiply by 10 and expect live operational issues. However hard you think the development will be, it will turn out to be much harder to do correctly with multiple threads!
If you never use this class, consider that a major achievement! You will have saved yourself a whole world of hurt.
Conclusion
It is hoped that this longer example, which has some meaningful functionality and shows different approaches to development (in general and with EK9 specifically), has provided you with some concrete snippets of code. It shows different techniques that you may not have employed before and demonstrates that the EK9 language can be used for CLI development.
Next Steps
If you are interested in networking with EK9 then the next section on network types should be of interest.