EK9 Standard Types

EK9 has a number of Built in Types and Collection Types which are designed to be extended and reused. The Standard Types described below however are more like utility API classes.

The Types

These types are particularly useful for writing CLI (command line) applications (though they can be used in any sort of application). As mentioned in the introduction (purpose), EK9 is aimed at a range of different sized applications. The CLI has come back into favour for a variety of reasons, having been sidelined in preference to GUI based applications for many years. EK9 has several types and constructs that support the development of CLI applications. The focus is Unix/Linux/MacOS based to some degree, especially with Signal handling.

What's needed

To write an effective CLI application it is necessary to understand and embrace stdin, stdout, stderr and text file processing in conjunction with command line options.

Picking up values from the environment is also quite important for many applications and provides a mechanism to supply runtime configuration (in a limited and controlled manner).

Finally, dealing with signals (not very viable if you are on a Windows platform, see Windows limited signal support) and setting exit codes when the application is complete are also critical for shell programming. If you want to write CLI applications that can work in scripts and in conjunction with other CLI applications, setting the exit code is important as it enables a script to determine if the processing worked or failed (and how it failed).

Some may consider this anachronistic, backwards or retrograde (too Unix?); but for many in 'DevOps' roles, 'knocking up a quick script' to accomplish small one off tasks is essential. But note that you may find existing tooling (if you are on a Unix/Linux platform) can provide 60-80% of the functionality you need. KSH/BASH will get you quite a long way, but can get complex when used at size, and as it is interpreted (much like Python) you have to test every path. Tools such as sed, awk and cut are very powerful and capable; use these tools and shell scripting (if you know them) before writing any EK9.

Depending on your background (coder first or sys admin first) you might adopt EK9 sooner rather than later. But once you get past a certain size or complexity you actually need a programming language. For very high performance you could use C, Go or Rust; for highly mathematical or very scientific functionality maybe use Python.

This is not intended to put you off using EK9, but to ensure that you make an informed choice in programming language. But EK9 is designed to be used as a replacement for KSH/BASH and Python for shell programming.

You will find EK9 provides you with a very quick, reliable, readable language to create CLI solutions. With EK9 the code is compiled and therefore many errors will be found before execution. You will also get much more reuse by using EK9. Remember computer resources are much more powerful/capable/flexible and cheaper in comparison to human resources; developing something that is good/performant enough in a shorter time with fewer defects can be preferred in many cases. Cloud flexibility in running up powerful servers for one off jobs can make the use of other languages viable.

Creating the tooling to create an entire cloud infrastructure like the AWS CLI shows the power of adopting the CLI approach. The fact that the CLI approach is still in use 50 years after the creation of the concept should tell you it is probably here to stay!

It is very likely that any CLI application will need to read, write and process text files. To locate and access these files it is important to understand paths. Unix/Linux/MacOS use '/' for directories and Windows uses '\'; EK9 helps with the processing of these paths by providing FileSystemPath.

Worked Example

Unlike some of the other sections prior to this where each type or bit of capability was described in isolation, this section uses a worked example to show how the standard types can be used together.

This example has been coded in a single source file. This is quite a short example and none of the code will be reused in other projects; this is the most simple option (for this explanation).

The utility classes listed above are all demonstrated below in this worked example. The example has quite a simple purpose (and is about 400 lines long). Most of the examples up until this point have been quite short; this example is the first that shows a full application (albeit a very simple one). The basic requirement of the application is as follows:

Main Requirements

Note that blank lines or lines starting with a # should be discarded.

Additional Requirements

For runtime platforms that can fully support signals (MacOS, Unix and all Linux - but not Windows)

Please remember support for Signals varies significantly from operating system to operating system.

Sample Inputs

There are two inputs, the 'Standard Input' and the 'Named File'. Examples shown below.

Desired Output

The desired output format is shown below (note dateCustomer is not required):

The example above is fairly typical of a data migration processing application. There are always times when extracting data, cleaning it up, validating it and/or transforming it is needed. As stated before this could be done in any programming language, but below is a demonstration of how it can be accomplished with EK9.

How the solution is structured

The example below makes use of a number of constructs that are available in EK9. These are:

Note that it does not define any:

Just because EK9 has a rich set of constructs does not mean we have to use them all. Like tools in a tool box or ingredients for cooking - you won't always need to use everything. The example below is coded up in a blend of pragmatic Functional and Object Oriented techniques.

Finally the standard types listed at the top of this page will be utilised. The example is broken down into sections and each section has an explanation that discusses the design decisions. There are any number of different ways the application could have been designed, from a single monolithic program, to just functional or just object oriented (classes).

For example we could have employed an application with the program and a component to hold the OptionsHandler and command line flag and another component to hold the NamedFileProcessor - those components could then have been injected into functions. This would have reduced the parameter passing and employed IOC (inversion of control). See Components And Applications for this approach to the same example problem.

As with most problems to be solved with software, there are a range of different solutions. These different architectural solutions tend to vary in nature from business area to business area and with the experience of the team members. A solution from a finance team would be different to one from a team with a telecommunications or a military software background. Teams with strong mathematics or science backgrounds will produce solutions in one form and those with engineering backgrounds another. It is hoped that EK9 provides the constructs that allow various solution architectures to be employed.

The aim here is not to show the only one right way; as such a way does not exist.

Starting at the entry point into the main program:

The Main Program
#!ek9
defines module introduction
  defines program
    DataCorrelation()
      -> argv as Strings

      stdin <- Stdin()
      stdout <- Stdout()
      stderr <- Stderr()

      optionsHandler <- OptionsHandler()
      options <- optionsHandler.processCommandLine(argv)
      verboseMode <- options contains "-v"
      debugLevel <- optionsHandler.processDebugLevel(verboseMode, options.get("-d"))

      namedFileContents <- NamedFileProcessor(verboseMode, debugLevel).processNamedFile(options.get("-f"))

      if verboseMode
        stderr.println(`Process id is ${OS().pid()}`)
      if verboseMode or debugLevel > 0
        stderr.println(`Loaded ${length namedFileContents} records from named file`)

      setupSignalHandling(verboseMode, debugLevel)

      validLine <- createLineValidator(stderr, debugLevel)
      toEntry <- createStdinLineHandler(stderr, debugLevel)
      byMerging <- createCustomerRecordMerger(stderr, debugLevel, namedFileContents)

      if debugLevel > 1
        stderr.println("Ready to start processing Standard Input")

      outputHeader(verboseMode, debugLevel, stdout)
      cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout

      if debugLevel > 1
        stderr.println("Standard Input processing complete")
...
Incoming Parameters

If a program is defined that declares a single incoming parameter of type Strings then the set of command line arguments the user entered will be automatically populated in the Strings object. In this case the name of that object is argv (C tradition). See programs on how EK9 can map command line arguments directly to typed parameters. But in this example we want full control of a range of flexible and optional arguments, and to explain the GetOpt class.

Stdin

To access the 'Standard Input', i.e. the content that is 'piped' into the application, just declare a variable of type Stdin. You can use Stdin a bit like an iterator with hasNext() and next(), or as a source for EK9 Stream pipelines as you can see towards the end of the program above.

Stdout

'Standard Out' can again be accessed by declaring a variable of Stdout. You can then use methods like print and println to send String values to the 'Standard Out'. In the program above it is just used as a sink at the end of a Stream pipeline.

Stderr

'Standard Error' is almost the same as 'Standard Out' but just sends content to the error channel. In the example above you can see stderr is used quite widely to provide verbose and debug information.
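For readers who know Python, the roles of the three channels can be sketched as below. This is Python rather than EK9 and is offered only as a comparison; the function name run, its callable parameters and the failure code 2 are assumptions made for this illustration.

```python
from typing import Callable, Iterable


def run(lines: Iterable[str],
        write_out: Callable[[str], object],
        write_err: Callable[[str], object]) -> int:
    """Copy non-blank lines to the output channel and return an exit code."""
    copied = 0
    for line in lines:
        text = line.rstrip("\n")
        if text.strip():
            write_out(text + "\n")      # normal results go to standard output
            copied += 1
    if copied == 0:
        write_err("no usable input\n")  # diagnostics go to standard error
        return 2                        # hypothetical non-zero failure code
    return 0                            # zero tells a calling script it worked


# Typical wiring in a script (left as a comment so nothing runs on import):
#   import sys
#   sys.exit(run(sys.stdin, sys.stdout.write, sys.stderr.write))
```

Keeping results and diagnostics on separate channels is what lets the output be piped onward while verbose and debug messages stay visible on the terminal.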

The Developed Classes

OptionsHandler is a class that will be shown later; it just deals with handling the options from the command line. It produces a Dictionary of the options.

NamedFileProcessor is also a class; it processes the named file and produces a Dictionary of the CustomerRecords (a record shown later).

OS - Operating System

To obtain the process id of the program to print to 'Standard Error'; the provided class OS and method pid() is used.

Standard Functions

setupSignalHandling is a standard function that registers the signal handlers; its implementation is shown later. The other standard function, outputHeader, just outputs the header (commented csv); its implementation is also shown later.

Higher Order Functions

There are a number of higher order functions that are used to create the functions validLine, toEntry, byMerging that are used in the Stream pipeline that processes stdin; see below.

Main Processing Pipeline

The pipeline below is the main driver of processing for the program. There are alternative ways to implement this functionality - for/while loops for example. But EK9 provides the Stream pipeline construct to be able to join processing steps together in a readable, reusable and testable manner.

For some this might be too 'functional', but having used it for a while it seems to bring clarity to stages of processing that seem to get lost in lots of 'nested loops', it also means the decision logic that was nested in a loop can be pulled out and used separately (and tested in isolation).

Additionally, it is the same pattern of development everywhere (if you use it). To start with it looks a little strange (unless you are from a Unix shell background - then it looks familiar), but after a short while it becomes quite natural to start thinking in the pipeline way.

  cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout
Cat

The single line above encapsulates the stages of processing the incoming data in an abstract but readable way. cat is the command to catenate some type of collection or iterator. In this case stdin is used as a collection of Strings.

Filter

The first stage of the processing is to filter out any blank lines or lines that start with #. The higher order function createLineValidator creates the validator function (and is shown later). For now we can just use this in an abstract manner (i.e. just accept for now that it only allows valid lines through).

Map

The next stage is to accept the incoming String and map it to a DictEntry of (String, CustomerRecord). The high order function createStdinLineHandler returns a function that is an extension of the abstract function lineToCustomerRecord. This function signature accepts a String and returns a DictEntry of (String, CustomerRecord). The implementation of createStdinLineHandler is shown later, for now accept it is capable of converting a String to a dictionary entry that has a String as key and a value that is a CustomerRecord (shown later).

Filter

The previous stage will have attempted to convert a String into a valid DictEntry of (String, CustomerRecord) but there might have been something wrong with the data. The requirement was to continue processing but not output invalid data. So this next stage filters out invalid DictEntry of (String, CustomerRecord)s.

Map

The penultimate stage is to map by merging the DictEntry of (String, CustomerRecord) from stdin and those correlated entries from the named file and then produce a valid output String. To accomplish this a mapping function is made by the higher order function createCustomerRecordMerger.

Collect

The final stage is just to send the String created to stdout. If the String is not valid (i.e. not set) then Stdout will just ignore the String and it won't be output.
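As a comparison for readers who know Python, the five stages can be sketched with generator expressions. This is Python rather than EK9; the stage functions and the EMAILS lookup are hypothetical stand-ins that are much simpler than the real ones, and a plain (id, name) tuple stands in for the DictEntry.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

Entry = Tuple[str, str]  # (id, name): a simplified stand-in for a DictEntry


def pipeline(lines: Iterable[str],
             valid_line: Callable[[str], bool],
             to_entry: Callable[[str], Optional[Entry]],
             valid_entry: Callable[[Optional[Entry]], bool],
             by_merging: Callable[[Entry], Optional[str]]) -> Iterator[str]:
    kept = (line for line in lines if valid_line(line))        # filter by validLine
    entries = (to_entry(line) for line in kept)                # map toEntry
    good = (entry for entry in entries if valid_entry(entry))  # filter by validEntry
    merged = (by_merging(entry) for entry in good)             # map byMerging
    return (text for text in merged if text is not None)       # unset output is skipped


# Hypothetical stand-in stages and data, for illustration only.
EMAILS = {"1": "ann@example.com"}


def valid_line(line: str) -> bool:
    return bool(line.strip()) and not line.startswith("#")


def to_entry(line: str) -> Optional[Entry]:
    parts = [part.strip() for part in line.split(",")]
    return (parts[0], parts[1]) if len(parts) == 2 else None


def valid_entry(entry: Optional[Entry]) -> bool:
    return entry is not None


def by_merging(entry: Entry) -> Optional[str]:
    ident, name = entry
    return f"{ident}, {name}, {EMAILS[ident]}" if ident in EMAILS else None
```

Each stage is an ordinary function, so it can be tested in isolation and reused in another pipeline, which is the same benefit claimed for the EK9 version.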

Discussion

So that's the main part of the program. Why was this approach taken?

As stated before there are different ways to code the requirements. Indeed, first time through only a program was used; then bigger blocks of code were pulled out of the program into functions. Then those functions were broken down, creating smaller functions. So all functions and no classes at all.

But when those functions were broken down further, it became simpler to pull those functions into classes as methods. These classes are shown later and many of the methods are hidden as private. It is possible to go further and encapsulate some of the classes/data into components.

The next phase would be to pull all the output Strings into interpolated Strings and then refactor them into Text blocks. This would then enable the application to be ported to different spoken languages.

Development Process

There is an important point here; 'Just do something, start writing code'. This may (probably is) counter to what you have been taught (sit there for days designing and procrastinating - do some UML, power points etc). But, just do bits you know need doing - then worry about how it will all fit later. Don't be afraid to delete stuff, don't get emotionally attached to code (strangely don't invest too much effort too early). Then when you see complexity or potential reuse; refactor, 'pull apart', encapsulate, use abstractions. In general move 'stuff' about or delete it. Then the solution will drop into place (and you will have enjoyed it).

There is a reason EK9 has so many constructs; use them when you 'feel' the time is right (if the time is never right, then don't use them!). Fluidity and freedom to relocate processing is why EK9 has evolved in the way it has.

The Functions
...
  defines function

    lineValidator() as abstract
      -> line as String
      <- rtn as Boolean

    lineToCustomerRecord() as abstract
      -> line as String
      <- rtn as DictEntry of (String, CustomerRecord)

    customerRecordMerger() as abstract
      -> entry as DictEntry of (String, CustomerRecord)
      <- rtn as String

    createLineValidator()
      ->
        stderr as Stderr
        debugLevel as Integer
      <-
        validator as lineValidator

      validator: (stderr, debugLevel) is lineValidator
        rtn: line is not empty and #<line != '#'
        if debugLevel > 1 and not rtn
          stderr.println("Discarding [" + line + "]")

    createStdinLineHandler()
      ->
        stderr as Stderr
        debugLevel as Integer
      <-
        processor as lineToCustomerRecord

      processor: (stderr, debugLevel) is lineToCustomerRecord
        rtn: DictEntry()

        splitLine <- line.split(/,/)
        if debugLevel == 3
          stderr.println("About to split stdin line [" + line + "]")

        if length of splitLine == 4
          id <- splitLine.get(0).trim()
          firstname <- splitLine.get(1).trim()
          lastname <- splitLine.get(2).trim()
          dobStr <- splitLine.get(3).trim()
          if id is not empty and firstname is not empty and lastname is not empty and dobStr is not empty
            dob <- $getAsDateTime(id, false, dobStr)
            if dob?
              entryValue <- CustomerRecord(id, firstname, lastname, String(), dob, String())
              if debugLevel == 3
                stderr.println("Line [" + line + "] processed")
              rtn: DictEntry(id, entryValue)

        if not rtn?
          stderr.println("Invalid line [" + line + "]")

    getAsDateTime()
      ->
        id as String
        throwException as Boolean
        aDateInput as String //expecting YYYYMMDD
      <-
        rtn as DateTime: DateTime()

      group <- aDateInput.group(/(\d{4})(\d{2})(\d{2})/)
      if length group == 3
        dateStr <- cat group | join with dashSeparated | collect as String
        rtn: DateTime(Date(dateStr))
      else if throwException
        throw Exception("Id [" + id + "] Invalid date [" + aDateInput + "]", 2)
      else
        Stderr().println("Id [" + id + "] Invalid date [" + aDateInput + "]")

    validEntry()
      -> entry as DictEntry of (String, CustomerRecord)
      <- rtn as Boolean: entry?

    dashSeparated()
      ->
        firstPart String
        secondPart String
      <-
        rtn as String: firstPart? and secondPart? <- firstPart + "-" + secondPart : String()

    outputFormatSeparated()
      ->
        firstPart String
        secondPart String
      <-
        rtn as String: firstPart? and secondPart? <- firstPart + ", " + secondPart : String()

    setupSignalHandling()
      ->
        verboseMode as Boolean
        debugLevel as Integer

      terminationHandler <- (verboseMode, debugLevel) of SignalHandler
        override handleSignal()
          -> signal as String
          <- rtn as Integer: 1 //Process will exit with code of one
          if verboseMode or debugLevel > 0
            Stderr().println("Handled Terminal Signal [" + signal + "]")

      terminations <- Signals().register(Strings("HUP", "ABRT"), terminationHandler)
      if terminations not contains "HUP"
        Stderr().println("HUP Signal not supported")
      if terminations not contains "ABRT"
        Stderr().println("ABRT Signal not supported")

      debugHandler <- (verboseMode, debugLevel) of SignalHandler
        override handleSignal()
          -> signal as String
          <- rtn as Integer: Integer() //Note the Integer is not set so process will not terminate
          if signal == "USR1" and debugLevel < 3
            debugLevel++
          else if signal == "USR2" and debugLevel > 0
            debugLevel--
          if verboseMode or debugLevel > 0
            Stderr().println("Handled Info Signal [" + signal + "] Debug Level now [" + $debugLevel + "]")

      debugs <- Signals().register(Strings("USR1", "USR2"), debugHandler)
      if debugs not contains "USR1"
        Stderr().println("USR1 Signal not supported")
      if debugs not contains "USR2"
        Stderr().println("USR2 Signal not supported")

    outputHeader()
      ->
        verboseMode as Boolean
        debugLevel as Integer
        stdout as Stdout

      enGB <- Locale("en_GB")
      if verboseMode or debugLevel > 2
        Stderr().println("About to create output header with locale " + $enGB)

      envVars <- EnvVars()
      user <- envVars contains "USER" <- envVars.get("USER") else envVars.get("USERNAME")
      dateTime <- enGB.mediumFormat(SystemClock().dateTime())

      stdout.println("#Created by " + user + " on " + dateTime)
      stdout.println("#id, firstName, lastName, emailAddress, dateOfBirth, dateLastPurchase")

    createCustomerRecordMerger()
      ->
        stderr as Stderr
        debugLevel as Integer
        namedFileContents as Dict of (String, CustomerRecord)
      <-
        merge as customerRecordMerger

      merge: (stderr, debugLevel, namedFileContents) is customerRecordMerger
        rtn: String() //Output will be unset by default and therefore not output
        id <- entry.getKey()
        if debugLevel == 3
          stderr.println("Merging id [" + id + "]")
        if namedFileContents contains id
          stdinEntry <- entry.getValue()
          namedFileEntry <- namedFileContents.get(id).get()
          stdinEntry :~: namedFileEntry
          rtn := $stdinEntry
        else
          stderr.println("Not merging [" + id + "] as named file does not contain id")
...
Abstract Functions

Well that's quite a few functions. The first three are just function signatures to be used by dynamic functions. createLineValidator is the first of the higher order functions; it just creates and returns a dynamic function validator which is of type lineValidator. As you can see it just checks that the line is not empty and does not start with #.

createStdinLineHandler is really the main higher order function. It creates the function that parses the incoming line from stdin, validates the content and creates a CustomerRecord that is added into a DictEntry (which can in turn be further processed). Note that if processing fails an unset DictEntry is returned and a note of the line that failed is output to stderr.
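The parsing rules can be sketched in Python for comparison (this is not EK9). The names as_date and to_entry, the use of None for an unset result and the dictionary used in place of a CustomerRecord are all assumptions for the illustration. The sketch also requires the whole field to match YYYYMMDD, which may be stricter than the EK9 group call, and it leaves reporting to the caller rather than writing to stderr.

```python
import re
from datetime import date
from typing import Optional, Tuple

DATE_PATTERN = re.compile(r"(\d{4})(\d{2})(\d{2})")  # expecting YYYYMMDD


def as_date(text: str) -> Optional[date]:
    """Return the date held in a YYYYMMDD string, or None when it is invalid."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:          # for example month 13 or day 32
        return None


def to_entry(line: str) -> Optional[Tuple[str, dict]]:
    """Turn 'id, first, last, dob' into (id, record), or None when invalid."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 4 or not all(parts):   # wrong field count or an empty field
        return None
    ident, first, last, dob_text = parts
    dob = as_date(dob_text)
    if dob is None:
        return None
    return ident, {"firstName": first, "lastName": last, "dob": dob.isoformat()}
```

Returning None rather than raising mirrors the requirement to keep processing standard input when a single line is bad.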

EnvVars

The function outputHeader deals with outputting the header comment and uses a class as shown below. Environment variables are not standard per platform and can be altered and manipulated before your program runs (this can be both good and bad). Note that EnvVars looks and behaves much like a dictionary in many ways (indeed you can call keys() to get the names of all the entries).

  envVars <- EnvVars()
  user <- envVars contains "USER" <- envVars.get("USER") else envVars.get("USERNAME")
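The same fallback can be sketched in Python for comparison (not EK9). The function name current_user and the injectable env parameter are assumptions; passing the mapping in makes the lookup easy to test without touching the real environment.

```python
import os
from typing import Mapping, Optional


def current_user(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer USER (Unix style) and fall back to USERNAME (Windows style)."""
    if "USER" in env:
        return env["USER"]
    return env.get("USERNAME")   # None when neither variable is present
```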

Most of the rest of the functions are fairly obvious, but setupSignalHandling and createCustomerRecordMerger need more of an explanation.

setupSignalHandling

This function sets up two SignalHandlers. The first is for terminating the application when signals HUP/ABRT are received; as that handler returns an Integer value of 1 it will cause the whole program to exit with a code of 1. Remember signal handling is a function of the platform you are running on (Windows has limited support for this).
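For comparison, a rough Python equivalent of registering a terminating handler only for the signals a platform offers is sketched below. This is Python's signal module, not the EK9 Signals class; the helper names register and terminate are assumptions, and Python handlers end the process by calling sys.exit rather than by returning a code.

```python
import signal
import sys
from typing import Callable, Iterable, List


def register(names: Iterable[str], handler: Callable) -> List[str]:
    """Install handler for each named signal this platform defines.

    Returns the names that were registered, so the caller can report the rest.
    """
    registered = []
    for name in names:
        signum = getattr(signal, "SIG" + name, None)  # e.g. "HUP" -> signal.SIGHUP
        if signum is None:
            continue                                   # not available on this platform
        signal.signal(signum, handler)
        registered.append(name)
    return registered


def terminate(signum, frame):
    """Handler for terminating signals: report, then exit with code 1."""
    sys.stderr.write(f"Handled terminal signal [{signum}]\n")
    sys.exit(1)


# Typical use from the main thread (left as a comment so nothing is installed here):
#   supported = register(["HUP", "ABRT"], terminate)
#   for name in ("HUP", "ABRT"):
#       if name not in supported:
#           sys.stderr.write(name + " signal not supported\n")
```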

Signal Handler

The second SignalHandler is for the non terminating signals USR1/USR2; as you can see they increment and decrement the debugLevel.

Now the observant will be asking how can altering the debugLevel like this actually affect the whole application. If you think of debugLevel as a primitive type this approach just won't work. But EK9 only has Objects, so from the main program line:

  debugLevel <- optionsHandler.processDebugLevel(...)

  //And is passed in by reference to the function below
  setupSignalHandling(verboseMode, debugLevel)

  //The debugLevel is further 'captured' by the dynamic class implementation of SignalHandler
  debugHandler <- DebugHandler(verboseMode, debugLevel) of SignalHandler

Because debugLevel is an Object it is passed by reference everywhere. The debugHandler method handleSignal() has 'captured' debugLevel, so all references actually point to the same memory location. Hence debugLevel++ and debugLevel-- will operate on the same value (this is very unlike an int primitive).
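The difference between a shared mutable object and a copied value can be sketched in Python for comparison (not EK9). Python integers are immutable, so the sketch wraps the level in a small hypothetical Level class to get the shared behaviour described above; the second factory shows what happens when only the plain integer is captured.

```python
class Level:
    """A mutable holder, standing in for EK9's Integer object."""

    def __init__(self, value: int):
        self.value = value


def make_handler(level: Level):
    """The handler captures the holder, so the caller sees every change."""
    def handle(signal_name: str) -> None:
        if signal_name == "USR1" and level.value < 3:
            level.value += 1
        elif signal_name == "USR2" and level.value > 0:
            level.value -= 1
    return handle


def make_handler_from_int(level: int):
    """The handler captures a plain int, so changes stay private to it."""
    def handle(signal_name: str) -> int:
        nonlocal level
        if signal_name == "USR1" and level < 3:
            level += 1
        return level
    return handle
```

With the holder, code elsewhere that reads level.value observes the new debug level immediately; with the plain integer the caller's variable never changes.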

createCustomerRecordMerger

This is the second significant higher order function. It creates the merge function that looks up correlating ids and merges the two CustomerRecords: one from 'Standard Input' and the other from the 'Named File', both of which are partial. The merge is done via the operator :~:, the conversion to a String is done by the $ operator - both on the CustomerRecord record (shown later).

  //The Merge
  stdinEntry :~: namedFileEntry

  //The Conversion to a String
  rtn := $stdinEntry
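The idea of merging two partial records can be sketched in Python for comparison (not EK9). The sketch assumes the merge fills a field only when it has no value yet, and that the text form is unset when any field is still missing; the class name Customer and the methods merge and to_text are hypothetical, with None standing in for an unset String.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Customer:
    id: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    email: Optional[str] = None
    dob: Optional[str] = None
    last_purchase: Optional[str] = None

    def merge(self, other: "Customer") -> "Customer":
        """Fill every field that is still unset from the other record."""
        for field in fields(self):
            if getattr(self, field.name) is None:
                setattr(self, field.name, getattr(other, field.name))
        return self

    def to_text(self) -> Optional[str]:
        """Comma separated form, or None if any field is still unset."""
        values = [getattr(self, field.name) for field in fields(self)]
        if any(value is None for value in values):
            return None
        return ", ".join(values)
```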

The namedFileContents object (which is the dictionary of the 'named file' contents) is also 'captured' from the main program down through the high order function createCustomerRecordMerger and into the dynamic function merge as customerRecordMerger.

Clearly if you had several applications with a need for dealing with comma separated or colon separated data, you would pull the appropriate abstract and concrete functions into a separate utility module so they would be reused.

The same could be said for the signal handling; ideally all your applications would use the same signal processing mechanism and so could reuse the setupSignalHandling function.

By creating a range of reusable and small functions you can build an internal library of small reliable and reusable software 'chunks'.

The Classes

There are only two classes used in this implementation and below is the implementation of the OptionsHandler. This class has the responsibility of processing the command line options that the user entered when running the application.

...
  defines class

    OptionsHandler

      processCommandLine()
        -> argv as Strings
        <- options as Dict of (String, String): Dict()

        arguments <- cat argv | collect as List of String
        getopts <- setupGetOpt(":")
        options := getopts.options(arguments)

      private setupGetOpt()
        -> rqParam as String
        <- getopts as GetOpt of String: GetOpt(rqParam)

        supportedOptions <- setupSupportedOptions(rqParam)
        usage <- setupUsage()
        getopts.pattern(supportedOptions).usage(usage)

      private setupSupportedOptions()
        -> rqParam as String
        <- rtn as Dict of (String, String): Dict()

        supportedOptions <- {
          "-v": String(),
          "-f": rqParam,
          "-d": rqParam
          }
        rtn: supportedOptions

      private setupUsage()
        <- rtn as String: "Invalid option, only those listed below are supported:\n"

        rtn += "-v, verbose\n"
        rtn += "-f filename, use file of filename (mandatory option)\n"
        rtn += "-d level, use of debugging"

      processDebugLevel()
        ->
          verboseMode as Boolean
          level as Optional of String
        <-
          rtn as Integer: 0 //default to zero

        debugLevel <- cat level | collect as Integer

        rtn := debugLevel >= 0 and debugLevel <= 3 <- debugLevel else 0

        if verboseMode and (debugLevel < 0 or debugLevel > 3)
          Stderr().println("Debug level " + $debugLevel + " not supported - reverting to debug level '0'")
...
GetOpt

The processCommandLine method on the class OptionsHandler converts the incoming Strings to a List of String and the private methods then set up and configure the GetOpt generic class. Each of the private methods performs a specific task: configuring the supported options and configuring the help text.
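Python's standard getopt module follows a similar convention, where a colon marks an option that requires a value, so the same three options can be sketched as below. This is Python rather than the EK9 GetOpt class; the function name process_command_line, the USAGE constant and the choice to return None on an invalid option are assumptions for the illustration.

```python
import getopt
import sys
from typing import Dict, List, Optional

USAGE = (
    "Invalid option, only those listed below are supported:\n"
    "-v, verbose\n"
    "-f filename, use file of filename (mandatory option)\n"
    "-d level, use of debugging"
)


def process_command_line(argv: List[str]) -> Optional[Dict[str, str]]:
    """Return the options as a dictionary, or None after printing the usage."""
    try:
        # "vf:d:" means -v takes no value, while -f and -d each require one.
        pairs, _remaining = getopt.getopt(argv, "vf:d:")
    except getopt.GetoptError as problem:
        sys.stderr.write(f"{problem}\n{USAGE}\n")
        return None
    return dict(pairs)
```

A flag without a value, such as -v, appears in the dictionary with an empty string, so testing for the key is enough to detect it.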

The final public method, processDebugLevel, is called from the main program to access the debug level option the user entered on the command line (if it was entered). The ternary operator tests the result of cat level | collect as Integer, as it may not even be set or it could be less than zero or greater than 3, and ensures the returned value is between 0 and 3.
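The range check can be sketched in Python for comparison (not EK9). The function name process_debug_level is an assumption, None stands in for an option that was not supplied, and treating non-numeric text as level 0 is a choice made for this sketch.

```python
from typing import Optional


def process_debug_level(level: Optional[str]) -> int:
    """Return the requested debug level when it is 0..3, otherwise 0."""
    if level is None:            # option not supplied
        return 0
    try:
        value = int(level)
    except ValueError:           # not a number at all
        return 0
    return value if 0 <= value <= 3 else 0
```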

TextFile

The second class, NamedFileProcessor, has the single responsibility of loading the data from the 'named file' into a Dict of (String, CustomerRecord). But it is much stricter about errors.

...
    NamedFileProcessor
      verboseMode as Boolean: false
      debugLevel as Integer: Integer()

      NamedFileProcessor()
        ->
          verboseMode as Boolean
          debugLevel as Integer
        this.verboseMode = verboseMode
        this.debugLevel = debugLevel

      processNamedFile()
        -> filename as Optional of String
        <- rtn as Dict of (String, CustomerRecord): Dict()

        stderr <- Stderr()
        if not filename?
          throw Exception("Filename of 'named file' is required", 2)

        namedFile <- getNamedFile(filename.get())
        validLine <- createLineValidator(stderr, debugLevel)
        toEntry <- createNamedFileLineHandler(stderr, debugLevel)

        if verboseMode or debugLevel > 0
          stderr.println("About to start processing [" + $namedFile + "]")

        cat namedFile | filter by validLine | map toEntry > rtn

        if debugLevel > 1
          stderr.println("Processing [" + $namedFile + "] complete")

      private getNamedFile()
        -> filename as String
        <- rtn as TextFile: TextFile()

        filePath <- FileSystemPath(filename)
        rtn := filePath.isAbsolute() <- TextFile(filePath) else TextFile(FileSystem().cwd() + filePath)
        if not rtn.isReadable()
          throw Exception($rtn + " is not readable", 2)
        if not rtn.isFile()
          throw Exception($rtn + " is not a file", 2)

      private createNamedFileLineHandler()
        ->
          stderr as Stderr
          debugLevel as Integer
        <-
          processor as lineToCustomerRecord

        processor: (stderr, debugLevel) is lineToCustomerRecord
          //The regular expression /:/ is significant should it also be a constant?
          splitLine <- line.split(/:/)
          if debugLevel == 3
            stderr.println("About to split line [" + line + "]")

          if length of splitLine != 4
            throw Exception("Invalid line [" + line + "]", 2)
          id <- splitLine.get(0).trim()
          if id is empty
            throw Exception("Invalid line [" + line + "] empty ID", 2)

          emailAddress <- splitLine.get(1).trim()
          //A significant regular expression buried deep in processing! maybe use a constant.
          if emailAddress not matches /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$/
            throw Exception("Id [" + id + "] Invalid email address [" + emailAddress + "]", 2)

          dateLastPurchase <- $getAsDateTime(id, true, splitLine.get(3).trim())

          entryValue <- CustomerRecord(id, String(), String(), emailAddress, String(), dateLastPurchase)
          if debugLevel == 3
            stderr.println("Named file line [" + line + "] processed")
          rtn: DictEntry(id, entryValue)
...

Firstly in the public method processNamedFile() a check is made to ensure a file name has been provided. An Exception is thrown with exit code 2 if no file name was provided. The next step is to get a TextFile using method getNamedFile(). Then a couple of pipeline functions are created and the main pipeline processing is started.

private getNamedFile()

This private method firstly creates a FileSystemPath from the String filename. This is followed by a ternary operation to see if the filename is a relative or absolute path to a file. There is then a check on whether the file is readable and actually is a file. Finally the TextFile is returned.
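The same steps can be sketched with Python's pathlib for comparison (not EK9). The function names resolve and problem_with are assumptions, and the sketch returns a message instead of throwing so that the checks are easy to try out; the order of the checks follows the method above.

```python
import os
from pathlib import Path
from typing import Optional


def resolve(filename: str, cwd: Path) -> Path:
    """Use an absolute path as given; treat a relative one as relative to cwd."""
    path = Path(filename)
    return path if path.is_absolute() else cwd / path


def problem_with(path: Path) -> Optional[str]:
    """Return a description of why the path cannot be used, or None if it can."""
    if not os.access(path, os.R_OK):
        return f"{path} is not readable"
    if not path.is_file():
        return f"{path} is not a file"
    return None
```

A caller would typically pass Path.cwd() as the second argument to resolve, which plays the role of the current working directory lookup.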

Current Working Directory

When processing the path it is important to get the user's current working directory; this is done by using the FileSystem class.

createLineValidator

This high order function has already been covered.

createNamedFileLineHandler

This is the higher order function/method that creates the function that accepts each line and splits it by the : separator. It then does some checks on the data and throws Exceptions if any of the data is invalid. It uses the standard function getAsDateTime to process the date.

Processing Pipeline

As you can see from the processing pipeline below, its structure and pattern are almost the same as the main processing pipeline in the main program. In fact it uses the same standard high order function createLineValidator to create the function to validate lines. The main difference is in the final part of the processing, which just collects each DictEntry into the return variable rtn.

  • cat namedFile | filter by validLine | map toEntry > rtn

By employing functions rather than just classes it is possible to create much more reuse, as shown with createLineValidator and the abstract function signatures. This also reduces the need for excessive class hierarchies and inheritance. Basically the smaller and tighter the 'chunk' of code, and the fewer side effects it has, the more reusable and reliable the code.

As you can see the same pattern of processing can be applied over and over again and the pipeline is simpler to understand and read than using nested loops. The final part of the overall program is the data item that is the main subject of how the pipeline merging and output actually works - the record.

The Record

The CustomerRecord is mainly used as a data object, but does have a couple of key operators for processing the data it holds.

...
  defines record

    CustomerRecord
      id as String: String()
      firstName as String: String()
      lastName as String: String()
      email as String: String()
      dob as String: String()
      lastPurchase as String: String()

      operator ?
        <- rtn as Boolean: id?

      operator :~:
        -> arg as CustomerRecord
        <- rtn as CustomerRecord: this
        id :=? String(arg.id)
        firstName :=? String(arg.firstName)
        lastName :=? String(arg.lastName)
        email :=? String(arg.email)
        dob :=? String(arg.dob)
        lastPurchase :=? String(arg.lastPurchase)

      operator $
        <- rtn as String: cat [id, firstName, lastName, email, dob, lastPurchase]
          | join with outputFormatSeparated
          | collect as String
        
//EOF

The data this record holds is pretty obvious as it is the output data that must be converted to a single comma separated String.

operator ?

The is set operator is used to check if the record is valid for filtering and processing.

operator :~:

The merge operator is used to 'merge' the stdin partial record and the named file partial record into one fully populated record. The assignment coalescing operator is used in the merge. This only assigns the property/field if it is currently unset, i.e. it coalesces the field and the argument field.
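The 'assign only if unset' behaviour of the merge can be sketched in Python as follows. This is an analogy under stated assumptions: None stands in for an unset EK9 value, merge stands in for the :~: operator, and the field names are Python style renderings of the record's fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CustomerRecord:
    id: Optional[str] = None
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    email: Optional[str] = None
    dob: Optional[str] = None
    last_purchase: Optional[str] = None

    def merge(self, other: "CustomerRecord") -> "CustomerRecord":
        """Fill in only the fields that are still unset on this record."""
        for field in fields(self):
            if getattr(self, field.name) is None:
                setattr(self, field.name, getattr(other, field.name))
        return self
```

A field that already holds a value is never overwritten, so merging two partial records that share an id yields one record with every field taken from whichever side had it.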

operator $

The string conversion operator takes each of the properties/fields on the record and places them in a List of Strings. This is then used in another pipeline to join the fields with a 'comma', and the result is collected into the rtn String. This could be written long hand like this.

  • fields <- [id, firstName, lastName, email, dob, lastPurchase]
  • cat fields | join with outputFormatSeparated > rtn

This final pipeline could have been accomplished by just using the + operator on each of the fields as shown below.

  • rtn := id + ", " + firstName + ", " + lastName + ", " + email + ", " + dob + ", " + lastPurchase

The alternative would be to use String interpolation.

  • rtn := `${id}, ${firstName}, ${lastName}, ${email}, ${dob}, ${lastPurchase}`

While the code above is slightly shorter - it is a bit more intricate. If we needed to add more fields or change the order of the fields it is much easier and more reliable to just use the List approach. The other advantage is that if the output format needs to be altered it has been encapsulated within a single function outputFormatSeparated.
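As a rough Python analogy of the List approach, the sketch below keeps the separator in one small function and folds the list of fields through it. The signature of output_format_separated (two Strings in, one String out) is an assumption made for the sketch; the EK9 definition is not shown in this section.

```python
from functools import reduce
from typing import Iterable


def output_format_separated(left: str, right: str) -> str:
    """The only place that knows what the separator looks like."""
    return left + ", " + right


def to_line(values: Iterable[str]) -> str:
    """Join the values pairwise; expects at least one value."""
    return reduce(output_format_separated, values)
```

Adding a field or reordering fields only changes the list handed to to_line, and changing the separator only changes output_format_separated.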

Summary

This has been quite a long section, but it has shown a more concrete example of how EK9 can be used. There is little doubt the functionality could have been written in a single program without any other constructs. The code would be (and was) much shorter, but decision logic was mixed in with format logic. There was also quite a bit of repetition and duplication.

You might argue the above example code is over engineered; depending on your background and experience you may or may not be right. When developing a solution to the problem set out above, you have to find the right balance between engineering and bloat/complexity or just plain 'dirty' code.

Those from a functional background would probably argue it is under engineered, should have used much more pure immutability, and has missed the opportunity to be more robust.

Justification of Approach

So as to provide some justification for the approach above it should be obvious that the following are all encapsulated:

So if there were defects or a need to improve/augment the software it would be really obvious where to look in the code. Moreover by breaking a monolithic program into functions and classes it is now possible to make those pure and also refactor them out to separate modules for reuse.

It should also be clear that what can be re-used has been re-used

Again if new fields were required or a variation on date formats needed to be accepted it would be obvious where to look.

Common Design patterns are used and re-used

You may or may not like the Stream pipeline approach, but if you needed to alter the processing it would again be simple to look at where and how to alter that processing - rather than looking in deep nested loops.

Sample of the Looping approach

As a contrast, the 'named file processing' could have been implemented directly in the program in the following way. For some developers this may feel more natural (especially if you are from a C or Python background). Personally I find it distracting to have all that low level utility code high up in the main part of my program.

#!ek9
defines module introduction
  defines program
    DataCorrelation()
      -> argv as Strings

      stdin <- Stdin()
      stdout <- Stdout()
      stderr <- Stderr()

      optionsHandler <- OptionsHandler()
      options <- optionsHandler.processCommandLine(argv)
      verboseMode <- options contains "-v"
      debugLevel <- optionsHandler.processDebugLevel(verboseMode, options.get("-d"))
        
      //So let's comment out the use of the class and method for processing
      //namedFileContents <- NamedFileProcessor(verboseMode, debugLevel).processNamedFile(options.get("-f"))
      
      //Now inline all the code that is needed      
      
      filename <- options.get("-f")
      if not filename?
        throw Exception("Filename of 'named file' is required", 2)
      filePath <- FileSystemPath(filename.get())
      namedFile <- filePath.isAbsolute() <- TextFile(filePath) else TextFile(FileSystem().cwd() + filePath)
      if not namedFile.isReadable()
        throw Exception($namedFile + " is not readable", 2)
      if not namedFile.isFile()
        throw Exception($namedFile + " is not a file", 2)
      
      if verboseMode or debugLevel > 0
        stderr.println("About to start processing [" + $namedFile + "]")
      
      namedFileContents as Dict of (String, CustomerRecord): Dict()  
      try
        -> input <- namedFile.input()
        while input?
          line <- input.next()
          if line is not empty and #<line != '#'
            splitLine <- line.split(/:/)
            if debugLevel == 3
              stderr.println("About to split line [" + line + "]")
  
            if length of splitLine != 4
              throw Exception("Invalid line [" + line + "]", 2)
            id <- splitLine.get(0).trim()
            if id is empty
              throw Exception("Invalid line [" + line + "] empty ID", 2)
  
            emailAddress <- splitLine.get(1).trim()
            if emailAddress not matches /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$/
              throw Exception("Id [" + id + "] Invalid email address [" + emailAddress + "]", 2)
  
            dateLastPurchase <- $getAsDateTime(id, true, splitLine.get(3).trim())
  
            entryValue <- CustomerRecord(id, String(), String(), emailAddress, String(), dateLastPurchase)
            if debugLevel == 3
              stderr.println("Named file line [" + line + "] processed")
              
            namedFileContents += DictEntry(id, entryValue)            
            
          else if debugLevel > 1
            stderr.println("Discarding [" + line + "]") 
      
      //End of inlining the code
      
      if verboseMode
        stderr.println("Process id is [" + $OS().pid() + "]")
      if verboseMode or debugLevel > 0
        stderr.println("Loaded " + $ length namedFileContents + " records from named file")

      setupSignalHandling(verboseMode, debugLevel)

      validLine <- createLineValidator(stderr, debugLevel)
      toEntry <- createStdinLineHandler(stderr, debugLevel)
      byMerging <- createCustomerRecordMerger(stderr, debugLevel, namedFileContents)

      if debugLevel > 1
        stderr.println("Ready to start processing Standard Input")

      outputHeader(verboseMode, debugLevel, stdout)
      cat stdin | filter by validLine | map toEntry | filter by validEntry | map byMerging > stdout

      if debugLevel > 1
        stderr.println("Standard Input processing complete")       
...

Now I feel the need to explain the code with comments in the code, whereas before, breaking it down into classes and functions gave me the opportunity to create a meaningful name to encapsulate each bit of functionality.

As an aside, the above does demonstrate the alternative loop and try with resource approach to handling resources like the TextFile, as shown below.

  • try
  •   -> input <- namedFile.input()
  •   while input?
  •     line <- input.next()

By creating the 'input' within the incoming try parameter by using 'namedFile.input()', the try block will automatically close the input once processing is finished (much like Java does). The input is then used like an iterator to get the next line from the input. For many this will be a very familiar concept (though the syntax is different).
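For readers who know Python better than Java, the same 'open, iterate, close automatically' shape looks roughly like this. It is an analogy only; read_lines is a made up helper and nothing here is EK9 API.

```python
from typing import List, Tuple


def read_lines(path: str) -> Tuple[List[str], bool]:
    """Read every line, then report whether the file was closed afterwards."""
    lines = []
    # The with block plays the role of the EK9 try parameter: the file is
    # closed when the block ends, even if an exception is raised inside it.
    with open(path, encoding="utf-8") as source:
        for line in source:
            lines.append(line.rstrip("\n"))
    return lines, source.closed
```

The second element of the result is there purely to show that the resource has been released by the time the block is left.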

Interestingly (having written the same functionality in two different ways) you can see patterns of how to convert the above 'procedural code' into a more 'functional pipeline' (should you want to). Note the check for the line content (shown below).

  • if line is not empty and #< line != '#'

If there is no meaningful 'else' (other than the output to stderr) then it can be pulled into a filter function! You can then see that the next bit of the code above really just takes the incoming String, then processes and validates its parts before making a partial DictEntry of (String, CustomerRecord). It looks like it maps one type of content into another type - so a map function.
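The conversion described above can be sketched in Python to show that the two forms produce the same result. The line format here is simplified to 'key:value' and all of the function names are illustrative assumptions; the filter condition mirrors the EK9 check for a non-empty line that does not start with '#'.

```python
from typing import Dict, Iterable, Tuple


def is_valid_line(line: str) -> bool:
    """The 'if' with no meaningful else becomes a filter."""
    return line != "" and not line.startswith("#")


def to_entry(line: str) -> Tuple[str, str]:
    """The body of the 'if' becomes a map from one type to another."""
    key, _, rest = line.partition(":")
    return key.strip(), rest.strip()


def load_with_loop(lines: Iterable[str]) -> Dict[str, str]:
    """Procedural form: the check and the conversion sit inside the loop."""
    result = {}
    for line in lines:
        if line != "" and not line.startswith("#"):
            key, _, rest = line.partition(":")
            result[key.strip()] = rest.strip()
    return result


def load_with_pipeline(lines: Iterable[str]) -> Dict[str, str]:
    """Pipeline form: filter, then map, then collect."""
    return dict(map(to_entry, filter(is_valid_line, lines)))
```

Once the condition and the conversion have names of their own, they can be tested separately and reused in other pipelines.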

An additional class is shown below (not used in the example above) but useful in a range of applications.

Mutex Lock and Key

This should really only be used as a matter of necessity, where the application has multiple concurrent Threads running and read/write data must be shared between those threads. Typically this is the case where async is used in pipelines, TCP, UDP or other HTTP server type constructs. Here be Dragons - as they say.

Below is a simple example of the syntax and design pattern to be used with MutexLocks.

#!ek9
defines module introduction
  defines function

    createProtectedName()
      <- lock as MutexLock of String: MutexLock("Steve")

  defines program

    LockExample()

      stdout <- Stdout()
      lockableItem <- createProtectedName()

      accessKey <- (stdout, lockableItem) with trait of MutexKey as class
        override access()
          name <- lockableItem.get()
          stdout.println("Accessing [" + $name + "]")
          name :=: "Stephen"
          assert lockableItem.owner()

      //Now try access via key and wait on mutex
      lockableItem.enter(accessKey)

      //Don't think you own the lock here - only owned in access() above
      assert not lockableItem.owner()

      //As there are no other threads access will be granted here
      //If other held the lock this would return false and key.access() would not be called.
      assert lockableItem.tryEnter(accessKey)

      //Attempt access without key - should get exception.
      try
        stdout.println("Accessing [" + lockableItem.get() + "]")
      catch
        -> ex as Exception
        stdout.println("Exception [" + $ex + "]")
//EOF
        

While on the surface the code above looks fairly straightforward, multi-threaded access to a single data structure is fraught with difficulties and race conditions and even deadlock. Avoid if at all possible. But sometimes you cannot avoid it and this is what the Mutex lock/key are for.

I'll say it one last time - rack your brains for a solution that does not involve concurrent access to data structures before going down that solution path.

What the EK9 MutexLock gives you has the following characteristics/conditions:

You may be wondering why create such a big object/process around access to data; why not just use synchronized or something like that? Multi-threaded access to shared data is a big deal. So, just as with remote calls to other systems via TCP/UDP/HTTP, don't try to hide any of the nasty details; get them out in the open. These are not normal method calls, they are costly/expensive and risky calls that need 'focus'.

What the MutexLock does not do is stop a developer taking a reference to the protected item and passing it around to be modified outside of a lock. If and when you build complex data structures like trees of lists you have to be meticulous in copying data and not just holding references. If you hold references and some other part of the application also has a reference to the same data, then that underlying data can be altered without the locks being obtained. This requires extreme discipline. You must give that data to the MutexLock as in the example above with "Steve" and "Stephen", so that the references to those values are lost to everything but the MutexLock.
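The reference hazard is easy to demonstrate with a small Python analogy. GuardedValue below is a hypothetical class written for this sketch; it is not how the EK9 MutexLock is implemented, it just has the same shape of 'access only through a callback while the lock is held'.

```python
import copy
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class GuardedValue(Generic[T]):
    """Python analogy of a lock-and-key holder; not the EK9 implementation."""

    def __init__(self, value: T) -> None:
        self._value = value
        self._lock = threading.Lock()

    def enter(self, access: Callable[[T], None]) -> None:
        """Block until the lock is free, then run access while holding it."""
        with self._lock:
            access(self._value)

    def try_enter(self, access: Callable[[T], None]) -> bool:
        """Run access only if the lock is free right now; report whether it ran."""
        if not self._lock.acquire(blocking=False):
            return False
        try:
            access(self._value)
            return True
        finally:
            self._lock.release()


def names_seen(guarded: GuardedValue) -> list:
    """Copy the guarded list out while holding the lock."""
    seen = []
    guarded.enter(seen.extend)
    return seen


shared = ["Steve"]
leaky = GuardedValue(shared)                    # caller keeps a live reference
careful = GuardedValue(copy.deepcopy(shared))   # the holder owns its own copy
shared.append("Mallory")                        # no lock was taken for this
```

After the append, the 'leaky' holder has been changed behind the lock's back while the 'careful' one has not. The lock itself behaved identically in both cases, which is why the discipline has to come from the developer.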

If you have to have more than one data structure protected by a MutexLock you've just made your life very hard indeed. It is highly likely that over a prolonged period of development with long lived code you will obtain locks in different orders; Deadlock will then occur from time to time in a seemingly random manner.
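One widely used mitigation for the ordering problem, which this document does not prescribe and which is shown here only as a Python sketch, is to always take multiple locks in a single agreed order. The helper name acquire_in_order and the choice of id() as the ordering key are assumptions made for the example.

```python
import threading
from contextlib import ExitStack


def acquire_in_order(*locks: threading.Lock) -> ExitStack:
    """Acquire every lock in one fixed order and return a releasing context.

    Sorting by id() gives all callers in this process the same order, so two
    threads can never each hold one lock while waiting for the other.
    """
    with ExitStack() as stack:
        for lock in sorted(locks, key=id):
            stack.enter_context(lock)
        # Hand ownership of the acquired locks to the caller.
        return stack.pop_all()
```

It would be used as 'with acquire_in_order(lock_a, lock_b):' regardless of the order in which the caller names the locks. This reduces one source of deadlock; it does nothing about the reference hazards or the general difficulty described above.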

Avoid MutexLocks of data if at all possible. If you have to use MutexLocks, employ pure with rigour. Your mindset has to see modification of a MutexLocked variable as a very rare and very significant event!

When estimating work, multiply your estimate by 2 if you have to do any multi-threaded work. If the data structures are shared, multiply your estimate by 5. If there are multiple data structures being shared, multiply by 10 and expect live operational issues. However hard you think the development will be - it will turn out to be much harder to do correctly with multiple threads!

If you never use this class consider that a major achievement! You have saved yourself a whole world of hate.

Conclusion

It is hoped that this longer example, which has some meaningful functionality and shows different approaches to development (in general and with EK9 specifically), will provide you with some concrete snips of code. It shows different techniques that you may not have employed before and demonstrates that the EK9 language can be used for CLI development.

Next Steps

If you are interested in networking with EK9 then the next section on network types should be of interest.