Friday, October 16, 2015

Constructing and compiling C# using Roslyn

The optimal path for my current project requires the ability to create and use new classes at runtime. The alternatives all look rather clumsy while this approach looked elegant.


.NET has had some capacity to compile arbitrary code since the beginning but a lot of effort has recently gone in to a new compiler – known as Roslyn. On the face of it there are two ways towards my goal – use the CodeDom which is language agnostic or try out the newer (Microsoft) open source Roslyn technology that is designed to compile C# and VB.NET.


CodeDom and Roslyn are only partly related and after some research it looked like Roslyn was the way ahead as it provides a feature complete compiler for my language of choice C#. The other major plus is that Roslyn runs “in process” while CodeDom runs compiles through another executable with (perhaps) a rather clunky API.


If you fancy going the well tried CodeDom way then MSDN has a good code example towards the bottom of this article.


To add the Roslyn tooling to a project use the NuGet Package Manager option on the Tools menu and Choose “Manage NuGet Packages for Solution…” Select nuget.org as the package source and then search for “Microsoft.CodeAnalysis”. Click on Microsoft.CodeAnalysis.CSharp , click the “Install” button and when offered, accept the license.


The actual development task turned out to be pretty steady.


I started with a class to define the properties of the class to be constructed


internal class ClassProperty {     public string PropertyName { get; set; } = "";     public string DataType { get; set; } = "";     public ClassProperty() { }     public ClassProperty(string propertyName, string dataType)     {         PropertyName = propertyName;         DataType = dataType;     } }
And supplied a populated List<ClassProperty> to a new class that I was going to use to construct and compile the code. In fact I created a little method to loop through the fields of a database table and populate the list (as that was the direction I was going in and I am lazy) but a List can be quickly populated with just a few lines of code.


The new class also had properties exposed for a namespace, class name, using directives, inheritance and interfaces. (The latter two are part implemented in the following code in that they are added to the class declaration but any related methods remain unimplemented at this stage.)


I backed up the property creation by creating a Dictionary of default values and you can see how it is applied after this.


private Dictionary<string, string> defaultValues = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {     ["string"] = " = String.Empty;",     ["Int64"] = " = 0;",     ["Int32"] = " = 0;",     ["Int16"] = " = 0;",     ["DateTime"] = " = new DateTime(1970,1,1);",     ["double"] = " = 0;",     ["single"] = " = 0;",     ["Decimal"] = " = 0;",     ["bool"] = " = false;",     ["Boolean"] = " = false;",     ["int"] = " = 0;",     ["long"] = " = 0;" };

The method to create the code is pretty short.
private string buildClassCode() {     StringBuilder sb = new StringBuilder();     foreach(string uses in usings)     {         sb.AppendLine("using " + uses + ";");     }     sb.AppendLine("namespace " + nameSpace);     sb.AppendLine("{"); // start namespace     string classInherits = (inherits.Length > 0 || interfaces.Count > 0) ? " : " + inherits : "";     foreach(string inface in interfaces)     {         classInherits += (classInherits.Length > 3) ? ", " + inface : inface;     }     sb.AppendLine($"public class {tableName}{classInherits}" );     sb.AppendLine("{"); // start class     sb.AppendLine($"public {tableName}()"); // default constructor     sb.AppendLine("{}");     foreach (ClassProperty newProperty in classProperties)     {         sb.AppendLine($"public {newProperty.DataType} {newProperty.PropertyName} {"{ get; set;}"}{defaultValues[newProperty.DataType]}");     }     sb.AppendLine("}"); // end class     sb.AppendLine("}"); // end namespace     return sb.ToString(); }
I tested this by popping the output into a multi-line textbox. while the code lacks indentation it is perfectly clear as the image below shows



Now to get that code compiled and run an initial test using VS Debug mode to step through the code and watch the results.


private void buildinstance() {     string instance = buildClassCode();     SyntaxTree syntaxTree = CSharpSyntaxTree.ParseText(instance);     // inspect the tree     //SyntaxNode root = syntaxTree.GetRoot();     //foreach (var node in root.DescendantNodes())     //{     //    var x = node.GetText();     //}     string assemblyName = tableName; //Path.GetRandomFileName();     MetadataReference[] references = new MetadataReference[]     {         MetadataReference.CreateFromFile(typeof(object).Assembly.Location),         MetadataReference.CreateFromFile(typeof(Enumerable).Assembly.Location)     };     CSharpCompilation compilation = CSharpCompilation.Create(         assemblyName,         syntaxTrees: new[] { syntaxTree },         references: references,         options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary)); // you can also build as a console app, windows.exe etc     Assembly assembly = null;     var ms = new MemoryStream();     EmitResult eResult = compilation.Emit(ms);     if (eResult.Success)     {                  ms.Seek(0, SeekOrigin.Begin);         assembly = Assembly.Load(ms.ToArray());         Type type = assembly.GetType(nameSpace + "." + tableName);         object newObject = Activator.CreateInstance(type);         // now we can prove that worked by stepping through the code while iterating over the properties         foreach (PropertyInfo propertyInfo in newObject.GetType().GetProperties())         {             string pName = propertyInfo.Name;             string dataType = propertyInfo.PropertyType.Name;             // and test we can assign a value             if (dataType.Equals("String", StringComparison.OrdinalIgnoreCase))             {                 propertyInfo.SetValue(newObject, "foo", null);             }         }     }     else     {         IEnumerable<Diagnostic> failures = eResult.Diagnostics.Where(diagnostic =>             diagnostic.IsWarningAsError || diagnostic.Severity == DiagnosticSeverity.Error);         string msg = "";         foreach (Diagnostic diagnostic in failures)         {             msg+= $"{diagnostic.Id}: {diagnostic.GetMessage()}" + "\r\n";         }         // do something useful with the message     }     ms.Close();     ms.Dispose(); }
Ran first time, although I went back and added the diagnostics you can see just to step throght things as they happened.

The SyntaxTree built a structure for the code elements from the string. The CSharpCompilation object took that syntax tree and compiled the code as C#. The compiled assembly was saved into a memory stream and the compilation results checked for errors. Assuming no errors, the code was loaded into an Assembly object and a new instance of the compiled object (class in this instance) created and inspected.


Next I checked that I could execute a method on the compiled class. I added the following line to the class code creation method:


sb.AppendLine("public string getBar(string foo) {return foo + \" bar\";}");

and a code line to call the method after the object is created in the buildinstance() method tested above


string res = (string)type.InvokeMember("getBar", BindingFlags.Default | BindingFlags.InvokeMethod,     null, newObject, new object[] { "foo" });

which returned the string “foo bar” to the variable res.

Job done.

Addendum


Thought I would check that the default class constructor was executed when the compiled object instance was created. Thought it must be but…


Adjusted the class constructor to set one of the properties and then read that back when looping over the properties. Worked as expected.


The Activator.CreateInstance() method can be called with a parameter list to actuate alternate constructors with their parameters thus:

public static object CreateInstance(Type type, params object[] args)


which probably covers that issue as an Interface can’t contain a constructor but can contain any other methods I want to expose. Clearly, there is a wee bit more code to write to get a practical implementation but it is also clear that the overall approach is very practical.



And Then The Real World


In the real world your compiled class will probably reference one or more System/.NET dll, a custom library or your new class may inherit from another class already defined in the main program assembly.


You can add such references but you need to supply the path to the relevant dll or exe file. Apparently this changed from early Roslyn releases so (like me) you might find many obsolete methods if you Google around the subject. You also need to include the relevant "using" statements in the code itself of course (see above).


In my first stab at this the CSharpCompilation constructor was supplied with a default array of MetadataReference objects. This array can be extended to include whatever is required.

The default location for the main System dlls can be found with this line of code:


var assemblyPath = Path.GetDirectoryName(typeof(object).Assembly.Location);


You can then construct a path to (say) System.dll and Syatem.Data.dll like so:


string sysTm = Path.Combine(assemblyPath, "System.dll"); string sysData = Path.Combine(assemblyPath, "System.Data.dll");

You can similarly construct the path to the classes in your own program as well (assuming they are public):


string mePath = Path.Combine(Application.StartupPath, "MyProg.exe");



The paths can then be used to add the references like so:


MetadataReference[] references = new MetadataReference[] {     MetadataReference.CreateFromFile(typeof(object).Assembly.Location),     MetadataReference.CreateFromFile(typeof(Enumerable).Assembly.Location),     MetadataReference.CreateFromFile(sysTm),     MetadataReference.CreateFromFile(sysData),     MetadataReference.CreateFromFile(mePath) };

And that is how to add those vital references to your Roslyn compiled assembly.

Tuesday, October 13, 2015

Interfaces and Unknown Classes

I wanted to make use of the ExcelReader class I created as a source of data to create class instances.


Generally the classes in question inherit from a base class that also knows how to save itself to a database. I could, of course maintain a list of such classes within my Excel reading code and call an appropriate routine as required. However I had in mind to create something that was generic enough to handle a whole raft of classes – including some created and compiled “on the fly” and thus unknowable to the source code.


The normal method for dealing with a range of classes through a common function would be to specify an interface common to all and then to interact through that interface.



Interfaces (if this is an unfamiliar term)


An interface is a list of public properties and methods (also indexers and events) that should must be exposed by a class that implements the interface. Interfaces themselves can be inherited but contain no actual code – just the signatures (if you like) of the components being specified.


One of the simplest ways of creating an interface is to (VS2015) right click on the class name in the code window and select “Extract Interface” from the “Quick Actions” context menu. Earlier versions of Visual Studio offer a “Refactor” menu option that includes extracting an interface.


Requiring a specific Interface is an excellent way of ensuring that objects calling the functionality of a class you provide supply full support. You can rely upon a method or event handler being available – even if it just a “stub”. IComparable would be a good example of just such an interface.


I have used interfaces in the past for testing alternate algorithms. An interface allowed me to construct a standardised software socket into which I could plug different classes at will. I could thus test performance and function of alternate approaches without having to change the code “consuming” the service/process.


If I come across a great in depth web page on Interfaces I will post a link here but for a couple of helpful examples you can do a lot worse than this C# Station post 


Just about the only thing my target classes have in common is an Update method. This limits the scope of any Interface a bit. The properties could be addressed through reflection and it was a reasonable bet that a class Clone() method could be used to create new class instances. So that was beginning to sound like a workable strategy.



So my interface ended up being defined like this:


interface ObjectClone {      object CloneAsObject();      void UpdateClass(); }

Each class implementing the interface needs then to add a CloneAsObject method alongside an UpdateClass method. The CloneAsObject() method is very simple:


public object CloneAsObject() {     return (MyClassName)this.MemberwiseClone(); }

There is an assumption here that all such classes expose public properties for all of the key data elements.

I defined an ExcelDataLoader class that will use an instance of the ExcelReaderclass to gather data from selected columns in a spreadsheet and use this data to create instances of a class that implements that very simple ObjectClone Interface.



The ExcelDataLoader class needs a local definition and a public property:


private ObjectClone targetClass = null; public ObjectClone TargetClass {     set { targetClass = value; } }
To pass in a new instance of the required class.


Plus I also needed a way of passing the required instructions to the ExcelDataLoader. A simple class to define the required columns and the associated data name and type seemed the thing and that could be as simple as:


public class ExcelColumnData : IComparable {     public string ColumnName { get; set; } = "";     public string DataName { get; set; } = "";     public ExcelColumnData()     {     }     public ExcelColumnData(string ColumnName, string DataName)     {         this.ColumnName = ColumnName;         this.DataName = DataName;     }     public int CompareTo(object objx)     {         ExcelColumnData x = (ExcelColumnData)objx;         int res = this.ColumnName.Length.CompareTo(x.ColumnName.Length);         return (res == 0) ? this.ColumnName.CompareTo(x.ColumnName) : res; // AA after Z     } }

Although you will note that this class implements an interface – Icomparable so that List<T>Sort() can be used to ensure the columns are presented in order.



These column definitions could then be passed as a List<ExcelColumndData> to the ExcelDataLoader class.


The ExcelDataLoader class loadData() method uses an instance of the ExcelReader class to extract some columns of data from a specified sheet and then creates an instance of the class to be created. It then uses reflection to loop through the class properties and if one of these matches a specified data name associated with an Excel data column then the expected data type is checked and an attempt made to cast (or parse) the data extracted from Excel to the required type. If the parse works then the property is set to the excel value.


private int loadData() {     int recCount = 0;     excelReader = new ExcelReader(excelFilepath, sheetName);     List<string> colNames = new List<string>();     if(excelColumns.Count > 1) { excelColumns.Sort(); }     foreach (ExcelColumnData eColData in excelColumns)     {         colNames.Add(eColData.ColumnName);     }     List<List<string>> selCols = excelReader.GetSelectedColumns(colNames, skipFirstRow);     foreach(List<string> excelRow in selCols)     {         ObjectClone newObject = (ObjectClone)targetClass.CloneAsObject();         foreach (PropertyInfo propertyInfo in newObject.GetType().GetProperties())         {             if (propertyInfo.CanWrite)             {                 string propertyName = propertyInfo.Name;                 string dataType = propertyInfo.PropertyType.Name;                 for(var eci=0; eci< excelColumns.Count; eci++)                 {                     if (excelColumns[eci].DataName.Equals(propertyName, StringComparison.OrdinalIgnoreCase))                     {                         // the strategy is to set the Property value if the incoming data is valid for the Property type                         // otherwise the value is left as the class default setting                         if (dataType.Equals("String", StringComparison.OrdinalIgnoreCase))                         {                             propertyInfo.SetValue(newObject, excelRow[eci], null);                         } else if (dataType.Equals("Int64", StringComparison.OrdinalIgnoreCase))                         {                             long intVal = 0;                             if (long.TryParse(excelRow[eci], out intVal))                             {                                 propertyInfo.SetValue(newObject, intVal, null);                             }                         } else if (dataType.Equals("Int32", StringComparison.OrdinalIgnoreCase))                         {                             int intVal = 0;                             if (int.TryParse(excelRow[eci], out intVal))                             {                                 propertyInfo.SetValue(newObject, intVal, null);                             }                         } else if (dataType.Equals("Int16", StringComparison.OrdinalIgnoreCase))                         {                             Int16 intVal = 0;                             if (Int16.TryParse(excelRow[eci], out intVal))                             {                                 propertyInfo.SetValue(newObject, intVal, null);                             }                         } ... etc                          '''                          ...                 }             }         }         newObject.UpdateClass();         recCount++;     }     return recCount; }

And you can pump data from Excel (or anywhere else) to create classes that are anonymous as far as the process is concerned and (in this instance) call a method to retain the data in a database for future use.

Usual caveats as this is prototype (although working) code and my production version will inevitably grow "horns" over time as I deal with edge cases and extra features.

Friday, October 09, 2015

Reading Excel with the Open XML SDK


A while back I posted Look Mum no Office on the subject of the Microsoft Open XML SDK that I had used for writing an Excel file without the need for a copy of Office on the machine running the code. This proved to be very straightforward and there was some excellent samplecode available 


Yesterday I needed to read data from an Excel workbook and decided to write a class that I could use in multiple scenarios to grab data from this source. Using the Open XML SDK looked the way to go as this should work with all Excel files produced by versions after Office 2003.


First use the Package Manager to grab the support libraries


PM> Install-Package DocumentFormat.OpenXml


And then add WindowsBase.dll to the project references (it is in the Framework Assembly list so easy to spot using the “Add Reference” dialog.



My first Excel reading class function was to read the list of Sheet Names in a workbook.

private List<string> getSheetNames() {     List<string> sheetNames = new List<string>();     try     {         var document = SpreadsheetDocument.Open(excelFilePath, false);         var workbookPart = document.WorkbookPart;         var sheets = workbookPart.Workbook.Descendants<Sheet>();         foreach(Sheet sheet in sheets)         {             sheetNames.Add(sheet.Name);         }         document.Close();     }     catch (Exception ex)     {         ...     }     return sheetNames; }

This was a good indicator that this process was going to be pretty steady although there were a couple of twists coming up (with all making sense when thought about).


I wanted my class to have two further main features. One to take a sample of rows from a specified sheet and another to extract defined columns from a specified sheet. As both cover similar ground, let’s look at the development of the sample row extract.



The process is 1. Identify the worksheet, 2. Grab the sheet, 3. Extract the sheet data and then 4 Loop through the Rows in the sheet data. Within each row we can loop through the cells and read the string content.


private List<List<string>> initialGetSampleRows() {     List<List<string>> samples = new List<List<string>>();     try     {         int selCount = 0;         var document = SpreadsheetDocument.Open(excelFilePath, false);         var workbookPart = document.WorkbookPart;         var sheets = workbookPart.Workbook.Descendants<Sheet>();         foreach (Sheet sheet in sheets)         {             if (sheet.Name == sheetName)             {                 var workSheet = ((WorksheetPart)workbookPart.GetPartById(sheet.Id)).Worksheet;                 var sheetData = workSheet.Elements<SheetData>().First();                 List<Row> rows = sheetData.Elements<Row>().ToList();                 foreach (Row row in rows)                 {                     var thisRow = new List<string>();                     foreach (Cell cell in row.Descendants<Cell>())                     {                         var cellContent = cell.CellValue;                         thisRow.Add((cellContent == null) ? cell.InnerText : cellContent.Text);                     }                         samples.Add(thisRow);                     selCount++;                     if (selCount >= sampleRowCount)                     {                         break;                     }                 }                 break;             }         }         document.Close();     }     catch (Exception ex)     {         ...     }     return samples; }

Does it work? Well there are a few snags. Dates are (perhaps not unexpectedly) showing as a number, numbers themselves are fine but cells with a string content are being read as an integer of some sort.



Turns out that strings are held in a shared string table so we need to change the code to read that table when required based on the cell DataType ("s"). 


foreach (Cell cell in row.Descendants<Cell>()) {     var cellContent = cell.CellValue;     var cellText = (cellContent == null) ? cell.InnerText : cellContent.Text;     var dataType = cell.DataType ?? CellValues.Error;     if (dataType == CellValues.SharedString)     {         cellText = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(Convert.ToInt32(cell.CellValue.Text)).InnerText;     }     thisRow.Add(cellText); }

Excel dates are stored as an integer number of days since 31st December 1899 – so the Excel date for 1st January 1900 is 1. There is a snag inherited for compatibility reasons from Lotus – where the year 1900 was assumed not to be a leap year when in fact (unlike the year 2000) it was. So the numbers are almost always out by 1. Dates are represented as human sensible dates on an Excel spreadsheet by using a “style”. It is possible to detect a style value for a cell although this is normally null.

Something like:


uint styleIndex = cell.StyleIndex ?? uint.MinValue; if (styleIndex == 3) {     Double serialDate;     if (Double.TryParse(cellText, out serialDate))     {         DateTime cellDate = DateTime.FromOADate(serialDate);         cellText = cellDate.ToString(dateFormat);     } }

Where we can use the inbuilt DateTime.FromOADate() function to handle the conversion, applying string formatting to suite.


There is another snag. It would not make any sense for an XML based Excel file to include empty cells. This means that when a row of cells is read the row only contains cells with content. This would present a problem where empty cells appear (perhaps now and again) before columns of cells of interest. Fortunately each cell object contains a CellReference that can be used to extract the Column+Row identity we are familiar with on Excel sheets. This can be used to detect a cell being presented earlier than expected in a row.


One solution (now deprecated - see below) is to create an Enumerator that can manage the detection of missing cells and supply empty ones as required. The class code was re-structured a bit at this point in time as well – partly to make the code units available to functionality to be added to the class later.



The Enumerator looks like this:


private IEnumerator<Cell> excelCellEnumerator(Row row) {     int currentCount = 0;     foreach (Cell cell in row.Descendants<Cell>())     {         string columnName = columnNameFromReference(cell.CellReference);         int thisColIndex = convertColumnNameToNumber(columnName);         while(currentCount < thisColIndex)         {             var emptyCell = new Cell()             {                 DataType = null,                 CellValue = new CellValue(string.Empty)             };             yield return emptyCell;             currentCount++;         }         yield return cell;         currentCount++;     } }

With support from:


private string columnNameFromReference(string cellReference) {     var regex = new Regex("[A-Za-z]+");     var match = regex.Match(cellReference);     return match.Value; } private int convertColumnNameToNumber(string columnName) {     char[] colAlphas = columnName.ToCharArray();     Array.Reverse(colAlphas);     int colNumber = 0;     for (int i = 0; i < colAlphas.Length; i++)     {         char alpha = colAlphas[i];         int tAlpha = i == 0 ? alpha - 65 : alpha - 64;         colNumber += tAlpha * (int)Math.Pow(26, i);     }     return colNumber; }

And with the nitty gritty of the sample extract function being changed to use the Enumerator thus:


foreach(Row row in rows) {     var thisRow = new List<string>();     var cellEnumerator = excelCellEnumerator(row);     while (cellEnumerator.MoveNext())     {         var cell = cellEnumerator.Current;         var contentString = readCellContent(cell, workbookPart);         thisRow.Add(contentString);     }     samples.Add(thisRow); }

And the reading of the cell content (all code earlier in the post) being moved to a separate function.

Addendum


A little experimentation indicated that it might also be a good idea to make provision for Boolean values in an Excel sheet. Where Booleans exist they are presented in Excel as TRUE and FALSE but the underlying values are a (reasonable) 1 and 0 respectively. You can detect Boolean values using the cell DataType which (in this instance) would be set to CellValues.Boolean.


There is also a CellValues enum CellValues.Date  but strangely this is not set in any of my sample data so the original technique of inspecting the StyleIndex still stands.



You might however wish to add the following code snippet to deal with Booleans:


if(dataType == CellValues.Boolean) {     int bVal;     if(int.TryParse(cellText, out bVal))     {         if(bVal == 0)         {             cellText = "false";         } else         {             cellText = "true";         }     } }

Parsing the value rather than just casting it may just be paranoia but…



Further Thought


Of course Yield return is sheer flummery in this context – although the Iterator was lots of fun. 

Yield should be reserved for computationally expensive processes where you want to spread the load a bit. In this instance most or all cells are going to be available in the Row object and so we would be better writing a function to return the whole list of cells in one go.


private List<Cell> getExcelRowCells(Row row) {      List<Cell> theCells = new List<Cell>();      int currentCount = 0;      foreach (Cell cell in row.Descendants<Cell>())      {          string columnName = columnNameFromReference(cell.CellReference);          int thisColIndex = convertColumnNameToNumber(columnName);          while (currentCount < thisColIndex)          {              var emptyCell = new Cell()              {                  DataType = null,                  CellValue = new CellValue(string.Empty)              };              theCells.Add(emptyCell);              currentCount++;          }          theCells.Add(cell);          currentCount++;      }      return theCells; }

And then consume the list as before but with conventional List<T> iteration:


foreach (Row row in rows) {     var thisRow = new List<string>();     List<Cell> rowCells = getExcelRowCells(row);     foreach(Cell cell in rowCells)     {         thisRow.Add(readCellContent(cell, workbookPart));     }     samples.Add(thisRow);     selCount++;     if (selCount >= sampleRowCount)     {         break;     } }

Monday, October 05, 2015

C# Memoization

Memoization is a program optimisation technique that stores the results of a computationally expensive function in a cache and then returns values from that cache if the function is called again with the same input(s).


Wikipedia has the term coined by (a personal hero) Donald Michie in 1968 although I can’t find the word in the original paper (dated 1967) [link: https://saltworks.stanford.edu/assets/kr445kk4694.pdf ] so it probably came into being as the idea was applied.


I had previously used the technique in JavaScript but that was a while ago and it took me a few moments to even remember the precise term – let alone work out how to do the trick in C#.


The archetypal example of memoization in JavaScript is a function to calculate Fibonacci numbers and I found this one on http://stackoverflow.com/


var fibonacci = (function () {     var memo = [0, 1];     var fib = function (n) {         var result = memo[n];         if (typeof result !== 'number') {             result = fib(n - 1) + fib(n - 2);             memo[n] = result;         }         return result;     };     return fib; }());


The var fibonacci contains a function that can be used to calculate a result. It will return a cached value for any previously requested calculation and will recursively call itself to calculate a new value – which will often be returned from the cache (well certainly if a sequence of values are requested). 

The cache takes the form of an array (memo[]) and this is wrapped in a “closure” so that it remains accessible to the inner function fib(). 

Closure, recursion and memorization in one function – has all the dangerous signs of an interview question.


I wanted a function in C# to look-up a subset of values previously stored in a database (in this instance identity integers) with a high likelihood that individual values would be repeatedly requested over the run of the process but without being able to sort the requests in any meaningful way. So assuming that database reads are relatively slow, memorization looked like the technique to try.


My requirement is a simple case – I have a function into which I pass a string as a key and the function does a database lookup and retunes a long integer. So let us start there


private long lookUpDimension(string dkey) {     long idVal = 0;     try     {         // code to access the databse and do the lookup     }     return idVal; }

Just a simple function (well after all the database access code is removed for clarity). Point being – it is just an ordinary function – takes a single argument and returns the output.


The class containing that look-up function also has two static functions to work some magic.


private static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function) {     return Memoize(function, new Dictionary<TArg, TResult>()); } private static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function, IDictionary<TArg, TResult> cache) {     return key => cache.ContainsKey(key) ? cache[key] : (cache[key] = function(key)); }

So – what is this Func<TArg, TResult>  stuff?



Func is a predefined Delegate, this one takes a single argument in and outputs a single result. There are more defined (up to 8 inputs) and we could express the second one (2 inputs) thus:



public delegate TResult Func<in T1, in T2, out TResult>(

        T1 arg1,

        T2 arg2

)

But Func<T1, T2, TResult> is syntactic sugar of the right sort.



My code needed to declare something like this



private Func<string, long> memLookup = null;

which declares a delegate instance with a string input and a long integer output

and then the following code needs to be executed:




memLookup = Memoize<string, long>(lookUpDimension);

sometime when the class is being created perhaps or certainly before the memorized function is required. The class then exposes a method like:



public long LookupDimValue(string dkey) {     return memLookup(dkey); }

Because memLookup() now “contains” a memoized version of the original lookUpDimension() function taking a string input and returning a long integer.


One note of caution about the sort of memorized function I have used in this example. If the function fails to locate a database record corresponding to the given key then it returns a value indicating that (a zero). One option might be to then insert a new record but the memorized function will “remember” that the key is not valid and continue to return the original response. The insert would therefore have to be managed from within the memorized function and the relevant value for the new record returned.



Just a thought (added 11/10/15)
What is the simplest C# code to match the JavaScript function we started with?

Dictionary<int, long> memo = new Dictionary<int, long>() { { 0, 0 }, { 1, 1 } };

private long fib(int n) {     long result;     if (!memo.TryGetValue(n, out result))     {         result = fib(n - 1) + fib(n - 2);         memo.Add(n, result);     }     return result; }
Fewer lines but less "functional" as it makes use of a Dictionary object that is external to the function. Memoization would often be a better approach.

If you are wondering if memoization (at least in some form) is an effective technique then try writing a fibonacci function without and give it a run - as the input n increases the response time can be considerable.

And more (added 13/10/2015)

C#6 now allows a different form for initialising a Dictionary

Dictionary<int, long> memo = new Dictionary<int, long> {     [0] = 0,     [1] = 1 };

which is bound to be useful when the assignment can add value (just a thought).