Writing Better Tests Than Humans Can Part 2: Model-based Tests with FsCheck in C#
Model Based Testing for C# and F#
This is the second post in a 3-part series on property-and-model based testing in FsCheck in C#.
- Writing Better Tests Than Humans Can Part 1: FsCheck Property Tests in C#
- Writing Better Tests Than Humans Can Part 2: Model-based Tests with FsCheck in C#
Subscribe to get the next as they’re published!
Testing Richer & Multi-Feature Scenarios with Model-Based Testing
A property test is meant to be fairly succinct and verify that simple properties hold true for all possible inputs given a set of preconditions. But what if you need to test a more complicated scenario, like one where the state of a given object is influenced by the previous operations that might have occurred?
Our previous example of the FixedSizedList<T>
or really any type of collection is a good example of an object. The state of that object will be determined by what’s already been added to the collection, its initial state, what’s been removed, etc… So for scenarios like that we’re going to use a model-based test.
In a property-based test we randomize the inputs a particular operation uses - that can be configuration settings, input to a method, and so forth.
But suppose you wanted to test whether or not any valid, arbitrary combination of operations against the same instance of a feature produce a correct result? This is what a model-based test does: it randomizes the operations and their inputs against the same object.
Creating a Model
So rather than come up with a contrived example, I’m going to use a real example of where we used FsCheck to prove the safety of Helios 2.1 - the socket library that powers the remoting and clustering systems inside Akka.NET 1.1. Helios 2.1 is running my personal fork of the DotNetty engine alongside all of its previous 1.* APIs.
DotNetty and Helios 2.x use a duplex “pipeline” model to process events going out to and coming in from a socket. Here’s a visual of how the pipeline model is structured:
This pipeline is asynchronous and can be modified at run-time in a thread-safe manner; but most importantly, we ensure that events flow through the pipeline in the correct manner: top-to-bottom for outbound events (writes) and bottom-to-top for inbound events (reads.)
This type of construct is essential to making our networking system extensible, flexible, and it actually is a key feature of DotNetty’s performance guarantees as well.
Therefore, it was essential that we validate the following things about Helios’ pipeline model:
- It must be possible to create initial stages of the pipeline during setup;
- It must be possible to modify the pipeline once it’s processing events, either by adding new pipeline stages or removing existing ones;
- Any
IChannelHandler
(a stage in theIChannelPipeline
) must handle any events it is explicitly programmed to handle and vice-versa; and - Events must be handled in the correct order by the pipeline.
Given all of the different types of events (12), the three different features (setup, removing, adding), and all of the possible orderings of channel pipelines… Verifying that these four behaviors worked as expected for all of those possible combinations would have been a nearly impossible task to do by hand. Hence why I opted to describe a model for how this pipeline behaves instead.
The approach I chose for modeling a real IChannelPipeline
was to use a linked list structure, since that data type is the simplest representation of the real pipeline itself.
public class PipelineModelNode
{
public NamedChannelHandler Handler;
public string Name;
public PipelineModelNode Next;
public PipelineModelNode Previous;
public override string ToString()
{
// added to make it easier to read
return $"{Handler}";
}
public PipelineModelNode Clone()
{
return new PipelineModelNode {Handler = Handler, Next = Next,
Name = Name, Previous = Previous};
}
}
public class PipelineMutationModel
{
// used by the REAL pipeline to save its events
public Queue<Tuple<string, SupportedEvent>> EventQueue;
public PipelineModelNode Head;
public int Length;
public PipelineModelNode Tail;
public bool Contains(string name)
{
// determines if a named IChannelHandler is present
// in the linked list
}
// used to predict what the event queue will look like for a given event type
public Queue<Tuple<string, SupportedEvent>> PredictedOutcome(SupportedEvent e)
{
var q = new Queue<Tuple<string, SupportedEvent>>();
var current = Head;
while (current != null)
{
if (current.Handler.SupportsEvent(e))
{
q.Enqueue(Tuple.Create(current.Handler.Name, e));
}
current = current.Next;
if (current == null)
{
break;
}
}
return q;
}
public override string ToString()
{
// stringifies the current state of the linked list
// in order to make it easier to debug inside FsCheck
}
public PipelineMutationModel Clone()
{
// creates an immutable copy of the current linked list
}
public static PipelineMutationModel Fresh()
{
// creates a fresh pipeline that mirrors the defaults
// built into the DefaultChannelPipeline
var head = new PipelineModelNode {
Handler = new NamedChannelHandler("HEAD"), Name = "HEAD"};
var tail = new PipelineModelNode {
Handler = new NamedChannelHandler("TAIL"), Name = "TAIL"};
head.Next = tail;
tail.Previous = head;
return new PipelineMutationModel
{
Length = 2,
Head = head,
Tail = tail,
EventQueue = new Queue<Tuple<string, SupportedEvent>>()
};
}
}
I removed most of the implementation details of the model for the sake of brevity, but you can see the full file in its entirety here.
The PipelineModelNode
represents and individual stage in the IChannelPipeline
and is used to create our linked list model of how the pipeline is constructed. The methods on the PipelineMutationModel
are used to traverse this linked list to predict how the real IChannelPipeline
implementation class should perform during each of the operations we’re about to test.
The model is supposed to do two things here:
- Provide a means for reflecting the current state of the
IChannelPipeline
and predicting the next state given some operation; and - Provide some tools to FsCheck for making it easy to clone and print the current structure of the “expected” pipeline, in order to make it easier to debug.
Building an FsCheck Model Based Test
Now that we have a model structure we can work with, we have to actually describe our test and its behavior. We do this using the Machine<TActual, TModel>
class in FsCheck.
public class ChannelPipelineModel
: Machine<IChannelPipeline, PipelineMutationModel> {
public override Arbitrary<Setup<IChannelPipeline, PipelineMutationModel>> Setup
=> throw new NotImplementedException();
public override Gen<Operation<IChannelPipeline, PipelineMutationModel>> Next(
PipelineMutationModel obj0)
{
throw new NotImplementedException();
}
}
To implement an FsCheck Machine<TActual, TModel>
, we just have to implement two methods:
Setup
- this is used to determine what the initial state of the model will be. So for instance, if we were using a model-based test to test aList<int>
, we might have the setup randomly create an empty list or maybe a list with some items in it already.Next
- a model based test is really just a sequence ofOperation<TActual, TModel>
instances that are chosen randomly by FsCheck and this method is responsible for generating the next possible operation.
Implementing the two methods themselves is trivial, as we’ll see later, but the real trick lies in defining our Setup
and Operation
classes.
Model Setups
In this example, our ChannelPipelineModel
uses exactly one Setup<IChannelPipeline, PipelineMutationModel>
implementation:
private class PipelineSetup : Setup<IChannelPipeline, PipelineMutationModel>
{
public override IChannelPipeline Actual()
{
return new DefaultChannelPipeline(TestChannel.Instance);
}
public override PipelineMutationModel Model()
{
return PipelineMutationModel.Fresh();
}
}
All we do here is create a brand new DefaultChannelPipeline
in the Actual
method, where we want to create a new instance of our object under test, and a PipelineMutationModel
in the Model
method - our model designed to predict how the actual IChannelPipeline
implementation will perform.
It’s possible to have multiple types of setup objects in play for any given model based test, but in our case we’re usually better off just starting with something simple.
Model Operations
The majority of the code you write for a model based test will be implementing Operation<TActual, TModel>
classes; each one of these classes describes an instance of performing some sort of work against the object-under-test and being able to validate those results against what the model predicts.
internal class AddFirst : Operation<IChannelPipeline, PipelineMutationModel>{
public override bool Pre(PipelineMutationModel _arg1)
{
throw new NotImplementedException();
}
public override Property Check(IChannelPipeline obj0, PipelineMutationModel obj1)
{
throw new NotImplementedException();
}
public override PipelineMutationModel Run(PipelineMutationModel obj0)
{
throw new NotImplementedException();
}
}
Each Operation<TActual, TModel>
class has two methods that need to be populated in order for an FsCheck spec to be executed:
TModel Run(TModel current)
- using the current model and any input that thisOperation
consumes this step will predict the next output of theTActual
class using the same input.Property Check(TActual currentActual, TModel nextModel)
- using the output of theRun
method, theCheck
method verifies that when that same operation is run againstTActual
that the results predicted by the model match those produced by the object under test.bool Pre(TModel current)
(optional) - this is avirtual
method that can be overridden, and it executes a pre-condition to determine if thisOperation
is valid given the current state ofTModel
. Any operations that don’t returntrue
against their preconditions are skipped but may be repeated again in the future.
One important thing worth mentioning about Operation<TActual, TModel>
classes is that they usually require some sort of input in order to be effective. I find that it’s best to provide that input into the constructor of the Operation
class, because then you can use FsCheck’s Gen
and Arb
generators to create randomized data for each instance of the Operation
(more on Gen
and Arb
later.)
internal class AddFirst : Operation<IChannelPipeline, PipelineMutationModel>{
public readonly NamedChannelHandler Handler;
public AddFirst(NamedChannelHandler handler){
Handler = handler; // input for this operation
}
// rest of implementation...
}
First thing we’re going to do now is describe how this operation will run inside the Run
method:
public override PipelineMutationModel Run(PipelineMutationModel obj0)
{
var newNode = new PipelineModelNode {Handler = Handler, Name = Handler.Name};
var model = AddToHead(obj0, newNode);
return model;
}
The AddToHead
method appends the newNode
, created using the data passed into the constructor of this AddFirst
operation, to the head of the linked list which represents the underlying IChannelPipeline
object.
If the IChannelHandler
at the front of this list after the operation is anything other than AddFirst.Handler
then we know this test is invalid. So let’s check it!
public override Property Check(IChannelPipeline obj0, PipelineMutationModel obj1)
{
var pipeline = obj0.AddFirst(Handler.Name, Handler);
var embeddedChannel = obj0.Channel() as EmbeddedChannel;
Contract.Assert(embeddedChannel != null);
embeddedChannel.RunPendingTasks(); // force the pipeline to run all scheduled tasks
var pFirst = pipeline.Skip(1).First(); //bypass the system default head node (always first)
var mFirst = obj1.Head.Next.Handler; // grab the model head node (model also has system default)
var pLength = pipeline.Count();
Property prop = (pFirst == mFirst)
.Label($"Expected head of pipeline to be {mFirst}, was {pFirst}")
.And(() => pLength == obj1.Length)
.Label($"Expected length of pipeline to be {obj1.Length}, was {pLength}");
return prop;
}
The Check
method executes the IChannelPipeline.AddFirst(IChannelHandler h)
method to add Handler
to the front of the pipeline, which should bring the state of TActual
up to date with the state predicted by TModel
.
For a real model based test, we want to define an Operation<TActual, TModel>
class for each operation we want to test within the same model for a given set of features under test - so we’ll see some ones not included on this document here when it comes time to run this test.
Generating Randomized Data with Gen
and Arb
Before we can run our model-based test, we need to give FsCheck a way of randomizing the following two things:
- The input data used by each
Operation<TActual, TModel>
instance and - The set of
Operation<TActual, TModel>
instances each model-based test run will use.
To do this, we will need to make use of the Gen
(generator) and Arb
(arbitrary) classes in FsCheck.
The difference between Gen
and Arb
? Arb
supports the “shrinking” operation we mentioned earlier, i.e. being able to distill down a failing test case from 1000 steps to four. So we will be using Arb
to generate our Setup<TActual, TModel>
since we can shrink the results applied to it from there. Otherwise, we only need to use Gen
.
Generating User-Defined Random Test Data
If you recall the constructor of our AddFirst
operation, it takes a NamedChannelHandler
as a constructor argument. All of the property based tests we executed in the previous article used built-in CLR types and FsCheck has built-in Gen
classes defined for each of those already. To generate randomized instances of our own types, we’re going to need to create our own Gen
instances.
public class NamedChannelHandler : IChannelHandler{
// implementation methods...
public static Gen<string> CreateName()
{
return Arb.Default.Char().Generator.ArrayOf(30).Select(x => new string(x));
}
public static Gen<NamedChannelHandler> CreateHandler()
{
return CreateName().Select(s => new NamedChannelHandler(s));
}
}
On the NamedChannelHandler
class itself we define a Gen<string>
, which will create a randomly assigned name using FsCheck’s built-in Gen<char>
generator, and then we use the output of that generator to serve as the input for our Gen<NamedChannelHandler>
. FsCheck is written in F# and uses these functional composition techniques to express data generation and randomization very succinctly.
N.B. Don’t use
Random
for generating randomized data with FsCheck. One of the really important and powerful features of FsCheck is that it gradually increases the complexity and size of its random tests over time, and in order to do that effectively it has to manage how data is randomly generated.So always use FsCheck’s built-in
Arb.Default
as a starting point for seeding data you might need for a random test if at all possible.
Now that we have a generator for our NamedChannelHandler
class, we need to also create a generator for our AddFirst
operation class - so we can use it inside ChannelPipelineModel.Next
.
public class AddFirst : Operation<IChannelPipleine, PipelineMutationModel>{
public static Gen<Operation<IChannelPipeline, PipelineMutationModel>> AddFirstGen()
{
return
NamedChannelHandler.AllHandlers()
.Select(x =>
(Operation<IChannelPipeline, PipelineMutationModel>) new AddFirst(x));
}
}
And thus we repeat this process for each possible Operation<TActual, TModel>
supported by the ChannelPipelineModel
and end up with something like this:
public class ChannelPipelineModel
: Machine<IChannelPipeline, PipelineMutationModel> {
public override Arbitrary<Setup<IChannelPipeline, PipelineMutationModel>> Setup
=> throw new NotImplementedException();
public override Gen<Operation<IChannelPipeline, PipelineMutationModel>> Next(
PipelineMutationModel obj0)
{
return Gen.OneOf(AddFirst.AddFirstGen(), AddLast.AddLastGen(),
RemoveFirst.RemoveFirstGen(), RemoveLast.RemoveLastGen(),
ContainsAllModelHandlers.GenContainsAll());
}
}
Great! Now we can randomly generate any of our operations against the model. Last thing we need to do is implement the Setup
method on ChannelPipelineModel
and we’ll be ready to go.
public class ChannelPipelineModel
: Machine<IChannelPipeline, PipelineMutationModel> {
public override Arbitrary<Setup<IChannelPipeline, PipelineMutationModel>> Setup
=> Arb.From(Gen.Fresh(() =>
(Setup<IChannelPipeline, PipelineMutationModel>) new PipelineSetup()));
public override Gen<Operation<IChannelPipeline, PipelineMutationModel>> Next(
PipelineMutationModel obj0)
{
return Gen.OneOf(AddFirst.AddFirstGen(), AddLast.AddLastGen(),
RemoveFirst.RemoveFirstGen(), RemoveLast.RemoveLastGen(),
ContainsAllModelHandlers.GenContainsAll());
}
}
I omitted the vast majority of the code used to write this model-based test for the sake of brevity, so if you had any trouble following this: read the full source code here.
Running the Model-based Spec
To run this model-based test, we need to call it from inside an XUnit test class. I’m going to use the FsCheck.XUnit
package here to make it a little bit easier for me to turn some settings on for this model-based test when we run it in XUnit.
using System.Linq;
using FsCheck;
using FsCheck.Experimental;
using FsCheck.Xunit;
using Helios.Channels;
using Xunit;
public class ChannelPipelineConstructionSpecs
{
public ChannelPipelineConstructionSpecs()
{
Model = new ChannelPipelineModel();
}
[Property(MaxTest = 1000)]
public Property ChannelPipeline_should_obey_mutation_model()
{
return Model.ToProperty();
}
}
ChannelPipeline_should_obey_mutation_model
will automatically be treated by XUnit as a Fact
, and we’re going to execute this model-based test 1000 times according to the configuration I set on the Property
attribute.
Let’s see what happens when I run this.
Falsifiable, after 795 tests (2 shrinks) (StdGen (1640124775,296195319)):
Original: (omitted for brevity)
Ooof. Failed after 795 tests! That’d be a pain in the ass to debug… Let’s see what the shrinked reproduction steps had to say about it.
Well thankfully, FsCheck shrunk this failing test case into 10 steps (omitted for brevity; the output is extremely verbose) and the error message behind the failure is self-explanatory:
Duplciate handler name: {random name}
Aha! We tried to add a handler with a duplicate name to our pipeline, which is illegal! Therefore - our AddFirst
operation needs to have a precondition to screen against that!
internal class AddFirst : Operation<IChannelPipeline, PipelineMutationModel>{
public readonly NamedChannelHandler Handler;
public AddFirst(NamedChannelHandler handler){
Handler = handler; // input for this operation
}
public static Gen<Operation<IChannelPipeline, PipelineMutationModel>> AddFirstGen()
{
return
NamedChannelHandler.AllHandlers()
.Select(x =>
(Operation<IChannelPipeline, PipelineMutationModel>) new AddFirst(x));
}
public override Property Check(IChannelPipeline obj0, PipelineMutationModel obj1)
{
var pipeline = obj0.AddFirst(Handler.Name, Handler);
var embeddedChannel = obj0.Channel() as EmbeddedChannel;
Contract.Assert(embeddedChannel != null);
embeddedChannel.RunPendingTasks(); // force the pipeline to run all scheduled tasks
var pFirst = pipeline.Skip(1).First(); //bypass the system default head node (always first)
var mFirst = obj1.Head.Next.Handler; // grab the model head node (model also has system default)
var pLength = pipeline.Count();
Property prop = (pFirst == mFirst)
.Label($"Expected head of pipeline to be {mFirst}, was {pFirst}")
.And(() => pLength == obj1.Length)
.Label($"Expected length of pipeline to be {obj1.Length}, was {pLength}");
return prop;
}
public override PipelineMutationModel Run(PipelineMutationModel obj0)
{
var newNode = new PipelineModelNode {Handler = Handler, Name = Handler.Name};
var model = AddToHead(obj0, newNode);
return model;
}
public override bool Pre(PipelineMutationModel _arg1)
{
// Can't allow two handlers with the same name to be added
return !_arg1.Contains(Name);
}
}
Now if we run that, our model-based test will pass!
Recap
So what really happened in our model-based test? What did this prove?
The idea behind a model based test is that you have your object under test, the IChannelPipeline
in this instance, and a separate model designed to predict how it will perform. The model-based test proved that for 1000 randomly generated combinations of the operations we designed against the IChannelPipeline
that each of those operations performed exactly as predicted by our model.
If 1000 tests isn’t enough, we can ramp up the number and make it even higher. This model-based test is capable of detecting errors and design faults that no manually-written test would ever be able to catch. That’s the power of random testing.
Learn more about FsCheck here at its official website.
Say tuned for the next and final post in this series: Writing Better Tests Than Humans Can Part 3: Verifying the Safety of Network and Concurrent Code FsCheck in C#.