Automated Testing


Automated Testing | 21 Nov 2013 09:46 pm

Author’s Note: I was very edified when none other than David Heinemeier Hansson, the creator of Rails, wrote a blog post expressing sentiments similar to, albeit more general than, those I present here with respect to the general inappropriateness of low-level unit testing. I feel he has said more boldly what I have here intimated timidly.

I am a strong proponent of automated testing and test driven development. But if asked whether I do unit testing versus integration or some higher level of testing, I will usually ask the questioner to define “unit.”

To some, this may seem like questioning the definition of “is” but I don’t think so. Consider a set of unit tests that test a single class, the stereotypical case. Let us assume the class was somewhat complex and there were fifty tests to exercise all of the code in the class. Later the class gets refactored into a facade class, supporting the interface of the original class, and a small set of simpler classes behind it that work together to do the work of the original class.

Should we now write unit tests for each of these new classes? Why? What is the ROI? If the original set sufficiently tested the original class, and the refactoring was just that, a change in code structure that did not modify its behavior, do they not now sufficiently test the classes as a group?
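To make this concrete, here is a minimal sketch (in JavaScript, with hypothetical names) of a test pinned to the class’s public interface. Because it touches nothing behind that interface, it passes identically before and after the facade refactoring.

describe('InvoiceCalculator', function() {
  it('applies the bulk discount above 100 units', function() {
    // Exercises only the public API; whether one class or five
    // collaborators compute this behind the facade is invisible here.
    var calculator = new InvoiceCalculator();
    var total = calculator.totalFor({ unitPrice: 2, quantity: 150 });
    assert.equal(total, 270); // 10% bulk discount, assumed for illustration
  });
});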

Indeed, in my experience, even if we started out with a set of classes that work together to perform a business function†, the best value in testing is still to write tests that verify the business function is performed correctly. Such tests will always be valid as long as the business function, i.e. the functional specification of the code, does not change. Tests below this level, in my experience, are fragile and break upon refactoring because they are too closely tied to the implementation of the code under test. If the value of testing is to enable refactoring with confidence, then the tests must survive the refactoring.

How does this fit in with TDD? In TDD we should never write code unless we have a failing test. If each test expresses a detail of the functional requirements of the software (as opposed to a detail of its implementation), then as each test passes, a functional requirement is met; we refactor and move on. It should not matter if we wrote one line, one class or a dozen classes to make the test pass.

Some may argue that writing comprehensive tests at this higher level of abstraction is too difficult. Rather, one should write general tests at that level. One might, for example, assert that a value is returned. But lower-level tests should be written to assert that the correct value is returned for every edge case.

This can sometimes be true. Sometimes tests for edge cases at higher levels of abstraction are harder to set up than the effort is worth, and a lower-level test, even if fragile, gives better ROI. However, in my experience, in the general case, what makes testing the edge cases difficult at the higher level is usually the same thing that makes any testing difficult: bad design, inexperience with testing, bad tooling, or a combination thereof. Even granting that higher-level tests are objectively harder to write, if one practices writing harder tests, it eventually gets easy and one becomes a better tester than one otherwise would have been.

In the end, for each business function, there needs to be an API whose implementation is responsible for performing that function, and that implementation needs to be testable within one or more contexts (a given system state that can be mocked or otherwise simulated). Whether the implementation is a single function, a class or an entire module is not the relevant concern. The concern is what the inputs and outputs are, and testing that all anticipated inputs lead to their correct outputs.
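One way to picture this is a table-driven test (a sketch only; shippingCostFor and its prices are invented for illustration): the table pins each anticipated input to its correct output, and says nothing about what sits behind the API.

// Sketch: the input/output contract is the whole test. The
// implementation behind shippingCostFor() is free to be one function
// or a dozen classes without touching these assertions.
var cases = [
  { order: { weightKg: 1,  express: false }, expected: 5.00  },
  { order: { weightKg: 10, express: false }, expected: 12.50 },
  { order: { weightKg: 1,  express: true  }, expected: 15.00 }
];
cases.forEach(function(c) {
  it('costs ' + c.expected + ' for ' + JSON.stringify(c.order), function() {
    assert.equal(shippingCostFor(c.order), c.expected);
  });
});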

Yes, we want to write our tests at the lowest level possible within this context. But we do not want to go below this level. We do not want to be testing components that are simply implementation details of the software’s functionality. Such tests break under refactoring, lead to higher maintenance costs, rarely add value and hence have poor ROI.

There is an exception. For teams or developers new to TDD, and to writing well-designed code in general, lower-level tests can provide value. Writing lower-level tests is easier. More importantly, being forced to make the lower-level components testable helps one to learn good design. It enforces loose coupling, proper abstractions and the like. However, once these skills are internalized, they can be exercised without needing to write tests to enforce them. These tests are a learning tool that can and should be discarded.

There is a corollary to this. If a developer doesn’t stop writing these low-level tests once he no longer needs them, if he doesn’t instead start writing tests at the business-functional level, it is entirely possible to develop a system that is fully “tested” but fails to do the right thing. Every low-level unit can work as intended but in aggregate fail to work together as intended. One needs tests that assert that the system as a whole, or meaningful segments of it, perform as intended.

I will conclude by admitting that I have not truly answered the question I started out with. How do we correctly define a unit? I have asserted that the best definition of a unit is “the code that implements the lowest-level business function.” In short, we need to be able to discern the boundary between business function and implementation detail. Pointers on how to do this shall perhaps be the topic of another post. For now I will only say that finding the level of test abstraction that will maximize ROI is as much an art, learned from experience, as it is anything else. But one will never develop the art unless one first realizes it is to be sought after. And challenging those who have not already done so to start looking is the real point of this post.

† Throughout this discussion, I am using the term “business functionality” loosely to refer to what the software is supposed to do conceptually, the details of its functional specification as distinct from details of the implementation of that specification. The term “business” itself may not be properly applicable to all real world cases.

Automated Testing & Web Development | 17 Nov 2013 09:28 pm

Background

I have been working with our UI team recently to help them do better testing of their Backbone.js based single-page web application. We found it useful to bring in Squire.js to assist us in doing dependency injection into our many Require.js modules. Squire works quite well for this but invariably when writing these sorts of apps, you need to pull in libraries that are not AMD compliant at all or are simply “AMD aware.” When these sorts of modules enter the mix, Squire needs a little help.

jQuery is a great example of this sort of library. Recent versions are AMD aware, and include a define() call. Unlike a true AMD module, though, jQuery’s functionality is not fully encapsulated within the factory function provided to define. Indeed, none of jQuery’s initialization is handled in its factory function. Rather jQuery initializes upon load, just like any legacy JavaScript module. jQuery must do this in order to remain compatible with the millions of lines of non-AMD code that use it.

The Problem

This presents a problem when using Squire. In order to supply alternate versions of AMD modules to the module under test, Squire creates a new Require.js context in which to load the module under test and its dependencies. Each new Require.js context in turn loads afresh all the JavaScript files that are needed by that context. If all of these files are AMD modules, whose state is fully encapsulated within their factory functions, and which only initialize when told to do so, then everything is fine. In the case of jQuery, or other non-AMD modules, which initialize upon load and store state in the global space, this can be a problem.
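For readers unfamiliar with Require.js contexts, the mechanism looks roughly like this (a simplified sketch, not Squire’s actual source):

// Each require.config() call with a distinct 'context' name returns a
// require function scoped to an isolated context, and each context
// fetches and evaluates its own copies of the scripts it needs.
var contextA = require.config({ context: 'test-a', baseUrl: 'js' });
var contextB = require.config({ context: 'test-b', baseUrl: 'js' });

contextA(['vendor/jquery'], function($a) { /* one jQuery instance... */ });
contextB(['vendor/jquery'], function($b) { /* ...and a second, separate one */ });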

Consider this simple example. Two separate tests use Squire to load jQuery and the jQuery.BlockUI plug-in. Depending on timing details between your browser and your web server, both jQuery instances may load first, followed by both plugin instances, or they may load interleaved: jQuery, plugin, jQuery, plugin. The latter will work out well; the former (and in our experience the most typical case) will not. In the former case, because of the shared global namespace, the second jQuery module loaded is the one that the global jQuery and $ variables point to when both of the BlockUI plug-ins load. Because of this, they both plug into the second jQuery instance, leaving the first one plug-in free. For non-AMD modules that access jQuery from the global $ variable, this is not a problem: the instance they get has the plugin. For AMD modules that are handed a jQuery instance as an argument to their factory function, the context that loaded the first jQuery instance is stuck with that instance, which did not get its plug-in. This leads to a lot of failing tests.

The Solution

At first pass, it may seem that the solution is to somehow ensure the load order or otherwise ensure that both jQuery instances get their plug-in. That may be a theoretical ideal, but most non-AMD libraries were never designed to have multiple instances loaded and running, and doing so can cause all kinds of problems. jQuery, because it supports loading multiple versions of itself at the same time, actually handles this better than most. Nonetheless, the best solution is to simply avoid loading multiple instances of non-AMD libraries. The question is how to do this in Squire.

As best we can tell, Squire does not explicitly support this. However, there is a simple workaround that can be put in place to enable it. The trick is to require jQuery, any plug-ins, and any other non-AMD modules that may be loading twice, at the same time as you require Squire itself. Then for each of these libraries, tell Squire to mock the module and provide Squire with the initial instance as the mock. For modules that don’t return anything when invoked by Require (the BlockUI plug-in in our case), Squire must still be told to mock them, but null can be provided as the value for the mock.

Here is some example code taken from a complete working example on GitHub.

define([
  'vendor/squire/Squire', 
  'data/mock-data', 
  'vendor/jquery', 
  'vendor/jquery.blockui'], function(Squire, mock_data, $) {
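  // Assumes the Mocha + Chai globals (describe, it, before, assert,
  // should) are set up, as in the full working example on GitHub.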
  var injector = new Squire();
  injector.mock('data/real-data', mock_data);
 
  //Our fix to avoid loading jQuery and BlockUI twice
  injector.mock('jquery', function() { return $; });
  injector.mock('vendor/jquery.blockui', null);
 
  injector.require(['app/example-view'], function(View) {
    describe('Testing with Squire only', function() {
      var view = null;
      before(function() {
        view = new View();
      });
      it('$.blockUI should be defined', function() {
        assert.isDefined(view.getBlockUI(), '$.blockUI was undefined in example-view');
      });
      it('the data should be mocked', function() {
        view.getDataType().should.equal('mock');
      });
    });
  });
});

This approach works because, by requiring the modules up front using Require and its default context, we rely on the standard Require logic to ensure the modules only load once. By telling Squire to mock the modules, it will not try to load them but will use the mocks provided: the common instances already loaded by Require.
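The caching behavior we are leaning on can be seen in isolation (a sketch, assuming a paths config that maps 'jquery' to the actual file):

// In Require's default context a module is fetched and evaluated once;
// every subsequent require of it receives the same cached instance.
require(['jquery'], function($first) {
  require(['jquery'], function($second) {
    console.log($first === $second); // true: the same cached instance
  });
});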

In the case of non-AMD libraries that return nothing to the factory function, such as the BlockUI plug-in above, simply requiring it will cause Require to load it. Upon load, the library does its thing (registers itself with jQuery), and that is all that is needed from it. Telling Squire to mock it keeps it from being loaded again in the new context, and because the library doesn’t provide a value, providing null as its mock value to Squire works just fine.

One final item to note is that in defining the mock for jQuery we cannot simply write

injector.mock('jquery', $ );

rather we must do

injector.mock('jquery', function() { return $; });

The reason for this is that, contrary to the Squire documentation, the second argument to mock is not always “the mock itself.” The second argument to mock works just like the final argument to define in Require. It may be an object or a function. If it is a function, then Require presumes it to be a factory function that it will invoke in order to get the mock. Since both jQuery and classes (i.e. constructors) are functions, they must be wrapped in a factory function in order not to be invoked as one.
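The same wrapping applies to any mock that happens to be a function, a constructor for instance (module and constructor names here are hypothetical):

// Wrong: Widget is a function, so it would be invoked as a factory and
// its return value (likely undefined) would be used as the mock.
injector.mock('app/widget', Widget);

// Right: wrap it in a factory that returns the constructor itself.
injector.mock('app/widget', function() { return Widget; });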

.Net Development & Agile Practices & Automated Testing | 07 Dec 2012 10:47 pm

I have been playing with NBehave, which is a Cucumber-like Behavior Driven Development (and testing) framework for .Net. I like it, and it has a lot of capabilities in its latest release, but there is one thing I don’t like. Why should the failure of one post-condition (a “then” clause) cause the entire given-when-then test set to terminate?

To clarify, the NBehave behavior is that if I have some test setup (some “givens”), and “when” I perform my test action, this is usually followed by multiple “then” post-conditions to be tested. NBehave (and, to be fair, I think most BDD frameworks) stops evaluating post-conditions after the first failing one is encountered.

Doesn’t this presume that there is some order dependency between the post-conditions, such that if a prior post-condition failed, the rest of the post-condition tests are invalidated? I see no reason to make this assumption. Even if it is the case in some scenarios that post-conditions have dependencies on each other, in my experience this is not the norm. By terminating the test early, one is simply depriving the developer of additional information that might actually help in resolving the failed condition.
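What I would prefer is behavior along these lines, sketched here in JavaScript for illustration (this is not NBehave code): evaluate every post-condition independently, collect the failures, and report them together.

// Sketch: run every 'then' check regardless of earlier failures, then
// report all failed post-conditions at once instead of just the first.
function checkAllPostConditions(checks) {
  var failures = [];
  checks.forEach(function(check) {
    try {
      check.fn();
    } catch (e) {
      failures.push(check.name + ': ' + e.message);
    }
  });
  if (failures.length > 0) {
    throw new Error('Failed post-conditions:\n  ' + failures.join('\n  '));
  }
}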

Thoughts? Am I missing something?
