
Should Unit Tests Touch the File System?


Recently there’s been some discussion in the community about a long-held belief regarding unit tests: A unit test should not touch the filesystem. AFAIK, this idea gained currency after Michael Feathers published his rules for unit tests.

It seems to me much of the debate is driven by differing assumptions about word meanings. Fortunately, that rarely happens in our field. Otherwise, we’d be locked in a large number of circular debates for the foreseeable future.

The words in question are:

  • unit
  • touch the filesystem

Unit tests

The term “unit” is overloaded, and by extension, “unit test.” The term “unit” is not well defined, but typically the implication is that it represents a small building block of a larger solution. But how small? There’s no commonly-accepted definition of the scope of a “unit.” It can be anything, really…and that’s okay.

In many organizations, people set up unit tests that are of fairly large scope. The test cases may involve multiple components, depend on live interfaces to external resources, and assert multiple postconditions. A unit test is a test case of the smallest practical scope in that organization, given the organization’s current technical practices and, possibly, limitations inherent in the technology stack.

In the context of this topic, I believe Michael and other practitioners are thinking of something a little more specific than a unit test: something called a microtest.


All the representations of the test automation pyramid that I found in a quick web search label the bottom-most layer of test automation as "unit tests." Contemporary development methods call for very short feedback loops using a very fast build and test process. The term "unit test" may be too broad to express the level of granularity we need at the base of the pyramid to support this kind of workflow. The intent might be clearer if we labeled the base layer of automated tests as "microtests" instead.

A microtest exercises the smallest chunk of code that is practical to test in isolation, using tooling and methods designed to help us isolate code. If I may repeat: using tooling and methods designed to help us isolate code, and not necessarily what is easy to do given the organization’s current-state tooling and methods. It’s very possible some organizations must change their tooling and methods to take advantage of this.

Developers who aren’t accustomed to methods like test-driven development may interpret those words in a way that suggests large-scale unit tests are okay. But a microtest exercises just a single logical path through a single function or method or block in a program. It asserts exactly one postcondition. A microtest does not demonstrate a complete feature. Here’s a trivial example for a FizzBuzz solution in Java using JUnit:

    @Test
    public void itReturnsFizzForTheNumber3() {
        assertEquals("Fizz", fizzbuzz.processNumber(3));
    }
The test case doesn’t verify that the program correctly supports all the specifications of the FizzBuzz problem. It only asserts a single result for a single input value. This is a building block for test-driving a solution. Microtests for “real” solutions are of similar scope to this example. Once the microtests exist, they serve as the first line of defense against regressions.
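To make the "building block" idea concrete, here is a sketch of how several microtests of that same scope accumulate into coverage of the FizzBuzz specifications. The `FizzBuzz` class and method names are hypothetical, and I've used a tiny hand-rolled assertion helper instead of JUnit so the sketch is self-contained; in a real project each line of `main` would be its own JUnit test method like the one above.

```java
// Hypothetical FizzBuzz implementation plus a handful of microtests.
// Plain Java with a minimal assertion helper so the sketch runs
// without a test framework.
public class FizzBuzzMicrotests {

    // The code under test: one small, easily isolated unit.
    static class FizzBuzz {
        String processNumber(int n) {
            if (n % 15 == 0) return "FizzBuzz";
            if (n % 3 == 0) return "Fizz";
            if (n % 5 == 0) return "Buzz";
            return Integer.toString(n);
        }
    }

    static void assertEquals(String expected, String actual) {
        if (!expected.equals(actual))
            throw new AssertionError("expected " + expected + " but got " + actual);
    }

    public static void main(String[] args) {
        FizzBuzz fizzbuzz = new FizzBuzz();
        // Each line stands in for one microtest: one input, one postcondition.
        assertEquals("Fizz", fizzbuzz.processNumber(3));
        assertEquals("Buzz", fizzbuzz.processNumber(5));
        assertEquals("FizzBuzz", fizzbuzz.processNumber(15));
        assertEquals("2", fizzbuzz.processNumber(2));
        System.out.println("all microtests passed");
    }
}
```

No single one of these test cases proves the solution works; together, and only together, they pin down the behavior.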

To serve their dual purpose, microtests have to run very fast. A suite of thousands of microtests has to run to completion in a few seconds. That requirement brings us to the second issue: Touching the filesystem.

Touch the filesystem

The phrase “touch the filesystem” may not express the intent clearly. After all, to run a test suite the system has to load programs into memory. AFAIK that involves “touching” the filesystem. So, what’s the intent here?

I think it isn’t really about filesystems as such. I think the core idea is it’s better if microtests aren’t dependent on data or other fixtures outside the control of each test case. When you have to load a database or a set of files separately from the test suite, the test cases may be fragile. A reliable test case has exactly one reason to fail, and that reason has nothing to do with whether an external database or file has been loaded correctly.

We aren’t usually interested in testing whether Oracle or DB2 or SQL Server is capable of processing a SELECT statement (unless we work for Oracle, IBM, or Microsoft). We’re usually interested in testing whether our application logic deals with the data that comes back from the SELECT operation. If our microtest can fail because of a database issue, then it’s a fragile test case.
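One common way to get that isolation is to put an interface between the application logic and the database, then have the microtest supply a stub that returns canned "SELECT results." The names here (`OrderRepository`, `OrderReport`) are hypothetical, and the assertion is hand-rolled to keep the sketch self-contained:

```java
import java.util.List;

// Sketch: isolating application logic from the DBMS behind an interface.
public class OrderReportTest {

    // The seam: production code implements this against the real database;
    // the microtest supplies a stub instead.
    interface OrderRepository {
        List<Double> amountsForCustomer(String customerId);
    }

    // The application logic under test.
    static class OrderReport {
        private final OrderRepository repository;
        OrderReport(OrderRepository repository) { this.repository = repository; }
        double totalFor(String customerId) {
            return repository.amountsForCustomer(customerId)
                             .stream().mapToDouble(Double::doubleValue).sum();
        }
    }

    public static void main(String[] args) {
        // The stub returns data as if a SELECT had run; no DBMS involved,
        // so the only reason this test can fail is a defect in our logic.
        OrderRepository stub = customerId -> List.of(10.0, 2.5);
        OrderReport report = new OrderReport(stub);
        if (report.totalFor("any-id") != 12.5)
            throw new AssertionError("expected 12.5");
        System.out.println("passed");
    }
}
```

With the stub in place, the test exercises our code, not Oracle's.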

Code isolation

Apart from differing assumptions about context and word-meanings, there’s another factor I’ve observed in the wild that leads programmers to write unit test cases that depend on the filesystem: It isn’t always easy to isolate the code under test.

It’s sometimes quicker to create a small test database or a set of files on the fly than it is to isolate the code from the filesystem altogether. Is that “wrong” or “bad?” I’m not sure it would be meaningful or helpful to make a blanket statement either way. I think the answer depends on what we’re trying to achieve with the test case, and whether it’s safe to rely on the filesystem to the extent the test case depends on it.
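When a test does create files on the fly, the key is that the test case owns the fixture end to end: it creates the file, loads it, and cleans it up, so no externally-managed data can cause a failure. A minimal sketch, assuming a hypothetical line-counting function as the code under test:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: a test that touches the filesystem but controls its own fixture.
public class TempFileFixtureTest {

    // Hypothetical code under test: counts non-empty lines in a file.
    static long countNonEmptyLines(Path file) throws IOException {
        try (var lines = Files.lines(file)) {
            return lines.filter(line -> !line.isBlank()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Arrange: the test creates and owns its fixture.
        Path file = Files.createTempFile("microtest", ".txt");
        try {
            Files.writeString(file, "alpha\n\nbeta\n");
            // Act and assert: one postcondition.
            if (countNonEmptyLines(file) != 2)
                throw new AssertionError("expected 2 non-empty lines");
            System.out.println("passed");
        } finally {
            Files.deleteIfExists(file); // leave no residue behind
        }
    }
}
```

Whether this counts as an acceptable microtest depends, as above, on how fast and how reliable the filesystem access is in practice.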

In-memory data stores

If you create an in-memory database instance, have you obeyed the “rule” about not touching the filesystem? Sure, but is the in-memory database more reliable than a similar instance residing on an external storage medium?

The answer has more to do with the way the database is used in the test case than on whether it literally lives on a disk. As long as the test case is reliable, does it really matter whether the database is in memory? Maybe it helps with performance, but otherwise I’m not sure it matters very much as long as the test case controls the creation of the test data.

If the test case creates and loads the database to guarantee the preconditions are set correctly, then the test case won’t be fragile just because it touches a database. An in-memory database will help with performance; remember that the microtest suite has to run in no more than a few seconds in order to serve its purpose.
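The "test case controls the creation of the test data" idea doesn't even require a real database engine for most microtests. A hand-rolled in-memory fake, built and populated inside the test itself, guarantees the preconditions on every run. The names here (`PriceStore`, `discountedPrice`) are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an in-memory fake store created and loaded by the test itself,
// so the preconditions are guaranteed and nothing external must be loaded.
public class InMemoryStoreTest {

    interface PriceStore {
        Double priceOf(String sku);
    }

    // Logic under test: applies a discount rate to a looked-up price.
    static double discountedPrice(PriceStore store, String sku, double rate) {
        return store.priceOf(sku) * (1.0 - rate);
    }

    public static void main(String[] args) {
        // The test builds its own data store; there is nothing to load
        // separately, and nothing slow between the test and the logic.
        Map<String, Double> data = new HashMap<>();
        data.put("SKU-1", 100.0);
        PriceStore store = data::get;

        if (discountedPrice(store, "SKU-1", 0.25) != 75.0)
            throw new AssertionError("expected 75.0");
        System.out.println("passed");
    }
}
```

An embedded in-memory database gives you the same control with real SQL semantics when the logic under test genuinely needs them.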

I’ve heard people protest that a lightweight database, whether in memory or not, doesn’t support all the constraints that the production enterprise DBMS supports, and therefore any unit tests or microtests that use the lightweight database don’t accurately reflect the behavior we would see in production. I remind them that the purpose of a microtest is not to exercise the full application in its production context. There will be other automated test cases to do that. They live higher on the pyramid than microtests.


When we set aside arguments about how big a “unit” should be, and we all agree that we’re talking about microtests, then it’s easier to come to a consensus about dependency on the filesystem, or on anything else external to the test case itself.


Comments (3)

  1. Junilu Lacar

    Feathers’ take on unit tests is covered early in his book Working Effectively with Legacy Code, otherwise known as the WELC book.

    He gives two main characteristics of good unit tests. First, they have to run fast. Second, they help localize problems. That is, they need to be small enough that you can easily pinpoint where problems are. The work of looking at the failure and determining where along the path from inputs to output a problem occurred should be trivial.

    Michael asserts that unit tests run fast, and if they don’t run fast, they aren’t unit tests. His definition of “fast”: a unit test that runs in 1/10th of a second is a slow unit test. He writes that a test is not a unit test if: 1) It talks to a database. 2) It communicates across a network. 3) It touches the file system. 4) You have to do special things to your environment (such as editing configuration files) to run it.

  2. Jim Vaughan

    This is a well-written article. Thank you for taking the time to publish. I agree with what you say and would like to add that an in-memory database also removes the susceptibility to network issues in addition to performance gains.

  3. typelogic

    If the file system behavior plays a core part of the code being tested, is it correct to mock the behavior of the file system? I am referring to the `inotify` file presence notification in `Poco`. Should we deprive the unit test code of the right to check the asynchronous notification coming from the file system? If we mock this file system notification in our quest for 100% purity, did we not just miss the point of why we are doing this particular unit test?

