Documentation and Developer Onboarding

WRITTEN BY Dave Nicolette


Organizations that build and support software have many unique challenges. They also have quite a few common challenges. One of those is how to help new technical team members come up to speed on the technical environment, coding standards, naming conventions, code base, build and deploy tooling, version control standards and conventions, and so forth.

Not long ago, someone posted a question on Twitter to start a discussion:

People replied with a number of suggestions, including:

  • Diagrams
  • Flowcharts (and a tool recommendation)
  • The C4 Model
  • Source comments
  • Commit messages
  • Overview of the code
  • Where to find things
  • How to run the build
  • Detailed explanations of implementations

Notice anything missing from that list? Yeah, so did I:

“The problem with depending on documentation to find your way around a new code base and environment is that documentation tends to drift away from the true state of the code very quickly.”

By the time a diagram is prettied-up and ready to share, it’s like to be out of date; someone committed a bug fix while the diagram was being polished.

This is the cue for snarky observers to jump up and shout, “You can’t depend only on test suites for information!”

So, let’s declare the snarky observers “right,” give them a nice piece of candy as a prize, and leave them behind to look for the word “only” in the preceding material while we move on.

Excessive and detailed diagrams and descriptions are not usually helpful, for a couple of reasons. First, because of the level of detail, it’s almost a certainty that the documentation will be out of date. Second, because of the quantity of such material (assuming detailed information), new team members are unlikely to read it all and even if they try, it’s unlikely they will remember much of it.

What sort of documentation might be useful, then?

When I go into a new environment, I find the following types of documentation to be the most useful for starting to understand what’s going on:

  • A Context Diagram for each major system, or each major component of the system I will be supporting
  • Some sort of lines-and-boxes drawing of the main components of the system, showing which ones touch which other ones.

That’s it. Specific modeling methods aren’t important. Graphical tools aren’t important. The most helpful diagrams are high-level ones.

When it’s time to get down to it and do some work on the code, I’d like to find:

  • Executable test suites at multiple levels of abstraction, properly isolated
  • Descriptive and meaningful commit messages in the version control system
  • If supported for the language, some sort of special source comments that IDEs pick up and show you in context while you’re working, like rdoc (Ruby), javadoc (Java), or documentation comments (C#).

That’s it. I’ll never read reams of offline documentation. I need the help in the moment, so it has to pop up in context in the editor pane of an IDE, or display test results immediately, either inside an IDE or in a command-line window. Other forms of low-level documentation are waste.

What Problems Can Documentation Solve?

We often work with large organizations that have been in operation for decades. It’s common for these organizations to support thousands of production applications running on dozens of different platforms. Of those, there are usually a few that are truly mission-critical. The systems currently in place have been added gradually over a long period of time. As each system was added to the environment, people followed the good practices that were recognized at the time, using the technologies that were available at the time. The result is a mixed bag of hardware, operating systems, programming languages, software update schemes, system provisioning tools, software deployment tools, security tools, testing frameworks, and more.

Some of these are home-grown, some are third-party products. Many of them are old releases, including some that are off support entirely. Over time, people have connected these systems in whatever ways they could think of when they were under delivery pressure. There are numerous point-to-point interfaces that don’t follow any particular standards, where data moves between systems. It’s not uncommon to find only one individual, or no one at all, who understands why a given data transfer occurs or when it was added to the environment. For that reason, there’s fear about removing it, and fear about modifying data formats and the values of data elements.

Typically, no one in the organization knows how a piece of data flows from system to system, from the moment it enters the environment or is created, until the moment it is deleted or archived. Every modification to a system has a ripple effect that causes regressions in multiple other systems. Even if some executable test suites are in place, there’s no practical way to predict or prevent all the unintended consequences of changing a system.

In my experience, this situation is the single largest problem that documentation could help solve. “Documentation” as such, with the few caveats mentioned above, doesn’t really help individual technical staff become productive with their new teams. All the time spent creating detailed diagrams and low-level explanations of application code is time lost to the larger challenge of documenting how all the moving parts of the environment interact with each other.

leave a comment

Leave a comment

Your email address will not be published. Required fields are marked *