Thursday, May 2, 2013

The Linguistic Approach to System Description

You are doing a software project. How should you structure its documentation? What guiding principles should you use for creating and organizing it? Should you include project-planning documents in it?

I propose the "Linguistic Paradigm for System Description" here. It may have been proposed before, but probably not in exactly the same form. It is a tool for thinking not only about documentation, but about the structure of "systems" in general.

Note that we have computer applications which we often call "systems". Then we have project plans (documents, models) that help us create such systems in an orderly manner. But a project plan can also be seen as a system on its own. It has components that relate to each other, rules for its actors to follow, and conditions and events that trigger further actions. Executing a well-defined project plan is really executing a program.

I focus here on system descriptions in general, whether those systems be computer programs or procedures and plans for creating them.

Before getting too philosophical, here's the structure of documentation I advocate:

 1. Syntax
 2. Semantics
 3. Interpreter
 4. Meta

And now the explanation and purpose of each:


Syntax

A computer system is a "smart system" that helps us in some way. Because it is 'smart', we are able to control it via some kind of language. What kind of language? What primitives and command sequences can we use to communicate with it? Describing that means describing the SYNTAX of the language that controls the system.

For a graphical application (aren't they all?) this would mean describing its GUI controls and dialogs. In what sequence can they be exercised? As an example, to choose an item from a menu, you first need to click something else to get the menu to pop up. Thus we can see that a user-interface defines a SYNTAX for how you can interact with the system.

Therefore the SYNTAX section of our documentation is there to describe how users can and will INTERACT with the system. It is important to describe this 'boundary' of the system separately from what is inside it, so that the description does not depend too much on how the system is implemented.
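
The menu example can be sketched as a small state machine: the SYNTAX of the interface is simply the set of action sequences it accepts. This is only an illustration under my own assumptions, not code from any real GUI toolkit; the names `MenuSyntax`, `Action` and `accepts` are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: the "syntax" of a tiny menu UI as a state machine. */
final class MenuSyntax {
    /** The primitive actions a user can perform (assumed for illustration). */
    enum Action { OPEN_MENU, CHOOSE_ITEM, CLOSE_MENU }

    private enum State { MENU_CLOSED, MENU_OPEN }

    /** Returns true if every action is legal in the state where it occurs. */
    static boolean accepts(List<Action> sequence) {
        State state = State.MENU_CLOSED;
        for (Action action : sequence) {
            switch (state) {
                case MENU_CLOSED:
                    // While the menu is closed, the only legal action is opening it.
                    if (action != Action.OPEN_MENU) return false;
                    state = State.MENU_OPEN;
                    break;
                case MENU_OPEN:
                    // Choosing an item or closing the menu both return to the closed state.
                    if (action == Action.OPEN_MENU) return false;
                    state = State.MENU_CLOSED;
                    break;
            }
        }
        return true;
    }
}
```

With this grammar the sequence OPEN_MENU, CHOOSE_ITEM is accepted, while CHOOSE_ITEM alone is rejected because no menu is open yet. Note that the sketch says nothing about what choosing an item does; that is a matter of semantics.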


Semantics

The actions that users can perform on a GUI or on a command line have some MEANING, called their SEMANTICS. That means (pun intended) what those actions cause. What does the user hope to accomplish with them? What is the user's intention when activating certain UI controls?

For the user to hope to accomplish something by some action, they need a "mental model" of the concepts they are manipulating by their actions. That mental model, together with the available actions on it and their expected results, CREATES the meaning, the semantics, of the user actions.

Syntax describes what the user does or can do. Semantics describes why a user would do it.


Interpreter

So we have a language, described by both its syntax and its semantics. But who understands that language? The part of the system that reacts to the user interactions, implemented as code, is the part that 'understands' it. We call it the INTERPRETER.

We use the term interpreter here in a more general sense than the parser/lexer/interpreter/compiler of computer science. Systems INTERPRET the messages they receive BY REACTING to them.

Think of calling a function or procedure as a linguistic act. Executing the call transforms it into another form, consisting of calls to other functions. Thus executing a computer program can be seen as a continuous, recursive process of interpretation.

The end result of interpretation must be some way of arriving at the "meaning" of the commands used by the user. The system, however, does not need to produce some other final representation of that meaning. The meaning of the commands is really what they do: how they are executed and what their effect is.

Thus, meaning arises from the fact that the system reacts in a specific way to user inputs, and that the user expects it to react that way. The part of the system that produces these reactions is the code that reacts to the inputs. In our paradigm we call that code the 'interpreter'.

In summary, the meaning of user actions is defined by their effects and results.
  • Syntax    =  What actions the user can perform
  • Semantics =  What effects those actions have
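
To make the three facets concrete, here is a minimal sketch of a system that interprets commands by reacting to them. The to-do list, its three commands and the class name `TodoInterpreter` are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a to-do list that "interprets" commands by reacting to them. */
final class TodoInterpreter {
    // The mental model the user manipulates: an ordered list of items.
    private final List<String> items = new ArrayList<>();

    /**
     * SYNTAX (assumed for illustration): "add <text>", "remove <text>" or "clear".
     * SEMANTICS: the effect each command has on the list.
     * Returns false when the command is not part of the language.
     */
    boolean interpret(String command) {
        String trimmed = command.trim();
        if (trimmed.equals("clear")) {
            items.clear();
            return true;
        }
        if (trimmed.startsWith("add ")) {
            items.add(trimmed.substring(4).trim());
            return true;
        }
        if (trimmed.startsWith("remove ")) {
            // Removes the first matching item; does nothing if there is none.
            items.remove(trimmed.substring(7).trim());
            return true;
        }
        return false; // not a sentence of the command language
    }

    /** The observable result of all commands interpreted so far (a copy). */
    List<String> items() {
        return new ArrayList<>(items);
    }
}
```

The system never produces a separate representation of what "add milk" means. Its meaning is the fact that the list afterwards contains "milk", and that the user expected exactly that.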


Meta

You've gone through three out of four sections of the documentation. But nobody has even told you why the system exists at all. What are its benefits?

Maybe you can infer some of those benefits by having understood what a user can do with the system (SYNTAX), and how the system will react (SEMANTICS). But shouldn't we also tell WHY the system was created? Yes. But not in the first 3 sections. Why not? Because the REASON the system was built is not PART of the system. But describing why our system exists is relevant for understanding it. Therefore that is explained in the META section of the documentation.

The META section is information "about" the system, such as why and how the documentation was created, which means describing why the system was created in the first place. It includes project plans, procedures, methodology, history, personnel, cost-benefit analyses, etc.

Our purpose here is to come up with a rationale as to what information should be put into each section of documentation. Their order does not matter so much - except to make clear that the META section differs from the others on a conceptual level. The META section is not a 'blueprint' of one part of the system. The system does not have a PART called 'meta'.

Meta is information about the system, not part of it. The other three sections, SYNTAX, SEMANTICS and INTERPRETER, in contrast are all "blueprints" of the system.

Recursive System Descriptions

One thing to note about the above way to describe and document systems is that it can be applied recursively, on multiple levels of the system. The INTERPRETER is the part of the system where most of its work gets done. It is typically implemented as a set of interacting software modules.

But each such module can be described as a system of its own, with its SYNTAX, SEMANTICS, INTERPRETER and META. The SYNTAX of a software module describes its 'methods' and the data structures they consume and produce. Its SEMANTICS is described by telling, for each method, how its results relate to its arguments and what side effects it has. The private sub-modules inside a module are its INTERPRETER.
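
As a sketch of how this looks at the module level, consider a Java interface with one implementation. The `WordCounter` module below is hypothetical and chosen only because it is small: the method signature plays the role of SYNTAX, the comment on it states the SEMANTICS, and the implementing class is one possible INTERPRETER.

```java
/**
 * SYNTAX and SEMANTICS of a hypothetical module: the method signature is the
 * syntax, and the comment describing results and side effects is the semantics.
 */
interface WordCounter {
    /**
     * Returns the number of whitespace-separated words in {@code text}.
     * Has no side effects; blank input yields 0.
     */
    int count(String text);
}

/** INTERPRETER: one possible implementation, hidden behind the interface. */
final class SplittingWordCounter implements WordCounter {
    @Override
    public int count(String text) {
        String trimmed = text.trim();
        // Splitting an empty string would yield one empty element, so handle it first.
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}
```

A caller who holds only a `WordCounter` reference depends on the first two facets and never sees the third. META information, such as why the module exists or who maintains it, appears nowhere in the code itself.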

© 2013 Panu Viljamaa 


  1. Hi, Panu,

    This seems to approach documentation from the wrong angle. Document *what*? If we're documenting requirements, then I think the right tool is use cases. They're a good way to create a framework to document the discussions that take place around requirements. If it's to document the architecture, I'd use patterns — but I realize there are lots of other things that work.

    The tool is not the thing. Documents are a tool. As Japanese temple builders build their own tools, so should software teams. Maybe language-based documentation is one tool in one's toolkit, but I think it's only one tool. It would be more useful to better contextualize its advantages and liabilities. How would you apply your taxonomy to a ballistic missile delivery control system? I think you have a narrower context than you realize.

    I never adopt any technique unless its advocate offers a good comparative critique of it.

    The ball's in your court.

  2. Thanks for your comment Cope.

    My blog-post is not a concrete suggestion that all actual documentation should be neatly divided into the 4 sections described. It is more a viewpoint, a way to think about documentation in general.

    What are the common characteristics of (computer) SYSTEMS? How should we describe them? The viewpoint can also serve as a checklist for documentation: where do I find the listed facets of SYNTAX, SEMANTICS, INTERPRETER and META? Are those facets spread all over the documentation, or is it easy to know where to find them? Are they all present in some form?

    In practice we use things like JavaDoc for documenting Java method APIs. There it makes sense that both the SYNTAX (= type-signature) and SEMANTICS (what the method does) are described in the same place. But that is probably because a single method or class is a very small system. For a larger system like a Java package it may make more sense to divide the documentation into what the package does in general, and separately what the detailed interfaces for interacting with it are.

    Methodologically I believe it is important to make sure the INTERFACE (= Syntax + Semantics) is described separately from the IMPLEMENTATION (= Interpreter). That is naturally the idea behind Java (etc.) interfaces also. An interface is a contract that should be documented separately from how the parties involved actually fulfill their obligations.

    Use-Cases describe how users interact with the system: what actions they can take. That describes the interaction, the language used between users and the system. What those actions MEAN is usually also documented in the Use-Case. As with Java APIs, it may make sense that both the syntax and the semantics are described in the same Use-Case.

    If we choose Use-Cases as the tool, we should make clear to its users whether we expect use-cases to describe only the concrete form of user-communication (SYNTAX) or also the semantics of the changing state inside the system boundary.

    What my article perhaps was not quite clear about is that I don't really advocate that documentation for every SW system should consist of 4 chapters which I call SYNTAX, SEMANTICS, INTERPRETER and META. What I'm stating is that ALL those 4 facets (or 'aspects') of any system should be documented, and it should be easy to verify that, and find where each of them is documented.

    The "Linguistic Viewpoint" emphasizes that the main thing about "systems" is that we need to communicate with them, in some medium or 'language'. It's good to pay attention to both the syntax and semantics of user-interaction and keep such documentation separate from things that are "about" the system, not part of the system as it exists when released.

    In summary the 4 facets I described are the ones I would like to (easily) find in any system's documentation, and at the moment I don't see any part of documentation that would not fit into one of these 4 categories. A critical part of a system-description is naturally to document how the different aspects relate to each other and why. Therefore we can not completely separate the presentation of the 4 facets from each other.

    Ballistic Missile Control System:

    There needs to be a language with a specific syntax the operators use to interact with it. It is especially important in such a critical system to pay attention to that language, so we can be sure it is not prone to user-errors or mis-interpretations.

    It is critical to be clear about the side-effects of pressing the red button.

    It is critical that we describe and understand how the system implements/interprets the commands it gets from its users.

    The META information includes things like its version-control system, so we can go back and track bugs as they are accidentally added to the system.

  3. A note: Even JavaDoc documents methods by first presenting their type-signature (= SYNTAX).

    The method-comment under the type-signature tells us the SEMANTICS of what the method does, why you would use it.

    There is some META-information perhaps, like examples, or description of performance characteristics. Could be last-modified-date.

    There is no 'picture' of the method implementation in JavaDoc. But there COULD be a hyperlink to its source-code (= INTERPRETER)