PanuLogic Software Development Blog: March 2014

Monday, March 24, 2014

Why Enterprise Java Beans?

I've worked with Enterprise Java Beans (EJBs) for a long time. They are the main building-blocks of Java-based web-applications. Yet I never spent much time thinking about why they were needed. Maybe because it seemed difficult enough just to understand how to use them.

An obvious alternative to EJBs is direct SQL-calls to the database, via JDBC drivers. On the legacy system I was working on, some programmers had used direct JDBC calls and some had used EJBs. What was the benefit of each approach? It seemed the benefit of using JDBC was simplicity. What was it for EJBs?

I. HOW I CAME HERE

We started a project to upgrade an application from JBoss-4 to a newer version. But rather soon this started to seem a bigger project than we had hoped for, because of the differences between the XML-dialects which define the EJBs on JBoss-4 and on later versions.

The XML-definitions should have been backwards compatible of course, but were not. Even though there is an evolving standard for Java EE, every application-server adds its own XML definitions for server-specific configuration of the server itself and of the EJBs that run on it. The standard itself may have evolved in incompatible ways. Or perhaps a newer JBoss version is enforcing it more strictly. This seemed to be the disadvantage of EJBs.

Trying to start the new JBoss with the old application gave obscure error-messages we couldn't make sense of. Remedies for some errors were found on the web, but other errors remained. And new ones seemed to emerge every time one was solved.

It was unclear not only how to get rid of the error-messages, but also how much time it would take to get rid of all of them. And it is bad to start a project whose duration you can't estimate.

A JBoss installation has two main parts: A web-server which on JBoss-4 is Tomcat, and the "JBoss proper". Both of these are large systems on their own. To understand and tune the whole system you also need to understand how they work together.

Would it be possible to run the system JUST on Tomcat? That would cut (at least) in half the area of expertise needed to maintain and port the system from one platform-version to the next, now and in future.

But if we used only Tomcat we couldn't use EJBs. They only run on 'true application servers', like JBoss, WebSphere, GlassFish, etc. They do not run on 'mere web-servers' like Tomcat.

Did we really need the EJBs? They were used heavily by the application in question, about 100 EJBs in total. Would it be possible to get rid of them and replace them with something simpler?

And only then, after working with EJBs for so long, did I start to see their value. But I also started to see they could be replaced with Plain Old Java Objects (POJOs) - IF we understand why EJBs are needed in the first place. So, after this lengthy but necessary introduction, let me try to answer that question.

II. WHAT EJBs DO

There are three main types of EJBs: 1) Session Beans, 2) Entity Beans, and 3) Message-Driven Beans (called Message Beans below). Rather than describe each on its own, it helps if we can first see what they all have in common, before looking at their differences.

What is common to all bean-types? That is the discussion I haven't found in any book or article so far. Which is perhaps the reason it took me so long to understand what they all are about.

Clearly they are all part of the same standard. That is one thing they have in common. They are all programmed in Java AND XML, or alternatively in Java and Java annotations.

But they seem very different in purpose. Session Beans help you deal with user-sessions on the web. Entity Beans help you deal with database interactions. Message Beans help you deal with batch-processing jobs. Is there really any common purpose to them? Were they grouped together simply because we like the word "bean" so much?

Here's the one thing in common about their purpose: All EJBs help implement persistence.

Entity Beans persist your data in the database. Session Beans persist the memory of user actions during a web-session. Message Beans persist tasks the user may initiate during the session, so they can be performed later.

The REASON we need help in persisting things is that HTTP, the protocol of the web, is stateless. It is just a set of requests and responses. Enterprise Java Beans allow you to implement (the three types of) state on top of the HTTP protocol.

Example: You read the user's name from the database when they log in. You may need this info several times during the user's session. But you don't want to query the database every time you need it. Entity Beans cache the information in memory when it is first read from the database. The database is queried only once. That allows your application to serve a much larger number of simultaneous users than it could otherwise.

Session Beans persist information that can be forgotten after the user-session is over, or that is perhaps saved into the database (via an Entity Bean) once the user commits to their purpose.

Message Beans persist information about what tasks need to be performed later. For instance large numbers of users might input a lot of data into the system, which we want to analyze later when the system is less busy. We may want to run the report-generator once a night.

III. IS THERE A BETTER WAY?

As described in the introduction we had a large number of EJBs in the legacy application, and they seemed to make it practically impossible to port the application to the latest JBoss version.

The system contained a large number of interactions with these EJB classes, which could not be ported to Tomcat, because the EJB-classes are dynamically generated by JBoss. The exact XML-instructions for generating such 'Proxy classes' were only understood by JBoss-4, not by later versions of it.

But after analyzing how this particular application used EJBs, we realized it was relatively straightforward, although still tedious, to write our own Java classes which would replicate the functions of the EJBs.

The difference was that instead of writing XML to tell JBoss how to create the bean-classes, we wrote the bean-classes ourselves. That made it trivial to understand what they do, how they talk to the database, and how they persist data during the session to avoid unnecessary querying of the database.

The alternative mentioned in the beginning, direct JDBC calls, is still a possibility. But it doesn't take care of the caching aspect. That might mean every programmer takes care of caching in their own way. If instead we give them a common set of POJO-EJB classes (PJBs?), all team-members can take advantage of the caching that comes with them. They don't need to know about SQL or the database-schema, they just need to be aware of the PJB classes provided.

The system now runs on plain Tomcat, and we hope it will perform better because of it. At least Tomcat starts much faster than JBoss. If there are issues, we can always dive deep into the code to see where it could be optimized. It is much simpler to maintain and debug.

IV. SOME ADVICE

My intention here is not to bash EJBs or "app-servers" but to explain what they do. But once you understand what they do it is relatively easy to write something similar yourself, with Plain Old Java Objects (POJOs).

Newer versions of the EJB standard allow you to replace XML with Java's @-annotations. If you do that it is more likely your system will run also with future versions of different application servers. You get to keep the Java-code and annotations in the same source-file which makes it easier to understand how they work together.

But annotations are still much like XML. You can only assume their effect is what you think. Is it the same on every app-server (-version)? You can't be 100% sure. Whereas if you write plain Java code yourself, you can always read it and debug it, and SEE what it does.

Avoid using any features specific to your application server, even though that might be tempting at times. It would make it difficult to port to a different but better platform in the future. If you are a project manager it may be hard to control what your programmers do, how tightly they get coupled to a specific app-server. That is one reason to mandate the use of a simpler platform, a plain web-server like Tomcat.

If you opt for a full application server instead, consider compiling the app-server from the source. Then you can at least debug what it's doing, and possibly even fix some bugs yourself.

But be prepared for complexity overload. Application Servers are powerful, complicated, dangerous beasts. It takes a long time to tame your specific tiger. Choosing one over another is a long-term investment which requires continual re-education from you and your team.

Monday, March 17, 2014

A Question about Java 8 Lambda Expressions

Java 8 is out. Its big new feature is "Lambda Expressions". I've done some reading on them to prepare myself, having used 'closures' in other languages.

The best way to learn is to ask questions, and I have one about Lambdas: Why can't the syntax be simpler? So here we go.

I will discuss below a slightly modified version of a Java 8 lambda-expression example from http://www.oracle.com/technetwork/articles/java/lambda-1984522.html. I've modified parts of the example to make it shorter, to keep the focus on the syntax.

1. The Question

To create a lambda-expression you must first DECLARE an interface that describes it (unless someone else has done that already). In the linked-to example it was called HelloService. I renamed it to HelloInterface, to be clear it is a Java "interface", which is one of the key concepts to understand about lambda expressions. It is defined like this inside the class Hello:

public class Hello
{
    interface HelloInterface
    {   String hello (String fname, String lname);
    }
    ...
}

This 'target-interface' of a lambda-expression needs to be a "Functional Interface", meaning it must define a single abstract method.

That is the point of lambdas, to make it unnecessary to write a whole new class when you only need a single method. But because Java is statically typed, you still need to declare the interface of the lambda to indicate its argument and result types.

To CREATE a lambda that conforms to the interface above you write a lambda-expression like this:

HelloInterface myLambda
    = (String fname, String lname)
    -> { String s = "Hello " + fname + " " + lname;
         return s;
       };

You can see the expression after the '=' looks kind of like a function, without a name. Which is what Lambdas are!

After creating the lambda above you USE it like this:

String s2 = myLambda.hello ("Will", "Smith");

Now comes my question. The "functional interface" needed to create a lambda has a single method. Then WHY do we need to give a NAME to that ONLY method? Wouldn't it be easier to write:

interface HelloInterface
{ String (String fname, String lname);
}
...

String s = myLambda ("Joe", "Doe");

2. The Answer, kind of

From http://download.java.net/jdk8/docs/api/java/lang/FunctionalInterface.html we can read: "... Conceptually, a functional interface has exactly one abstract method. Since default methods do have an implementation, they are not abstract".

To understand the above you must know about another new Java 8 feature: Default Methods. These are methods you can define for an interface. You couldn't do that before Java 8. If a class implementing such an interface does not provide its own implementation for a default method, it gets the implementation defined for the interface. You can read more about default methods here.

THEREFORE it is NOT the case that 'functional interfaces' can have only a single method. It is the case they must have one abstract method. But in addition they can also have some default methods. Therefore when "interacting" with a lambda, we must specify which of its methods we are calling.

BUT: It is the case that for every lambda-expression there must be a single abstract method, declared in its functional interface. Therefore we could say: IF you want to call the single abstract method (which is the case most often) THEN it is unnecessary to specify which method you are calling. If such a rule were adopted, we could call it with:

String s = myLambda ("Joe", "Doe");

Maybe there's a reason why that couldn't be done in Java 8, but I don't know it at the moment. But that's OK, I'm learning new things daily.

Friday, March 14, 2014

Does JavaScript have classes and methods?

You sometimes hear that JavaScript is not an Object-Oriented language: it is Prototype-Based. Which means any object can be used as a prototype for further objects. Therefore, there are no "classes", you say. If you want "classes" you must create your own, or use some existing O-O framework built on top of JavaScript. In the following I will argue this is not the case, really.

So what is a "class" in languages like Java, Smalltalk, or C#, considered O-O languages? A class is something that does two things:

1. It can create new objects, called its "instances"
2. It determines the common properties its instances will have.

So do we have something that does those two things in JavaScript, with no libraries added? Yes we do: The "constructor". When we call:

var myBook = new MyBook("Title of my Book");

we are creating a new 'instance' of MyBook.

It is the constructor 'MyBook' that creates the instance. When it does that it assigns some common properties, or even "methods" to the instances it creates. It might be written like this:

function MyBook(bookTitle)
{   this.titleVAR = bookTitle;
    this.title = function ()
    {   return this.titleVAR;
    };
    return this;
}

So from the above we see that constructors both create new instances, and determine their common properties. Therefore we say:

Constructors are the Classes of JavaScript

But what is a "constructor" really? Any function in JavaScript can be used as a constructor by simply prefixing a call to it with 'new'. From this we get the perhaps more interesting result that:

Functions are the Classes of JavaScript

When you think about it it makes sense. To create a new instance and give it some properties you need some executable code. What is the construct in JavaScript for creating units of executable code? Of course, Functions.
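A minimal sketch of this claim, using a made-up function Point that is not part of the original example. Nothing marks Point as special when it is defined; it only acts as a "class" because it is called with 'new':

```javascript
// Point is a hypothetical example function, chosen only for illustration.
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// Prefixing the call with 'new' makes the plain function act as a constructor.
var p = new Point(1, 2);

// The new object remembers which function created it.
var isInstance = p instanceof Point;
var madeByPoint = p.constructor === Point;
var sharesPrototype = Object.getPrototypeOf(p) === Point.prototype;
```

All three checks evaluate to true, which is the sense in which the function plays the role of the class here.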

From this perspective JavaScript starts to look quite Object-Oriented. What about the final defining feature of O-O, objects having 'methods'? It is easy to see from the example above that the constructor 'MyBook' creates a new object which will have the property 'title' whose value is a method. So in JavaScript:

Properties whose value is a Function are 'methods'

There is one more feature of JavaScript that makes its objects "Objects" instead of just data-structures whose values can be functions. That is the special behavior of the keyword 'this'.

You can see from the code-excerpt above what happens when you call myBook.title(). It executes:
...
return this.titleVAR;

When this happens the pseudo-variable 'this' is bound to the object stored in the variable myBook. The call will return the value of the property titleVAR of that specific MyBook instance. So we are not simply calling a function named 'title'. We are calling the method 'title' of a specific MyBook instance.
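The binding of 'this' can be sketched with two instances of the MyBook constructor from above. The variable names otherBook and borrowed are made up for this illustration; Function.prototype.call is the standard way to invoke a function with an explicitly chosen receiver:

```javascript
function MyBook(bookTitle) {
  this.titleVAR = bookTitle;
  this.title = function () {
    return this.titleVAR;
  };
}

var myBook = new MyBook("Title of my Book");
var otherBook = new MyBook("Another Title");

// Called as a method: 'this' is the object to the left of the dot.
var fromMyBook = myBook.title();
var fromOtherBook = otherBook.title();

// The function stored in myBook, invoked with otherBook as the receiver:
// 'this' follows the receiver of the call, not the place of definition.
var borrowed = myBook.title.call(otherBook);
```

Here fromMyBook is "Title of my Book", while both fromOtherBook and borrowed are "Another Title". The same function yields different results depending on the instance it is called on, which is what makes it a method rather than a plain function.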

We said above "Functions are classes". We can now see that all of the following are true in JavaScript:

Functions are not methods.
Functions implement methods.
Functions are the classes.
A Method is a property whose value is a Function

We have 'classes', and we have 'methods', which are something more than Functions. This makes me say: "JavaScript is an Object-Oriented language". It is a prototype-based, dynamically typed Object-Oriented language with functional programming features.

I find this interesting because when I first started working with JavaScript, coming from a Smalltalk background, my immediate reaction was "Functions look like methods. Therefore they must be methods". But really, functions are the classes in JavaScript! Which I think is the novel contribution of JavaScript to Object-Orientation.

Monday, March 10, 2014

Objects vs. Functions round 1: Currying vs. Instantiation

This article is about the difference between two Design Patterns: "FP Currying" and "OO Instantiation". This is NOT about comparing the benefits of different programming languages, Object-Oriented, or Functional. We will use JavaScript examples to illustrate both patterns.

CAVEAT: We will use the term "Currying" loosely here, to mean how we would implement "something like it" in JavaScript. Properly speaking, currying turns a function of several arguments into a chain of functions that each take a single argument, so calling a curried function with only some of its arguments gives back a function in which those arguments are fixed to the values you gave. The proper term for what we do below is "Partial Application", meaning you pass a function plus some of its arguments into a function that returns a function where the given arguments have been fixed to the values you gave. You can read a detailed explanation of the difference between currying and partial application here.
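The difference between the two can be sketched in plain JavaScript. The helper names partial and curry2 are made up for this illustration and handle only two-argument functions; they are not taken from any library:

```javascript
function multiply(a, b) {
  return a * b;
}

// Partial application: take an existing function plus a value for its
// first argument, and return a function that expects the remaining one.
function partial(fn, fixedFirst) {
  return function (second) {
    return fn(fixedFirst, second);
  };
}

// Currying: rewrite a two-argument function as a chain of
// one-argument functions. No argument value is supplied yet.
function curry2(fn) {
  return function (first) {
    return function (second) {
      return fn(first, second);
    };
  };
}

var multiplyBy2 = partial(multiply, 2);
var viaPartial = multiplyBy2(3);

var curriedMultiply = curry2(multiply);
var viaCurry = curriedMultiply(2)(3);
```

Both viaPartial and viaCurry are 6. The difference is in when the value 2 is supplied: partial needs it up front, whereas curry2 only changes the shape of the function, and the fixing happens later when the curried function is called with its first argument.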

It is often said the defining characteristic of Functional Programming (FP) is "Referential Transparency", or "immutability". But there is no reason why OO "objects" can't be immutable. And all FP languages must more or less allow things to "mutate", because they must deal with input-output in some fashion. A function that returns some data from the user most likely will return a different result the next time you call it.

For purposes of comparing FP and OO styles of programming I'd say the main difference is this: In FP you create and call individual functions (which can return other functions). In OO you create groups of functions called Classes which you then "instantiate" into Objects.

A well-known feature of FP is "currying" which loosely speaking means you can "fix" some of the arguments of a function to specific values to get another simpler function where those arguments will have the fixed values you gave. This means you don't need to re-enter that same value again and again. "Currying" (or more properly "partial application") could be used in JavaScript as follows:

function multiply (aNumber1, aNumber2){...}
var multiplyBy2 = curry(multiply, 2);
var six = multiplyBy2(3);

The benefit is that if you need to multiply many different numbers by two you no longer need to pass both numbers as arguments of each call; one is enough. As mentioned, the above is more "partial application" than "currying". But both serve the same purpose: Getting a simpler function out of a more complex one, by fixing some of the argument-values. I offer this characterization of the subtle difference between the two: "Currying is automated Partial Application".
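The snippet above leaves curry undefined and the body of multiply elided. A minimal sketch of what they might look like, assuming curry simply fixes the leading arguments of the function it is given (that is, partial application) and that multiply returns the product of its two arguments:

```javascript
// Assumed body: the original elides it with {...}.
function multiply(aNumber1, aNumber2) {
  return aNumber1 * aNumber2;
}

// A hypothetical 'curry' helper in the loose sense used in this article:
// it remembers the arguments given after 'fn' and prepends them to
// whatever arguments the returned function is later called with.
function curry(fn) {
  var fixedArgs = Array.prototype.slice.call(arguments, 1);
  return function () {
    var laterArgs = Array.prototype.slice.call(arguments);
    return fn.apply(this, fixedArgs.concat(laterArgs));
  };
}

var multiplyBy2 = curry(multiply, 2);
var six = multiplyBy2(3);

// The built-in Function.prototype.bind can fix leading arguments too.
var multiplyBy2ViaBind = multiply.bind(null, 2);
var alsoSix = multiplyBy2ViaBind(3);
```

Both six and alsoSix hold the value 6. The hand-written helper is shown only to make the mechanism visible; bind does the same job when the value of 'this' does not matter.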

So is there a way to achieve similar benefits with OO? Yes. It is called instantiation. You define a "class" with a set of methods and a set of data and then "instantiate" it:

function Multiplier () {...}

Multiplier.prototype.multiply = function multiply () {...}
var multiplier2 = new Multiplier (2);
var six = multiplier2.multiply(3);

Above Multiplier is the class, multiplier2 its instance.
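With the elided bodies filled in, the class might look like the sketch below. The property name factor and the parameter name aNumber are assumptions made for this illustration; the original does not show them:

```javascript
// The constructor stores the fixed value on the new instance.
function Multiplier(factor) {
  this.factor = factor;
}

// The method is defined once on the prototype and shared by all
// instances; it reads the fixed value through 'this'.
Multiplier.prototype.multiply = function multiply(aNumber) {
  return this.factor * aNumber;
};

var multiplier2 = new Multiplier(2);
var six = multiplier2.multiply(3);
```

Here six is 6, just as in the curried version: the value 2 is supplied once, at instantiation, and every later call needs only the remaining argument.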

Based on the above examples it would seem currying is somewhat simpler. Fewer lines are needed. But what if you are looking for a solution that handles other numbers besides 2 as well? The situation changes.

A Class is a group of functions parameterized by the data defined for that class. Therefore we can easily extend that group and add more functions or 'methods' to our class, to handle other calculations. So let's give our function/class a more general name, and add methods for different calculations:

function Calc () {... }
Calc.prototype.multiply = function () {...}
Calc.prototype.divide = function () {...}
Calc.prototype.add = function () {...}
Calc.prototype.subtract = function () {...}
Calc.prototype.raisedTo = function () {...}

Then at runtime we can instantiate that class, with any number we want:

var calcWith2 = new Calc (2);

Feels a bit like currying, right? If we use traditional currying instead to create multiple functions with a fixed parameter 2, we would write thus:

var multiplyBy2 = curry(multiply, 2);
...
var raisedTo2 = curry(raisedTo, 2);

Now imagine you need to handle other numbers as well: 3, 4, 5, 99, etc. What's so special about number 2 anyway? Doing that by currying, you need to apply currying to each function for each number you want to use as the fixed parameter:

var multiplyBy3 = curry (multiply, 3);
...
var raisedTo3 = curry (raisedTo, 3);

var multiplyBy4 = curry (multiply, 4);
...
var raisedTo4 = curry (raisedTo, 4);

var multiplyBy5 = curry (multiply, 5);
...
var raisedTo5 = curry (raisedTo, 5);

...

var multiplyBy9 = curry (multiply, 9);
...
var raisedTo9 = curry (raisedTo, 9);

Using the OO style we only need to write:

var calcWith3 = new Calc (3);
var calcWith4 = new Calc (4);
var calcWith5 = new Calc (5);
...

With OO-instantiation we reuse the same parameter, say 4, and get a version of all our math-functions, with a single call to instantiate the class Calc. We don't need to re-curry all our methods for all the numbers we need, 3, 4, ... 99. The advantage OO has is that the fixed parameter 4 can be shared by ALL methods of the object-instance, by making a single call which creates that instance.

This benefit is not accidental. It is a direct consequence of a defining feature of Object-Orientation: All functions ('methods') of an instance share the same set of data. Another way to put it: By instantiating a class, you are currying multiple functions with a single call.
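A runnable sketch of the Calc class, under some assumptions the original leaves open: the fixed number is stored in a property called fixed, each method takes one further argument x, and the fixed number is always the second operand (so raisedTo on an instance created with 2 squares its argument):

```javascript
function Calc(fixed) {
  this.fixed = fixed;   // the one value shared by every method below
}

Calc.prototype.multiply = function (x) { return x * this.fixed; };
Calc.prototype.divide   = function (x) { return x / this.fixed; };
Calc.prototype.add      = function (x) { return x + this.fixed; };
Calc.prototype.subtract = function (x) { return x - this.fixed; };
Calc.prototype.raisedTo = function (x) { return Math.pow(x, this.fixed); };

// One call per number fixes that number for all five operations.
var calcWith2 = new Calc(2);
var calcWith3 = new Calc(3);

var ten    = calcWith2.multiply(5);
var five   = calcWith2.divide(10);
var eight  = calcWith3.raisedTo(2);
var twelve = calcWith3.add(9);
```

The five functions are written once on the prototype; only the data differs between calcWith2 and calcWith3. That is the sense in which one instantiation does the work of five separate curry calls.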

Which "pattern" to use depends on what you need. If you need a general, reusable, extensible solution, OO-style is the way. If you just want a function that can multiply its argument by 2, currying is simpler. Especially if your language supports it. So I'm not saying "currying is bad". I'm saying it is not missed much in the Object-Oriented way of programming.

There are two additional benefits instantiation has over "currying" (or "partial application"). First, in a typical implementation of currying you can't fix an argument without also fixing the arguments before it. With instantiation, you can decide which subset of the data-members of the instance you "fix" with non-default values. Secondly, if your language supports currying out of the box it can't support default arguments: If you don't provide a value for an argument, it doesn't mean a default value will be used for it. It means currying ensues.
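The point about fixing only a subset of the data-members can be sketched as follows. The class Scaler, its properties factor and offset, and the options-object convention are all made up for this illustration; it is one possible way to do this in JavaScript, not the only one:

```javascript
// Hypothetical class with two data-members, each with a default value.
function Scaler(options) {
  options = options || {};
  this.factor = options.factor !== undefined ? options.factor : 1;
  this.offset = options.offset !== undefined ? options.offset : 0;
}

Scaler.prototype.scale = function (x) {
  return x * this.factor + this.offset;
};

// Fix only the second data-member; the first keeps its default.
var plusTen = new Scaler({ offset: 10 });
var fifteen = plusTen.scale(5);

// Fix only the first data-member; the second keeps its default.
var doubler = new Scaler({ factor: 2 });
var ten = doubler.scale(5);
```

Here fifteen is 15 and ten is 10. Each instance fixes a different data-member and leaves the other at its default, something that positional partial application does not allow without also supplying the earlier arguments.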

UPDATE: Thanks to several posters at Google+ who pointed out the subtle difference between "Currying" and "Partial Application". I modified the text above to make it clear my use of the term 'currying' is technically incorrect.