Scala, the next Java?

Software development is increasing in complexity, and not just because the business being automated is getting more complicated. A significant additional issue is that we are seeing the end of the Von Neumann model, which encouraged a mindset built around serial processing: a single stream of instructions running through a single processor. I’m not sure that this is an intrinsic feature of the Von Neumann model itself (although the somewhat related Von Neumann bottleneck, between memory and CPU, is), but it tends to make people think of things happening one at a time as you step through a program.

With modern multicore chips, however, processing should happen in parallel on the many processors (cores) inside the processor chip, with many things happening at the same time. Then, with cloud computing, we are starting to build business systems out of loosely-coupled distributed services, some (possibly all) of which we don’t own and have little control over, and these may be hard to force to run serially. Basically, we need to build systems in a service-based environment where several things can happen at the same time. The real world has been like this forever, and mainframes were working like this 40 years ago, but it’s a rather new concept in the PC world.

There are various ways to make programs look as if they are exploiting parallel processing on a small scale in modern environments, but these don’t scale well. You may have four cores in your PC, and you may break up your program into modules that can potentially run on all four cores, but you usually find that, most of the time, three of them are idle, waiting for some critical operation to finish on the fourth. And, since the power in your new PC comes from additional cores rather than faster cores, you may find that buying a more powerful PC doesn’t speed things up.
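One way to see why extra cores can disappoint is Amdahl’s law, which is not named in this piece but describes the same effect: if only a fraction p of a program’s work can run in parallel, the best possible speedup on n cores is 1 / ((1 - p) + p / n). What follows is only a minimal Scala sketch of that formula; the object name, the function name and the 50% parallel fraction are assumptions chosen purely for illustration.

```scala
object AmdahlSketch {
  // Amdahl's law: theoretical speedup on `cores` cores when only
  // `parallelFraction` of the work (between 0.0 and 1.0) can run in parallel.
  def speedup(parallelFraction: Double, cores: Int): Double =
    1.0 / ((1.0 - parallelFraction) + parallelFraction / cores)

  def main(args: Array[String]): Unit = {
    // Assumption for illustration: half of the program is stuck in a serial section.
    for (cores <- Seq(1, 2, 4, 8, 16))
      println(f"$cores%2d cores: ${speedup(0.5, cores)}%.2fx")
  }
}
```

With half the work serial, going from four cores to sixteen moves the theoretical speedup only from 1.6x to about 1.9x, and no number of cores gets past 2x.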

What we really need are different approaches designed specifically for modern computing architectures, not serial approaches bent to fit them. One example of such an approach is Pervasive DataRush, which exploits dataflow programming to process large amounts of data in a highly parallel way. Another approach uses functional programming.

However, even if the programming models behind conventional enterprise development environments like Java (and even Ruby), and their frameworks, aren’t coping well with the new environments at large scale, you will, being practical, want to protect your investments in Java programmers and training if at all possible and, ideally, run your parallel-aware code in a standard Java Virtual Machine (JVM).

So DataRush, for example, is invoked from Java; and the main subject of this piece, Scala, is a purpose-designed language for the Java platform that deals with these new environments using a hybrid object-oriented and functional programming approach (and it has even been noticed by business-oriented magazines like Forbes).

Scala is sometimes thought of as just a new language, but it’s more than that. It was designed by Martin Odersky, co-author of the book Programming in Scala. Odersky also wrote the compiler that became javac, the reference Java compiler now maintained by Oracle, and Scala has a strong and Java-aware development team behind it. However, Scala is best seen as part of a “software stack”, all the parts of which, at several levels, aim at simplifying the development of systems that operate with high concurrency (that is, with processes running in parallel), at scale. It uses concepts from functional-programming languages like LISP and Erlang, such as immutable data and message passing, that make it easier to avoid the deadlock and race-condition issues associated with “shared state” (which appear as program bugs when processes running at the same time on different processors interfere with each other).
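To make the “shared state” point concrete, here is a minimal sketch using only the Scala standard library (scala.concurrent.Future); it is an illustration, not how Akka or any particular application does it. The object name, the function names and the sum-of-squares workload are all assumptions invented for the example. The work is split into chunks, each chunk is handled by a pure function on immutable data, and the partial results are combined at the end, so no task ever writes to a variable that another task can see.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SharedNothingSum {
  // Pure function: the result depends only on the argument,
  // and no variable outside the function is read or written.
  def sumOfSquares(chunk: Vector[Int]): Long =
    chunk.foldLeft(0L)((acc, n) => acc + n.toLong * n)

  // Split the immutable input into chunks, process each chunk in its own
  // Future (scheduled on the global thread pool), then add up the partial sums.
  def parallelSumOfSquares(data: Vector[Int], chunkSize: Int): Long = {
    val partials: Vector[Future[Long]] =
      data.grouped(chunkSize).map(chunk => Future(sumOfSquares(chunk))).toVector
    // Blocking here keeps the sketch short; real code would usually stay asynchronous.
    Await.result(Future.sequence(partials), 10.seconds).sum
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 1000).toVector
    println(parallelSumOfSquares(data, chunkSize = 100))
  }
}
```

Because nothing is mutated, there is nothing to lock, and the parallel version gives the same answer as the sequential one however the chunks happen to be scheduled.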

The rest of the Scala stack includes Akka, which is event-driven middleware for large-scale distributed cloud applications, and Play, which is a web framework that brings a Rails-style developer experience to the JVM but aims to scale better than Rails, using the Akka framework’s functionality. The stack also provides an Eclipse-based development environment and sbt (the “simple build tool”) for building modules into applications.

Scala compiles into Java bytecode for a conventional JVM, so it fits well into a conventional enterprise Java-based environment, but it arguably exploits the Java platform even better than Java does: it’s more productive (often needing half the source code, or less) whilst still retaining Java’s current performance. Akka and Play have both Java and Scala APIs, so you can write in Java instead of Scala if you want to. This makes the stack easy to adopt and protects an investment in Java libraries, tools and programming skills (although it does mean that you’ll probably need to actively persuade programmers to adopt the new approaches).
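As a small illustration of both claims (less source code, and reuse of existing Java libraries), the sketch below defines a data class in one line and then hands its data to the standard java.util collections. The Customer class, the sortedNames helper and the object name are hypothetical names made up for this example; only java.util.ArrayList and java.util.Collections are real library classes.

```scala
import java.util.{ArrayList, Collections}

object JavaInterop {
  // A case class gets its constructor, accessors, equals, hashCode and
  // toString generated by the compiler; the hand-written Java equivalent
  // would be considerably longer.
  case class Customer(name: String, city: String)

  // Ordinary Java library classes are used directly from Scala:
  // collect the names into a java.util.ArrayList and sort it in place.
  def sortedNames(customers: Seq[Customer]): ArrayList[String] = {
    val names = new ArrayList[String]()
    customers.foreach(c => names.add(c.name))
    Collections.sort(names)
    names
  }

  def main(args: Array[String]): Unit = {
    val customers = Seq(Customer("Zoe", "Leeds"), Customer("Ann", "York"))
    println(sortedNames(customers))
  }
}
```

The returned list is a plain java.util.ArrayList, so existing Java code can consume it without knowing that Scala produced it.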

Scala and Akka are being developed by Typesafe and were first released together, as a stack, in May 2011; in 2012, they’re available together with Play and the management console (and Typesafe is working on a new framework for database connectivity, which should be available later in 2012). The Scala stack is developed with a “commercial open-source model” (of the sort used by Apache, Red Hat and JBoss) and is supported by the Scala community. A paid subscription, which adds support, maintenance and the management tools and console needed for enterprise use, is available.

Scala is being used today in several well-known applications with extreme scalability issues, such as Twitter (which found that Ruby didn’t really scale far enough) and LinkedIn (which uses it to cope with more than 400 million transactions a day). There are other large commercial applications, and Scala is becoming accepted as a powerful and innovative development environment that extends Java with functional-programming concepts. Perhaps Microsoft’s F# functional programming language for .NET is a competitor, but most enterprises already develop core business applications (as opposed to, perhaps, user desktop interfaces) in Java.