FactoringLargePrograms

Last edit March 11, 2010

I've read the C++ FAQs (which are not FAQs at all but questions that should be asked frequently) many times. A great book. Still, there are some points that strike me as plain weird. For example, there is an item approximately like this:

Are small projects good to get your feet wet?: No, often small projects give you wrong intuitions about design. It's hard to say which projects are small and which are big, but a project with less than 10KLOC (LinesOfCode) is almost certainly small and a project with more than 200KLOC is almost certainly big.

Okay, now. I've programmed for five years in a procedural fashion, five years in OO, one year in functional and three years in a hybrid style. I've grokked BehaviorOrientedProgramming, IncrementalDevelopment, ClassInvariants, UnitTests, etc ad nauseum, but I still claim that writing a single application of more than two hundred thousand lines of code is madness. Sure, it can be done. I've seen procedural programs of >200KLOC. With OOP, you can relatively easily go as high as 2MLOC, but you shouldn't. Here is why.

There are many ways to abstract big projects that have many cooperating parts. They use different mechanisms for communication between the components. There are API's (ApplicationProgrammingInterface), ABI's (ApplicationBinaryInterface), streams, API's to streams (like Xlib, CORBA, XDR-RPC, XML-RPC and so on), internal command languages and procedure databases (AlternateHardAndSoftLayers), to name those that come to my mind first. OOP falls in the first class of abstraction, it helps to build API's. (You might claim that it also gives ABI's, but I've learned not to trust this.)

Roughly, the former two (API's and ABI's) are used more in the commercial world, while the latter three (streams and layered languages) are used more in the UnixLike world, and especially, in the FreeSoftwareMovement world. I'm strongly of the opinion that the UnixLike way is better, and I'll try to assess the forces that cause this division:

In the commercial world, solutions tend to be total (not modular), because they are sold in one package and they are not expected to be mixed with other solutions. Interoperability is not only irrelevant, it is often also a threat.
API's are a great way to ensure totalness, because they keep most communication within the application while forgetting the metadata of what the communication is about. ABI's, on the other hand, help to write API-level modules without disclosing the source code.
Character streams are the traditional UnixLike way of abstraction. Even many driver stacks in various kernels are modeled as stream filter pipelines. But in addition to being traditional, they are LanguageAgnostic, UnitTestable, DistributionAgnostic, and they provide StrictEncapsulation.
In the OpenSource world, products are often not made with a specific need in mind (whereas commercial products are designed to be useful / sound useful to the client), but to solve a specific problem. The difference between these two is subtle and often results in commercial products aiming for adaptability, OpenSource products aiming for wide applicability. (This points to another weird Q&A in C++ FAQs, see below.)
In bigger programs, the Unix tradition mandates exposing the internals. Why? Because otherwise the program cannot be used in a modular fashion. Hence AlternateHardAndSoftLayers. This also enables people to follow the Unix philosophy of using the most appropriate tool for each task (low level languages for low level programming, high level languages for high level programming).

The FAQ mentioned above was something like:

Why is it important to handle change?: Changing requirements. Customers change their requirements, and software that cannot adapt to changing requirements is software that does not get buyed.

But in the OpenSource world, you don't have customers. Programmers state the requirements of the program. When they don't meet the requirements of end users, the end users should pick another program. This also makes it important that programmers not change the program's requirements, because then there will be end users for whom the program no longer provides a solution.

Of course, all these methods of abstraction have their uses. For example, API's are great as parts of development environment, such as standard libraries. Besides, the methods can be used to wrap each other. But often, API's are left as the only interface to a commercial product, and this shows in the design so that stream-wrapping or language-wrapping the API would be clumsy.

All in all, the all-object, one-language approach seems to result in solutions that are monolithic, bloated, often also slow (because of the inability / lack of will to program the parts with domain-specific languages), tightly dependent of one application architecture / language platform, mystic and difficult to grasp. Besides, one often ends writing everything from the ground up in the same style because the environment is the same. However in my experience it's best to choose the programming style depending of the size of the subsystem.

Comments?

-- PanuKalliokoski

As a RelationalWeenie, my solution to "big programs" is often to let the database be the BigRiverOfCommunication between many smaller "tasks" or "events". True, this may perhaps bother StaticTyping fans, but it works well in my observation. Your GrokScope is reduced to the current task at hand, shared libraries, and the (good) schema. It allows contractors to walk in off the street and start adding to the project in a few hours. See: ProceduralMethodologies.

Relational databases are a great way of gluing things together. Sometimes, however, you just can't afford DatabaseVendorLock. And I can tell writing a multi-lingual framework in a database-independent way is not nice.

However, relational databases often are better than domain-specific server services. You can always go there with your SQL monitor and see what's happening. The database also stays comprehensible as long as you understand the RDBMS, not only as long as you still understand your own application.