Sunday, February 19, 2012

Hacking a circular dependency away

This is a horror story on how messy a SW design may be.
Particularly sensitive people are advised not to read on! ;-)

Hello guys,

this week I finally managed to enable Super Commutative Algebras (SCA) for spielwiese. And that’s with the approximately 1 year old fix of the general non-commutative (NC) stuff... and even at that time it was more of a hack. So what’s the matter with NC and SCA?

Recall that in the legacy Singular all this stuff was packed together into the core, therefore it was kind of Ok to have circular dependencies between subsystems, e.g. between NC and GB (Groebner Bases).

During the refactoring Singular effort, NC went to libpoly since it provides the low-level multiplication and other basic routines, whereas GB stayed at libkernel. The point is that they (unfortunately) depend on each other in the following way:

1. InitNC (an entry part of NC) is widely called in libpolys

2. InitSCA (called by InitNC) needs NF (from GB)

3. InitNC and InitSCA set a ring member as a pointer to a function like gnc_GB/sca_GB

4. the later functions (*_GB) and NF are implemented within the GB framework and absolutelycannot be separated from it,

5. clearly GB is based on libpolys (for doing arithmetic) and thus implicitly use basic NC functions

Since separating libpolys (containing polynomial arithmetic basics) from libkernel (containing higher level algorithms) was among of our goals for refactoring Singular we cannot simply pull GB down to libpolys. Thus we have got the following picture: on the very bottom libpolys widely uses InitNC, which needs *_GB and NF, which are defined on higher level in libkernel within GB, which in turn depends on libpolys. Thus: lower layer depends on higher and of course other way around, i.e. a circular dependency between layers!

In order to make this design to work with pure brute force we declare all needed functions (NF, *_GB) in libpolys. Of course, its test suite cannot link now due to undefined symbols.
Thus we add dummy definitions to each test unit. Yey! This makes the test suite pass, since we luckily don’t test any NC stuff there :-/

That must be all to it!? Wat! What’s that? Now we cannot link the static test suite of libkernel! And liner complains that *_GB functions are missing! How so? Well the main point is that that those *_GB function are _never_ called in libkernel itself! Hence our clever linker discards them while linking the static test units against libkernel and miss them while linking against libpolys, as we, of course, pass “-lkernel -lpolys” to it.

Unfortunately, forcefully adding libpolys to libkernel (with something like libkernel_LIBADD = did not do the job :(

Well, the other usual trick in this situation is to duplicate the library usage specification and make linker to consider it twice (remember - we are linking a static binary!). Great - using “-lkernel -lpolys -lkernel” actually works from command line, and if we proceed we can also link Singular static binaries by hand. Moreover, this trick will have to be done whenever linking against both libkernel and libpolys.

Oh, well, anyway... Are we done? Not quite :’-( “Luckily” we use autotools & libtool - so we start changing those LDFLAGS/LDADD for test units, and... Hey!? Hello again? As it turns out our libtool removes library duplicates... The solution is easy, we add “LIBTOOLFLAGS = --preserve-dup-deps” to our, and... Hey!?? Hello again?? Well, as it turns out libtool is clever enough to detect and remove these “unneeded” library duplicates even with that option!!!

Now we are desperate enough to try to forcefully pass to linker something like --start-group/--end-group or --whole-archive/--no-whole-archive. Alas, no luck!

THIS IS THE END OF THE WORLD!!!!... of NC/SCA-Singular!?

Nope. The solution is right under our nose: we may fool the linker into believing that those functions are used by libkernel using many tricks. My way was the following: recall that the *_GB functions were needed for setting a function pointer (for C-style of virtual method overloading). That function pointer will be used by corresponding wrapper NCGB by GB and it must not be NULL it the NC-ring was initialized properly... this wrapper must not be very efficient so we can test for NULL pointer and kind of explain the logic behind InitNC/InitSCA with actual calls to gnc_GB and sca_GB in a never visited if-branch.

Yeah! Unbelievable, that finally did the job!

The main drawback is that whoever links against libpolys (but not against libkernel) has to define (in the main object unit) several bizarre function. Imho, that is ugly! Although, this can be simplified up to including an additional special-purpose header, which does just that.

I can see some alternatives:

1. One “simple” alternative design solution would be to separate “the setting a ring member as a pointer to a function like gnc_GB/sca_GB” artificially from NC and move to GB. Well, i guess this might lead to even bigger mess...

2. A large scale alternative solution would be to separate NC (any maybe the calling parts) from libpolys, GB from libkernel and put them together as a new library in between. Yeah libpolys will still depend on InitNC and other NC high-level functions but one could turn them all into global function pointers... which are initially NULL and are to be set on higher level. Well, imho, it’s not a nice solution but might work.

3. Rewrite the inherited C-like code in C++ using inheritance for NC-struct and virual methods for slow initialization methods.

As a conclusion I'd say that maintaining old and tricky C-like C++ code provides new tricky to solve problems the further it goes... It is worth the trouble to rewrite it in correct C++!