I thought I would take the opportunity to shatter a few other myths about TPF and Assembler. Let me start with a disclaimer: my last experience with TPF was in 1992, while working on the Confirm project before it was run into the ground by Max Hopper and his sidekick Dave Harms. That was my escape from TPF, so if things have changed since then, so be it.
TPF code has to be re-entrant
It's taken as gospel that code written for TPF has to be re-entrant, and as part of this myth it is assumed that writing such code is difficult. Since TPF never interrupts a running ECB and switches to another ECB, code does not have to be re-entrant. It does, however, have to be serially reusable between implied-wait SVC calls. This means that you can, if you so desire, store variables in the program space as long as you don't expect them to survive an SVC call. It also means it is possible to write self-modifying code, as long as the code is restored before an SVC. This is, of course, appalling practice, but I have seen examples of it. One common example: because the usage count of a program is maintained in the program header, it is sometimes incremented by the program itself to ensure that it does not get unloaded. The practice has several drawbacks. Because programs are modified, any attempt to load programs into storage that ECBs can only read would cause them to fail. In an ideal operating system, all programs would be loaded into read-only storage so they cannot be modified, either deliberately or inadvertently.
Since it is the Operating System's responsibility to decide when to load and unload programs, the information it uses to make that decision, i.e., the usage count, should be maintained in protected storage available only to the OS. While working at Danbury on the implementation of 4K blocks and Centralized List Handling (CLH), this was a change I tried to make, but it was determined that we would break too many badly behaved programs, so it was dropped.
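To make the serial-reusability point concrete, here is a minimal sketch in C rather than assembler. The names wait_for_record() and send_reply() are invented stand-ins for a find-and-wait service and a reply routine - they are not real TPF APIs - but the shape of the problem is the same.

#include <stdio.h>
#include <string.h>

/* Invented stand-ins for TPF services, not real TPF APIs.
 * wait_for_record() marks the point where an implied-wait SVC would
 * suspend this ECB and allow another ECB to run the same program.   */
static void wait_for_record(int file_address, char *record)
{
    sprintf(record, "record at %d", file_address);
}

static void send_reply(const char *text)
{
    printf("%s\n", text);
}

/* Program-level scratch storage.  Because TPF never pre-empts a
 * running ECB, using this between wait points is safe; relying on it
 * across a wait point is not.                                        */
static char scratch[80];

void process_entry(int file_address)
{
    char record[80];

    strcpy(scratch, "LOOKING UP RECORD");
    send_reply(scratch);                    /* no wait yet: scratch is still ours */

    wait_for_record(file_address, record);  /* implied wait: another ECB may have
                                               reused scratch while we slept      */

    /* Serially reusable code re-initializes the shared area after the
     * wait instead of assuming the old contents survived.             */
    strcpy(scratch, "FOUND ");
    strcat(scratch, record);
    send_reply(scratch);
}

int main(void)
{
    process_entry(42);
    return 0;
}

Re-entrant code could not use scratch at all; serially reusable code can, provided it never trusts it across the implied wait.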
Structured Code is always less efficient
I regard this as a classic case of muddled thinking. Whenever I have found it hard to write structured code, I have always discovered, after much thought, that it was because I was trying to solve the problem incorrectly. If you adequately define the problem, the solution can always be structured nicely, which makes it easier to follow. It is true that using the Structured Programming Macros (SPM) did, on occasion, lead to branches to branches to branches as the indentation unwound. However, if you structure the code correctly, you can ensure that this happens in the least-used path. I remember that we rewrote the CPU loop in SPM as part of CLH. The TPF bigots were appalled that the code eventually led to 4 branches being executed in a row, and this was held up as a classic case of how inefficient SPM was. What the simple-minded failed to realize was that, while this was indeed true, it occurred in a path that ended, after the final branch, at an LPSW that loaded a wait state. Thus it is true that 3 unnecessary branches were taken en route to doing... nothing!
But this leads to a further conclusion: code written in a high-level language can often be more efficient than code written in assembler. Someone on a thread in TPF'ers said that some of the best assembler code he had ever seen was generated by C/C++ compilers. This is to be expected, even if it is counter-intuitive. A compiler, looking at the big picture, wouldn't generate branches to branches - it would generate code that goes directly to the end, and it would do so without damaging the structure of the source.
Let me give another example of where a compiler can generate better code than a good assembler programmer.
* It's been so long, forgive me if I make some syntactical errors
* Some return codes - defined in some macro somewhere
OK       EQU   0
BAD      EQU   OK+1
VBAD     EQU   BAD+1
         .
         .
         .
* Some code to process the return code that is returned in register 0
         CH    0,=AL2(OK)
         BNZ   ERROR
         ...
ERROR    ...                    code continues
This code is trivial, but it exemplifies some good practices. The values OK, BAD and VBAD are not hard-coded; EQU is used instead. This means that if you change the values, the code will continue to work as intended, always assuming that the code setting the return code is using the same set of equates. But, I hear you say, it would be more efficient to use LTR 0,0 to test for zero. Indeed it would, but if you do that, the code breaks if the value of OK is ever changed. Now let's look at what happens in a high-level language.
enum { OK, BAD, VBAD };
    ...
if (return_code == OK)
    ...
In this case the compiler, knowing at compile time that OK is equal to 0, can generate an LTR itself; if you change the enum, the code generated next time will be different, but it doesn't break. This is a trivial example, but extend it to a situation where the compiler needs to multiply a number by a power of 2: since it knows the value at compile time, it can generate an SLL rather than an MH. You can't safely do that in assembler, because if the number you wish to multiply by changes from 2 to 3 due to an unrelated change in a DSECT somewhere, the assembler code would break.
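Here is a hypothetical C illustration of that point; RECORD_LEN and record_offset() are made-up names and the value 64 is arbitrary.

#include <stddef.h>

/* Suppose this comes from a shared header, the equivalent of a DSECT.
 * Today it happens to be a power of two.                              */
#define RECORD_LEN 64

/* The compiler, seeing that RECORD_LEN is 64 at compile time, is free
 * to turn the multiply into a shift (index << 6).  If a later change
 * makes RECORD_LEN 96, it simply goes back to a real multiply; a
 * hand-coded SLL in assembler would silently break.                   */
size_t record_offset(size_t index)
{
    return index * RECORD_LEN;
}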
There are many more examples where a compiler can generate more efficient code. Consider testing a condition and doing one thing if it's true and another if it's false. Over time it's conceivable that mods are made to both paths independently, and that eventually both paths come to contain identical pieces of code. Because the compiler is looking at the bigger picture, it can spot this common code and emit it once as a shared path. Remember that code that looks totally different to you might look identical to a compiler. It's even possible that you might notice the similarity in the assembler code and extract it yourself, but if you did that you would be falling into the trap of "coincidental binding", and the resulting code would be extremely difficult to follow.
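A contrived C sketch of the situation - the helper functions are invented purely for illustration:

#include <stdio.h>

static void log_activity(const char *what) { printf("LOG: %s\n", what); }
static void release_block(void *block)     { (void)block; /* free working storage */ }

void finish_request(int ok, void *block)
{
    if (ok) {
        log_activity("request completed");
        /* ...work specific to the good path... */
        log_activity("releasing working storage");
        release_block(block);
    } else {
        log_activity("request failed");
        /* ...work specific to the error path... */
        log_activity("releasing working storage");
        release_block(block);
    }
    /* An optimizing compiler can notice that both branches end with the
     * same two calls and emit that tail once, branching to it from both
     * paths, without anyone having to restructure the source.           */
}

In the source the two branches stay separate and can be maintained independently; the compiler quietly shares the common tail in the generated code.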
It is for reasons like this that compilers often have a switch to inhibit optimization for debugging purposes: the code generated by an optimizing compiler can be extremely difficult to follow. But optimization is very useful to have in production.
4K blocks
When we implemented 4K blocks, I fought very hard to make the block 4096 bytes long instead of 4095. My thinking was that at some time in the future, someone would realize that it was ridiculous to split a program up into components that would each fit within a 4K block; instead it would be possible to write a program many kilobytes or even megabytes long, simply load it into multiple 4K blocks, and use virtual addressing to make them all appear contiguous. I envisioned systems that, instead of constantly fetching programs from DASD, loaded them completely into RAM at boot time. Even though this wasn't possible at the time, limiting the block to 4095 bytes instead of 4096 made absolutely no sense to me, but such was the short-sightedness of the TPF development team at Danbury, and of Bob Dryfus in particular, that the blocks became 4095 bytes. It has always puzzled me why TPFers in general have this tremendous instinct never to look forward and always to hanker after the past.
I don't blame the people who designed PARS back in the 1960s. The world was very different then and the art of software was very undeveloped. Computers were so lame back then, and memory so small, that every byte used and every instruction executed mattered. But today things are different. In the next room I have a PC server with 24 GB of RAM and dual quad-core processors. We buy disks with capacities measured in terabytes that fit in the palm of your hand. My guess is that the entire program base of a TPF system would fit easily onto a single SSD. It simply doesn't make sense to continue to use 1960s or even 1980s software technology in 2015.