I thought I would take the opportunity to shatter a few other myths about TPF and Assembler. Let me start with a disclaimer: my last experience with TPF was in 1992, while working on the Confirm project before it was run into the ground by Max Hopper and his sidekick Dave Harms. That was my escape from TPF, so if things have changed since then, so be it.
TPF code has to be re-entrant
It's taken as gospel that code written for TPF has to be re-entrant, and as part of this myth it is assumed that writing such code is difficult. Since TPF never interrupts a running ECB and switches to another ECB, code does not have to be re-entrant. It does, however, have to be serially reusable between implied-wait SVC calls. This means that you can, if you so desire, store variables in the program space as long as you don't expect them to survive an SVC call. It also means it is possible to write self-modifying code, as long as the code is restored before an SVC. This is, of course, appalling practice, but I have seen examples of it. One common example: because the usage count of a program is maintained in the program header, it is sometimes incremented by the program itself to ensure that it does not get unloaded. The practice has several drawbacks. Because programs are modified, any attempt to load programs into storage that ECBs can only read would cause them to fail. In an ideal operating system, all programs would be loaded into read-only storage so they cannot be modified, either deliberately or inadvertently.
Since it is the Operating System's responsibility to decide when to load and unload programs, the information it uses to make that decision, i.e., the usage count, should be maintained in protected storage available only to the OS. While working at Danbury on the implementation of 4K blocks and Centralized List Handling (CLH), this was a change I tried to make, but it was determined that we would break too many badly behaved programs, so it was dropped.
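To make the serial-reusability point concrete, here is a minimal sketch in C rather than assembler. The names wait_for_record() and send_reply() are invented stand-ins for a find-and-wait service and a reply routine - they are not real TPF APIs - but the shape of the problem is the same.

#include <stdio.h>
#include <string.h>

/* Invented stand-ins for TPF services, not real TPF APIs.
 * wait_for_record() marks the point where an implied-wait SVC would
 * suspend this ECB and allow another ECB to run the same program.   */
static void wait_for_record(int file_address, char *record)
{
    sprintf(record, "record at %d", file_address);
}

static void send_reply(const char *text)
{
    printf("%s\n", text);
}

/* Program-level scratch storage.  Because TPF never pre-empts a
 * running ECB, using this between wait points is safe; relying on it
 * across a wait point is not.                                        */
static char scratch[80];

void process_entry(int file_address)
{
    char record[80];

    strcpy(scratch, "LOOKING UP RECORD");
    send_reply(scratch);                    /* no wait yet: scratch is still ours */

    wait_for_record(file_address, record);  /* implied wait: another ECB may have
                                               reused scratch while we slept      */

    /* Serially reusable code re-initializes the shared area after the
     * wait instead of assuming the old contents survived.             */
    strcpy(scratch, "FOUND ");
    strcat(scratch, record);
    send_reply(scratch);
}

int main(void)
{
    process_entry(42);
    return 0;
}

Re-entrant code could not use scratch at all; serially reusable code can, provided it never trusts it across the implied wait.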
Structured Code is always less efficient
I regard this as a classic case of muddled thinking. Whenever I have found it hard to write structured code, I have always discovered, after much thought, that it was because I was trying to solve the problem incorrectly. If you adequately define the problem, the solution can always be structured nicely, which makes it easier to follow. It is true that using the Structured Programming Macros (SPM) did, on occasion, lead to branches to branches to branches as the indentation unwound. However, if you structure the code correctly, you can ensure that this happens in the least-used path. I remember that we rewrote the CPU loop in SPM as part of CLH. The TPF bigots were appalled that the code eventually led to 4 branches being executed in a row, and this was held up as a classic case of how inefficient SPM was. What the simple-minded failed to realize was that, while this was indeed true, it occurred in a path that ended, after the final branch, at an LPSW that loaded a wait state. Thus it is true that 3 unnecessary branches were taken en route to doing... nothing!
But this leads to a further conclusion: code written in a high-level language can often be more efficient than code written in assembler. Someone on a thread in TPF'ers said that some of the best assembler code he had ever seen was generated by C/C++ compilers. This is to be expected, even if it is counter-intuitive. A compiler, looking at the big picture, wouldn't generate branches to branches - it would generate code that goes directly to the end, and it would do so without damaging the structure of the source.
Let me give another example of where a compiler can generate better code than a good assembler programmer.
* It's been so long, forgive me if I make some syntactical errors
* Some return codes - defined in some macro somewhere
OK       EQU   0
BAD      EQU   OK+1
VBAD     EQU   BAD+1
         .
         .
         .
* Some code to process the return code that is returned in register 0
         CH    0,=AL2(OK)
         BNZ   ERROR
         ...
ERROR    ...                    code continues
This code is trivial, but it exemplifies some good practices. The values OK, BAD and VBAD are not hard-coded; EQU is used instead. This means that if you change the values, the code will continue to work as intended, always assuming that the code setting the return code is using the same set of equates. But, I hear you say, it would be more efficient to use LTR 0,0 to test for zero. Indeed it would, but if you do that, the code breaks if the value of OK is ever changed. Now let's look at what happens in a high-level language.
enum { OK, BAD, VBAD };
    ...
if (return_code == OK)
    ...
In this case the compiler, knowing at compile time that OK is equal to 0, can generate an LTR itself; if you change the enum, the code generated next time will be different, but it doesn't break. This is a trivial example, but extend it to a situation where the compiler needs to multiply a number by a power of 2: since it knows the value at compile time, it can generate an SLL rather than an MH. You can't safely do that in assembler, because if the number you wish to multiply by changes from 2 to 3 due to an unrelated change in a DSECT somewhere, the assembler code would break.
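Here is a hypothetical C illustration of that point; RECORD_LEN and record_offset() are made-up names and the value 64 is arbitrary.

#include <stddef.h>

/* Suppose this comes from a shared header, the equivalent of a DSECT.
 * Today it happens to be a power of two.                              */
#define RECORD_LEN 64

/* The compiler, seeing that RECORD_LEN is 64 at compile time, is free
 * to turn the multiply into a shift (index << 6).  If a later change
 * makes RECORD_LEN 96, it simply goes back to a real multiply; a
 * hand-coded SLL in assembler would silently break.                   */
size_t record_offset(size_t index)
{
    return index * RECORD_LEN;
}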
There are many more examples where a compiler can generate more efficient code. Consider testing a condition and doing one thing if it's true and another if it's false. Over time it's conceivable that mods are made to both paths independently, and that eventually both paths come to contain identical pieces of code. Because the compiler is looking at the bigger picture, it can spot this common code and emit it once as a shared path. Remember that code that looks totally different to you might look identical to a compiler. It's even possible that you might notice the similarity in the assembler code and extract it yourself, but if you did that you would be falling into the trap of "coincidental binding", and the resulting code would be extremely difficult to follow.
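A contrived C sketch of the situation - the helper functions are invented purely for illustration:

#include <stdio.h>

static void log_activity(const char *what) { printf("LOG: %s\n", what); }
static void release_block(void *block)     { (void)block; /* free working storage */ }

void finish_request(int ok, void *block)
{
    if (ok) {
        log_activity("request completed");
        /* ...work specific to the good path... */
        log_activity("releasing working storage");
        release_block(block);
    } else {
        log_activity("request failed");
        /* ...work specific to the error path... */
        log_activity("releasing working storage");
        release_block(block);
    }
    /* An optimizing compiler can notice that both branches end with the
     * same two calls and emit that tail once, branching to it from both
     * paths, without anyone having to restructure the source.           */
}

In the source the two branches stay separate and can be maintained independently; the compiler quietly shares the common tail in the generated code.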
It is for reasons like this that compilers often have a switch to inhibit optimization for debugging purposes: the code generated by an optimizing compiler can be extremely difficult to follow. But optimization is very useful to have in production.
4K blocks
When we implemented 4K blocks, I fought very hard to make the block 4096 bytes long instead of 4095. My thinking was that at some time in the future, someone would realize that it was ridiculous to split a program up into components that would each fit within a 4K block; instead it would be possible to write a program many kilobytes or even megabytes long, simply load it into multiple 4K blocks, and use virtual addressing to make them all appear contiguous. I envisioned systems that, instead of constantly fetching programs from DASD, loaded them completely into RAM at boot time. Even though this wasn't possible at the time, limiting the block to 4095 bytes instead of 4096 made absolutely no sense to me, but such was the short-sightedness of the TPF development team at Danbury, and of Bob Dryfus in particular, that the blocks became 4095 bytes. It has always puzzled me why TPFers in general have this tremendous instinct never to look forward and always to hanker after the past.
I don't blame the people who designed PARS back in the 1960s. The world was very different then and the art of software was very undeveloped. Computers were so lame back then, and memory so small, that every byte used and every instruction executed mattered. But today things are different. In the next room I have a PC server with 24 GB of RAM and dual quad-core processors. We buy disks with capacities measured in terabytes that fit in the palm of your hand. My guess is that the entire program base of a TPF system would fit easily onto a single SSD. It simply doesn't make sense to continue to use 1960s or even 1980s software technology in 2015.