Serious Debugging Time

Now that the power supply is straightened out (at last!), I’ve been able to start tracing logic and seeing what’s up with the CPU.

First things first: It still doesn’t run. Last week before the power supply died I mentioned that I thought it may have been due to low voltage. That was wishful thinking. Now that the voltage levels are good, the behavior of the CPU is roughly the same. It is time to go spelunking into the lair of the beast.

I wanted to tackle the HALT/ENABLE switch first. If the CPU is halted (switch set into the HALT position) it should be running the console loop, waiting for commands from the console switches, but I have no evidence that it is. I had already verified that the 7410’s on the console were working correctly, so my investigation took me from there to the CPU itself.

Thanks to countless people around the world, a lot of DEC’s documentation lives on in scanned form, including the KD11-A Maintenance Manual and Engineering Drawings. These have proven absolutely essential. It didn’t take me long to trace where the HALT switch feeds into the CPU. It turns out it has several destinations: On the M7235 STATUS module it feeds into a 74H11 AND gate, and on the M7234 TIMING module it feeds into a 7474 flip-flop, a NOR on a 7402, an inverter on a 7404, and a NAND on a 7400.

I took a breadth-first approach to tracing the HALT signal by looking at the inputs and outputs on each of these gates. I was able to watch the signal go from HIGH to LOW and back again as I flipped the switch, so I knew the switch, the cable, and the backplane connections it used were good. Everything looked OK until I got to the 7404 inverter on the TIMING module.

Bingo! My first incontrovertibly bad part! It didn’t take me long to replace the 7404. One very tiny problem down, who knows how many to go.

I hit kind of a dead end on tracing the HALT switch, because most of the other gates have other inputs that weren’t changing. After my little 7404 victory, I decided to start looking more closely at the timing situation. How did I know what was going on at startup? Could I figure out whether I even had a clock? Was any microcode running?

I figured I should start by watching what happens at power-up. There’s a nice little diagram in the maintenance manual, page 3-11, that shows the expected timing of various signals after the CPU is turned on, so I decided to watch these lines on a logic analyzer and see what I got.

I probed the following lines and IC locations on the M7235:

  1. +5V DC
  2. BUS DC LO (DEC8881 E16, 04)
  3. BUS AC LO (DEC380 E15, 10)
  4. PWRUP INIT L (74123 E26, 04)
  5. PWRRESTART H (74123 E25, 13)
  6. JAMUPP (7474 E44, 03)
  7. DELAY POWER DOWN (74123 E34, 12)

Well, I got a mostly correct waveform on my PDP-11/35’s startup:

+5V shows where I turned on the system. About 40ms later, DC LO goes high, then AC LO goes high a few ms after that. They seem to be behaving exactly as they should.

The little spikes there on the JAMUPP line are also correct – in reality those are about 3µs wide, exactly as they should be [EDIT: actually, plus or minus a microsecond – I need to investigate that. microseconds matter!]

DELAY POWER DOWN looks good too. The line is pulled low for about 3ms just as PWRRESTART H and JAMUPP both go low.

But PWRUP INIT L is seriously wrong. What looks like a little spike is a couple of < 1µs blips. That line is SUPPOSED to go low right after BUS DC LO goes high, and stay low for 20ms. That’s part of the function of the 74123 IC: it uses an RC network as a timer, in this case an 18K resistor and a 3.9µF capacitor. Different values would be used for different delays. My first thought was to double-check that noise and see what it looks like on an oscilloscope:

The yellow trace is BUS DC LO going high, and the blue trace is PWRUP INIT L. As you can see, that really is a very tiny blip. It tries to go low and fails miserably.
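
For anyone who wants to check a pulse like this from exported logic analyzer samples rather than by eye, here is a minimal Python sketch. It assumes the capture is a plain list of 0/1 samples taken at a fixed period; the function names and the 20% tolerance are made up for illustration and are not from the DEC manual.

```python
def low_pulse_widths(samples, sample_period_s):
    """Return the duration in seconds of each run of 0 samples.

    A run still in progress when the capture ends is reported too,
    so the last width may be shorter than the real pulse.
    """
    widths = []
    run = 0
    for level in samples:
        if level == 0:
            run += 1
        elif run:
            widths.append(run * sample_period_s)
            run = 0
    if run:
        widths.append(run * sample_period_s)
    return widths


def pulse_ok(width_s, expected_s=20e-3, tolerance=0.2):
    """True if a pulse is within an assumed +/-20% of the expected width."""
    return abs(width_s - expected_s) <= tolerance * expected_s


if __name__ == "__main__":
    # Hypothetical capture at 1 microsecond per sample: two tiny blips.
    capture = [1, 1, 0, 1, 0, 0, 1]
    for width in low_pulse_widths(capture, 1e-6):
        verdict = "ok" if pulse_ok(width) else "too short or too long"
        print(f"{width * 1e6:.0f} us low pulse: {verdict}")
```

Sub-microsecond blips like the ones on PWRUP INIT L would fail this check by a wide margin against a 20ms target.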

I immediately tested the capacitor and resistor in the RC network, but both are totally fine. It’s a tantalum cap anyway, so I wouldn’t really expect it to have gone bad. That pretty much means by process of elimination that the 74123 is bad. No surprise, actually, because the legs are pretty badly spotted with rust.
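
As a back-of-the-envelope check on that 20ms figure, here is a small Python sketch of the RC timing. It assumes the pulse-width approximation commonly quoted in 74123 datasheets for large timing capacitors, tw = K * R * C * (1 + 0.7/R) with R in kilohms, C in picofarads, tw in nanoseconds, and K = 0.28. The coefficient is an assumption and may differ between manufacturers and logic families, so check the datasheet for the exact part.

```python
# Assumed datasheet approximation for a 74123 with a large timing capacitor:
#   tw[ns] = K * R[kOhm] * C[pF] * (1 + 0.7 / R[kOhm]), with K = 0.28
K_74123 = 0.28


def pulse_width_s(r_ohms, c_farads, k=K_74123):
    """Estimate the one-shot output pulse width in seconds."""
    r_kohm = r_ohms / 1e3
    c_pf = c_farads * 1e12
    tw_ns = k * r_kohm * c_pf * (1 + 0.7 / r_kohm)
    return tw_ns * 1e-9


if __name__ == "__main__":
    # 18K resistor and 3.9uF capacitor, as on E26.
    tw = pulse_width_s(18e3, 3.9e-6)
    print(f"expected pulse width: {tw * 1e3:.1f} ms")
```

With the 18K and 3.9µF values this lands close to 20ms, which agrees with what the manual says the line should do.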

So that’s where I sit as of tonight. I obviously have startup timing problems, and so far two bad ICs. I’ve ordered a little care package of some 74LSXX, 74ALSXX, and 74FXX parts from Jameco, so they should be here by Wednesday at the latest. Some 74LS123s are among them. I’ll replace E26 and see if the startup waveform improves. There’s also a mystery with PWRRESTART H going high too soon, but I suspect that’s because of the glitch in PWRUP INIT L. We’ll see.

I think this is going to be a pretty slow process. But I’ll make it work yet!

On Spam

It’s remarkable how much spam is directed at this little blog. In a given day, I’d say the Akismet plugin stops something like 5 to 8 spam comments from getting posted here. It’s absolutely infuriating and ridiculous, especially for a podunk little no-name blog like this. I can’t imagine how many the big ones get!

On the other hand, it would be so much worse if the comments actually made it through. I should be grateful that Akismet is so good at stopping them (knock on wood).

Persistence Pays Off

Eureka!!

I replaced MPSA05 and MPSA55 transistors that I suspected of being reversed, and bingo, the power supply is working again. I’m still upset with myself that I put them in backward, but in the end no other damage was done. Thank goodness.

Now I have a good, stable 5V supply, and I’ve adjusted the output correctly. That means I can get back to the business of debugging the CPU and seeing why the machine won’t run.

Transistors!

Oh my God! I know what I did wrong with the power supply. Good heavens!

DEC used GPSA05 (NPN) and GPSA55 (PNP) transistors. I replaced two that were shorted out with MPSA05 and MPSA55’s. They have completely identical specifications.

But guess what? THEY USE REVERSED PINS. Pin 1 is the emitter on the MPSA05/55’s. It’s the collector on the GPSA05/55’s.

Lesson learned: ALWAYS CHECK THE TRANSISTOR ORIENTATION. Especially when you’re 110% sure the parts are identical.

Ugh!

I will report back after I have fixed my mistake.

iCircuit

Last night before bed I whipped up a circuit simulation of the 5V regulator using iCircuit running on OS X. It’s a little buggy, and it’s certainly no Spice, but the real-time feedback is fantastic.

The beauty of this is I can play with it and examine possible failure scenarios by shorting or opening various components to see what happens. I think I have a much better idea of how the regulator actually works, now, and I’m going to do a careful part-by-part check, from one end of the circuit to the other, and find out what’s failed. My gut tells me I should check R50 (the potentiometer), R47, and the 2.4V reference zener. So I’m not giving up hope yet!

Continued Power Supply Problems

Things are certainly looking grim for my hopes of completing restoration before July is through. I have replaced every shorted part, and I still don’t understand what’s going wrong. I am letting my Retrochallenge friends down!

The problem really is that I am simply out of my league with the +5V regulator. I can fake my way through digital electronics quite well because much of it feels natural to me. And since everything I do is low frequency, I have the luxury of more or less ignoring capacitance and inductance, and I can pretend that transistors are nothing more than electrical switches. It makes life so easy.

Understanding and repairing this power supply, though, that’s another matter. When it comes to analog electronics, I can understand a basic emitter-follower amplifier alright, but anything more complex than that leaves me scratching my head. Even with the circuit description in the manual (which is quite good) I don’t really “get” it.

So far my understanding of the problem is that the regulator is not switching. It should be generating a sawtooth wave at high frequency, which eventually gets smoothed out to +5V by an LC network. But the switching isn’t happening. I have now replaced every transistor in the circuit, and it is still not switching. I either screwed something up, or one of the other parts (possibly one of the diodes, or one of the zeners) that I THINK is good is not good. I don’t want to go full-on shotgun and replace every component, that’s ridiculous!

Still… I do have a backup plan up my sleeve. I located (and bought) a spare regulator board. I got a very good deal on it, only $45. I consider that a very reasonable price for my sanity. It should be here next week, and I’ll keep it as a spare if I actually DO get this board working. And if I can’t, at least I’ll have a known good board to use in its place.

I’m a little down about it, but at least it’s a learning experience.

One Step Forward, Two Steps Back

You’re probably wondering to yourself, “Self, why hasn’t Twylo updated his PDP-11 repair journal lately? It’s been days!”

I can sum it up with one schematic:

[Schematic: DEC +5V Regulator Circuit]

That’s the schematic for the +5V regulator in the 11/35’s power supply. It blew up on me (figuratively) just as things were getting good.

It all started on Sunday morning. I wrote up an in-depth testing plan to let me debug the flakey CPU logic step by step, tracing signals from the console into the CPU and examining the behavior at each step. Of course, the first step was to verify that +5V was making it to Vcc on the ICs. I had fully tested the power supply a few weeks ago before putting it back into the chassis, and an IC not getting power would be a pretty obvious problem, so I didn’t expect to find any issues.

Well, I was in for a surprise. I was reading a mere +4.4V on Vcc. I checked several ICs on multiple cards, and they were all the same. +4.4V is way below the recommended tolerance for the first-generation TTL logic used in the KD11-A CPU! That alone would probably cause things not to run. So I felt like I was definitely onto something here. The H750 power supply has little potentiometers to adjust the voltages on the output of each regulator, and I’d adjusted the 5V output to pretty much exactly 5.00V. But I did so without any load on the supply, and I think that was my problem. I decided to re-adjust the voltage with the system loaded and get it back up to +5V. Lo and behold, as I was adjusting the voltage, the LEDs started to flicker, as if a bit of life was coming back into the CPU. It looked almost like it was trying to run!
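
To put a number on "way below tolerance", here is a tiny Python sketch. It assumes the usual supply window for standard commercial 74-series TTL, 5V plus or minus 5% (4.75V to 5.25V); the function name is made up for illustration.

```python
# Assumed supply window for standard commercial 74-series TTL: 5V +/- 5%.
VCC_MIN = 4.75
VCC_MAX = 5.25


def vcc_in_spec(volts, lo=VCC_MIN, hi=VCC_MAX):
    """Return True if a measured Vcc falls inside the assumed TTL window."""
    return lo <= volts <= hi


if __name__ == "__main__":
    # The loaded reading and the unloaded adjustment mentioned above.
    for reading in (4.40, 5.00):
        status = "OK" if vcc_in_spec(reading) else "OUT OF SPEC"
        print(f"{reading:.2f}V: {status}")
```

By that window, 4.4V is about a third of a volt under the minimum, so flaky logic is no surprise.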

And that’s when things went wrong.

Suddenly, there was the tiniest popping sound from the power supply, and all the LEDs went out. I instantly shut off the power, but I knew full well it was a fuse. The regulator has an overvoltage protection circuit built into it, right at the end. This is commonly called a “crowbar circuit” (I guess because it’s like taking a crowbar to something to shut it off). It’s composed of D11, D12, and Q11 in the schematic above. It’s very simple: if the output voltage of the regulator goes above the breakdown voltage of zener D12 (5.6V), D12 starts to conduct in the reverse direction, and suddenly there’s a voltage across R53. This voltage is sensed by Q11 on its gate input, causing current to flow from the anode to the cathode. Q11 can sink a lot of current, and when it’s sinking about 16A, fuse F1 (not pictured) pops and shuts the whole circuit down. Since the SCR (Q11) and D11 are both rated for 20-25A continuous, in theory everything should be happy once you turn the voltage potentiometer back down and replace the fuse.
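
The crowbar behavior can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: it uses the 5.6V zener figure from above as the trip point, ignores the SCR’s own gate trigger voltage, and treats the SCR as latched until power is removed. The class and method names are made up.

```python
class Crowbar:
    """Toy model of a zener-triggered SCR crowbar (not a circuit simulation)."""

    def __init__(self, v_zener=5.6):
        self.v_zener = v_zener  # D12 breakdown voltage, per the text above
        self.latched = False    # True once the SCR (Q11) has fired

    def update(self, v_out):
        """Feed in the regulator output voltage; return True if shorted."""
        if v_out > self.v_zener:
            self.latched = True
        # Once fired, an SCR keeps conducting even if the voltage drops.
        return self.latched

    def power_cycle(self):
        """Removing power lets the SCR stop conducting."""
        self.latched = False


if __name__ == "__main__":
    crowbar = Crowbar()
    # Hypothetical readings while turning the potentiometer up, then down.
    for volts in (5.0, 5.4, 5.8, 5.0):
        state = "SHORTED" if crowbar.update(volts) else "normal"
        print(f"{volts:.1f}V -> {state}")
```

The last reading stays shorted even though the voltage is back at 5.0V, which is why the fuse has to go: only removing power resets the SCR.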

In my case, though, that’s not what happened. I replaced the fuse, turned the voltage down, and powered the PSU back up. POP! The fuse blew again. Bad sign.

So I started more troubleshooting. I had some false starts and drew some incorrect conclusions, but before the end of the night I’d discovered that D11 was dead shorted. In fact, its package had cracked! So any time I applied power after that, +39V would go to the +5V output and the fuse would blow again right away.

Sigh. Nothing else for it: I wrote up an order to Mouser and Jameco for some replacement parts. Since I’ve had trouble with this supply before, I added some extra stuff onto the order. In addition to the 1N5624 that shorted, I bought replacements for several core parts that may fail in the future, including power transistors, A05 and A55 transistors, zener diodes, etc., just in case! Total cost was less than $30, so I consider it an acceptable investment.

The parts should be here by Thursday, and then I will be back to my regularly scheduled debugging. Until then, I must twiddle my thumbs.

Intertwingly Little Lines

I’ve taken the initiative to dig into the KD11-A Maintenance Manual and the Engineering Drawings a little bit. To say I am intimidated would be a bit of an understatement. This is a very complex machine that many smart people with engineering degrees designed. I like to think that I am capable of screwing in a lightbulb without hurting myself, but I do not have an engineering degree. My electronics knowledge is 100% self-taught. So I feel, I think somewhat justifiably, like I may be a little out of my league.

On the other hand, there’s a sense of freedom in not really knowing how much I don’t know.

Based on a suggestion from the Vintage Computer Forums, I’ve narrowed my search down to the console control logic that is located on the M7235 STATUS board. A first step is to try to figure out why nothing gets loaded when I press the “LOAD ADRS” switch. I have come up with a plan of attack that will hopefully rule out very simple things like improper cabling, and then gradually work back from the console itself. So, this will be the entry point of my puzzle, to trace the signal from this switch through the KY11 console board, up and over the 40-pin flat ribbon cable that connects the console to the M7235, and then into the control logic on the M7235 itself.

To aid in this, I have done the unthinkable in the modern age: I have printed out the relevant schematics and the maintenance manual on paper. That’s right, dead tree format. I find it much easier to leaf through than trusting the painfully slow image-based PDF browser on my iPad, or lugging around my laptop. Plus, I get to scribble annotations right on the paper. Keepin’ it old-school, as they say.

I will say that the maintenance manual is quite handy, now that I have started to get at least some minor understanding of the architecture. It goes hand-in-hand with the prints and describes in great detail the operation of each circuit. It’s almost like getting a CSEE computer design course, only without the crippling tuition payments!

I won’t be able to really dig in until this weekend. I have ordered a set of DEC extender boards (one dual width, one quad width) from Douglas Electronics, and I hope they will be here on Friday. I am also expecting a hex extender card that Jack Rubin was kind enough to lend me; I should have that sometime next week. Once I have these in hand the game is really on.

Time for the Real Work to Begin

Last night was the big moment. I fired up the 11/35 for the first time with the CPU installed.

I’m pleased to report that nothing popped, no fireworks went off, and no magic smoke got out.

On the other hand, it did not work correctly, either. The processor starts up, the RUN light comes on, random(-ish) data is displayed on the ADDRESS and DATA lights, and that’s about all that I can make happen.

So now begins the really hard part, the logic debugging. I’ve started a thread over on Erik S. Klein’s Vintage Computer Forum to discuss what I’m seeing, with the hopes that the very smart (much, much smarter than I) DEC fiends over there will be able to offer insight into how I should proceed with debugging.

I am armed with the KD11-A Processor Maintenance Manual, the PDP-11/40 System Engineering Drawings, an 8-channel logic analyzer, an oscilloscope, a multimeter, my brain, and the Internet. This will by no means be easy, but it will definitely be educational, and probably fun and frustrating in equal parts. Let’s do this!

More Assembly

Things really started to come together last night. I finished installing the cooling fans, dropped the power supply into the chassis, and connected the 9-slot backplane. No cards and no console, but I consider this my first “smoke test” of the fully assembled chassis and power subsystem. It works!

Exciting days ahead. I have to re-assemble the console, and then I’ll drop in the cleaned-up CPU cards this weekend and see what happens. I’m a mite skittish about that.