This is a crash course on making sense of the NMOS circuit displays in Visual 6502/2C02/2A03, written for people without much low-level electronics experience (like the primary author). It aims to present the information needed to read the diagrams at a basic level in simple language, omitting details that are less important when starting out.
You might want to read the Visual 6502 user's guide and the Visual 2C02 page first.
Let's start by defining what the different colors mean:
At the level presented here, diffusion, metal, and polysilicon can be thought of as roughly equivalent when viewed in isolation; they all conduct current. The important difference is in how they interact with each other, which is explained below.
When a piece of polysilicon is sandwiched between two areas of diffusion, it acts as a gate, only letting current through when the polysilicon is connected to power (or, equivalently, is on, conducting, high, or 1). The diffusion area from which current flows when the gate is high is called the source. The diffusion area into which current flows is called the drain. The gate together with the source and drain is what makes a transistor.
The transistor here is an enhancement-mode transistor. All the "ordinary" selectable (see the nodes section) transistors have this type.
Around areas of powered diffusion we often see something like the following (note the distinctive "hook" in the polysilicon):
Here the polysilicon acts roughly like a resistor (or more specifically a pull-up resistor). This ensures there is never a direct short from VCC to ground (through some path of high gates). In the simulators, this entire configuration is simply modeled as a power source.
The transistor here is a depletion-mode transistor, a different type of transistor compared to above (though it appears the same visually).
In the following circuit, the gate will be high and A powered/pulled to VCC:
                     VCC
                      |
                      |
                      |
[power source]------[gate]
                      |
                      |
                      |
                      A
No current will ever flow from the power source to A (ideally). The voltage on the gate only controls whether there's a conductive path between VCC and A.
Similarly, the gate will be low and A not powered/pulled to VCC in the following circuit:
                     VCC
                      |
                      |
                      |
GND-----------------[gate]
                      |
                      |
                      |
                      A
Using a switch analogy for transistors (which is common in digital electronics), the above circuits can be viewed as follows:
                                     A
                                     |
                                     |
                                     |
                                      \
B----------[remote switch controller]  \   <- controlled switch
                                        \
                                     |
                                     |
                                     |
                                     C
When B is high, the switch is closed, connecting A and C. When B is low, it is open (like in the figure).
Electrically common areas are called nodes in Visual 6502/2C02/2A03. Clicking on a node highlights it, making it easier to see how things are connected. (Clicking on powered or grounded diffusion won't work; these only modify properties of other nodes and are not themselves nodes.) When a node is highlighted, a numeric ID unique to the node is displayed in the upper right, along with a name for the node if it has one. Node names are defined in nodenames.js.
Transistors can be selected separately by clicking on the gate (the part of the polysilicon between the diffusion areas). They have names that start with 't', followed by a numeric ID.
The Find: edit field can be used to locate nodes, either by numeric ID or by name. Numeric IDs can also be used to trace the values of nodes without an assigned name.
An inverter is constructed like in the image below:
Note that there is a hook in the gate to the left, meaning the left part of the circuit is a power source instead of a "normal" transistor.
When the input gate is low, current from the power source flows into the output wire, pulling the voltage high. When the input gate is high, current from the power source instead flows into ground, driving the voltage on the output wire low. The output wire is hence the inverse of the input wire.
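As a minimal digital sketch of the behavior just described (the function name `inverter` is made up for illustration and is not part of any simulator), the power source can be treated as always trying to pull the output high, with the pull-down transistor winning whenever its gate is high:

```python
def inverter(input_high: bool) -> bool:
    """Output node of an NMOS inverter, modeled purely digitally.

    The depletion-mode pull-up is treated as an always-on power source;
    the enhancement-mode pull-down overrides it whenever its gate is high.
    """
    pulled_to_ground = input_high  # pull-down transistor conducts
    return not pulled_to_ground    # otherwise the pull-up holds the node high


for value in (False, True):
    print(int(value), "->", int(inverter(value)))
```

This ignores all analog effects (rise times, the small current that flows while the output is low), in the same way the simulators do.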
When one node is the inverse of another, we will say that it inverts into the other node.
Below is an example of a NOR gate taken from Visual 2A03, related to controlling when the first square channel is silenced:
If any of the gates in red circles are high, the voltage of the highlighted node is pulled to ground instead of pulled high (as current will flow from the power source on the left into ground through any high gates). The value that reaches the gate in the blue circle is hence the NOR of the values on the gates in the red circles.
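The same idea can be sketched in code, under the assumption that the node is modeled as "high unless some pull-down path conducts". The name `nmos_nor` is hypothetical:

```python
from itertools import product


def nmos_nor(*gates: bool) -> bool:
    """Pulled-up node with one pull-down transistor per input gate.

    Any single high gate is enough to connect the node to ground.
    """
    return not any(gates)


# Truth table for a two-input instance.
for a, b in product((False, True), repeat=2):
    print(int(a), int(b), "->", int(nmos_nor(a, b)))
```

Adding another input to such a gate only costs one more pull-down transistor on the same node, which is why wide NOR gates are so common in NMOS layouts.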
Note that the circles represent the only transistors in this image (except for the depletion-mode transistors on the power sources). There are polysilicon traces passing above (or in reality, below) metal traces in a few spots, but this does not form a transistor. The highlighting (which was activated by clicking on the node) shows how things are connected.
The gate in the blue circle is part of a pass transistor, so called because it passes current between two nodes rather than driving or grounding a node. The gate in this case is apu_clk1, and we say that value is "buffered on apu_clk1".
This is the simplest form of storage, and so is covered first.
If a wire is "closed off" so that it is connected to neither power nor ground, it retains its value for a while through capacitance. This is used to store some short-lived data "on the wire". As an example, here's the read buffer for the 2C02's VBlank flag, which lets its value be read even though reading $2002 immediately clears the VBlank flag:
The circled gate is controlled by the /read_2002_output_vblank_flag signal (shortened to ov from here on). While ov is high, the value of vbl_flag (or rather /vbl_flag in this case) is connected to the highlighted wire. When ov goes low, the value on the wire is held.
A '/' denotes 'inverse', meaning the signal is the inverse of another signal with the same name but without the '/'. A '/' can also mean 'active low', meaning the signal is considered "active" when low. Visual 6502 has a slightly different convention – see Node names in Visual 6502.
While a node or wire is isolated from both VCC and ground in the above fashion, it is said to be floating. For bus lines, a floating line is said to be tri-stated, as the floating state can be viewed as a third state in addition to 0 and 1. This third state allows other devices to use the bus without interference.
Using capacitance as storage in the above fashion is an instance of dynamic logic, so called since it has time-dependent behavior beyond just the input clock. Chips that make use of dynamic logic techniques tend to have a minimum clock speed at which they function correctly, as values stored via capacitance degrade to zero over time.
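A rough behavioral sketch of such a pass-transistor-gated wire follows. The class name `DynamicNode` and its method are invented for this example, and charge decay over time is deliberately not modeled:

```python
class DynamicNode:
    """A wire behind a pass transistor that holds its value while floating."""

    def __init__(self, value: bool = False):
        self.value = value

    def update(self, gate_high: bool, source_value: bool) -> bool:
        # While the pass gate is high, the wire follows the source node.
        if gate_high:
            self.value = source_value
        # While the gate is low the wire floats and keeps its last value.
        return self.value


node = DynamicNode()
print(node.update(gate_high=True, source_value=True))    # follows the source
print(node.update(gate_high=False, source_value=False))  # held on the wire
```

In a real chip the held value would eventually fade, which is the reason for the minimum clock speed mentioned above.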
Two cross-coupled inverters make a latch – an element that stores a single bit.
Below is the VBlank flag from Visual 2C02. In the left-most picture the vbl_flag node is highlighted, and in the middle picture its inverse (/vbl_flag) is highlighted. As can be seen by the two gates in gray circles, each node inverts into the other, forming two cross-coupled inverters.
The gates marked set and clear set and clear the latch, respectively. To clear the latch, vbl_flag is driven low. To set the latch, /vbl_flag is driven low.
This circuit is an example of an SR Latch, where S stands for set and R for reset, corresponding to the set and clear gates above. It is more specifically an SR NOR Latch, as it can be viewed as being built of NOR gates. The corresponding schematic using NOR gates is shown in the right-most picture.
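To make the set/clear behavior concrete, here is a small sketch of an SR NOR latch that re-evaluates the two cross-coupled NOR gates until they settle. The function names and the fixed iteration limit are assumptions for illustration only:

```python
def nor(a: bool, b: bool) -> bool:
    return not (a or b)


def sr_latch(q: bool, nq: bool, s: bool, r: bool, max_iters: int = 4):
    """Return the settled (q, /q) pair for the given set/reset inputs."""
    for _ in range(max_iters):
        new_q = nor(r, nq)      # reset drives q low
        new_nq = nor(s, new_q)  # set drives /q low
        if (new_q, new_nq) == (q, nq):
            break
        q, nq = new_q, new_nq
    return q, nq


state = (False, True)                        # start cleared
state = sr_latch(*state, s=True, r=False)    # set
print(state)
state = sr_latch(*state, s=False, r=False)   # hold
print(state)
state = sr_latch(*state, s=False, r=True)    # clear
print(state)
```

Asserting both inputs at once is not handled specially here; the sketch is only meant to show the set, hold, and clear cases.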
When a latch can be set directly from the value of some line, e.g. a data bus line, an arrangement involving a clock is often used. The motivation is to avoid having to form both data_line and /data_line and route them to the set and clear terminals of the latch, which would use more logic. The clock is already routed all around the chip, so mixing it in usually isn't as much of a problem.
As an example, here's the noi_lfsrmode node (the Loop noise flag from $400E):
While apu_clk1 is high, noi_lfsrmode will flow into the floating node (so called because it will float when both apu_clk1 and w400e are low), which then inverts into noi_/lfsrmode, forming a cross-coupled inverter latch. While apu_clk1 is low, the loop will be broken momentarily, and during this phase a new value can be copied into the latch through the gate controlled by the w400e signal (which goes high on writes to $400E). The value let through by the pass transistor is the db7 node, corresponding to the 8th bit of the data bus. (There's a via between the diffusion and the metal db7 line – easier to see if the node is highlighted.) If the loop were not broken during the write operation, the old value in the latch would interfere with setting a new value.
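The following is a behavioral sketch of this clocked latch. The class and attribute names are invented, and it assumes apu_clk1 and w400e are never high at the same time (the feedback path is simply given priority if they are):

```python
class ClockedLatch:
    """Latch with a feedback pass gate (apu_clk1) and a write pass gate (w400e)."""

    def __init__(self):
        self.floating = False  # the node between the two pass transistors

    @property
    def n_lfsrmode(self) -> bool:
        return not self.floating       # floating node inverts into noi_/lfsrmode

    @property
    def lfsrmode(self) -> bool:
        return not self.n_lfsrmode     # which inverts into noi_lfsrmode

    def step(self, apu_clk1: bool, w400e: bool, db7: bool) -> bool:
        if apu_clk1:
            # Loop closed: the latch output recharges the floating node.
            self.floating = self.lfsrmode
        elif w400e:
            # Loop broken and write gate open: copy the data bus bit in.
            self.floating = db7
        # Otherwise the node floats and holds its value through capacitance.
        return self.lfsrmode


latch = ClockedLatch()
print(latch.step(apu_clk1=False, w400e=True, db7=True))   # write a 1
print(latch.step(apu_clk1=True, w400e=False, db7=False))  # loop closed, value kept
```

In this purely digital model the feedback step looks like a no-op; on the real chip it is what restores the charge on the floating node.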
For another, less cluttered view of the same type of circuit, see this image (substitute "apu_clk1" for "/φ₁" and "w400e" for "φ₁").
(The circuitry in the lower-right corner is a multiplexer, which selects between one of two inputs depending on whether noi_lfsrmode or noi_/lfsrmode is high; i.e., depending on whether noi_lfsrmode is 0 or 1. The output of the multiplexer is on the left side.)
Below is an example of a DRAM cell, taken from the internal PPU OAM memory:
In the left and right pictures the two sides of the cell are highlighted (with a different highlight color on the right due to the node being high). The two nodes are always inverses of each other, with the node highlighted in the left picture corresponding to the value held in the cell (low for 0 and high for 1).
Note that this is not an instance of cross-coupled inverters, as neither node is directly connected to a power source. Rather, DRAM depends on capacitance to hold the value, which will fade unless the capacitor is regularly refreshed (the high side recharged). This is the "dynamic" part of DRAM.
Below is a picture of the upper edge of the PPU OAM DRAM array:
The "column" and "row" labels are conventional memory terminology; they confusingly happen to get the opposite orientation in Visual 2C02. "Row" and "column" below will refer to this terminology.
The spr_rowx lines (sometimes called word lines) are used to connect a row of memory cells to the horizontal bit lines (by opening up each cell to a pair of vias); this is called opening that row. For example, spr_row16 opens the highlighted row, while spr_row0 opens the row on its right side. As can be guessed from the node names, the memory layout is not as straightforward as consecutive memory locations being stored in consecutive rows. (Interestingly, we do get consecutive rows if we reverse the bits in the part of the sprite address that selects the row. It is unknown why the row selection bits were not wired to the DRAM in this "correct" configuration instead.)
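The bit-reversal observation can be illustrated with a short helper. The function name and the 5-bit width used in the example are assumptions; the width was chosen only because it makes row 16 the neighbor of row 0, as in the picture:

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


# Row numbers met when stepping through consecutive positions,
# assuming (hypothetically) 5 row-select bits.
print([reverse_bits(i, 5) for i in range(4)])
```

Since reversing twice gives back the original value, the same helper maps in both directions between the two numberings.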
On the left side of OAM we see that pass transistors on the spr_col1 and spr_col3 lines select the bit lines from the first and second columns of the memory array, respectively (there are other, similar lines next to them). Each such spr_colx line is connected to eight different columns (16 bit lines), corresponding to the eight bits of the byte to be read or written (increasing bit positions are not stored in consecutive columns either). One notable exception to this pattern is that two columns only connect to five sets of bit lines; these columns correspond to the "flags" bytes in OAM, where the middle 3 bits don't actually exist.
At the right side in the picture above we see pclk0 running down the edge of OAM, connected to pull-up transistors for each bit line. During pclk0, these are used to precharge the bit lines, after which the pull-up transistors are disabled but the lines remain charged through capacitance. When the selected row is opened after pclk0, it will be exposed to the precharged bit lines, which has the effect of charging up the high side of the cell. On the low side of the cell, the precharge current will simply drain to ground, as the gate on that side will be driven high.
In a typical DRAM circuit, the rows are automatically and periodically refreshed to prevent values from fading. In the PPU, no such logic exists, and rows are only refreshed when explicitly accessed. The reason the PPU (usually) gets away with this is that sprite evaluation will access the entire OAM (provided rendering is enabled), refreshing the rows as a side effect.
SRAM uses cross-coupled inverters for storage and is accessed using a row/column scheme similar to DRAM. Compared to DRAM, SRAM does not need to be refreshed, tends to be faster, uses more die area per memory cell, and draws more power for the NMOS version.
Below is a picture of SRAM memory cells used to store the PPU's palette (in this case the rows do go horizontally):
The two elements are covered together since their implementation turns out to be similar in this case.
Pictured below is the decoder and mask ROM that act as the lookup table for initialization of the length counters in the APU:
The length is set by writing bits 7-3 of e.g. $4003 (in the case of the first pulse channel), so the inputs to the decoder are bits 7-3 of the data bus. The output from the decoder feeds into the mask ROM, and the output from the mask ROM is the length from the lookup table. The length is used to initialize a counter that counts down to zero before silencing the channel.
The picture below shows a zoomed-in view of the lower part of the decoder and mask ROM:
The spots of yellow diffusion in the decoder and mask ROM are connections to the metal wires, which run horizontally in the decoder and vertically in the mask ROM. By setting the gates connected to the diffusion high, the wires can be driven low.
In the decoder (right part) the input lines and their inverses run vertically (/db7 has been highlighted to show its connection). By looking carefully at the bottom-most horizontal row in the decoder, we see that it is powered on the right side, and that the condition for it to remain high as it passes into the mask ROM is /db7 AND /db6 AND /db5 AND /db4 AND /db3. Another way to put this condition is db7-db3 = $00.
Similarly, the condition for the second row from the bottom to be high is /db7 AND /db6 AND /db5 AND db4 AND /db3, which translates to db7-db3 = $02. The conditions for the third and fourth rows from the bottom are db7-db3 = $04 and db7-db3 = $06, respectively.
The decoder is set up so that dbx and /dbx will never both drive the same horizontal line low (which would make it impossible for that line to ever be high), and in this case each row has a unique bit pattern that activates it. (It would also be possible to insert a "don't care" condition in the decoder by having neither dbx nor /dbx drive the line low.)
The decoder here is a 5-to-32 decoder, with 32 rows corresponding to the 32 possible bit patterns made with five bits. This type of decoder is said to fully decode its inputs, and is an instance of an n-to-2ⁿ decoder.
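As a sketch of how such a fully decoding NOR-based decoder behaves, the code below wires either dbx or /dbx to each row as a pull-down, depending on the bit pattern that row should respond to. The function name and the row ordering are assumptions for illustration:

```python
def decode(value: int, n: int) -> list:
    """Return 2**n row lines; exactly one stays high for any n-bit value."""
    rows = []
    for row in range(2 ** n):
        pull_downs = []
        for bit in range(n):
            db = (value >> bit) & 1
            row_bit = (row >> bit) & 1
            # A row expecting a 1 has /db wired to it, otherwise db,
            # so the "wrong" input level pulls the row low.
            pull_downs.append((1 - db) if row_bit else db)
        rows.append(0 if any(pull_downs) else 1)
    return rows


lines = decode(0b00010, 5)
print(lines.index(1), sum(lines))
```

Leaving out both dbx and /dbx for some bit on a row would turn that bit into the "don't care" case mentioned above, so that more than one input pattern activates the row.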
In the mask ROM (this one in particular being a NOR ROM), we see that each horizontal line from the decoder when high will cause a particular pattern to appear on the lenx outputs. Reading off the bottom row, this pattern is len7-0 = 00001001b = 9. Reading off the remaining rows from bottom to top, we get the values 00010011b = 19, 00100111b = 39, and 01001111b = 79.
Putting together the above, we have the following incomplete map from inputs to outputs:
Index | Length |
---|---|
$00 | 9 |
$02 | 19 |
$04 | 39 |
$06 | 79 |
By checking against the APU length counter table, we see that these indeed are the length values corresponding to those indices (minus one, due to details of how the length counter works).
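The partial map above can be written down as a small lookup sketch. Only the four rows read off in this section are included; the dictionary and function names are made up for the example:

```python
# ROM contents read off the four bottom rows (index -> len7-0 pattern).
LENGTH_ROM = {
    0x00: 0b00001001,
    0x02: 0b00010011,
    0x04: 0b00100111,
    0x06: 0b01001111,
}


def lookup_length(data_bus: int):
    """Decode bits 7-3 of the written byte and return the ROM output.

    Returns None for rows that were not read off in this section.
    """
    index = (data_bus >> 3) & 0x1F
    return LENGTH_ROM.get(index)


for index, pattern in LENGTH_ROM.items():
    # The ROM holds the table value minus one.
    print(f"${index:02X}: ROM={pattern}, table value={pattern + 1}")
```

Note that the index is the upper five bits of the written byte, so a write of $10 selects index $02.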
To give an example of a decoder that does not feed into a mask ROM, the picture below shows the internal 2A03 address decoder for the address range $4000-$4017, where signals such as r4017 (read 4017) and w4004 (write 4004) are generated.
The theory behind the decoder and mask ROM seen here is closely related to that of PLAs (Programmable Logic Arrays), where we could view the decoder as the AND plane and the mask ROM as the OR plane (both implemented with NOR gates). This introduction to PLAs is helpful.
Pictured below is part of the adder used by the sweep units in the 2A03 to calculate the target period for sweep period updates to the second square channel (the first square channel is identical except for a small quirk related to subtraction; see below). The pictured part calculates the second bit (bit 1) of the sum, along with the carry for that bit position.
The adder is split into two parts. The left-most part (having four columns) calculates bit 1 of the sum. The right-most part (with three columns) calculates the carry. Both /sum1 out and /carry out are powered, and can be forced low by certain combinations of the input signals being high. (For e.g. the left-most column, this combination is addend1 AND carry in AND sq1_p1). The essential information is captured in the following truth table:
sq1_p1 | addend1 | carry in | /sum1 out | /carry out |
---|---|---|---|---|
0 | 0 | 0 | 1 | 1 |
0 | 0 | 1 | 0 | 1 |
0 | 1 | 0 | 0 | 1 |
0 | 1 | 1 | 1 | 0 |
1 | 0 | 0 | 0 | 1 |
1 | 0 | 1 | 1 | 0 |
1 | 1 | 0 | 1 | 0 |
1 | 1 | 1 | 0 | 0 |
As expected, this corresponds to an addition operation (with the sum and carry inverted).
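The table can be regenerated from an ordinary full-adder computation with both outputs inverted. This is a sketch only; the function name `adder_bit` is hypothetical:

```python
from itertools import product


def adder_bit(a: int, b: int, carry_in: int):
    """Return (/sum, /carry) for one bit position of the adder."""
    total = a + b + carry_in
    n_sum = 1 - (total & 1)      # inverted sum bit
    n_carry = 1 - (total >> 1)   # inverted carry out
    return n_sum, n_carry


# Columns: sq1_p1, addend1, carry in, /sum1 out, /carry out
for p, addend, carry in product((0, 1), repeat=3):
    print(p, addend, carry, *adder_bit(p, addend, carry))
```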
The same logic is used to perform subtraction, by inverting each bit of the addend (using separate logic) and setting the carry in for the zeroth bit to 1. This corresponds to the usual invert-bits-and-add-one operation for negating a number in two's complement.
For unknown reasons, the inverted carry input for the zeroth bit of the first square channel is connected to VCC instead of the inverted sweep direction flag (as it is in the other square channel), making the carry input unconditionally zero. On that channel the addend is therefore only bit-inverted, without the final add-one, so the result is one less than a true subtraction (in effect, the value plus one is subtracted).
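The effect of the carry input on subtraction can be sketched as follows. The function name, the example operands, and the 11-bit width are assumptions made for this illustration:

```python
def sweep_subtract(period: int, amount: int, carry_in: int, width: int = 11) -> int:
    """Subtract by adding the bit-inverted amount plus the carry input."""
    mask = (1 << width) - 1
    inverted = ~amount & mask  # invert each bit of the addend
    return (period + inverted + carry_in) & mask


print(sweep_subtract(100, 30, carry_in=1))  # carry in 1: period - amount
print(sweep_subtract(100, 30, carry_in=0))  # carry in 0: period - amount - 1
```

With the carry input at 1 this is ordinary two's complement subtraction; with it stuck at 0 the result comes out one lower.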
The below circuitry forms part of a barrel shifter, used to shift the inputs to the adders for sweep unit period updates in this case.
As a side note, the bit inversion for subtraction by the sweep units happens before the bits enter the barrel shifter.
(This section might be considered "advanced" on a first reading. I just wanted an example that made more complex use of clocks.)
The picture below shows the 16-bit shift register that holds the high bits for background tiles (see the PPU rendering page). The upper eight bits can be reloaded from PPU VRAM data bus lines, and the output is taken from the lower eight bits (in this case, the particular bit to use is selected by the fine x scroll). Bits flow clockwise through the shift register.
Below is a zoomed-in view of three bits (tile_h15-13) from the upper-left part of the shift register:
The value of each bit corresponds to the value on the (2) side.
The following signals control the shifting and reloading of the register (the names used were invented for the article and are not standard terminology):
Shift does not always exactly mirror pclk1, as explained below, which is the reason for the ≈ notation.
Shifting the register is a two-step process:
Due to the bit of powered diffusion circled in red, the default value shifted into (1) is 1. However, as the value is held on the inverted side (2), this means that zeroes are being shifted in.
To perform a parallel load of the register, step (2) from above is modified so that Shift remains low during pclk1 and Parallel load goes high instead, causing the new value for each cell to come from the data bus lines instead of from the previous cell.
The diagram below might clarify how the control signals are related. Each row is a PPU half-cycle.
pclk0 | Invert | Shift | Parallel load |
---|---|---|---|
1 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 |
0 | 0 | 0 | 1 ← Reloaded here |
1 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
1 | 1 | 0 | 0 |
0 | 0 | 1 | 0 |
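The pattern in the table can be reproduced with a few lines of code. This is only a restatement of the table, with invented function and parameter names, and not a model of the actual control logic:

```python
def control_row(pclk0: int, reload_pending: bool):
    """Return (pclk0, Invert, Shift, Parallel load) for one half-cycle."""
    invert = pclk0
    # On the other half-cycle, either shift or reload, never both.
    shift = int(pclk0 == 0 and not reload_pending)
    parallel_load = int(pclk0 == 0 and reload_pending)
    return (pclk0, invert, shift, parallel_load)


# Twelve half-cycles, with the reload happening on the sixth one.
rows = [control_row(1 - (i % 2), reload_pending=(i == 5)) for i in range(12)]
for row in rows:
    print(*row)
```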
The below Visual 2A03 circuitry controls the volume on the output pin for the two square channels (the triangle, noise, and DMC channels use a separate pin). Note that each successive bit has twice the weight of the preceding one in terms of the amount of powered diffusion connected to it.
This is an example of a binary-weighted DAC. A different type of DAC is used for the video output from the PPU (found in the upper-left of Visual 2C02, rotated 90 degrees here):
The upper-left end is actually connected to VCC, and the lower-right to ground. This is a voltage ladder, and works by tapping the wire (which behaves as a resistor) at different points along the run to get different voltages. As the simulator is purely digital, this circuit is not directly used in the simulation, and some parts that would otherwise interfere with it have been disconnected.
These are found on pins capable of doing output, which need to be able to source (generate) and sink large currents to drive the line high or low. The polysilicon wire that would cause the pin to source current is highlighted below.
Large clusters of pull-up and pull-down transistors like these are sometimes called superbuffers. They also appear in some internal circuits that need to source or sink larger currents, e.g. due to having a large fan-out – a large number of connections from the logic gate's output to inputs of other gates.
On lines that are capable of being tri-stated, this is done by activating neither the pull-up nor the pull-down transistors, so that the pin neither sources nor sinks current. This is also done for reads on bidirectional lines, to prevent the output driver from interfering.
Some parts of the chips, especially outside the 6502 core, were designed using a copy-and-paste process called "standard cell", leading to some seemingly nonsensical and cut-off connections. These carry no special significance. The image below contains an example.
The 6502 core inside the 2A03 is a substantially tighter block of NMOS (having been designed by hand), but it still has a few cut-off connections remaining from removal of the original output drivers.
(This information is not essential to reading the diagrams.)
The layers that make up the chip are as follows, in order from bottom to top: substrate, diffusion, oxide (with holes for buried contacts and vias), polysilicon, more oxide (with holes for vias), metal, and passivation (or "overglass", containing holes where bond wires connect).
The way diffusion is powered or grounded is through vias to large areas of metal that are either grounded or powered.
Visual6502, Visual2A03, and Visual2C02 are purely digital simulators, so the effects of transistor dimensions don't matter. But you will often notice locations in the simulators where transistors are different shapes.
Here's the inverter from the beginning of this tutorial, now annotated with dimensions:
Because the layer of substrate has uniform thickness, everything is calculable in terms of sheet resistance. In the above annotated picture, two transistors are shown: one significantly wider than long, and the other the opposite. The aspect ratio (length divided by width) of the depletion mode pull-up transistor on the left is approximately 3.47, while the aspect ratio of the enhancement mode pull-down transistor on the right is 0.23. As a result, the pull-down transistor is approximately 15 times more effective at sinking current (which is good: it has to be able to override the pull-up).
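The "approximately 15 times" figure is just the ratio of the two aspect ratios, as this small sketch shows (the function name is made up, and resistance is assumed to be proportional to length divided by width):

```python
def strength_ratio(pull_up_aspect: float, pull_down_aspect: float) -> float:
    """How many times lower the pull-down's resistance is than the pull-up's."""
    return pull_up_aspect / pull_down_aspect


print(round(strength_ratio(3.47, 0.23), 1))  # 15.1
```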
The 2A03 uses these analog effects in its audio path:
Shown are the three least significant bits of the 2A03 APU's PCM channel. pcmout1 drives a single transistor (with some resistance R) and pcmout2 drives two (resulting in a resistance R÷2). To give pcmout0 a resistance of 2·R, the designers would have had to make its transistor either half as wide or twice as long. Halving the width wasn't an option because the diffusion areas are already as narrow as possible using this manufacturing technology. As a result, the gate for the least significant bit is longer.
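The three resistances can be related with a simple sheet-resistance sketch. The function, its parameters, and the unit dimensions are assumptions for illustration; identical transistors in parallel are modeled by dividing by their count:

```python
def channel_resistance(length: float, width: float, count: int = 1,
                       sheet: float = 1.0) -> float:
    """Resistance of `count` identical transistors in parallel."""
    return sheet * (length / width) / count


r = channel_resistance(1.0, 1.0)                # pcmout1: one transistor, R
print(channel_resistance(1.0, 1.0, count=2))    # pcmout2: two in parallel, R/2
print(channel_resistance(2.0, 1.0))             # pcmout0: twice as long, 2R
print(channel_resistance(1.0, 0.5))             # half as wide would also be 2R
```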
This section lists node names for various clocks that sequence operations within the chips. Some of the 6502 pin signals might have gained a "c_" prefix in Visual 2A03 compared to Visual 6502.
This clock arrangement helps to ensure that timed events (various counters being decremented or reloaded) do not conflict with writes from the CPU (which only happen when φ2 is high).
φ1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
---|---|---|---|---|---|---|---|---|
φ2 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
apu_clk1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
apu_/clk2 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
apu_clk2x | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
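Reading the table column by column, the pattern repeats every four half-cycles, and apu_/clk2 is the inverse of apu_clk2x. The sketch below regenerates the rows from a half-cycle counter; the function name and the choice of half-cycle 0 as the first column are assumptions:

```python
def apu_clocks(half_cycle: int) -> dict:
    """Clock levels for a given half-cycle index (column 0 of the table is 0)."""
    phi1 = int(half_cycle % 2 == 0)
    apu_clk2x = int(half_cycle % 4 == 2)
    return {
        "phi1": phi1,
        "phi2": 1 - phi1,
        "apu_clk1": int(half_cycle % 4 == 0),  # high on every other phi1
        "apu_/clk2": 1 - apu_clk2x,
        "apu_clk2x": apu_clk2x,                # high on the remaining phi1s
    }


for name in ("phi1", "phi2", "apu_clk1", "apu_/clk2", "apu_clk2x"):
    print(name, *[apu_clocks(i)[name] for i in range(8)])
```

None of the APU clocks in this table are high during φ2, which matches the stated goal of keeping timed events clear of CPU writes.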
The clock divider in the PPU is clocked on a zero-to-one transition of the master clock, while the clock divider in the CPU is clocked on a one-to-zero transition. Diagrammatically, this might look like below for NTSC (for CPU clock, "!" denotes when the 2A03's M2 line goes high but the 6502's internal clock is still low).
Master clock    | 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 ...
PPU pixel clock | 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 ...
CPU clock       | 0 0 0 0 0 0 0 0 0 ! ! ! 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 ! ! ! 1 ...
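The two dividers can be sketched as edge-triggered counters. The function name is invented, the divide ratios of 4 and 12 are read off the NTSC diagram, and the early rise of M2 (the "!" marks) is not modeled:

```python
def divide_clock(master, ratio, edge):
    """Divide `master` by `ratio`, counting only edges of the given polarity."""
    wanted = (0, 1) if edge == "rising" else (1, 0)
    level, count, prev = 0, 0, master[0]
    out = []
    for sample in master:
        if (prev, sample) == wanted:
            count += 1
            if count == ratio // 2:  # toggle every half period
                level ^= 1
                count = 0
        out.append(level)
        prev = sample
    return out


master = [i % 2 for i in range(37)]
ppu = divide_clock(master, 4, "rising")    # PPU: zero-to-one transitions
cpu = divide_clock(master, 12, "falling")  # CPU: one-to-zero transitions
print(*ppu)
print(*cpu)
```

Because the two dividers react to opposite edges of the master clock, their outputs never change on the same master clock transition.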
During each active cycle (i.e., while the RDY line of the 6502 has not been pulled low), the CPU either reads from or writes to memory; there are no "idle" cycles w.r.t. the data bus in the 6502. Each such read or write cycle is split up into two equally long phases, called φ1 (phase 1) and φ2 (phase 2), corresponding to the clock signals above. φ1 takes place while the clock input is low, φ2 while it is high.
During each cycle, the R/W signal and the address bus lines are updated during φ1. In the simulators we see them change right away, but in a real 6502 there will be some delay. At the end of φ2, values are read from or written to the data bus lines. The IRQ and NMI interrupt lines appear to be sampled on the falling edge of φ2 (as indicated in this thread and also from observed behavior relating to the VBlank flag).
This document lists data and address bus contents during each cycle of an instruction. For more detailed timing information, see the MOS hardware manual.
Below are various terms you might run into:
A hash (#) or tilde (~) on a node name signifies active low or negation in Visual 6502. Due to problems passing hashes in URLs, aliases were automatically introduced that use tildes instead (hence the "automatic alias replacing hash with tilde" comments).
When the simulator is loaded and after it has been run with "animate during simulation" enabled, nodes that are high will be highlighted. To get rid of this highlighting, click the "clear highlighting" button.
Being able to add node names to nodenames.js can be very helpful when figuring out a circuit. To do this, a local version of the simulator can be downloaded with e.g. $ wget --convert-links on a *nix system. Please watch the recursion level and avoid downloading data needlessly, as at least Visual 2C02 and Visual 2A03 are hosted on a limited uplink.
Many additional node names for Visual 2C02 can be found in this repository. The repo is maintained separately since it is updated often and not all nodes have been confirmed.
A high-level overview of the layout of the PPU can be found here. Another, lower-level analysis can be found here.