This is the guide to the starman's site.
On this page http://thestarman.pcministry.com/asm/debug/Segments.html

If you are not sure, go here http://thestarman.pcministry.com/

Copyright©2001, 2007 by Daniel B. Sedory

Removing the Mystery from SEGMENT : OFFSET Addressing

Section 2. Vizualizing the Overlapping Segments

The High Memory Area (HMA)
  • The Upper Memory Area (UMA)
  • Read the original of this page at the address at the top.

    Later I'm interested in discussing what Mr Bill Gates said, and ask there can be an alternative point of view.


    Introduction.

    There are often many different Segment:Offset pairs which can be used to address the same location in your computer's memory. This scheme is a relative way of viewing computer memory as opposed to a Linear or Absolute addressing scheme. When an Absolute addressing scheme is used, each memory location has its own unique designation; which is a much easier way for people to view things. So, why did anyone ever create this awkward "Segment:Offset scheme" for dealing with computer memory? As an answer, here's a brief lesson on the 8086 CPU with an historical slant:

    Segment:Offset addressing was introduced at a time when the largest register in a CPU was only 16-bits long which meant it could address only 65,536 bytes (64 KiB[1]) of memory, directly. But everyone was hungry for a way to run much larger programs! Rather than create a CPU with larger register sizes (as some CPU manufacturers had done), the designers at Intel decided to keep the 16-bit registers for their new 8086 CPU and added a different way to access more memory: They expanded the instruction set, so programs could tell the CPU to group  two 16-bit registers together whenever they needed to refer to an Absolute memory location beyond 64 KiB.

    If the designers had allowed the CPU to combine two registers into a high and low pair of 32-bits, it could have referenced up to 4 GiB[2] of memory in a linear fashion! Keep in mind, however, this was at a time when many never dreamed we'd need a PC with more than 640 KiB of memory for user applications and data![3] So, instead of dealing with whatever problems a linear addressing scheme of 32-bits would have produced, they created the Segment:Offset scheme which allows a CPU to effectively address about 1 MiB of memory.[4]

    The scheme works like this: The value in any register considered to be a Segment register is multiplied by 16 (or shifted one hexadecimal byte to the left; add an extra 0 to the end of the hex number) and then the value in an Offset register is added to it. So, the Absolute address for any combination of Segment and Offset pairs is found by using the formula:


      Absolute
      Memory   =  (Segment value *  16) + Offset value
      Location


    After working through some examples, this will become much clearer to understand: The Absolute or Linear address for the Segment:Offset pair, F000:FFFD can be computed quite easily in your mind by simply inserting a zero at the end of the Segment value ( which is the same as multiplying by 16 ) and then adding the Offset value:
                  F0000
                 + FFFD
                 ------
                  FFFFD  or  1,048,573(decimal)
      

    Normalized Segment:Offset Notation

    Since there are so many different ways that a single byte in Memory might be referenced using Segment:Offset pairs, most programmers have agreed to use the same convention to normalize all such pairs into values that will always be unique. These unique pairs are called a Normalized Address or Pointer.

    By confining the Offset to just the Hex values 0h through Fh (16 hex digits); or a single paragraph and setting the Segment value accordingly, we have a unique way to reference all Segment:Offset Memory pair locations. To convert an arbitrary Segment:Offset pair into a normalized address or pointer is a two-step process that's quite easy for an assembly programmer:

    1. Convert the pairs into a single physical (linear) address.
    2. Then simply insert the colon (:) between the last two hex digits!

    For example, in order to normalize 1000:1B0F, the steps are:

      1000:1B0F 11B0Fh 11B0:F (or 11B0:000F)

    Since the normalized form will always have three leading zero bytes in its Offset, programmers often write it with just the digit that counts as shown here: 11B0:F (when you see an address like this, it's almost a sure sign that the author is using this normalized notation).

    How Segment:Offset notation can lead to Problems

    The normalized notation for the first byte where all PC BIOS must place a floppy diskette's boot strap code is: 07C0:0 (or 07C0:0000). A big problem for some PC manufacturers came about when some BIOS writer assumed that there'd be nothing wrong with jumping to the bootstrap code at that particular Segment:Offset pair, since it's the same memory location as 0000:7C00.
    Somehow they never took into consideration the fact that the standard used by everyone else always set the SEGMENT values to ZERO. Therefore, a bootstrap code programmer could assume all Segment values (Code, Data, etc.) were zero and only have to deal with the Offset values in that Segment. Along comes this BIOS chip clone that sets the Code Segment to 07C0 (using a JMP 07C0:000 instruction), and suddenly there was a big problem getting most OS bootstrap code (including Microsoft® and IBM® OSs) to boot up in these computers! This is why some bootstrap code, such as that in the GRUB Boot Manager, will add extra instructions to make sure the Segment Registers have been set correctly! One of the authors of GRUB comments that his Long Jump code was necessary "because some bogus BIOSes jump to 07C0:0000 instead of 0000:7C00."


    3[Return to Text] We've often heard that Bill Gates said something to the effect: 640K of memory should be enough for anyone. Though many of us no longer believe he ever said those exact words (and he has finally made some public denials concerning this), he did, however, during a video interview with David Allison in 1993 for the National Museum of American History, Smithsonian Institution, say: "I laid out memory so the bottom 640K was general purpose RAM and the upper 384 I reserved for video and ROM, and things like that. That is why they talk about the 640K limit. It is actually a limit, not of the software, in any way, shape, or form, it is the limit of the microprocessor. That thing generates addresses, 20-bits addresses, that only can address a megabyte of memory. And, therefore, all the applications are tied to that limit. It was ten times what we had before. But to my surprise, we ran out of that address base for applications within -- oh five or six years people were complaining." (from a transcript of the interview, under the "Microsoft and the Mouse" section). For a bit more info, see: Did Bill Gates say the 640k line? and perhaps of more interest to others, here are some verifiable quotes from Mr. Gates.


    JScript
    is Microsoft's dialect of the ECMAScript standard[2] that is used in Microsoft's Internet Explorer.
    Comparison to JavaScript

    As explained by JavaScript guru Douglas Crockford in his talk titled The JavaScript Programming Language on YUI Theater,

    [Microsoft] did not want to deal with Sun Microsystems about the trademark issue, and so they called their implementation JScript. A lot of people think that JScript and JavaScript are different but similar languages. That's not the case. They are just different names for the same language, and the reason the names are different was to get around trademark issues.[5]

    However, JScript supports conditional compilation, which allows a programmer to selectively execute code within block comments. This is an extension to the ECMAScript standard that is not supported in other JavaScript implementations, thus making the above statement not completely true.


    The DOS operating system had two types of memory it could use, page xxx of the DOS 5 manual conventional, or physical or expanded memory, and Extended or virtual memory on the chip (CPU).
    It also had instructions on how to address the A20 line


    The A20, or addressing line 20, is one of the electrical lines that make up the system bus of an x86-based computer system. The A20 line in particular is used to transmit the 21st bit on the address bus.

    A microprocessor typically has a number of addressing lines equal to the base-two logarithm of its physical addressing space. For example, a processor with 4 GB of physical addressing space requires 32 lines, which are named A0 through A31. The lines are named after the zero-based number of the bit in the address that they are transmitting. The least significant bit is first and is therefore numbered bit 0 and signaled on line A0. A20 transmits bit 20 (the 21st bit) and becomes active once addresses reach 1 MB, or 220.


    The high memory area is available in real mode on 80286 processors if the A20 gate is not disabled.

    The early Intel 8086, Intel 8088, and Intel 80186 processors had 20 address lines, numbered A0 to A19; with these, the processor can access 220 bytes, or 1 MB. Internal address registers of such processors only had 16 bits. To access a 20-bit address space, an external memory reference was made up of a 16-bit Offset address added to a 16-bit Segment number, shifted 4 bits so as to produce a 20-bit physical address. The resulting address is equal to Segment * 16 + Offset. There are many combinations of segment and offset that produce the same 20-bit physical address. Therefore, there were various ways to address the same byte in memory. For example, here are four of the 4096 different segment:offset combinations, all referencing the byte whose physical address is 0x000FFFFF (the last byte in 1 MB-memory space):

    F000:FFFF
    FFFF:000F
    F555:AAAF
    F800:7FFF

    Referenced the last way, an increase of one in the offset yields F800:8000, which is a proper address for the processor, but since it translates to the physical address 0x00100000 (the first byte over 1 MB), the processor would need another address-line for actual access to that byte. Since there is no such line on the 8086 line of processors, the 21st bit above, while set, gets dropped, causing the address F800:8000 to "wrap around" and to actually point to the physical address 0x00000000.


    It may assist to know that I claim to have written Microsoft DOS 5 (while three Microsoft coders were working on at the same time. I also wrote the manul for them and put it into a database.
    Interestingly NEC in Japan was working on it (DOS 5) at the same time and completed it two days before Microsoft did, but Microsoft accused them of copying Microsoft's code and they backed down, despite (Asian DOS) having a different kernel.
    My advantage was that I worked 24 hours a day, 7 days a week without breaks for food and sleep, while the Microsoft boys had to stop at the end of each day and make sure their individual work fitted together with what the others had been doing, leading to much rewriting and revision. An incredible claim if true, I know, but I shall expand on it.

    The intel 80386 chip (introduced in October 1985)  added a 32-bit architecture and a paging translation unit, which made it much easier to implement operating systems that used virtual memory. It also offered support for register debugging.


    When the first 386 computers were released n 1990 they could use an external modem which used BitCom software Here is the BitCom manual  BitCom


    The Intel 80386, also known as i386 or just 386, is a 32-bit microprocessor introduced in 1985.[1] The first versions had 275,000 transistors[2] and were the CPU of many workstations and high-end personal computers of the time. As the original implementation of the 32-bit extension of the 80286 architecture,[3] the 80386 instruction set, programming model, and binary encodings are still the common denominator for all 32-bit x86 processors, which is termed the i386-architecture, x86, or IA-32, depending on context.

    The 32-bit 80386 can correctly execute most code intended for the earlier 16-bit processors such as 8086 and 80286 that were ubiquitous in early PCs. (Following the same tradition, modern 64-bit x86 processors are able to run most programs written for older x86 CPUs, all the way back to the original 16-bit 8086 of 1978.) Over the years, successively newer implementations of the same architecture have become several hundreds of times faster than the original 80386 (and thousands of times faster than the 8086).[4] A 33 MHz 80386 was reportedly measured to operate at about 11.4 MIPS.[5]

    The 80386 was introduced in October 1985, while manufacturing of the chips in significant quantities commenced in June 1986.[6][7] Mainboards for 80386-based computer systems were cumbersome and expensive at first, but manufacturing was rationalized upon the 80386's mainstream adoption. The first personal computer to make use of the 80386 was designed and manufactured by Compaq[8] and marked the first time a fundamental component in the IBM PC compatible de facto-standard was updated by a company other than IBM.

    In May 2006, Intel announced that 80386 production would stop at the end of September 2007.[9] Although it had long been obsolete as a personal computer CPU, Intel and others had continued making the chip for embedded systems. Such systems using an 80386 or one of many derivatives are common in aerospace technology and electronic musical instruments, among others. Some mobile phones also used (later fully static CMOS variants of) the 80386 processor, such as BlackBerry 950[10] and Nokia 9000 Communicator.


    Footnotes
    1[Return to Text] KiB is the abbreviation for a kibibyte (a contraction of kilo binary byte) or binary kilobyte. It is equal to 2 to the 10th power (2^10) or 1024 bytes. Likewise, MiB is a mebibyte (mega binary byte); equal to 2 to the 20th power (2^20) or 1,048,576 bytes, and GiB is a gibibyte (giga binary byte); equal to 2 to the 30th power (2^30) or 1,073,741,824 bytes. In this document, which refers to memory in early IBM PCs (and the Intel 8086 and 80286 CPUs), we may at times refer to kibibytes using the abbreviation "kb" instead of KiB. (It is taking a long time for "kibi-", "mebi-" and "gibi-" to be recognized by techs and computer sales departments as the proper way to refer to memory ; even though they have been in official standards organizations for many years. See, for example, NIST: Prefixes for binary multiples, and Adoption by IEC and NIST.)


    64 KiB is also equal to 2 to the 16th power [ (2^16) = (2^10) x (2^6) = (1024) x (64) = 65,536 ] bytes, and each of the two-byte (or 16-bit) registers in an 8086 CPU can contain a maximum value of: 1111 1111 1111 1111 in binary or FFFFh (hexadecimal). In decimal, that's: [(15 x 16^3) + (15 x 16^2) + (15 x 16^1) + 15] = [(15 x 4096) + (15 x 256) + (15 x 16) + 15] = 61,440 + 3,840 + 240 + 15 = 65,535. However, since memory always begins with zero (0) as its first location, that gives us 65,535 + 1 = 65,536 (or, 16^4) memory locations. 65,536 divided by 1024 per KiB = 64 KiB of memory. For more on the use of Hexadecimal in computers, see: What Is "Hexadecimal"?

    2[Return to Text] This is 4 gibibytes (see Footnote #1) or 4 times (2^30) = 4,294,967,296 bytes.



    4[Return to Text]   1 MiB of memory is 1,048,576 bytes (2^20 bytes), but the Segment:Offset addressing scheme actually allows one to access up to 10FFEFh, plus one, bytes of memory, or 1,114,096 bytes. We'll have more to say about this and the HMA (High Memory Area) shortly.

    5[Return to Text] As we said above, until an IBM PC (or clone) actually had more than 1MiB of memory, it was expedient for the early IBM PCs to effectively wrap-around to the beginning of memory whenever programs tried to access an address past FFFFFh bytes.

    [Sorry, this FOOTNOTE IS STILL UNDER CONSTRUCTION! It will soon have some links about the IBM PC AT model's keyboard and the infamous A20 line!]

    The Intel 80386, also known as i386 or just 386, is a 32-bit microprocessor introduced in 1985.[1] The first versions had 275,000 transistors[2] and were the CPU of many workstations and high-end personal computers of the time. As the original implementation of the 32-bit extension of the 80286 architecture,[3] the 80386 instruction set, programming model, and binary encodings are still the common denominator for all 32-bit x86 processors, which is termed the i386-architecture, x86, or IA-32, depending on context.

    The 32-bit 80386 can correctly execute most code intended for the earlier 16-bit processors such as 8086 and 80286 that were ubiquitous in early PCs. (Following the same tradition, modern 64-bit x86 processors are able to run most programs written for older x86 CPUs, all the way back to the original 16-bit 8086 of 1978.) Over the years, successively newer implementations of the same architecture have become several hundreds of times faster than the original 80386 (and thousands of times faster than the 8086).[4] A 33 MHz 80386 was reportedly measured to operate at about 11.4 MIPS.[5]

    The 80386 was introduced in October 1985, while manufacturing of the chips in significant quantities commenced in June 1986.[6][7] Mainboards for 80386-based computer systems were cumbersome and expensive at first, but manufacturing was rationalized upon the 80386's mainstream adoption. The first personal computer to make use of the 80386 was designed and manufactured by Compaq[8] and marked the first time a fundamental component in the IBM PC compatible de facto-standard was updated by a company other than IBM.

    In May 2006, Intel announced that 80386 production would stop at the end of September 2007.[9] Although it had long been obsolete as a personal computer CPU, Intel and others had continued making the chip for embedded systems. Such systems using an 80386 or one of many derivatives are common in aerospace technology and electronic musical instruments, among others. Some mobile phones also used (later fully static CMOS variants of) the 80386 processor, such as BlackBerry 950[10] and Nokia 9000 Communicator.

     

    Make a free website with Yola