SGML ÉùÃ÷

Ŀ¼
  1. Îĵµ×Ö·û¼¯
    1. Êý¾Ýת»»
  2. SGML ÉùÃ÷

Îĵµ×Ö·û¶©

´Ó SGML ½Ç ¶È À´ ¿´ HTML 4.0 ÎÄ µµ ×Ö ·û ¼¯ ÊÇ [ISO10646] µÄ ÊÀ ½ç ×Ö ·û ¼¯ (Universal Character Set, UCS). Ä¿ ǰ, Ëü Íê È« Öð ×Ö µÈ ¼Û ÓÚ [UNICODE] ±ê ×¼.

Êý¾Ýת»»

µ± HTML ÎÄ ±¾ Óà UCS-2 (charset="UNICODE-1-1") Ö± ½Ó ´« ËÍ µÄ ʱ ºò, Äã ±Ø ¶¨ »á ¹Ø ÐÄ Ëü µÄ λ Ôª ´Î Ðò: ¶Ô ÓÚ Ë« λ Ôª ×Ö ·û, ¸ß λ Ôª ÊÇ ÏÈ ËÍ »¹ ÊÇ ºó ËÍ? Õâ ·Ý Ëµ Ã÷ Êé ½¨ Òé UCS-2 ÒÔ big-endian λ Ôª ´Î Ðò (ÏÈ ´« ËÍ ¸ß λ Ôª) ´« Êä, Ëü ͬ ʱ ·û ºÏ È· ÈÏ Íø Âç λ Ôª ´« ËÍ ¹æ Ôò ÒÔ ¼°  UNICODE ([ISO10646] µÄ UTF-1 (ÓÉ IANA ×÷ Ϊ ISO-10646-UTF-1 ×¢ ²á) ±ä ÐÎ ¸ñ ʽ, ½« ²» ±» ʹ ÓÃ.

SGML ÉùÃ÷

   <!SGML  "ISO 8879:1986"
   --
        SGML Declaration for HyperText Markup Language version 4.0

        With support for Unicode UCS-4 and increased limits
        for tag and literal lengths etc.
   --

   CHARSET
            BASESET  "ISO Registration Number 177//CHARSET
                      ISO/IEC 10646-1:1993 UCS-4 with
                      implementation level 3//ESC 2/5 2/15 4/6"
            DESCSET  0   9     UNUSED
                     9   2     9
                     11  2     UNUSED
                     13  1     13
                     14  18    UNUSED
                     32  95    32
                     127 1     UNUSED
                     128 32    UNUSED
                     160 2147483486 160
   --
       In ISO 10646, the positions with hexadecimal
       values 0000D800 - 0000DFFF, used in the UTF-16
       encoding of UCS-4, are reserved, as well as the last
       two code values in each plane of UCS-4, i.e. all
       values of the hexadecimal form xxxxFFFE or xxxxFFFF.
       These code values or the corresponding numeric
       character references must not be included when
       generating a new HTML document, and they should be
       ignored if encountered when processing a HTML
       document.
   --

   CAPACITY        SGMLREF
                   TOTALCAP        150000
                   GRPCAP          150000
             ENTCAP         150000

   SCOPE    DOCUMENT
   SYNTAX
            SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
              17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
            BASESET  "ISO 646IRV:1991//CHARSET
                      International Reference Version
                      (IRV)//ESC 2/8 4/2"
            DESCSET  0 128 0

            FUNCTION
                     RE            13
                     RS            10
                     SPACE         32
                     TAB SEPCHAR    9

            NAMING   LCNMSTRT ""
                     UCNMSTRT ""
                     LCNMCHAR ".-"  -- ?include "~/_" for URLs? --
                     UCNMCHAR ".-"
                     NAMECASE GENERAL YES
                              ENTITY  NO
            DELIM    GENERAL  SGMLREF
                     SHORTREF SGMLREF
            NAMES    SGMLREF
            QUANTITY SGMLREF
                     ATTSPLEN 65536   -- These are the largest values --
                     LITLEN   65536   -- permitted in the declaration --
                     NAMELEN  65536   -- Avoid fixed limits in actual --
                     PILEN    65536   -- implementations of HTML UA's --
                     TAGLVL   100
                     TAGLEN   65536
                     GRPGTCNT 150
                     GRPCNT   64

   FEATURES
     MINIMIZE
       DATATAG  NO
       OMITTAG  YES
       RANK     NO
       SHORTTAG YES
     LINK
       SIMPLE   NO
       IMPLICIT NO
       EXPLICIT NO
     OTHER
       CONCUR   NO
       SUBDOC   NO
       FORMAL   YES
   >