The VMS Shark
OpenVMS Notes: Text File Structures

Menu: Hack #1 | Analysis #1 | Hack #2 | Hack #3 | EOL Markers + FTP

Document Scope: a VMS application programmer's view of text storage

Hack #1

On OpenVMS, use EDT (edit/edt) or EVE (edit/tpu) to create a 10-line non-stream (vanilla) text file named YADA.TXT that looks like the TYPE output below. Make sure you have no trailing spaces, no embedded control characters, and no blank lines (so don't hit <enter> after typing in the last line; just save and exit).

Legend: <ur> = user response
        <sr> = system response

<sr> $
<ur> type yada.txt
<sr> 1234567890
     123456789
     12345678
     1234567
     123456
     12345
     1234
     123
     12
     1

<sr> $
<ur> dir/full yada.txt
<sr> Directory CSMIS$USER3:[ADMCSM.NEIL]
     YADA.TXT;5                    File ID:  (320,23,0)            
     Size:            1/9          Owner:    [NEIL]
     Created:     2-JAN-2005 14:54:05.35
     Revised:     2-JAN-2005 14:54:05.40 (2)
     Expires:    <None specified>
     Backup:     <No backup recorded>
     Effective:  <None specified>
     Recording:  <None specified>
     Accessed:   <None specified>
     Attributes: <None specified>
     Modified:   <None specified>
     Linkcount:  1
     File organization:  Sequential
     Shelved state:      Online 
     Caching attribute:  Writethrough
     File attributes:    Allocation: 9, Extend: 0, Global buffer count: 0, No version limit
     Record format:      Variable length, maximum 255 bytes, longest 10 bytes Note: Variable means each record uses a length indicator
     Record attributes:  Carriage return carriage control Note: this means add a <cr> and <lf> to each record after retrieval
     RMS attributes:     None
     Journaling enabled: None
     File protection:    System:RWED, Owner:RWED, Group:RWED, World:RWE
     Access Cntrl List:  None
     Client attributes:  None

     Total of 1 file, 1/9 blocks.

<sr> $
<ur> ana/rms  yada.txt
<sr> Check RMS File Integrity                      2-JAN-2005 14:58:39.29   Page 1
     CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5

     FILE HEADER

     File Spec: CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5
     File ID: (320,23,0)
     Owner UIC: [NEIL]
     Protection:  System: RWED, Owner: RWED, Group: RWED, World: RWE
     Creation Date:    2-JAN-2005 14:54:05.35
     Revision Date:    2-JAN-2005 14:54:05.40, Number: 2
     Expiration Date: none specified
     Backup Date:     none posted
     Contiguity Options:  none
     Performance Options: none
     Reliability Options: none
     Journaling Enabled:  none

     RMS FILE ATTRIBUTES

     File Organization: sequential
     Record Format: variable
     Record Attributes:  carriage-return 
     Maximum Record Size: 255
     Longest Record: 10
     Blocks Allocated: 9, Default Extend Size: 0
     End-of-File VBN: 1, Offset: %X'0050' 80 Note: end of file is at byte offset 80 (0x50) in block 1
     File Monitoring: disabled
     File Length Hint (Record Count):     10 Note: this is the number of lines in my file
     File Length Hint (Data Byte Count):  55 Note: this counts data bytes only (10+9+...+1 = 55), without padding or length counts
     Global Buffer Count: 0

     The analysis uncovered NO errors.

     ANA/RMS YADA.TXT

<sr> $
<ur> dump yada.txt
<sr> Dump of file CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5 on  2-JAN-2005 14:54:57.07
     File ID (320,23,0)   End of file block 1 / Allocated 9
     Virtual block number 1 (00000001), 512 (0200) bytes
                                                    <<<--- read this way ---|--- read this way --->>>
     36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
     32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
     00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0
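To check the arithmetic in the ANA/RMS report, the following Python sketch packs the ten lines into the record layout visible in this dump (a 16-bit little-endian length count, the data, and a <nul> pad byte after odd-length data) and prints the first 32 bytes the way DUMP displays them. It models what the dump shows; it is not an RMS interface. `encode_var` and `dump_row` are hypothetical helper names, and the trailing 0xFFFF is simply copied from the dump.

```python
import struct

def encode_var(records):
    """Pack records the way the YADA.TXT dump shows them: a 16-bit
    little-endian length count, the data bytes, then one <nul> pad
    byte whenever the data length is odd."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec))  # length count, low byte first
        out += rec
        if len(rec) % 2:                     # odd length: pad so the next count is word-aligned
            out += b"\x00"
    return bytes(out)

def dump_row(block, offset):
    """Format 32 bytes like DUMP: eight longwords of hex with the lowest
    address on the right, then the ASCII column, then the offset."""
    chunk = block[offset:offset + 32]
    words = [chunk[i:i + 4][::-1].hex().upper() for i in range(0, 32, 4)]
    hex_part = " ".join(reversed(words))
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{hex_part} {ascii_part} {offset:06X}"

lines = [b"1234567890"[:n] for n in range(10, 0, -1)]
data = encode_var(lines)
print(len(data))                   # 80: the EOF offset reported by ANA/RMS
print(sum(len(r) for r in lines))  # 55: the "Data Byte Count" hint
# Fill out block 1 to 512 bytes: 0xFFFF (copied from the dump), then nulls.
block = data + b"\xff\xff" + bytes(512 - len(data) - 2)
print(dump_row(block, 0))          # same hex and ASCII as the first row of the dump
```

The right-to-left hex column is just DUMP printing each longword high byte first; the ASCII column, read left to right, is the true byte order.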

Analysis #1

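  1. In this variable-length record format, each RMS record begins with a 16-bit length specifier capable of describing a variable length string up to 32767 (0x7FFF) bytes in length. OpenVMS strings cannot exceed this value. The count is stored low byte first (little-endian), so the first record's length of 10 is stored as the bytes 0A 00, which the right-to-left hex dump shows as 000A.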
    note: using either $DIR/FULL or $ANA/RMS for this file reveals that the maximum record size is set to 255 bytes
  2. Notice that a <null> byte is inserted after every record with an odd number of data bytes to word-align the next 16-bit length specifier. This padding byte is never counted in the length specifier.
  3. Notice that normally, there are no embedded paper commands like <carriage return> or <line feed>. The original designers of VMS realized that the driver associated with the desired output device would insert these so-called paper commands as required. This is one reason that text files must be FTP'd into VMS using ASCII or TEXT mode: the end-of-line character(s) used on the remote system must be stripped off for storage in this kind of file structure.
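  4. A length of 0x0000 means a blank line (no data bytes on this line)
  5. 0xFFFF followed by <nul> bytes to the end of the block means no more RMS data (here the 0xFFFF sits at offset 0x50, the end-of-file offset reported by ANA/RMS)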
              0008                       0009                       000A <--- record length in bytes (not including padding)
  6 5 4 3  2 1        9 8 7  6 5 4 3  2 1      0 9 8 7  6 5 4 3  2 1     <--- data characters
                   00                                                    <--- padding to word-align length data
 -------- -------- -------- -------- -------- -------- -------- -------- ---------------------------------------
 36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
 32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
 00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
                                     00                00                <--- padding to word-align length data
                                        1      2 1        3 2 1      4 3 <--- data characters
                                         0001     0002          0003     <--- record length in bytes (not including padding)
                                FFFF                                     <--- \ FFFF and nulls to end of block mean...
 00000000 00000000 00000000 0000                                         <--- /     ...no more data
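Going the other way, a reader only needs the rules listed above: read a length word, take that many bytes, skip a pad byte when the length is odd, and stop at 0xFFFF or at the end-of-file offset. This is a minimal Python sketch under those assumptions, using hypothetical `decode_var` and `row_to_bytes` helpers fed with the three dump rows copied from above. Real RMS handles more cases (multi-block files, for one) than this sketch does.

```python
import struct

def row_to_bytes(hex_row):
    """Undo DUMP's display order: the rightmost longword holds the lowest
    addresses, and each longword is printed high byte first."""
    return b"".join(bytes.fromhex(w)[::-1] for w in reversed(hex_row.split()))

def decode_var(block, eof_offset):
    """Return the records of a VAR-format buffer, following the rules in
    Analysis #1. Stops at the end-of-file offset or at a 0xFFFF count."""
    records = []
    pos = 0
    while pos + 2 <= eof_offset:
        (length,) = struct.unpack_from("<H", block, pos)  # little-endian count
        if length == 0xFFFF:                               # no more data
            break
        start = pos + 2
        records.append(block[start:start + length])        # length 0 = blank line
        pos = start + length + (length & 1)                 # skip pad after odd length
    return records

rows = [  # copied from the dump above, lowest offset first
    "36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A",
    "32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837",
    "00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433",
]
block = b"".join(row_to_bytes(r) for r in rows)
for rec in decode_var(block, 0x50):  # 0x50 = EOF offset from ANA/RMS
    print(rec.decode("ascii"))
```

Either stopping rule works on this file: the EOF offset and the 0xFFFF count land on the same byte.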

Hack #2

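Use EDT or EVE to create a non-stream text file on OpenVMS. Judging from the dump below, this one holds five lines: a single <nul>, a single <bel> (ctrl-G), a blank line, the letter "A", and a final blank line.

Executing ANA/RMS on this file shows the end of file at byte offset 0x10 (16).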

                                              00010000     0001     0001 <--- record length in bytes (not including padding)
                                           ++--------------------------- "A"
                                           ||            ++------------- <bel>
                                           ||            ||       ++---- <nul>
                                         00            00       00       <--- padding to word-align the length data
 00000000 00000000 00000000 0000FFFF                                     <--- means no more data
 -------- -------- -------- -------- -------- -------- -------- -------- ---------------------------------------
 00000000 00000000 00000000 0000FFFF 00000041 00010000 00070001 00000001 ............A................... 000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000020
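Walking the Hack #2 row with the rules from Analysis #1 shows why control characters are harmless in this format: the length count, not a terminator byte, says where each record ends, so <nul> and <bel> are just data. This Python sketch (hypothetical helper names; it assumes the whole file fits in the one row shown) decodes the row and reports where the 0xFFFF stop appears.

```python
import struct

# The only non-empty DUMP row from Hack #2, copied from above.
ROW = "00000000 00000000 00000000 0000FFFF 00000041 00010000 00070001 00000001"

def row_to_bytes(hex_row):
    # Rightmost longword = lowest address; each longword is shown high byte first.
    return b"".join(bytes.fromhex(w)[::-1] for w in reversed(hex_row.split()))

def walk_var(buf):
    """Collect records until a 0xFFFF count; return them with the stop offset."""
    records, pos = [], 0
    while pos + 2 <= len(buf):
        (n,) = struct.unpack_from("<H", buf, pos)
        if n == 0xFFFF:
            break
        records.append(buf[pos + 2:pos + 2 + n])  # any byte value is legal data
        pos += 2 + n + (n & 1)                     # count + data + pad to even
    return records, pos

records, stop = walk_var(row_to_bytes(ROW))
print(records)  # [b'\x00', b'\x07', b'', b'A', b'']
print(stop)     # 16, the end-of-file offset reported by ANA/RMS
```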

Analysis #2

Hack #3

<sr> $
<ur> cre stream_lf.dat<enter>						! create a file
     <ctrl-Z>
<sr> $
<ur> set file stream_lf.dat /attr=(rfm:stmlf,lrl:32767,mrs:0,rat:cr)	! DCL magic to set stream-LF with carriage-return carriage control
<sr> $
<ur> ana/rms/fdl/output=stream_lf.fdl stream_lf.dat			! create an FDL (File Definition Language) file
<sr> $
<ur> convert/create/fdl=stream_lf.fdl yada.txt yada_lf.txt		! convert previous file into stream-LF
<sr> $
<ur> dump yada_lf.txt							! dump file to terminal in ASCII and hex
<sr> $
<ur> ana/rms  yada_lf.txt						! analyze resultant file
<sr> Check RMS File Integrity                      3-JAN-2005 07:02:57.60   Page 1
     CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5

     FILE HEADER

     File Spec: CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5
     File ID: (479,34,0)
     Owner UIC: [NEIL]
     Protection:  System: RWED, Owner: RWED, Group: RWED, World: RWE
     Creation Date:    3-JAN-2005 00:20:23.92
     Revision Date:    3-JAN-2005 00:20:23.96, Number: 2
     Expiration Date: none specified
     Backup Date:     none posted
     Contiguity Options:  none
     Performance Options: none
     Reliability Options: none
     Journaling Enabled:  none

     RMS FILE ATTRIBUTES

     File Organization: sequential
     Record Format: stream-LF                Note: means each record is terminated with <lf>
     Record Attributes:  carriage-return     Note: means add a <cr> to each record after retrieval
     Maximum Record Size: 255
     Longest Record: 10
     Blocks Allocated: 9, Default Extend Size: 0
     End-of-File VBN: 1, Offset: %X'0041' 65 Note: end of file is at byte offset 65 (0x41) in block 1
     File Monitoring: disabled
     Global Buffer Count: 0

     The analysis uncovered NO errors.

     ANA/RMS yada_lf.TXT
<sr> $
<ur> dump yada_lf.txt

     ++------------------++---------------------++---------------------- <lf>
  2 1   8  7 6 5 4  3 2 1    9 8 7 6  5 4 3 2  1   0 9  8 7 6 5  4 3 2 1 <--- data characters
 32310A38 37363534 3332310A 39383736 35343332 310A3039 38373635 34333231 1234567890.123456789.12345678.12 000000
 310A3231 0A333231 0A343332 310A3534 3332310A 36353433 32310A37 36353433 34567.123456.12345.1234.123.12.1 000020
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000A ................................ 000040
                                                            5 4  3 2 1   <--- ASCII data characters
                                                                      ++ <lf>
                                                                      ++ this <lf> is at offset 0x40, so EOF is at offset 0x41 (65)
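For comparison, the stream-LF layout needs no length counts or padding: each record's data is simply followed by <lf>. This Python sketch (the `to_stream_lf` name is hypothetical) rebuilds the converted file's bytes from the ten lines and confirms the 65-byte (0x41) end-of-file offset: 55 data bytes plus 10 <lf> terminators.

```python
def to_stream_lf(records):
    """Stream-LF layout: each record's data followed by a single <lf>.
    No length counts and no padding bytes."""
    return b"".join(rec + b"\n" for rec in records)

lines = [b"1234567890"[:n] for n in range(10, 0, -1)]
stream = to_stream_lf(lines)
print(len(stream), hex(len(stream)))  # 65 0x41: the EOF offset from ANA/RMS
print(stream[:32])                    # first row of the dump, <lf> bytes shown as \n
print(stream[0x40] == 0x0A)           # True: the last <lf> sits at offset 0x40
```

One consequence of this design: a record that itself contained an <lf> byte would read back as two records, a problem the length-counted variable format does not have.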

Analysis #3

EOL Markers + FTP

Just about every operating system has its own peculiar way of storing text data.

Text is stored in files using one of these two character encodings:

Format   Notes
ASCII
EBCDIC   almost exclusively reserved for older IBM mainframes

They each employ one of these EOL (end-of-line) markers:

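EOL Marker   Notes
<cr>         Seen in classic Mac OS (before OS X) and some older 8-bit systems
<lf>         Seen in UNIX and UNIX-like systems (including Linux and macOS)
<cr><lf>     Seen in DOS and Windows (and CP/M before them)
<lf><cr>
<ctrl-Z>     Not really an EOL marker: CP/M and early DOS used it to mark end-of-file
<ctrl-^>     Seen in older QNX systems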
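When you are handed a foreign text file, a quick tally of line-ending bytes usually tells you which convention it follows. This Python sketch is a heuristic, not a standard algorithm: `guess_eol` is a hypothetical name, it only looks for <cr>, <lf>, <cr><lf>, <lf><cr> and <ctrl-^>, and it simply reports the most frequent one.

```python
import re
from collections import Counter

# Two-byte markers come first so "\r\n" is never counted as "\r" plus "\n".
EOL_PATTERN = re.compile(rb"\r\n|\n\r|\r|\n|\x1e")
EOL_NAMES = {b"\r\n": "<cr><lf>", b"\n\r": "<lf><cr>", b"\r": "<cr>",
             b"\n": "<lf>", b"\x1e": "<ctrl-^>"}

def guess_eol(data: bytes) -> str:
    """Tally every EOL marker found and return the most common one."""
    counts = Counter(EOL_PATTERN.findall(data))
    if not counts:
        return "none found"
    marker, _ = counts.most_common(1)[0]
    return EOL_NAMES[marker]

print(guess_eol(b"one\r\ntwo\r\n"))  # <cr><lf>
print(guess_eol(b"one\ntwo\n"))      # <lf>
```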

If you don't believe me, consider the following problem often seen on Windows platforms. Opening a text file with NOTEPAD usually works, but if you see junk on the screen, try opening it with WORDPAD, which almost always works. How can this be? Well, older versions of NOTEPAD only understood the <cr><lf> convention, while the authors of WORDPAD put some special logic into their app to take care of foreign-formatted text files. Excel can do this too when importing data from text files containing either CSV or XML data.

Back in the day, the people who invented FTP were aware of this problem and so developed ASC (ASCII) transfer mode, selected with the FTP command TYPE A. When an FTP connection is placed into ASC mode:

  1. the file is read using the EOL rules of the sending end
  2. each line of data is transmitted to the far end followed by a standard end-of-line marker (<cr><lf>, per the FTP specification)
  3. the file is written using the EOL rules of the receiving end.
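The three steps can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: the function names are hypothetical, only line endings are converted (no character-set translation), and the wire format is taken to be <cr><lf> after every line, as the FTP specification defines for ASCII type. `receive_records` mimics a VMS-style receiver that keeps each line as a record with no EOL bytes at all.

```python
WIRE_EOL = b"\r\n"  # ASCII type puts <cr><lf> at the end of every line on the wire

def split_lines(data, eol):
    """Split on an EOL marker, ignoring the empty piece after a trailing EOL."""
    lines = data.split(eol)
    if lines and lines[-1] == b"":
        lines.pop()
    return lines

def send_ascii(local_data, local_eol):
    """Steps 1 and 2: read lines using the sender's EOL rule, then emit <cr><lf>."""
    return b"".join(line + WIRE_EOL for line in split_lines(local_data, local_eol))

def receive_records(wire_data):
    """VMS-style step 3: keep each line as a record, with no EOL bytes stored."""
    return split_lines(wire_data, WIRE_EOL)

def receive_ascii(wire_data, local_eol):
    """Stream-style step 3: write each line followed by the receiver's EOL marker."""
    return b"".join(line + local_eol for line in receive_records(wire_data))

wire = send_ascii(b"alpha\nbeta\n", b"\n")  # from a UNIX-style sender
print(wire)                                 # b'alpha\r\nbeta\r\n'
print(receive_records(wire))                # [b'alpha', b'beta']
print(receive_ascii(wire, b"\r"))           # b'alpha\rbeta\r'
```

Note that a final line without a trailing EOL gains one on the way through; that is usually what a text-file receiver wants, but it means an ASCII-mode transfer is not byte-for-byte.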

HPFM! (Hocus Pocus - Frickin Magic)

Neil Rieck
Kitchener - Waterloo - Cambridge, Ontario, Canada.