OpenVMS Notes: Text File Structures


Document Scope: an application programmer's view of text storage

Hack #1

On OpenVMS, use EDT or EVE to create a 10-line non-stream (vanilla) text file that looks like this. Make sure you have no trailing spaces, no embedded control characters, and no blank lines (so don't hit <enter> after typing in the last line; just save then exit).

$type yada.txt
1234567890
123456789
12345678
1234567
123456
12345
1234
123
12
1
$

$dir/full yada.txt
Directory CSMIS$USER3:[ADMCSM.NEIL]
YADA.TXT;5                    File ID:  (320,23,0)            
Size:            1/9          Owner:    [NEIL]
Created:     2-JAN-2005 14:54:05.35
Revised:     2-JAN-2005 14:54:05.40 (2)
Expires:    <None specified>
Backup:     <No backup recorded>
Effective:  <None specified>
Recording:  <None specified>
Accessed:   <None specified>
Attributes: <None specified>
Modified:   <None specified>
Linkcount:  1
File organization:  Sequential
Shelved state:      Online 
Caching attribute:  Writethrough
File attributes:    Allocation: 9, Extend: 0, Global buffer count: 0, No version limit
Record format:      Variable length, maximum 255 bytes, longest 10 bytes Note: Variable means each record uses a length indicator
Record attributes:  Carriage return carriage control Note: this means add a <cr> and <lf> to each record after retrieval
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RWED, World:RWE
Access Cntrl List:  None
Client attributes:  None
Total of 1 file, 1/9 blocks.

$ana/rms  yada.txt
Check RMS File Integrity                      2-JAN-2005 14:58:39.29   Page 1
CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5
FILE HEADER
        File Spec: CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5
        File ID: (320,23,0)
        Owner UIC: [NEIL]
        Protection:  System: RWED, Owner: RWED, Group: RWED, World: RWE
        Creation Date:    2-JAN-2005 14:54:05.35
        Revision Date:    2-JAN-2005 14:54:05.40, Number: 2
        Expiration Date: none specified
        Backup Date:     none posted
        Contiguity Options:  none
        Performance Options: none
        Reliability Options: none
        Journaling Enabled:  none
RMS FILE ATTRIBUTES
        File Organization: sequential
        Record Format: variable
        Record Attributes:  carriage-return 
        Maximum Record Size: 255
        Longest Record: 10
        Blocks Allocated: 9, Default Extend Size: 0
        End-of-File VBN: 1, Offset: %X'0050' 80 Note: this file's EOF marker is at byte # 80
        File Monitoring: disabled
        File Length Hint (Record Count):     10 Note: this is the number of lines in my file
        File Length Hint (Data Byte Count):  55 Note: this is the actual stored byte count without padding, length counts, etc.
        Global Buffer Count: 0
The analysis uncovered NO errors.
ANA/RMS YADA.TXT

$dump yada.txt
Dump of file CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5 on  2-JAN-2005 14:54:57.07
File ID (320,23,0)   End of file block 1 / Allocated 9
Virtual block number 1 (00000001), 512 (0200) bytes
 36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
 32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
 00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0

Analysis #1

  1. Each RMS record begins with a 16-bit length specifier capable of describing a variable-length string of up to 32767 (0x7FFF) bytes. OpenVMS strings cannot exceed this value.
    note: using either $DIR/FULL or $ANA/RMS on this file reveals that the maximum record size is set to 255 bytes
  2. Notice that a <nul> byte is inserted after every odd-length record to word-align the next 16-bit length specifier. This padding byte is never counted in the length specifiers.
  3. Notice that normally there are no embedded paper commands like <cr> (carriage return) or <lf> (line feed). The original designers of VMS realized that the driver associated with the desired output device would insert these so-called paper commands as required. This is one reason that text files must be FTP'd into VMS using ASCII or TEXT mode: the end-of-line character used on the remote system must be stripped off for storage in this kind of file structure.
  4. 0x0000 means blank line (no data bytes on this line)
  5. 0xFFFF followed by <nul> bytes until EOF means no more RMS data
              0008                       0009                       000A <--- record length in bytes (not including padding)
  6 5 4 3  2 1        9 8 7  6 5 4 3  2 1      0 9 8 7  6 5 4 3  2 1     <--- data characters
                   00                                                    <--- padding to word-align length data
 -------- -------- -------- -------- -------- -------- -------- -------- ---------------------------------------
 36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
 32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
 00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
                                     00                00                <--- padding to word-align length data
                                        1      2 1        3 2 1      4 3 <--- data characters
                                         0001     0002          0003     <--- record length in bytes (not including padding)
                                FFFF                                     <--- \ FFFF and null to EOF means...
 00000000 00000000 00000000 0000                                         <--- /     ...no more data
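The layout described in points 1, 2, and 5 can be sketched in a few lines of Python. This is only an illustration under the assumptions visible in the dump above (little-endian 16-bit lengths, one <nul> pad after odd-length records, 0xFFFF after the last record); the function names are made up for this example, and real RMS locates end-of-file from the EOF pointer in the file header rather than by scanning for a marker.

```python
import struct

END_OF_DATA = 0xFFFF  # value the dump above shows after the last record


def encode_variable_records(lines):
    """Pack each line as <16-bit little-endian length><data>[<nul> pad]."""
    out = bytearray()
    for line in lines:
        out += struct.pack("<H", len(line)) + line
        if len(line) % 2:              # odd data length: pad to an even offset
            out.append(0)
    return bytes(out)


def decode_variable_records(block):
    """Walk a raw block and return the records stored before END_OF_DATA."""
    records, pos = [], 0
    while pos + 2 <= len(block):
        (length,) = struct.unpack_from("<H", block, pos)
        if length == END_OF_DATA:
            break
        pos += 2
        records.append(block[pos:pos + length])
        pos += length + (length % 2)   # skip the padding byte, if any
    return records


# The ten lines of yada.txt: "1234567890" down to "1".
lines = [b"1234567890"[:n] for n in range(10, 0, -1)]
data = encode_variable_records(lines)
block = (data + b"\xff\xff").ljust(512, b"\x00")

print(len(data))                                 # 80, the EOF offset from ANA/RMS
print(decode_variable_records(block) == lines)   # True
```

The 80 bytes are the 55 data bytes, plus 20 bytes of length specifiers, plus 5 padding bytes for the odd-length records.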

Hack #2

Use EDT or EVE to create a non-stream text file on OpenVMS.

                                              0001___0     0001     0001 <--- record length in bytes (not including padding)
                                           ++--------------------------- "A"
                                           ||            ++------------- <bel>
                                           ||            ||       ++---- <nul>
                                         00            00       00       <--- padding to word-align the length data
 00000000 00000000 00000000 00000000 FFFF                                <--- means no more data
 -------- -------- -------- -------- -------- -------- -------- -------- ---------------------------------------
 00000000 00000000 00000000 00000000 FFFF0041 00010000 00070001 00000001 ............A................... 000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000020
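DUMP prints the hex longwords of each line from right to left (lowest offset on the right), while the ASCII column reads left to right. The Python sketch below, with a made-up helper name, turns one DUMP line back into bytes in file order; it assumes the 8-longword line format shown above.

```python
def dump_line_to_bytes(line, groups=8):
    """Return the bytes of one DUMP line in file order (lowest offset first)."""
    hex_columns = line.split()[:groups]   # drop the ASCII column and the offset
    # The rightmost longword holds the lowest offset, and each longword is
    # printed most significant byte first, so reversing the bytes of the
    # whole line restores file order.
    return bytes.fromhex("".join(hex_columns))[::-1]


line = ("00000000 00000000 00000000 00000000 FFFF0041 00010000 00070001 "
        "00000001 ............A................... 000000")
raw = dump_line_to_bytes(line)
print(raw[:16].hex(" "))
# 01 00 00 00 01 00 07 00 00 00 01 00 41 00 ff ff
```

Read with the rules from Analysis #1, those sixteen bytes are: length 1, <nul>, pad; length 1, <bel>, pad; length 0 (a blank line); length 1, "A", pad; then FFFF.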

Analysis #2

Hack #3

$cre stream_lf.dat
<ctrl-Z>
$set file stream_lf.dat /attr=(rfm:stmlf,lrl:32767,mrs:0,rat:cr)
$ana/rms/fdl=stream_lf.fdl stream_lf.dat
$convert/create/fdl=stream_lf.fdl yada.txt yada_lf.txt
$dump yada_lf.txt
$ana/rms  yada_lf.txt
Check RMS File Integrity                      3-JAN-2005 07:02:57.60   Page 1
CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5
FILE HEADER
        File Spec: CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5
        File ID: (479,34,0)
        Owner UIC: [NEIL]
        Protection:  System: RWED, Owner: RWED, Group: RWED, World: RWE
        Creation Date:    3-JAN-2005 00:20:23.92
        Revision Date:    3-JAN-2005 00:20:23.96, Number: 2
        Expiration Date: none specified
        Backup Date:     none posted
        Contiguity Options:  none
        Performance Options: none
        Reliability Options: none
        Journaling Enabled:  none
RMS FILE ATTRIBUTES
        File Organization: sequential
        Record Format: stream-LF                Note: this means that each record is terminated with <lf>
        Record Attributes:  carriage-return     Note: this means add a <cr> to each record after retrieval
        Maximum Record Size: 255
        Longest Record: 10
        Blocks Allocated: 9, Default Extend Size: 0
        End-of-File VBN: 1, Offset: %X'0047' 71 Note: this file's EOF marker is at byte # 71
        File Monitoring: disabled
        Global Buffer Count: 0
The analysis uncovered NO errors.
ANA/RMS yada_lf.TXT
$dump yada_lf.txt
     ++------------------++---------------------++---------------------- <lf>
  2 1   8  7 6 5 4  3 2 1    9 8 7 6  5 4 3 2  1   0 9  8 7 6 5  4 3 2 1 <--- data characters
 32310A38 37363534 3332310A 39383736 35343332 310A3039 38373635 34333231 1234567890.123456789.12345678.12 000000
 310A3231 0A333231 0A343332 310A3534 3332310A 36353433 32310A37 36353433 34567.123456.12345.1234.123.12.1 000020
 00000000 00000000 00000000 00000000 00000000 00000000 000A3534 3332310A .12345.......................... 000040
                                                            5 4  3 2 1   <--- ASCII data characters
                                                         ++-----------++ <lf>
                                                         ++------------- EOF is also located at byte 0X47
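Reading a stream-LF file needs no length specifiers: the records are whatever lies between <lf> bytes, up to the EOF offset. A minimal Python sketch follows, with the block contents transcribed from the dump above (which shows one more record, 12345, after the ten lines of yada.txt); the function name is an assumption made for this example.

```python
EOF_OFFSET = 0x47  # from "End-of-File VBN: 1, Offset: %X'0047'" above

# First block of yada_lf.txt, transcribed from the dump, nul-padded to 512 bytes.
block = (b"1234567890\n123456789\n12345678\n1234567\n123456\n"
         b"12345\n1234\n123\n12\n1\n12345\n").ljust(512, b"\x00")


def read_stream_lf(block, eof_offset):
    """Split the bytes before the EOF offset into <lf>-terminated records."""
    data = block[:eof_offset]
    records = data.split(b"\n")
    if records and records[-1] == b"":   # the final <lf> leaves an empty tail
        records.pop()
    return records


records = read_stream_lf(block, EOF_OFFSET)
print(records[0], records[-1], len(records))
# b'1234567890' b'12345' 11
```

Without the EOF offset, the nul padding that fills the rest of the block would be mistaken for one more record.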

Analysis #3

EOL Markers + FTP

Just about every operating system uses its own peculiar way to store text data.

Text is stored in files using one of these two character encodings:

Format  Notes
ASCII
EBCDIC  (almost exclusively reserved for older IBM mainframes)

They each employ one of these EOL (end-of-line) markers:

EOL Marker  Notes
<cr>        Seen in classic Mac OS
<lf>        Seen in UNIX systems
<cr><lf>    Seen in DOS and Windows
<lf><cr>
<ctrl-Z>    Seen in some CP/M systems (as an end-of-file marker)
<ctrl-^>    Seen in older QNX systems
If you don't believe me, consider the following problem often seen on Windows platforms. Opening a text file with NOTEPAD usually works, but if you see junk on the screen then try opening it with WORDPAD, which almost always works. How can this be? Well, the authors of WORDPAD put some special logic into their app to take care of foreign-formatted text files. Excel can do this too when importing data from text files containing either CSV or XML data.

Back in the day, the people who invented FTP were aware of this problem and so developed ASC (ASCII) Transfer Mode. When an FTP connection is placed into ASC mode:

  1. the file is read using the EOL rules of the sending end
  2. the data is transmitted to the far end, each line followed by a standard end-of-line marker (<cr><lf> on the wire)
  3. the file is written using the EOL rules of the receiving end.

HPFM! (Hocus Pocus - Frickin Magic)
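The three steps above can be sketched in Python. This is not FTP code, only the EOL translation idea; the function names are invented for this example, and it assumes each end uses a single fixed EOL byte sequence and that lines travel in the <cr><lf> form used by FTP's ASCII type.

```python
WIRE_EOL = b"\r\n"  # FTP ASCII mode sends each line terminated by <cr><lf>


def to_wire(data, sender_eol):
    """Steps 1 and 2: split on the sender's EOL, emit <cr><lf> per line."""
    lines = data.split(sender_eol)
    if lines and lines[-1] == b"":    # a trailing EOL leaves an empty tail
        lines.pop()
    return b"".join(line + WIRE_EOL for line in lines)


def from_wire(data, receiver_eol):
    """Step 3: rewrite each <cr><lf> using the receiver's EOL."""
    return data.replace(WIRE_EOL, receiver_eol)


unix_text = b"12\n1\n"
wire = to_wire(unix_text, b"\n")
print(wire)                       # b'12\r\n1\r\n'
print(from_wire(wire, b"\r"))     # b'12\r1\r'
```

On an OpenVMS receiver writing a variable-length file, step 3 stores no EOL at all: each line becomes one record with its own length specifier.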

Neil Rieck
Kitchener - Waterloo - Cambridge, Ontario, Canada.