"Resource Waits" in the VMS + OpenVMS Operating Systems

Any use of the information presented here is entirely at the reader's own risk. Always back up your system before attempting any procedure that could cause your VMS system to hang or crash. Though VMS (now known as OpenVMS) is very robust, some of the techniques presented here involve unusual kernel-mode operations that are extremely risky on a production system.
 

Document #1 (1993) via DECUS

Resource Waits in the OpenVMS Operating System
or
What to do when you are R-WASTed by OpenVMS

DECUS Spring '93 Atlanta Symposia VS060
David L. Cathey Montagar Software Concepts
P. O. Box 260772 Plano, TX 75026-0772 (972) 578-5036

davidc@montagar.com

David L. Cathey, Montagar Software Concepts, P.O.Box 260772, Plano, TX 75026-0772 Spring'93 DECUS Symposia Slide No. 1


Session Outline

  • What are "Resource Waits"
     
  • Resource Waits and MUTEXes
     
  • RWAST causes and descriptions
     
  • Using SDA (System Dump Analyzer) to determine causes of RWAST
     
  • Breaking processes out of RWAST
     
  • Getting tough: Brute force approaches
     



What is a "Resource Wait"

  • A Resource Wait is a type of MUTEX (MUTual EXclusion semaphore) within the OpenVMS Operating System.
     
  • Resource Waits are a set of events that suspend a process until some process or operating system resource becomes available.
     
  • Typically, the resource, or enough of a resource, becomes available and the process is resumed.
     
  • MUTEXes and Resource Waits are not always "evil". They are used to provide a form of flow control, or allow a process to maximize throughput by utilizing all available quota.
     



Resource Waits and MUTEXes (Examples)

  • A MUTEX is a synchronization mechanism (similar to a Lock, i.e. $ENQ/$DEQ) that allows protecting an operating system resource without blocking all other activity by using elevated IPL.
     
  • A MUTEX is implemented via a longword data cell in the OpenVMS executive. See "VMS Internals and Data Structures V5.2", Chapter 8.5, page 196, or "Alpha Internals and Data Structures", Chapter 9.7, page 9-50.
     
  • Examples of MUTEXes used in OpenVMS V5.5 are:
    • LNM$AL_MUTEX Logical Name Table MUTEX
       
    • IOC$GL_MUTEX I/O Database
       
    • EXE$GL_CEBMTX Common Event Block Queue
       
    • SMP$GL_CPU_MUTEX CPU Database Queue
       
    • EXE$GL_PGDYNMTX Paged Dynamic Memory
       
    • EXE$GL_GSDMTX Global Section Descriptor Queue
       
    • CIA$GL_MUTEX Cumulative Intrusion Analysis Queue
       
    • EXE$GL_BASIMGMTX Base OpenVMS Image (Loaded Images)
       



Resource Waits and MUTEXes (Data Structures)

  • The MUTEX longword is manipulated via the internal routines SCH$LOCKR, SCH$LOCKW, and SCH$UNLOCK.
     
  • The MUTEX longword is divided up into two fields: Status and Owner Count.
   31                           16 15                           0 
  +----------------------------+--+------------------------------+
  |              MBZ           | 1|          Owner Count         |
  +----------------------------+--+------------------------------+
                                Write-in-progress or 
                                Write-pending status bit 
  • Other related structures:
    • PCB$W_MTXCNT Count of MUTEXes owned by this process
       
    • PCB$L_EFWM Address (0x8nnnnnnn) of pending MUTEX
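
The longword layout in the diagram above can be explored off-line with a short Python sketch. The function and key names are hypothetical, chosen only for illustration; the bit positions are taken from the diagram, and the owner count is returned as the raw field value without interpreting how the executive initializes it.

```python
def decode_mutex(longword):
    """Split a 32-bit MUTEX longword into the two fields shown in the diagram."""
    owner_count = longword & 0xFFFF          # bits 0-15: owner count (raw field)
    write_bit = bool((longword >> 16) & 1)   # bit 16: write in progress / pending
    return {"owner_count": owner_count, "write": write_bit}

# Example: write bit set, raw owner count of 3
print(decode_mutex(0x00010003))   # {'owner_count': 3, 'write': True}
```

This is only a reading aid for values seen in SDA; it does not touch a live system.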
       



Example of Accessing a MUTEX

; Note that this code must be in Kernel mode, 
; in order to access the MUTEX data cell for R/W access. 
; 
; Grab the Intrusion Queue mutex, so we can 
; scan it safely... 

        moval   g^CIA$GL_MUTEX,r0 
        jsb     g^SCH$LOCKW             ; Lock MUTEX 

        movl    g^CIA$GQ_INTRUDER,r3    ; Get first intrusion blk 
        moval   g^CIA$GQ_INTRUDER,r4    ; Get listhead address 
1$:     cmpl    r3,r4                   ; If r3 is listhead, bail 
        beql    5$ 
        ...                             ; Do lots of neat stuff 
        movl    CIA$L_FLINK(r3),r3      ; Get next intrusion blk 
        brw     1$ 
5$: 
        moval   g^CIA$GL_MUTEX,r0       ; Unlock MUTEX 
        jsb     g^SCH$UNLOCK 



Resource Waits and MUTEXes (Processes)

  • Resource Waits are seen when a process requests an operating system resource or a process resource (quota), and there is not enough. The process will be put into an MWAIT state, and the cause for the wait is placed in the Event Flag Wait Mask (IDSM, Chapter 12, page 283).
     
  • A process can be set to disallow placement into a Resource Wait by setting the Resource Wait Mode using the $SETRWM system service. Note: This will not completely disable all Resource Waits. It will prevent most cases. More on this later!
     
  • Processes in a resource wait are identified by "SHOW SYSTEM" with a state starting with "RWxxx".
     



Resource Waits and MUTEXes (Reason Codes)

  • OpenVMS V5.x defines these and other Resource Waits (defined in $RSNDEF in the macro library SYS$LIBRARY:LIB.MLB):
     
State   Reason Code     Value   Meaning
RWAST   RSN$_ASTWAIT      1     Wait for AST event
RWMBX   RSN$_MAILBOX      2     Mailbox I/O
RWNPG   RSN$_NPDYNMEM     3     Nonpaged Dynamic Memory
RWPAG   RSN$_PGDYNMEM     5     Paged Dynamic Memory
RWMPE   RSN$_MPLEMPTY    11     Waiting for Modified List to empty
RWMPB   RSN$_MPWBUSY     12     Modified Page Writer Busy
                                (ReallyWantedMyProcessBack - Pat O.)
RWSCS   RSN$_SCS         13     System Communications Services
RWCAP   RSN$_CPUCAP      15     CPU Capability (Vectors, etc)
RWCSV   RSN$_CLUSRV      16     Cluster Server Process Busy



Resource Waits and MUTEXes (Executive)

  • A Resource Wait is entered by the OpenVMS Executive when an exhaustion of a resource is detected. The routine SCH$RWAIT is called with the RSN$_nnnnnnn symbol as input. The RSN$ code is placed in the PCB$L_EFWM and the process state is set to MWAIT.
     
  • The bit corresponding to the RSN$ code is set in the system longword SCH$GL_RESMASK, e.g. bit 12 for RSN$_MPWBUSY = 12.
     
   31                                    12                     0 
  +-------------------------------------+--+---------------------+
  |                                     | 1|                     |
  +-------------------------------------+--+---------------------+
  • When the OpenVMS Executive determines that a resource has been freed (via the RSE routine), it checks SCH$GL_RESMASK to determine if any processes are waiting on the resource. If so, the MWAIT process queue is scanned to determine which processes should be rescheduled.
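
The bit bookkeeping described above can be modelled with a few lines of Python. This is a sketch of the idea only, not the executive's code: the helper names are hypothetical, and the one-bit-per-reason-code mapping follows the diagram on this slide.

```python
RSN_MPWBUSY = 12  # value from the reason-code table

def set_waiting(resmask, rsn):
    """Return the mask with the bit for reason code rsn set."""
    return resmask | (1 << rsn)

def anyone_waiting(resmask, rsn):
    """True if some process is recorded as waiting on reason code rsn."""
    return bool(resmask & (1 << rsn))

mask = set_waiting(0, RSN_MPWBUSY)
print(hex(mask))                          # 0x1000
print(anyone_waiting(mask, RSN_MPWBUSY))  # True
print(anyone_waiting(mask, 1))            # False
```

Keeping a single summary mask lets the "resource freed" path skip the MWAIT queue scan entirely when nobody is waiting on that resource.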
     



Example of Putting a Process in a Resource Wait

; 
;       Put self in a resource wait (RWNPG) if unable to allocate 
;       the required non-paged pool... 
; 
;       Assume R4 holds the address of the current PCB 
1$: 
        movl    #GOOF$C_LENGTH,r1 

        jsb     g^EXE$DEBIT_BYTCNT_ALO  ; Allocate 1000 bytes 
        blbs    r0,5$ 
        movl    #RSN$_NPDYNMEM,r0       ; Can't do it, wait until 
        jsb     g^SCH$RWAIT             ; the system frees some and 
        brb     1$                      ; then try it again... 
5$: 
        movl    r1,GOOF$W_SIZE(r2)      ; Play with our new buffer 
        ... 



RWAST Causes and Descriptions

  • RWAST == Resource Wait for AST-related event
     
  • RWAST generally occurs for the following reasons:
    • Process is waiting for the deletion of a sub process.
       
    • AST Limit has been exhausted (ASTLM)
       
    • Direct I/O Limit has been exhausted (DIOLM)
       
    • Buffered I/O Limit has been exhausted (BIOLM)
       
    • Process is waiting for outstanding I/O to complete.
       
  • Quota exhaustion typically is transient, as the process is allowed to continue executing once an I/O or other event has completed.
     
  • Waiting for a process deletion is also transient, unless that process is not computable (i.e. it's stuck in RWAST, too!).
     



Using SDA to Determine RWAST Cause

  • Process state will be MWAIT or RW???
    • If the state is "MWAIT", the process is in a MUTEX state. Use "EVAL/ADDRESS" on the address specified in the Event Flag Wait Mask to determine which MUTEX is locked.
       
    • If the state is "RW???", then the process is in a resource wait. The Event Flag Wait Mask will be set to the reason code, which should match the state shown by SDA.
       
  • Use "SHOW PROCESS" and "SHOW PROCESS/CHANNEL" to see what it is waiting on.
    • A "0" remaining for a quota means that quota is exhausted (for BYTLM/BYTCNT, the remaining value may merely be small rather than zero).
       
    • A "busy" channel means outstanding I/O
       
    • Otherwise, check subprocess count for active subprocesses. Analyze them for Resource Wait problems.
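
As a reading aid for the first step above, the following Python sketch classifies an Event Flag Wait Mask value copied from SDA. The function name is hypothetical; the reason codes come from the table on the Reason Codes slide, and the "high bit set means MUTEX address" test reflects the 0x8nnnnnnn form mentioned earlier.

```python
# Reason codes from the Reason Codes slide
RSN_STATES = {
    1: "RWAST", 2: "RWMBX", 3: "RWNPG", 5: "RWPAG", 11: "RWMPE",
    12: "RWMPB", 13: "RWSCS", 15: "RWCAP", 16: "RWCSV",
}

def classify_efwm(efwm):
    """Interpret a PCB$L_EFWM value for a process in MWAIT."""
    if efwm & 0x80000000:
        # System-space address: the process is waiting on a MUTEX
        return "MUTEX at %08X" % efwm
    return RSN_STATES.get(efwm, "unknown reason code %d" % efwm)

print(classify_efwm(0x00000001))   # RWAST
print(classify_efwm(0x80123456))   # MUTEX at 80123456
```

The address 80123456 is made up for the example; a real value would be resolved to a symbol inside SDA.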
       



Sample SDA Output: SHOW PROCESS/INDEX

SDA> SHOW PROCESS/INDEX=0CF 

            Process index: 000F   Name: DAVIDC_1   Extended PID: 000000CF 
- - - - - - -------------------------------------------------------------
Status : 02040001 res,phdres
Status2: 00000001 quantum_resched
PCB address              805659E0    JIB address              806D2F80 
PHD address              808F9000    Swapfile disk address    00000000 
Master internal PID      00020019    Subprocess count                1 
Internal PID             0003000F    Creator internal PID     00020019 
Extended PID             000000CF    Creator extended PID     00000099 
State                       RWAST    Termination mailbox          002F 
Current priority                6    AST's enabled                KESU 
Base priority                   4    AST's active                 NONE 
UIC                [00002,000001]    AST's remaining               148 
Mutex count                     0    Buffered I/O count/limit        0/40 <---+
Waiting EF cluster              0    Direct I/O count/limit         40/40     |
Starting wait time       1B001B1B    BUFIO byte count/limit      30800/30800  |
Event flag wait mask     00000001<-+ # open files allowed left     147        |
Local EF cluster 0       E0000000  | Timer entries allowed left     20        |
Local EF cluster 1       00000000  | Active page table count         0        |
Global cluster 2 pointer 00000000  | Process WS page count         161        |
Global cluster 3 pointer 00000000  | Global WS page count           40        |
                                   |                                          |
                                   |                    Zero remaining quota--+ 
                                   |
                                   +- Event Flag Mask == 1 == RSN$_ASTWAIT 
                                      if 8nnnnnnn, then it would indicate which MUTEX 



Sample SDA Output: SHOW PROCESS/CHANNEL

SDA> SHOW PROCESS/INDEX=0CF/CHANNEL 

            Process index: 000F   Name: DAVIDC_1   Extended PID: 000000CF 
- - - - - - -------------------------------------------------------------

                            Process active channels
                            -----------------------

            Channel  Window           Status        Device/file accessed
- - - - - - -------  ------           ------        --------------------
  0010  00000000                        DUA0: 
  0020  8071C470                        DUA0:[DAVIDC.RWAST]RWAST_BIO.EXE;4 
  0030  00000000            Busy        MBA50: <-+
  0040  00000000                        TWA3:    |
  0050  00000000                        TWA3:    |
                                                 |
 Mailbox I/O incomplete, probably needs flushing-+



Breaking Processes Out of RWAST

  • Incomplete I/O accounts for most of the "undeletable" processes.
     
  • Different device types require different ways to break the I/O.
     
  • Typical types of incomplete I/O cases that are seen:
    • Network devices (NETnn: and RTAn:)
       
    • Mailboxes (MBAnnn:)
       
    • Disks
       
    • Tapes
       
    • Printers (believe it or not...)
       



Network Devices

  • Network devices can generally be freed by telling NCP to disconnect the link between the processes:
     
$ MCR NCP SHOW KNOW LINKS
$! kill the link that seems to be connected to the RWAST'd process
$ MCR NCP DISCONNECT LINK
 
  • Example:
     
$ MCR NCP SHOW KNOW LINK
 
Known Link Volatile Summary as of 6-APR-1993 20:17:47
 
 Link       Node           PID     Process   Remote link  Remote user
 8193   1.42 (AVATAR)    21600033  REMACP       8445      DAVIDC
 
$ MCR NCP DISCONNECT LINK 8193
 



Mailbox Devices

  • Once the proper mailbox is discovered, use a utility such as COPY to dump data into, or out of, the mailbox:
     
  • For example, if the mailbox name was MBA1284, one of the two following commands should clear the condition:
     
$ COPY MBA1284: NLA0:
or
$ COPY LOGIN.COM MBA1284:
 
It's probably a better practice to copy from before copying to...
 



Tape Devices

  • Tape devices occasionally fall off-line, and are not handled correctly. If this happens, it can usually be fixed by:
    • Reload the tape and place back on-line
       
    • DISMOUNT/ABORT [tape_drive]
       
    • Force a "pack acknowledge":
       
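                $IODEF                  ; Define I/O function codes (IO$_PACKACK) 
        devnam: .ascid  /MUA0:/         ; Device to acknowledge 
        chan:   .word   0               ; Channel number returned by $ASSIGN 
                .entry  packack,0 
                $ASSIGN_S       chan=chan,- 
                                devnam=devnam 
                blbc    r0,9$           ; Could not assign a channel, bail 
                $QIOW_S         chan=chan,- 
                                func=#IO$_PACKACK 
        9$:     ret                     ; Return with status in R0 
                .end     packack 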



Disk Devices

  • Disk devices fall off-line as well. Follow the same guidelines used for tape devices:
     
    • DISMOUNT/ABORT/OVER=CHECK [disk_name]
       
    • Toggle drive off/on-line
       
    • Also, the "pack acknowledge" routine can sometimes be used to recover the disk drive.
       



Line Printer Devices (believe it or not!)

  • In really perverse cases, a line printer device (typically LP11's) may get a partial buffer out, but for some reason be unable to complete the current I/O. Of course, just simply:
     
    • Close printer doors
       
    • Clear jams
       
    • Add paper
       
    • Other printer stuff...
       
       
  • Case history:
     
A printer got stuck at the same time the SYMBIONT had a lock on a RIGHTSLIST entry... and was stopped. The SYMBIONT was RWASTed, had a blocking lock on the RIGHTSLIST, that ended up locking up everyone on the system (600+ angry users) in LOGINOUT, DIR/OWNER, etc.

The solution? Close the door to the 15-year-old-washing-machine-sized LP27 printer :-(
 



Brute Force Approaches to Getting out of RWAST

  • REBOOT THE SYSTEM!!!
     
  • But really folks, "simply" disable the resource wait for the process and let it die off:
     
    • Get the Process Control Block (PCB) address of the target process.
       
    • Set the PCB$M_SSRWAIT bit in PCB$L_STS.
       
    • Clear the PCB$M_DELPEN bit in PCB$L_STS.
       
    • Issue another $DELPRC to the process...
       



For those that need the code example...

          .title   DISABLE_RW 
;++ 
;     DISABLE_RW -- Disable Resource Wait of another process 
; 
; Author:        David L. Cathey 
;                  Montagar Software Concepts 
;                  P. O. Box 260772 
;                  Plano, TX 75026-0772 
;                  davidc@montagar.com 
; 
          .link             "SYS$SYSTEM:SYS.STB"/SE 
          .library /SYS$LIBRARY:LIB/ 
          $PCBDEF           ; Process Control Block definitions 

asc_pid: .ascid    "xxxxxxxx"                   ; Save space for PID 
bin_pid: .long     0 
prompt:  .ascid    "Process ID: "               ; Prompt string 

         .entry    Main,0 

         pushaw    asc_pid 
         pushaq    prompt 
         pushaq    asc_pid 
         calls     #3,g^LIB$GET_FOREIGN         ; Get PID from user 
         blbc      r0,999$ 

         pushal    bin_pid 

         pushaq    asc_pid                      ; Convert ascii hex to binary 
         calls     #2,g^OTS$CVT_TZ_L 
         blbc      r0,999$ 

         $CMKRNL_S routin=do_it                 ; Play with the process... 
999$:    ret 


         .entry    Do_It,^M<> 

         movl      bin_pid,r0 
         jsb       g^EXE$EPID_TO_PCB            ; Get PCB from EPID 
         tstl      r0                           ; Did we??? 
         beql      99$                          ; Nope, bail out 
         bisl2     #PCB$M_SSRWAIT,PCB$L_STS(r0) ; Set SSRWAIT disable 
         bicl2     #PCB$M_DELPEN,PCB$L_STS(r0)  ; Clear delete pending 
         $DELPRC_S pidadr=bin_pid               ; And delete again. 
         ret                                    ; Bye... 
99$:     movl      #SS$_NONEXPR,r0              ; Non-existent process! 
         ret 
         .end      Main 



Document #2 (published 1994) via Actian

caveat: I recently noticed this on an Actian support page ( https://communities.actian.com/s/article/Procedure-To-Determine-Why-A-Process-Is-In-A-RWAST-State ) so all credit goes to them.
p.s. I preserved the contents of their RWAST page for posterity

If the DBMS server (or any other process) goes into an RWAST state, is there a way to find out what resource it is waiting for? Created: 15-Apr-1994

The SHOW SYSTEM command display indicates that a process is in an RWAST state and the process seems locked. How can you determine why the process is in this state?

The RWAST is a general purpose "Resource Wait" state. It indicates that the wait is expected to be satisfied by the delivery and/or enqueueing of an AST to the process.

There are 4 common reasons for a process to go into an RWAST state:
1) It is waiting for an I/O to complete on a channel.
2) It has exhausted an AUTHORIZE or SYSGEN quota.
3) It is waiting for a file system or lock request to complete.
4) It is waiting for a subprocess to terminate.

Processes in the RWAST state can NOT be deleted (e.g., with STOP/ID) until the condition they are waiting for is met. If you can not identify what the process is waiting for, you will have to reboot the system in order to eliminate the process.

If the process in an RWAST state is running a user-written program, then it is possible to rewrite the program to arrange to receive an error status for certain system calls, rather than have VMS put the process into the resource wait state. Usually, the error status indicates either a quota problem or insufficient pooled memory. This is accomplished by using the SYS$SETRWM system service call.
 
Procedure
To find out why a process is in an RWAST state, use the System Dump Analyzer (SDA):

1. Invoke SDA
$ ANALYZE/SYSTEM
VAX/VMS System analyzer
! To find the RWAST process and its INDEX
SDA> SHOW SUMMARY
Current process summary
Extended Indx Process name    Username    State   Pri   PCB      PHD    Wkset
  PID
20200080 0000 NULL                        COM       0 800024A8 80002328     0
20200081 0001 SWAPPER                     HIB      16 80002748 800025C8     0
20201005 0005 KILEY           KILEY       LEF       4 80363C50 82CEEE00   211
20200086 0006 ERRFMT          SYSTEM      HIB       7 8030CA80 80A2FA00    88
20200087 0007 CACHE_SERVER    SYSTEM      HIB      16 80317F70 80C3AE00    62
2020104F 004F SMITH           SMITH       RWAST     6 8036CE90 82DF4800   200

2. Set Your Default to the RWAST Process Using its INDEX Value
SDA> SET PROCESS/INDEX=4F !Selects the process in RWAST state, in this case, the SMITH process

Note
If you have tried to delete the process, SDA may not permit you to set your process to the RWAST process. In this case, you would receive the following error:

%SDA-E-NOTINPHYS, xxxxxx: not in physical memory

If you receive this error, you may have to format the PCB and/or JIB to figure out the problem. The addresses of the PHD and PCB can be found in the SHOW SUMMARY display. The address of the JIB is at offset PCB$L_JIB in the formatted PCB. Keep this in mind if SDA will not allow you "normal" access to the data structures that follow. If you can get no access to the process data structures, for example because the process header is outswapped, you may have to reboot the system and wait for the problem to occur again. If it happens again, you may be able to catch the data structures in memory and analyze the Resource Wait state more thoroughly.

3. Find the Process Program Counter (PC)
SDA> EXAMINE @PC
and see if it evaluates to one of the following symbols:
EXE$DASSGN+6D, in which case you go to step 4 (below)
EXE$MULTIQUOTA+032, in which case you go to step 5 (below)
EXE$DCLEXH+0A5, in which case you go to step 6 (below)
EXE$DCLEXH+141, in which case you go to step 7 (below)
Other RWAST states are possible but very rare. If the PC does not evaluate to one of the available symbols, you will have to reboot the system to eliminate the hung process. Take a crash dump of the system so you can later determine why the process was in RWAST.
Occasionally, the RWAST process will clear itself if the process waits long enough and the AST somehow gets satisfied.

4. PC is at EXE$DASSGN+6D
If the PC is at EXE$DASSGN+6D, the process is waiting for an I/O request to complete. Many times, the device on which it is waiting for I/O to complete will be shown as "Busy" in the SHOW PROCESS/CHANNEL display:
SDA> SHOW PROCESS/CHANNEL       ! Look for a status of "Busy"
                                ! and this will determine the
                                ! device that is waiting for the I/O
Process index: 004F   Name: SMITH   Extended PID: 22E0124D

                      Process active channels

Channel  Window    Status   Device/file accessed
 0010    00000000           LFILNG$DUA14:
 00C0    00000000  Busy     LPA0:          <-- This device is
 00D0    00000000           MBA1:              blocking the process
In the above example, you have only 1 BUSY channel, so this must be the channel causing the process to hang in RWAST. If you have multiple BUSY channels, you can identify which one is causing the RWAST state with the following commands:
SDA> READ SYS$SYSTEM:SYSDEF.STB              ! read in system symbols
SDA> EXAMINE @R6+CCB$W_IOC                   ! number of outstanding requests
8041BBBA: 00180001 "...."                    ! in lower word (1)
SDA> DEFINE UCB=@(@R6+CCB$L_UCB)             ! define UCB address
SDA> EXAMINE UCB+UCB$W_UNIT                  ! low order word is unit number
UCB+054: 00000000 "...."                     ! here it is unit # 0
SDA> EXAMINE @(UCB+UCB$L_DDB)+DDB$T_NAME;8   ! device name
8041BBBA: 2041504C "LPA."                    ! device is LPA0:
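
The longwords SDA prints above are little-endian, which is why the low-order word and the device name are not obvious at a glance. A small Python sketch (helper names are hypothetical) shows how the two values in this example decode:

```python
import struct

def low_word(longword):
    """Return the low-order 16 bits of a longword, e.g. the request count."""
    return longword & 0xFFFF

def longword_ascii(longword):
    """Render a longword's four bytes in memory order, as SDA's text column does."""
    raw = struct.pack("<I", longword)   # little-endian byte order, as on VAX
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

print(low_word(0x00180001))                 # 1
print(repr(longword_ascii(0x2041504C)))     # 'LPA '
```

The bytes 4C 50 41 20 read in memory order spell "LPA" followed by a space, which together with unit number 0 gives LPA0:.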
Note
If the device is a printer connected to a terminal port and the symbiont is waiting for an XON to be delivered, occasionally turning the printer OFF and back ON again will cause an XON to be sent back to the VAX. This allows the I/O to complete, permitting the print symbiont to continue.
If the device is a printer connected to a printer port (LPA0, LCA0...), the VAX thinks the printer is offline.
This may indicate a hardware problem with the printer or controller, if it is really online. Again, turning the printer OFF and back ON again may help.

5. Value Returned is EXE$MULTIQUOTA+032
If the value returned is EXE$MULTIQUOTA+032, the RWAST state indicates the process has run out of a quota. A SHOW SYSTEM display will often show the process continuing to accumulate CPU time.
A quick check to help determine which quota the process has exhausted is the following:
SDA> SHOW PROCESS !for the SMITH process
Process index: 004F   Name: SMITH   Extended PID: 2020104F
Process status: 02040001 RES,PHDRES
PCB address              8036CE90    JIB address              8064F3C0
PHD address              82DF4800    Swapfile disk address    01002821
Master internal PID      0020004F    Subprocess count                0
Internal PID             0020004F    Creator internal PID     00000000
Extended PID             2020104F    Creator extended PID     00000000
State                       RWAST    Termination mailbox          0000
Base priority                   4    AST's active                 NONE
UIC                [00022,000016]    AST's remaining                16
Mutex count                     0    Buffered I/O count/limit        0/18*
Waiting EF cluster              0    Direct I/O count/limit         18/18*
Starting wait time       1B001B1B    BUFIO byte count/limit   30478/31936*
Event flag wait mask     00000001    # open files allowed left      75
Local EF cluster 0       E4000000    Timer entries allowed left     10
Local EF cluster 1       00000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count         140
Global cluster 3 pointer 00000000    Global WS page count           60
Look in the lower right hand portion of the display, denoted by asterisk (*), to see if any quotas are down to zero. In the example above, you can see that Buffered I/O count/limit is zero. The number before the slash (/) is the amount of this quota left. The number after the slash (/) is the total amount allowed. These fields relate to the following Authorization (UAF) records and SYSGEN PQL parameters if the process is a detached process:
                                                             RUN/Detached
SHOW PROCESS field           Authorization   SYSGEN PQLs     Limits
AST's remaining              ASTLM           PQL_DASTLM      /AST_LIMIT
Buffered I/O count/limit     BIOLM           PQL_DBIOLM      /IO_BUFFERED
Direct I/O count/limit       DIOLM           PQL_DDIOLM      /IO_DIRECT
BUFIO byte count/limit       BYTLM           PQL_DBYTLM      /BUFFER_LIMIT
# open files allowed left    FILLM           PQL_DFILLM      /FILE_LIMIT
Timer entries allowed left   TQELM           PQL_DTQELM      /QUEUE_LIMIT
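
The "look for a zero before the slash" check can be expressed as a short Python sketch. It assumes the count/limit values have been copied by hand from the SHOW PROCESS display into a dictionary; the function name and the dictionary are hypothetical, while the field-to-quota names come from the table above.

```python
# SHOW PROCESS field -> UAF quota name, from the table above
QUOTA_NAMES = {
    "Buffered I/O count/limit": "BIOLM",
    "Direct I/O count/limit": "DIOLM",
    "BUFIO byte count/limit": "BYTLM",
}

def exhausted_quotas(fields):
    """Return the quotas whose remaining count (before the slash) is zero."""
    hits = []
    for name, value in fields.items():
        remaining = int(value.rstrip("*").split("/")[0])
        if remaining == 0:
            hits.append(QUOTA_NAMES.get(name, name))
    return hits

sample = {
    "Buffered I/O count/limit": "0/18*",
    "Direct I/O count/limit": "18/18*",
    "BUFIO byte count/limit": "30478/31936*",
}
print(exhausted_quotas(sample))   # ['BIOLM']
```

For the SMITH process this points at BIOLM, matching the conclusion drawn from the display.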
Once it has been determined which quota needs to be increased for processes or subprocesses, increase that value within the User Authorization File, the SYSGEN PQL parameter (if it is a detached process), or on the SYS$CREPRC system service (if a process is creating the detached process). The new value is then used when a new process is created or when you log out and log back in again. At this point, reboot the system to eliminate the RWAST process waiting for quota. Hopefully, you have increased the parameter to a value high enough that you will not see the new process go into RWAST state again.
If the RWAST process is waiting for a quota and the quota does not appear to be any of these, you can format and display the Job Information Block (JIB), Process Control Block (PCB), and the Process Header (PHD) to locate the quota problem.
R2 contains the address of the insufficient quota. To determine the insufficient quota, do the following:
SDA> READ SYS$SYSTEM:SYSDEF ! read system definitions
SDA> EXAMINE R2
R2: 8036CECA "JNG." ! obtain value contained in R2
Next, locate the addresses of the PCB (Process Control Block) and the JIB (Job Information Block) from the top of the SHOW PROCESS display. The value found in R2 will be pointing somewhere in one of these two data structures. Identify which data structure would contain the value in R2 and format that data structure. In this case, R2 would be in the PCB so the PCB needs to be formatted:
SDA> FORMAT 8036CE90            ! Formatting PCB of process
8036CE90   PCB$L_SQFL      80002180
8036CE94   PCB$L_SQBL      80002180
........   ........        ........
8036CEC6   PCB$W_PPGCNT    008C
8036CEC8   PCB$W_ASTCNT    0010
8036CECA   PCB$W_BIOCNT    000        <-- This address matches R2
8036CECC   PCB$W_BIOLM     0012
8036CECE   PCB$W_DIOCNT    0012
......     .......         ......
8036CF0C                   00000000
8036CF10   PCB$L_JIB       8064F3C0
The value for PCB$W_BIOCNT is zero, indicating that the quota is depleted. BIOLM needs to be increased beyond 18, or the application program modified so that not so many outstanding buffered I/O requests are made at once.
If the address is not found in the PCB, format the JIB.
The JIB address can be found from either the SHOW PROCESS display or the PCB$L_JIB value above:
SDA> FORMAT 8064F3C0 ! Formatting JIB of process

6. Value Comes Back with EXE$DCLEXH+0A5
If the value comes back with EXE$DCLEXH+0A5, the process may be waiting either for the file system to complete a request, or for a lock request. EXE$DCLEXH+0A5 is returned if you have tried to delete the process, whose former state was probably LEF.
A quick check to see if the RWAST process is waiting for an XQP file request to complete is to format the PCB and look for a non-zero value in PCB$B_DPC. If the process is being forced to wait under these circumstances, the SDA "SHOW PROCESS" command displays the process status as "DELPEN".

SDA> SHOW PROCESS

Process index: 00DF   Name: Mike Mc.   Extended PID: 000007DF
Process status: 02040023 RES,DELPEN,RESPEN,PHDRES
PCB address              80339230    JIB address              804FE7E0
PHD address              82DF4800    Swapfile disk address    010065A1
Master internal PID      007000DF    Subprocess count                0
Internal PID             007000DF    Creator internal PID     00000000
Extended PID             000007DF    Creator extended PID     00000000
State                       RWAST    Termination mailbox          0000
Base priority                   4    AST's active                 NONE
UIC                [00022,000016]    AST's remaining                16
Mutex count                     0    Buffered I/O count/limit        0/18*
Waiting EF cluster              0    Direct I/O count/limit         18/18*
Starting wait time       1B001B1B    BUFIO byte count/limit   30478/31936*
Event flag wait mask     00000001    # open files allowed left      75
Local EF cluster 0       E4000000    Timer entries allowed left     10
Local EF cluster 1       00000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count         140
Global cluster 3 pointer 00000000    Global WS page count           60
SDA> FORMAT 80339230            ! format the PCB
80339230   PCB$L_SQFL   80002180
80339234   PCB$L_SQBL   8032D5C0
.......
8033925A   PCB$B_DPC    01         <---- NON-ZERO, waiting for XQP (file
.......    .........    ....             system) activity to complete
If the value is zero (00), the RWAST is not waiting for the XQP and you can check for outstanding lock requests using the command SDA> SHOW PROCESS/LOCK. You will often find a lock in either "Waiting for" or "Converting to" state.
Two other articles in this database describe how to trace lock requests on both clustered and nonclustered systems.
The process holding the lock this process is waiting for is often in a RWxxx state itself, and solving that process' problem would clear up this process' RWAST state.
If you are at this address, EXE$DCLEXH+0A5, and the process is not getting CPU time or waiting for XQP or lock operations to complete, reboot the system to eliminate the process. Take a crash dump for later examination if the problem occurs often.
If the value in PCB$B_DPC is non-zero, then the process is waiting for an XQP file system request. You may use the following information to analyze further. Note that it is possible that this command itself could cause the SDA process to go into an RWAST state.
SDA> SHOW PROCESS/CHANNELS
Process index: 00DF Name: Mike Mc. Extended PID: 000007DF
Process active channels
Channel Window Status Device/file accessed
0010 00000000 DUA2:
0040 00000000 VTA52:
0050 00000000 VTA52:
0070 00000000 Busy DUA2: <---- Device waiting for XQP
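If the channel list is long, picking out the busy device can be scripted. The following is only a rough Python sketch, run off-system against saved SDA output. It assumes each line has the simple layout shown above (channel, window, optional status, device); the real column layout may differ between VMS versions. The function name busy_devices is made up for this illustration.

```python
def busy_devices(lines):
    """Return the devices whose channel line carries a Busy status."""
    found = []
    for line in lines:
        fields = line.split()
        # Assumed layout: channel, window, optional status, device.
        if len(fields) >= 4 and fields[2].lower() == "busy":
            found.append(fields[3].rstrip(":"))
    return found

# Channel lines copied from the SHOW PROCESS/CHANNELS display above.
channel_lines = [
    "0010 00000000 DUA2:",
    "0040 00000000 VTA52:",
    "0050 00000000 VTA52:",
    "0070 00000000 Busy DUA2:",
]
devices = busy_devices(channel_lines)
print(devices)
```

The device name found this way is the one to use with SHOW DEVICE later in the procedure.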
The most common reason for an RWAST process to be waiting for the XQP is that paged dynamic pool, whose size is set by the SYSGEN parameter PAGEDYN, is exhausted or nearly exhausted. If this is the case, PAGEDYN will have to be increased and the system rebooted to prevent further problems.
PAGEDYN should normally be at least 25-30% free, and having it 40% free on a busy system may actually help increase your system performance.
To determine how much PAGEDYN has been used, issue the following command:
SDA> SHOW POOL/SUMMARY/PAGE
Page dynamic storage pool
Summary of paged pool contents
108 UNKNOWN = 357984 (20%)
2 LOG = 83728 (4%)
......
1 CI = 96 (0%)
1 CLU = 2384 (0%)
Total space used = 1719808 out of 1988608 total bytes, 268800 bytes left.
Total space utilization = 86% <--- indicates that only 14% is left
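The percentage in the display can be checked by hand. The short Python sketch below redoes the arithmetic with the byte counts shown above and compares the result with the 25% lower bound of the guideline quoted earlier. The function name and the choice of 25% as the cut-off are assumptions made for the illustration, not part of SDA.

```python
def pool_free_percent(used_bytes, total_bytes):
    """Return the percentage of the pool that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * (total_bytes - used_bytes) / total_bytes

# Byte counts taken from the SHOW POOL/SUMMARY/PAGE display above.
used, total = 1719808, 1988608
free_pct = pool_free_percent(used, total)

# 25% is the lower end of the "25-30% free" guideline (assumed cut-off).
below_guideline = free_pct < 25.0
print(f"{free_pct:.1f}% free, below guideline: {below_guideline}")
```

With these figures, roughly 13.5% of the pool is free, which agrees with the 14% noted in the display and falls well short of the guideline.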
To determine if the RWAST process is waiting for PAGEDYN, you can do the following:
Create the following MACRO source file and assemble it into a symbol table file. SDA needs these symbol definitions in order to format the file system data structures in the next step. Note that case matters here:
$ CREATE F11BDEF.MAR
$F11BCDEF GLOBAL
$F11BDEF GLOBAL
.END

$ MACRO /OBJ=F11BDEF.STB F11BDEF.MAR+SYS$LIBRARY:LIB/LIB
SDA> READ F11BDEF !read in data structures created by MACRO above
SDA> SHOW DEVICE DUA2 !DUA2 is the device shown busy from the
!SDA command SHOW PROCESS/CHANNELS
Look for the AQB address in the following information:
I/O data structures
DUA2 RA80 UCB address: 80484B90
Device status: 00021810 online,valid,unload,lcl_valid
Characteristics: 1C4D4108 dir,rct,fod,shr,avl,mnt,elg,idv,odv,rnd
00000221 clu,mscp,nnm
Owner UIC [000001,000001] Operation count 274887 ORB address 80484C96
PID 00000000 Error count 0 DDB address 80BA3D20
Alloc. lock ID 000400AF Reference count 1 DDT address 80550C78
Alloc. class 2 Online count 1 VCB address 8048AD70
........ .................
Press RETURN for more.
I/O data structures
---Volume Control Block (VCB) 8048AD70 ---
Volume: TUBORGPAGE Lock name: TUBORGPAGE
Status: A0 extfid,system
Status2: 05 writethru,mountver

***Here it is

Mount count 1 Rel. volume 0 AQB address 80BA4AA0
Transactions 2 Max. files 29651 RVT address 80484B90
Free blocks 34020 Rsvd. files 9 FCB queue 808A9D10
Window size 7 Cluster size 3 Cache blk. 80768460
Vol. lock ID 000200B1 Def. extend sz. 5
Block. lock ID 001A00A7 Record size 0
SDA> FORMAT 80BA4AA0
80BA4AA0 AQB$L_ACPQFL 80BA4AA0
........ ............
80BA4AB6 AQB$B_CLASS 00
80BA4AB7 00
80BA4AB8 AQB$L_BUFCACHE 80274380 <--- !Look for this value
........ ...........
SDA> FORMAT 80274380 ! formatting the AQB$L_BUFCACHE
80274380 F11BC$L_BUFBASE 80297400
.......
802743C8 F11BC$Q_POOL_WAITQ 802743C8 <--- !Look for this value.
802743CC 802743C8 !The forward link and the
!backward link point to the
802743D0 802743D0 !forward link address which
802743D4 802743D0 !indicates the queue is empty.
802743D8 802743D8 !In this example, it is not
802743DC 802743D8 !waiting for PAGEDYN. If it
!was waiting for PAGEDYN,
802743E0 802743E0 !different address values
802743E4 802743E0 !for the forward and backward
!link would be displayed and
802743E8 F11BC$L_POOLAVAIL 0000002F !they would not have the same
802743EC 0000044A !value, indicating that the
. !queue is not empty, waiting
!for PAGEDYN.
SDA> EXIT
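The empty-queue test described in the comments above can be written out explicitly. The following Python sketch applies it to the link values from the FORMAT display, taking the first listhead to be at 802743C8, the value both of its links contain. Treating each pair of longwords as a separate listhead is an assumption, and the non-empty example uses invented addresses.

```python
def queue_is_empty(header_addr, flink, blink):
    """A queue is empty when both links point back at its own header."""
    return flink == header_addr and blink == header_addr

# (listhead address, forward link, backward link) from the FORMAT display.
listheads = [
    (0x802743C8, 0x802743C8, 0x802743C8),
    (0x802743D0, 0x802743D0, 0x802743D0),
    (0x802743D8, 0x802743D8, 0x802743D8),
    (0x802743E0, 0x802743E0, 0x802743E0),
]

# Collect any listhead that has something queued on it.
waiting = [hex(addr) for addr, flink, blink in listheads
           if not queue_is_empty(addr, flink, blink)]
print("waiting for PAGEDYN" if waiting else "not waiting for PAGEDYN")

# Hypothetical link values, invented here, showing the non-empty case.
example_empty = queue_is_empty(0x802743C8, 0x80A1B200, 0x80A1B240)
```

For the values in this example every listhead points at itself, so nothing is queued waiting for paged pool.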

7. Value Comes Back with EXE$DCLEXH+141
If the value comes back with EXE$DCLEXH+141, the process is waiting for a subprocess to terminate before it can terminate. Many times the subprocess is also in an RWxxx state.
To check for this occurrence, the PCB can be formatted as described above and the PCB$W_PRCNT field checked. This field contains the number of subprocesses this process is waiting for.
The following DCL command procedure lists every subprocess on the system so that you can find the one owned by the RWAST parent process. For each subprocess it displays the PID, the owner (parent) PID, the username, the process name and the image name.
$! List every subprocess with its owner (parent) PID.
$ context = ""
$!
$loop:
$ pid = F$PID( context )
$ if pid .eqs. "" then $ goto done
$ owner = F$GETJPI( pid , "owner" )
$! An empty owner means this is not a subprocess, so skip it.
$ if owner .eqs. "" then $ goto loop
$ username = F$GETJPI( pid , "username" )
$ prcname = F$GETJPI( pid , "prcnam" )
$ imagname = F$GETJPI( pid , "imagname" )
$ imagname = F$PARSE( imagname ,,, "name" )
$ text = F$FAO( "!8AS !8AS !8AS !15AS !AS" -
, pid , owner , username , prcname , imagname )
$ write sys$output text
$ goto loop
$!
$done:
$ exit
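On a busy system the listing can be long, and process names may contain spaces, so the output is best read by column position rather than by splitting on blanks. The Python sketch below, run off-system against saved output, slices each line using the field widths in the control string (8, 8, 8 and 15 characters, then the image name) and keeps only the rows whose owner matches the RWAST parent. It assumes each field is padded or cut to its width. The sample rows are made up; only the parent PID 000007DF comes from the example above.

```python
def format_row(pid, owner, username, prcnam, imagname):
    """Mimic "!8AS !8AS !8AS !15AS !AS": pad or cut each field to its width."""
    return (f"{pid[:8]:<8} {owner[:8]:<8} {username[:8]:<8} "
            f"{prcnam[:15]:<15} {imagname}")

def parse_row(line):
    """Slice by column so process names containing spaces stay intact."""
    return {
        "pid": line[0:8].strip(),
        "owner": line[9:17].strip(),
        "username": line[18:26].strip(),
        "prcnam": line[27:42].strip(),
        "imagname": line[43:].strip(),
    }

def children_of(lines, parent_pid):
    """Return the parsed rows whose owner PID equals parent_pid."""
    wanted = parent_pid.strip().upper()
    return [row for row in map(parse_row, lines)
            if row["owner"].upper() == wanted]

# Made-up rows for illustration; only the parent PID is from the text.
sample = [
    format_row("00000821", "000007DF", "MIKE", "Mike Mc._1", "EDT"),
    format_row("00000822", "00000455", "SYSTEM", "BATCH_12", "BACKUP"),
]
matches = children_of(sample, "000007DF")
for row in matches:
    print(row["pid"], row["prcnam"], row["imagname"])
```

Any row returned is a subprocess that the RWAST parent may be waiting on.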
If no subprocess is found for the parent process, the parent process will wait forever and the system will have to be rebooted to eliminate this process. It could be that privileged code is altering the subprocess PCB$L_OWNER field so process termination does not know about the parent process.
To find the subprocess in a crash dump, you need to locate the OWNER field whose PID matches that of the parent process in RWAST.
This can be done by displaying every owner field in every PCB available on the system.
SDA> READ SYS$SYSTEM:SYSDEF
SDA> SHOW SUMMARY ! to get all the PCB addresses
Then, for each PCB address shown, enter the following, where pcb_address stands for that address:
SDA> EXAMINE pcb_address+PCB$L_OWNER
When you find the process whose owner/parent is the PID of the process in RWAST, you can start analyzing why the subprocess is not terminating.

 
Neil Rieck
Waterloo, Ontario, Canada.