J2EE Journal: Article

WebSphere Risk Management Part 1

In this article, the first part of a two-part series, I will present WASLED/WASMON, a WebSphere monitoring application, and show you how you can use it to monitor WebSphere Application Server and to plan a WebSphere risk management procedure.

I will discuss some of the system resources that need to be made available to ensure the operability of WebSphere Application Server. You will learn how to configure WASMON to ensure that these resources are available and up and running. I will show you how to prepare the essential recovery scripts and make them available within the WASMON repository so you can trigger specific actions upon failure notification. You will then learn how to communicate with WASMON via the Internet to initialize an administrative action.

In this article I use the example Web application WASDG, which was used throughout my book, IBM WebSphere Application Server Programming. In addition, for brevity and generic notation of WAS commands, I will use the WASDG environment notation (also defined in my book). For example, $WASLOG_STDOUT refers to the fully specified name of the file to which WAS writes the standard output messages; $SEAPPINSTALL refers to SEAppInstall; and $WASSTOP; $WASSTART refers to the combined command to restart WAS. The test environment for the risk management scenario consisted of Linux servers (Pentium III, 750MHz, 2GB RAM) running Red Hat v7. The applications used are WebSphere Application Server Single Server Edition v4, downloadable from www.ibm.com, and WASLED/WASMON v1.2.2, downloadable from www.tcnd.com.

Defining/Asserting the Operability of a WebSphere Region
A WebSphere region encompasses many server machines running processes to render the functionality of a WebSphere domain and its Web applications. A WebSphere region therefore consists of servers that are running WAS processes, an HTTP server, one or more database servers, and a monitoring application such as WASLED/WASMON. WAS processes are typically running to fulfill any of the following:

  • To store and retrieve information from the WAS configuration repository
  • To manage and serve requests redirected by the HTTP server
  • To service or to organize workload distribution
  • To supervise WAS containment and cache cleanup (for the Web container and the EJB container)

    A WebSphere region is termed valid and operational when all the necessary resources and services hosted by the server machines are in place and able to serve the Web applications. Consider the WebSphere region depicted in Figure 1.

    In this figure, the Internet server hserv.tcnd.com is an HTTP server that is patched with WAS's vendor plug-in to forward requests to the WAS server at node1.tcnd.com. The session persistence database and the DataSource are managed by the UDB server installed on db.tcnd.com. Let's take a look at how you can use WASMON to monitor the servers and resources in a WebSphere region.

    The WASMON Application
    WASLED/WASMON v1.2.2 can be downloaded from www.tcnd.com. For the remainder of this article I will refer to WASLED/WASMON simply as WASMON, unless it is necessary to make a distinction between the two.

    WASMON consists of two main programs, wasmon and wasmonhelper, along with miscellaneous programs such as wasmoncl, the client that connects to wasmon; wasmontkt, the ticket and configuration checker; and wasmonvar, the directive-to-variable mapper.

    WASMON monitors WAS without using any WebSphere APIs. The application permits monitoring of the WAS runtime or of a Web application runtime simply by preprocessing the WAS log file or the Web application log file. Such monitoring is therefore dependent on the logging of information to a file system. WASMON also offers a supervisor that allows you to monitor a WebSphere region independently of the log files. The WASMON supervisor mode evolved to ensure that a WebSphere region is operational whenever data ceases to flow (or no more exceptions are thrown) from WAS components or the hosted Web applications.

    The WASLED/WASMON Console
    By default Linux installations have the prerequisites needed to run WASMON, except for Perl/Tk, which needs to be installed manually. To start the WASLED/WASMON console, execute the following command:

    # wasmon

    Figure 2 shows the WASLED/WASMON console in its active state. The console consists of three panes.

  • In the first pane you can make entries using the keyboard or the mouse to interact with WASMON. In addition, this pane has two throbs that show the varying state of the WAS Web container and the WAS EJB container.
  • The second pane, also called the WASLED pane, is a graphical representation of the activities of the WAS components. When a component is detected by WASMON, it is dynamically allocated a graphical object to show its current state. Each graphical object is shown on one line, as shown in Figure 3. The line is formed of three parts: the component's descriptive name, the last LED component processed by WASMON (its color reflects the state of the LED previously processed), and a colored bar that shows the status of the current processed component.
  • The third pane is a log showing the wasmon program's activities.

    The console shown in Figure 2 has been set to active and is monitoring WAS runtime components. To activate WASMON, enter a socket number in the upper corner of the console and click on Listen; this starts WASMON as a server to process WAS logging records on socket 12345. Then start wasmoncl on a workstation that has access to the WAS log files. wasmoncl will connect to WASMON listening on socket 12345 on supervise.tcnd.com and send the WAS log information to be processed. For example, on node1.tcnd.com we will start the command:

    # tail -f $WASLOG_STDOUT | perl wasmoncl 12345 supervise.tcnd.com

    Finally, let's take a look at a very simple example to quickly test WASMON. When WAS is restarted, the SRVE component (that corresponds to the WAS servlet engine) will be active, and typically an SRVE0091I is logged to the file. For this reason, to filter an event while WASMON is running, you can just enter the string to be matched in the Filter Event text box; for instance, you could enter the regular expression SRVE00[0-9][0-9]I. You will also specify a trigger to associate with the filter; for example, specify trigger 0002 and check the trigger button (see Figure 4). Remember to make WASMON active by entering a socket number and clicking on Listen.

    Now restart WAS (as we did on node1.tcnd.com). As WAS is restarting, SRVE0091I is detected by WASMON and the associated trigger 0002 is executed to simply print "triggering script wm_0002" to the terminal where WASMON was started. Trigger 0002 corresponds to the shell script ./TriggerPool/wm_0002.sh:

    echo triggering script wm_0002

    You can also consider testing with trigger 0003, which corresponds to the shell script ./TriggerPool/wm_0003-notify-all-users.sh:

    wall WASMON notifying all users, and triggering 0003

    Both of these scripts are simple scripts used for demonstration. Later in this article I will show you more interesting scripts that can initiate the takeover of a WAS server by another.

    The second pane (WASLED pane) of the console shows the LEDs of the many WAS components that are detected by WASMON. In our case, SRVE0091I should be shown in the WASLED pane. In the first pane, the top throb should have changed state because the WAS Web container is active.

    In the previous example, I showed you the simplest way to detect and filter WAS events. However, WASMON reads its configuration from wasmon.conf, and manually specifying a regular expression to be filtered and a trigger number to be executed is only useful for online monitoring. The following directive, in wasmon.conf, shows a more interesting way to initialize WASMON monitoring for the WAS Web container component (SRVE):

    1,1,0002,,admin@tcnd.com,,WARNING ON SRVE and CNTR,,,SRVE,CNTR

    What follows the directive shown above is a list of fields based on the following template:

    number of times to send email,
    number of times to trigger the script,
    4-digit trigger number,
    an argument to be passed to the trigger,
    e-mail address, cc-email-address, email-subject,
    email-body, logical-expression,

    The directive states: "Filter any warning thrown by the WAS runtime components SRVE (Servlet Container) and CNTR (EJB Container). If an exception is thrown by either of these two components, then send (only once) an e-mail to admin@tcnd.com with the subject 'WARNING ON SRVE and CNTR,' and trigger (only once) the shell script that is bound to the trigger number 0002."
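To make the field order concrete, here is a minimal shell sketch that splits a directive into the fields named in the template. The function name parse_directive and the output labels are hypothetical and are not part of WASMON; the sketch only assumes that fields are comma-separated and that everything after the ninth field is the list of filters.

```shell
#!/bin/sh
# parse_directive LINE
# Hypothetical helper (not shipped with WASMON): split one comma-separated
# directive into the nine template fields plus the trailing filter list.
# Variables are global because POSIX sh has no "local".
parse_directive() {
    # IFS=, applies only to this read; empty fields (",,") are preserved.
    IFS=, read -r n_mail n_trig trigger arg to cc subject body logic filters <<EOF
$1
EOF
    printf 'send_email_times=%s\n'   "$n_mail"
    printf 'trigger_times=%s\n'      "$n_trig"
    printf 'trigger=%s\n'            "$trigger"
    printf 'argument=%s\n'           "$arg"
    printf 'email=%s\n'              "$to"
    printf 'cc=%s\n'                 "$cc"
    printf 'subject=%s\n'            "$subject"
    printf 'body=%s\n'               "$body"
    printf 'logical_expression=%s\n' "$logic"
    # The last variable receives the rest of the line, commas included.
    printf 'filters=%s\n'            "$filters"
}
```

Run against the SRVE/CNTR directive shown earlier, the sketch reports an empty argument, cc, body, and logical expression, and a filter list of SRVE,CNTR.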

    Let's take a look at two other variations of the directives. The first is to monitor the WAS Web container using a regular expression as a filter:

    1,1,0002,,admin@tcnd.com,,Sending alert from WASMON

    The second directive is to monitor WAS using a logical expression that asserts that "either the WAS Web container OR the EJB container is in error, AND WAS is running low on memory."

    0,1,0002,,admin@tcnd.com,,Alert WAS low on memory,,(c1 || c2) && c3,SRVE[0-9][0-9][0-9][0-9]E,CNTR[0-9][0-9][0-9][0-9]E,running low on memory(.)

    This second monitoring directive requires the evaluation of the following logical expression: (c1 || c2) && c3; where c1 is the filter for the Web container, c2 is the filter for the EJB container, and c3 is the filter for a string stating that WAS is low on memory.

    The expression at the end of the above directive is a regular expression where (.)* means "match anything that falls between the word 'memory' and the word 'destroyed..'"
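The way such a logical expression combines its filters can be imitated outside WASMON with grep. The following is only a sketch of the idea, not WASMON's implementation: the function names are hypothetical, each condition is assumed to be true if any line of the file matches, and the memory filter is shortened to the literal phrase.

```shell
#!/bin/sh
# matches REGEX FILE: succeed if any line of FILE matches the extended regex.
matches() {
    grep -Eq -- "$1" "$2"
}

# low_memory_alert FILE: succeed when (c1 || c2) && c3 holds for FILE.
# Hypothetical helper that mimics the logical directive with plain grep.
low_memory_alert() {
    # c1: Web container error, c2: EJB container error, c3: low-memory text.
    if matches 'SRVE[0-9][0-9][0-9][0-9]E' "$1"; then c1=true; else c1=false; fi
    if matches 'CNTR[0-9][0-9][0-9][0-9]E' "$1"; then c2=true; else c2=false; fi
    if matches 'running low on memory'     "$1"; then c3=true; else c3=false; fi
    # "true" and "false" are ordinary commands, so the shell evaluates the
    # expression exactly as it is written in the directive.
    { $c1 || $c2; } && $c3
}
```

A low-memory message on its own does not satisfy the expression; it must be accompanied by an error from at least one of the two containers.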

    Because it is possible to specify regular expressions as filters, the use of WASMON monitoring can also be applied to any log file that you wish to filter specific data from. In this case, you need to use the filtering directives in which you can specify regular expressions: ALERT ON LIST FILTERS and its logical counterpart (that can accept a regular expression) LOGICAL ALERT ON LIST FILTERS.

    By default WASMON is set to monitor all 40 WAS components. You can add a component to WASMON or remove one simply by typing its four-letter acronym in the first pane and clicking on Add Monitor or Stop Monitor, respectively.

    WASMON monitoring can also be applied to Web applications that throw exceptions similar to the way WAS throws its exceptions. IBM WebSphere Application Server Programming shows a full-blown exception handler (called the WasdgException) that throws exceptions (and LEDs) similar to the way WAS does. The exception handler uses messages bundled using the BundleManager. A Web application that uses the WasdgException handler can also be monitored using WASMON; in this situation, you would start WASMON as follows:

    # wasmon -xwasled -wc <Web_app_components> -wl <Web_app_leds>

    I am not going to elaborate on Web application monitoring, but I mention it here because of its importance. The above command starts WASMON by excluding the monitoring of all WAS components and including the components listed in the Web_app_components file, and by reading the LED mapping from the Web_app_leds file.

    So far we have seen that WASMON can filter the data as it is being logged by the WAS runtime or the Web application runtime. But a true monitoring application should not depend on the runtime of the application server being monitored. There might be situations in which WAS is dead, or the JNDI name lookup is not responding because the WAS EJB container is dead (or idle due to a race condition on threads trying to clean up the cache). WASMON monitoring should not depend exclusively on the messages logged by WAS or its Web applications.

    Therefore, WASMON monitoring is based on filters that can be mapped to logical expressions. WASMON logical expressions are formed from conditional variables, also called monitoring or diagnostic variables. These variables are set as a result of filtering WAS log data, or by WASMON after scrutinization of the systems and resources in a WebSphere region. This last capability of WASMON monitoring (carried out independently of WAS) is based on a set of Boolean variables called b-var, and a set of differential variables called d-var. The b-var and d-var variables are set by a WASMON delegator program called wasmonhelper.

    We will start our demonstration by writing the configuration file, wasmonhelper.conf, which directs wasmonhelper to collect data about the WebSphere region.

    What we are trying to accomplish is a server takeover initiated by WASMON. For example, consider Figure 5, in which node2.tcnd.com is to replace node1.tcnd.com.

    WASMON makes the replacement of a failing server with a back-up server possible. It is necessary to follow a strategy based on detection, assertion, and decision - the "big three" functions of WebSphere risk management.

  • Detection: The detection of a malfunctioning WAS component or system resource can be handled by WASMON through the use of filtering and associated monitoring variables.
  • Assertion: The detected variables can be combined into a logical expression that is evaluated by WASMON to assert a certain condition. For instance, what if the WAS Java Virtual Machine is running but not responding to the JNDI name lookup on iiop://was.host.name on port 900? A WASMON logical expression will assert such a condition.
  • Decision: If such an error occurs, what should you do? What is the next step? In WASMON the action is initiated by one or more scripts. We will consider such shell scripts that automatically rebuild a J2EE Web application without the attendance of an operator.
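The step from assertion to decision can be pictured as a small lookup from asserted states to an action. The sketch below is purely illustrative: the function name, the up/down state words, and the action names are hypothetical, and in WASMON the action is the script bound to a trigger number rather than a printed word.

```shell
#!/bin/sh
# decide JVM_STATE JNDI_STATE
# Hypothetical policy: print the action to take for a pair of asserted
# states, where each state is the word "up" or "down".
decide() {
    case "$1:$2" in
        up:up)   echo none ;;                  # everything is serving
        up:down) echo rebuild-and-redeploy ;;  # JVM alive, JNDI lookup failing
        down:*)  echo server-takeover ;;       # JVM gone: hand over to the backup
        *)       echo unknown; return 2 ;;     # unrecognized state words
    esac
}
```

For example, `decide up down` selects the rebuild action for the case described above, in which the JVM is running but the JNDI name lookup does not respond.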

    Monitoring the WebSphere Region with wasmonhelper
    Figure 1 illustrates the systems in the WebSphere region that we wish to monitor. To assert the proper operability of a WebSphere region, let's examine the figure and tabulate some of the essential elements that need to be monitored. Table 1 details usage of the ports, processes, and URIs for each server shown in Figure 1.

    We will use wasmonhelper to monitor the systems and resources shown in Table 1. To do so, we will write the configuration file, wasmonhelper.conf, shown in Listing 1. (All of the listings for this article are available at www.sys-con.com/websphere/sourcec.cfm.)

    wasmonhelper.conf contains several directives to set up monitor variables. Each of the directives that start with BOOL or DVAR in wasmonhelper.conf maps to a monitor variable whose name can be used in wasmon.conf. To view what a variable will map to, just issue the command wasmonhelper with option ideal. For instance, to dump the b-var and d-var as they are initially set (or ideally as set by wasmonhelper before the program starts to evaluate them), issue the command:

    # wasmonhelper ideal

    Listing 2 shows the mapping of the directives shown in Listing 1 to their variable counterparts.

    Each variable can be used in a logical expression in wasmon.conf by adding the @ symbol as a suffix. In Part 2 of this series I will provide an example.

    Your understanding of the usage of wasmonhelper to glance at the system resources defined in a WebSphere region will become clearer when I discuss the wglance command. Now let's clarify the evaluation of the d-var shown on line 18 of Listing 2.

    Scrutinizing System Resources: %Memory and %CPU
    The wasmonhelper also offers a set of d-var variables that are used to scrutinize the system resource consumption per process. Consider the directive DVAR RSHSCRUTPROCESS. This directive can be used to raise and set a d-var to an escalating positive integer (therefore evaluating to true in a logical condition) when the system consumption of a process reaches a certain limit.

    For instance, say WAS is started on the Linux host node1.tcnd.com with process number 32184. To monitor the percentage of memory consumption of this process and to raise an alert when WAS process 32184 consumes more than 8.1 percent of the system memory, we will use the following directive in wasmonhelper.conf:


    The first argument, (3), is a list of values that are reserved and used internally by WASMON. This is called a freeze value and is normally set to 0 or 1 so that wasmonhelper will stop (freeze) differentiating the first time the process consumption reaches 8.1% or more of system memory. We set it to 3 so that wasmonhelper will continue differentiating three more times. MEM is the attribute of the process that we seek to scrutinize; it is followed by the number, 8.1, which represents the limit we wish to place on the amount of memory consumed.

    MEM is not the only attribute that you can examine; in fact you can monitor any of the following attributes when using Linux: TRS, DRS, RSS, and MEM; or any of these when using AIX: PGIN, SIZE, RSS, LIM, TSIZ, TRS, CPU, and MEM. Getting a list of the process attributes is fairly simple; just issue the command ps v against the process number on your system.
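The threshold test itself is easy to try by hand. The following sketch is not how wasmonhelper is implemented and does not model the freeze value; it assumes a Linux procps ps, where `ps -o %mem= -p PID` prints only the %MEM column (a narrower form of the ps v output mentioned above), and all function names are hypothetical.

```shell
#!/bin/sh
# exceeds VALUE LIMIT: succeed when VALUE is greater than or equal to LIMIT.
# awk does the comparison because sh arithmetic is integer-only.
exceeds() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v + 0 >= l + 0) }'
}

# mem_percent PID: print the %MEM figure for PID (assumes procps ps).
mem_percent() {
    ps -o %mem= -p "$1" | tr -d ' '
}

# check_mem PID LIMIT: report whether PID has reached LIMIT percent of memory.
check_mem() {
    pct=$(mem_percent "$1")
    # An empty result means ps found no such process.
    [ -n "$pct" ] || { echo "no such process: $1"; return 2; }
    if exceeds "$pct" "$2"; then
        echo "ALERT pid=$1 mem=$pct limit=$2"
    else
        echo "OK pid=$1 mem=$pct limit=$2"
    fi
}
```

With the numbers used in this section, the check would be invoked as `check_mem 32184 8.1`.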

    WASMON monitoring does not use any Java or WebSphere APIs that can heavily impact and degrade the performance of the system being monitored. This essential characteristic of WASMON sets this application apart from libraries or applications offered by IBM. There are myriad reasons why a monitoring application should not depend on the throwable exceptions generated during the runtime of the J2EE application server.

    On the one hand, exceptions are typically placed around a segment of code to be evaluated when we suspect a warning or an error, yet this would suggest knowledge beforehand of where an error will occur. On the other hand, an application used for risk management should not just interrogate and give reasons why there is a failure. There is no time to make any reasonable judgment. A risk management application should promote system recovery or initiate a server takeover. For instance, if the JNDI name cannot be looked up, an automatic compilation, rebuild, and deployment of the EJB module should be initiated.

    To see the effect of the directive,


    and how the d-var can be used to escalate memory consumption, you need to stress-test the application server and run wasmonhelper in verbose mode. To stress the application server you can use SharkUrl, discussed in Chapter 22 of my book. SharkUrl is freely distributed with the Gramercy ToolKit for noncommercial use. Refer to the article www.tcnd.com/article/p9 for a complete scenario for monitoring the escalation of memory consumption.

    So far so good! Now that you know how to get and scrutinize the WAS process and other system resources using wasmonhelper's monitoring variables, all that is left is to show you how to bind such monitoring variables to active scripts with WASMON. However, before discussing how to associate variables and scripts, we need to discuss a few shell scripts that can compile and deploy a Web application automatically. Such scripts will be discussed in Part 2 of this article. Now let's take a look at a command that can fit nicely into the system of any WebSphere developer or administrator: the wglance command.

    Glancing at a WebSphere Region: wglance
    Although wasmonhelper is used to set diagnostic variables, it is also used generically as a system command to assess the serviceability of the systems and resources in a WebSphere region. Once the product is installed, a system administrator can use wasmonhelper and its counterpart wglance as commands run from the command prompt to glance at and assess the functionality of the many computer systems in a WebSphere region.

    A programmer can write a configuration file, wasmonhelper.conf, in his or her $HOME directory, or instead set a configuration file globally for all users in the WASMON installation directory. Consequently, any user can then run wasmonhelper or wglance to glance at WAS and its EJB container service for the JNDI name lookup, or to monitor other resources such as the HTTP server daemon and the UDB active port. Programmers will find these commands handy; wglance is a simple command that will also release the system administrator from trivial questions about the operability of the systems in a WebSphere development environment. The following two commands are equivalent:

    # wasmonhelper glance


    # wglance

    When wglance is executed without any option, it prints all data that has been evaluated by wasmonhelper. You can follow wglance with a word to print specific data. For instance, to glance at the JNDI names lookup:

    # wglance jndi

    With this command you can also specify an iteration value and a delay value (to be used during the iteration). For instance, to monitor the memory consumption while scrutinizing the WAS process (using the d-var as shown earlier), we will issue the following command:

    # wglance MEM 5 d12

    Also consider the following command, which you can use during the deployment of an EJB module:

    # wglance DataAccessComponent 10 d14

    This command will iterate 10 times, pausing 14 seconds after each iteration, and it will print statistics about the resolvability of the JNDI name lookup, DataAccessComponent. Such a command will help programmers demystify a strange situation such as the disappearance of a JNDI name lookup when deploying a J2EE Web application!
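The iterate-and-delay behavior can be approximated with an ordinary shell loop. This is a sketch of the pattern only, not of wglance itself: poll is a hypothetical function that runs any command a given number of times, sleeps after each run, and reports how many runs succeeded.

```shell
#!/bin/sh
# poll COUNT DELAY CMD [ARGS...]
# Run CMD COUNT times, pausing DELAY seconds after each run, then print
# how many of the runs exited successfully.
poll() {
    count=$1 delay=$2
    shift 2
    ok=0 i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        # Discard the command's own output; only its exit status is counted.
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        sleep "$delay"
    done
    echo "$ok/$count succeeded"
}
```

Something like `poll 10 14 wglance DataAccessComponent` would repeat the lookup on the same schedule as the command above, assuming that wglance signals a failed lookup through its exit status, which this article does not state.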

    Finally, wglance can be used to verify the successful first-time installation of WAS Advanced Edition. On the server where you have installed WAS AE, start the application server, then execute the following command:

    # wglance wasaes <hostname-or-IP-where-you-installed-WAS>

    This command will verify whether or not your installation was successful by printing diagnostic commands to the terminal. For information on setting the wasmonhelper or wglance commands, refer to www.tcnd.com/article/p9.

    In Part 2 of this series I will discuss how to write recovery scripts that can remotely compile and deploy Web applications on the tcnd.com network. Because such scripts can do the job without the intervention of an operator, I will show you how you can use them with WASMON to initiate the server takeover of a failing WebSphere node.

    By Bassem Jamaleddine

    Bassem W. Jamaleddine is a Web systems engineer who has orchestrated the development of several projects at IBM's T.J. Watson Research Center, including IBM's Java-based network computer and the new generation of WebSphere Application Server technology. Bassem is the author of IBM WebSphere Application Server Programming (McGraw-Hill).
