IBM i Apache freezes or slows down, or PHP stops responding - how to research

Issue

Sometimes the web site running on Zend Server for IBM i will freeze or lock up, or run much slower than normal, or appear to crash.  Restarting Apache seems to clear up the problem, at least until the next time it happens.

These problems can be difficult to analyze. This article explains how to gather relevant material about the IBM HTTP Server (Apache) instance for Zend Server, to assist in the analysis of these types of issues.

Environment

Zend Server for IBM i version 6 or higher, running on any supported version of IBM i.

Resolution

Note: Try setting the Apache timeout to clean up locked jobs more quickly.

While it is tempting to just restart Apache as quickly as possible to get the site back up and running, it is important to first take a look while Apache is active but not responding.

The place to start looking is the Work with Active Jobs display, to see what the ZENDSVR6 FastCGI child jobs are doing. Each FastCGI child job runs a copy of the PHP engine. Every PHP request that arrives at the Apache server is sent to one of these children for processing. If any of the children become permanently unavailable, the site can slow down, because requests have to wait longer for one of the remaining children to become available. If enough of the children are locked up, the web site can become intolerably slow, as there are simply not enough active children left to process all the traffic in a timely manner.
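The effect of losing children can be illustrated with some rough arithmetic. The following is a minimal sketch in Python, not a model of how Zend Server actually schedules work: it assumes each child handles one request at a time, and the function name and the numbers (10 children, half a second per request) are hypothetical values chosen only for illustration.

```python
def max_requests_per_second(total_children, stuck_children, avg_seconds_per_request):
    """Rough upper bound on throughput, assuming one request per child at a time."""
    if avg_seconds_per_request <= 0:
        raise ValueError("avg_seconds_per_request must be positive")
    available = total_children - stuck_children
    if available <= 0:
        return 0.0  # no children left to serve requests
    return available / avg_seconds_per_request


# Hypothetical numbers: 10 children, 0.5 seconds per request.
print(max_requests_per_second(10, 0, 0.5))  # 20.0
print(max_requests_per_second(10, 7, 0.5))  # 6.0
```

Under these assumptions, seven stuck children out of ten cut the best-case capacity from 20 requests per second to 6, which is why the site can feel frozen even though Apache is still running.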

The following steps won't tell the whole story as to why child jobs are locking up, but they do show the lockups and can reveal important information for further research. They may also suggest workarounds that make the situation less troublesome, even before the issue is fully resolved.

The following example assumes you are running the default ZENDSVR6 Apache instance only. For customers running additional Apache instances using PHP, if some other instance is locking up, please substitute that instance name for ZENDSVR6 in the example.

The first step is to wait for the lock up or freeze to occur. This is because we need to see the status of the Apache instance jobs when the problem is occurring.  Before restarting Apache to get your web site going again, please take just a couple of minutes to review the status of your Apache jobs. From the 5250 command line:

WRKACTJOB SBS(QHTTPSVR) JOB(ZENDSVR6)

You should see a list of jobs similar to this one:

Subsystem/Job  User        Type  CPU %  Function        Status
ZENDSVR6 QTMHHTTP BCH .0 PGM-QZHBMAIN SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-QZSRLOG SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-QZSRLOG SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-QZSRHTTP SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-zfcgi SELW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi THDW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW

Again, these are the jobs for the ZENDSVR6 instance of Apache, and the Apache configuration is set to the defaults as shipped by Zend. You may see some different jobs if your configuration has been modified.

The first six jobs are not child processes. If any of these are in any kind of error status, you should try to get a job log for it. If there is a MSGW status, use option 7 to view the message and respond. If your configuration has been changed, there may be more or fewer than six jobs that are not child processes. These will include any jobs running a program starting with 'Q', the job running program zfcgi, and the first job running php-cgi.bi. The normal status for each of these jobs is shown in the list above. Again, if any of these jobs is doing anything abnormal, try to get a job log for it.
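If you copy the screen as text, the rule above for telling the non-child jobs from the children can be applied mechanically. The following Python sketch is only an illustration, not a Zend or IBM tool: the names Job, parse_jobs and split_control_and_children are hypothetical, and it assumes each job row has six whitespace-separated columns as in the list above. Any row that is not a second or later php-cgi job is treated as a non-child job.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    user: str
    type: str
    cpu: str
    function: str
    status: str


def parse_jobs(screen_text):
    """Parse job rows copied from the display; skip headers and blank lines."""
    jobs = []
    for line in screen_text.splitlines():
        fields = line.split()
        # Assumption: a job row has exactly six columns and a PGM- function.
        if len(fields) == 6 and fields[4].startswith("PGM-"):
            jobs.append(Job(*fields))
    return jobs


def split_control_and_children(jobs):
    """Return (non-child jobs, child jobs) using the rule described above."""
    control, children = [], []
    seen_first_php = False
    for job in jobs:
        program = job.function.removeprefix("PGM-")
        if program.startswith("php-cgi") and seen_first_php:
            children.append(job)  # second and later php-cgi jobs are children
        else:
            if program.startswith("php-cgi"):
                seen_first_php = True  # the first php-cgi job is not a child
            control.append(job)
    return control, children


SAMPLE = """\
Subsystem/Job  User        Type  CPU %  Function        Status
ZENDSVR6 QTMHHTTP BCH .0 PGM-QZHBMAIN SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-QZSRLOG SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-QZSRHTTP SIGW
ZENDSVR6 QTMHHTTP BCI .0 PGM-zfcgi SELW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi THDW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
ZENDSVR6 QTMHHTTP BCI .0 PGM-php-cgi.bi TIMW
"""

control, children = split_control_and_children(parse_jobs(SAMPLE))
print(len(control), len(children))  # 5 2
```

The split only sorts the rows; deciding whether a non-child job is in an abnormal status still means comparing it against the normal statuses shown in the list above.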

The remaining jobs running php-cgi.bi are the child processes. Each of these processes runs an instance of FastCGI and PHP. A normal status for these jobs when idle is TIMW. If they are running a script, a normal status is TIMA. You may see them briefly in some other status, but wait a few seconds and then use F5 to refresh the display, and they should eventually settle back into a TIMW or TIMA status. If they seem stuck in some other status like CNDW or DEQW, or anything else, and stay that way even after refreshing the display a few times over two or three minutes, they are stuck and that is why your web site is not responsive. Try to get job logs for any child process that appears to be stuck.
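The "refresh a few times and see whether it stays stuck" check can be written down as a small rule. The Python sketch below is one interpretation of the paragraph above, not an official diagnostic: a child is flagged only if it has been observed at least three times and was never seen in TIMW or TIMA. The function names, the threshold of three observations, and the job labels are all hypothetical; you would note the statuses yourself on each F5 refresh.

```python
NORMAL_CHILD_STATUSES = {"TIMW", "TIMA"}  # idle and running a script


def is_stuck(observed_statuses, min_observations=3):
    """True if a child was never in a normal status across enough refreshes."""
    if len(observed_statuses) < min_observations:
        return False  # not enough refreshes yet to draw a conclusion
    return all(s not in NORMAL_CHILD_STATUSES for s in observed_statuses)


def stuck_children(history):
    """history maps a job label to the statuses seen on successive refreshes."""
    return sorted(label for label, seen in history.items() if is_stuck(seen))


# Hypothetical observations taken over two or three minutes.
history = {
    "child-1": ["TIMW", "TIMA", "TIMW"],
    "child-2": ["DEQW", "DEQW", "DEQW"],
    "child-3": ["CNDW", "TIMW", "TIMW"],
    "child-4": ["CNDW", "DEQW"],
}
print(stuck_children(history))  # ['child-2']
```

Here child-3 is not flagged because it settled back into TIMW, and child-4 is not flagged because it has only been observed twice.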

Also, please take screen prints or copy the screen as text, to show all of the ZENDSVR6 jobs and the status for them.

After collecting your job logs and taking your screen shots, go ahead and restart Apache to get your web site going again:

Restart Apache on IBM i

As soon as you can after restarting the web site, please also obtain your Apache logs, run the Support Tool, and save its output. If Zend Support is assisting you with the analysis, we will want to see all of these:

Job logs
Screen shots
Apache Logs
Support tool

We already discussed how to get the job logs and the screen shots. These articles can help you get the Apache Logs and Support Tool:

How to send your IBM i Apache logs to Support

How to run the Support Tool on IBM i

This probably seems like a lot of material to collect, and it is. The reason is that information can only be collected while an incident is happening, so it pays to be thorough. Not all of it will be needed to analyze every issue, but it is difficult to predict which items will be needed until the analysis is done. So, it is best to get everything.

Finally, chances are that if you have PHP scripts locking up, someone using your web site may notice the issue.  If you have complaints of a page that fails to respond, that can be an important clue as to which script is locking up.
