See also server reached MaxClients setting, consider raising the MaxClients setting

We've seen both bennett and andrews crash.

The apache error_log file will say:

Not enough space: fork: Unable to fork new process
The error seems to occur between 1 and 2 in the morning, though not always. /var/adm/messages says:
Jan 27 01:51:52 andrews.EECS.Berkeley.EDU genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 493 (cron)
Jan 27 01:53:10 andrews.EECS.Berkeley.EDU cron[502]: [ID 401707 auth.error] open_module: /usr/lib/security/pam_authtok_get.so.1 failed: ld.so.1: cron: fatal: /usr/lib/security/pam_authtok_get.so.1: open failed: No such file or directory
Jan 27 01:53:10 andrews.EECS.Berkeley.EDU cron[502]: [ID 487707 auth.error] load_modules: can not open module /usr/lib/security/pam_authtok_get.so.1
Jan 27 01:55:47 andrews.EECS.Berkeley.EDU tmpfs: [ID 518458 kern.warning] WARNING: /etc/svc/volatile: File system full, swap space limit exceeded
Jan 27 01:55:47 andrews.EECS.Berkeley.EDU last message repeated 1 time
Jan 27 02:00:47 andrews.EECS.Berkeley.EDU tmpfs: [ID 518458 kern.warning] WARNING: /etc/svc/volatile: File system full, swap space limit exceeded

We are running /usr/local/bin/andrewstatus, which runs top and saves output into /export/home1/tmp/logs We can see that at one point, we ran out of swap, though we had memory.
op.2009-01-27.01:33:47:Memory: 2048M real, 941M free, 617M swap in use, 1224M swap free
top.2009-01-27.01:34:18:Memory: 2048M real, 958M free, 599M swap in use, 1242M swap free
top.2009-01-27.01:34:47:Memory: 2048M real, 894M free, 593M swap in use, 1247M swap free
top.2009-01-27.01:35:17:Memory: 2048M real, 879M free, 611M swap in use, 1230M swap free
top.2009-01-27.01:35:46:Memory: 2048M real, 879M free, 625M swap in use, 1215M swap free
top.2009-01-27.01:36:15:Memory: 2048M real, 914M free, 617M swap in use, 1224M swap free
top.2009-01-27.01:36:43:Memory: 2048M real, 904M free, 636M swap in use, 1204M swap free
top.2009-01-27.01:37:11:Memory: 2048M real, 1080M free, 648M swap in use, 1193M swap free
top.2009-01-27.01:37:38:Memory: 2048M real, 1074M free, 720M swap in use, 1120M swap free
top.2009-01-27.01:38:06:Memory: 2048M real, 1067M free, 796M swap in use, 1044M swap free
top.2009-01-27.01:38:34:Memory: 2048M real, 1055M free, 934M swap in use, 905M swap free
top.2009-01-27.01:39:01:Memory: 2048M real, 1047M free, 1029M swap in use, 809M swap free
top.2009-01-27.01:39:39:Memory: 2048M real, 1037M free, 1147M swap in use, 690M swap free
top.2009-01-27.01:40:07:Memory: 2048M real, 1026M free, 1223M swap in use, 610M swap free
top.2009-01-27.01:40:34:Memory: 2048M real, 1020M free, 1289M swap in use, 544M swap free
top.2009-01-27.01:41:02:Memory: 2048M real, 1011M free, 1372M swap in use, 460M swap free
top.2009-01-27.01:41:30:Memory: 2048M real, 1000M free, 1497M swap in use, 333M swap free
top.2009-01-27.01:41:57:Memory: 2048M real, 987M free, 1629M swap in use, 199M swap free
top.2009-01-27.01:42:25:Memory: 2048M real, 975M free, 1737M swap in use, 88M swap free
top.2009-01-27.01:42:53:Memory: 2048M real, 970M free, 1790M swap in use, 35M swap free
top.2009-01-27.01:44:03:Memory: 2048M real, 999M free, 1853M swap in use, 7376K swap free
We are also running the df command in /tmp, which indicates that /tmp ran out of space:
 df.2009-01-27.01:39:01:swap                  810256   57456  752800     8%    /tmp
df.2009-01-27.01:39:39:swap                  735880   57456  678424     8%    /tmp
df.2009-01-27.01:40:07:swap                  660448   57456  602992     9%    /tmp
df.2009-01-27.01:40:34:swap                  579216   57456  521760    10%    /tmp
df.2009-01-27.01:41:02:swap                  458632   57456  401176    13%    /tmp
df.2009-01-27.01:41:30:swap                  356144   57456  298688    17%    /tmp
df.2009-01-27.01:41:57:swap                  201344   57456  143888    29%    /tmp
df.2009-01-27.01:42:25:swap                  146392   57456   88936    40%    /tmp
df.2009-01-27.01:42:53:swap                   72672   57456   15216    80%    /tmp
df.2009-01-27.01:44:03:swap                   57456   57456       0   100%    /tmp
However, du was also running, which indicated that /tmp did not fill up, that is, the number of files in /tmp did not increase.
Below, the number of lines in the du output did not increase:
     628 du.2009-01-27.01:38:34
     628 du.2009-01-27.01:39:01
     628 du.2009-01-27.01:39:39
     628 du.2009-01-27.01:40:07
     628 du.2009-01-27.01:40:34
     628 du.2009-01-27.01:41:02
     628 du.2009-01-27.01:41:30
     628 du.2009-01-27.01:41:57
     628 du.2009-01-27.01:42:25
     628 du.2009-01-27.01:42:53
     628 du.2009-01-27.01:44:03
The apache log for chess shows that we were getting 3 hits per second from the search engine:
128.32.244.31 - - [27/Jan/2009:01:36:52 -0800] "GET /pubs/507.html HTTP/1.0" 200 13157
128.32.244.31 - - [27/Jan/2009:01:36:52 -0800] "GET /pubs/487.html HTTP/1.0" 200 13434
128.32.244.31 - - [27/Jan/2009:01:36:53 -0800] "GET /pubs/489.html HTTP/1.0" 200 13543
128.32.244.31 - - [27/Jan/2009:01:36:53 -0800] "GET /pubs/405.html HTTP/1.0" 200 13351
128.32.244.31 - - [27/Jan/2009:01:36:54 -0800] "GET /pubs/336.html HTTP/1.0" 200 13887
128.32.244.31 - - [27/Jan/2009:01:36:54 -0800] "GET /pubs/504.html HTTP/1.0" 200 12596
128.32.244.31 - - [27/Jan/2009:01:36:54 -0800] "GET /pubs/506.html HTTP/1.0" 200 14112
72.30.65.44 - - [27/Jan/2009:08:49:45 -0800] "GET /projects/ITR/proposal/index.htm HTTP/1.0" 404 
91
66.249.71.219 - - [27/Jan/2009:08:50:01 -0800] "GET /people/directory/sameerp/memberships/ HTTP/1
.1" 200 10806
Trust was also getting hit:
28.32.244.34 - - [27/Jan/2009:01:33:49 -0800] "GET /superb/wiki/Category/DocumentationToDo?action=diff&source=n&minor=y HTTP/1.0" 200 9817
128.32.244.34 - - [27/Jan/2009:01:33:50 -0800] "GET /superb/wiki/Category/DocumentationToDo?action=diff&source=y&minor=n HTTP/1.0" 200 9817
128.32.244.34 - - [27/Jan/2009:01:33:50 -0800] "GET /superb/wiki/Category/WikiFarms?action=diff&source=n&minor=y HTTP/1.0" 200 9734
128.32.244.34 - - [27/Jan/2009:01:33:51 -0800] "GET /superb/wiki/Category/WikiFarms?action=diff&s
ource=y&minor=n HTTP/1.0" 200 9734
65.55.209.206 - - [27/Jan/2009:01:33:59 -0800] "GET /people/directory/tinghe/memberships/ HTTP/1.
1" 200 6642
66.249.65.44 - - [27/Jan/2009:01:34:24 -0800] "GET /trustj/displayEvent.do?eventId=591&year=2009&
month=3&day=19&gsrc.location.group=trust HTTP/1.1" 500 4069
65.55.217.57 - - [27/Jan/2009:01:34:52 -0800] "GET /robots.txt HTTP/1.1" 200 1254
65.55.217.57 - - [27/Jan/2009:01:34:52 -0800] "GET /images/headshots/dkm.jpg HTTP/1.1" 200 9957
65.55.209.206 - - [27/Jan/2009:01:35:07 -0800] "GET /people/directory/acarlin HTTP/1.1" 200 6426
65.55.209.206 - - [27/Jan/2009:01:35:09 -0800] "GET /people/directory/lioy HTTP/1.1" 200 6516
66.249.65.44 - - [27/Jan/2009:01:36:25 -0800] "GET /people/directory/DAAgarwal/memberships/ HTTP/
1.1" 200 6442
88.73.50.121 - - [27/Jan/2009:01:39:03 -0800] "GET /robots.txt HTTP/1.1" 302 218
88.73.50.121 - - [27/Jan/2009:01:39:15 -0800] "GET / HTTP/1.1" 302 208
74.6.22.102 - - [27/Jan/2009:01:41:09 -0800] "GET /people/directory/dkm HTTP/1.0" 302 228
66.249.65.44 - - [27/Jan/2009:08:49:47 -0800] "GET /trustj/gsrcCalendar.do?calendarName=trustCale
ndar&gsrc.location.group=trust HTTP/1.1" 503 323
66.249.65.44 - - [27/Jan/2009:08:49:52 -0800] "GET /trustlocal/faq/ HTTP/1.1" 302 -

Embedded was getting some traffic, but not that much
67.195.37.106 - - [27/Jan/2009:01:36:34 -0800] "GET /Alumni/pinhong/scriptEDA/perlgnd-0.35.tar.gz
 HTTP/1.0" 200 258310
65.55.211.168 - - [27/Jan/2009:01:38:30 -0800] "GET /robots.txt HTTP/1.1" 302 228
88.73.50.121 - - [27/Jan/2009:01:38:44 -0800] "GET /centers.htm HTTP/1.1" 302 229
218.248.67.189 - - [27/Jan/2009:01:38:56 -0800] "GET /Respep/Research/hsc/abstract.html HTTP/1.1"
 302 251
65.55.211.168 - - [27/Jan/2009:01:39:32 -0800] "GET /~rupak/blast/ HTTP/1.1" 302 231
88.73.50.121 - - [27/Jan/2009:01:40:33 -0800] "GET /pubs/index.htm HTTP/1.1" 302 232
88.73.50.121 - - [27/Jan/2009:01:41:06 -0800] "GET /history.htm HTTP/1.1" 302 229
66.235.124.40 - - [27/Jan/2009:01:41:42 -0800] "GET /research/hsc/class.F98/ee249/handouts/lec03/
sld107.htm HTTP/1.1" 302 272
219.142.53.19 - - [27/Jan/2009:01:41:53 -0800] "GET /robots.txt HTTP/1.0" 302 228
132.165.76.2 - - [27/Jan/2009:01:42:05 -0800] "GET /~pchong/sis.html HTTP/1.0" 302 234
74.6.22.160 - - [27/Jan/2009:01:42:22 -0800] "GET /%7enewton/Presentations/GreatWorks_Haas10_04_f
iles/frame.htm HTTP/1.0" 302 276
84.100.192.65 - - [27/Jan/2009:08:49:55 -0800] "GET /favicon.ico HTTP/1.1" 302 229
72.30.81.177 - - [27/Jan/2009:08:50:04 -0800] "GET /Respep/Research/dds/Hyper-RNweb/1378.html HTT
P/1.0" 200 5618
Going back to the top output, we can see that the number or processes increased:
    129 top.2009-01-27.01:33:47
     129 top.2009-01-27.01:34:18
     130 top.2009-01-27.01:34:47
     136 top.2009-01-27.01:35:17
     138 top.2009-01-27.01:35:46
     133 top.2009-01-27.01:36:15
     138 top.2009-01-27.01:36:43
     140 top.2009-01-27.01:37:11
     151 top.2009-01-27.01:37:38
     165 top.2009-01-27.01:38:06
     186 top.2009-01-27.01:38:34
     203 top.2009-01-27.01:39:01
     221 top.2009-01-27.01:39:39
     235 top.2009-01-27.01:40:07
     245 top.2009-01-27.01:40:34
     261 top.2009-01-27.01:41:02
     280 top.2009-01-27.01:41:30
     300 top.2009-01-27.01:41:57
     319 top.2009-01-27.01:42:25
     327 top.2009-01-27.01:42:53
     339 top.2009-01-27.01:44:03
Comparing 41:30 and 42:53, we see that the new processes are all httpd processes.

apache "fork: Unable to fork new process" errors on solaris discusses various causes including memory leaks in Apache.

One idea is to change MaxRequestPerChild: /usr/local/apache/conf/extras/httpd-mpm.conf

    #MaxRequestsPerChild 0                                                                   
    # Ptolemy: changed MaxRequestsPerChild from 0-10000 1/27/09 cxh                               
    MaxRequestsPerChild   10000
See also Apache Process taking up 3 GB of ram: msg#00058.

See Apache Performance Tuning.