See also server reached MaxClients setting, consider raising the MaxClients setting
We've seen both bennett and andrews crash.
The apache error_log file will say:
Not enough space: fork: Unable to fork new processThe error seems to occur between 1 and 2 in the morning, though not always.
/var/adm/messages
says:
Jan 27 01:51:52 andrews.EECS.Berkeley.EDU genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 493 (cron) Jan 27 01:53:10 andrews.EECS.Berkeley.EDU cron[502]: [ID 401707 auth.error] open_module: /usr/lib/security/pam_authtok_get.so.1 failed: ld.so.1: cron: fatal: /usr/lib/security/pam_authtok_get.so.1: open failed: No such file or directory Jan 27 01:53:10 andrews.EECS.Berkeley.EDU cron[502]: [ID 487707 auth.error] load_modules: can not open module /usr/lib/security/pam_authtok_get.so.1 Jan 27 01:55:47 andrews.EECS.Berkeley.EDU tmpfs: [ID 518458 kern.warning] WARNING: /etc/svc/volatile: File system full, swap space limit exceeded Jan 27 01:55:47 andrews.EECS.Berkeley.EDU last message repeated 1 time Jan 27 02:00:47 andrews.EECS.Berkeley.EDU tmpfs: [ID 518458 kern.warning] WARNING: /etc/svc/volatile: File system full, swap space limit exceeded
/usr/local/bin/andrewstatus
, which runs top
and saves
output into /export/home1/tmp/logs
We can see that at one point, we ran out of swap,
though we had memory.
op.2009-01-27.01:33:47:Memory: 2048M real, 941M free, 617M swap in use, 1224M swap free top.2009-01-27.01:34:18:Memory: 2048M real, 958M free, 599M swap in use, 1242M swap free top.2009-01-27.01:34:47:Memory: 2048M real, 894M free, 593M swap in use, 1247M swap free top.2009-01-27.01:35:17:Memory: 2048M real, 879M free, 611M swap in use, 1230M swap free top.2009-01-27.01:35:46:Memory: 2048M real, 879M free, 625M swap in use, 1215M swap free top.2009-01-27.01:36:15:Memory: 2048M real, 914M free, 617M swap in use, 1224M swap free top.2009-01-27.01:36:43:Memory: 2048M real, 904M free, 636M swap in use, 1204M swap free top.2009-01-27.01:37:11:Memory: 2048M real, 1080M free, 648M swap in use, 1193M swap free top.2009-01-27.01:37:38:Memory: 2048M real, 1074M free, 720M swap in use, 1120M swap free top.2009-01-27.01:38:06:Memory: 2048M real, 1067M free, 796M swap in use, 1044M swap free top.2009-01-27.01:38:34:Memory: 2048M real, 1055M free, 934M swap in use, 905M swap free top.2009-01-27.01:39:01:Memory: 2048M real, 1047M free, 1029M swap in use, 809M swap free top.2009-01-27.01:39:39:Memory: 2048M real, 1037M free, 1147M swap in use, 690M swap free top.2009-01-27.01:40:07:Memory: 2048M real, 1026M free, 1223M swap in use, 610M swap free top.2009-01-27.01:40:34:Memory: 2048M real, 1020M free, 1289M swap in use, 544M swap free top.2009-01-27.01:41:02:Memory: 2048M real, 1011M free, 1372M swap in use, 460M swap free top.2009-01-27.01:41:30:Memory: 2048M real, 1000M free, 1497M swap in use, 333M swap free top.2009-01-27.01:41:57:Memory: 2048M real, 987M free, 1629M swap in use, 199M swap free top.2009-01-27.01:42:25:Memory: 2048M real, 975M free, 1737M swap in use, 88M swap free top.2009-01-27.01:42:53:Memory: 2048M real, 970M free, 1790M swap in use, 35M swap free top.2009-01-27.01:44:03:Memory: 2048M real, 999M free, 1853M swap in use, 7376K swap freeWe are also running the
df
command in
/tmp
, which indicates that /tmp
ran out of space:
df.2009-01-27.01:39:01:swap 810256 57456 752800 8% /tmp df.2009-01-27.01:39:39:swap 735880 57456 678424 8% /tmp df.2009-01-27.01:40:07:swap 660448 57456 602992 9% /tmp df.2009-01-27.01:40:34:swap 579216 57456 521760 10% /tmp df.2009-01-27.01:41:02:swap 458632 57456 401176 13% /tmp df.2009-01-27.01:41:30:swap 356144 57456 298688 17% /tmp df.2009-01-27.01:41:57:swap 201344 57456 143888 29% /tmp df.2009-01-27.01:42:25:swap 146392 57456 88936 40% /tmp df.2009-01-27.01:42:53:swap 72672 57456 15216 80% /tmp df.2009-01-27.01:44:03:swap 57456 57456 0 100% /tmpHowever, du was also running, which indicated that
/tmp
did not fill up, that is,
the number of files in /tmp
did not increase.
du
output did not increase:
628 du.2009-01-27.01:38:34 628 du.2009-01-27.01:39:01 628 du.2009-01-27.01:39:39 628 du.2009-01-27.01:40:07 628 du.2009-01-27.01:40:34 628 du.2009-01-27.01:41:02 628 du.2009-01-27.01:41:30 628 du.2009-01-27.01:41:57 628 du.2009-01-27.01:42:25 628 du.2009-01-27.01:42:53 628 du.2009-01-27.01:44:03The apache log for chess shows that we were getting 3 hits per second from the search engine:
128.32.244.31 - - [27/Jan/2009:01:36:52 -0800] "GET /pubs/507.html HTTP/1.0" 200 13157 128.32.244.31 - - [27/Jan/2009:01:36:52 -0800] "GET /pubs/487.html HTTP/1.0" 200 13434 128.32.244.31 - - [27/Jan/2009:01:36:53 -0800] "GET /pubs/489.html HTTP/1.0" 200 13543 128.32.244.31 - - [27/Jan/2009:01:36:53 -0800] "GET /pubs/405.html HTTP/1.0" 200 13351 128.32.244.31 - - [27/Jan/2009:01:36:54 -0800] "GET /pubs/336.html HTTP/1.0" 200 13887 128.32.244.31 - - [27/Jan/2009:01:36:54 -0800] "GET /pubs/504.html HTTP/1.0" 200 12596 128.32.244.31 - - [27/Jan/2009:01:36:54 -0800] "GET /pubs/506.html HTTP/1.0" 200 14112 72.30.65.44 - - [27/Jan/2009:08:49:45 -0800] "GET /projects/ITR/proposal/index.htm HTTP/1.0" 404 91 66.249.71.219 - - [27/Jan/2009:08:50:01 -0800] "GET /people/directory/sameerp/memberships/ HTTP/1 .1" 200 10806Trust was also getting hit:
28.32.244.34 - - [27/Jan/2009:01:33:49 -0800] "GET /superb/wiki/Category/DocumentationToDo?action=diff&source=n&minor=y HTTP/1.0" 200 9817 128.32.244.34 - - [27/Jan/2009:01:33:50 -0800] "GET /superb/wiki/Category/DocumentationToDo?action=diff&source=y&minor=n HTTP/1.0" 200 9817 128.32.244.34 - - [27/Jan/2009:01:33:50 -0800] "GET /superb/wiki/Category/WikiFarms?action=diff&source=n&minor=y HTTP/1.0" 200 9734 128.32.244.34 - - [27/Jan/2009:01:33:51 -0800] "GET /superb/wiki/Category/WikiFarms?action=diff&s ource=y&minor=n HTTP/1.0" 200 9734 65.55.209.206 - - [27/Jan/2009:01:33:59 -0800] "GET /people/directory/tinghe/memberships/ HTTP/1. 1" 200 6642 66.249.65.44 - - [27/Jan/2009:01:34:24 -0800] "GET /trustj/displayEvent.do?eventId=591&year=2009& month=3&day=19&gsrc.location.group=trust HTTP/1.1" 500 4069 65.55.217.57 - - [27/Jan/2009:01:34:52 -0800] "GET /robots.txt HTTP/1.1" 200 1254 65.55.217.57 - - [27/Jan/2009:01:34:52 -0800] "GET /images/headshots/dkm.jpg HTTP/1.1" 200 9957 65.55.209.206 - - [27/Jan/2009:01:35:07 -0800] "GET /people/directory/acarlin HTTP/1.1" 200 6426 65.55.209.206 - - [27/Jan/2009:01:35:09 -0800] "GET /people/directory/lioy HTTP/1.1" 200 6516 66.249.65.44 - - [27/Jan/2009:01:36:25 -0800] "GET /people/directory/DAAgarwal/memberships/ HTTP/ 1.1" 200 6442 88.73.50.121 - - [27/Jan/2009:01:39:03 -0800] "GET /robots.txt HTTP/1.1" 302 218 88.73.50.121 - - [27/Jan/2009:01:39:15 -0800] "GET / HTTP/1.1" 302 208 74.6.22.102 - - [27/Jan/2009:01:41:09 -0800] "GET /people/directory/dkm HTTP/1.0" 302 228 66.249.65.44 - - [27/Jan/2009:08:49:47 -0800] "GET /trustj/gsrcCalendar.do?calendarName=trustCale ndar&gsrc.location.group=trust HTTP/1.1" 503 323 66.249.65.44 - - [27/Jan/2009:08:49:52 -0800] "GET /trustlocal/faq/ HTTP/1.1" 302 -Embedded was getting some traffic, but not that much
67.195.37.106 - - [27/Jan/2009:01:36:34 -0800] "GET /Alumni/pinhong/scriptEDA/perlgnd-0.35.tar.gz HTTP/1.0" 200 258310 65.55.211.168 - - [27/Jan/2009:01:38:30 -0800] "GET /robots.txt HTTP/1.1" 302 228 88.73.50.121 - - [27/Jan/2009:01:38:44 -0800] "GET /centers.htm HTTP/1.1" 302 229 218.248.67.189 - - [27/Jan/2009:01:38:56 -0800] "GET /Respep/Research/hsc/abstract.html HTTP/1.1" 302 251 65.55.211.168 - - [27/Jan/2009:01:39:32 -0800] "GET /~rupak/blast/ HTTP/1.1" 302 231 88.73.50.121 - - [27/Jan/2009:01:40:33 -0800] "GET /pubs/index.htm HTTP/1.1" 302 232 88.73.50.121 - - [27/Jan/2009:01:41:06 -0800] "GET /history.htm HTTP/1.1" 302 229 66.235.124.40 - - [27/Jan/2009:01:41:42 -0800] "GET /research/hsc/class.F98/ee249/handouts/lec03/ sld107.htm HTTP/1.1" 302 272 219.142.53.19 - - [27/Jan/2009:01:41:53 -0800] "GET /robots.txt HTTP/1.0" 302 228 132.165.76.2 - - [27/Jan/2009:01:42:05 -0800] "GET /~pchong/sis.html HTTP/1.0" 302 234 74.6.22.160 - - [27/Jan/2009:01:42:22 -0800] "GET /%7enewton/Presentations/GreatWorks_Haas10_04_f iles/frame.htm HTTP/1.0" 302 276 84.100.192.65 - - [27/Jan/2009:08:49:55 -0800] "GET /favicon.ico HTTP/1.1" 302 229 72.30.81.177 - - [27/Jan/2009:08:50:04 -0800] "GET /Respep/Research/dds/Hyper-RNweb/1378.html HTT P/1.0" 200 5618Going back to the top output, we can see that the number or processes increased:
129 top.2009-01-27.01:33:47 129 top.2009-01-27.01:34:18 130 top.2009-01-27.01:34:47 136 top.2009-01-27.01:35:17 138 top.2009-01-27.01:35:46 133 top.2009-01-27.01:36:15 138 top.2009-01-27.01:36:43 140 top.2009-01-27.01:37:11 151 top.2009-01-27.01:37:38 165 top.2009-01-27.01:38:06 186 top.2009-01-27.01:38:34 203 top.2009-01-27.01:39:01 221 top.2009-01-27.01:39:39 235 top.2009-01-27.01:40:07 245 top.2009-01-27.01:40:34 261 top.2009-01-27.01:41:02 280 top.2009-01-27.01:41:30 300 top.2009-01-27.01:41:57 319 top.2009-01-27.01:42:25 327 top.2009-01-27.01:42:53 339 top.2009-01-27.01:44:03Comparing 41:30 and 42:53, we see that the new processes are all
httpd
processes.
apache "fork: Unable to fork new process" errors on solaris discusses various causes including memory leaks in Apache.
One idea is to change MaxRequestPerChild
:
/usr/local/apache/conf/extras/httpd-mpm.conf
#MaxRequestsPerChild 0 # Ptolemy: changed MaxRequestsPerChild from 0-10000 1/27/09 cxh MaxRequestsPerChild 10000See also Apache Process taking up 3 GB of ram: msg#00058.