Accessing http://ptolemy.eecs.berkeley.edu/~cxh and http://ptolemy.eecs.berkeley.edu/~eal is slow.

If I use Firefox's Tools -> Web Developer -> Toggle Tools, then I can see that the GET of /~cxh/ is taking 43 seconds. Clicking on "Show Request Details" shows that the bulk of the request time is "waiting".

The same is true when running on a machine on campus.

Looking at the logs, if the trailing slash redirect from http://ptolemy.eecs.berkeley.edu/~cxh to http://ptolemy.eecs.berkeley.edu/~cxh/ typically takes a second.

180.76.15.149 - - [25/Feb/2016:21:35:25 -0800] "GET /~cxh HTTP/1.1" 301 338
180.76.15.156 - - [25/Feb/2016:21:35:26 -0800] "GET /~cxh/ HTTP/1.1" 200 26395
However, around February 26, the delays of 43 seconds started to occur:
50.188.21.247 - - [26/Feb/2016:20:38:30 -0800] "GET /~cxh HTTP/1.1" 301 338
50.188.21.247 - - [26/Feb/2016:20:39:13 -0800] "GET /~cxh/ HTTP/1.1" 200 26395

Not much on the machine changed, though in the logs on Feb 24, messages about key ".htaccess" not found started appearing:

Feb 24 15:38:05 moog automount[9038]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Feb 24 15:39:08 moog automount[9038]: umount_autofs_indirect: ask umount returned busy /home/eecs
Feb 24 15:39:11 moog automount[26694]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Feb 24 15:40:26 moog automount[26694]: key ".htaccess" not found in map source(s).
Feb 24 15:41:27 moog automount[26694]: key ".htaccess" not found in map source(s).
Feb 24 15:55:51 moog automount[26694]: key ".htaccess" not found in map source(s).

/usr/local/apache/conf/extra/httpd-vhosts.conf looks like:

# Settings for user home directories  
#            
# Required module: mod_userdir  

#   
# UserDir: The name of the directory that is appended onto a user's home   
# directory if a ~user request is received.  Note that you must also set  
# the default access control for these directories, as in the example below.
#                                                                                                                        
UserDir public_html

#  
# Control access to UserDir directories.  The following is an example  
# for a site where these directories are restricted to read-only. 
#  
<Directory "/home/*/public_html">
    AllowOverride FileInfo AuthConfig Limit Indexes
    Options MultiViews Indexes SymLinksIfOwnerMatch IncludesNoExec
    <Limit GET POST OPTIONS>
        Order allow,deny
        Allow from all
    </Limit>
    <LimitExcept GET POST OPTIONS>
        Order deny,allow
        Deny from all
    </LimitExcept>
</Directory>
# Ptolemy_Localization end  
Watching the network traffic from moog to home10 shows that moog sends a packet and then sends another one 43 seconds later
[root@moog etc]# tcpdump -v -u host home10
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:18:16.026684 IP (tos 0x0, ttl 64, id 11414, offset 0, flags [DF], proto TCP (6), length 228)
    moog.eecs.berkeley.edu.1045667650 > naphome.EECS.Berkeley.EDU.nfs: 172 access fh[1:80000438:0:3d0:242a8ec1:80000438:0:3d0:242a8ec1:40:0:125:0] 001f
13:18:16.026969 IP (tos 0x0, ttl 61, id 32763, offset 0, flags [DF], proto TCP (6), length 176)
    naphome.EECS.Berkeley.EDU.nfs > moog.eecs.berkeley.edu.1045667650: reply ok 120 access attr: DIR 755 ids 6269/244 sz 16384 c 0003
13:18:16.027031 IP (tos 0x0, ttl 64, id 11415, offset 0, flags [DF], proto TCP (6), length 224)
    moog.eecs.berkeley.edu.1062444866 > naphome.EECS.Berkeley.EDU.nfs: 168 getattr fh[1:80000438:0:7bdbbc:2e01796a:80000438:0:3d0:242a8ec1:40:0:125:0]
13:18:16.027257 IP (tos 0x0, ttl 61, id 33019, offset 0, flags [DF], proto TCP (6), length 168)
    naphome.EECS.Berkeley.EDU.nfs > moog.eecs.berkeley.edu.1062444866: reply ok 112 getattr DIR 755 ids 6269/244 sz 12288
13:18:16.066805 IP (tos 0x0, ttl 64, id 11416, offset 0, flags [DF], proto TCP (6), length 52)
    moog.eecs.berkeley.edu.863 > naphome.EECS.Berkeley.EDU.nfs: Flags [.], cksum 0x5283 (correct), ack 931053800, win 5360, options [nop,nop,TS val 772890630 ecr 3250707893], length 0
13:18:59.074712 IP (tos 0x0, ttl 64, id 11417, offset 0, flags [DF], proto TCP (6), length 228)
    moog.eecs.berkeley.edu.1079222082 > naphome.EECS.Berkeley.EDU.nfs: 172 access fh[1:80000438:0:7bdbbc:2e01796a:80000438:0:3d0:242a8ec1:40:0:125:0] 001f
13:18:59.074930 IP (tos 0x0, ttl 61, id 58036, offset 0, flags [DF], proto TCP (6), length 176)
    naphome.EECS.Berkeley.EDU.nfs > moog.eecs.berkeley.edu.1079222082: reply ok 120 access attr: DIR 755 ids 6269/244 sz 12288 c 0003
13:18:59.074956 IP (tos 0x0, ttl 64, id 11418, offset 0, flags [DF], proto TCP (6), length 52)
    moog.eecs.berkeley.edu.863 > naphome.EECS.Berkeley.EDU.nfs: Flags [.], cksum 0x012e (correct), ack 125, win 5360, options [nop,nop,TS val 772933638 ecr 3250750941], length 0
If I comment out:
#    <Limit GET POST OPTIONS>
#        Order allow,deny
#        Allow from all
#    </Limit>
#    <LimitExcept GET POST OPTIONS>
#        Order deny,allow
#        Deny from all
#    </LimitExcept>
and do /etc/init.d/apachectl graceful, then the time is much improved, but the problem still occurs. The network traffic is different:
[root@moog etc]# tcpdump -v -u host home10
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
13:22:09.618089 IP (tos 0x0, ttl 64, id 11554, offset 0, flags [DF], proto TCP (6), length 224)
    moog.eecs.berkeley.edu.2522062658 > naphome.EECS.Berkeley.EDU.nfs: 168 getattr fh[1:80000438:0:9f9fd6:2e01797d:80000438:0:3d0:242a8ec1:40:0:125:0]
13:22:09.618336 IP (tos 0x0, ttl 61, id 53910, offset 0, flags [DF], proto TCP (6), length 168)
    naphome.EECS.Berkeley.EDU.nfs > moog.eecs.berkeley.edu.2522062658: reply ok 112 getattr REG 755 ids 6269/30 sz 24535
13:22:09.618360 IP (tos 0x0, ttl 64, id 11555, offset 0, flags [DF], proto TCP (6), length 52)
    moog.eecs.berkeley.edu.863 > naphome.EECS.Berkeley.EDU.nfs: Flags [.], cksum 0x7691 (correct), ack 931348148, win 5881, options [nop,nop,TS val 773124181 ecr 3250941487], length 0
13:22:09.618683 IP (tos 0x0, ttl 64, id 11556, offset 0, flags [DF], proto TCP (6), length 224)
    moog.eecs.berkeley.edu.2538839874 > naphome.EECS.Berkeley.EDU.nfs: 168 getattr fh[1:80000438:0:dadd7d:2e01797e:80000438:0:3d0:242a8ec1:40:0:125:0]
13:22:09.618895 IP (tos 0x0, ttl 61, id 13891, offset 0, flags [DF], proto TCP (6), length 168)
    naphome.EECS.Berkeley.EDU.nfs > moog.eecs.berkeley.edu.2538839874: reply ok 112 getattr REG 644 ids 6269/30 sz 1738
13:22:09.638880 IP (tos 0x0, ttl 64, id 11557, offset 0, flags [DF], proto TCP (6), length 224)
    moog.eecs.berkeley.edu.2555617090 > naphome.EECS.Berkeley.EDU.nfs: 168 getattr fh[1:80000438:0:dadd7f:2e01797e:80000438:0:3d0:242a8ec1:40:0:125:0]
13:22:09.639132 IP (tos 0x0, ttl 61, id 6212, offset 0, flags [DF], proto TCP (6), length 168)
    naphome.EECS.Berkeley.EDU.nfs > moog.eecs.berkeley.edu.2555617090: reply ok 112 getattr REG 644 ids 6269/30 sz 312
13:22:09.678787 IP (tos 0x0, ttl 64, id 11558, offset 0, flags [DF], proto TCP (6), length 52)
    moog.eecs.berkeley.edu.863 > naphome.EECS.Berkeley.EDU.nfs: Flags [.], cksum 0x7400 (correct), ack 233, win 5881, options [nop,nop,TS val 773124242 ecr 3250941507], length 0

Robotics

Interestingly, pages that are NFS mounted on robotics seem to be ok.

In addition, if rewrite-robotics.conf is updated so that ~eal and ~cxh are available:

RewriteRule ^/~cxh$ http://robotics.eecs.berkeley.edu/~cxh/ [L]
RewriteRule ^/~cxh/$ /home/eecs/cxh/public_html/index.html [L]
RewriteRule ^/~cxh/(.*) /home/eecs/cxh/public_html/$1 [L]

RewriteRule ^/~eal$ http://robotics.eecs.berkeley.edu/~eal/ [L]
RewriteRule ^/~eal/$ /home/eecs/eal/public_html/index.html [L]
RewriteRule ^/~eal/(.*) /home/eecs/eal/public_html/$1 [L]

then the problem does not occur with
http://robotics.eecs.berkeley.edu/~eal/ or
http://robotics.eecs.berkeley.edu/~cxh/.

One difference between the two sites is that the Ptolemy site has SSI enabled.

Email from Rob

Rob wrote:

Just to provide another data point, I have not noticed anything like this on www.eecs since the switchover to the new NetApps.

It does sound more like an NFS problem than an Apache configuration problem. FYI, there is my automounter configuration for home directories on www.eecs (which is running Red Hat 7).

/home/eecs    /usr/sww/share/etc/automount/maps/auto.home.eecs -soft,intr,bg,nolock

Here is a sample 'mount' output that shows all the options.

home30.EECS.Berkeley.EDU:/home30/robm on /home/eecs/robm type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=169.229.3.163,mountvers=3,mountport=635,mountproto=udp,local_lock=all,addr=169.229.3.163)

I've moved this ticket into our 'fileserver' queue and hope that Pathma will be able to work with you on the NFS issue. He will probably be curious about your NFS configuration so please send that along for a home that is experiencing the problem.

Editing /etc/auto.master and changing

/home/eecs      /etc/auto.home
to
/home/eecs      /etc/auto.home  soft,intr,bg,nolock
and then doing /etc/init.d/autofs reload seemed to help?

Email from Pathma

Thanks for the details, thats very helpful.

How reproducible is this? I tried it just now and it was as you said exactly 43s and then of course it was fast after that.

I have seen sessions hang in the past even with the old netapps, but I am not sure if its a storage issue.

On another note I am noticing that access from moog comes via various different IP address on the netapp. Now I had specified the IPs for each volume, so I am expecting at most 2 IPs for ptolemy, ie naphome and napcluster. However I am also seeing access via napvault and napproj which is non optimal.

I need to check to see how this is happening.