To use wget to mirror a site behind a login, you need to give it the session cookies and also exclude the /logout/ directory (otherwise the crawl logs itself out partway through).

Getting the cookies

wget can load cookies from a cookies.txt file, but there is no particularly easy way to export one from Firefox.

The workaround is to look up the cookie value in Firefox's preferences. The cookie to look for is the session key, in this case e3scenterSessionKey.
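
If clicking through the preferences is tedious, the value can also be read straight out of Firefox's cookie database with sqlite3. This is only a sketch: the profile path and the LIKE pattern are assumptions (a single default profile is assumed to match the glob), and Firefox should be closed so the database is not locked.

sqlite3 ~/.mozilla/firefox/*.default*/cookies.sqlite "SELECT host, path, expiry, name, value FROM moz_cookies WHERE name LIKE '%SessionKey%';"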

Then create a cookies.txt file in the Netscape format, which is tab separated, for example:

.www.e3s-center.org     TRUE    /       FALSE   1603937999      e3scenterSessionKey     be3b829af7acf08418cfcf71fbb8d92d

Note that the 5th field is the expiration time as a Unix timestamp; it should be far enough in the future that the cookie does not expire during the crawl.
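
If you would rather generate that line than type the tabs by hand, something like the following works; this is just a sketch that reuses the example values above and, assuming GNU date, sets the expiry one year out:

printf '%s\tTRUE\t/\tFALSE\t%s\t%s\t%s\n' '.www.e3s-center.org' "$(date -d '+1 year' +%s)" 'e3scenterSessionKey' 'be3b829af7acf08418cfcf71fbb8d92d' > cookies.txt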

Then run the following command on moog:

wget -X '/logout/' --load-cookies cookies.txt -m https://www.e3s-center.org >& e3s.out &
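
The trailing & puts the mirror in the background and >& sends both stdout and stderr to e3s.out, so progress can be followed with:

tail -f e3s.out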

Look in /export/home1/tmp/php.err for lines like:

[28-Aug-2017 16:09:51]  [client 128.32.48.150] pubs.php: download pub html_refuse: , E3S_Apr72011_XZhao&delAlamo.pdf,  only logged in users

In the above, 128.32.48.150 is moog's address.

Here, the problem was that the /logout/ link had been hit, which ended the session, so later downloads that require a login were refused.
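
A quick way to spot these refusals is to filter the error log for the refusal marker and moog's address, e.g.:

grep 'html_refuse' /export/home1/tmp/php.err | grep '128.32.48.150'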

See also Archive Team's recipe for doing a "panic grab" of a website, so that you can later upload it to the Internet Archive for inclusion in its Wayback Machine:

export USER_AGENT="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
export DOMAIN_NAME_TO_SAVE="www.example.com"
export SPECIFIC_HOSTNAMES_TO_INCLUDE="example1.com,example2.com,images.example2.com"
export FILES_AND_PATHS_TO_EXCLUDE="/path/to/ignore"
export WARC_NAME="example.com-20130810-panicgrab"
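
For the site above, the variables might be filled in roughly like this (a sketch; keep the USER_AGENT from above and adjust the hostnames, exclusion, and WARC name as needed):

export DOMAIN_NAME_TO_SAVE="www.e3s-center.org"
export SPECIFIC_HOSTNAMES_TO_INCLUDE="www.e3s-center.org"
export FILES_AND_PATHS_TO_EXCLUDE="/logout/"
export WARC_NAME="e3s-center-$(date +%Y%m%d)-grab"
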
(use this for grabbing single domain names:)

wget -e robots=off --mirror --page-requisites --save-headers --keep-session-cookies --save-cookies "cookies.txt" --wait 2 --waitretry 3 --timeout 60 --tries 3 --span-hosts --domains="$SPECIFIC_HOSTNAMES_TO_INCLUDE" --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" -U "$USER_AGENT" "$DOMAIN_NAME_TO_SAVE"

(use this for grabbing single domain names recursively, and have the spider follow links up to 10 levels deep:)

wget -e robots=off --mirror --page-requisites --save-headers --keep-session-cookies --save-cookies "cookies.txt" --recursive --level=10 --wait 2 --waitretry 3 --timeout 60 --tries 3 --span-hosts --domains="$SPECIFIC_HOSTNAMES_TO_INCLUDE" --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" -U "$USER_AGENT" "$DOMAIN_NAME_TO_SAVE"

(use this for grabbing single domain names recursively, and have the spider follow links up to 20 levels deep:)

wget -e robots=off --mirror --page-requisites --save-headers --keep-session-cookies --save-cookies "cookies.txt" --recursive --level=20 --wait 2 --waitretry 3 --timeout 60 --tries 3 --span-hosts --domains="$SPECIFIC_HOSTNAMES_TO_INCLUDE" --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" -U "$USER_AGENT" "$DOMAIN_NAME_TO_SAVE"

(use this for grabbing single domain names recursively, and have the spider follow links up to 10 levels deep, but EXCLUDE a certain file or path:)

wget -e robots=off --mirror --page-requisites --save-headers --keep-session-cookies --save-cookies "cookies.txt" --recursive --level=10 --wait 2 --waitretry 3 --timeout 60 --tries 3 --span-hosts --domains="$SPECIFIC_HOSTNAMES_TO_INCLUDE" -X "$FILES_AND_PATHS_TO_EXCLUDE" --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" -U "$USER_AGENT" "$DOMAIN_NAME_TO_SAVE"

(use this for grabbing single domain names recursively, and have the spider follow links up to 10 levels deep, but do NOT crawl upwards and grab stuff from the parent directory:)

wget -e robots=off --mirror --page-requisites --save-headers --keep-session-cookies --save-cookies "cookies.txt" --recursive --level=10 --no-parent --wait 2 --waitretry 3 --timeout 60 --tries 3 --span-hosts --domains="$SPECIFIC_HOSTNAMES_TO_INCLUDE" --warc-header "operator: Archive Team" --warc-cdx --warc-file="$WARC_NAME" -U "$USER_AGENT" "$DOMAIN_NAME_TO_SAVE"

Note that all of these commands explicitly ignore the website's robots.txt file (-e robots=off); the ethics of that are left to your discretion.
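
As a sanity check after a grab: wget should have written a compressed $WARC_NAME.warc.gz (plus a CDX index alongside it, because of --warc-cdx). A quick sketch for confirming the archive exists and counting the records that carry a WARC-Target-URI header:

ls -lh "$WARC_NAME".warc.gz
zgrep -c 'WARC-Target-URI' "$WARC_NAME".warc.gz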