By default, Wget is very simple to invoke. The basic syntax is:
wget [option]... [URL]...
Wget will simply download all the URLs specified on the command line. URL is a Uniform Resource Locator, as defined below.
However, you may wish to change some of the default parameters of Wget. You can do so in two ways: permanently, by adding the appropriate command to `.wgetrc' (see section 6. Startup File), or by specifying it on the command line.
URL is an acronym for Uniform Resource Locator. A uniform resource locator is a compact string representation for a resource available via the Internet. Wget recognizes the URL syntax as per RFC1738. This is the most widely used form (square brackets denote optional parts):
http://host[:port]/directory/file
ftp://host[:port]/directory/file
You can also encode your username and password within a URL:
ftp://user:password@host/path
http://user:password@host/path
Either user or password, or both, may be left out. If you leave out either the HTTP username or password, no authentication will be sent. If you leave out the FTP username, `anonymous' will be used. If you leave out the FTP password, your email address will be supplied as a default password.(1)
You can encode unsafe characters in a URL as `%xy', xy
being the hexadecimal representation of the character's ASCII
value. Some common unsafe characters include `%' (quoted as
`%25'), `:' (quoted as `%3A'), and `@' (quoted as
`%40'). Refer to RFC1738 for a comprehensive list of unsafe
characters.
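The `%xy' quoting described above can be sketched as a small shell function. This is only an illustration, not part of Wget: the name urlencode is hypothetical, the function handles ASCII input only, and it conservatively quotes every character outside letters, digits, and `.', `_', `~', `-'.

```shell
# urlencode STRING
# Hypothetical helper (not a Wget feature): prints STRING with every
# character outside [A-Za-z0-9._~-] replaced by %XY, where XY is the
# hexadecimal ASCII value of the character. ASCII input is assumed.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            # printf with a leading quote yields the character's numeric value
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Example: embed a made-up password containing `@' in an FTP URL.
password=$(urlencode 'p@ss')
printf 'ftp://user:%s@host/path\n' "$password"
```

Quoting the password this way keeps its `@' from being mistaken for the separator between the credentials and the host.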
Wget also supports the type feature for FTP URLs. By
default, FTP documents are retrieved in the binary mode (type
`i'), which means that they are downloaded unchanged. Another
useful mode is the `a' (ASCII) mode, which converts the line
delimiters between the different operating systems, and is thus useful
for text files. Here is an example:
ftp://host/directory/file;type=a
Two alternative variants of URL specification are also supported, for historical (hysterical?) reasons and because of their widespread use.
FTP-only syntax (supported by NcFTP):
host:/dir/file
HTTP-only syntax (introduced by Netscape):
host[:port]/dir/file
These two alternative forms are deprecated, and may cease being supported in the future.
If you do not understand the difference between these notations, or do
not know which one to use, just use the plain ordinary format you use
with your favorite browser, like Lynx or Netscape.
Since Wget uses GNU getopts to process its arguments, every option has a short form and a long form. Long options are more convenient to remember, but take time to type. You may freely mix different option styles, or specify options after the command-line arguments. Thus you may write:
wget -r --tries=10 http://fly.cc.fer.hr/ -o log
The space between an option that accepts an argument and the argument itself may be omitted. Instead of `-o log' you can write `-olog'.
You may put several options that do not require arguments together, like:
wget -drc URL
This is a complete equivalent of:
wget -d -r -c URL
Since the options can be specified after the arguments, you may terminate them with `--'. So the following will try to download URL `-x', reporting failure to `log':
wget -o log -- -x
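The option conventions above (bundled short options, an argument attached to its option, and `--' as a terminator) can be tried out with the shell's own getopts builtin. This is only a sketch of the conventions, not Wget's parser: the function name parse_opts and its option string are made up for illustration, and unlike Wget the shell builtin stops at the first non-option argument instead of accepting options after it.

```shell
# parse_opts ARGS...
# Hypothetical demo: prints one line per parsed option, then one line per
# remaining operand. Recognizes -d, -r, -c (no argument) and -o ARG.
parse_opts() {
    OPTIND=1                     # restart parsing on every call
    while getopts 'drco:' opt; do
        case $opt in
            o) printf 'option -o with argument %s\n' "$OPTARG" ;;
            d|r|c) printf 'option -%s\n' "$opt" ;;
            *) return 1 ;;
        esac
    done
    shift $((OPTIND - 1))        # drop the options that were consumed
    for arg in "$@"; do
        printf 'operand %s\n' "$arg"
    done
}

parse_opts -drc URL              # parsed the same way as: -d -r -c URL
parse_opts -olog URL             # parsed the same way as: -o log URL
parse_opts -o log -- -x          # -x after `--' is an operand, not an option
```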
The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the `.wgetrc' settings. For instance, if your `.wgetrc'
sets exclude_directories to `/cgi-bin', the following
example will first reset it, and then set it to exclude `/~nobody'
and `/~somebody'. You can also clear the lists in `.wgetrc'
(See section 6.2 Wgetrc Syntax).
wget -X '' -X /~nobody,/~somebody
However, if you specify `--force-html', the document will be
regarded as `html'. In that case you may have problems with
relative links, which you can solve either by adding
<base href="url"> to the documents or by specifying
`--base=url' on the command line.
<base href="url"> to HTML, or using the `--base' command-line option.
<base href="base_href">. Note that the base in the file will take
precedence over the one on the command-line.
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
If there is a file named `ls-lR.Z' in the current directory, Wget will assume that it is the first portion of the remote file, and will ask the server to continue the retrieval from an offset equal to the length of the local file.
Note that you need not specify this option if all you want is for Wget to continue retrieving where it left off when the connection is lost--Wget does this by default. You need this option only when you want to continue retrieval of a file already partially retrieved, whether saved by another FTP client or left behind when Wget was killed.
Without `-c', the previous example would just begin to download the
remote file to `ls-lR.Z.1'. The `-c' option is also
applicable for HTTP servers that support the Range header.
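The resume offset described above is simply the size of the local file in bytes. The sketch below computes it and prints the kind of Range request header an HTTP client would send to ask for the remainder. The function names resume_offset and range_header are hypothetical, and the exact header Wget emits is not shown in this manual, so treat the output as an illustration of the idea only.

```shell
# resume_offset FILE
# Hypothetical helper: prints the size of FILE in bytes, or 0 if FILE
# does not exist (nothing to resume from).
resume_offset() {
    if [ -f "$1" ]; then
        # wc -c counts bytes; tr strips the padding some wc versions add
        wc -c < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# range_header FILE
# Prints an HTTP Range header requesting everything from the end of the
# local copy onward.
range_header() {
    printf 'Range: bytes=%s-\n' "$(resume_offset "$1")"
}

range_header ls-lR.Z
```

A server that does not support the Range header cannot honor such a request, which is why `-c' over HTTP depends on server support.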
With the default style each dot represents 1K, there are ten dots
in a cluster and 50 dots in a line. The binary style has a more
"computer"-like orientation--8K dots, 16-dot clusters and 48 dots
per line (which makes for 384K lines). The mega style is
suitable for downloading very large files--each dot represents 64K
retrieved, there are eight dots in a cluster, and 48 dots on each line
(so each line contains 3M). The micro style is exactly the
reverse; it is suitable for downloading small files, with 128-byte dots,
8 dots per cluster, and 48 dots (6K) per line.
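The per-line figures quoted above follow from multiplying the dot size by the number of dots per line. The following sketch redoes that arithmetic; the function name dots_line_bytes is made up for illustration, and it assumes that 1K means 1024 bytes.

```shell
# dots_line_bytes STYLE
# Hypothetical helper: prints how many bytes one full line of dots
# represents for the given progress style (assuming 1K = 1024 bytes).
dots_line_bytes() {
    case $1 in
        default) dot=1024  per_line=50 ;;   # 1K dots, 50 per line
        binary)  dot=8192  per_line=48 ;;   # 8K dots, 48 per line
        mega)    dot=65536 per_line=48 ;;   # 64K dots, 48 per line
        micro)   dot=128   per_line=48 ;;   # 128-byte dots, 48 per line
        *) echo "unknown style: $1" >&2; return 1 ;;
    esac
    echo $((dot * per_line))
}

for style in default binary mega micro; do
    printf '%s: %s bytes per line\n' "$style" "$(dots_line_bytes "$style")"
done
```

Dividing the results by 1024 gives 50K, 384K, 3072K (3M), and 6K respectively.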
wget --spider --force-html -i bookmarks.html
This feature needs much more work for Wget to get close to the functionality of real WWW spiders.
Please do not lower the default timeout value with this option unless you know what you are doing.
m suffix, in hours using h
suffix, or in days using d suffix.
Specifying a large value for this option is useful if the network or the destination host is down, so that Wget can wait long enough to reasonably expect the network error to be fixed before the retry.
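To make the m, h, and d suffixes concrete, here is a sketch that converts such a value to seconds. The function name to_seconds is hypothetical, and the sketch assumes whole-number values and that a value without a suffix is already in seconds.

```shell
# to_seconds VALUE
# Hypothetical helper: converts a wait value such as 5m, 2h or 1d to
# seconds. A plain number is assumed to be in seconds already.
to_seconds() {
    case $1 in
        *m) echo $(( ${1%m} * 60 )) ;;      # minutes
        *h) echo $(( ${1%h} * 3600 )) ;;    # hours
        *d) echo $(( ${1%d} * 86400 )) ;;   # days
        *)  echo "$1" ;;
    esac
}

to_seconds 5m
to_seconds 2h
to_seconds 1d
```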
Note that quota will never affect downloading a single file. So if you specify `wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz', all of `ls-lR.gz' will be downloaded. The same goes even when several URLs are specified on the command line. However, quota is respected when retrieving either recursively or from an input file. Thus you may safely type `wget -Q2m -i sites'--the download will be aborted when the quota is exceeded.
Setting quota to 0 or to `inf' unlimits the download quota.
Take, for example, the directory at `ftp://ftp.xemacs.org/pub/xemacs/'. If you retrieve it with `-r', it will be saved locally under `ftp.xemacs.org/pub/xemacs/'. While the `-nH' option can remove the `ftp.xemacs.org/' part, you are still stuck with `pub/xemacs'. This is where `--cut-dirs' comes in handy; it makes Wget not "see" the given number of remote directory components. Here are several examples of how the `--cut-dirs' option works.
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .
--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of `-nd' and `-P'. However, unlike `-nd', `--cut-dirs' does not discard subdirectories--for instance, with `-nH --cut-dirs=1', a `beta/' subdirectory will be placed in `xemacs/beta', as one would expect.
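The mapping in the examples above can be reproduced with a short shell function. It is a sketch that mimics only the cases listed, not Wget's actual implementation; the name local_dir and its argument order are made up for illustration.

```shell
# local_dir HOST REMOTE_DIR NO_HOST CUT
# Hypothetical helper that mimics the examples above.
#   HOST        remote host name, e.g. ftp.xemacs.org
#   REMOTE_DIR  remote directory without leading or trailing slash
#   NO_HOST     1 to mimic -nH, 0 otherwise
#   CUT         number of leading components to drop (--cut-dirs)
local_dir() {
    host=$1 dir=$2 no_host=$3 cut=$4
    # Drop the first CUT components of the remote directory.
    while [ "$cut" -gt 0 ] && [ -n "$dir" ]; do
        case $dir in
            */*) dir=${dir#*/} ;;   # remove the leading component
            *)   dir= ;;            # last component removed
        esac
        cut=$((cut - 1))
    done
    if [ "$no_host" -eq 1 ]; then
        path=$dir
    else
        path=$host${dir:+/$dir}
    fi
    # An empty path means the current directory.
    printf '%s\n' "${path:-.}${path:+/}"
}

local_dir ftp.xemacs.org pub/xemacs 0 0   # no options
local_dir ftp.xemacs.org pub/xemacs 1 1   # -nH --cut-dirs=1
local_dir ftp.xemacs.org pub/xemacs 1 2   # -nH --cut-dirs=2
```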
basic (insecure) or the
digest authentication scheme.
Another way to specify the username and password is in the URL itself (see section 2.1 URL Format). For more information about security issues with Wget, see section 9.2 Security Considerations.
Caching is allowed by default.
Content-Length headers, which makes Wget
go wild, as it thinks not all the document was retrieved. You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.
With this option, Wget will ignore the Content-Length header--as
if it never existed.
You may define more than one additional header by specifying `--header' more than once.
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr' \
     http://fly.cc.fer.hr/
Specification of an empty string as the header value will clear all previous user-defined headers.
basic authentication scheme.
The HTTP protocol allows the clients to identify themselves using a
User-Agent header field. This enables distinguishing the
WWW software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as
`Wget/version', version being the current version
number of Wget.
However, some sites have been known to impose the policy of tailoring
the output according to the User-Agent-supplied information.
While conceptually this is not such a bad idea, it has been abused by
servers denying information to clients other than Mozilla or
Microsoft Internet Explorer. This option allows you to change
the User-Agent line issued by Wget. Use of this option is
discouraged, unless you really know what you are doing.
NOTE that Netscape Communications Corp. has claimed that false
transmissions of `Mozilla' as the User-Agent are a copyright
infringement, which will be prosecuted. DO NOT misrepresent
Wget as Mozilla.
wget ftp://gnjilux.cc.fer.hr/*.msg
By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently.
You may have to quote the URL to protect it from being expanded by
your shell. Globbing makes Wget look for a directory listing, which is
system-specific. This is why it currently works only with Unix FTP
servers (and the ones emulating Unix ls output).
wget -r -nd --delete-after http://whatever.com/~popular/page/
The `-r' option tells Wget to retrieve recursively, and `-nd' tells it not to create directories.
Note that only at the end of the download can Wget know which links have been downloaded. Because of that, much of the work done by `-k' will be performed at the end of the downloads.