GNU Wget is a computer program that retrieves content from web servers. Its name is derived from World Wide Web and get, which reflects its primary function. Wget supports downloading via HTTP, HTTPS and FTP, the most popular TCP/IP-based protocols used for web browsing.
Wget's features include recursive downloading, conversion of links in downloaded HTML for offline viewing, and much more. Written in portable C, Wget can be easily installed on any Unix-like system and has been ported to many environments, including Microsoft Windows, Mac OS, AmigaOS and OpenVMS. It appeared in 1996, coinciding with the growing popularity of the Web, which led to wide use among Unix users and distribution with most major Linux distributions. Wget is free software and has been used as the basis for graphical programs such as Gwget for the GNOME desktop.
FEATURES OF WGET
- Portability: GNU Wget is written in a highly portable style of C with minimal dependence on third-party libraries; little more than a C compiler and a BSD-like interface to TCP/IP networking is required to build it. It is designed as a Unix program but can be ported to numerous Unix-like environments and other systems, such as Microsoft Windows via Cygwin, and Mac OS X.
- Robustness: Wget has been designed for robustness over unstable network connections. If a download does not complete for some reason, Wget automatically tries to continue the download from where it left off, and repeats this until the whole file has been retrieved.
- Recursive Download: Wget can also work like a web crawler: it extracts the resources linked from HTML pages and downloads them in sequence, repeating the process recursively until all the pages have been downloaded or a maximum recursion depth has been reached. The downloaded pages are saved in a directory structure resembling the one on the remote server. Recursive downloading allows partial or complete mirroring of web sites via HTTP. Links in the downloaded HTML pages can be adjusted so that they point to the locally downloaded content for offline viewing. When performing this kind of automatic mirroring, Wget honours the Robots Exclusion Standard (unless the option -e robots=off is provided). Recursive download works with FTP as well: Wget issues the LIST command to find which additional files to download, and repeats this process for the directories and files under the one specified in the top URL. Shell-like wildcards are supported when downloading FTP URLs.
When downloading recursively over HTTP or FTP, Wget can be instructed to compare the timestamps of remote and local files and to download only the remote files that are newer than the corresponding local ones. This makes mirroring of HTTP and FTP sites easy, but it is considered inefficient and more error-prone than programs designed specifically for mirroring. On the other hand, no special server-side software is required for this task.
- Non-interactiveness: Wget is non-interactive in the sense that, once started, it does not require user interaction and does not need to control a TTY; it can log its progress to a separate file for later inspection. Users can therefore start Wget and log off, leaving the program unattended. In contrast, most graphical or text-based web browsers require the user to remain logged in and to restart failed downloads manually, which can be a hindrance when transferring a lot of data.
- Some other features of Wget:
- Wget supports downloading through proxies, which are deployed to provide web access inside company firewalls and to cache and quickly deliver frequently accessed content.
- Persistent HTTP connections are used where available.
- IPv6 is supported on systems that include the appropriate interfaces.
- SSL/TLS is supported for encrypted downloads using the OpenSSL library.
- Files larger than 2 GiB are supported on 32-bit systems that include the appropriate interfaces.
- Download speed can be throttled to avoid using up all of the available bandwidth.
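The recursive download behaviour described in the feature list above (follow links from page to page until everything reachable has been fetched or a maximum depth is reached) can be illustrated with a small sketch. This is not Wget's actual code: the SITE dictionary, the example.test URLs and the crawl function are hypothetical stand-ins, and a real crawler would fetch and parse HTML rather than look links up in a table. The sketch visits pages breadth-first and stops following links once the depth limit is hit.

```python
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found on that page.
# This table replaces real network fetching and HTML parsing.
SITE = {
    "http://example.test/": ["http://example.test/a.html", "http://example.test/b.html"],
    "http://example.test/a.html": ["http://example.test/c.html", "http://example.test/"],
    "http://example.test/b.html": [],
    "http://example.test/c.html": ["http://example.test/d.html"],
    "http://example.test/d.html": [],
}


def crawl(site, start, max_depth):
    """Return URLs in breadth-first order, at most max_depth links away from start."""
    seen = {start}                 # URLs already queued, so no page is visited twice
    order = []                     # visit order, standing in for "downloaded" pages
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue               # depth limit reached: do not follow this page's links
        for link in site.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


first_level = crawl(SITE, "http://example.test/", 1)
```

With a depth of 1, first_level holds the start page plus the two pages it links to directly; raising the depth pulls in c.html and then d.html. The seen set is what keeps the link from a.html back to the start page from causing an endless loop.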
The most typical use of GNU Wget is to invoke it from the command line, providing one or more URLs as arguments.
- To download the front page of test.com to a file named index.html: wget http://www.test.com/
- To download Wget's source code from the GNU FTP site: wget ftp://ftp.gnu.org/pub/gnu/wget/wget-latest.tar.gz
- To download only *.mid files from a website: wget -e robots=off -r -l2 --no-parent -A.mid http://www.jespero.com/dir/goto
- To download the front page of xyz.com along with the images and style sheets needed to display it, and convert its links to refer to the locally available content: wget -p -k http://www.xyz.com/
- To download the full contents of abc.com: wget -r -l 0 http://www.abc.com/
- To read the list of URLs from a file: wget -i file
- To create a mirror image of a website with only one try per document, saving the log to gnulog: wget -r -t 1 http://www.mit.edu/ -o gnulog
- To retrieve only the first level of links from msn.com: wget -r -l1 http://www.msn.com/
- To retrieve the index page of www.jocks.com and show the original server headers: wget -S http://www.jocks.com/
- To save the server headers with the file: wget --save-headers http://www.jocks.com/
- To retrieve the first three levels of ntsu.edu and save them to /tmp: wget -r -l3 -P/tmp ftp://ntsu.edu/
- If Wget was interrupted in the middle of a download and the files already downloaded should not be clobbered: wget -nc -r http://www.ntsu.edu/
- To keep a mirror of a page, use --mirror (or -m), which is shorthand for -r -N.
- To have cron refresh the mirror every Sunday at midnight, add this line to the crontab file: 0 0 * * 0 wget --mirror http://www.zuma.org/pub/zumacs/ -o /home/mme/weeklog
- To write the documents to standard output instead of to files: wget -O - http://qwerty.pk/ http://www.qwerty.pk/
- It is also possible to combine the two options and build pipelines to retrieve the documents from a remote hotlist: wget -O - http://jot.list.com/ | wget --force-html -i -
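The mirroring examples above rely on time-stamping: a remote file is fetched again only if it is newer than the local copy. The decision itself can be sketched as follows. This is a simplified illustration rather than Wget's implementation: the function name needs_download is hypothetical, the remote modification time is assumed to be already known as a POSIX timestamp, and only modification times are compared.

```python
import os
import tempfile


def needs_download(local_path, remote_mtime):
    """Return True if the local copy is missing or older than the remote file.

    remote_mtime is a POSIX timestamp in seconds (an assumption of this sketch;
    a real client would derive it from the server's response).
    """
    if not os.path.exists(local_path):
        return True
    return remote_mtime > os.path.getmtime(local_path)


# Demonstration with a temporary file standing in for a local mirror copy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    missing = needs_download(path, 1_000_000_000)        # no local copy yet
    with open(path, "w") as fh:
        fh.write("<html></html>")
    os.utime(path, (1_000_000_000, 1_000_000_000))        # give the copy a known mtime
    newer_remote = needs_download(path, 1_000_000_500)    # remote changed afterwards
    same_age = needs_download(path, 1_000_000_000)        # nothing new on the server
```

Here missing and newer_remote are True while same_age is False, so an unchanged file would be skipped on the next mirroring run.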
AUTHORS AND COPYRIGHTS
GNU Wget was written by Hrvoje Nikšić, with contributions from Dan Harkless, Mauro Tortonesi and Ian Abbott. Significant contributions are credited in the AUTHORS file included in the distribution, and the remaining ones are documented in the changelogs, which are also included with the program. Wget is maintained by Micah Cowan. The copyright to Wget belongs to the Free Software Foundation, whose policy is to require copyright assignments for significant contributions to GNU software.
Wget is the descendant of Geturl, an earlier program by the same author whose development started in late 1995. The name was later changed to Wget. At the time, no single program could download files via both the FTP and HTTP protocols. The existing programs either supported only FTP (such as dl and NcFTP) or were written in Perl. Wget took inspiration from the features of these programs, but its aim was to support both HTTP and FTP and to let users build it using only the standard tools found on every Unix system.
At that time, many Unix users struggled with extremely slow dial-up connections, which led to a growing need for a downloading agent that could deal with transient network failures without assistance from a human operator.
The following releases mark notable milestones in Wget's development. The main features of each release are listed alongside it.
- Geturl 1.0, released in January 1996, was the first publicly available release. The first English-language version was Geturl 1.3.4, released in June.
- Wget 1.4.0, released in December 1996, was the first version to use the name Wget.
- Wget 1.4.3, released in February 1997, was the first version released as part of the GNU project.
- Wget 1.5.3, released in September 1998, was a milestone in the program's recognition. This version was bundled with many Linux distributions.
- Wget 1.6, released in December 1999, incorporated many bug fixes for the 1.5.3 release.
- Wget 1.7, released in June 2001, introduced SSL support, persistent connections and cookies.
- Wget 1.8, released in December 2001, added new progress indicators and introduced breadth-first traversal of the hyperlink graph.
- Wget 1.9, released in October 2003, included experimental IPv6 support and the ability to POST data to HTTP servers.
- Wget 1.10, released in June 2005, introduced large file support, IPv6 support on dual-family systems, SSL improvements and NTLM authorization. Maintainership was taken over by Mauro Tortonesi.
- Wget 1.11, released in January 2008, moved to version 3 of the GNU General Public License and added preliminary support for the Content-Disposition header, which is often used by CGI scripts to indicate the name of a file for downloading. Security-related improvements were also made to the HTTP authentication code.
- Wget 1.12, released in September 2009, added support for parsing URLs from CSS content on the web and for handling Internationalized Resource Identifiers.
DEVELOPMENT AND RELEASE CYCLE
Wget is developed in an open fashion. Design decisions are discussed on the public mailing list, which is followed by users and developers. Patches and bug reports are also relayed to the same list.
GNU Wget is distributed under the terms of the GNU General Public License, version 3 or later, with a special exception that allows distribution of binaries linked against the OpenSSL library. The exception clause is expected to be removed once Wget is modified to link with the GnuTLS library. Wget's documentation, in the form of a Texinfo reference manual, is distributed under the terms of the GNU Free Documentation License, version 1.2 or later. The man page usually distributed on Unix-like systems is automatically generated from a subset of the Texinfo manual and falls under the terms of the same license.