An HTTP proxy, or web cache proxy, is a proxy server used primarily to cache (store) web content in the server's memory cache, thereby reducing bandwidth load and providing faster response times for frequently accessed web content. It also functions as a security layer: it isolates the web server by intercepting web requests on the server's behalf. The proxy can also provide content filtering (denying access to certain URLs) as well as access control, by requiring users or principals to authenticate before requesting web content.
Reasons for the increased deployment of proxies on the Internet:-
- Caching - Web content is stored locally, so that frequently accessed content can be served to the browser from the proxy server without requesting the resource from the web server directly.
- Reduction of load on web server resources - Content retrieval is abstracted away from the web server: requests are satisfied from content cached locally in the proxy, and the proxy can load balance by redirecting requests to the appropriate server within the server farm.
- Increased security - The proxy acts as the primary interface between the web server and the public Internet. All incoming traffic and requests first pass through the proxy server, and are then either serviced immediately (if the content is cached) or routed to a web server. The proxy can also provide an authentication or encryption layer, which relieves the web servers of that work and so improves their performance.
- Enhanced browsing experience.
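The request path described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real proxy: the class name `CachingProxy`, the token-based authentication, the round-robin choice of backend and the in-memory dictionary cache are all hypothetical choices made for the example, and the backends are plain functions rather than network servers.

```python
from itertools import cycle
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class CachingProxy:
    """Toy proxy: authenticate, filter, serve from cache, else route to a backend."""

    def __init__(self, backends: Iterable[Callable[[str], str]],
                 blocked_urls: Set[str], valid_tokens: Set[str]) -> None:
        # Hypothetical load-balancing policy: take backends in round-robin order.
        self._backends = cycle(list(backends))
        self._blocked = blocked_urls
        self._tokens = valid_tokens
        self._cache: Dict[str, str] = {}

    def handle(self, url: str, token: Optional[str] = None) -> Tuple[int, str]:
        # Access control: the user must authenticate before requesting content.
        if token not in self._tokens:
            return 407, "proxy authentication required"
        # Content filtering: deny access to certain URLs.
        if url in self._blocked:
            return 403, "blocked by content filter"
        # Cache hit: serviced immediately, the web server is never contacted.
        if url in self._cache:
            return 200, self._cache[url]
        # Cache miss: route to a web server in the farm and remember the answer.
        body = next(self._backends)(url)
        self._cache[url] = body
        return 200, body


def make_backend(name: str) -> Callable[[str], str]:
    """Stand-in for a web server in the farm (assumption for the example)."""
    return lambda url: f"{name} served {url}"


proxy = CachingProxy([make_backend("web1"), make_backend("web2")],
                     blocked_urls={"/banned"}, valid_tokens={"alice-token"})
print(proxy.handle("/index.html"))                 # no token: rejected
print(proxy.handle("/banned", "alice-token"))      # filtered
print(proxy.handle("/index.html", "alice-token"))  # miss: goes to a backend
print(proxy.handle("/index.html", "alice-token"))  # hit: served from the cache
```

The second request for `/index.html` returns the same body without touching a backend, which is the load reduction the list refers to.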
Dynamic Web Content's effect on Proxy performance:-
The proliferation of web content and e-commerce transactions on the Internet has required companies to generate ever more dynamic, database-driven content that is modified in real time. This places an increased performance burden on the network architecture, which must still guarantee good web performance and responsive browsing and transaction capabilities.
In light of the above, and because of the way a proxy server works, dynamically generated web content requires frequent updates to the proxy's cache, that is, more round trips to the web server to refresh cached records. This can defeat the purpose of placing the proxy there at all, because performance degrades across the entire web farm hierarchy (i.e. Database -> Application -> Web).
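A small calculation makes the cost of dynamic content concrete. The sketch below assumes a simple time-to-live (TTL) model, which the text does not specify: a cached copy is reused until it is `ttl_s` seconds old and is then fetched again from the web server. The function name and the request pattern (one request per second to a single URL) are illustrative assumptions.

```python
def origin_fetches(num_requests: int, interval_s: int, ttl_s: int) -> int:
    """Count round trips to the web server for evenly spaced requests to one URL."""
    fetches = 0
    expires_at = 0  # the cache starts empty, so the first request always misses
    for i in range(num_requests):
        now = i * interval_s
        if now >= expires_at:
            # Cached copy is missing or stale: go back to the web server.
            fetches += 1
            expires_at = now + ttl_s
    return fetches


# 600 requests, one per second, for content that stays valid for a minute
# versus content that changes every second.
for label, ttl in (("static-like", 60), ("dynamic", 1)):
    n = origin_fetches(num_requests=600, interval_s=1, ttl_s=ttl)
    print(f"{label}: {n} origin fetches, hit ratio {1 - n / 600:.2%}")
```

Under these assumptions the slowly changing page needs 10 origin fetches for 600 requests, while the page that changes every second needs 600, so the cache absorbs none of the load.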
Operation of Web cache
Web caching is the caching of documents, images or HTML pages in order to reduce bandwidth usage, server load and lag. To reduce latency, the caching is done by a proxy server located near the client. Normally this server sits between the clients and the web server. Frequently accessed documents, images or HTML pages are then fetched from nearby cache servers, which reduces both the response time for user requests and the traffic on the Internet.
Types of web caches
- User agent caches - private caches found in web browsers.
- Proxy caches - deployed by Internet Service Providers to save bandwidth, so that school and corporate users can get content at low latency.
- Gateway caches - work together to implement a CDN (Content Delivery Network). (Wikipedia n.d.)
A web cache stores the most frequently used web pages, documents and images on a nearby server, thus making web access faster. The web cache keeps copies of the content that users request most, so when a user browses, he or she receives most of the data from the nearby cache server rather than directly from the original server. Web caching also takes load off the web servers by reducing the number of incoming requests. (Harsha & Shiva, 2002)
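The idea of keeping the most frequently requested items can be sketched as a small bounded cache. The replacement policy here is an assumption, since the text does not name one: this example uses a simple least-frequently-used rule and only admits a new item if it has been requested more often than the coldest item already stored. `FrequencyCache` and `fetch_from_origin` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict


class FrequencyCache:
    """Bounded cache that tries to hold the most frequently requested URLs."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self._requests: Counter = Counter()  # how often each URL was requested
        self.origin_requests = 0             # requests that reached the web server

    def get(self, url: str) -> bytes:
        self._requests[url] += 1
        if url in self._store:
            return self._store[url]          # served by the nearby cache
        self.origin_requests += 1
        body = self._fetch(url)
        if len(self._store) >= self._capacity:
            coldest = min(self._store, key=lambda u: self._requests[u])
            if self._requests[coldest] >= self._requests[url]:
                return body                  # newcomer is not popular enough to cache
            del self._store[coldest]
        self._store[url] = body
        return body


cache = FrequencyCache(capacity=2, fetch_from_origin=lambda url: url.encode())
for url in ["/a", "/a", "/a", "/b", "/b", "/c", "/a", "/b"]:
    cache.get(url)
print(cache.origin_requests)  # 3 of the 8 requests reached the origin
```

In this run only the first request for each of the three URLs reaches the original server; the rarely requested `/c` never displaces the popular pages.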
Content Distribution Network (CDN)
This service replicates web content on various servers in different locations, each close to the clients' systems. These servers sit at the edge of the Internet to make sure they are close enough to the clients, which is why they are also called edge servers.
Because of this, clients get improved latency, reliability and scalability. A Content Distribution Network (CDN) is a collection of proxies that act as intermediaries between the original server and the end clients. This is a more efficient architecture: it delivers available content to clients quickly while improving application performance and reducing the impact on the network. Content can be stored at intermediate locations between clients and servers. Figure 5 shows caching or replication. In replication, data is kept on different servers in different locations, and content is delivered by the central content server to the entire network.
Functions of Content Distribution Network (CDN) include:
- A redirection and delivery service that directs each request to the cache server that is closest and most available.
- Making sure that replica servers or distribution servers contain up-to-date content. (Bob, 2001)
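The two functions listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for the example: "closest" is approximated by a measured latency in milliseconds, availability is a simple flag, and freshness is modelled with an integer content version pushed from the central server. The names `EdgeServer`, `redirect` and `sync_replicas` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeServer:
    name: str
    latency_ms: float       # stand-in for "distance" to the client
    available: bool = True
    version: int = 0        # version of the content this replica holds


def redirect(servers: List[EdgeServer]) -> Optional[EdgeServer]:
    """Pick the closest edge server among those that are available."""
    candidates = [s for s in servers if s.available]
    return min(candidates, key=lambda s: s.latency_ms, default=None)


def sync_replicas(servers: List[EdgeServer], origin_version: int) -> int:
    """Bring stale replicas up to date; return how many were refreshed."""
    refreshed = 0
    for server in servers:
        if server.version != origin_version:
            server.version = origin_version
            refreshed += 1
    return refreshed


edges = [
    EdgeServer("paris", latency_ms=12.0),
    EdgeServer("london", latency_ms=8.0, available=False),
    EdgeServer("tokyo", latency_ms=140.0, version=3),
]
print(sync_replicas(edges, origin_version=3))  # 2 replicas were stale
print(redirect(edges).name)                    # paris: london is closer but down
```

A real CDN would typically make this decision through DNS or routing rather than an in-process function, so the sketch only shows the selection logic.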
Key difference between CDN and Web cache
One of the main differences between a CDN and a web cache is that CDN servers are placed at the edge of the ISP network, which reduces the distance between the user and the content, so content is received much faster. Both of the content providers can also be end users. In addition, a CDN has multiple servers holding copies of the content to cover for the original server where the content is placed.
Disadvantages in CDN
Operating a CDN is not as easy a task as operating a web cache. CDNs are complex systems that have to connect points that may be far apart geographically. (Raimo, 2003)
Differences between HTTP and FTP
HTTP (Hypertext Transfer Protocol) is a simple protocol and is the primary protocol of the WWW (World Wide Web). When connected to the Internet, the web browser connects to web servers and uses HTTP to request web pages. FTP (File Transfer Protocol) is a little different: it is a way to download files from, and upload files to, a server on the Internet.
HTTP can transfer web pages, graphics and media files on the web. In typical browsing, HTTP traffic flows mostly one way, from the server to the client computer, where text, pictures or other data are viewed in a browser that renders them as a web page. HTTP does, however, also support sending data to the server, for example through form submissions and file uploads. Content viewed this way is not saved as a file on the user's computer unless the user chooses to download it: when the browser is closed the page closes, and only a temporary cached copy may remain. FTP is designed as a two-way system, used to copy or move files from the server to the client computer as well as to upload files from the client to the server. Files copied with FTP are saved on the user's computer. FTP can transmit data in binary mode, sending the bytes of a file unchanged. HTTP uses MIME types to label its content, but it also sends the body bytes as they are, so MIME labelling by itself does not make HTTP transfers larger. The size increase usually attributed to MIME occurs when files are attached to email: binary attachments are commonly Base64 encoded, which makes the file about one third larger.
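The size increase caused by MIME's Base64 encoding of email attachments can be checked with Python's standard `base64` module. The helper name `base64_size` and the 3000-byte sample payload are assumptions made for the example. Base64 maps every 3 bytes of input to 4 output characters, so the encoded form is about 4/3 the size of the original, before any line breaks that a mail system adds.

```python
import base64


def base64_size(raw: bytes) -> int:
    """Return the number of bytes needed to carry `raw` as Base64 text."""
    return len(base64.b64encode(raw))


payload = bytes(3000)  # 3000 zero bytes standing in for a binary attachment
encoded_len = base64_size(payload)
print(len(payload), encoded_len, encoded_len / len(payload))
```

For this payload the encoded size is 4000 bytes, a growth of one third, whereas a binary-mode transfer would send the original 3000 bytes unchanged.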