Architecture of Grid Computing
The basic architecture of grid computing is divided into three layers: the user portal, the service layer (grid services) and the resource layer (physical resources). These layers are connected to each other and work together to provide a reliable service. Each layer is explained below.
1. User Portals
The user portal is the human interface of the grid and acts as the intermediary between the user and the grid middleware. This layer covers all the devices and tools through which we gain general access to the grid and to basic information about it. At this layer, different projects use their tools and experimental devices to tackle large tasks or problems through grid computing. It also includes scientific devices and resources related to scientific computational tasks. Jobs such as controlling activities, monitoring processing and collecting results come under this layer.
2. Service layer
As the name suggests, this layer provides the grid services that the user portals need to perform their tasks or processes. The layer is further divided into two parts: a network part and an information part. The network part covers the sharing of resources, the scheduling of tasks, security, the monitoring of events, and the installation, administration and maintenance of resources. The information part deals with the management of resources and of the available data, such as user and resource management. By user and resource management, we mean ensuring that users make proper use of the resources so that neither resources nor time are wasted.
3. Resource layer
The resource layer contains all the physical resources that an application needs, such as supercomputers, stand-alone computers, clusters and databases. This layer covers computing capacity, which depends on the type and speed of the CPU and its memory size, and the capacity of data centres to share computations with other parts of the grid. It also covers the bandwidth and connectivity of the communication network that links the computational parts of the grid so that they can share results. Software and licensing, such as the operating system, local tools and libraries, also come under the resource layer. Finally, the resource layer includes some non-technical characteristics, such as authorisation policies (confirming access to applications) and accounting policies (monitoring, tracking and charging according to application usage).
Infrastructure of Grid Computing
The grid infrastructure comprises all the hardware and software components used to connect multiple computers in grid computing. These components are responsible for the flow of information between grid systems and provide the basic services for connectivity, performance, availability, security and management. Some components are optional, and it is up to the designer to decide which to use according to the requirements of the system. The components and aspects that matter most for a grid infrastructure are described below.
In a grid infrastructure, security functions are responsible for authentication, authorisation and secure communication between grid resources. Clients gain access to the grid only by registering with the security provider, according to the security mechanism in use (PKI or Kerberos). Cryptographic techniques are also used between grid systems to provide message integrity and confidentiality. Firewalls limit the types of services and protocols allowed, in order to secure the grid systems, networks and servers. Firewalls are not the only protection for grid servers, but they add a defensive layer that helps stop internal or external users from gaining unauthorised access to the systems.
In a grid infrastructure, networks take many different forms. The networking component represents the LAN or WAN communication between grid systems and is responsible for providing them with adequate bandwidth. Like other components, the network can be tuned for performance, availability and security. To ensure adequate performance, the infrastructure must be designed to handle a significant network load.
Systems and Resources Management.
A set of systems management tools is required to determine and improve availability and performance in a grid. Without these tools it is very difficult to provide support or to report on the health of the grid infrastructure. Resource management tools act as the interface between the heterogeneous resources of the grid. They allow different computers to access dispersed resources within the grid, because they work with different authentication mechanisms to map grid clients to those resources.
Information services are the best way to cope with the dynamic nature of the grid. Both CPU and data resources fluctuate within a grid, depending on their availability for processing and on the sharing of data. When resources become free, they update their status in the grid information services, so clients can obtain that information and make intelligent decisions about using the free resources.
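The status-update and lookup cycle described above can be sketched in a few lines of Python. This is only an illustration: the `InformationService` class, its method names and the resource names are hypothetical, and a real grid information service would be a distributed directory rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class InformationService:
    """Hypothetical in-memory stand-in for a grid information service."""
    status: dict = field(default_factory=dict)  # resource name -> free CPUs

    def update(self, resource: str, free_cpus: int) -> None:
        # A resource reports its current availability.
        self.status[resource] = free_cpus

    def free_resources(self, min_cpus: int = 1) -> list:
        # A client asks which resources currently have enough free CPUs.
        return sorted(name for name, cpus in self.status.items()
                      if cpus >= min_cpus)


info = InformationService()
info.update("cluster-a", 16)
info.update("workstation-7", 0)   # busy, so it should not be offered
info.update("cluster-b", 4)
candidates = info.free_resources(min_cpus=4)
```

A client would then choose among `candidates` using whatever scheduling policy applies; the point of the sketch is only that availability is published by the resources and read by the clients.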
Among all the components, the most important aspect is still the data. It is very important to determine the data requirements and a suitable way to move data around the grid infrastructure. This can be done by using standard protocols to communicate between the available data sources. Other choices include building a federated database, which creates a virtual data store, or using other storage options such as storage area networks, dedicated storage servers and network file systems.
In a grid, the storage possibilities are almost endless. The question is how that storage can be secured, backed up, managed and, if required, replicated. A grid infrastructure must ensure that data is available to resources whenever they need it. Along with availability, the security of the data is very important, as we do not want unauthorised access to sensitive data. Meeting these requirements also calls for adequate performance when accessing the data.
Topologies in Grid Computing
Three topologies come under the computational architecture of grid computing:
1. Intragrid Topology.
The simplest of the topologies is the intragrid. A typical intragrid exists within a single organisation and provides a basic set of grid services. Within that organisation, a number of computers are connected to each other and share data internally on the organisation's private network, within a common security domain. The primary characteristics of an intragrid are therefore a single security provider, a private network on which bandwidth is high and always available, and a single environment. The set of computing resources is relatively static, and sharing data between grid systems is easy. An intragrid does not integrate any partners and comprises only one cluster per organisation.
2. Extragrid Topology.
An extragrid is the combination of two or more intragrids; in other words, it expands on the concept by bringing two or more intragrids together, as shown in the figure below. An extragrid involves more than one security provider, so its management is more complex than that of an intragrid. Its characteristics include dispersed security, multiple organisations and remote (WAN) connectivity. Because the resources in an extragrid are more dynamic, the grid needs to be more reactive to the failure of components and resources. Information services also become more relevant, to ensure that grid resources have efficient access to workload management at run time. A business benefits from an extragrid when it has an initiative to integrate with external trusted business partners.
3. Intergrid Topology.
An intergrid aggregates intragrid and extragrid topologies. It requires the dynamic integration of applications, resources and services with trusted partners, customers and authorised organisations, which obtain access to the grid via the internet or a WAN. An intergrid can accommodate many organisations, as shown in the figure below, and is primarily used by engineering firms, manufacturers, big science industries and businesses in the financial industry. Its main characteristics are globally dispersed security, multiple organisations and remote connectivity. It uses global public data, so the applications in an intergrid must be adapted for global use. An intergrid is necessary for a business that needs peer-to-peer computing, a collaborative or aggregated computing community, or end-to-end processes with all the trusted and authorised organisations using the intergrid.
Performance Evaluation of Grid Computing
Description and Discovery of Grid Network Services
The description and discovery technologies for Grid network services play a key role in the integration of networking systems and in their performance. Current Grid service description is based on the Web Services Description Language (WSDL). A WSDL-based service description provides information only about the functions and invocation interfaces of a Grid service, not about its Quality of Service (QoS) capability, such as the minimum service rate and the maximum delay that the service provider guarantees to a client. The current Grid service discovery mechanism is therefore based on functional criteria instead of performance criteria; that is, the Grid service broker selects a service without considering the QoS performance it can achieve. However, network QoS is a very important aspect of high-performance Grid computing, so the service broker must select network services that can guarantee the QoS performance required by Grid applications. New approaches are therefore needed for describing the QoS capabilities of network services and for discovering network services that achieve the QoS performance required by high-performance Grid computing.
The major difficulty in performance-based Grid network service discovery lies in the heterogeneity of Grid networking systems, so a general approach for describing and discovering the services of heterogeneous networks is required. A new approach for describing the QoS capabilities of Grid network services is presented here. It is applicable to different network implementations as well as to networking systems in heterogeneous domains. Based on this description, a technology is developed for discovering Grid network services that enhances Grid networking performance by guaranteeing the QoS performance required by Grid applications.
In grid computing, a service is defined as a self-contained implementation of some function(s) with a well-defined interface that specifies the pattern of message exchange used to interact with the function(s). A Grid service should provide enough descriptive information about its functions and interfaces for others to gain access to it. This descriptive information is called a service description and is published at a service registry. When a computing application needs to use Grid services, it submits a job request to a service broker and specifies the functions it wants to use.
The service broker then searches the service descriptions published at the registry to discover a service that supports the functions required by the application. Finally, the service broker retrieves the binding information of the selected service and binds it to the application. Discovering the appropriate service for each application is the key to high-performance Grid computing, and service descriptions form the basis for successful service discovery. Service description and discovery therefore play critical roles in enhancing the performance of Grid computing.
Network-Grid integration requires new mechanisms for describing and discovering Grid network services. Since network QoS has a significant impact on Grid computing performance, it is extremely important to discover Grid network services that can guarantee the performance required by Grid applications. The description and discovery mechanisms for Grid network services should therefore be performance-based instead of function-based. One of the main challenges in describing the QoS capabilities of Grid network services lies in the heterogeneity of Grid networking systems. Because resource sharing in Grids is widely distributed geographically, the underlying networking system of a Grid is very likely to consist of multiple network domains with various implementations. The description and discovery approach for Grid network services must therefore meet two requirements: first, it should be applicable to networks with different implementations; second, it must support the composition of multiple heterogeneous network domains into one cross-domain network service.
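The difference between function-based and performance-based discovery can be illustrated with a small Python sketch. Everything here is hypothetical: the `ServiceDescription` fields, the registry contents and the `discover` function are illustrative names, not part of WSDL or of any Grid toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    function: str          # what the service does (the WSDL-style part)
    min_rate_mbps: float   # guaranteed minimum service rate (QoS part)
    max_delay_ms: float    # guaranteed maximum delay (QoS part)


def discover(registry, function, min_rate_mbps=0.0, max_delay_ms=float("inf")):
    """Return services that match the function and meet the QoS bounds.

    With the default bounds this degenerates to function-based discovery.
    """
    return [s for s in registry
            if s.function == function
            and s.min_rate_mbps >= min_rate_mbps
            and s.max_delay_ms <= max_delay_ms]


registry = [
    ServiceDescription("net-a", "bulk-transfer", 100.0, 50.0),
    ServiceDescription("net-b", "bulk-transfer", 900.0, 10.0),
    ServiceDescription("net-c", "messaging", 10.0, 5.0),
]

by_function = discover(registry, "bulk-transfer")
by_performance = discover(registry, "bulk-transfer",
                          min_rate_mbps=500.0, max_delay_ms=20.0)
```

Function-based discovery returns both transfer services, while the performance-based query keeps only the one whose guarantees satisfy the application's demand.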
1. Description of Grid Network Services.
The main function of a Grid network service is data delivery, and its provisioning capability for data delivery has two aspects: the destinations that can be reached through the network (reachability) and the achievable QoS performance for data delivery to each destination (QoS capability). Reachability can be defined by listing all pairs of sources and destinations between which the network can transfer data, and QoS capability typically includes the minimum bandwidth and the maximum delay for transferring data between the source and the destination. To provide a formal description of network service capabilities, we define a capability matrix C that describes both reachability and QoS capability. For a network service S with m ingress ports and n egress ports, the capability matrix C is an $m \times n$ matrix whose elements are denoted $c_{i,j}$ ($i = 1, \dots, m$; $j = 1, \dots, n$); that is,
$$C = [c_{i,j}]_{m \times n} \tag{1}$$
$$c_{i,j} = \begin{cases} 0 & \text{if no route exists from } i \text{ to } j \\ S_{i,j} & \text{if a route } R_{i,j} \text{ exists from } i \text{ to } j \end{cases} \tag{2}$$
where $S_{i,j}$ is the QoS descriptor for the route $R_{i,j}$. According to the definitions given in (1) and (2), the capability matrix element $c_{i,j} = 0$ if the network service cannot reach j from i. If the network service has a route from i to j, the QoS capability of this route is given by the descriptor $S_{i,j}$. In a Grid network service, the routes between different ingress-egress pairs may have different implementations, so the key requirement for the QoS descriptor is that it be applicable to various networking systems. To achieve this, the notion of a service curve from network calculus theory is adopted here to define the QoS descriptor. The service curve is defined as follows. Let $R_{in}(t)$ and $R_{out}(t)$ be the accumulated amount of traffic of a flow that arrives at and departs from a server by time t, respectively. Given a non-negative, non-decreasing function $S(\cdot)$ with $S(0) = 0$, we say that the server guarantees a service curve $S(\cdot)$ for the flow if, for any t in the busy period of the server,
$$R_{out}(t) \ge R_{in}(t) \otimes S(t)$$
where $\otimes$ denotes min-plus convolution, $(f \otimes g)(t) = \inf_{0 \le s \le t} \{ f(s) + g(t - s) \}$.
Essentially, a service curve gives the minimum amount of service offered by the server to a client in an arbitrary time interval within a busy period.
Such a curve describes the lower bound of the service provisioning capability offered to a client. A typical server model for networking systems is the Latency-Rate (LR) server, which guarantees each flow a service curve
$$S(t) = r \cdot \max(0,\ t - \theta)$$
where $\theta$ and $r$ are respectively called the latency and the service rate for the flow. The LR server is a general server model for networks. In our service description approach, we adopt the service curve guaranteed by the route $R_{i,j}$ as the QoS descriptor $S_{i,j}$ used in the capability matrix C. Since a service curve is a general data structure that is independent of network implementations, it is flexible enough to describe heterogeneous networking systems. In a typical networking system, where a route $R_{i,j}$ can be modelled by an LR server with service curve $S_{i,j}(t) = r_{i,j} \cdot \max(0,\ t - \theta_{i,j})$, the matrix element $c_{i,j}$ can be represented by a data structure with two parameters: the service rate $r_{i,j}$ and the latency $\theta_{i,j}$.
The end-to-end Grid network service used directly by a Grid application is very likely to consist of a set of network domains. Composing the QoS capabilities of a set of heterogeneous network links into one descriptor for the end-to-end route is therefore an important and challenging problem. The description based on service curves supports this composition of QoS capabilities. Assume that a service system consists of a series of servers $G_1, G_2, \dots, G_n$, which respectively guarantee the service curves $S_1(t), S_2(t), \dots, S_n(t)$ to a flow. It is known from network calculus that the service curve $S(t)$ guaranteed by the whole system can be obtained through the convolution of the service curves guaranteed by each server; that is,
$$S(t) = S_1(t) \otimes S_2(t) \otimes \cdots \otimes S_n(t)$$
When every server is an LR server with parameters $(r_k, \theta_k)$, the composed curve is again of LR form, with
$$\theta = \sum_{k=1}^{n} \theta_k \tag{5}$$
$$r = \min_{1 \le k \le n} r_k \tag{6}$$
Equations (5) and (6) imply that the total latency of an end-to-end network route is the sum of the latency parameters of all links on the route, and that the bandwidth of the end-to-end route is limited by the link with the lowest transmission rate.
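As a minimal sketch of this composition, assuming every link on the route is modelled as a Latency-Rate server, the Python below represents each link by its service rate and latency and composes them by taking the smallest rate and adding the latencies. The `LRCurve` class, the `compose` helper and the numeric values are illustrative assumptions, not part of the original approach.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class LRCurve:
    """Latency-Rate service curve S(t) = rate * max(0, t - latency)."""
    rate: float      # service rate r (e.g. Mbit/s)
    latency: float   # latency theta (e.g. seconds)

    def __call__(self, t: float) -> float:
        # Minimum amount of service guaranteed by time t.
        return self.rate * max(0.0, t - self.latency)


def compose(a: LRCurve, b: LRCurve) -> LRCurve:
    # Min-plus convolution of two LR curves is again an LR curve:
    # the rate is the bottleneck rate, the latencies add up.
    return LRCurve(rate=min(a.rate, b.rate), latency=a.latency + b.latency)


# Three hypothetical links on one end-to-end route.
links = [LRCurve(100.0, 0.002), LRCurve(40.0, 0.010), LRCurve(80.0, 0.003)]
end_to_end = reduce(compose, links)
```

Here the end-to-end descriptor has the rate of the slowest link (40.0) and a latency of about 0.015, which is what a broker would store in the capability matrix entry for the composed route.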
2. Discovery for Grid Network Services.
This section discusses a technology that enables the service broker to discover Grid network services that meet the performance requirements of various Grid applications. The technology focuses on network service selection, alongside the other components of the discovery mechanism such as publishing service descriptions, searching the registry for available services and binding the selected service to the corresponding application.
To conduct performance-based network service discovery, a service broker needs three kinds of information:
(a) The provisioning capabilities of available network services.
(b) The performance requirement of the application.
(c) The characteristics of the network traffic load generated by the application.
The information in (a) can be obtained from the capability matrix C provided by the network service provider. The remaining two kinds of information, which specify the demand of a Grid application on the network service, should be provided to the service broker by the application as part of its request. We therefore define a Demand Profile as a general specification of Grid application demands. This profile consists of three elements: the pair of source-destination for data
Essentially, the arrival curve of an application gives an upper bound on the amount of traffic the application can load onto the network service. In practice, most QoS-capable networks apply traffic regulation mechanisms at network boundaries to shape the traffic arriving from applications. The traffic regulators most commonly used in practice are leaky buckets. A traffic flow constrained by a leaky bucket has an arrival curve
$$A(t) = \min\{\, p\,t,\ \sigma + \rho\,t \,\}$$
where $p$, $\rho$ and $\sigma$ are respectively the peak rate, the sustained rate and the maximal burst size of the flow.
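A leaky-bucket arrival curve is commonly written as A(t) = min(p t, σ + ρ t), with peak rate p, sustained rate ρ and maximal burst size σ. The short Python sketch below evaluates that bound; the function name and the sample numbers are illustrative assumptions.

```python
def arrival_curve(t: float, peak_rate: float,
                  sustained_rate: float, burst: float) -> float:
    """Upper bound on traffic sent in any interval of length t.

    A(t) = min(peak_rate * t, burst + sustained_rate * t), for t >= 0.
    """
    if t < 0:
        raise ValueError("interval length must be non-negative")
    return min(peak_rate * t, burst + sustained_rate * t)


# Hypothetical flow: peak 10 units/s, sustained 2 units/s, burst of 8 units.
short_interval = arrival_curve(0.5, peak_rate=10.0, sustained_rate=2.0, burst=8.0)
long_interval = arrival_curve(2.0, peak_rate=10.0, sustained_rate=2.0, burst=8.0)
```

Over short intervals the peak rate is the binding constraint, and over long intervals the burst plus sustained rate takes over; for these values the two constraints cross at t = 1.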
A technique is now proposed for the service broker to predict the QoS performance that a network service can guarantee to an application, and so decide whether the network service meets the application's requirement. Among the performance requirements of an application, the minimum bandwidth and the maximum delay for data delivery are the most important parameters of high-performance networking for Grid applications. Network calculus provides an effective approach for analysing the minimum bandwidth and maximum delay guaranteed by a network service. A service curve is itself a description of the minimum service capacity offered by a network, and so gives the minimum bandwidth that the network guarantees to an application. Therefore, given the QoS descriptor for a route in a network service, which is described by a service curve, the minimum bandwidth guaranteed on the route can be determined as
Resource management is an important aspect of the performance of grid computing. An ideal grid environment should provide seamless access to the available resources. The main aim of resource management is to schedule efficiently the applications that need to use the available resources in the grid environment. To do so, it must present a huge heterogeneous environment as a virtual homogeneous one, which poses two main challenges.
To resolve these problems, an agent-based hierarchical model is used in which each agent is considered to be both a service provider and a service requestor. Here a service describes the details of a resource in the grid. The model shown below is used as the basis for understanding the mechanisms of service discovery and service advertisement, which in turn can be used to implement the various functions of resource management.
In this model, a single type of component, the agent, is used to compose the whole system. All agents are assigned the same set of functions, both for sending requests and for providing services, and every agent can act as a router between a request and a service. Different terms distinguish the levels of the hierarchy: the agent at the head of the whole hierarchy is called the broker and maintains all the service information of the system, the agent that heads a sub-hierarchy is a coordinator, and a leaf node is simply termed an agent. When a new agent wants to join the system, it broadcasts to find its nearest existing agent. An agent can register with only one agent higher in the hierarchy, but many lower-level agents can be registered with it. All requests that enter a sub-hierarchy must first arrive at its coordinator and are then dispatched to the lower agents. From the point of view of service providers, a sub-hierarchy can be regarded as a single agent.
If an agent already has the required service information, it can contact the target agent directly. Otherwise it must search its local agents, or ask its upper agent, for a service discovery to find an agent that can provide the requested service. The lower or upper agent can in turn contact other agents for assistance until the service information is found. All the connections between the agents are broken once use of the service has finished. Because of the dynamic nature of grid computing, the services offered by the agents change periodically, so the corresponding service information also needs to be updated. These dynamics increase the difficulty of resource management and allocation. The issue is therefore how an agent advertises its services and coordinates with other agents efficiently. Two main situations arise here:
1. No Service Advertisement- In this case, the agents have no knowledge of the services offered by other agents. Whenever a service is required, a service discovery procedure must be carried out, which can be complex and involve a number of agents. This is called a pure data-pull model because the service information is pulled from the agents at the time of discovery.
2. Full Service Advertisement- This case requires no service discovery process. Each agent advertises all its available service information to every other agent in the system, so a service can be found directly whenever a request is made. This is a pure data-push model because the service information is always pushed to all the agents during advertisement.
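The trade-off between the two extremes can be seen by counting connections in a toy system. The Python below is a deliberately simplified sketch: it uses a flat list of agents rather than the hierarchy described earlier, and the `Agent` class, `advertise_all` and `discover_pull` are hypothetical names.

```python
class Agent:
    def __init__(self, name, services):
        self.name = name
        self.services = set(services)  # services this agent offers
        self.known = {}                # service -> provider, filled by adverts


def advertise_all(agents):
    """Pure data-push: every agent advertises to every other agent.

    Returns the number of connections spent on advertisement.
    """
    connections = 0
    for sender in agents:
        for receiver in agents:
            if receiver is not sender:
                for service in sender.services:
                    receiver.known[service] = sender.name
                connections += 1
    return connections


def discover_pull(requester, agents, service):
    """Pure data-pull: ask the other agents in turn until one can serve.

    Returns (provider name or None, connections used for discovery).
    """
    connections = 0
    for other in agents:
        if other is requester:
            continue
        connections += 1
        if service in other.services:
            return other.name, connections
    return None, connections


a = Agent("a", ["render"])
b = Agent("b", ["solver"])
c = Agent("c", ["storage"])
agents = [a, b, c]

pulled = discover_pull(a, agents, "storage")   # discovery cost paid per request
pushed_cost = advertise_all(agents)            # advertisement cost paid up front
pushed = a.known.get("storage")                # afterwards the lookup is local
```

With pull, each request pays for its own search; with push, the system pays once for advertisement (and again whenever services change) and requests are then answered from local knowledge.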
Different systems can use different optimisation strategies to achieve high performance. For example, in static systems, where service information changes far less often than services are requested, the pure data-push model can be used to achieve high-performance service discovery. In extremely dynamic systems, where service information changes far more often than services are requested, the pure data-pull model is used. The performance of the system also depends heavily on several further criteria, which are discussed below:
Discovery Speed- Each request from an agent may pass through one or more agents in order to find a target agent that can provide the required service. Since the amount of data communicated is small, the performance of the discovery process depends mainly on the number of routing connections. Fewer connections mean a quicker discovery process and higher system performance. There can be simultaneous service requests across the whole system, so the average service discovery speed, v, is defined as
$$v = \frac{r}{d}$$
where r is the total number of requests during a certain period and d is the total number of connections made for discovery.
System Efficiency- The cost of service discovery also includes the connections made for service advertisement and data maintenance. Service advertisement may add workload to the system. The total number of connections between agents, c, therefore includes those for the discovery processes, d, and those for the advertising processes, a, i.e.
$$c = d + a$$
The efficiency of the system, e, can then be considered as the ratio of the total number of requests, r, during a certain period to the total number of connections, c:
$$e = \frac{r}{c} = \frac{r}{d + a}$$
Load Balancing- In systems where resources are critical, load balancing may be an important issue. In this system, no agents are dedicated solely to service discovery, so there is no reason for any agent to carry a higher discovery workload than any other. For a system with n agents, the workload, wk, of each agent can be described as
Success Rate- Under some performance optimisation strategies, the discovery model cannot guarantee to find the target service, even though it may actually exist in the system. In a general system, however, a reasonable service discovery success rate should always be achieved. The success rate, f, describes successful service discovery:
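The first two criteria can be made concrete with a small worked example in Python. It takes discovery speed as requests per discovery connection (v = r/d) and efficiency as requests per total connection (e = r/(d + a)); these readings follow the definitions of r, d, a and c in the text, while the function names and the sample counts are assumptions chosen for illustration.

```python
def discovery_speed(requests: int, discovery_connections: int) -> float:
    """Average discovery speed v = r / d."""
    return requests / discovery_connections


def system_efficiency(requests: int, discovery_connections: int,
                      advertisement_connections: int) -> float:
    """System efficiency e = r / c, where c = d + a."""
    total_connections = discovery_connections + advertisement_connections
    return requests / total_connections


# Hypothetical measurement period: 200 requests, 500 discovery
# connections and 300 advertisement connections.
v = discovery_speed(200, 500)
e = system_efficiency(200, 500, 300)
```

Advertisement lowers d, and so raises v, but its own connections count against e, which is why neither pure push nor pure pull is best for every system.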
Several performance optimisation strategies can be used to enhance the performance of grid computing to a great extent. In a more dynamic environment, a combination of these strategies gives better results in resolving performance issues.
1. Use of Cache.
A cache temporarily stores the most frequently used data so that it can be reused in a single step. Caching previous service discovery results is therefore a good performance optimisation strategy, on the assumption that a request may be made more than once. Cached service information is held in a C_ACT, a data type called the Cached Agent Capability Table. When an agent sends a request for service discovery, the result can be stored in the C_ACT and looked up on the next request. If the service has changed and is no longer available, the agent updates the C_ACT and performs another service discovery. Many current network applications use caches to optimise performance. Using cached service information may allow direct service discovery in one step, and it adds no data maintenance workload. However, if the service information changes frequently compared with the request frequency, using a cache may decrease the service discovery speed. The efficiency of caching therefore depends on the characteristics of the actual system.
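The C_ACT behaviour can be sketched as a small cache wrapped around whatever discovery procedure the agent normally uses. In the Python below, `CachingAgent`, the `discover` callback and the `is_available` check are hypothetical stand-ins; only the idea of storing and re-validating discovery results comes from the text.

```python
class CachingAgent:
    def __init__(self, discover, is_available):
        self.c_act = {}                    # C_ACT: service -> cached provider
        self._discover = discover          # full discovery (assumed costly)
        self._is_available = is_available  # checks a cached provider still serves
        self.discoveries = 0               # how many full discoveries were run

    def find(self, service):
        provider = self.c_act.get(service)
        if provider is not None and self._is_available(provider, service):
            return provider                # answered from the cache in one step
        # Cache miss or stale entry: drop it and run discovery again.
        self.c_act.pop(service, None)
        self.discoveries += 1
        provider = self._discover(service)
        if provider is not None:
            self.c_act[service] = provider
        return provider


# Toy "system state" standing in for the rest of the grid.
providers = {"fft": "agent-3"}
agent = CachingAgent(discover=providers.get,
                     is_available=lambda p, s: providers.get(s) == p)

first = agent.find("fft")      # triggers a discovery
second = agent.find("fft")     # served from C_ACT
providers["fft"] = "agent-9"   # the service moves to another agent
third = agent.find("fft")      # stale entry detected, rediscovered
```

When services move often, most lookups end in the stale branch and the cache only adds a wasted step, which matches the caveat above about frequently changing service information.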
2. Using Local and Global knowledge.
Another performance optimisation strategy is to give each agent some local or global knowledge, on the assumption that services are often accessed by local agents. A request may then need fewer connections to find a local service, because the higher-level agents need not take part in the discovery process, which reduces the system load. Since an agent needs to coordinate with other agents to find services, two kinds of ACT can be kept in each agent to record service information: local (L_ACT) and global (G_ACT). Each agent has one L_ACT that records the service information of the agents registered with it. If a request is within the capabilities of the local agents, the agent can transfer the request directly to the target agent. The G_ACT in an agent is a copy of its upper agent's L_ACT. An agent therefore has information about more services and can contact them directly without submitting the request to the upper agent. Unlike the C_ACT, the L_ACT and G_ACT need additional data maintenance, because every agent must update its L_ACT and G_ACT whenever the network services change.
The service discovery process using the L_ACT and G_ACT also differs from that using the C_ACT. When an agent receives a request, it looks up its L_ACT. If it finds that one of its lower agents can provide the service, it dispatches the request directly to that agent. Otherwise it looks up its G_ACT, and if the G_ACT shows that another agent can provide the service, it dispatches the request to that agent. If the service is found in neither table, the agent asks its upper agent for further service information. After the upper agent returns the result, the agent can update its own G_ACT and return the result to the agent that originated the request.
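The lookup order described above (L_ACT first, then G_ACT, then the upper agent) can be sketched as follows. This is an illustrative Python model only: the `Agent` class and its methods are hypothetical, and for brevity the G_ACT is filled lazily from the upper agent's replies rather than kept as a full copy of the upper agent's L_ACT.

```python
class Agent:
    def __init__(self, name, upper=None):
        self.name = name
        self.upper = upper
        self.l_act = {}   # L_ACT: service -> agent registered with this one
        self.g_act = {}   # G_ACT: service -> agent learned from the upper agent

    def register(self, lower_name, services):
        # A lower agent (or a whole sub-hierarchy) registers its services here.
        for service in services:
            self.l_act[service] = lower_name

    def find(self, service):
        """Return (provider or None, number of upper agents asked)."""
        if service in self.l_act:
            return self.l_act[service], 0
        if service in self.g_act:
            return self.g_act[service], 0
        if self.upper is None:
            return None, 0
        provider, asked = self.upper.find(service)
        if provider is not None:
            self.g_act[service] = provider   # remember the upper agent's answer
        return provider, asked + 1


broker = Agent("broker")
coord1 = Agent("coord-1", upper=broker)
coord2 = Agent("coord-2", upper=broker)
coord1.register("agent-a", ["fft"])
coord2.register("agent-b", ["blast"])
# From the broker's view each sub-hierarchy looks like a single agent.
broker.register("coord-1", ["fft"])
broker.register("coord-2", ["blast"])

local = coord1.find("fft")      # resolved from L_ACT
first = coord1.find("blast")    # has to ask the broker once
second = coord1.find("blast")   # now resolved from G_ACT
missing = coord1.find("render")
```

The second lookup for the same remote service no longer involves the broker, which is the reduction in higher-level load that this strategy aims for; the price is keeping both tables up to date when services change.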
3. Limit Service Lifetime.
One performance optimisation strategy is to add a lifetime limit to the attributes of the service information. This lifetime should be estimated before the service is advertised. The agent can check its ACTs frequently and delete out-of-date service information. This avoids unnecessary routing and increases the speed of service discovery, with no additional data maintenance workload. However, the lifetime of some services in the system may be unpredictable.
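A lifetime limit can be sketched as an expiry time stored next to each table entry. In the Python below, `ExpiringACT` and its methods are hypothetical names, and the current time is passed in explicitly so that the behaviour is easy to follow; a real agent would read a clock instead.

```python
class ExpiringACT:
    """Agent capability table whose entries carry an expiry time."""

    def __init__(self):
        self._entries = {}   # service -> (provider, expires_at)

    def advertise(self, service, provider, lifetime, now):
        # The lifetime is estimated by the provider before advertising.
        self._entries[service] = (provider, now + lifetime)

    def purge(self, now):
        # Delete out-of-date service information; return what was removed.
        expired = [s for s, (_, expires_at) in self._entries.items()
                   if expires_at <= now]
        for service in expired:
            del self._entries[service]
        return expired

    def lookup(self, service, now):
        self.purge(now)
        entry = self._entries.get(service)
        return entry[0] if entry else None


act = ExpiringACT()
act.advertise("fft", "agent-3", lifetime=30, now=0)
fresh = act.lookup("fft", now=10)   # still within its lifetime
stale = act.lookup("fft", now=30)   # lifetime reached, entry removed
```

If the estimated lifetime is too short the entry disappears while the service still exists, and if it is too long requests are routed to a service that has gone, which is the unpredictability noted above.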
Security In Grid Computing and its Issues
Security is one of the most important and critical aspects of grid computing. It does not matter how good the infrastructure is or how much the performance has been enhanced if the system is not robust and secure. By security, we mean the processes and mechanisms implemented to protect services, resources and sensitive data from being accessed or used by intruders or other unwanted parties. Some information is intended to be read or accessed only by specific parties or organisations, and to preserve this privacy, mechanisms are used that deny unwanted parties access to it.
A computational grid consists of dynamic, heterogeneous administrative domains interconnected to perform various computations. Different users have access to different resources according to their security policies, and securing many users' access to many dynamic resources has become challenging and critical. This section describes the security infrastructure of grid computing, covering terms such as authentication, authorisation, confidentiality and integrity, and gives a qualitative view of the security issues in grid computing and their possible solutions. To examine the different security techniques, we start with the fundamentals of grid security. Security rests on three fundamentals: authentication, authorisation and encryption. First, grid users and resources must be authenticated before any process begins in grid computing.
Once authentication has succeeded, the grid user can be authorised, that is, given rights to access the relevant resources, and encryption is used to prevent data from being captured or stolen in transit between grid resources. To better understand grid security, some important terms are defined below:
Authentication- is the mechanism to verify the identity of an individual, i.e. to confirm that the individual really is who he claims to be. Apart from human beings, all services, applications and other entities also need to authenticate themselves.
Authorisation- is the process of determining what an individual is allowed to do or which resources the individual can access in a grid. It determines the limit of the rights given to anyone.
Data Integrity- is the assurance that the data is not altered or destroyed in an unauthorised way. In other words, when data is sent from the source to the destination over a secure communication link, it does not change between leaving the source and reaching the destination.
Data Confidentiality- is the assurance that the data is sent to or received by only the intended parties. It means that sensitive information must not be revealed to parties it is not meant for.
Encryption- is the process used to secure data or information during transmission from a sender to a receiver. The data is encrypted before sending and decrypted at the receiver's end, which keeps it protected for the whole of the transmission.
Key Management- deals with the secure generation, distribution, authentication and storage of the different keys used in encryption techniques.
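As a small illustration of the Data Integrity definition above, the sketch below attaches a keyed hash (HMAC) to a message so that the receiver can detect any alteration. It uses Python's standard `hmac` and `hashlib` modules. The key, the message and the helper names `tag` and `is_intact` are made up for this example, and HMAC is only one possible integrity mechanism.

```python
import hashlib
import hmac


def tag(message: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that travels with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def is_intact(message: bytes, received_tag: bytes, key: bytes) -> bool:
    """Recompute the tag at the destination and compare in constant time."""
    return hmac.compare_digest(tag(message, key), received_tag)


key = b"shared-demo-key"          # hypothetical key known to both ends
message = b"job 42: 16 CPUs"
mac = tag(message, key)

unchanged = is_intact(message, mac, key)
tampered = is_intact(b"job 42: 64 CPUs", mac, key)
```

Here `unchanged` is true and `tampered` is false, because changing even one character of the message changes the tag.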
Some concepts are very important for security in grid computing. They give a brief background on how grid security works in a dynamic grid environment and are discussed below one by one:
1. Symmetric Key Encryption.
This is an encryption technique in which a single shared secret key is used for both encryption and decryption of the data. The secret key has to be shared or distributed very securely, and only between the intended sender and receiver, to ensure the confidentiality of the message. If any third party gets access to this secret key, it can easily decrypt the encrypted message and read the sensitive information. Symmetric key encryption is much faster than asymmetric encryption. Commonly used examples of symmetric key encryption are the Data Encryption Standard (DES), with a 56-bit key plus 8 parity bits, and Triple-DES, with a 112-bit key plus 16 parity bits or a 168-bit key plus 24 parity bits. In summary, secret key encryption is fast for both encryption and decryption, but guaranteeing secure distribution and management of the secret key is difficult.
Symmetric Key Encryption using a Secret Key
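The idea that one shared key both encrypts and decrypts can be sketched with a toy cipher. The code below is not DES or Triple-DES: it XORs the data with a keystream derived from SHA-256, which is enough to show the principle but must not be used to protect real data. The function names and the message are assumptions for this example. The last lines simply add up the key sizes quoted above.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(16)   # must reach both parties securely
plaintext = b"grid job results"
ciphertext = xor_cipher(plaintext, shared_key)
recovered = xor_cipher(ciphertext, shared_key)

# Key sizes from the text: effective key bits + parity bits = stored bits.
des_bits = 56 + 8        # 64
tdes2_bits = 112 + 16    # 128
tdes3_bits = 168 + 24    # 192
```

Anyone who obtains `shared_key` can run the same `xor_cipher` call and read the message, which is exactly the key distribution problem described above.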
2. Asymmetric Key Encryption.
Asymmetric key encryption is another commonly used encryption technique and is also called Public Key encryption. The prime example of public key encryption is the RSA algorithm. In this technique, an asymmetric key pair consisting of one Private Key and one Public Key is used: one key is used for encryption and the other for decryption. The private key is kept secret, while the public key is not secret at all and can be made available to everyone. Normally, the digital certificate issued by the Certificate Authority contains the public key of the user. The algorithm computes the private-public key pair in such a way that whichever key (private or public) is used to encrypt the data, only the other key of that pair can decrypt it. The public key is made openly available to anyone through a trusted Certificate Authority (CA), which certifies who owns each public key. To secure a message completely in transmission, the technique is applied twice: the sender first encrypts the message with its own private key, then encrypts the result with the receiver's public key, and then sends the data. At the receiver's end, the receiver first decrypts the message with its own private key and then decrypts the result with the sender's public key, so no one between the sender and the receiver can read the message. If the message is altered in transit, it will not decrypt properly and the receiver will come to know about the alteration.
Furthermore, it is very difficult for an intruder to calculate the key pair in an unauthorised way. The key pair is derived from the product of two very large prime numbers, and nobody can feasibly recover the two original primes from that product. So even if the public key is available, it is very difficult for a computer to calculate the private key.
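The double use of the key pairs described above can be followed with textbook RSA on very small primes. The sketch below is only for reading along: real keys use primes hundreds of digits long together with padding schemes, and the helper names `make_keypair` and `apply_key` are assumptions for this example. The receiver's modulus is chosen larger than the sender's so that the signed value fits into the second encryption.

```python
def make_keypair(p: int, q: int, e: int = 17):
    """Return (public, private) keys built from two primes p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse, Python 3.8+
    return (n, e), (n, d)


def apply_key(value: int, key) -> int:
    """RSA operation: raise value to the key's exponent modulo n."""
    n, exponent = key
    return pow(value, exponent, n)


sender_pub, sender_priv = make_keypair(61, 53)       # n = 3233
receiver_pub, receiver_priv = make_keypair(89, 97)   # n = 8633

message = 65   # toy message, must be smaller than the sender's modulus

# Sender: first own private key, then the receiver's public key.
signed = apply_key(message, sender_priv)
sealed = apply_key(signed, receiver_pub)

# Receiver: first own private key, then the sender's public key.
unsealed = apply_key(sealed, receiver_priv)
recovered = apply_key(unsealed, sender_pub)
```

Only the receiver can perform the first decryption, and only a value produced with the sender's private key decrypts back to the original message in the second step.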
3. Certificate Authority (CA).
The Certificate Authority is responsible for providing valid certificates to the different users in a dynamic grid environment. To achieve good security, the Certificate Authority has some responsibilities which must be carried out properly, i.e.
- To identify the different entities or users requesting the certificates.
- To issue or remove the requested certificates.
- To protect the CA server from unwanted access.
- To maintain a database of the unique names for various certificate owners.
There is also an RA (Registration Authority), which works with the CA and helps it to complete some of its duties. The RA is responsible for approving or rejecting requests for public key certificates, and it forwards the important user information to the CA. The RA validates all the information provided by the users, and only if the information is correct does it send the requested signed digital certificates back to the users. If the number of requests is not very high, the CA can take over the role of the RA. Just like other users, the CA has to do the same things to prove its identity, because the trustworthiness of the system is a very critical issue in a dynamic grid environment. For this, the CA must generate its own key pair and keep its own private key secure. It also creates its own certificate and signs that certificate with its private key. As the private key of the CA is used to sign every valid requested certificate in a grid environment, the probability of hackers attacking this key is quite high. Anyone who got access to this key would be able to forge certificates and impersonate anyone in the whole environment. Hence it is very important to protect the CA server with proper security mechanisms.
4. Digital Certificates.
A digital document that certifies or represents a grid resource, with its unique public key enclosed in it, is called a digital certificate. It is a data structure which contains the public key and some details about the key owner. These certificates are issued by the Certificate Authority and are used to identify or authenticate any user in a grid environment. The important fact about the digital certificate is that the CA certifies that the enclosed public key belongs to the entity listed in the certificate. Technically, it is very difficult to alter a certificate without detection, because the CA's signature on the certificate provides a message integrity check. Certificates are used as follows: if two parties want to communicate with each other, the sender first sends his certificate, which contains his public key, together with the message to the receiver. When the receiver receives the certificate, he checks the signature of the CA on the certificate. If the certificate is signed by a CA that the receiver trusts, the receiver knows that the public key in the certificate really is the public key of the sender, and hence no one can use a wrong public key to impersonate a public key owner. To summarise, the following steps explain how a grid user obtains its certificate from a Certificate Authority (CA):
1. First of all, the grid user who wants to be certified by the CA generates its key pair, i.e. a private key and a public key.
2. Then the grid user signs its own public key, together with the information required by the CA, and puts these into a certificate request. By signing the public key, the user certifies that he holds the private key corresponding to it and gives the CA a way to verify this if needed.
3. This signed information is then transmitted securely to the CA, while the private key of the grid user is kept secure and stays only with the user.
4. Then the CA needs to verify that the grid user really holds the private key corresponding to the public key sent to the CA.
5. In addition, the CA or RA needs to verify the identity of the user, which can be done by methods such as telephone, e-mail or face to face conversation.
6. After a positive result from the identification process, the CA creates a digital certificate for that grid user by signing the user's public key together with the user's details, and finally this certificate is passed to the RA, which forwards it to the grid user.
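The six steps above can be sketched in a few lines. This is a toy model under stated assumptions: signatures are textbook RSA over a SHA-256 digest with demonstration primes, the certificate is a plain dictionary rather than an X.509 structure, the subject name is made up, and the out-of-band identity check of step 5 is left out. All helper names are hypothetical.

```python
import hashlib


def make_keypair(p, q, e=65537):
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))        # (public, private)


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    n, d = private
    return pow(digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    n, e = public
    return pow(signature, e, n) == digest(data, n)


def encode(subject: str, public_key) -> bytes:
    return f"{subject}|{public_key[0]}|{public_key[1]}".encode()


def issue_certificate(request: dict, ca_private) -> dict:
    body = encode(request["subject"], request["public_key"])
    # Step 4: check that the requester holds the matching private key.
    if not verify(body, request["proof"], request["public_key"]):
        raise ValueError("requester does not hold the private key")
    # Step 6: the CA signs the subject and public key.
    return {"subject": request["subject"],
            "public_key": request["public_key"],
            "ca_signature": sign(body, ca_private)}


def certificate_is_valid(cert: dict, ca_public) -> bool:
    body = encode(cert["subject"], cert["public_key"])
    return verify(body, cert["ca_signature"], ca_public)


# Demonstration keys built from known Mersenne primes (far too small for real use).
ca_pub, ca_priv = make_keypair(2**107 - 1, 2**127 - 1)
user_pub, user_priv = make_keypair(2**61 - 1, 2**89 - 1)   # step 1

subject = "/O=ExampleGrid/CN=alice"                        # hypothetical name
request = {"subject": subject, "public_key": user_pub,     # steps 2 and 3
           "proof": sign(encode(subject, user_pub), user_priv)}
certificate = issue_certificate(request, ca_priv)

forged = dict(certificate, subject="/O=ExampleGrid/CN=mallory")
```

Checking `certificate` against the CA's public key succeeds, while the `forged` copy with a changed subject fails, which is the integrity check the CA signature provides.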
Having covered these security fundamentals, we now turn to how the different security components interact with each other to give good security and robustness to the grid environment. For this, it is necessary to explain the various functions and scenarios of a Grid Security Infrastructure (GSI). The different functions of the grid security infrastructure are discussed here one by one:
Getting Access to the Grid.
To join a dynamic grid environment using the GSI components, you first have to create your own set of keys (private and public) for the encryption processes. You also need to request the required certificate from the Certificate Authority (CA) and obtain a copy of the public key of the CA. The procedure to get access to the grid is described more clearly by the following steps and the figure shown below:
1. To set up the GSI, first of all copy the public key of the CA to your host computer.
2. Then create your own set of keys, that is your private key (which must be kept secret) and your public key, and also the certificate request containing your public key and some information about you.
3. Now send the certificate request to the CA by some secure means of transmission, e.g. secured e-mail, after which the CA needs to identify you through the authentication process.
4. After identifying you positively, the CA will sign your certificate, which binds your public key to your identity, and will send the certificate back to you. After completing this procedure, you will have three important files on your grid host, i.e. the public key of the CA, the private key of the grid host and the digital certificate of the grid host.
Authentication and Authorisation.
Once you have gained access to a grid, you need to communicate with other grid hosts to use the various grid resources, complete tasks or run applications. Before starting any communication with a grid host, you need to authenticate yourself, i.e. prove your identity to the other host, and authorisation is also required to establish whether you have been granted permission to access a specific resource or application. To see how these mechanisms work, imagine a grid environment in which you want to communicate with an application or resource that belongs to another grid host. You want to be sure that the host you communicate with really is the host you intend to reach, i.e. that you can trust that host. Just as importantly, that host must trust you, because otherwise it will never allow you to access its applications or resources. The authentication and authorisation functions of GSI solve these problems and make the sharing of resources possible in a grid. Once you have been authenticated by the remote grid host, it still has to decide whether to give you access to its resources, and this is the job of the Authorisation function of GSI. In a dynamic grid environment, your host will sometimes act as a client and sometimes as a server, so it may have to authenticate another host and be authenticated by that host at the same time. In this case we use Mutual Authentication, which is also a function of GSI and is carried out in a very similar way to authentication and authorisation.
The steps and the figure below illustrate the concept of authentication and authorisation. Here you are assumed to be Host A and the remote grid host is assumed to be Host B.
1. First, host A (you) sends its certificate to host B, the host by which it wants to be authenticated.
2. Host B then uses the public key of the CA to verify the certificate and to obtain the subject and the public key of host A from it.
3. Host B now generates a random number and sends it to host A.
4. After receiving that random number, host A encrypts it with its own private key (which is kept secret) and sends the encrypted message back to host B.
5. After receiving the encrypted message from host A, host B decrypts it with host A's public key, which it obtained from the certificate. If the decrypted number is the number that was sent to host A, host B authenticates host A, because a message encrypted with host A's private key can only be decrypted with host A's public key.
6. Once host A has been authenticated by host B, the subject of host A in the certificate, which has the form of a "Distinguished Name" (DN), is mapped to a local user name. Basically, Authorisation is the process of mapping your DN to a local user or group on the remote grid host. The subject is used in the grid environment to specify the identity of host A, and the owner of the DN, i.e. host A, is authorised by host B to act as a local user on host B.
Authentication and Authorisation
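Steps 3 to 6 above amount to a challenge-response exchange followed by a lookup in a mapping table. The sketch below models this with textbook RSA on demonstration primes. The function names, the Distinguished Name and the local user name are invented for illustration, and certificate verification (step 2) is assumed to have already succeeded.

```python
import secrets


def make_keypair(p, q, e=65537):
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))        # (public, private)


# Host A's key pair; host B only ever sees the public half.
host_a_pub, host_a_priv = make_keypair(2**61 - 1, 2**89 - 1)


def make_challenge(public) -> int:
    """Step 3: host B picks a random number below host A's modulus."""
    return secrets.randbelow(public[0] - 2) + 2


def respond(challenge: int, private) -> int:
    """Step 4: host A encrypts the number with its private key."""
    n, d = private
    return pow(challenge, d, n)


def check_response(challenge: int, response: int, public) -> bool:
    """Step 5: host B decrypts with host A's public key and compares."""
    n, e = public
    return pow(response, e, n) == challenge


# Step 6: hypothetical mapping from Distinguished Names to local users on host B.
grid_map = {"/O=ExampleGrid/CN=host-a": "griduser1"}


def authorise(subject_dn: str, authenticated: bool):
    """Return the local user for an authenticated DN, or None."""
    return grid_map.get(subject_dn) if authenticated else None


challenge = make_challenge(host_a_pub)
response = respond(challenge, host_a_priv)
authenticated = check_response(challenge, response, host_a_pub)
local_user = authorise("/O=ExampleGrid/CN=host-a", authenticated)
```

A host that does not hold host A's private key cannot produce a response that passes `check_response`, and a DN missing from `grid_map` is authenticated but not authorised.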
In a grid computing environment, there are situations in which jobs or tasks are distributed among different remote grid hosts, and these remote hosts split the tasks into sub tasks and distribute them further to other remote hosts under the same security policies. In these circumstances we can use the Delegation function of the GSI. Delegation can be defined as giving a host the authority to act as someone else. In other words, you create a proxy of yourself on a host, and that proxy can then communicate with any other host on your behalf. The procedure is explained in a few steps with the help of the diagram shown below. Here you are assumed to be on host A. To delegate your authority, you (host A) create your proxy on host B, and then your proxy, acting as yourself on host B, submits a request to host C on your behalf.
1. First of all, a secure and trusted communication link is established between host A and host B.
2. Once the secure communication is in place, host A requests host B to create its proxy in order to delegate its authority.
3. Then host B requests the proxy certificate from host A.
4. Host A signs the request and creates its proxy certificate using its own private key.
5. Host A now sends both its signed proxy certificate and its digital certificate to host B. After completing these steps, you have created a proxy on a remote grid host, i.e. host B. The following steps describe how you communicate with another grid host, i.e. host C.
6. Host A has now successfully created its proxy on host B, and this proxy exercises the delegated authority on behalf of host A. The proxy sends host A's proxy certificate and digital certificate to host C.
7. Host C uses this information to obtain what it needs. It gets the proxy's public key through the path validation process used in remote delegation. In path validation, a certificate is checked for validity, and the validity of the identity which signed the certificate is checked as well. Host C gets the public key and subject of host A from host A's certificate by using the CA's public key. Then, using the public key of host A, host C gets the proxy's subject and the proxy's public key from the proxy certificate, as shown in step 7 of the figure. As we already know, the subject in a certificate is a unique "Distinguished Name" (DN), and host C uses this subject to authenticate the proxy of host A: the subject of host A must match the subject of its proxy. Since host C has both subjects, it simply compares them, and if they are the same, the proxy is authenticated by host C and can act on behalf of host A.
8. After being authenticated by host C, the proxy encrypts its request message with its private key and sends it to host C.
9. Host C uses the public key of the proxy to decrypt the message and obtains the original request.
10. Finally, for authorisation, host C runs the request sent by the proxy under a local user's authority. The local user is specified by a mapping file, which represents the mapping between the grid users and the local users.
In this process of remote delegation, when we create a proxy on the remote grid host, the private key of the proxy resides on that remote machine, as the figure shows. A user of that machine can therefore get access to the private key of the proxy, which makes it vulnerable to attacks or impersonation. It is thus highly recommended that the owner gives the proxy only restricted policies, to limit the damage if the proxy is compromised.
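The check that host C performs in step 7 can be sketched as a two-link chain: the CA vouches for host A, and host A vouches for the proxy. The code below is a toy model with textbook RSA signatures on demonstration primes and dictionary certificates. All names are hypothetical, and the subject comparison follows the simplified rule described in the steps above rather than any particular proxy certificate format.

```python
import hashlib


def make_keypair(p, q, e=65537):
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))        # (public, private)


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    n, d = private
    return pow(digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    n, e = public
    return pow(signature, e, n) == digest(data, n)


def body(cert: dict) -> bytes:
    n, e = cert["public_key"]
    return f"{cert['subject']}|{n}|{e}".encode()


def make_cert(subject: str, public_key, issuer_private) -> dict:
    cert = {"subject": subject, "public_key": public_key}
    cert["signature"] = sign(body(cert), issuer_private)
    return cert


def validate_delegation(proxy_cert: dict, owner_cert: dict, ca_public) -> bool:
    """Path validation as host C would do it in step 7."""
    owner_ok = verify(body(owner_cert), owner_cert["signature"], ca_public)
    proxy_ok = verify(body(proxy_cert), proxy_cert["signature"],
                      owner_cert["public_key"])
    return owner_ok and proxy_ok and proxy_cert["subject"] == owner_cert["subject"]


ca_pub, ca_priv = make_keypair(2**107 - 1, 2**127 - 1)
a_pub, a_priv = make_keypair(2**61 - 1, 2**89 - 1)
proxy_pub, proxy_priv = make_keypair(2**89 - 1, 2**107 - 1)   # lives on host B

dn = "/O=ExampleGrid/CN=host-a"                  # hypothetical subject
cert_a = make_cert(dn, a_pub, ca_priv)           # issued by the CA
proxy_cert = make_cert(dn, proxy_pub, a_priv)    # step 4: signed by host A

# A proxy certificate that host A never signed must be rejected.
rogue_cert = make_cert(dn, proxy_pub, proxy_priv)
```

Because the proxy's private key sits on host B, the chain proves only that host A delegated to that key, not that the key is still under host A's control, which is why restricted proxy policies are recommended.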
Grid Security Communication.
After describing the major components and functions of the Grid Security Infrastructure, it is also very important to understand the concept of communication in a dynamic grid environment. All communication in grid computing is based on mutual authentication, which relies on components such as digital certificates and the SSL/TLS function. As already discussed, digital certificates are installed on the grid hosts and are used to carry out mutual authentication between the different hosts. SSL/TLS, which stands for Secure Sockets Layer/Transport Layer Security, is a protocol used to encrypt all the information or data transferred between the different hosts. These two functions work together to provide authentication and confidentiality, which are the basic requirements of a secure and robust system.
Mutual Authentication is required when two grid users want to share their resources with each other. Each grid user has to authenticate itself to the other on the basis of its digital certificate. Once the users are authenticated, they can access each other's resources and hence share them in a dynamic grid environment. To maintain secure communication between a grid client and a grid server, a process called the "SSL Handshake" is required, which carries out the mutual authentication, determines the SSL settings and exchanges the public keys. The SSL Handshake process is described in the steps below:
1. To start a secure communication or session, a grid client first needs to contact a remote grid server with a digital certificate.
2. To use the SSL protocol, the grid client sends the required information to the grid server, such as the client's SSL version number, randomly generated data and the cipher settings.
3. The grid server then replies to the grid client with the corresponding information of its own, such as its SSL version number, its digital certificate and its cipher settings.
4. After getting all the information from the server, the client checks whether
- the certificate of the server is valid.
- the CA that signed the certificate of the server is a trusted CA.
- the public key of the server's CA validates the digital signature of the CA on the certificate.
- the domain name given in the certificate of the server matches the actual domain name of the server.
5. After the successful authentication of the grid server, the client generates a specific key called the "Session Key", which will be used to encrypt all the communication between the client and the server using symmetric encryption.
6. Then the grid client encrypts this Session Key with the public key of the server, so that only the server can decrypt it, and sends the encrypted message to the server.
7. After receiving the encrypted message, the server decrypts it with its own private key and gets the secret Session Key.
8. Once both sides hold the session key, the grid server and the client each send a message to the other stating that from now on they will use that session key to encrypt and decrypt all messages between them.
9. In this way, an SSL secured session is established between the grid server and the client, which use symmetric key encryption (because it is much faster than asymmetric encryption) for encryption and decryption in this SSL secured pipeline.
10. Similarly, other grid users can authenticate themselves and get access to the various resources and applications in the grid.
11. Finally, once the session is completed, the Session Key is discarded.
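In practice the handshake is carried out by a TLS library rather than written by hand. The sketch below shows how the checks from step 4 and the requirement for a client certificate might be configured with Python's standard `ssl` module. The function names and the file path parameters are assumptions for this example. Modern libraries negotiate TLS, the successor of SSL, so the exact message flow differs in detail from the steps above.

```python
import ssl


def make_server_context(certfile=None, keyfile=None, cafile=None):
    """Server side: present a certificate and demand one from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED      # mutual authentication
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


def make_client_context(certfile=None, keyfile=None, cafile=None):
    """Client side: verify the server's certificate chain and host name."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


# Without file paths the contexts fall back to the system's trusted CAs
# and carry no certificate of their own. A real grid host would pass the
# (hypothetical) paths of its certificate, private key and CA file.
server_ctx = make_server_context()
client_ctx = make_client_context()
```

The client context has `check_hostname` enabled and `verify_mode` set to `ssl.CERT_REQUIRED` by default, which corresponds to the certificate and domain name checks in step 4.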
The process of mutual authentication can be repeated as long as both hosts have a valid digital certificate. This shows clearly how grid security uses both asymmetric and symmetric encryption techniques to encrypt and decrypt the data transferred between two grid hosts. The grid client first uses asymmetric encryption during the authentication process and, after being authenticated, switches to symmetric encryption, using a shared secret key for the encryption and decryption of the data.
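The combination of the two techniques can be sketched in a few lines. In this toy model the session key is wrapped with textbook RSA on demonstration primes, and the data is protected with an XOR keystream cipher. Neither construction is what a real SSL/TLS implementation uses, and every name in the code is an assumption for illustration.

```python
import hashlib
import secrets


def make_keypair(p, q, e=65537):
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))        # (public, private)


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 based keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


server_pub, server_priv = make_keypair(2**107 - 1, 2**127 - 1)

# Asymmetric part: the client creates a session key and wraps it
# with the server's public key.
session_key = secrets.token_bytes(16)
wrapped = pow(int.from_bytes(session_key, "big"), server_pub[1], server_pub[0])

# The server unwraps the session key with its private key.
unwrapped = pow(wrapped, server_priv[1], server_priv[0]).to_bytes(16, "big")

# Symmetric part: both sides now use the same session key for the data.
request = b"submit job 7"
ciphertext = xor_cipher(request, session_key)   # client encrypts
received = xor_cipher(ciphertext, unwrapped)    # server decrypts
```

The slow asymmetric operation is used once per session for a 16-byte key, while the fast symmetric cipher handles all the traffic that follows.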
Apart from these components of the Grid Security Infrastructure, some other factors also matter a lot when we talk about security in grid computing. These components are optional in some organisations or networks, but they are considered standard in others. We discuss some of them one by one and explore how they fit into the infrastructure of grid computing.
1. Physical Security.
In a grid infrastructure, the physical environment is also a major consideration. It involves the basic physical security practices needed for the various hosts or computers in a grid computing environment. If the grid servers are not physically secured, or anyone can get access to the physical devices operating as servers or clients, then it does not matter how strongly the security policies are designed or how robust the cryptographic techniques are: the services or applications can be interrupted very easily. An intruder can, for example, power off the servers or tamper with important information. Therefore, there should be appropriate methods or policies to control physical access to the various grid components such as grid clients, grid servers and the certificate authority. The CA server is one of the most important and sensitive components in grid computing, because anyone getting access to the CA server can tamper with the certificate of any user and can impersonate anyone in the grid. For security purposes, the place where the CA server is located should be dedicated, properly locked and robust enough to prevent unauthorised access. There should also be a provision to log and control all access to the CA server, so that only the personnel responsible for the CA server can enter and use it. An uninterrupted power supply is very important for the servers in grid computing, so a UPS (uninterruptible power supply) must be used to power the servers, and for emergencies there should be backup mechanisms so that, in the worst case, the servers can back up their data automatically and shut down properly.
In my view, the segment of the network where all the sensitive server machines such as the CA server and the proxy server are installed must be separated from the rest of the network logically as well as physically. This segment must then be given tighter protection. For example, it should be separated by a firewall that allows only the traffic intended for these sensitive machines and drops all other traffic.
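The filtering rule suggested above, which allows only traffic intended for the sensitive segment and drops the rest, can be expressed as a small predicate. The network range and the port number below are made-up values for illustration, and a real firewall would of course be configured in its own rule language rather than in Python.

```python
import ipaddress

# Hypothetical segment that holds the CA server and the proxy server.
SENSITIVE_SEGMENT = ipaddress.ip_network("10.0.50.0/24")
# Hypothetical set of ports on which these machines accept traffic.
ALLOWED_PORTS = {443}


def allow(dst_ip: str, dst_port: int) -> bool:
    """Allow a packet only if it targets the segment on a permitted port."""
    in_segment = ipaddress.ip_address(dst_ip) in SENSITIVE_SEGMENT
    return in_segment and dst_port in ALLOWED_PORTS


decisions = {
    "to CA on 443": allow("10.0.50.7", 443),
    "to CA on telnet": allow("10.0.50.7", 23),
    "outside segment": allow("10.0.60.7", 443),
}
```

Only the first packet is allowed. Everything that does not match the rule is dropped by default.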
2. Operating System Security.
In a dynamic grid environment, another important component you need to secure is the operating system of your machines. The operating system contributes a lot to the security policies by restricting access for unwanted or unauthorised users and by detecting unauthorised attempts to get access to the grid resources. Some measures which can be used to protect the operating system from intruders or sniffers are listed here:
- There are always lots of processes running in an operating system. Remove or disable all processes that the server does not need, such as mail or FTP servers.
- Unnecessary groups and users should also be removed.
- All the users having access to the grid server must use strong passwords, which decreases the probability of passwords being cracked and hence makes the system more secure.
- You should keep your grid server up to date with all the latest updates and security packs.
- Access to the different directories of the grid server should be restricted according to the authorisation policies.
- Logging and auditing of the grid server should be enabled all the time.
- You should keep monitoring the important directories of the grid server by using the different host ID's.
- You should also enable file level restrictions for the files which are really important on the grid server.
- Review the operating system periodically so that you are aware of all the major changes in it.
- Last but not least, anti-virus protection should be enabled at all times.
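As one concrete instance of the file level restrictions mentioned in the list above, the sketch below checks whether a sensitive file, for example a host's private key, is accessible only to its owner, and tightens the permissions if it is not. It assumes a POSIX system. The helper names are invented, and the temporary file merely stands in for a real key file.

```python
import os
import stat
import tempfile


def is_owner_only(path: str) -> bool:
    """True if neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def restrict_to_owner(path: str) -> None:
    """Set the permissions to read and write for the owner only (0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Demonstration on a temporary file standing in for a private key.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    key_path = handle.name
os.chmod(key_path, 0o644)            # deliberately too permissive

before = is_owner_only(key_path)
restrict_to_owner(key_path)
after = is_owner_only(key_path)
os.remove(key_path)
```

A periodic review of the operating system could run such a check over all key and certificate files and report those for which it fails.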
3. Use of Firewalls.
Firewalls are security components that are used to restrict which users or which traffic can get access to any part of the network. A firewall allows the authorised users to access the devices or resources and drops all the rest. In a grid computing environment, firewalls can be used to separate the sensitive part from the rest of the network and to protect those sensitive parts which require additional security compared with the other parts of the grid. By using firewalls, we can restrict access to individual computers as well as to whole networks. The firewalls used in the grid security infrastructure should be designed and analysed carefully before they are implemented, because they are a very important part of the whole security infrastructure of the grid.
4. Host Intrusion Detection.
Host intrusion is the name given to the process in which an unwanted or unauthorised person tries to get access to any machine, device or resource of the grid in an unauthorised manner. This is generally done by hackers or sniffers with the intention of stealing sensitive information and then using that information illegally. We should therefore have security policies and techniques strong enough to defeat these intruders and to protect our data from being stolen by them. In grid computing, the best way to secure our hosts or machines from intruders is to use a host Intrusion Detection System, i.e. an IDS product. As various software applications running on the local machines or hosts store important information or files, it is very important to protect the local hosts from intruders, and this can be done with an IDS product. The IDS raises the level of defence against those who are trying to get access to the files or important information on the local hosts. It sends an alert to the central host or server if it detects any change to the information in any file on any host, and hence also alerts the other hosts or machines about that change. An IDS gathers important information from the various hosts or parts of the network and then analyses that information to identify security weak points. The main functions of an IDS product are:
- Analysing and monitoring the different activities of the system and of the users.
- Understanding the system's configuration and analysing the related vulnerabilities.
- Gathering all the important information about the whole system and assessing the file integrity of the system.
- Staying active against all attacks and recognising the typical patterns of the various attacks.
- Analysing the user policies and tracking any violation of them.
- Analysing the patterns of the activities performed and reporting any abnormal activity pattern immediately.
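The file integrity assessment mentioned in the list above can be sketched as a comparison of cryptographic fingerprints against a stored baseline. This is a minimal illustration and not the design of any particular IDS product: the function names and the watched file are assumptions, and a real product would also protect the baseline itself and forward the alerts to a central server.

```python
import hashlib
import os
import tempfile


def fingerprint(path: str) -> str:
    """Return the SHA-256 digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def snapshot(paths) -> dict:
    """Record a baseline of fingerprints for the watched files."""
    return {path: fingerprint(path) for path in paths}


def changed_files(baseline: dict) -> list:
    """List watched files that were removed or whose contents changed."""
    return [path for path, known in baseline.items()
            if not os.path.exists(path) or fingerprint(path) != known]


# Demonstration in a temporary directory with a hypothetical mapping file.
with tempfile.TemporaryDirectory() as directory:
    watched = os.path.join(directory, "mapfile")
    with open(watched, "w") as handle:
        handle.write("/O=ExampleGrid/CN=alice griduser1\n")

    baseline = snapshot([watched])
    clean = changed_files(baseline)          # nothing has changed yet

    with open(watched, "a") as handle:       # simulated tampering
        handle.write("/O=ExampleGrid/CN=mallory root\n")
    alerts = changed_files(baseline)
```

After the simulated tampering, `alerts` contains the path of the modified file, which is the event an IDS would report.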
Security Risks In Grid Computing.
In a grid computing environment, the Public Key Infrastructure (PKI) and the Grid Security Infrastructure (GSI) provide the services needed to make the environment secure and robust. Although these two contribute a lot to protecting the system from attacks and vulnerabilities, they do not guarantee that there are no security risks in grid computing. There are always some risks or vulnerabilities in the security of grid computing, such as PKI vulnerabilities and grid server vulnerabilities. No processes, policies or security tools can completely secure any network environment. Risks are always involved, but by using proper tools and policies we can reduce their effects to a negligible level. Some of the vulnerabilities associated with the security of grid computing are described here to give a broader view of the resulting risks, so that proper steps can be taken to overcome their effects and provide better security for the system.
Public Key Infrastructure (PKI) Vulnerabilities.
Building a well structured PKI environment does not by itself mean that the network is fully secure. There can be many vulnerabilities associated with a PKI environment, and you should be aware of them. It is important to work with an open mind and with the understanding that some risk is always involved in any networked environment. In a PKI environment specifically, the major concerns are the theft or leakage of the users' digital certificates and the location of the private keys. The following areas are considered the most sensitive:
- Impersonation: an intruder or an organisation that obtains certificates by fraudulent or unfair means can act on behalf of a user or a network in the environment.
- Theft of the private key: an intruder or hacker who gets hold of a user's private key can use it together with the valid original certificate in an unauthorised manner and gain access to the whole network.
- Compromise of the CA private key: the most critical component in PKI is the private key of the certificate authority (CA), because all the users trust the CA. Anyone who gets access to this key can sign invalid certificates for anyone and can therefore impersonate any user in the whole network.
- Automated trust decisions: some decisions are taken by the machine automatically according to the situation, without a person reviewing them. These automated trust decisions can sometimes result in fraud.
Grid Server Vulnerabilities.
In a dynamic grid environment, any grid server or workstation can be a potential point of attack for an intruder or hacker. It is therefore very important to isolate and protect every grid user from the networks and users that are not allowed to access the grid and its resources, and this is only possible by applying good security mechanisms and policies. No programme or firewall can protect your grid users completely from intruders and hackers, but some common measures can contribute a great deal to the security of the grid. The following aspects of the grid server are especially important to protect:
- Physical security is an important part of the overall security policy in grid computing. It limits who can physically reach the grid server.
- Protecting all the directories from unauthorised access also contributes to security.
- We need to be very careful about the theft of private keys and digital certificates and should take appropriate measures to protect them.
- We should also take care of the vulnerabilities in the applications and processes running on the grid server.
- We should be aware of any minor or major modification to the grid map file and should keep up with the latest security patches of the operating system.
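One of the measures above, protecting private keys from unauthorised access, can be checked with a short script. The sketch below is only an illustration under stated assumptions: it assumes a POSIX system, and it treats a key file as safe only when the group and other users have no permission bits at all on it. The function names (is_private, audit_key_files) are hypothetical.

```python
import os
import stat


def is_private(path):
    """Return True if only the owner has any permission bits on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any read, write or execute bit for group or others makes the file unsafe.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def audit_key_files(paths):
    """Return the paths that group or other users can access (hypothetical helper)."""
    return [p for p in paths if not is_private(p)]
```

An administrator could run audit_key_files over the private key files on the grid server and tighten the permissions of every path it returns. The check says nothing about who owns the file or whether the key has already been copied, so it complements the other measures and does not replace them.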