VoIP was designed to carry human voice conversations over data networks such as LANs, WANs and the Internet. With this technology, existing data networks can be used to transfer voice data instead of the traditional Public Switched Telephone Network (PSTN). The technology has many advantages, which follow from the rapid growth of data networks. Benefits of Voice over IP (VoIP) in comparison with the PSTN include:
- Cost savings
- Infrastructure savings
- New applications (such as voice mail, video conferencing etc.)
The basic protocols on which the technology relies are H.323 and SIP. H.323 is the protocol proposed by the ITU (International Telecommunication Union) for VoIP. Its strength lies in the range of implementations it supports, from a single voice or video transmission to multiple simultaneous transmissions (conferencing), along with its compatibility with other networks such as the PSTN when needed. The protocol also provides the means to conserve bandwidth and network resources, since voice and particularly video can affect the performance of the network when they are present.
SIP (Session Initiation Protocol) is the IETF's proposal for VoIP and was developed for setting up multimedia sessions. The features that made SIP so successful are that it is easy to implement, scalable and extensible. SIP also provides encryption and authentication to protect the transmission from attacks. To support the VoIP load between high-speed LANs and WANs, ISPs use Frame Relay or ATM in their networks as the preferred link-level protocols.
Frame Relay is a popular solution for ISPs. Service providers use it in their core networks to provide high-speed connections between LANs and WANs at reasonable cost. With Frame Relay, users transmit data traffic through permanent virtual circuits (PVCs), which provide access at any time without the high cost of a leased line. Depending on the importance of the data to be transmitted, customers choose the level of QoS they want and pay according to that level.
Another solution adopted by service providers for their core networks is Asynchronous Transfer Mode (ATM). This technology also gives users permanent connections, which are switched in hardware, a feature that provides fast processing and speeds of up to 10 Gbps. ATM is designed to carry real-time video and voice simultaneously. The ATM architecture uses switches that set up logical circuits, which ensure a very high quality of service (QoS).
The aim of this dissertation is to study VoIP and how it works. A further goal is to examine how the technology can be used in the enterprise (user requirements, architectures, services, QoS) and how ISPs can support it. A medium-sized enterprise scenario will be presented (features, requirements) and simulated in order to see how VoIP performs and to what degree the services of the enterprise are affected. The case study is performed with OPNET IT Guru Academic Edition 9.1.
Chapter 1: VoIP Protocols
1.1 H.323 Overview
H.323 is actually a suite of protocols combined to support all kinds of data communication, such as voice and video, over existing data networks. The suite was developed and proposed by the ITU-T as a solution for voice and video transmission. Its advantage is that it was designed to work above the transport layer of the OSI model. This design makes H.323 usable on most existing data networks, since the OSI model acts as a common reference point. Examples of such networks are LANs, MANs, WANs and of course the Internet. We can conclude that this compatibility contributed to the rapid growth of H.323 and made its implementation popular.
The ITU-T has released many protocols according to the type of network and traffic to be supported. These include H.310, H.320, H.321, H.322 and H.324. Each of these protocols was designed to work with one type of network. For example, H.310 and H.320 were intended for ISDN, H.321 was developed for ATM and H.322 for LANs. Finally, H.324 was designed to work over the PSTN. All these protocols function without problems until they need to communicate with protocols other than themselves. The main reason for releasing H.323 was interoperability. With H.323, users on different types of network can communicate without considerable problems. This was the great success of the H.323 protocol.
1.2 Scope of H.323
H.323 allows for voice and video communication between two or more users through the same or different data networks, without focusing entirely on quality of service (QoS). Due to its design (suite of protocols), H.323 has the ability to support various features such as the following:
Point-to-point and multipoint conferencing support: With H.323, simultaneous transmissions between more than two users can be achieved without extra hardware or software. Even when a dedicated unit such as a multipoint control unit (MCU) is used, H.323 can decentralize the conference so that users can choose which participants to connect to. This feature introduces flexibility into the communication.
Audio and video codecs: The H.323 recommendation specifies a baseline audio codec and video codec for conferencing. H.323 does not restrict the use of other codecs, regardless of their efficiency. The only restriction is that the codecs the participants agree on must be supported by all of them.
Management and accounting support: H.323 allows for better management of calls and network resources. Policies such as call and time restrictions can also be applied. The network can therefore be administered easily, and it also provides adequate information for accounting services such as billing.
Security: Another important feature of H.323 is the security it offers to participants through measures such as encryption and authentication.
Additional services: Last but not least, the ITU-T developed H.323 with an eye to the future, given the rapid growth of multimedia communications over data networks. Because of its design, H.323 can easily be adapted to future technologies by adding features. This gives the protocol a great advantage.
1.3 H.323 Coverage
In recent years data networks such as the Internet and local area networks have grown rapidly, and new technologies are constantly being developed. Companies and individual users alike take advantage of existing data networks for multimedia communications instead of using the traditional telephone network. This explosive growth of multimedia made it necessary to develop a broad and flexible standard such as H.323.
Flexibility: H.323 provides many services, so it can serve the individual user, the company and the entertainment sector with the same efficiency. As new technologies are constantly introduced into multimedia communications, this flexibility allows H.323 to adapt with no significant effort.
Standardization: At first, faced with the rapid growth of multimedia communications, many companies designed hardware based on proprietary protocols. As a result, products from different companies malfunctioned when they had to communicate with each other. Because of the design of H.323 and its popularity, most vendors based their products on it, which in turn led to the protocol being adopted at an even higher rate as a solution for multimedia communications.
Internetworking: As mentioned earlier, the success of H.323 is based on its ability to support interoperability between traditional switched circuit networks (SCN) and data networks. Users on those different networks can communicate with H.323 users by adopting the appropriate protocol for the network's underlying technology. This also allows companies to upgrade from a traditional network to a data network without encountering serious problems.
Integrated services: The H.323 standard provides, among other things, the framework for expanding its support to additional features such as email, fax and voice mail, and even for acting as a call center. A few services have already been integrated in the H.450.x protocols (such as call transfer and call forwarding). Other services could be added in the future according to the needs of each enterprise. This is the source of H.323's popularity and strength: it was designed to be flexible and to adapt to future technologies through its ability to integrate them.
1.4 H.323 Components
The H.323 suite of protocols describes the elements needed for the protocol to provide multiple simultaneous multimedia connections. These elements are very important and, used properly, can boost the efficiency of H.323. They can be divided into four categories:
- Terminals
- Gateways
- Gatekeepers
- Multipoint control units (MCUs)
As H.323 defines it, the terminal is the element that makes real-time two-way communication possible with other units of the network, which can be another terminal, a gateway or a multipoint control unit. The traffic between those units can consist of audio, video (for example, moving colour pictures) or data. By design, terminals must provide at least voice capability, supporting the G.711 audio codec, while data and video are not compulsory. According to the protocol, each terminal must provide:
- H.245 in order to communicate their capabilities and establish channels
- H.225 for call synchronization and establishment
- RAS for resource allocation, admission control and status information
- RTP/RTCP for time stamps, sequence numbers and feedback
The gateway is an optional component, as defined by the H.323 recommendation, that acts as an intermediate unit for connectivity with endpoints in different networks (for example, between traditional and data networks). Gateways are therefore not needed when calls are established between endpoints in the same network. Apart from translation, gateways can establish or tear down connections on both data networks and switched circuit networks, which makes this element essential for interoperability between different networks. In general, the gateway can be thought of as a bridge between users in different networks. An endpoint can send and receive data using different gateways, which introduces flexibility into the network.
A gatekeeper, like a gateway, is not compulsory under the H.323 recommendation, but when present it provides the H.323 endpoints with call control services. This element acts as the coordinator of the network's operation. When gatekeepers are present in the H.323 network, the clients are therefore obliged to use the services they offer. The recommendation allows more than one gatekeeper to be used, and they can cooperate with each other. According to H.323, gatekeepers must also support:
- addressing, authorization and authentication
- bandwidth management, accounting, billing
- call routing services
Multipoint Control Units
As its name implies, the MCU is the H.323 element that allows more than two users to communicate simultaneously, providing the conference feature of H.323. The MCU may establish a point-to-point connection and, if needed, upgrade it to multipoint without tearing down the connection. This operation is what makes the MCU so important for H.323. The MCU also allocates resources for the conference, negotiates between the terminals so that they agree on the audio or video coder/decoder and, in some cases, takes charge of the media stream.
These four categories of H.323 elements are considered discrete units, but the H.323 recommendation does not prevent their functions from being combined into a single unit, something many vendors take advantage of to develop multifunction products.
1.5 H.323 Protocol Suite
H.323 is actually a suite of protocols combined to support all kinds of data communication, such as voice and video, over existing data networks. It was designed to work above the transport layer, so it can be applied independently of the underlying network. The H.323 suite consists of the following:
- audio codecs
- video codecs
- H.225 registration, admission and status (RAS)
- H.225 call signalling
- H.245 control signalling
- H.235 security protocol
- Real-time Transport Protocol (RTP)
- Real-time Transport Control Protocol (RTCP)
An audio codec on the transmitting H.323 terminal encodes the audio signal picked up by the microphone for transmission; at the destination terminal the signal is decoded and played back through the speaker. Because audio is the basic and most common service in H.323, every terminal in the H.323 network must provide at least one audio codec, as defined by the G.711 recommendation (audio coding at 64 kbps). Additional codecs from other ITU-T recommendations may be used as needed.
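The 64 kbps figure for G.711 follows from its sampling parameters (8000 samples per second, 8 bits per sample). The sketch below reproduces that arithmetic and estimates the bit rate once each voice packet is wrapped in RTP, UDP and IPv4 headers. The 20 ms packet interval is an assumption made for illustration, and link-layer overhead is ignored.

```python
# Bandwidth arithmetic for one G.711 voice stream.
# Assumptions (not from the text): 20 ms of audio per packet,
# IPv4 without options, no link-layer overhead.
SAMPLE_RATE_HZ = 8000        # G.711 samples per second
BITS_PER_SAMPLE = 8          # G.711 uses 8-bit companded samples
PACKET_INTERVAL_MS = 20      # assumed packetization interval
HEADER_BYTES = 20 + 8 + 12   # IPv4 + UDP + RTP fixed headers


def codec_bitrate_bps():
    """Raw codec bit rate, before any packet headers are added."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def payload_bytes_per_packet():
    """Bytes of encoded audio carried in one packet."""
    samples = SAMPLE_RATE_HZ * PACKET_INTERVAL_MS // 1000
    return samples * BITS_PER_SAMPLE // 8


def ip_bitrate_bps():
    """Bit rate on the IP layer, including RTP/UDP/IPv4 headers."""
    packets_per_second = 1000 // PACKET_INTERVAL_MS
    packet_bytes = payload_bytes_per_packet() + HEADER_BYTES
    return packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    print("codec bit rate:", codec_bitrate_bps(), "bps")
    print("payload per packet:", payload_bytes_per_packet(), "bytes")
    print("bit rate with headers:", ip_bitrate_bps(), "bps")
```

Under these assumptions the headers add a quarter to the codec bit rate, which is why packetization choices matter when planning VoIP capacity.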
A video codec on the transmitting H.323 terminal encodes video from a camera so that it can be transmitted to the destination terminal, where it is decoded and sent to the video display. Because the ITU-T recommendation does not make video support compulsory, the use of video codecs is also optional. Nevertheless, an H.323 terminal that supports video conferencing must provide video encoding and decoding based on the H.261 recommendation.
H.225 Registration, Admission, and Status
H.225 is the protocol that H.323 network elements use to communicate with each other. This communication covers network resource status, connection information, registration status and admission control. H.225 creates a separate, independent connection, which must be established before other operations can take place. It has a very important function because all the feedback of the H.323 network is carried by this protocol.
H.225 Call Control Signalling
To achieve communication between users in the H.323 network, the H.225 protocol is used for call setup. This is accomplished by exchanging messages defined by H.225. So that these messages can be exchanged without problems, H.225 uses a separate channel, which can be established between any of the H.323 components.
Figure 2 shows the call setup in H.323. First, the source endpoint sends an ARQ message to the gatekeeper requesting to connect to the destination endpoint, indicating the required bandwidth and the destination endpoint's name. The gatekeeper responds with the bandwidth requirements and the transport address. The source endpoint then sends a Setup message to that transport address (the destination). The destination replies with Call Proceeding if it accepts the call, or with Release Complete otherwise. The destination then requests its own requirements from the gatekeeper and, if it obtains them, alerts the source and sends a Connect message to complete the setup.
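The call setup described above can be sketched as an ordered list of messages. This is only an illustration of the sequence, not an implementation of H.225. The function name and its parameters are hypothetical, and the names ACF (admission confirm) and ARJ (admission reject) for the gatekeeper's replies are taken from H.225 RAS terminology rather than from the text.

```python
def h323_call_setup(callee_accepts=True, callee_admitted=True):
    """Return the ordered (sender, receiver, message) steps of a call setup.

    Hypothetical helper for illustration; it models only the message
    order, not the message contents or any timers.
    """
    steps = [
        ("source", "gatekeeper", "ARQ"),      # admission request
        ("gatekeeper", "source", "ACF"),      # bandwidth + transport address
        ("source", "destination", "Setup"),
    ]
    if not callee_accepts:
        steps.append(("destination", "source", "Release Complete"))
        return steps

    steps.append(("destination", "source", "Call Proceeding"))
    steps.append(("destination", "gatekeeper", "ARQ"))
    if not callee_admitted:
        # The gatekeeper refuses the destination, so the call is released.
        steps.append(("gatekeeper", "destination", "ARJ"))
        steps.append(("destination", "source", "Release Complete"))
        return steps

    steps += [
        ("gatekeeper", "destination", "ACF"),
        ("destination", "source", "Alerting"),
        ("destination", "source", "Connect"),
    ]
    return steps


if __name__ == "__main__":
    for sender, receiver, message in h323_call_setup():
        print(f"{sender:>11} -> {receiver:<11} {message}")
```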
H.245 Control Signalling
H.245 plays an important role in the H.323 network because it is used to exchange control messages between the network's components, along with essential information about the connections. Its overall operation covers the following:
- Capabilities exchange
- Setup and tear down of channels
- Flow control messages
- General commands and indications
Figure 3 shows the operation of H.245. The source sends its transport address and capabilities to the destination (TerminalCapabilitySet). The destination acknowledges the received message (TerminalCapabilitySetAck) and sends its own capabilities (TerminalCapabilitySet). When the source acknowledges this last message, the H.245 channel has been established. The source then sends a request to open a logical channel (OpenLogicalChannel), together with the type of data and the transport address. The destination responds with a corresponding request of its own and with an acknowledgement when it is ready to receive data. After it receives the acknowledgement, the source can start sending data through the channel.
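The purpose of the capabilities exchange is to find media formats that both terminals support before a logical channel is opened. A minimal sketch of that idea follows. The function names, the preference rule (first common codec in the source's order) and the codec lists are assumptions made for illustration, since H.245 itself describes capabilities in far more detail.

```python
def common_capabilities(source_caps, destination_caps):
    """Return the codecs both terminals support, in the source's order."""
    return [codec for codec in source_caps if codec in destination_caps]


def open_logical_channel(source_caps, destination_caps):
    """Pick a codec for the channel, or return None if nothing matches.

    Hypothetical helper: the real OpenLogicalChannel message also carries
    the transport address and the type of data.
    """
    common = common_capabilities(source_caps, destination_caps)
    if not common:
        return None
    return {"codec": common[0]}


if __name__ == "__main__":
    source = ["G.729", "G.711", "G.723.1"]   # illustrative capability sets
    destination = ["G.711", "G.729"]
    print(common_capabilities(source, destination))
    print(open_logical_channel(source, destination))
```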
H.235 Security protocol
The H.235 security protocol is used to secure procedures such as control, signalling, multimedia communication and data conferencing (audio, video). It is the heart of the H.323 security mechanism. It will be described in more depth in chapter 3.
Real time Transport Protocol (RTP)
The Real-time Transport Protocol (RTP) is used when time-sensitive data such as voice and video must be transmitted in real time. This kind of data is carried over the UDP transport layer protocol. The problem with UDP is that it does not guarantee that data will reach the destination in time or intact. For this reason RTP adopts several features such as sequence numbers, time stamps and checksum computation. RTP also has the advantage that it can work with transport layer protocols other than UDP.
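To show where the sequence number and time stamp live, the sketch below packs and parses the 12-byte fixed RTP header (version 2, no padding, no extension, no contributing sources). The function names and the sample values are illustrative. Payload type 0 is the static type assigned to G.711 mu-law audio.

```python
import struct

# Fixed RTP header layout: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit synchronization source (SSRC), network order.
RTP_HEADER = struct.Struct("!BBHII")


def build_rtp_header(sequence, timestamp, ssrc, payload_type=0, marker=False):
    """Pack a minimal RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Unpack the first 12 bytes of an RTP packet into a dictionary."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack(packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    header = build_rtp_header(sequence=1, timestamp=160, ssrc=0x1234)
    print(header.hex())
    print(parse_rtp_header(header))
```

The receiver uses the sequence number to detect lost or reordered packets and the time stamp to play the samples back at the right moment.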
Real Time Transport Control Protocol (RTCP)
RTP and RTCP are almost identical in their operation, except for the data they carry. Just as RTP is responsible for transmitting real-time voice and video, RTCP is intended to transmit control information. It is very important for the operation of the H.323 network because of the feedback it provides, which allows the network elements to adjust to the condition of the network.
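One example of the feedback that RTCP reports is interarrival jitter. The sketch below implements the running estimate defined for RTP receivers, in which each new packet moves the estimate one sixteenth of the way towards the latest difference in transit time. The function name and the sample time values are illustrative, and both lists are assumed to use the same time units.

```python
def interarrival_jitter(send_times, receive_times):
    """Running jitter estimate over a sequence of packets.

    For consecutive packets, d is the change in transit time:
        d = (R[i] - R[i-1]) - (S[i] - S[i-1])
    and the estimate is updated as J = J + (|d| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (receive_times[i] - receive_times[i - 1]) \
            - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


if __name__ == "__main__":
    sent = [0, 160, 320, 480]
    received = [1000, 1160, 1336, 1480]   # third packet arrives late
    print(interarrival_jitter(sent, received))
```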
SIP (Session Initiation Protocol)
1.6 SIP Overview
Another solution for real-time multimedia communication was introduced by the IETF and is known as the Session Initiation Protocol (SIP). Although it came after the successful H.323 protocol, this standard managed to become accepted worldwide as a VoIP protocol. Its strength lies in its design, which works at the application layer of the OSI model. This design makes SIP usable on most existing data networks, since all these networks are based on the OSI model. Examples of such networks are LANs, MANs, WANs and of course the Internet. Some of the features of SIP are described below:
- SIP maintains detailed tables of network information, such as addresses and names, in order to set up calls quickly with any user in the network.
- One of the key features of SIP is the Session Description Protocol (SDP), which allows SIP to find out what types of media the involved parties can support. SDP ensures that the participants in a conference have no compatibility issues. Based on the information that SDP provides, SIP establishes the connection only when all the participants can support the media, saving network resources.
- SIP also lets the user who initiates the call know whether the destination is available before the connection is established. With this feature SIP frees the network from unnecessary connections, which would consume bandwidth and network resources.
- SIP can also alter connections without tearing them down. Call forwarding and redirection are examples of such management. SIP can provide these services without users experiencing any change in their status. This is especially useful for conferences, which become more flexible because SIP can add or remove parties without the remaining participants having to stop communicating.
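The media check that SDP makes possible can be sketched by comparing the audio formats listed in two session descriptions. The descriptions below are deliberately simplified (a real SDP body has further mandatory lines), and the function names, addresses and ports are assumptions for illustration. Payload types 0, 8 and 18 are the static RTP types for G.711 mu-law, G.711 A-law and G.729.

```python
def audio_payload_types(sdp_text):
    """Return the RTP payload types listed on the first m=audio line."""
    for line in sdp_text.splitlines():
        if line.startswith("m=audio "):
            # Layout: m=<media> <port> <transport> <format> <format> ...
            return line.split()[3:]
    return []


def common_formats(offer, answer):
    """Formats present in both descriptions, in the offer's order."""
    answer_types = set(audio_payload_types(answer))
    return [pt for pt in audio_payload_types(offer) if pt in answer_types]


# Simplified example descriptions (addresses are documentation ranges).
OFFER = "v=0\r\nc=IN IP4 192.0.2.10\r\nm=audio 49170 RTP/AVP 0 8 18\r\n"
ANSWER = "v=0\r\nc=IN IP4 192.0.2.20\r\nm=audio 3456 RTP/AVP 8 0\r\n"

if __name__ == "__main__":
    shared = common_formats(OFFER, ANSWER)
    print("common formats:", shared)
    print("call can proceed:", bool(shared))
```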
1.7 SIP Architecture
The SIP protocol describes two main elements for the network, the user agent and the network server:
- SIP User Agents (UAs) are the endpoints of the network. Both hardware and software devices implementing SIP (such as an IP phone) can be considered user agents. A UA consists of two basic components:
- User Agent Client (UAC): the component that initiates the call.
- User Agent Server (UAS): the component that serves the call.
- The SIP Network Server is responsible for managing signalling and call establishment. It maintains detailed tables with information for the network such as addresses and names in order to achieve call setup. Three types of such servers exist:
- SIP Register Server. The role of a register server is to build a map of the network from user registrations, holding addresses and related information such as domains. With these mappings SIP is able to establish connections between all the users of the network. This information can be exchanged between servers for redundancy and faster access.
- SIP Proxy Server. To make sure that call requests reach their destination, SIP uses the nearest proxy to forward these requests to other proxies across the network, creating a search tree which ensures that requests are not lost. Proxy servers can be divided by their operation into stateless and stateful. A stateless proxy does not keep any information once the call request has been sent, whereas a stateful proxy keeps knowledge of past requests in order to set up calls faster.
- SIP Redirect Server. When a SIP user wants to make a call but the destination address is unknown, this type of server redirects the user to another server which may hold the destination address or know where to find it.
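The cooperation between a register server and a redirect server can be sketched with an in-memory mapping from a user's public address to the address where the user can currently be reached. The class and method names are hypothetical and the addresses are examples. The status codes 302 (Moved Temporarily) and 404 (Not Found) are standard SIP response codes.

```python
class RegisterServer:
    """Keeps the bindings created by REGISTER requests (in memory only)."""

    def __init__(self):
        self._bindings = {}

    def register(self, public_address, contact):
        """Record where a user can currently be reached."""
        self._bindings[public_address] = contact

    def lookup(self, public_address):
        """Return the current contact, or None if the user is unknown."""
        return self._bindings.get(public_address)


class RedirectServer:
    """Answers a call request with the callee's current contact address."""

    def __init__(self, register_server):
        self._register_server = register_server

    def handle_invite(self, public_address):
        contact = self._register_server.lookup(public_address)
        if contact is None:
            return (404, None)       # Not Found
        return (302, contact)        # Moved Temporarily: retry at contact


if __name__ == "__main__":
    registrar = RegisterServer()
    registrar.register("sip:bob@example.org", "sip:bob@192.0.2.20:5060")
    redirect = RedirectServer(registrar)
    print(redirect.handle_invite("sip:bob@example.org"))
    print(redirect.handle_invite("sip:carol@example.org"))
```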
1.8 Call Establishment
To establish a call, SIP exchanges the following messages between the network elements:
- INVITE: Starts a connection.
- ACK: Confirms that the final response to an INVITE has been received.
- BYE: Terminates a connection.
- CANCEL: Cancels a pending INVITE request.
- OPTIONS: Exchanges capabilities.
- REGISTER: Registers a user's address with the register server.
To establish a connection in a SIP network, the user who initiates the call transmits an INVITE message to the redirect server in order to obtain the destination address. The redirect server then contacts the register server to obtain the destination address from its database. The redirect server returns the address to the user, who acknowledges its receipt. Using the destination address, the user transmits a call request to the recipient, who responds to the request. When the caller receives this response, it transmits an acknowledgement. Once the connection is established, RTP takes over the transmission of data. When the transmission is over, the called user sends a BYE message to terminate the connection and the caller acknowledges this message.
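SIP messages are plain text, so the INVITE that opens this exchange can be sketched as a request line followed by headers. The helper below is a hypothetical illustration with no message body. The user names, host name and identifiers are example values, and a real user agent would generate the branch, tag and Call-ID randomly.

```python
def build_invite(caller, callee, call_id, local_host, branch, tag):
    """Build a minimal SIP INVITE request without a body.

    Hypothetical helper for illustration. The branch value must start
    with the magic cookie "z9hG4bK" required by SIP.
    """
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_invite(
        caller="alice@example.com",
        callee="bob@example.org",
        call_id="a84b4c76e66710",
        local_host="pc33.example.com",
        branch="z9hG4bK776asdhds",
        tag="1928301774",
    ))
```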
1.9 SIP Implementations
One of the main reasons for the popularity and wide acceptance of SIP is that it can be applied as a solution in many situations. Its flexible design makes it well suited to devices such as IP phones, media gateways, internet call centers and application servers. These implementations are described in more depth below.
Unified Communications: Besides the flexibility that SIP provides for its connections, it can also be used to unify many components and features in a single application. For example, with SIP, web interfaces can be extended with multimedia plug-ins and with richer management of profiles and connections. Existing URL and DNS services are also reused for maximum compatibility.
Unified Messaging: With this implementation users are freed from having several different devices, each for a different use and application. For example, telephony, email, fax and other communication technologies can be integrated into a single portable device that gives the user more flexibility.
Directory Services: This feature gives the administrator complete knowledge of the network's resources and devices (such as printers, PCs, servers and other network elements). The database can be configured so that any user in the network can search it for a device that offers the service the user needs. Finally, the administrator can use this database to apply policies based on time and access-rights restrictions.
IP-PBX functionality: PBX (Private Branch Exchange)  implementation is another important feature which allows enterprises to use this technology for their corporate network. It also allows for companies to migrate from traditional technologies to VoIP without having compatibility or interoperability issues.
Mobile phones / PDAs: Because of its simplicity and its modest requirements, SIP is an ideal solution for mobile devices. The user can perform the same actions with these devices as with traditional equipment. When SIP is combined with mobile devices that support wireless access to data networks, users gain access to even more services. This portability has made the protocol very popular and led many vendors to adopt it in products designed not only for professional use but also for ordinary users.
Desktop Call Management: As its name implies, this feature allows multimedia services to be managed through other computer applications. It is very important because it lets vendors integrate SIP into existing popular applications. Users therefore adapt to the technology faster, since it can be accessed through well-known programs.
Chapter 2: VoIP Security
Voice over IP technology allows existing data networks to take over voice calls, offering more features and higher productivity along with significant cost savings. These advantages make the technology very attractive, but it has a disadvantage: these networks are exposed to attacks. Data networks suffer from hackers, who have many ways to steal or alter data. The existing security mechanisms that protect these networks efficiently cannot be used, at least in their current form, when VoIP is present. Many issues, such as the types of attack and the security measures, need to be addressed before VoIP can be implemented.
2.1 Attacks on VoIP
As mentioned earlier, attacks on these networks can take many forms. Some attacks are passive and only try to acquire important information, while others are more aggressive and can damage the data or the entire network. Some of the most frequent types of attack are eavesdropping, spoofing, denial of service, call redirection and replay attacks.
Eavesdropping is one of the most common attacks, in which hackers intercept the communication and capture VoIP packets in order to listen to the conversation. This type of attack can easily be performed with network analyzers found on the web, which sniff and capture packets that can then be converted from VoIP traffic into wave files. These wave files can be saved locally on the computer and played back with a media player. This type of attack does not usually affect the entire network, only the domain or subnet in which it takes place.
Replay attacks allow hackers to retrieve all kinds of information about the network. To perform this attack, the hacker captures a data packet and transmits it back into the network. This retransmission produces more traffic, from which the hacker acquires additional information about the entire network.
With packet spoofing, hackers change the source address of a packet so that the recipient thinks the packet came from a trusted member of the network and allows its delivery. Along with the source address, the caller ID number can also be changed when VoIP packets are sent. Many free programs exist that allow a phone number to be spoofed. An important issue with spoofing is how the identity of the participants can be protected.
Call redirection happens when the hacker alters the call so that it takes a route other than the original one. This redirection can cause improper use of the network's resources and affect its performance. It can also lead to other types of attack, since the network has been breached.
Denial of Service (DoS) is one of the most dangerous attacks that hackers can perform, because the network is overwhelmed with unnecessary traffic that consumes bandwidth and network resources. One of the first services to be affected is VoIP, which is sensitive to network changes. The attack can also be made VoIP-specific by using VoIP messages that create and tear down useless connections. Such messages include CANCEL, GOODBYE and PORT UNREACHABLE. This has a negative impact on VoIP conversations, since calls or hang-up procedures cannot be completed. The problem with DoS is that it compromises not only the VoIP service but the entire network.
Message alteration: Message alteration is a very serious attack because, although nothing about the message looks suspicious, it is not what the originating source sent. The attacker can easily alter the content of the message. This attack can be blocked by using encryption with a one-way hash function before sending the message.
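A hash-based integrity check of this kind can be sketched with Python's standard library. The example uses a keyed hash (HMAC with SHA-256), which is one possible choice and not necessarily the mechanism a given VoIP product applies. The shared key, the function names and the sample messages are assumptions for illustration.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> bytes:
    """Compute a keyed one-way hash (HMAC-SHA256) over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, key: bytes, tag: bytes) -> bool:
    """Check that the message still matches the tag sent with it."""
    return hmac.compare_digest(sign(message, key), tag)


if __name__ == "__main__":
    key = b"shared-secret"                       # assumed pre-shared key
    original = b"INVITE sip:bob@example.org SIP/2.0"
    tag = sign(original, key)

    altered = b"INVITE sip:mallory@example.org SIP/2.0"
    print("original accepted:", verify(original, key, tag))
    print("altered accepted:", verify(altered, key, tag))
```

Because the attacker does not know the key, an altered message cannot be given a matching tag, so the receiver detects the change.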
From the above, the attacks can be categorized by the way they affect the network. Eavesdropping and replay attacks affect confidentiality. Packet spoofing and message alteration affect the integrity of the data transmitted in the network. Call redirection affects both confidentiality and integrity. Finally, Denial of Service compromises the availability of the network. These three characteristics, confidentiality, integrity and availability, must all be addressed for the network to be secure.
2.2 Security Measures
Several mechanisms exist to protect the network against hacker attacks, such as encryption, firewalls, virtual LANs and network address translation. These security measures come at a price in network performance, however, which can affect VoIP. The next sections describe how these measures can be adapted when VoIP is present in the network.
Encryption is required to protect the network's privacy and to authenticate messages. The two main encryption methods in use are Transport Layer Security and IPsec. These methods can adopt several encryption algorithms, such as DES, 3DES, AES, RC4 and RC5. This wide range allows flexibility according to the network's needs. Each algorithm provides a certain level of security, but the stronger the security, the more the network's performance decreases and the more delay the processing introduces.
Almost all data networks use firewalls to filter traffic coming in and out of the network. This mechanism is the first line of defence against attacks. When firewalls have to cope with VoIP traffic, however, some issues emerge: the delay introduced into the network, and the thousands of ports that must be opened and closed for VoIP to work properly.
Virtual LANs (VLANs) can be used to isolate domains in the network. This makes it more difficult for an attacker to compromise the entire network. When one part of the network experiences problems, the other VLANs keep working without issues, so traffic can be routed through the VLANs that have not been attacked and the services continue to work. Virtual LANs work like the sealed compartments of a ship, which prevent the whole ship from flooding.
Network address translation (NAT) is another typical feature of the network. NAT substitutes private IP addresses with addresses that can be used outside the network. It can also act as a security measure, since internal addresses stay hidden. Along with these benefits, NAT can have a negative impact on VoIP operation. The translation from private to public addresses and back can cause problems because VoIP protocols cannot easily follow the resulting address and port assignments.
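A minimal sketch of the translation, assuming a simplified NAT that rewrites only the source address and port in the packet header, may help to show where the difficulty comes from. The classes `Packet` and `SimpleNat`, the addresses and the starting port are all hypothetical; a real NAT device is considerably more involved.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    payload: str  # signalling body, which may itself mention an address


class SimpleNat:
    """Toy NAT: maps (private ip, port) pairs to ports on one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out_map = {}  # (private_ip, private_port) -> public_port
        self.in_map = {}   # public_port -> (private_ip, private_port)

    def outbound(self, pkt):
        key = (pkt.src_ip, pkt.src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        # Only the header fields are rewritten; the payload is left untouched.
        return replace(pkt, src_ip=self.public_ip, src_port=self.out_map[key])

    def inbound_target(self, public_port):
        """Return the private (ip, port) behind a public port, or None."""
        return self.in_map.get(public_port)


nat = SimpleNat("203.0.113.5")
original = Packet("192.168.1.10", 5060, "send media to 192.168.1.10:49170")
translated = nat.outbound(original)
print(translated)
```

In this sketch the header now carries the public address, while the payload still advertises the private one. This mismatch is one way in which the address translation can confuse a VoIP endpoint on the other side.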
All these security measures help administrators protect their network from unauthorised access and attacks. However, they can also hold back the network's performance, affecting sensitive services such as VoIP. This is a very important issue, because users need not only security in their network but also QoS, an aspect that will be examined in Chapter 3.
2.3 H.323 Security
To provide secure communications in an H.323 network, the H.235 recommendation proposes several features that, when combined properly, provide maximum efficiency at the least cost. These features include authentication, integrity, privacy and non-repudiation. In the VoIP network, gatekeepers are responsible for authenticating users and for providing non-repudiation, so that users who take part in a conversation cannot deny their participation. Encryption can be adopted to provide privacy along with integrity. The two main encryption methods in use are Transport Layer Security (TLS) and IP Security (IPSec). The basic characteristic of H.235 is that it recognises a person instead of a device. There are three kinds of security profiles in H.235:
- Profiles that are based on a simple password.
- Profiles that make use of digital certificates and depend on a public key infrastructure.
- Profiles that combine passwords, digital certificates and a public key infrastructure.
The H.235 recommendation provides many encryption algorithms with various options, depending on the security requirements. The structure of H.235 is described below:
- The call signalling channel can be secured by IP Security (IPSec) or Transport Layer Security (TLS).
- Clients can be authenticated during the initial call setup, while the H.245 channel is being secured, or by exchanging certificates on the H.245 channel.
- The encryption algorithm for the media channel is determined by the capability negotiation mechanism.
- The initial key distribution is done by H.245 commands, such as OpenLogicalChannel and OpenLogicalChannelAck.
- The key distribution can be protected either by using the H.245 channel as a private channel or by encrypting the key.
H.245 messages and H.225 signalling can be protected by using TLS at the transport layer or IPSec at the network layer. VoIP packets transferred by RTP can be protected by encryption and authentication. H.235 also supports security protection for the H.225 terminal-to-gatekeeper signalling (RAS).
2.4 H.323 Security Issues
Firewalls cause the majority of the problems for VoIP networks using H.323. Stateless firewalls aggravate these problems, since this type of firewall cannot control this kind of traffic. The H.323 protocol uses dynamic ports for its traffic, and stateless firewalls find it difficult to track UDP queries and replies. One solution is to manually open ports in the firewall so that H.323 traffic can pass through it. This practice weakens the security of the network, because such an implementation would need to leave thousands of UDP ports and several H.323-specific TCP ports wide open. For this reason, stateful firewalls that can control H.323 traffic must be used in the VoIP network. This type of firewall lets VoIP traffic pass by dynamically opening and closing the ports that the H.323 protocol requires, which solves the problem of the large number of ports that a stateless firewall would have left open.
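The difference between the two approaches can be sketched as follows. This is a simplified model under stated assumptions, not a description of any real firewall product: the class `StatefulFirewall`, its method names and the static range 1024 to 65535 are hypothetical, while 1720 is the well-known TCP port for H.225 call signalling.

```python
H225_CALL_SIGNALLING_PORT = 1720  # well-known H.323 call signalling port

# Assumed static rule set for a stateless firewall: every dynamic port stays open.
STATIC_OPEN_PORTS = set(range(1024, 65536)) | {H225_CALL_SIGNALLING_PORT}


class StatefulFirewall:
    """Toy model: media ports are opened per call and closed when the call ends."""

    def __init__(self, signalling_ports):
        self.signalling_ports = set(signalling_ports)
        self.pinholes = {}  # call_id -> set of dynamically opened ports

    def on_media_negotiated(self, call_id, port):
        # Called when the signalling exchange reveals the media port of a call.
        self.pinholes.setdefault(call_id, set()).add(port)

    def on_call_ended(self, call_id):
        self.pinholes.pop(call_id, None)

    def open_ports(self):
        ports = set(self.signalling_ports)
        for call_ports in self.pinholes.values():
            ports |= call_ports
        return ports

    def allows(self, port):
        return port in self.open_ports()


fw = StatefulFirewall({H225_CALL_SIGNALLING_PORT})
fw.on_media_negotiated("call-1", 49170)
print(len(STATIC_OPEN_PORTS), "ports open statically;",
      len(fw.open_ports()), "open in the stateful model during one call")
```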
Even when a stateful firewall is used instead of a conventional firewall, it can still experience problems in managing the H.323 traffic that passes through it. H.323 traffic is encoded in a binary format based on ASN.1, which makes it difficult for stateful firewalls to manage. Specialized hardware (such as gateways) can therefore be used alongside firewalls to compensate for the problems that VoIP traffic creates for them. The drawback of this VoIP-aware hardware is the latency it introduces into the network.
In addition to the firewall problem, NAT adds further problems in VoIP networks that use the H.323 protocol. NAT works by translating the private address of the VoIP message into a public one so that VoIP traffic can travel outside the user's network. The problem is that NAT makes the conversation more complex for firewalls and VoIP hardware to manage, since the actual (private) address must be found and put back into the VoIP message at the destination before it reaches the recipient.
2.5 SIP Security
When SIP is used in an IP network, it can be exposed to a broad range of threats, such as identity problems and threats originating from the Internet. Displaying the correct ID of a caller is an important requirement for phone companies. The main reason the Internet is not safe is that there have never been enough security policies and equipment to keep a network totally safe from attacks originating from the web.
For a SIP-based network to be safe, it must confront two different types of threats: internal and external. External threats are attacks generated by an attacker who is not participating in the actual SIP-based communication. They are more likely to occur when the information crosses network boundaries that involve a third party or other untrusted networks.
The other type is the internal threat, which is normally launched by a SIP session participant. Because the participant is generating the attack, the participant can no longer be trusted. Firewalls are designed to protect the network from external attacks. For this reason, attacks from the inside are more complex, and it is much more difficult to find the source of the attack in order to repel it.
A number of mechanisms exist to provide security for the SIP protocol. Some of them are part of the SIP protocol and others are separate modules. These security mechanisms, which are described below, are covered in more depth in RFC 3261.
- Digest Authentication
- S/MIME Usage within SIP
- Confidentiality of Media Data
- TLS usage within SIP
- IPsec usage within SIP
- Security Enhancements for SIP
The Digest authentication mechanism, described in RFC 2617, is based on the MD5 algorithm, which calculates a checksum over certain parameters such as the user name and password, the request method that is used and the request URI (Uniform Resource Identifier; a SIPS URI denotes secure SIP, much as HTTPS does for HTTP). With this mechanism the password is never sent in plain text, so an attacker cannot acquire it without significant, time-consuming effort.
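As a rough illustration, the basic digest calculation of RFC 2617 (the variant without the optional `qop` parameter) can be sketched with the standard `hashlib` module. The function names and all credential values below are made up for the example; a real SIP client would take the realm and nonce from the server's challenge.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the RFC 2617 digest response without the optional qop field."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = md5_hex(f"{method}:{uri}")                 # request being made
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server nonce


# Hypothetical values, for illustration only.
response = digest_response(
    username="alice",
    realm="example.com",
    password="secret",
    method="REGISTER",
    uri="sip:example.com",
    nonce="abc123",
)
print(response)
```

Only the resulting hash travels over the network. The server, which knows the same password, repeats the calculation and compares the two values.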
The S/MIME mechanism. S/MIME itself specifies mechanisms for the integrity protection and the encryption of S/MIME content. Public key distribution, authentication and integrity protection, confidentiality of SIP signalling data, and tunnelling are some of the measures that S/MIME adopts. Within SIP, S/MIME can also be considered the successor of PGP (Pretty Good Privacy).
The SIP protocol does not by itself provide encryption for media data. To keep the media data confidential, SIP works with SDP to provide encryption. An alternative solution for data confidentiality is SRTP, which encrypts the media streams themselves, something that SDP is not able to do. The drawback of SRTP is the overhead that this encryption adds to the processing. Again, a choice between security and processing time must be made.
TLS is specified by RFC 3261 for use on proxy servers, redirect servers and registrars, because it gives the SIP conversation mechanisms that protect messages against data loss, attacks and breaches of confidentiality. To support these features, TLS needs a reliable transport layer protocol. UDP cannot provide this support, but the other transport layer protocol, TCP, can. The disadvantage is that the use of TCP introduces delay in time-sensitive SIP communication.
For securing SIP communication at the network layer, the IPsec protocol is the most adaptable solution, because it can provide the security features mentioned earlier with either TCP or UDP as the underlying transport layer protocol. As shown earlier, TCP is used when security and quality are needed between the communicating parties. When delay is critical to the communication, UDP can be used instead, since the security features are provided by IPsec. One of the main mechanisms that IPsec uses to provide security is the IKE (Internet Key Exchange) protocol, which works by securely exchanging keys and security parameters between the involved parties. By offering security at the network layer, IPsec manages to secure SIP communication without introducing significant delays.
The IETF is discussing several drafts concerning security enhancements for SIP, which aim to find a universal security solution for various SIP scenarios. The drafts released so far relate to support for authentication, integrity and confidentiality in SIP:
- SIP Authenticated Identity Body (AIB) 
- SIP Authenticated Identity Management 
- S/MIME AES Requirement for SIP 
2.6 SIP Security Issues
As with the H.323 protocol, some security problems emerge when SIP is adopted as the VoIP protocol in the network. These problems are related to the firewalls used to protect the network and to the use of NAT. Like H.323, the SIP protocol needs ports to be opened through the firewall for its traffic. With simple stateless firewalls, thousands of ports are left open, leaving weak points in the network for attackers to exploit. Opening and closing them manually is very difficult and time consuming, if not impossible. So, as with H.323, stateful firewalls must be used that can follow SIP traffic while opening and closing the ports that SIP needs. The second security problem, NAT, causes the same problems for SIP communication as it does for H.323. The mapping between private and public addresses makes it difficult for the network's components (hardware and/or software) to follow the conversation. However, because of the important role that NAT plays on the Internet, its use cannot be avoided. The following section presents solutions to the NAT problem.
2.7 Solution to the NAT problem
As mentioned in the previous sections, the H.323 and SIP protocols encounter problems when NAT is used in the network. The problem originates from the fact that NAT replaces the private IP addresses, which are included in the headers of VoIP messages in both protocols, with public ones. It follows that the third parties which perform the NAT operation must be secure for the integrity of the conversation to be preserved. Some techniques for solving the NAT problem are presented below.
Simple Traversal of UDP through NATs (STUN). STUN allows the software that handles VoIP conversations to detect what kinds of firewalls and NATs lie between the communicating parties and to take the necessary actions. STUN is kept simple and lightweight, which means that it can be adopted easily without changing the network's structure. It is also compatible with a wide range of NATs and firewalls, which makes it flexible.
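To give an idea of how lightweight STUN is, the sketch below builds the 20-byte header of a Binding Request and decodes a MAPPED-ADDRESS attribute value, which is how a client learns the public address that the NAT assigned to it. It assumes the header layout of the later STUN revision (RFC 5389), which places a fixed magic cookie in the first four bytes of what the original specification treats as a 16-byte transaction ID. The function names are hypothetical and no packet is actually sent.

```python
import os
import socket
import struct

BINDING_REQUEST = 0x0001
MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
IPV4_FAMILY = 0x01


def build_binding_request(transaction_id=None):
    """Return a STUN Binding Request with no attributes (header only)."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction id must be 12 bytes")
    # Message type, message length (0: no attributes), magic cookie.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_mapped_address(value):
    """Decode the value of an IPv4 MAPPED-ADDRESS attribute into (ip, port)."""
    _reserved, family, port = struct.unpack("!BBH", value[:4])
    if family != IPV4_FAMILY:
        raise ValueError("only IPv4 is handled in this sketch")
    return socket.inet_ntoa(value[4:8]), port


request = build_binding_request()
print(len(request), "byte request:", request.hex())
```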
Traversal Using Relay NAT (TURN). The TURN protocol is similar to STUN in its structure and in the way it works. TURN was designed to complement STUN by doing what STUN cannot. TURN works like a database of the address and port mappings used by both H.323 and SIP. A secure party therefore exists in the VoIP communication which keeps track of address and port mappings. The security derives from the usernames and passwords that are needed to log on to the TURN party and obtain the information.
Interactive Connectivity Establishment (ICE). ICE is a protocol designed by the IETF that describes the operations which both parties of the VoIP communication perform to overcome the limitations that NAT introduces into the network. It can be thought of, to a certain degree, as the combination of the two protocols STUN and TURN, with the only difference that the two parties communicate with each other without a third party interfering in the conversation. The risk of having an unreliable node in the network is thus eliminated.
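One way to picture how ICE combines the two protocols is through the candidate addresses that each party collects: its own local (host) address, the address discovered through STUN (server reflexive) and an address allocated through TURN (relayed). The sketch below ranks such candidates with the priority formula and the recommended type preferences from the ICE specification. The text above does not describe this step, so the sketch should be read as an illustrative assumption; the class `Candidate` and the addresses are hypothetical.

```python
from dataclasses import dataclass

# Recommended type preferences from the ICE specification.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str                      # "host", "srflx" (via STUN) or "relay" (via TURN)
    address: str
    port: int
    local_preference: int = 65535
    component_id: int = 1          # 1 is used for RTP

    @property
    def priority(self):
        # priority = 2^24 * type pref + 2^8 * local pref + (256 - component id)
        return ((TYPE_PREFERENCE[self.kind] << 24)
                + (self.local_preference << 8)
                + (256 - self.component_id))


def order_candidates(candidates):
    """Return candidates from most to least preferred."""
    return sorted(candidates, key=lambda c: c.priority, reverse=True)


candidates = [
    Candidate("relay", "198.51.100.7", 50000),
    Candidate("host", "192.168.1.10", 49170),
    Candidate("srflx", "203.0.113.5", 40000),
]
for cand in order_candidates(candidates):
    print(cand.kind, cand.address, cand.port, cand.priority)
```

With these preferences a direct path is tried before a relayed one, which is consistent with the aim of letting the two parties talk to each other without an intermediary wherever possible.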