Real-Time Systems

A real-time application (RTA) is an application program that functions within a time frame that the user senses as immediate or current. The latency must be less than a defined value, usually measured in seconds. Whether or not a given application qualifies as an RTA depends on the worst-case execution time (WCET), the maximum length of time a defined task or set of tasks requires on a given hardware platform [1]. The use of RTAs is called real-time computing (RTC) [2]. Real-time software systems contain computations whose correctness depends not only on how they are performed but also on when they are performed. The growing complexity of software systems makes it increasingly difficult to ensure that an application completes its tasks correctly and on time, which places an added responsibility on the designer.

A metric is a measure of some property of an entity, and what cannot be measured cannot be controlled. The objective of this thesis is to discuss design metrics for the software design phase of real-time applications.

A real-time system is one in which the correctness of the computations not only depends on their logical correctness, but also on the time at which the result is produced. In other words, a late answer is a wrong answer.

As an example of a real-time system, consider a computer-controlled machine on the production line at a bottling plant. The machine's function is simply to cap each bottle as it passes within the machine's field of motion on a continuously moving conveyor belt. If the machine operates too quickly, the bottle won't be there yet. If the machine operates too slowly, the bottle will be too far along for the machine to reach it. Stopping the conveyor belt is a costly operation, because the entire production line must correspondingly be stopped. Therefore, the range of motion of the machine coupled with the speed of the conveyor belt establishes a window of opportunity for the machine to put the cap on the bottle.

2.1 Specific Features of Real-Time Systems

The major characteristics of real-time systems include the following [6]:

Timeliness is important. The system performs its functions within the specified time limit.

It ought to be reactive. The system continuously responds to events from the external environment, which drives the “execution” of the system.

Concurrent execution of threads of control is vital: different parts of the software run in parallel.

It usually places very high demands on non-functional requirements such as reliability, fault tolerance and performance.

It must cope with non-determinism: events from the environment arrive at times the system cannot know in advance.

It also ought to be deadline-driven.

2.2 Period VS Deadline

A window of opportunity imposes timing constraints on the operation of the machine. Applications with these kinds of timing constraints are considered real time. In this case, the timing constraints are in the form of a period and deadline.

The period is the amount of time between each iteration of a regularly repeated task. Such repeated tasks are called periodic tasks.

Suppose bottles pass under the machine at a rate of five per second. This means a new bottle shows up every 200ms. Thus, the period of the task is 200ms. Note that whether bottles pass once per second or 100 times per second, it doesn't change the fact that this is a real-time system. Real time does not mean fast; it means that a system has timing constraints that must be met to avoid failure.

The deadline is a constraint on the latest time at which the operation must complete. Suppose the window of opportunity is 150ms. The deadline is then 150ms after the start time of the operation. In our example, the start time is defined as the moment the bottle enters the range of the machine. This bottle example has physical constraints, namely the speed of the conveyor belt and the machine's range of motion, that dictate the period and deadline of the task.
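The relationship between period and deadline in the bottling example can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the task's first job is released at time 0, so the k-th job (one bottle) is released at k × 200 ms and has an absolute deadline 150 ms later. The names `Job`, `job` and `meets_deadline` are hypothetical.

```python
from dataclasses import dataclass

PERIOD_MS = 200    # a new bottle arrives every 200 ms (five bottles per second)
DEADLINE_MS = 150  # relative deadline: the 150 ms window of opportunity

@dataclass
class Job:
    index: int
    release_ms: int   # moment the bottle enters the machine's range
    deadline_ms: int  # latest moment by which the cap must be on

def job(k: int, period_ms: int = PERIOD_MS, deadline_ms: int = DEADLINE_MS) -> Job:
    """Return the k-th job of a periodic task whose first job is released at time 0."""
    release = k * period_ms
    return Job(k, release, release + deadline_ms)

def meets_deadline(j: Job, completion_ms: int) -> bool:
    """A job is on time only if it completes between its release and its absolute deadline."""
    return j.release_ms <= completion_ms <= j.deadline_ms

if __name__ == "__main__":
    for k in range(3):
        print(job(k))
    print(meets_deadline(job(2), 530))  # 400 ms + 130 ms: inside the window
    print(meets_deadline(job(2), 560))  # 400 ms + 160 ms: the bottle has moved on
```

Note that the period (200 ms) and the relative deadline (150 ms) are independent parameters here; the deadline is shorter than the period because the physical window closes before the next bottle arrives.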

In many real-time systems, the period is a design parameter. Consider a cruise control mechanism on an automobile. The basic operation of cruise control is to keep the speed of the vehicle constant. Suppose the driver selects 60mph as the desired speed. If the vehicle is going slower than 60mph, then the embedded computer sends a signal to the engine controller to accelerate. If the vehicle is going faster than 60mph, it sends a signal to decelerate. A question to ask is: how often does the computer check if the current speed is too slow or too fast? The answer is called the control rate (or frequency). It is defined by the control system designer, who will try to find a rate that is fast enough to meet specifications, but not so fast that it adds unnecessary cost to the system. The period is then the reciprocal of the rate (that is, period = 1/rate). The deadline is typically the beginning of the next cycle of a periodic task, because the current cycle must be finished before the next one can start.
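The cruise-control loop can be sketched as a periodic task. The sketch below is a minimal illustration, assuming a simplified on/off control rule; `read_speed` and `send_command` are hypothetical stand-ins for the vehicle interfaces, not part of any real API. It computes the period as 1/rate and releases each cycle at an absolute time (start + k × period) instead of sleeping a fixed interval after each iteration, so that variations in execution time do not accumulate as drift. Because the deadline of each cycle is the start of the next one, a cycle that finishes late is counted as an overrun.

```python
import time
from typing import Callable

def control_command(speed_mph: float, setpoint_mph: float) -> str:
    """Simplified on/off control rule for the cruise-control example."""
    if speed_mph < setpoint_mph:
        return "accelerate"
    if speed_mph > setpoint_mph:
        return "decelerate"
    return "hold"

def run_periodic(rate_hz: float, iterations: int,
                 read_speed: Callable[[], float],
                 send_command: Callable[[str], None],
                 setpoint_mph: float = 60.0) -> int:
    """Run the control task periodically; return the number of deadline overruns."""
    period_s = 1.0 / rate_hz                  # period = 1 / rate
    next_release = time.monotonic()
    overruns = 0
    for _ in range(iterations):
        send_command(control_command(read_speed(), setpoint_mph))
        next_release += period_s              # absolute release times: no cumulative drift
        remaining = next_release - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)             # wait for the start of the next cycle
        else:
            overruns += 1                     # finished after the next cycle should have begun
    return overruns
```

With a hypothetical control rate of 10 Hz, for example, the period and hence the deadline of each cycle is 100 ms.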

Communication systems also have real-time constraints. Suppose a multimedia application needs to compress video data at a rate of 30 frames per second. Processing of each frame must complete before processing of the next frame begins; otherwise data might be lost in the form of dropped frames. The period of such a task is therefore the reciprocal of the frame rate, 1/30 s ≈ 33.3 ms, and the deadline is the beginning of the next frame. [7]

2.3 Aperiodic Tasks

Not all real-time tasks are periodic. Aperiodic tasks respond to randomly arriving events; in scheduling theory they are often serviced by a dedicated task called an aperiodic server. Consider anti-lock braking. If the driver presses the brake pedal, the car must respond very quickly. The response time is the time between the moment the brake pedal is pressed and the moment the anti-lock braking software actuates the brakes. If the response time were one second, an accident might occur, so the fastest possible response is desired. But, as with the cruise control algorithm, fastest is not necessarily best, because it is also desirable to keep the cost of parts down by using small microcontrollers. What is important is for the application requirements to specify a worst-case response time. The hardware and software are then designed to meet those specifications.

Note that many aperiodic tasks can be converted to periodic tasks. This is essentially the same transformation as converting an interrupt handler into a polling task. Instead of reacting to an external event the moment it occurs, the software polls the external input regularly, perhaps tens or hundreds of times per second. If the awaited event is detected, the appropriate computation is carried out. The price is that an event may wait up to one polling period before it is noticed, so the polling rate must be chosen to fit within the required worst-case response time.
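The cost of converting an interrupt-driven task into a polling task can be quantified. Assuming polls occur at instants 0, P, 2P, ... and that an event is noticed at the first poll at or after it occurs, the detection delay is always less than one polling period P. The small simulation below uses integer microseconds to avoid floating-point rounding; `detection_delay_us` is a hypothetical name.

```python
def detection_delay_us(event_us: int, poll_period_us: int) -> int:
    """Delay between an event and the first poll that observes it.

    Polls occur at 0, P, 2P, ...; all times are integer microseconds.
    """
    first_poll = -(-event_us // poll_period_us) * poll_period_us  # ceiling division
    return first_poll - event_us

if __name__ == "__main__":
    P = 10_000  # polling 100 times per second: P = 10 ms = 10,000 us
    for t in (0, 4_200, 19_900, 500_001):
        d = detection_delay_us(t, P)
        assert 0 <= d < P  # never more than one polling period
        print(f"event at {t} us noticed after {d} us")
```

Halving the polling period halves the worst-case detection delay but doubles the polling overhead, which is the trade-off the designer balances against the required worst-case response time.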

2.4 Hard or Soft?

A real-time system can be classified as either hard or soft. The distinction, however, is somewhat fuzzy. As illustrated in Figure 1, the meaning of real-time spans a spectrum. At one end of the spectrum is non-real-time, where there are no important deadlines (meaning all deadlines can be missed). The other end is hard real-time, where no deadlines can be missed. Every application falls somewhere between the two endpoints.

Figure 1: The Real-Time Spectrum

A hard real-time system is one in which one or more activities must never miss a deadline or timing constraint, otherwise the system fails. Failure includes damage to the equipment, major loss in revenues, or even injury or death to users of the system. One example of a hard real-time system is a flight controller. If action in response to new events is not taken within the allotted time, it could lead to an unstable aircraft, which could, in turn, lead to a crash; by no standard is this acceptable.

In contrast, a soft real-time system is one that has timing requirements, but occasionally missing them has negligible effects, as application requirements as a whole continue to be met. Consider again the cruise control application. Suppose the software fails to measure current velocity in time for the control algorithm to use it. The control algorithm can still use the old value, because the amount that the velocity would have changed between the last sample and this sample is so small that it can still operate correctly. Missing several consecutive samples, on the other hand, could be a problem, as the cruise control would likely stop meeting application requirements because it is not able to maintain the desired speed within a proper error tolerance [6].
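The cruise-control behaviour just described, reusing the last velocity sample when a new one is late but treating several consecutive misses as a failure, can be sketched as follows. The class name `VelocityInput` and the default limit of three consecutive misses are hypothetical choices for illustration; a real limit would follow from the application's error tolerance.

```python
from typing import Optional

class VelocityInput:
    """Holds the last good velocity sample and counts consecutive missed samples."""

    def __init__(self, max_consecutive_misses: int = 3):
        self.max_misses = max_consecutive_misses
        self.last_value: Optional[float] = None
        self.misses = 0

    def update(self, sample: Optional[float]) -> float:
        """Return the value the control algorithm should use this cycle.

        `sample` is None when the measurement did not arrive in time.
        """
        if sample is not None:
            self.last_value = sample
            self.misses = 0
            return sample
        self.misses += 1
        if self.last_value is None or self.misses > self.max_misses:
            # Too many misses in a row: the soft deadline has turned into a failure.
            raise RuntimeError("velocity samples missed too often")
        return self.last_value  # an occasional miss: reuse the old value
```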

2.5 Predictable VS Deterministic

Two more terms often used to describe real-time systems are predictable and deterministic. These terms are related, but because they are often interchanged or used improperly (as compared to their official definition in real-time systems theory), they are often a source of confusion.

Real-time systems researchers generally use the term predictable to refer to a system whose timing behavior is always within an acceptable range. The behavior is specified on a system-wide basis, such as “all tasks will meet all deadlines.” Generally, the period, deadline, and worst-case execution time of each task need to be known to create a predictable system. An appropriate scheduling algorithm with a corresponding schedulability analysis is used to ensure a system is predictable.
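As one concrete example of schedulability analysis (chosen here for illustration; the text does not prescribe a particular algorithm), the classic Liu and Layland test for rate-monotonic scheduling states that n independent, preemptible periodic tasks with deadlines equal to their periods are schedulable if their total utilization U = Σ C_i/T_i does not exceed n(2^(1/n) − 1). The test is sufficient but not necessary. The function name below is hypothetical.

```python
def rm_utilization_test(tasks):
    """Liu and Layland sufficient schedulability test for rate-monotonic scheduling.

    tasks: list of (wcet, period) pairs in the same time unit. Assumes
    independent, preemptible periodic tasks whose deadlines equal their periods.
    Returns (utilization, bound, passes).
    """
    if not tasks:
        raise ValueError("need at least one task")
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)   # U = sum of C_i / T_i
    bound = n * (2 ** (1 / n) - 1)               # n(2^(1/n) - 1), about 0.78 for n = 3
    return utilization, bound, utilization <= bound

if __name__ == "__main__":
    # Hypothetical task set: (WCET ms, period ms)
    u, b, ok = rm_utilization_test([(20, 100), (40, 200), (50, 400)])
    print(f"U = {u:.3f}, bound = {b:.3f}, passes: {ok}")
```

The task set [(50, 100), (50, 150)] shows the limits of the test: its utilization (about 0.833) exceeds the two-task bound (about 0.828), so the test fails, yet exact response-time analysis shows the second task finishes at 100 ms, before its 150 ms deadline, so the set is in fact schedulable.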

A deterministic system is a special case of a predictable system. Not only is the timing behavior within a certain range, but that timing behavior can be pre-determined. For example, a system can be designed with pre-allocated time slots for each task. Execution for each task occurs only during those time slots. Such a system must have execution time for every task known, as well as no anomalies that might cause deviation from the pre-determined behavior. That is, of course, difficult to achieve. Fortunately, determinism is not essential to build predictable real-time systems.
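A deterministic design with pre-allocated time slots can be sketched as a static table that is checked before run time. The sketch below assumes a hypothetical 100 ms major cycle and hypothetical task names and worst-case execution times. It verifies that slots do not overlap, fit within the cycle, and are at least as long as each task's WCET, and it looks up which task owns a given instant.

```python
MAJOR_CYCLE_MS = 100

# (task, slot start ms, slot length ms): a hypothetical pre-allocated table
SLOTS = [("read_sensors", 0, 10), ("control", 10, 30), ("actuate", 40, 10), ("log", 60, 20)]
WCET_MS = {"read_sensors": 8, "control": 25, "actuate": 6, "log": 15}

def validate(slots, wcet, cycle):
    """Raise ValueError if the table cannot guarantee the pre-determined timing."""
    end_prev = 0
    for task, start, length in sorted(slots, key=lambda s: s[1]):
        if start < end_prev:
            raise ValueError(f"slot for {task} overlaps the previous slot")
        if wcet[task] > length:
            raise ValueError(f"{task} needs {wcet[task]} ms but its slot is {length} ms")
        end_prev = start + length
    if end_prev > cycle:
        raise ValueError("table does not fit in the major cycle")

def owner(t_ms, slots=SLOTS, cycle=MAJOR_CYCLE_MS):
    """Return the task that owns instant t_ms, or None for idle time."""
    phase = t_ms % cycle
    for task, start, length in slots:
        if start <= phase < start + length:
            return task
    return None

validate(SLOTS, WCET_MS, MAJOR_CYCLE_MS)  # checked once, before the system runs
```

Because the table is fixed, the timing of every task is known in advance; the price, as noted above, is that every execution time must be known and must never exceed its slot.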

2.5 Time-triggered VS Event-triggered

There are two fundamentally different principles that determine the activation of tasks in a real-time system: event-triggered and time-triggered. In event-triggered systems, all activities are activated in reaction to relevant events external to the system. When a significant event occurs in the outside world, it is detected by a sensor, which causes an interrupt signal to be raised at the attached device (processor). For soft real-time systems with plenty of spare computing power, this approach is simple and works well. The main problem with event-triggered systems is that they can fail under heavy load, i.e., when many events happen at once. Event-triggered designs give a faster response at low load but more overhead and a greater chance of failure at high load. The approach suits dynamic environments, in which activities can arrive at any time.
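The weakness of event-triggered designs under heavy load can be illustrated with a bounded event queue serviced at a fixed rate. The simulation below is a hypothetical model, not a measurement: events arrive per tick, the handler can process a fixed number of events per tick, and the queue has a finite capacity, so events arriving while it is full are lost.

```python
from collections import deque

def simulate(arrivals_per_tick, service_per_tick, capacity):
    """Return (processed, dropped) for an event-triggered handler with a bounded queue."""
    queue = deque()
    processed = dropped = 0
    for arrivals in arrivals_per_tick:
        for _ in range(arrivals):
            if len(queue) < capacity:
                queue.append(object())   # interrupt accepted and queued
            else:
                dropped += 1             # queue full: the event is lost
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()
            processed += 1
    return processed, dropped
```

With a steady load of one event per tick, nothing is lost; a burst of 20 events in a single tick against a capacity of 5 loses 15 of them, even though the average load over the four ticks is the same order of magnitude.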

In a time-triggered system, all activities are activated at points in time that are known a priori. Accordingly, all nodes in a time-triggered system have a common notion of time, based on synchronized clocks. One of the most important advantages of time-triggered systems is the deterministic temporal behaviour of the system, which eases system validation and verification considerably. Time-triggered systems are suitable in static environments in which the system behaviour can be completely known in advance.

2.6 Design Lifecycle of a System

A software development life cycle is the process followed when developing information systems, from the initial stage of information gathering, all the way through to maintenance and support of the system.

The literature describes a model resembling an inverted waterfall for the design of a typical system. It is explained in detail in a study guide for Information Systems [10] and also in [8] and [9].

The V-Design Model of a General System [10]

This process is known as the "V" model of systems development [10]. At each testing stage (see the diagram above), the corresponding planning stage is referred to, ensuring that the system accurately meets the goals specified in the analysis and design stages.

Viewed analytically, the design of any system passes through the following phases [12]:

The Birth of the Concept

This is the first phase of design and as the name suggests, it marks the beginning of identifying the needs of the end-user. Its major steps include the following:

Need Identification

Feasibility Analysis

System Requirements Analysis

System Specification

Conceptual Design Review

We now outline each of these sub-phases of conception. Need Identification lists the needs of the user. These are usually stated in plain English and recorded by the person responsible for collecting the user's demands. Once the needs are identified, the software engineer performs two important analyses: the Feasibility Analysis and the System Requirements Analysis. The System Specification comes next; it sets out the technical requirements of the system to be developed, which later guide the design. The last part of this phase is the Conceptual Design Review. This is important because no document or design is complete until it has been reviewed. The review allows the outline produced in this phase to be re-analysed from an expert's perspective.

Preliminary System Design

This is the second phase of design and it marks the beginning of identifying the steps involved in carrying out the system design. Its major steps include the following:

Functional Analysis

Requirements Allocation

Detailed Trade-Off Studies

Synthesis of System Options

Preliminary Design of Engineering Models

Development Specification

Preliminary Design Review

In this phase, the subsystems that make up the structure of the system are identified, followed by the interfaces between them. The testing requirements and evaluation criteria are also laid down here. The phase ends with a Development Specification, which provides enough material to proceed to the detailed design and development phase.

The Detailed Design and Development

This is the third phase of design and it marks the identification of the details of design and development. Its major steps include the following:

Detailed Design

Detailed Synthesis

Development of Engineering and Prototype Models

Revision of Development Specification

Product, Process and Material Specification

Critical Design Review

This phase elaborates everything defined so far, such as the subsystems and interfaces, into a detailed design. The envisaged environment is created and evaluated, including the maintenance cost it is likely to incur. The amount and kind of support needed are also determined in this phase. If the developer needs to change the requirements (in collaboration with the user), this phase is the right time to do it; as in the waterfall model, the developer goes back to the specification if the need arises.

The Much-Awaited Construction

This is the fourth phase of design and it allows the actual product to be assembled by the programmer/engineer. Key steps within the Production/Construction stage include:

Production/Construction of System Components

Acceptance Testing

System Distribution and Operation

Operational Testing and Evaluation

System Assessment

In this phase the system is actually built, and any changes are recorded. System assessments are performed to remove errors and to make the system adaptable to change.

Utilization and Support

This is the fifth phase of design. It identifies the means of supporting the software being developed and examines how the software will be used. The system is also checked to confirm that it will operate satisfactorily in the environment where it will finally be deployed. The important steps in this phase are as follows:

System Operation in the User Environment

Maintenance and Logistics Support

System Modifications for Improvement

System Assessment

Phase-Out and Disposal

This is the last phase of the development lifecycle. The efficiency of the system is tested after it is installed in the environment where it will run for a long time. Any errors or complaints that surface are addressed. The design engineer watches for bugs, for operations that still need to be added, for mismatches between operational requirements and system performance, and for the availability of alternative systems.

The Waterfall model, the Spiral model and other lifecycle models all follow roughly this outline. Having seen the general design strategy, we now examine how the design phase differs for real-time systems, whose strict timing constraints this general strategy does not address.

2.7 Design Lifecycle of a Real-Time System

Reactive and real-time systems involve concurrency, have strict requirements, must be reliable, and involve software and hardware components [4].

Reactive systems are computer systems that continuously react to their physical environment, at a speed determined by the environment. This class was introduced to distinguish such systems from transformational systems, which read input, process it and produce output. Reactive systems include telephones, communication networks, computer operating systems and man-machine interfaces, among others.

Real-time Systems (RTSs) have reactive behaviour. An RTS involves control of one or more physical devices with essential timing requirements. The correctness of an RTS depends both on the time at which computations are performed and on the logical correctness of the results. Severe consequences may occur if the requirements of a real-time system are not met. Requirements on an RTS are diverse, ranging from intricate interfaces to guarantees of safe and reliable operation.

Real-time systems have the following characteristics [6]:

They place high demands on non-functional requirements, such as reliability and fault tolerance.

They are timely, performing within the specified time.

They ought to be reactive.

They ought to handle the concurrent execution of threads.

They must cope with non-deterministic external events.

They ought to be deadline-driven.

These characteristics change the design model: each design phase is validated by simulation or verification before the next phase begins [14].

J. F. Peters and S. Ramanna [13] pointed out the need for a different design strategy for real-time systems, stating that for such systems the original SDLC (System Design Lifecycle) must be modified. In particular, they stated that real-time logic must be incorporated in the design phase itself. It is therefore vital to consider, at the design level, the factors that can improve a design, and metrics provide a way to assess them.

2.8 Contrasting the Difference in Design

Ramanna and Peters [13] further contrasted the two design approaches and suggested how to remove the flaws of the orthodox design strategy. This contrast is shown diagrammatically below.

2.9 Issues in Real-Time System Design

Designing real-time systems is a challenging task. Most of the challenge comes from the fact that real-time systems have to interact with real-world entities. These interactions can become fairly complex, and a typical real-time system might be interacting with thousands of such entities at the same time. For example, a telephone switching system routinely handles calls from tens of thousands of subscribers. The system has to connect each call individually, and the exact sequence of events varies widely from call to call.

The following sections discuss these issues [15]:

Real-time Response

Recovering from Failures

Working with Distributed Architectures

Asynchronous Communication

Race Conditions and Timing

2.9.1 Real-Time Response

Real-time systems differ from other systems chiefly because they must respond to external interactions within a predetermined time. A response is useful only if it is correct in value and arrives within the time allowed for it; a late response means the system has failed. Both the hardware and the software must therefore be designed around the stated real-time requirements. For example, a telephone switching system must respond to thousands of subscribers within a predetermined time, usually one second or less. To meet this requirement, the sub-systems involved (call handling and software message communication) must work together within the time budget, and the timing requirements must be met for every call, whenever it is set up.

These real-time requirements must be incorporated very early in the design of the system; in fact, the timing constraints are taken into account from the architecture design phase onwards. Hardware and software engineers work together to achieve these goals and choose the most suitable architecture. In general, the simpler the architecture, the easier it is to meet the timing constraints.

Other questions must also be answered at this stage. What kind of processors are suitable, and how fast must they be to meet the timing requirements? What link speed should be chosen for communication? If link speeds are too low, queues build up and messages are delayed. As a rule, link utilization should not exceed 40-50% of the available bandwidth.

Further questions concern the preferred kind of communication, the number of nodes in the system and the expected CPU utilization. The answer lies in choosing sufficiently powerful processing components: both link utilization and peak CPU utilization should stay below 50%.
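One way to see why utilization is kept well below 100% is the textbook M/M/1 queueing model. It is used here only as an illustrative assumption; real links and CPUs rarely satisfy its Poisson-arrival, exponential-service assumptions. In that model, the mean time a message spends in the system (waiting plus service) is T = 1/(μ − λ) = S/(1 − ρ), where S = 1/μ is the mean service time and ρ = λ/μ is the utilization. At ρ = 0.5 the mean delay is already twice the service time, and it grows without bound as ρ approaches 1.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu: fraction of time the link or CPU is busy."""
    return arrival_rate / service_rate

def mm1_mean_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time in an M/M/1 system (queueing plus service): 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

if __name__ == "__main__":
    mu = 1000.0  # hypothetical link: 1000 messages per second, mean service time 1 ms
    for rho in (0.3, 0.5, 0.8, 0.95):
        lam = rho * mu
        print(f"utilization {rho:.2f}: mean delay {mm1_mean_delay(lam, mu) * 1000:.1f} ms")
```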

The suitability of the operating system must also be assessed, because choosing the right operating system is of utmost importance. Tasks with critical real-time requirements need to be given high priority at the operating-system level, and pre-emptive scheduling can be used so that critical tasks run as soon as they are ready. The interrupt latency and the scheduling variance of the operating system need to be verified at this stage. Scheduling variance refers to how predictable task scheduling times are. Interrupt latency is the delay with which the OS handles an interrupt and schedules the task that responds to it. The RTOS chosen should have low interrupt latency.
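Scheduling variance can be estimated from timestamps recording when a periodic task actually starts relative to its nominal release. The sketch below is a hypothetical helper; the timestamps themselves would come from whatever tracing facility the chosen RTOS provides. It reports the worst-case release latency and the jitter, i.e. the spread between the earliest and latest start offsets.

```python
def release_jitter(start_times_us, period_us, first_release_us=0):
    """Return (max_latency_us, jitter_us) for a periodic task.

    start_times_us[k] is the observed start of the k-th job, whose nominal
    release is first_release_us + k * period_us.
    """
    offsets = [s - (first_release_us + k * period_us)
               for k, s in enumerate(start_times_us)]
    return max(offsets), max(offsets) - min(offsets)
```

For example, a 1 ms task observed to start at 12, 1010, 2025 and 3011 µs has start offsets of 12, 10, 25 and 11 µs, so its worst-case latency is 25 µs and its jitter 15 µs.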

2.9.2 Recovering from Failures

Real-time systems must function reliably in the event of failures, which can be internal or external. The following sections discuss the issues involved in handling them.

2.9.3 Internal Failures

Internal failures can be due to hardware and software failures in the system. The different types of failures you would typically expect are:

Software Failures in a Task:

Real-time applications cannot rely on the traditional way of handling an error: they cannot pop up a dialogue box displaying the error, nor can they wait for the user to resolve it. Instead, they use roll-back, particularly when a task hits a processor exception: the system rolls back to the previous, correctly functioning saved state. Tasks must therefore be designed to safeguard against error conditions. This is especially critical in real-time systems because one series of events can trigger another; such event sequences may arise spontaneously, so not all of them can be covered during review and testing.
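The roll-back idea can be sketched as running each step on a copy of the task's state and discarding the copy if the step raises an exception. This is a minimal illustration, assuming the task state fits in a copyable object; `run_step_with_rollback` and the example step `update_gain` are hypothetical.

```python
import copy

def run_step_with_rollback(state, step, log):
    """Run step() on a copy of the task state.

    On success the updated copy becomes the new state; if the step raises,
    the error is logged and the previous, known-good state is kept (roll-back).
    """
    working = copy.deepcopy(state)   # never modify the saved state in place
    try:
        step(working)
        return working
    except Exception as exc:         # no dialogue box: record the error and roll back
        log.append(f"rolled back after {type(exc).__name__}")
        return state

def update_gain(s):
    """Hypothetical step that fails with ZeroDivisionError on a zero measurement."""
    s["gain"] = s["setpoint"] / s["measurement"]
```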

Processor Restart:

Most real-time systems are made up of multiple nodes. The failure of a single node must not bring down the complete system, so the software must be designed to handle the independent failure of any node. This involves two activities:

Handling Processor Failure: When a processor fails, other processors have to be notified about the failure. These processors will then abort any interactions with the failed processor node. For example, if a control processor fails, the telephone switch clears all calls involving that processor.

Recovering Context for the Failed Processor:

When the failed processor comes back up, it recovers its states and contexts from the other processors in the system. The recovered state may be inconsistent with the state held by other processors; in such cases, the system runs audits to resolve the inconsistencies.
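Before other processors can be notified of a failure, the failure has to be detected. A common mechanism, assumed here since the text does not specify one, is the periodic heartbeat message. In the sketch below, each node records when it last heard from each peer; a peer silent for longer than a timeout is declared failed exactly once, and the hypothetical callback `on_failure` aborts interactions with it (for a telephone switch, clearing its calls).

```python
class HeartbeatMonitor:
    def __init__(self, timeout_s, on_failure):
        self.timeout_s = timeout_s
        self.on_failure = on_failure
        self.last_seen = {}     # peer id -> time of its last heartbeat
        self.failed = set()

    def heartbeat(self, peer, now_s):
        self.last_seen[peer] = now_s
        self.failed.discard(peer)   # a restarted peer rejoins (state recovery not shown)

    def check(self, now_s):
        for peer, seen in self.last_seen.items():
            if peer not in self.failed and now_s - seen > self.timeout_s:
                self.failed.add(peer)
                self.on_failure(peer)   # notify once, e.g. clear calls on that processor
```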

Board Failure:

Real-time systems are expected to recover from hardware failures. The system must be able to detect and recover from board failures: when a printed circuit board fails, the system notifies the operator and should be able to switch in a spare for the failed board.

Link Failure:

Most of the communication in real-time systems takes place over links connecting the processing nodes. The system must therefore isolate a failed link and re-route messages so that the failure does not disrupt message communication.
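Re-routing around a failed link can be sketched as a shortest-hop search that ignores links marked as failed. The topology and node names below are hypothetical, and breadth-first search is simply one easy-to-follow routing choice, not the method any particular system uses.

```python
from collections import deque

def route(links, src, dst, failed=frozenset()):
    """Return a list of nodes from src to dst avoiding failed links, or None.

    links: iterable of (a, b) pairs for bidirectional links.
    failed: set of frozenset({a, b}) identifying links that are down.
    """
    graph = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue                      # isolate the failed link
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:                   # walk back to reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None                           # no surviving path
```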

2.9.4 External Failures

Real-time systems operate in the real world, so they must also recover from failures in the external environment. The types of failure that can occur in the environment include:

Invalid Behavior of External Entities:

When a real-time system interacts with external entities, it must be able to handle all possible failure conditions of those entities, including malfunctioning equipment and unexpected actions by end users.

Inter Connectivity Failure:

A real-time system is often distributed across several locations connected by external links. Failures of these links are handled much like internal link failures. The major difference is that such failures may last much longer, and it is often impossible to re-route the messages.

2.9.5 Asynchronous Communication

Software design can be simplified by using Remote Procedure Calls (RPCs), which let the designer call procedures on a remote machine with the same semantics as local procedure calls. This works well for traditional systems but offers little help for real-time systems: RPCs follow a synchronous request-response pattern, whereas real-time systems are event-driven and their communication is largely asynchronous.

Real-time systems therefore lend themselves to state-machine based design, in which multiple messages can be received in a single state. The state to which control passes next depends on the current state and on the message received. State-machine models suit the asynchronous nature of real-time systems well, though they come with their own set of complexities.
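A state-machine design of this kind can be sketched as a transition table keyed by (current state, received message). The simplified call model below, with its states and message names, is hypothetical; it shows how several different messages can be accepted in one state and how the next state depends on both the state and the message.

```python
# (current state, received message) -> next state: a hypothetical, simplified call model
TRANSITIONS = {
    ("IDLE", "off_hook"): "DIALING",
    ("DIALING", "digits"): "RINGING",
    ("DIALING", "on_hook"): "IDLE",
    ("RINGING", "answer"): "CONNECTED",
    ("RINGING", "on_hook"): "IDLE",
    ("RINGING", "timeout"): "IDLE",
    ("CONNECTED", "on_hook"): "IDLE",
}

class CallStateMachine:
    def __init__(self):
        self.state = "IDLE"

    def handle(self, message: str) -> str:
        """Process one asynchronous message and return the new state.

        Messages not expected in the current state leave the state unchanged;
        a real design would log them or treat them as errors.
        """
        self.state = TRANSITIONS.get((self.state, message), self.state)
        return self.state
```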

2.9.6 Race Conditions and Timing

Any real-time protocol points to one dominant factor: timing. Each stage of a protocol specifies its own timing requirements and attempts to account for how timing values change under increasing load. These requirements are implemented with timers, which supervise the progress of events. If the awaited event takes place, the timer is stopped; if it does not take place in time, the timer expires and a recovery action is triggered.
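The timer pattern described above (start a timer when an event is awaited, stop it when the event arrives, trigger recovery on expiry) can be sketched with an explicit clock value so that it can be tested deterministically. The class `Supervisor` and the timer names are hypothetical.

```python
import heapq

class Supervisor:
    def __init__(self):
        self._timers = []   # heap of (expiry time, timer name)
        self._active = {}   # timer name -> expiry time of the running timer

    def start(self, name, now, timeout):
        expiry = now + timeout
        self._active[name] = expiry
        heapq.heappush(self._timers, (expiry, name))

    def stop(self, name):
        """Called when the awaited event happens; the timer no longer matters."""
        self._active.pop(name, None)

    def expired(self, now):
        """Return the names of timers that timed out by `now` (recovery needed)."""
        fired = []
        while self._timers and self._timers[0][0] <= now:
            expiry, name = heapq.heappop(self._timers)
            if self._active.get(name) == expiry:   # skip stopped or restarted timers
                del self._active[name]
                fired.append(name)
        return fired
```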

A race condition occurs when two tasks compete for a resource and the outcome depends on their relative timing, so that the resulting state of the resource is unpredictable. Such conflicts are usually resolved by defining rules about which task keeps the resource when a clash occurs. Devising such a rule is easy; the difficulty lies in identifying the race conditions in the first place.

2.9.7 Flow Control

Flow control synchronizes the rate of the sender with the rate of the receiver, so that the receiver can keep up with the sender. In real-time systems the controlled object usually lies outside the sphere of control of the computer system, so flow control is needed to synchronize events. Correlated event showers can be buffered at the interface between the controlled object and the computer system.

Several engineering solutions have been devised to control the flow of events at this interface, including low-pass filters and the intermediate buffering of events in hardware and/or software. It is difficult, however, to devise a universal flow-control scheme for real-time systems that guarantees no important event is lost to the flow-control mechanism, for example during a correlated event shower or when a sensor is faulty.
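One of the simplest filters for an event shower is to suppress events that arrive closer together than a minimum inter-arrival time, a discrete cousin of the low-pass filtering mentioned above. The sketch below uses hypothetical names and a hypothetical 5 ms threshold; it also exhibits the danger just described, since a genuinely important event falling inside the suppression window is discarded as well.

```python
def min_interarrival_filter(timestamps_ms, min_gap_ms):
    """Pass an event only if at least min_gap_ms has elapsed since the last passed event.

    Returns (passed, suppressed) lists of timestamps.
    """
    passed, suppressed = [], []
    last = None
    for t in sorted(timestamps_ms):
        if last is None or t - last >= min_gap_ms:
            passed.append(t)
            last = t
        else:
            suppressed.append(t)   # possibly an important event, lost to flow control
    return passed, suppressed
```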

2.9.8 Maximum Execution Time of Programs

The deadline for the delivery of a result can be guaranteed only if an upper bound on the program's execution time is known at design time. This bound must also be tight, for two reasons. First, the result should be based on recent input data. Second, in a statically scheduled system, a loose bound reserves processor time that is never used, wasting valuable resources.

Real-time systems are usually designed to run for very long periods. Determining the maximum execution time of a program requires restrictions on the programming language, for example bounding all loops. Modern hardware features such as caches and pipelining make this computation considerably more complex.
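Measured execution times cannot prove a WCET bound, since the worst case may never occur during testing and caches and pipelines make timing input-dependent. A run-time monitor can, however, detect when a job exceeds the budget assumed at design time. The sketch below wraps a job, keeps a high-water mark of observed execution times, and counts overruns; `ExecutionTimeMonitor` and the budget value are hypothetical.

```python
import time

class ExecutionTimeMonitor:
    def __init__(self, budget_s):
        self.budget_s = budget_s    # design-time upper bound assumed for the job
        self.high_water_s = 0.0     # longest execution observed so far
        self.overruns = 0

    def run(self, job, *args):
        start = time.perf_counter()
        result = job(*args)
        elapsed = time.perf_counter() - start
        self.high_water_s = max(self.high_water_s, elapsed)
        if elapsed > self.budget_s:
            self.overruns += 1      # the assumed bound was violated
        return result
```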

2.9.9 Scheduling

In general, the problem of deciding whether a set of real-time tasks whose execution is constrained by some dependency relation (e.g., mutual exclusion) is schedulable belongs to the class of NP-complete problems [5]. Finding a feasible schedule, provided it exists, is another difficult problem. The known analytical solutions to the dynamic scheduling problem [6] assume stringent constraints on the interaction properties of task sets that are difficult to meet in distributed real-time systems.
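To see why finding a feasible schedule is hard, consider a simple related model: non-preemptive jobs on one processor, each with a release time, an execution time and a deadline. Deciding feasibility even for this model is NP-complete, and the naive exact method below simply tries every job order, which costs up to n! checks. It is an illustration only, with hypothetical job data.

```python
from itertools import permutations

def feasible_order(jobs):
    """jobs: list of (name, release, exec_time, deadline).

    Tries every order, starting each job as early as possible
    (non-preemptive, single processor). Returns the first order that
    meets every deadline, or None if no order works.
    """
    for order in permutations(jobs):
        t = 0
        for name, release, c, deadline in order:
            t = max(t, release) + c   # wait for release if necessary, then run
            if t > deadline:
                break                 # this order misses a deadline
        else:
            return [name for name, *_ in order]
    return None
```

For a fixed order, starting each job as early as possible is never worse than delaying it, so checking that one schedule per order is enough; the expense lies entirely in the number of orders.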

2.9.10 Testing for Timeliness

In many real-time system projects more than 50% of the resources are spent on testing. It is very difficult to design a constructive test suite to systematically test the temporal behavior of a complex real-time system if no temporal encapsulation is enforced by the system architecture.

2.9.11 Error Detection

In a real-time computer system we have to detect value errors and timing errors before an erroneous output is delivered to the control object. Error detection has to be performed by the receiver and by the sender of information. The provision of an error detection schema that will detect all errors specified in the fault hypothesis with a small latency is another difficult design problem.
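Value errors and timing errors can both be checked by the receiver before a message is passed on to the controlled object. The sketch below assumes a hypothetical message layout: a payload, a CRC-32 computed by the sender with Python's `zlib.crc32`, and a send timestamp. The receiver rejects a message whose checksum does not match (value error) or which is older than a maximum age (timing error). It also assumes that sender and receiver share a synchronized clock.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Message:
    payload: bytes
    crc: int
    sent_at_ms: int

def make_message(payload: bytes, now_ms: int) -> Message:
    """Sender side: attach a checksum and a timestamp."""
    return Message(payload, zlib.crc32(payload), now_ms)

def check_message(msg: Message, now_ms: int, max_age_ms: int) -> str:
    """Receiver side: classify the message before it reaches the controlled object."""
    if zlib.crc32(msg.payload) != msg.crc:
        return "value error"
    if now_ms - msg.sent_at_ms > max_age_ms:
        return "timing error"
    return "ok"
```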

This thesis deals with real-time systems, so we now define such systems. We break the term down, define each part individually, and then combine them into a complete definition.

A system is a mapping of a set of inputs into a set of outputs. Every real-world entity can be modelled as a system.

Real-time refers to the correct response provided within a pre-determined time frame.

Hence, a real-time system is one whose proper functioning depends on both the correctness of its outputs and their timeliness. Phillip Laplante [13] gives the following definition:

‘A real-time system is a system that must satisfy explicit (bounded) response-time constraints or risk severe consequences, including failure.’

For a real-time system, failure does not simply mean that a requirement was not met. In hard real-time systems it can mean a life-threatening situation, in which merely missing a deadline set by the designer or programmer leads to loss of life.

According to Sommerville [16], another way to look at real-time systems is as stimulus/response systems: the inputs act as stimuli and the outputs as responses. Stimuli may be periodic or aperiodic. Periodic stimuli occur at predictable time intervals; aperiodic stimuli occur irregularly and are usually signalled using the computer's interrupt mechanism.

Consider again the computer-controlled capping machine on the bottling line described earlier: the range of motion of the machine, coupled with the speed of the conveyor belt, establishes a window of opportunity for putting a cap on each bottle. This window of opportunity is what distinguishes a real-time system from a non-real-time one: the response must fall within the time frame provided, otherwise the system has failed. This does not imply that a real-time system has to be fast; it simply has to meet the timing requirements set for it. The window of opportunity is expressed as timing parameters, namely a period and a deadline.
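The capping window can be worked out from the belt speed and the machine's reach. The sketch below assumes a bottle at position `x0` (metres) at time `t0`, a belt moving at constant speed `v` (m/s), and a machine that can only reach bottles between `reach_start` and `reach_end`; all names and numbers are hypothetical. With bottles spaced `spacing` metres apart, a new window opens every `spacing / v` seconds, which is the period, and the end of each window is the deadline.

```python
def capping_window(x0, v, reach_start, reach_end, t0=0.0):
    """Return (earliest, latest) times at which the cap can be applied."""
    if v <= 0:
        raise ValueError("belt speed must be positive")
    earliest = t0 + (reach_start - x0) / v  # bottle enters the reach zone
    latest = t0 + (reach_end - x0) / v      # bottle leaves it: the deadline
    return earliest, latest

def bottle_period(spacing, v):
    """Time between consecutive bottles, i.e. the period of the capping task."""
    return spacing / v

def on_time(t_cap, window):
    """A cap applied outside the window is a failure, however fast the machine."""
    earliest, latest = window
    return earliest <= t_cap <= latest
```

For a belt at 0.5 m/s and a reach zone from 1.0 m to 1.2 m, `capping_window(0.0, 0.5, 1.0, 1.2)` gives a window from 2.0 s to 2.4 s: capping at 1.9 s is too early and at 2.5 s too late.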

Real-time systems can be classified into three sub-categories [16]:

Hard Real-Time Systems

‘A hard real-time system is one in which failure to meet a single deadline may lead to complete and catastrophic system failure.'

For instance, consider an avionics weapons delivery system in which pressing a button launches an air-to-air missile. Missing the deadline for launching the missile within a specified time after the button is pressed can cause the target to be missed, which may result in catastrophe.

Soft Real-Time Systems

‘A soft real-time system is one in which performance is degraded but not destroyed by failure to meet response-time constraints.'

E.g. an automated teller machine: missing even many deadlines will not lead to catastrophic failure, only to degraded performance.

Firm Real-Time Systems

‘A firm real-time system is one in which a few missed deadlines will not lead to total failure, but missing more than a few may lead to complete and catastrophic system failure.'

E.g. an embedded navigation controller for an autonomous robot weed killer: missing a critical navigation deadline can cause the robot to veer hopelessly out of control and damage crops.
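The three categories differ in how the system reacts to missed deadlines, which a small monitor can model. This is a toy sketch, not a definition from [16]. It marks a hard system as failed at the first miss, and a firm system as failed once more than `max_misses` of the last `window` jobs miss; this threshold is a hypothetical reading of "a few". A soft system is reported as degraded but never failed.

```python
from collections import deque
from enum import Enum

class Kind(Enum):
    HARD = "hard"
    FIRM = "firm"
    SOFT = "soft"

class DeadlineMonitor:
    """Toy model of how hard, firm and soft systems react to deadline misses."""

    def __init__(self, kind, max_misses=2, window=10):
        self.kind = kind
        self.max_misses = max_misses        # hypothetical limit for FIRM
        self.recent = deque(maxlen=window)  # True = that job missed its deadline
        self.failed = False

    def record(self, finish_time, deadline):
        missed = finish_time > deadline
        self.recent.append(missed)
        if self.kind is Kind.HARD and missed:
            self.failed = True              # a single miss is catastrophic
        elif self.kind is Kind.FIRM and sum(self.recent) > self.max_misses:
            self.failed = True              # more than a few recent misses
        return self.status()

    def status(self):
        if self.failed:
            return "failed"
        return "degraded" if any(self.recent) else "ok"
```

A soft monitor recovers to "ok" once the sliding window contains no misses, whereas hard and firm monitors stay failed.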

All applications fall somewhere within these definitions, although the demarcation between the types is somewhat fuzzy. As illustrated in 6, the meaning of real-time spans a spectrum. At one end of the spectrum is non-real-time, where there are no important deadlines; at the other end is hard real-time, where no deadline can be missed.

In the USA, ‘real-time' also refers to on-line terminal services such as ATMs, database enquiries and on-line reservation and payment systems. A real-time system typically comprises parallel or concurrent activities.

Embedded real-time systems are RTS that are based on specialized hardware and lack an operating system. An RTS is called organic if it is not tied to specialized hardware, and loosely coupled or semi-detached if it can be made organic by rewriting a few modules.

Real-time systems are usually event-based. An event in real-time software is anything that causes a change in the flow of control. Because real-time systems are always susceptible to changes in their normal course of action, the notion of events is used to explain their functioning. Events can be of two types:

1. Synchronous: events that occur at predictable intervals.

2. Asynchronous: events that occur at unpredictable times (caused by external sources).
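To illustrate the difference, the following sketch merges synchronous events, generated at multiples of a fixed period, with asynchronous events whose times are supplied from outside and stand in for external sources. The function name and the event labels are hypothetical.

```python
def merge_events(period, horizon, async_times):
    """Return (time, kind) pairs in time order up to `horizon`.

    Synchronous events occur at predictable instants k * period;
    asynchronous events arrive at times supplied by external sources.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    events = []
    k = 0
    while k * period <= horizon:
        events.append((k * period, "sync"))
        k += 1
    events.extend((t, "async") for t in async_times if t <= horizon)
    return sorted(events)
```

For example, `merge_events(1.0, 3.0, [2.2, 0.5])` interleaves the two asynchronous events at 0.5 and 2.2 with the synchronous ones at 0, 1, 2 and 3. When both kinds occur at the same instant, the tuple ordering happens to place "async" first; a real executive would decide such ties by priority.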

2.10 Other Design Methodologies for RTS

Sommerville [16] proposes a design strategy for real-time systems which is event based. The stages are as follows:

1. Identify the stimuli to be processed and what responses are desired for each.

2. For each stimulus, determine the timing constraints.

3. Cluster the stimuli and responses together based on behaviour. With each such class associate a process.

4. Identify or design any algorithms to carry out this set of concurrent processes.

5. Design a scheduling system which will ensure that processes are started in time to meet their deadlines.

6. Employ a real-time executive to integrate the system.
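Steps 1 to 3 and a simplified step 5 can be sketched as data plus two functions. The stimuli, responses and timing values below are hypothetical. Clustering by period is only one possible reading of "based on behaviour", and ordering processes by their tightest deadline is a simplification, not a complete scheduling design.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Stimulus:
    name: str
    response: str
    period: Optional[float]  # None marks an aperiodic stimulus
    deadline: float          # step 2: maximum time allowed for the response

def cluster_into_processes(stimuli):
    """Step 3: associate one process with each group of stimuli sharing a period."""
    processes = defaultdict(list)
    for s in stimuli:
        processes[s.period].append(s)
    return dict(processes)

def start_order(processes):
    """Simplified step 5: start the process with the tightest deadline first."""
    return sorted(processes, key=lambda p: min(s.deadline for s in processes[p]))

# Step 1: hypothetical stimuli and the responses desired for each.
STIMULI = [
    Stimulus("temperature_sample", "adjust_heater", period=1.0, deadline=0.5),
    Stimulus("pressure_sample", "adjust_valve", period=1.0, deadline=0.2),
    Stimulus("door_alarm", "sound_siren", period=None, deadline=0.05),
    Stimulus("log_tick", "write_log", period=10.0, deadline=5.0),
]
```

Here the aperiodic alarm process starts first because its deadline is tightest, followed by the 1 s sampling process and then the logging process.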

Sommerville also describes the general components of a real-time executive, which may be used fully or partially depending on the size of the real-time system.

1. A real-time clock

This provides information to schedule processes periodically.

2. An interrupt handler

This manages aperiodic requests for service.

3. A scheduler

This component is responsible for examining the processes which can be executed and choosing one of these for execution.

4. A resource manager

Given a process which is scheduled for execution, the resource manager allocates appropriate memory and processor resources.

5. A despatcher

This component is responsible for starting the execution of a process.

6. A configuration manager

This is responsible for the dynamic reconfiguration of the system's hardware.

7. A fault manager

This component is responsible for detecting hardware and software faults and taking appropriate action to recover from them.

The real-time executive must also accommodate two priority levels:

1. Interrupt level: for processes that need an immediate response. Processes handled at this level are usually foreground processes.

2. Clock level: allocated to periodic processes.
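The following minimal sketch shows how a real-time clock, an interrupt handler, a scheduler and a despatcher might cooperate under the two priority levels. It assumes a simulated clock passed to `tick` rather than real hardware timers. The class and method names are hypothetical, and the resource, configuration and fault managers are omitted.

```python
from collections import deque

class MiniExecutive:
    """Toy executive: interrupt-level work always runs before clock-level work."""

    def __init__(self):
        self.interrupts = deque()  # pending aperiodic requests (interrupt handler)
        self.periodic = []         # [next_release, period, name] (real-time clock)
        self.log = []              # what the despatcher started, and when

    def add_periodic(self, name, period, first_release=0.0):
        self.periodic.append([first_release, period, name])

    def raise_interrupt(self, name):
        self.interrupts.append(name)

    def tick(self, now):
        # Interrupt level: serve every pending request first.
        while self.interrupts:
            self.log.append((now, "interrupt", self.interrupts.popleft()))
        # Clock level: the scheduler picks the released periodic processes,
        # earliest release first, and the despatcher "starts" them (here: logs them).
        due = sorted((p for p in self.periodic if p[0] <= now), key=lambda p: p[0])
        for p in due:
            self.log.append((now, "clock", p[2]))
            p[0] += p[1]  # schedule the next release
```

With a 1 s sampling process and a 2 s display process, an interrupt raised before the tick at 1.0 s is logged ahead of the sampling process released at that same instant.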

Mani B. Srivastava et al. [17] identify two distinct phases in real-time system design. First, the high-level system specification is mapped onto a set of inter-communicating hardware and software modules. Second, these modules are generated using a mix of mapping, synthesis and library-based techniques.

Gjalt de Jong [18] presents a UML-based design methodology for real-time and embedded systems. He combines the informal strengths of UML with the formal strengths of SDL, and then uses object-oriented concepts to identify software components.

2.11 Design and Design only

The design of real-time systems favours the K.I.S.S. (Keep It Simple, Stupid) principle. In the same spirit, Rob Williams [19] states, ‘If you follow a good design technique, appropriate questions will emerge at the right moment, disciplining your thought process.'

Designing real-time software entails five major phases [15]:

1. Software Architecture Definition

Once it has been decided that real-time software is to be designed, a suitable architecture is chosen. A UML use-case design is then carried out, in which the system is treated as a black box and all its users (mechanical or human) are considered actors. The use-case diagram shows all the possible interactions between the users and the system.

2. Co-Design

Next, for the available hardware, decide which software functionality is to be allocated to which processor and/or link. The aim is that no resource becomes overloaded and that the system remains scalable. Closely related modules should be placed near each other, as this reduces delay and eases inter-process message communication.

3. Defining Software Subsystems

These decisions are purely software-based, unlike the preceding ones, which also depended on hardware considerations. Determine all the features needed by the system, group them, and consider whether any subsystems can be formed to simplify the design. Also identify the tasks (along with their respective roles) that will implement these features.

4. Feature Design

Once the tasks and features have been identified, the next step is to design how messages will be exchanged among these tasks. Some tasks will control others by keeping a record of the activity of a feature, which is done with running timers: a timer is started to monitor the progress of an event. If the event or task completes as planned, the timer is stopped; otherwise the timer expires and a recovery action, usually some form of roll-back, comes into play. Message interfaces need to be specified in detail (all fields and their possible values).

5. Task Design

Once the tasks are identified, it must be decided which state machine model will implement each of them. Because real-time systems are mostly state-based, this decision is vital to completing the design. The state machine chosen can cover a single task, multiple tasks or complex tasks, and the model can be either flat or hierarchical. This choice is also pivotal because it forms the basis of the scheduling rules on which the smooth functioning of real-time software depends.
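The running-timer mechanism of feature design and the flat state machine of task design can be combined in one small sketch. It uses a simulated clock (the `now` arguments) instead of real timers. The states, events, timeout value and the roll-back counter are hypothetical examples.

```python
class SupervisedTask:
    """Flat state machine whose transitions start and stop a supervision timer."""

    TRANSITIONS = {
        ("IDLE", "request"): "WAITING",
        ("WAITING", "ack"): "DONE",
        ("WAITING", "timeout"): "IDLE",  # recovery: roll back to IDLE
    }

    def __init__(self, timeout):
        self.state = "IDLE"
        self.timeout = timeout
        self.timer_expiry = None  # None means the timer is stopped
        self.rollbacks = 0

    def handle(self, event, now):
        nxt = self.TRANSITIONS.get((self.state, event))
        if nxt is None:
            return self.state  # event not expected in this state: ignore it
        if event == "request":
            self.timer_expiry = now + self.timeout  # start the timer
        elif event == "ack":
            self.timer_expiry = None                # event arrived: stop the timer
        elif event == "timeout":
            self.timer_expiry = None
            self.rollbacks += 1                     # roll-back action
        self.state = nxt
        return self.state

    def poll(self, now):
        """Fire the timeout event if the running timer has expired."""
        if self.timer_expiry is not None and now >= self.timer_expiry:
            return self.handle("timeout", now)
        return self.state
```

An acknowledgement that arrives after the timeout is ignored, because the transition table has no entry for it in the `IDLE` state.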

Kopetz et al. (1991) [22] present a methodology for the design of real-time systems that refines the process from specification down to the fine-grained task, message and protocol level. MARS is the underlying architecture for the proposed methodology. All aspects of an engineered hard real-time system are considered, namely: predictability, testability, fault tolerance (fail-stop or fail-operational), consideration of the complete system (both hardware and software), system decomposability (from abstraction to smaller modules) and evaluability (through early dependability and timing analysis).

Previous design methodologies include SDARTS, DARTS, SDL, SCR, MASCOT, EPOS, SREM and others. None of them provided all the characteristics mentioned above, so the authors devised a novel methodology constrained by some basic assumptions. The design methodology is explained diagrammatically below:

This entire process indirectly sets up an off-line schedule, which is the key to fulfilling the timing constraints of real-time systems. However, this indirect off-line scheduling does not suffice for predictable hard real-time systems, and other issues, namely static and dynamic scheduling, need to be considered simultaneously. Other crucial parameters include the following [22]:

TRANSACTION LEVEL

* MART (MAximum Response Time)

* MINT (Minimum INTerval)

TASK LEVEL

* MAXTE (estimated maximum execution time)

* MAXTC (calculated maximum execution time)

DATA LEVEL

* Validity (period of time for which data is valid or holds true)

SYSTEM LEVEL

* Maximum execution time of the communication protocol between tasks.

* CPU availability (CPU time usable for application tasks)

* System overheads, e.g. task switching, the periodic clock interrupt and CPU cycles used by DMA (for example, for network access).

OVERALL

* MAXT (maximum execution time, calculated from all of the timing values given above)
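Several of these parameters can be checked against each other mechanically. The source does not say how MAXT is derived, so the sketch below uses a deliberately crude, hypothetical formula: the sum of the tasks' MAXTC values plus the protocol time, divided by the CPU availability, plus system overheads. It then checks MAXTC against MAXTE, MAXT against MART, and MAXT against the validity period of the input data. MINT is not used here, and all names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskTiming:
    name: str
    maxte: float  # estimated maximum execution time (design phase)
    maxtc: float  # calculated maximum execution time (after implementation)

def transaction_maxt(tasks, protocol_time, cpu_availability, overheads):
    """Hypothetical MAXT: task and protocol times stretched by the fraction of
    CPU time usable by application tasks, plus fixed system overheads."""
    if not 0 < cpu_availability <= 1:
        raise ValueError("cpu_availability must lie in (0, 1]")
    work = sum(t.maxtc for t in tasks) + protocol_time
    return work / cpu_availability + overheads

def check_transaction(tasks, mart, validity, **system):
    """Return a list of violated timing relations (empty if none)."""
    problems = [f"{t.name}: MAXTC exceeds MAXTE" for t in tasks if t.maxtc > t.maxte]
    maxt = transaction_maxt(tasks, **system)
    if maxt > mart:
        problems.append(f"MAXT {maxt:.2f} exceeds MART {mart:.2f}")
    if maxt > validity:
        problems.append("input data may expire before the result is produced")
    return problems
```

For two tasks with MAXTC values of 1.5 ms and 2.5 ms, a 1.0 ms protocol, 80% CPU availability and 0.5 ms overhead, the sketch gives MAXT = 5.0 / 0.8 + 0.5 = 6.75 ms, which satisfies a MART of 10 ms but violates one of 6 ms.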

These authors also emphasize testing the design. They recommend a scenario in which each component, cluster (open-loop cluster test and closed-loop cluster test) or task is tested once right after its design and once after its implementation. Because testing at the design stage is cheaper, this approach is commendable. The other tests mentioned are the system test and the field test.

Finally, the authors describe MARDS, a software tool that supports design for MARS, and explain how it aids the design process.

In a paper [20], Zage points out that the design phase of software has two major aspects: the architectural design and the detailed design. The author draws an analogy between designing software and constructing a true-to-life model of a building. To help devise a better design or trace errors in an existing one, the team at Ball State University developed two metrics.

The first is the external metric, De, which is based on information extracted during the architectural design, namely “hierarchical module diagrams, dataflow, functional descriptions and interface descriptions”. It captures the inter-modular structure.

The second is the internal metric, Di, which is based on information available after the detailed design, namely all of the information above as well as “any chosen algorithms or the pseudo code developed”. It captures the internal structure of the modules.

The author then proposes a composite metric to measure ‘design quality' for a design G:

D(G) = De + Di

where De = e1 (inflow * outflow) + e2 (fan-in * fan-out)

and Di = i1 (CC) + i2 (DSM) + i3 (I/O)

D(G) stands for the design quality,

CC stands for Central Calls,

DSM stands for Data-Structure Manipulations and

I/O refers to the external device accesses.

Here, inflow is the number of data entities passed to the module, outflow is the number of data entities passed from the module, fan-in is the number of superordinate modules directly connected to the given module, and fan-out is the number of subordinate modules directly connected to it. The e and i terms are weighting factors that can take any values, depending on the influence of the parameters involved.
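The composite metric translates directly into code. In this sketch the module attributes are plain counts that would be extracted from the design documents, and all weighting factors default to 1, which is an assumption; [20] leaves their values to the user.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Module:
    name: str
    inflow: int            # data entities passed to the module
    outflow: int           # data entities passed from the module
    fan_in: int            # superordinate modules directly connected
    fan_out: int           # subordinate modules directly connected
    central_calls: int     # CC
    ds_manipulations: int  # DSM
    io_accesses: int       # I/O: external device accesses

def d_e(m, e1=1.0, e2=1.0):
    """External metric De = e1(inflow * outflow) + e2(fan-in * fan-out)."""
    return e1 * (m.inflow * m.outflow) + e2 * (m.fan_in * m.fan_out)

def d_i(m, i1=1.0, i2=1.0, i3=1.0):
    """Internal metric Di = i1(CC) + i2(DSM) + i3(I/O)."""
    return i1 * m.central_calls + i2 * m.ds_manipulations + i3 * m.io_accesses

def d_g(m, e=(1.0, 1.0), i=(1.0, 1.0, 1.0)):
    """Design quality D(G) = De + Di."""
    return d_e(m, *e) + d_i(m, *i)
```

For a hypothetical module with inflow 3, outflow 2, fan-in 1, fan-out 4, 5 central calls, 2 data-structure manipulations and 1 I/O access, De = 6 + 4 = 10, Di = 5 + 2 + 1 = 8 and D(G) = 18.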

Even more parameters could have been added to the calculation of Di, such as ‘cyclomatic complexity, nesting levels, knot counts, live variable counts and declaration-use pairs and so on.' However, the author advises against this, citing an important property of metrics: trying to cover too many features with one metric blurs the result, while associating a metric with just one feature yields little useful information.

The metric's ability to detect potential ‘stress points' and error-prone modules was then tested on large-scale software (its suitability for small-scale software had already been shown), with interesting results. Once De and Di were calculated, the outliers were usually the stress points. When there were so many outliers that the mean and standard deviation no longer gave a meaningful picture, the author introduced a method for identifying ‘X-less algorithms'. These were identified and marked as outliers straight away, and the remaining calculations were performed on the other modules. This produced a better, more evenly distributed result, demonstrating the value of the metric.
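The outlier analysis can be sketched as follows. The source does not give the exact outlier criterion, so the threshold of `k` standard deviations above the mean (with `k = 1` by default) is an assumption. The `exclude` parameter stands in for modules marked as outliers up front, such as the ‘X-less algorithms'; how those are identified is not reproduced here.

```python
from statistics import mean, pstdev

def stress_points(scores, k=1.0, exclude=()):
    """Return module names whose score lies more than k standard deviations
    above the mean of the remaining modules. Modules in `exclude` are treated
    as outliers up front and left out of the statistics."""
    excluded = set(exclude)
    rest = {name: s for name, s in scores.items() if name not in excluded}
    if len(rest) < 2:
        return sorted(excluded)
    values = list(rest.values())
    mu, sigma = mean(values), pstdev(values)
    flagged = {name for name, s in rest.items() if s > mu + k * sigma}
    return sorted(excluded | flagged)
```

With hypothetical D(G) scores of 10, 12, 11, 9 and 40, only the module scoring 40 is flagged. Excluding it up front changes the mean and standard deviation of the rest, so the module scoring 12 then stands out, which mirrors the re-calculation step described above.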

Finally, to test the metric further, the author compared one part of it (Di) against McCabe's cyclomatic complexity V(G), a metric based on control-flow structure. Di was found to be a better indicator than V(G). These results were then compared with lines of code (LOC), a metric available only once the code has been written. The overall D(G) metric was reported to identify 100% of the modules with high error concentrations in the large-scale software studied.

The author also suggests remedies for modules that turn out to be error-prone: re-tailor the design, assign the coding of such modules to the most experienced programmers on the team, or increase the testing time for those modules.

However, I would like to point out that if the solution lies in extending the testing phase, there is little point in going into such detail to uncover problems at the design phase itself. We should look for a solution at the design phase, simply re-tailoring the design and running the metrics again.

Design metrics should conform to the requirements set in the preliminary analysis phase. Model-driven development (MDD) is used by real-time software developers to ensure accurate real-time performance of complex systems. Using UML (UML 2.1), developers can use the abstraction of models to address and understand complexity. The application is simulated to ensure that its algorithms perform properly, and the model is then used to automatically generate real-time code that adheres strictly to the design.

The chosen metrics should pass the SMART test: S = Specific, M = Measurable, A = Attainable, R = Realistic and T = Timely.

To measure a real-time system, the following factors need to be taken into consideration: timeliness, performance, reactivity, accuracy, predictability, determinism, maintainability, adaptability, robustness, efficiency, reliability, cost-effectiveness, quality assurance, risk management, complexity and fault tolerance.

Fohler et al. (2002) [34] suggest a quality-of-control (QoC) metric based on real-time timing constraints. Only closed-loop control systems are considered.

Timing constraints and values determine the performance of such a system and its areas of improvement. The authors divide temporal values into two types: fixed (task periods, which correspond to sampling periods, and task deadlines, which bound the time delay) and flexible (sets of feasible instance separations, derived from the type of controller chosen, and response times).

At design time, many values satisfy the timing constraints, and deviations from these values lead to system errors. Two error-based performance indices are considered: the IAE (Integral of the Absolute value of the Error) and the ITAE (Integral of the Time-weighted Absolute value of the Error).

One of these indices is used to quantify the QoC metric, which allows decisions to be based on both temporal and control information.
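Both indices can be approximated from a sampled error signal. This sketch assumes error samples `e_k` taken every `h` seconds starting at t = 0 and uses a simple rectangle rule, so IAE ≈ h·Σ|e_k| and ITAE ≈ h·Σ(k·h)·|e_k|. Lower values indicate better control. How [34] maps either index to the QoC metric is not reproduced here.

```python
def iae(errors, h):
    """Integral of the Absolute value of the Error (rectangle rule)."""
    return h * sum(abs(e) for e in errors)

def itae(errors, h):
    """Integral of the Time-weighted Absolute value of the Error.
    Sample k is assumed to be taken at time t = k * h."""
    return h * sum(k * h * abs(e) for k, e in enumerate(errors))
```

For error samples 1, 0.5, 0.25 and 0 taken every 0.1 s, IAE ≈ 0.175 and ITAE ≈ 0.01. Because of the time weighting, ITAE penalizes errors that persist late in the response more heavily than IAE does.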

A strategy is devised to improve the performance of real-time closed-loop control systems: instead of single time values (fixed or flexible), a set of <instance separation, response time> pairs is chosen at the design stage and handed over to the scheduler, which chooses the correct or closest pair. The ability to change values at run time further improves the system.
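The scheduler's run-time choice can be sketched as a nearest-neighbour lookup. Using the Euclidean distance between pairs, with both values in the same time unit, is an assumption; [34] may use a different selection rule. The pairs below are hypothetical values in milliseconds.

```python
from math import hypot

def closest_pair(design_pairs, achievable):
    """Pick the design-time <instance separation, response time> pair closest
    to the values the scheduler can currently achieve."""
    sep, resp = achievable
    return min(design_pairs, key=lambda p: hypot(p[0] - sep, p[1] - resp))

DESIGN_PAIRS = [(10.0, 4.0), (12.0, 5.0), (15.0, 8.0)]
```

If the scheduler can currently offer a separation of 12.5 ms and a response time of 5.5 ms, `closest_pair(DESIGN_PAIRS, (12.5, 5.5))` selects (12.0, 5.0).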

2.12 Steps to Attain Useful Metrics for RTS

Measurement lies at the heart of everything. Standards such as ISO 9000 and the SEI's Capability Maturity Model use metrics too.

As Grady (26) states, “Without such measures for managing software, it is difficult for any organization to understand whether it is successful.” The paper by Linda Westfall (27) outlines “12 Steps to Useful Software Metrics.” Basing our research on these steps, we outline twelve steps for the software metrics of real-time systems, as follows.

Firstly, identify who will use the metric. If a metric has no user for whom it is produced, the entire effort is futile. For real-time system software, the users can be people involved in functional management, project management and the end product.

Secondly, determine precisely which entity needs to be measured. The ‘goal/question/metric' (27) (28) paradigm may be employed to aid this process.

Thirdly, structure relevant questions, the answers of which will lead towards the goal(s) determined in the previous step.

Fourthly, for the real-time system software, formulate the objective according to the following formula (27):

Fifthly, look for standard definitions of the attributes and/or entities. If none are available, or if they are ambiguous, define them yourself from the perspective of real-time systems. A concise set of related definitions is available in the IEEE Glossary of Software Engineering Terminology (29).

Sixthly, choose between direct measurement or indirect measurement.

Seventhly, if the measurement method is indirect, break it down to an atomic level so that it is clear exactly which entities need to be measured to achieve the desired goal.

Eighthly, define thresholds, variances, control limits and so on. Be specific, for example by giving percentages.

Ninthly, from the real-time perspective decide the method of reporting. As (27) states, “define the reporting format, data extraction and reporting cycle, reporting mechanisms, distribution and availability.”

Tenthly, determine any additional qualifiers that will be needed if the metric is to be given a wider scope than the one allotted to it.

Eleventhly, consider the collection of data. Since the metrics devised here concern the design phase of real-time systems, the data to be collected is meagre compared with that of the entire life cycle.

Lastly, keep ethics in mind: make the selection of metrics and the collection of data easy, and keep intrusion into professionals' work to a minimum.
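Several of these steps can be captured in a single metric specification. The sketch below records the user (step 1), the entity and goal (step 2), the questions (step 3) and a target with a percentage tolerance (step 8), and evaluates a measured value against them. The class, the field names and the example values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MetricSpec:
    user: str                     # step 1: who will use the metric
    entity: str                   # step 2: what is measured
    goal: str                     # step 2: why it is measured
    questions: list = field(default_factory=list)  # step 3
    target: float = 1.0           # step 8: expected value
    tolerance_pct: float = 10.0   # step 8: allowed variance, in percent

    def evaluate(self, measured):
        """Step 8: compare a measurement with the target and its control limit."""
        if self.target == 0:
            raise ValueError("a percentage variance needs a non-zero target")
        deviation = abs(measured - self.target) / abs(self.target) * 100
        return "within limits" if deviation <= self.tolerance_pct else "investigate"

SPEC = MetricSpec(
    user="project manager",
    entity="architectural design of the task set",
    goal="keep design quality D(G) under control",
    questions=["Which modules are stress points?"],
    target=20.0,
    tolerance_pct=15.0,
)
```

A measured value of 22 deviates from the target by 10% and is within limits, whereas 25 deviates by 25% and is flagged for investigation. The reporting format of step 9 would then decide who receives that result and how often.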
