Automated test data generation


Automated optimization of software test data is an important aspect of software quality. To assess the quality of software, various test cases must be created. Evaluating all aspects of a program makes the number of test cases grow quickly. The main objective is usually to uncover as many faults as possible with limited resources. This calls for a comparison of techniques, ranking first those that uncover more faults. Development organizations always want to test software thoroughly during the maintenance phase to build a better reputation in the market. Artificial Intelligence (AI) can offer more opportunities at a low price, but developers must trust it before adopting it. AI has long been at the forefront of research in computer science. In this paper we will develop, implement and validate an AI technique that uses a genetic algorithm for the optimization of software test data. A genetic algorithm is inspired by the evolution of natural species and searches for the optimal solution to a problem. While developing this technique we will focus on the critical factors identified in our Independent Study I (IS-I), such as cost, schedule and coverage, with respect to optimization.


Software testing is intended to increase confidence in the correctness of the software. Generation of test data is a central theme in software testing. A good test suite should not only help locate faults in a software system but also keep the associated costs low. It is often desirable for test cases to be generated automatically. This paper proposes using a Genetic Algorithm (GA) for test data generation in order to optimize software testing.

Verification and validation through testing is a dynamic field of software engineering in which progress towards automation has been slow. In particular, the design and generation of test data remains essentially a manual activity. Software testing is still the main technique used to build consumer confidence in the software available. Testing a software system is an enormous task that is long and costly [8]. Traditionally, software verification is done through scenario-based tests. The component to be checked is integrated into a test environment that is connected to its inputs and outputs, and it is run through a series of tests. Each test pairs given inputs with expected results and corresponds to one execution scenario of the component under test. An error is reported when the output received is not in line with the expected one. Even for simple systems, designing and maintaining test suites is a difficult and expensive process. It requires a good understanding of the system under test to ensure that a maximum number of different situations is covered by a minimum number of test cases. Running the tests is also time-consuming, because the entire program code has to be executed and everything must be re-initialized before each test. As systems grow more complex, it is quite common for software testing to consume more resources than development itself. AI is useful here: it can provide more features for a low price, and it must and will become more widespread [11]. Automatic test data generation has been a hot research topic in software testing over the past few years. Several techniques have been proposed with different emphases, and most of them are based on genetic algorithms (GA).

Artificial Intelligence

Software development is a very complex process that is, at present, primarily a human activity. Programming requires the use of different types of knowledge: knowledge of the problem domain and knowledge of the programming domain. It also requires many different steps to combine these forms of knowledge into a final solution. Artificial intelligence is the area of computer science concerned with enabling machines to perform tasks previously done by humans [14]. It is the study and creation of systems that exhibit a form of intelligence and hold the knowledge needed for tasks such as natural language understanding. Software engineering is a knowledge-intensive activity that requires extensive knowledge of both the application domain and the target software itself. Much of the cost of software products can be attributed to the ineffectiveness of current techniques for managing this knowledge, and artificial intelligence techniques can help [14].

The traditional view of the software development process starts with the specification of requirements and ends with testing the software. Each stage requires a different type of knowledge (design knowledge at the design stage, programming and domain knowledge at the coding stage). Each of the two phases, design and programming, contains a cycle of error detection and error correction. Experience has shown that errors can occur at any stage of software development. Errors in coding may occur because of errors in the design. These errors are usually expensive to correct [14].

Evolutionary Optimization Algorithm

Evolutionary Optimization (EO) algorithms use a population-based approach in which each iteration operates on a number of solutions (called a population) and produces more than one solution [12]. The main reasons for the popularity of EO algorithms are:

  1. They require no secondary information about the problem.
  2. EO algorithms are relatively simple to implement.
  3. EO algorithms are flexible and robust, i.e. they work well on a broad range of problems.

The use of a population gives EO algorithms a number of advantages:

  1. It provides the search procedure with the power of parallel processing.
  2. It allows an EO algorithm to find multiple optimal solutions, and thereby to solve multimodal and multi-objective optimization problems.
  3. It gives an EO algorithm the ability to normalize the decision variables (as well as the objective and constraint functions) within an evolving population, using the population-best minimum and maximum values.

The disadvantage of working with a population of solutions is the computational cost and memory required to perform each iteration [12].
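
The normalization mentioned in the third advantage can be illustrated with a minimal sketch. It assumes real-valued decision variables and plain min-max scaling; the function name `normalize_population` and the sample values are hypothetical and are not taken from [12].

```python
from typing import List


def normalize_population(population: List[List[float]]) -> List[List[float]]:
    """Scale every decision variable to [0, 1] using the population's own min and max."""
    n_vars = len(population[0])
    lows = [min(ind[i] for ind in population) for i in range(n_vars)]
    highs = [max(ind[i] for ind in population) for i in range(n_vars)]
    normalized = []
    for ind in population:
        row = []
        for value, lo, hi in zip(ind, lows, highs):
            # A variable with no spread in the population is mapped to 0.0.
            row.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        normalized.append(row)
    return normalized


# Hypothetical population of three individuals with two decision variables each.
population = [[1.0, 200.0], [3.0, 400.0], [5.0, 300.0]]
scaled = normalize_population(population)
```

Because the bounds come from the current population, they shift as the population evolves, so the scaling has to be recomputed in every iteration.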

EAs also have drawbacks. First, there is no guarantee of finding the globally optimal solution, and in most cases no reliable stopping test is available. Second, EAs are complex systems, which makes their theoretical analysis quite intricate and has led to the lack of a sufficient theoretical foundation in the area. In addition, experimental comparison is difficult. Third, they are often computationally expensive. Fourth, it is not possible to know how close a resulting solution is to the global optimum. Finally, and perhaps worst of all, their performance depends strongly on their parameters, which usually must be tuned experimentally for each problem. In fact, for some algorithms this parameter tuning itself becomes an optimization problem. In some cases the tuning process can be avoided by using self-adaptive strategies and other self-adaptive EAs. In summary, it is important to bear in mind that EAs are not a set of ready-to-use techniques but a set of mechanisms to be modified and adapted to the specific problem [19].

Basic Flow of Evolutionary Algorithms

The evolutionary process begins with a population of genotypes, shown in the lower left corner of the diagram. These are usually generated randomly, although some EA applications introduce a strong bias during initialization because the structure of the solution is known or expected. Phenotypes are derived from genotypes in the development phase. The genotypes are retained for later reproduction, just as organisms keep their DNA throughout their lives [13].

In fitness evaluation, the phenotype of each individual is assigned a fitness value, shown in the figure inside each cloud. This value reflects the ability of the phenotype to solve the problem at hand, and it determines the individual's prospects of passing elements of its genotype on to the next generation. Adult selection can remove inferior (low-fitness) offspring, so that only the better performers gain access to the adult pool. Parent selection chooses the individuals whose genotypes (or parts of them) will be passed on to the next generation. In theory, all adults remain in the adult pool until the next round of adult selection, when all or some of them are removed to make room for promising new adults. Finally, the genetic material of the selected parents is recombined and mutated to form a new pool of genotypes, each of which develops into a new individual. These individuals then begin the next cycle of development, evaluation and selection. The four main EA types are Evolutionary Strategies (ES), Evolutionary Programming (EP), Genetic Algorithms (GA) and Genetic Programming (GP) [13].
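
The development, evaluation and adult-selection phases described above can be sketched as follows. This is only an illustration under stated assumptions: genotypes are taken to be bit strings, the phenotype is the integer they encode, and the target value 21 and the names `develop`, `fitness` and `adult_selection` are hypothetical choices made for the example, not part of [13].

```python
import random
from typing import List, Tuple

Genotype = List[int]  # assumed representation: a list of bits


def develop(genotype: Genotype) -> int:
    """Development phase: decode a bit-string genotype into an integer phenotype."""
    return int("".join(str(bit) for bit in genotype), 2)


def fitness(phenotype: int, target: int = 21) -> float:
    """Hypothetical fitness: 1.0 when the phenotype equals the target, lower otherwise."""
    return 1.0 / (1.0 + abs(phenotype - target))


def adult_selection(scored: List[Tuple[Genotype, float]],
                    pool_size: int) -> List[Tuple[Genotype, float]]:
    """Keep only the pool_size fittest individuals in the adult pool."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:pool_size]


rng = random.Random(0)
# Random initial genotypes of five bits each.
genotypes = [[rng.randint(0, 1) for _ in range(5)] for _ in range(6)]
scored = [(g, fitness(develop(g))) for g in genotypes]
adults = adult_selection(scored, pool_size=3)
```

Note that the genotype, not the phenotype, is what is stored in the adult pool, since the genotype is what is later recombined and mutated.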

Genetic Algorithm

Genetic algorithms were first described by Holland and have been applied to machine learning and optimization. A genetic algorithm is based on the principles of natural selection and genetic mechanisms. It simulates the natural evolution of life in order to attain specific objectives in an artificial system. The principle behind GAs is that they create and maintain a population of individuals represented by chromosomes. Through crossover and mutation of chromosomes, the population evolves from one generation to the next until the termination conditions are met. The genetic operations of selection, recombination and mutation have been studied extensively in the literature. A genetic algorithm is a stochastic, population-based optimization method. It can easily become trapped in a local optimum, so researchers have proposed a large number of measures to improve its performance. Although a genetic algorithm is not guaranteed to find the optimal solution, it often finds a good solution in reasonable time. The robustness of GAs and their suitability for test generation tasks have already been demonstrated in previous work [20]. A GA begins with initial guesses and attempts to improve them by evolution. A typical GA consists of five parts:

  1. A representation of a candidate solution, called a chromosome,
  2. An initial set of chromosomes,
  3. A fitness function,
  4. A selection function, and
  5. A crossover operator and a mutation operator.

A chromosome may be a binary string or a more complex data structure. The initial set of chromosomes can be created manually or generated randomly. The fitness function measures the fitness of a chromosome with respect to a specific objective: for coverage-oriented automatic test generation (ATG), a chromosome is fitter if it achieves greater coverage [20]. The selection function decides which chromosomes take part in the evolution stage of the genetic algorithm, which is made up of the crossover and mutation operators. The crossover operator exchanges genes between two chromosomes, creating two new chromosomes. The mutation operator changes a gene in one chromosome, creating one new chromosome.
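
The two operators can be illustrated with a short sketch. It assumes binary-string chromosomes of equal length (at least two genes), single-point crossover and a single bit-flip mutation; these are common choices but are assumptions here, since the text does not fix a particular variant. The function names are hypothetical.

```python
import random
from typing import List, Tuple

Chromosome = List[int]  # assumed representation: a binary string as a list of bits


def single_point_crossover(a: Chromosome, b: Chromosome,
                           rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Exchange the genes after a random cut point, creating two new chromosomes."""
    point = rng.randint(1, len(a) - 1)  # cut strictly inside the chromosome
    return a[:point] + b[point:], b[:point] + a[point:]


def flip_mutation(chromosome: Chromosome, rng: random.Random) -> Chromosome:
    """Flip one randomly chosen gene and return the result as a new chromosome."""
    child = list(chromosome)  # copy, so the parent is left unchanged
    index = rng.randrange(len(child))
    child[index] = 1 - child[index]
    return child


rng = random.Random(42)
parent_a = [0] * 8
parent_b = [1] * 8
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
mutant = flip_mutation(parent_a, rng)
```

Both operators return new lists rather than modifying their arguments, which keeps the parent population intact while the next generation is being built.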

The outline of the genetic algorithm is shown in Figure 2. First, the algorithm begins with an initial population, either supplied or randomly generated. Next, the fitness of the chromosomes is calculated. A test then checks whether the fitness is high enough or whether too many iterations have passed without finding a solution. If the test fails, the selection, crossover and mutation operators are used to generate a new population. Once the new population has been produced, the process of Figure 2 is repeated. The goal of this procedure is to find a chromosome whose fitness is 'good' enough. To illustrate the process, the steps of a GA implementation are [21]:

Step 1: Represent the problem variable domain as a chromosome of fixed length, and choose the population size N, the crossover probability Pc and the mutation probability Pm.

Step 2: Define a fitness function to measure the fitness of an individual chromosome in the problem domain. The fitness function establishes the basis for selecting the chromosomes that will be mated during reproduction.

Step 3: Randomly generate an initial population of N chromosomes: x1, x2, ..., xN

Step 4: Calculate the fitness of each individual chromosome: F(x1), F(x2), ..., F(xN)

Step 5: Select a pair of chromosomes for mating from the current population. Parent chromosomes are selected with a probability related to their fitness. Highly fit chromosomes are more likely to be selected for mating than less fit ones.

Step 6: Create a pair of offspring chromosomes by applying the genetic operators.

Step 7: Place the newly created offspring chromosomes in the new population.

Step 8: Repeat from Step 5 until the size of the new population equals the size of the initial population, N.

Step 9: Replace the original (parent) chromosomes of the population with the new (offspring) population.

Step 10: Go to Step 4 and repeat the process until the termination criterion is satisfied.
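
Step 5 only requires that the selection probability be related to fitness. One common way to achieve this is roulette-wheel (fitness-proportional) selection, sketched below. The choice of roulette-wheel selection, the name `roulette_select` and the sample fitness values are assumptions made for illustration; the sketch also assumes non-negative fitness values.

```python
import random
from typing import List, Sequence


def roulette_select(population: Sequence[str], fitnesses: List[float],
                    rng: random.Random) -> str:
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # No fitness information at all: fall back to a uniform choice.
        return rng.choice(population)
    spin = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if spin < running:
            return individual
    return population[-1]  # guards against floating-point rounding at the upper end


rng = random.Random(7)
population = ["weak", "average", "strong"]  # hypothetical individuals
fitnesses = [1.0, 3.0, 6.0]
counts = {name: 0 for name in population}
for _ in range(10_000):
    counts[roulette_select(population, fitnesses, rng)] += 1
```

Over many draws the selection counts are expected to approach the 1:3:6 ratio of the fitness values, so fitter chromosomes mate more often while weaker ones still get an occasional chance.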

A GA is an iterative process. Each iteration is called a generation. A typical number of generations for a simple GA can range from 50 to over 500. A common practice is to terminate the GA after a specified number of generations and then examine the best chromosomes in the population. If no satisfactory solution is found, the GA is restarted [21].
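
To connect the steps to test data generation, the following minimal sketch evolves a small test suite for a toy program. Everything specific in it is an assumption made for illustration: the program under test (`classify`, a triangle classifier), the chromosome encoding (a suite of four integer triples), the fitness (number of distinct outcomes of `classify` that the suite exercises) and the parameter values. It is not the technique proposed in this paper.

```python
import random
from typing import List, Tuple

TestCase = Tuple[int, int, int]
Suite = List[TestCase]   # chromosome: a fixed-length list of test cases
TOTAL_BRANCHES = 4       # number of distinct outcomes of classify()


def classify(a: int, b: int, c: int) -> str:
    """Toy program under test; the returned label identifies the branch taken."""
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"


def fitness(suite: Suite) -> int:
    """Step 2: coverage, i.e. the number of distinct branches the suite exercises."""
    return len({classify(*case) for case in suite})


def random_case(rng: random.Random) -> TestCase:
    return (rng.randint(1, 6), rng.randint(1, 6), rng.randint(1, 6))


def evolve(n: int = 20, suite_len: int = 4, pc: float = 0.7, pm: float = 0.1,
           max_gen: int = 100, seed: int = 1) -> Tuple[Suite, int]:
    rng = random.Random(seed)
    # Step 3: random initial population of n chromosomes.
    population = [[random_case(rng) for _ in range(suite_len)] for _ in range(n)]
    best, best_fit = population[0], fitness(population[0])
    for _ in range(max_gen):                           # Step 10: generation loop
        scores = [fitness(s) for s in population]      # Step 4
        for suite, score in zip(population, scores):
            if score > best_fit:
                best, best_fit = suite, score
        if best_fit == TOTAL_BRANCHES:                 # termination criterion
            break
        new_population: List[Suite] = []
        while len(new_population) < n:                 # Steps 5 to 8
            # Fitness-proportional choice of two parents (scores are always >= 1).
            p1, p2 = rng.choices(population, weights=scores, k=2)
            c1, c2 = list(p1), list(p2)
            if rng.random() < pc:                      # crossover between test cases
                cut = rng.randint(1, suite_len - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                if rng.random() < pm:                  # mutation: replace one test case
                    child[rng.randrange(suite_len)] = random_case(rng)
                new_population.append(child)
        population = new_population[:n]                # Step 9
    return best, best_fit


best_suite, covered = evolve()
```

The sketch stops as soon as one suite covers every outcome of `classify`, or after `max_gen` generations, and returns the best suite seen so far; a restart with a different seed corresponds to restarting the GA when no satisfactory solution is found.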


In our Independent Study I (IS-I) we critically analyzed the literature on the basis of the papers studied, as presented in Table 1, and evaluated existing techniques on the basis of four factors, i.e. fault detection capability, cost impact, coverage impact and execution time [16].


The related work shows a need for an optimized technique that works best for generation, reduction and prioritization on the basis of the factors identified in IS-I. In this paper we therefore turn to artificial intelligence techniques to address the optimization problem. Different evolutionary algorithms have been studied, and genetic algorithm techniques have been selected for further work. We will develop, implement and validate a technique that uses a genetic algorithm for the optimization of software test data. A genetic algorithm is inspired by the evolution of natural species and searches for the optimal solution to a problem. While developing this technique we will focus on the critical factors identified in IS-I, such as cost, schedule and coverage, with respect to optimization.


In this section we review different artificial intelligence techniques for the optimization of test data. The emphasis is on the genetic algorithm, which we discuss in detail, evaluating its advantages and disadvantages with the help of existing research papers.

Paper [1] analyzes the problems of automated software test data generation using evolutionary algorithms. It gives a detailed introduction to automated software testing strategies, briefly introduces evolutionary algorithms, and details two of their variants, i.e. sequential and parallel evolutionary algorithms. The results show that the proposed technique performs well. Its only drawback is that it still needs to be validated against the existing literature [1].

In paper [2] the authors compare five evolutionary algorithms: genetic algorithms, memetic algorithms, particle swarm, ant colony and shuffled frog leaping. The algorithms are compared on the basis of processing time, convergence speed and quality. The authors first give a brief introduction to all five algorithms and how they work. They coded the algorithms in Visual Basic on a 1.8 GHz AMD laptop and report that particle swarm optimization performs well. However, because the experiments used Visual Basic on this particular hardware, in our view the results could differ with another programming language or a higher-end machine. The validity of the results is therefore open to question [2].

Paper [3] is about genetic algorithms; the authors propose a new genetic algorithm and examine its performance. The experiments investigate crossover strategies and how parents are selected for reproduction and mutation. The authors report that the crossover strategy is the more successful one for path coverage and that parents should be selected according to their fitness rather than at random. They also compare the genetic algorithm with random testing and conclude that the proposed genetic algorithm works more efficiently. The only drawback is that the proposed algorithm still needs to be checked on more complex programs and on other data types [3].

In paper [4] the authors compare two swarm-based algorithms with a genetic algorithm. They choose ten programs of different complexity and report results on the basis of performance. The paper also covers one of the most recent approaches, the artificial bee colony (ABC) algorithm. The authors introduce the genetic algorithm, the artificial bee colony algorithm and particle swarm optimization. The experiments were run and the results drawn in MATLAB. The results show that particle swarm optimization performs well for test case generation. The only drawback is that it remains unclear whether this algorithm also works best for the reduction and optimization of test cases; more work is needed to validate this [4].

In paper [5] the authors compare two evolutionary algorithms, genetic algorithms and evolutionary programming, on the basis of performance. They choose the cutting stock problem for the comparison, introduce it in detail, and then implement a genetic algorithm to solve it through an example. The authors conclude that the cutting stock problem can be solved more effectively with genetic algorithms and evolutionary programming. In our view, more work is needed to validate this conclusion [5].

In paper [6] the author discusses the use of metaheuristic search techniques for the automatic generation of test data. The paper gives a detailed introduction to metaheuristic search and its different types, such as hill climbing, simulated annealing, evolutionary algorithms, genetic algorithms, and advanced encodings and operators. These techniques are applied under different criteria: functional (black-box) testing, structural (white-box) testing, grey-box testing and non-functional testing. The author uses real-world problems as experimental subjects, but many other problems still need attention [6].

In paper [9] the authors present the genetic programming technique in detail and compare it with the traditional genetic algorithm approach on the basis of performance. The experiments concern wire antenna design. The authors give the complete procedure with its implementation and then compare the results, which show that genetic programming works better than genetic algorithms. That this approach works best for wire antenna design does not mean that the finding can be generalized; more effort is required to generalize it [9].

In paper [7] the authors evaluate genetic algorithms for automatic software test data generation. They give a complete implementation and check its effectiveness on different types of programs. The paper introduces a new tool named GADGET for dynamic test data generation. The authors discuss the tool under different criteria and give in-depth coverage of genetic algorithms. Experiments are run on programs of differing complexity and assessed on the basis of coverage and complexity. The authors also discuss the conditions under which the genetic algorithm fails. The results given in the paper are very generic [7].

Conclusion and Future Work

Successful software testing contributes to the delivery of a reliable, high-quality software product, more satisfied users, lower maintenance costs, and more accurate and reliable results. Unsuccessful testing leads to the opposite: low-quality products, dissatisfied users, increased maintenance costs, and unreliable and inaccurate results. Software testing is therefore a necessary and vital activity in the software development process. In this paper we have emphasized the efficiency of test cases in terms of generation, reduction and prioritization. We suggest that an optimized technique is required that works best in these three areas to improve software quality. In future work we will propose a technique that covers all the major critical factors previously used in automated software test data optimization. What is required is a single technique, a hybrid of different techniques for generation, reduction and prioritization, designed so that no constituent technique loses its benefits and the whole works across all three components. We will also attempt to encapsulate all critical factors in the proposed technique, which will be our IS-II.


This dissertation would never have been completed without the support of a number of people. I am particularly thankful to Muhammad Nadeem Khokhar, my IS-II supervisor. His wise yet friendly guidance throughout this semester has helped me grow a lot. Muhammad Nadeem Khokhar, thank you for your invaluable help and patience. I would also like to thank SZABIST, Islamabad Campus, for providing the resources required to complete this work. It would have been impossible to complete this effort without their continuous support. Finally, I would like to thank my family and friends. I am indebted to you for your continuous support and understanding, which have been essential for concluding the IS. This dissertation is dedicated to you all.


  1. Enrique Alba, Francisco Chicano. Observations in using Parallel and Sequential Evolutionary Algorithms for Automatic Software Testing. Grupo GISUM, Dept. de Lenguajes y Ciencias de la Computacion, University of Malaga, SPAIN, Computers & Operations Research. 16 November 2006.
  2. Emad Elbeltagi,Tarek Hegazy, Donald Grierson. Comparison among five evolutionary-based optimization algorithms. Department of Structural Engineering, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt. Advanced Engineering Informatics, 19 January 2005.
  3. Maha Alzabidi , Ajay Kumar, and A.D. Shaligram. Automatic Software Structural Testing by Using Evolutionary Algorithms for Test Data Generations. IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009.
  4. S Singh Dahiya, J K Chhabra, S Kumar. Automated Test Data Generation using Swarm Intelligence Approaches. IE(I) Journal-ET Volume 90, January 2010.
  5. Raymond Chiong, Member, IEEE, and Ooi Koon Beng, Member, IEAust. A Comparison between Genetic Algorithms and Evolutionary Programming based on Cutting Stock Problem. Engineering Letters, 14:1, EL_14_1_14 Advance online publication: 12 February 2007.
  6. Phil McMinn. Search-based Software Test Data Generation: A Survey. The Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, UK. Software Testing, Verification and Reliability. 14(2), pp. 105-156, June 2004.
  7. Gary McGraw, Christoph Michael, Michael Schatz. Generating Software Test Data by Evolution. Technical report RSTR-018-97-01. RST Corporation, Suite # 250, 21515 Ridgetop Circle Sterling, VA 20166. 9 February 1998.
  8. Praveen Ranjan Srivastava, Tai-hoon Kim. Application of Genetic Algorithm in Software Testing. Computer Science & Information System Group, BITS PILANI - 333031 (INDIA). International Journal of Software Engineering and Its Applications Vol. 3, No.4, October 2009.
  9. P. J. Williams, T. C. A. Molteno. A Comparison of Genetic Programming with Genetic Algorithms for Wire Antenna Design. Department of Physics, University of Otago, P.O. Box 56, 9016 Dunedin, New Zealand. Hindawi Publishing Corporation, International Journal of Antennas and Propagation, Volume 2008, Article ID 197849, 6 pages, doi:10.1155/2008/197849, 2008.
  10. Andreas Windisch, Stefan Wappler, Joachim Wegener. Applying Particle Swarm Optimization to Software Testing. GECCO '07: Proceedings of the 2007 Conference on Genetic and Evolutionary Computation, July 7-11, 2007.
  11. Tim Menzies (Computer Science, Portland State University, Oregon, USA), Charles Pecheur (RIACS at NASA Ames Research Center, Moffett Field, CA, USA). Verification and Validation and Artificial Intelligence. Submitted to Elsevier Science. 12 July 2004.
  12. Vassil Guliashki, Hristo Toshev, Chavdar Korsemov. Survey of Evolutionary Algorithms Used in Multiobjective Optimization. BULGARIAN ACADEMY OF SCIENCES. Problems of engineering cybernetics and robotics, 60, 2009.
  13. Keith L. Downing, The Norwegian University of Science and Technology Trondheim, Norway. Introduction to Evolutionary Algorithms. January 26, 2009.
  14. Jonathan Onowakpo Goddey Ebbah, Department of Computer Science, University of Ibadan, Ibadan, Nigeria. Deploying Artificial Intelligence Techniques In Software Engineering. American journal of undergraduate research. vol. 1 no. 1 (2002).
  15. Martín Agüero, Franco Madou, Gabriela Esperón, Daniela López De Luise. Universidad de Palermo, Argentina. Artificial Intelligence for Software Quality Improvement. World Academy of Science, Engineering and Technology 63, 2010.
  16. Arshad Mansoor, Adnan Shabbir. SZABIST, Islamabad. Analytical Survey on Automated Software Test Data Evaluation. 10th National Research Conference at SZABIST, IEEE Conference Record number: 16756, NISS2010 AT GYEONGJU (KOREA). 13 May, 2010.
  17. Wouter Wiggers, Faculty of EECMS, University of Twente. A comparison of a genetic algorithm and a depth first search algorithm applied to Japanese nonograms. Copyright 2004, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science. 1st Twente Student Conference on IT, Enschede 14 June 2004.
  18. Roy P Pargas, Mary Jean, Robert R Peck. Department of computer science, Clemson University. Test-Data generation using genetic algorithms. Journal of software testing, verification and reliability. 1999.
  19. Ramón Sagarna. Department of Computer Science and Artificial Intelligence, University of the Basque Country. Supervised by: J. A. Lozano. PhD Thesis. Donostia - San Sebastián, January 2007.
  20. Kewen Li, Zilu Zhang, Jisong Kou. College of Computer and Communication Engineering, China University of Petroleum, Dongying, Shandong, 257061, China. Breeding Software Test Data with Genetic-Particle Swarm Mixed Algorithm. journal of computers, VOL. 5, NO. 2, FEBRUARY 2010.
  21. Karl O. Jones. Comparison of genetic algorithms and particle swarm optimization for fermentation feed profile determination. International Conference on Computer Systems and Technologies - CompSysTech' 2006.
