System of Student Coursework Submission
This report describes the development of an application to replace an existing student coursework submission system. The application must be developed in several stages. The first stage is to design a usable website that makes it easy for students to submit their coursework. The second is to design a secure database, which the application needs in order to store the students' marks. Protecting those marks is the project's main ethical issue. After that, the system should mark the coursework automatically and provide feedback to both students and lecturers. These tasks require a tool that checks the program code for plagiarism and a tool that runs the programs automatically and captures their output. The literature review therefore concentrates on tools that detect plagiarism in program code.
The main idea of the project is to improve the existing system with a website that lets students submit their software engineering coursework, that is, their Java coursework, easily. In the existing system, students either attach their coursework through a link and press the submit button, or send it by email. Some lecturers still mark paper copies, which students have to hand in to a box. The box collects a large amount of paper, which is hard for the office to organise. When many students take the same course and lecturers teach many courses, marking the coursework takes considerable time and effort. The main focus of this report is related work on tools that detect plagiarism.
The aims and description of this project are:
To provide a website and system that collect and manage the submission of students' Java coursework. The website will carry out automatically, with the help of some tools, the tasks that a person currently performs, and report the result of each submission. Once the program code is submitted, a unit test will check the program for errors. After that, the programs will be run automatically with a tool called Ant to obtain their output. A lecturer can then check whether the programs break the plagiarism rules using one of the tools described in more detail later. When all of this is done, feedback and marks will be sent to the lecturer and the student by email. The lecturer can therefore easily and quickly print the program or document and see whether plagiarism exists.
The objectives for this project are as follows:
- Understand and analyse the existing system.
- Interview or work with a lecturer to collect ideas about how they structure the marking of students' coursework.
- Carry out a background/literature review of other projects that use plagiarism tools and decide which one to use, as well as clarifying how each tool works.
- Specify the requirements for an automatic system to mark students' coursework.
- Design a useful website with good guidance, together with a database, which is very important for safeguarding the students' marks. Usability in the users' interaction with the system is important for a good design.
1.2. Background/literature review
In this section we familiarise ourselves with what the dissertation needs, namely tools that detect plagiarism in programming code. These tools fall into several categories according to how they work, which I will go through later. The main focus of this report is the submission of Java programs; therefore, in this literature review I will explore other people's work on Java programs and how they detect plagiarism, and later decide which tool is best to use in this project. We also need to design a website to collect the student coursework, so the literature review also explores the usability of web pages.
Plagiarism occurs when someone presents work or code as their own when it was actually written by somebody else. Taking someone's work or ideas and presenting them as your own without any reference is therefore not allowed. The rest of this review therefore studies and discusses tools that can detect plagiarism in code.
1.2.3. Methods or ways to detect plagiarism 
a) Attribute system counting
As its name suggests, this metric simply counts attributes of a program. It considers only the operators and operands of all types, using Halstead's software metrics. The main aim of the system is to find a similarity value for two programs. The system uses the following basic counts:
- n1 shows the number of unique operators.
- n2 shows the number of unique operands.
The total numbers of operators and operands of all types are written in the same way, but with a capital N: N1 shows the total number of operators and N2 shows the total number of operands.
This system has two equations to calculate the metrics:
1- V = (N1 + N2) log2 (n1 + n2).
The system goes through every line of the program and checks whether the line is blank or a comment.
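As a rough illustration of attribute counting, the sketch below computes the four counts and the volume V for a short list of tokens. The operator set and the sample tokens are assumptions made for this example only; a real tool would take them from a lexer for the language being checked.

```python
import math
from collections import Counter

# Hypothetical operator set for the illustration; a real tool would use
# the full operator and keyword list of the language being analysed.
OPERATORS = {"=", "+", "-", "*", "/", ";", "(", ")"}


def halstead_counts(tokens):
    """Return (n1, n2, N1, N2) for a list of already-split tokens."""
    operators = Counter(t for t in tokens if t in OPERATORS)
    operands = Counter(t for t in tokens if t not in OPERATORS)
    n1, n2 = len(operators), len(operands)            # unique counts
    N1, N2 = sum(operators.values()), sum(operands.values())  # totals
    return n1, n2, N1, N2


def halstead_volume(tokens):
    """V = (N1 + N2) * log2(n1 + n2); undefined for an empty token list."""
    n1, n2, N1, N2 = halstead_counts(tokens)
    return (N1 + N2) * math.log2(n1 + n2)


# Tokens of the two statements "x = a + b; y = a * b;"
tokens = ["x", "=", "a", "+", "b", ";", "y", "=", "a", "*", "b", ";"]
counts = halstead_counts(tokens)
volume = halstead_volume(tokens)
```

Two programs whose counts and volume are nearly identical would be flagged for a closer look, which is also why renaming alone does not hide a copy from this metric while small rewrites of the operators can.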
b) Structure metric
This method has received more attention. It focuses on the structure of the program, rather than on the number of attributes as the previous method does. The program is split into tokens, which are then used to detect plagiarism. The structure metric method is divided into two phases. The first phase is called tokenization and is similar in all tools that use this method. In this phase every program is divided into tokens. The phase consists of the following steps:
- Comments and string-constants are removed.
- Upper-case letters are translated to lower-case.
- A range of synonyms are mapped to a common form.
- If possible, the functions/procedures are expanded in calling order.
- All tokens that are not in the lexicon for the language are removed.
The tools differ in the second phase, which is called comparison of tokens. In this phase the token sequences are compared to detect plagiarism.
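The tokenization steps listed above can be sketched roughly as follows. The synonym table and the lexicon are tiny, hypothetical examples chosen for the illustration, and the step that expands functions in calling order is left out because it needs a real parser.

```python
import re

# Hypothetical synonym table and lexicon, kept deliberately small.
SYNONYMS = {"while": "loop", "for": "loop", "do": "loop"}
LEXICON = {"loop", "if", "else", "return",
           "=", "+", "-", "<", ">", "(", ")", "{", "}", ";"}

# One pattern for string constants, block comments and line comments, so
# that "//" inside a string is not mistaken for the start of a comment.
STRINGS_AND_COMMENTS = re.compile(
    r'"(?:\\.|[^"\\])*"|/\*.*?\*/|//[^\n]*', re.DOTALL)


def tokenize(source):
    # Remove comments and string constants.
    source = STRINGS_AND_COMMENTS.sub(" ", source)
    # Translate upper-case letters to lower-case.
    source = source.lower()
    # Split into words and single non-space characters.
    raw = re.findall(r"[a-z_]\w*|\S", source)
    # Map synonyms to a common form.
    mapped = [SYNONYMS.get(tok, tok) for tok in raw]
    # Remove every token that is not in the lexicon.
    return [tok for tok in mapped if tok in LEXICON]


a = 'while (I < 10) { total = total + 1; } // count'
b = 'for (j < 10) { sum = sum + 1; } /* loop */'
tokens_a = tokenize(a)
tokens_b = tokenize(b)
```

Although the two fragments use different keywords, names and comments, they produce the same token sequence, which is what the second phase then compares.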
Many tools detect plagiarism on the basis of the program's structure rather than a summary indicator. The structure metric method is much more trustworthy than attribute counting, because it is concerned with the structure of the program rather than with the values of a few numbers. Examples of tools that use structure are MOSS, YAP and JPlag, which are explored in more detail below.
There are many tools that detect plagiarised code. However, these tools work with different techniques, which have improved over time.
The figure below shows the four stages for plagiarism detection.
The Four Stage Plagiarism Process of Culwin and Lancaster
The earliest approaches to plagiarism detection focused on feature comparison, because most software systems of that time computed a number of different software metrics.
The first system was developed by Ottenstein in 1976. It handled only the FORTRAN language and used Halstead metrics. It considered the numbers of unique operators (n1) and unique operands (n2), as well as the total numbers of operators (N1) and operands (N2) mentioned before.
If these four values are similar for two programs, plagiarism is suspected.
The next system to detect plagiarism used a larger set of metrics, up to 24, rather than the four numbers of the first system. It was used to detect plagiarism in Pascal programs and gave improved performance.
The figure below shows the classification of detecting plagiarism.
What is Moss?
Moss stands for Measure Of Software Similarity. It is an automatic system that finds similarities between programs. It was developed in 1994 and has mostly been used to detect plagiarised source code. Moss supports several languages, for instance Java, Ada, ML, C, C++ and Scheme, and it has been very successful with them. However, Moss is provided only as an internet service.
The internet service is very simple to use. A user just lists all the files they want to compare and Moss does the rest. The latest version of Moss supports Linux. When Moss finishes comparing the files, it provides the output as an HTML page listing all the pairs of files in which similar code was detected. Moss also makes these files easy to compare by highlighting the matching parts in both files. It is therefore easy to use, and it is free, so anyone can obtain an account.
1.2.5. YAP: by Michael Wise
YAP stands for Yet Another Plague. It is Michael Wise's tool, developed at the University of Sydney, Australia, and it uses his own definition of structure metrics. The system exists in three versions.
YAP1 and YAP2 are the earlier versions. Wise started by developing YAP1, followed by YAP2 and finally YAP3.
YAP1: this version was published in 1992. It works with a mix of UNIX utilities, tied together with a Bourne shell script. Its disadvantage was that it was very slow.
YAP2: this version was an improvement on the previous one and was much faster. It was written in Perl and used a C program that implemented Heckel's algorithm.
YAP3: this was a further improvement, released in 1996, to detect plagiarism in computer programs as well as in other texts submitted by students. This was the final version of Wise's tool. In this version the second phase is new. It depends on an algorithm named Running-Karp-Rabin Greedy-String-Tiling (RKR-GST), which is described in more detail later under JPlag.
With this algorithm, YAP3 performs much better and is able to locate similar sections of code even when they have been transposed. However, YAP3 is still weak when the order of the source code is changed. Let's move on to JPlag.
1.2.6. JPlag
JPlag is a program that was developed by Lutz Prechelt, Michael Philippsen and Guido Malpohl in 1996. They set out to provide a system that takes sets of source code and discovers similarities among them. It does not merely compare bytes of text; it is also aware of the program's structure and the syntax of the programming language. JPlag currently supports Java, C, C++ and Scheme, as well as natural language text. It is used to detect copying between students' solutions to the same exercise rather than to look for copies of programs from the internet. In addition, it has a good graphical user interface for presenting the results. The results are shown as HTML pages, with further windows that highlight and compare the matching parts of both files.
The algorithms that JPlag uses:
There are two algorithms that JPlag uses:
- The Greedy String Tiling algorithm (GST).
- The Rabin-Karp algorithm (RK).
Mostly, the Greedy String Tiling algorithm is used. Its drawback is its running time: in the worst case the complexity is O(n^3), and this worst case cannot be reduced. Both Greedy String Tiling (GST) and Rabin-Karp (RK) work on program code that has been converted into tokens. GST compares the tokens directly, one to one. It also finds two similar texts or patterns even if they are in different positions. Figure 3 gives an idea of how the GST algorithm works. Points 11-15 show how the algorithm works and how it collects the sets of similar texts. Here the 'text' is the longer string, while the shorter string is called the 'pattern'. In RK, a hash value is calculated for the tokens, and these hash values are what RK compares at the end. RK is the best method to apply when the strings are very long, because that is where the hash values help most.
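To make the idea of tiling more concrete, here is a minimal sketch of basic Greedy String Tiling without the Rabin-Karp speed-up. It is not JPlag's own code: the function names, the minimum match length of 2 and the coverage-based similarity value are assumptions made for the illustration.

```python
def greedy_string_tiling(pattern, text, min_match=2):
    """Return tiles (pattern_start, text_start, length), longest first."""
    marked_p = [False] * len(pattern)
    marked_t = [False] * len(text)
    tiles = []
    while True:
        max_match = min_match
        matches = []
        # Find the longest common runs of tokens that are still unmarked.
        for p in range(len(pattern)):
            for t in range(len(text)):
                j = 0
                while (p + j < len(pattern) and t + j < len(text)
                       and pattern[p + j] == text[t + j]
                       and not marked_p[p + j] and not marked_t[t + j]):
                    j += 1
                if j == max_match:
                    matches.append((p, t, j))
                elif j > max_match:
                    matches = [(p, t, j)]
                    max_match = j
        # Turn the matches into tiles unless an earlier tile overlaps them.
        for p, t, j in matches:
            if any(marked_p[p:p + j]) or any(marked_t[t:t + j]):
                continue
            for k in range(j):
                marked_p[p + k] = True
                marked_t[t + k] = True
            tiles.append((p, t, j))
        if max_match == min_match:
            return tiles


def coverage_similarity(pattern, text, min_match=2):
    """Share of all tokens covered by tiles (an assumed measure)."""
    covered = sum(j for _, _, j in greedy_string_tiling(pattern, text, min_match))
    return 2 * covered / (len(pattern) + len(text))


# The second sequence is the first with its two halves swapped.
pattern = list("abcdxyz")
text = list("xyzabcd")
tiles = greedy_string_tiling(pattern, text)
similarity = coverage_similarity(pattern, text)
```

Both halves are found as tiles even though they have changed position, so the two sequences are reported as fully covered. The nested loops are also where the cubic worst case comes from.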
Why and how JPlag uses the RK algorithm:
RK has a good technique for detecting plagiarism, which is the calculation of hash values. First it calculates the hash value of the substring starting at Pp, and then that of the substring starting at Pp+1. In other words, it first calculates the hash value of the pattern, and then the hash values of the parts of the text that have the same size as the pattern. These values are saved in a hash table, which makes it easy to compare them. If two hash values are the same, the substrings themselves are compared; otherwise the cost is linear in the size of the strings, patterns and texts. Two methods are used for this problem when the result is linear.
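The hash-value step described in this subsection can be sketched with a rolling hash, where each window's hash is derived from the previous one instead of being recomputed. This is a generic Rabin-Karp search rather than JPlag's implementation; the base, the modulus and the way tokens are turned into integers are arbitrary choices for the example.

```python
def window_hashes(codes, length, base=257, mod=1_000_000_007):
    """Hash of every window of `length` integer codes, computed incrementally."""
    if length <= 0 or length > len(codes):
        return []
    high = pow(base, length - 1, mod)  # weight of the code leaving the window
    h = 0
    for c in codes[:length]:
        h = (h * base + c) % mod
    hashes = [h]
    for i in range(length, len(codes)):
        # Drop the oldest code, shift the rest, append the new code.
        h = ((h - codes[i - length] * high) * base + codes[i]) % mod
        hashes.append(h)
    return hashes


def encode(tokens, vocab):
    """Give every distinct token a small positive integer code."""
    return [vocab.setdefault(tok, len(vocab) + 1) for tok in tokens]


def rk_matches(pattern, text):
    """Start positions in `text` where the token sequence `pattern` occurs."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    vocab = {}
    target = window_hashes(encode(pattern, vocab), m)[0]
    found = []
    for start, h in enumerate(window_hashes(encode(text, vocab), m)):
        # Equal hashes only suggest a match, so the tokens are compared too.
        if h == target and list(text[start:start + m]) == list(pattern):
            found.append(start)
    return found


pattern = ["if", "(", ")"]
text = ["x", "if", "(", ")", "y", "if", "(", ")"]
positions = rk_matches(pattern, text)
```

Only windows whose hash equals the pattern's hash are compared token by token, which is what keeps the search cheap on long inputs.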
JPlag is provided as a web service. It is easy to create an account; you then choose a new submission and the window shown in Figure 4 appears.
After you fill in the form and press the submit button, JPlag starts comparing, as in Figure 5.
1.2.7. Plague
Plague uses the same structure metric technique, and its approach is similar to that of YAP3. However, Plague does not use the RKR-GST algorithm as YAP3 does. It works in three phases:
- Create a sequence of tokens and a list of structure metrics to form a structure profile. The profile summarizes the control structures used in the program and represents the iteration/selection and statement blocks.
- An O(n^2) phase compares the structure profiles and determines pairs of nearest neighbours.
- Finally, a comparison of the token sequences, using a variant of the Longest Common Subsequence algorithm, measures similarity.
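The last phase can be illustrated with the textbook Longest Common Subsequence algorithm. Plague uses its own variant, which is not described here, so the sketch below and its similarity ratio are only an assumed stand-in that shows the general idea.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Share of all tokens that lie on the common subsequence (assumed measure)."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Two token sequences that differ only in the order of two tokens.
first = ["loop", "if", "=", "return"]
second = ["loop", "=", "if", "return"]
common = lcs_length(first, second)
score = lcs_similarity(first, second)
```

Because a subsequence must keep its order, swapping two tokens lowers the score, which shows why reordered code is harder for this kind of comparison.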
In summary, as Clough stated, Plague suffers from a number of problems, including:
- The fact that it is hard to adapt to new languages.
- Because of the way it produces its output (two lists of indices that require interpretation), the results are not obvious.
1.2.8. Sherlock
Sherlock was developed in 1994 at the University of Warwick. It is an independent application and is easy to use by going online to the BOSS submission system.
Sherlock can compare both source code and texts, and it presents the result as a graph in HTML form. The algorithm that Sherlock uses is the same as in YAP3. It uses the tokens to search line by line for the same sequences in both files; these sequences are called 'runs'. Sherlock's technique is to look for runs of maximum length. Unlike Moss and JPlag, Sherlock does not have a website service, so it is a stand-alone tool.
1.2.9. Conclusion /summary of plagiarism
It is clear that from 1976 until now the techniques of plagiarism detection tools have improved. The Ottenstein tool, for example, counted the numbers of unique and total operators and operands. After that came the tool published in the ACM SIGCSE Bulletin, which used a larger number of metrics. However, these tools rely only on attribute counting. That is not good enough to detect plagiarism, because if a student changes the operators and operands, the counts change and the plagiarism goes undetected even though the programs have the same structure. These tools are also limited in the programming languages they handle and do not support object-oriented languages. The later tools, such as JPlag, Moss and YAP3, are therefore based on structure metrics. YAP3 has the disadvantage that it struggles to find a similarity if the order of the code changes. JPlag solves this problem, because it detects similar code even when it is in a different position. On this basis, JPlag's technique is the easiest to use and the most accurate in detecting plagiarism. It also has a useful website, on which it is not hard to set up an account. Plague uses structure metrics like JPlag and YAP3, but with different algorithms, and its results are hard to interpret because its output is not obvious. Sherlock is a good tool, but it is a stand-alone tool without a website like JPlag's; to use it, you have to go online to the BOSS submission system, as mentioned before.
What does usability mean?
It means making a system or product as easy as possible for a user to use. It can also refer to an evaluation of whether something is "easy to use". Identifying usability issues helps designers to simplify problems. It is important to question the users in order to help make the system simple, so the programmers have to interact with users throughout. Usability is also a central focus of human-computer interaction (HCI), which tries to make systems clear to their users. In addition, usability has been described as the capability of a system to be used by humans. Most researchers agree that usability depends on context and on how it is shaped. We could also say that usability concerns the interaction between the users and their problems. In our work we consider the usability of website design, which is much more important nowadays. For a website to be of good quality, usability has to be recognised as one of its assets. People normally leave websites that are hard to access or navigate, or that have dead links. In addition, some websites have so much information on the home page that users find it difficult to locate the right link. Designers therefore pay attention to usability as a quality factor in their designs. In the beginning, usability related to the use of paper, but it quickly changed to relate to hypertext.
To avoid these problems, there are some general usability guidelines.
These ten guidelines are called "heuristics" by Nielsen.
- Visibility of system status.
- Match between system and the real world.
- User control and freedom.
- Consistency and standards.
- Error prevention.
- Recognition rather than recall.
- Flexibility and efficiency of use.
- Aesthetic and minimalist design.
- Help users recognize, diagnose, and recover from errors.
- Help and documentation.
These points are very important for usability. The first means that users should know at all times what is going on in the application. The system should also speak the users' language rather than use system-oriented terms.
Nielsen also mentioned that the system should be flexible with users, because they may choose the wrong option; the system should therefore provide options such as 'undo' and 'redo' so that they can leave an unwanted situation quickly and easily.
Nielsen also talked about error prevention. With a careful design, the system presents users with a confirmation option before they commit to an action.
Nielsen also stated that the system should not make the user remember information from one part of the system to another. The system should offer clear options for saving information so that users can easily retrieve it later where appropriate. Error messages should be written in plain language (not codes).
The last point is about help and documentation. The system should provide help that is simple, clear and easy for users to understand.
Conclusion/summary of usability:
To sum up, usability means helping users to understand a site and navigate its links easily. The ten "heuristics" by Nielsen described in this section are important techniques that designers should work through in order to build usable web applications, maintain long-term relationships with their users and keep these applications solid.
1.3. The ethical and legal issues for the project
The most important factor in this project is the students' marks. After submission, the system will mark the coursework automatically and produce the result. These marks are important to the lecturer, and only the lecturer should have access to the database to see or change them. How, then, can we keep these marks safe and the data protected?
A good database is needed to keep all the students' marks safe and to protect them from unauthorised access. These issues are our legal concern. For the evaluation, the method used will be observation of students and staff to see how they deal with the interface design, and the results will be recorded in a table.
The plan for the whole dissertation is shown in Figure 8. After the exam, we will start with the requirements analysis, to analyse the existing system and understand how it works. For this part we will also learn from books on databases and PHP. After that, the design will start. Two things are important: testing and writing will feature in every task of the plan. The last task, documentation, will revise all the writing done in the previous tasks.
Developing and building a new system requires thinking about two main components: system analysis and design. Circumstances always change, so no system will remain good for a long time, but it must have good usability for its users. Therefore, when you develop a system, you cannot work only on the tools and code and forget about the usability of the design. Both must be considered together to build a good system that users will become familiar with.
 P. Clough, 'Plagiarism in Natural and Programming Languages: An overview of current tools and technologies'. Internal report, Department of Computer Science, University of Sheffield, 2000.
 S. Grier, 'A tool that detects plagiarism in Pascal programs', ACM SIGCSE Bulletin, Vol. 13, No. 1, 1981, pp. 15-20.
 J. L. Donaldson, A.-M. Lancaster, and P. H. Sposato, 'A plagiarism detection system', ACM SIGCSE Bulletin.
 A. Aiken, 'Measure of software similarity'. Available: http://www.cs.berkeley.edu/~aiken/moss.html.
 M. J. Wise, 'YAP3: improved detection of similarities in computer programs and other texts', presented at SIGCSE '96, Philadelphia, USA, February 15-17, 1996, pp. 130-134.
 Available: http://luggage.bcs.uwa.edu.au/~michaelw/YAP.html [Accessed 07-04-2010].
 Available: http://theory.stanford.edu/~aiken/moss/ [Accessed 07-04-2010].
 Available: https://www.ipd.uni-karlsruhe.de/jplag/ [Accessed 07-04-2010].
 Available: http://www.faqs.org/patents/app/20080270991 [Accessed 07-04-2010].
 Available: http://joypub.joensuu.fi/publications/dissertations/mozgovoy_plagiarism/mozgovoy.pdf [Accessed 08-04-2010].
 Available: http://monod.uwaterloo.ca/papers/04sid.pdf [Accessed 08-04-2010].
 M. J. Wise, Running Karp-Rabin Matching and Greedy String Tiling, University of Sydney (1993).
 Using the Metro Web Tool to Improve Usability Quality of Web Sites.
 Available: http://www.wqusability.com/articles/more-than-ease-of-use.html [Accessed 15-04-2010].
 The Ten Usability Heuristics. Available: http://www.useit.com/papers/heuristic/heuristic_list.html