Quicksort
Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes (nlogn) (big O notation) comparisons to sort n items. In the worst case, it makes (n2) comparisons, though if implemented correctly this behavior is rare. Typically, quicksort is significantly faster in practice than other (nlogn) algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data, it is possible to make design choices which minimize the probability of requiring quadratic time. Additionally, quicksort tends to make excellent usage of the memory hierarchy, taking perfect advantage of virtual memory and available caches. Coupled with the fact that quicksort is an in-place sort and uses no temporary memory, it is very well suited to modern computer architectures. Quicksort (also known as "partition-exchange sort") is a comparison sort and, in efficient implementations, is not a stable sort.
History
The quicksort algorithm was developed in 1960 by C. A. R. Hoare while in the Soviet Union, as a visiting student at Moscow State University. At that time, Hoare worked in a project on machine translation for the National Physical Laboratory. He developed the algorithm in order to sort the words to be translated, to make them more easily matched to an already-sorted Russian-to-English dictionary that was stored on magnetic tape.
Algorithm
Quicksort sorts by employing a divide and conquer strategy to divide a list into two sub-lists. Full example of quicksort on a random set of numbers. The boxed element is the pivot. It is always chosen as the last element of the partition.
The steps are:
- Pick an element, called a pivot, from the list.
- Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
- Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base case of the recursion are lists of size zero or one, which are always sorted. The correctness of the partition algorithm is based on the following two arguments:
- At each iteration, all the elements processed so far are in the desired position: before the pivot if less than or equal to the pivot's value, after the pivot otherwise (loop invariant).
- Each iteration leaves one fewer element to be processed (loop variant).
The correctness of the overall algorithm follows from inductive reasoning: for zero or one element, the algorithm leaves the data unchanged; for a larger data set it produces the concatenation of two parts, elements less than or equal to the pivot and elements greater than it, themselves sorted by the recursive hypothesis.
Parallelizations
Like merge sort, quicksort can also be easily parallelized due to its divide-and-conquer nature. Individual in-place partition operations are difficult to parallelize, but once divided, different sections of the list can be sorted in parallel. If we have p processors, we can divide a list of n elements into p sublists in (n) average time, then sort each. Ignoring the (n) preprocessing, this is linear speedup. Given n processors, only (n) time is required overall.
One advantage of parallel quicksort over other parallel sort algorithms is that no synchronization is required. A new thread is started as soon as a sublist is available for it to work on and it does not communicate with other threads. When all threads complete, the sort is done. Other more sophisticated parallel sorting algorithms can achieve even better time bounds. For example, in 1991 David Powers described a parallelized quicksort that can operate in O(logn) time given enough processors by performing partitioning implicitly.
Formal analysis
From the initial description it's not obvious that quicksort takes (nlogn) time on average. It's not hard to see that the partition operation, which simply loops over the elements of the array once, uses (n) time. In versions that perform concatenation, this operation is also (n). In the best case, each time we perform a partition we divide the list into two nearly equal pieces. This means each recursive call processes a list of half the size. Consequently, we can make only logn nested calls before we reach a list of size 1. This means that the depth of the call tree is (logn). But no two calls at the same level of the call tree process the same part of the original list; thus, each level of calls needs only (n) time all together (each call has some constant overhead, but since there are only (n) calls at each level, this is subsumed in the (n) factor). The result is that the algorithm uses only (nlogn) time. An alternate approach is to set up a recurrence relation for the T(n) factor, the time needed to sort a list of size n. Because a single quicksort call involves (n) factor work plus two recursive calls on lists of size n / 2 in the best case. The master theorem tells us that T(n) = (nlogn). In fact, it's not necessary to divide the list this precisely; even if each pivot splits the elements with 99% on one side and 1% on the other (or any other fixed fraction), the call depth is still limited to 100logn, so the total running time is still (nlogn). In the worst case, however, the two sublists have size 1 and n − 1 (for example, if the array consists of the same element by value), and the call tree becomes a linear chain of n nested calls. This is the same relation as for insertion sort and selection sort, and it solves to T(n) = (n2). Given knowledge of which comparisons are performed by the sort, there are adaptive algorithms that are effective at generating worst-case input for quicksort on-the-fly, regardless of the pivot selection strategy.
Randomized quicksort expected complexity
Randomized quicksort has the desirable property that, for any input, it requires only (nlogn) expected time (averaged over all choices of pivots). But what makes random pivots a good choice?
Suppose we sort the list and then divide it into four parts. The two parts in the middle will contain the best pivots; each of them is larger than at least 25% of the elements and smaller than at least 25% of the elements. If we could consistently choose an element from these two middle parts, we would only have to split the list at most 2log2n times before reaching lists of size 1, yielding an (nlogn) algorithm. A random choice will only choose from these middle parts half the time. However, this is good enough. Imagine that you are flipping a coin over and over until you get k heads. Although this could take a long time, on average only 2k flips are required, and the chance that you won't get k heads after 100k flips is highly improbable. By the same argument, quicksort's recursion will terminate on average at a call depth of only 2(2log2n). But if its average call depth is (logn), and each level of the call tree processes at most n elements, the total amount of work done on average is the product, (nlogn). Note that the algorithm does not have to verify that the pivot is in the middle half - if we hit it any constant fraction of the times, that is enough for the desired complexity. The outline of a formal proof of the O(nlogn) expected time complexity follows. Assume that there are no duplicates as duplicates could be handled with linear time pre- and post-processing, or considered cases easier than the analyzed. Choosing a pivot, uniformly at random from 0 to n − 1, is then equivalent to choosing the size of one particular partition, uniformly at random from 0 to n − 1. With this observation, the continuation of the proof is analogous to the one given in the average complexity section.
Average complexity
Even if pivots aren't chosen randomly, quicksort still requires only (nlogn) time over all possible permutations of its input. Because this average is simply the sum of the times over all permutations of the input divided by n factorial, it's equivalent to choosing a random permutation of the input. When we do this, the pivot choices are essentially random, leading to an algorithm with the same running time as randomized quicksort.
Space complexity
The space used by quicksort depends on the version used.
Quicksort has a space complexity of (logn), even in the worst case, when it is carefully implemented such that
- in-place partitioning is used. This requires (1).
- After partitioning, the partition with the fewest elements is (recursively) sorted first, requiring at most (logn) space. Then the other partition is sorted using tail recursion or iteration. (This idea is commonly attributed to R.Sedgewick .
The version of quicksort with in-place partitioning uses only constant additional space before making any recursive call. However, if it has made (logn) nested recursive calls, it needs to store a constant amount of information from each of them. Since the best case makes at most (logn) nested recursive calls, it uses (logn) space. The worst case makes (n) nested recursive calls, and so needs (n) space; Sedgewick's improved version using tail recursion requires (logn) space in the worst case. We are eliding a small detail here, however. If we consider sorting arbitrarily large lists, we have to keep in mind that our variables like left and right can no longer be considered to occupy constant space; it takes (logn) bits to index into a list of n items. Because we have variables like this in every stack frame, in reality quicksort requires ((logn)2) bits of space in the best and average case and (nlogn) space in the worst case. This isn't too terrible, though, since if the list contains mostly distinct elements, the list itself will also occupy (nlogn) bits of space.
Selection-based pivoting
A selection algorithm chooses the kth smallest of a list of numbers; this is an easier problem in general than sorting. One simple but effective selection algorithm works nearly in the same manner as quicksort, except that instead of making recursive calls on both sublists, it only makes a single tail-recursive call on the sublist which contains the desired element. This small change lowers the average complexity to linear or (n) time, and makes it an in-place algorithm. A variation on this algorithm brings the worst-case time down to (n) (see selection algorithm for more information). Conversely, once we know a worst-case (n) selection algorithm is available, we can use it to find the ideal pivot (the median) at every step of quicksort, producing a variant with worst-case (nlogn) running time. In practical implementations, however, this variant is considerably slower on average. Another variant is to choose the Median of Medians as the pivot element instead of the median itself for partitioning the elements. While maintaining the asymptotically optimal run time complexity of (nlogn) (by preventing worst case partitions), it is also considerably faster than the variant that chooses the median as pivot.
Variants
There are three well known variants of quicksort:
- Balanced quicksort: choose a pivot likely to represent the middle of the values to be sorted, and then follow the regular quicksort algorithm.
- External quicksort: The same as regular quicksort except the pivot is replaced by a buffer. First, read the M/2 first and last elements into the buffer and sort them. Read the next element from the beginning or end to balance writing. If the next element is less than the least of the buffer, write it to available space at the beginning. If greater than the greatest, write it to the end. Otherwise write the greatest or least of the buffer, and put the next element in the buffer. Keep the maximum lower and minimum upper keys written to avoid resorting middle elements that are in order. When done, write the buffer. Recursively sort the smaller partition, and loop to sort the remaining partition.
- Three-way radix quicksort (also called multikey quicksort): is a combination of radix sort and quicksort. Pick an element from the array (the pivot) and consider the first character (key) of the string (multikey). Partition the remaining elements into three sets: those whose corresponding character is less than, equal to, and greater than the pivot's character. Recursively sort the "less than" and "greater than" partitions on the same character. Recursively sort the "equal to" partition by the next character (key).
Source : http://en.wikipedia.org/wiki/Quicksort
Quicksort lebih dikenal sebagai algoritma sorting. Quicksort disebut juga dengan partition exchange sort. Algoritma quick sort mengurutkan dengan sangat cepat dibandingkan dengan algoritma yang lain karena loop dalamnya dapat secara efisien diimplementasikan pada kebanyakan arsitektur dan disebagian besar data di dunia nyata adalah mungkin untuk membuat pilihan desain yang meminimalkan kemungkinan membutuhkan waktu kuadrat, namun algoritma ini sangat komplex dan diproses secara rekursif.. Walaupun begitu algoritma quick sort tidak selalu merupakan pilihan yang terbaik. Seperti yang telah disebutkan sebelumnya, algoritma ini dilakukan secara rekursif yang berarti jika dilakukan untuk tabel yang berukuran sangat besar, walaupun cepat, dapat menghabiskan memori yang besar pula. Selain itu, algoritma ini adalah algoritma yang terlalu komplex untuk mengurutkan tabel yang berukuran kecil (hanya puluhan elemen misalnya). Selain itu algoritma quicksort mempunyai tingkat efisiensi yang buruk ketika dioperasikan pada tabel yang hampir terurut atau pada tabel yang terurut menurun.
Quick sort merupakan divide and conquer algorithm. Langkah-langkahnya adalah sebagai berikut :
a. Pilih satu elemen secara acak
b. Pindahkan semua elemen yang lebih kecil ke sebelah elemen tersebut dan semua elemen yang lebih besar ke sebelah kanannya. Elemen yang nilainya sama bisa disimpan di salah satunya. Ini disebut operasi partisi
c. Lakukan sort secara rekursif terhadap sublist sebelah kiri dan kanannya.
Tidak ada komentar:
Posting Komentar