Use the copy-and-paste method to give an average-case analysis for the
permute function in
- Assume that the random number generator used in the algorithm produces truly random numbers, i.e., for any integer j in the range 0 ≤ j < RAND_MAX, any given call of rand() has a probability 1/RAND_MAX of returning j.
- Assume also that the rand() function runs in O(1) time (worst-case and average).
- Show all work!
- Remember that you are doing average case analysis.
If, at some point, you don’t ask how the average case behavior differs from the worst case behavior, you’re doing it wrong!
Check your analysis by running the algorithm on a variety of input sizes and measuring the time it takes.
pdriver.cpp can be used as a “driver” to execute the permute algorithm. This program expects a pair of command line arguments when run. The first argument is the number of items to permute. The second is the number of times you want to repeat the permutation process.
For example, if you compile this program to produce an executable named “pdriver”, then
pdriver 50 10
will generate 10 permutations of 50 elements each.
Because the purpose of this exercise is to generate timing data rather than to actually work with the algorithm, the generated permutations are not printed. Feel free to add output statements if you want to see the algorithms in action, but remember to remove those statements before proceeding on to the timing steps. (I/O is slow and may distort your timing results.)
- Compile the program.
- Run the program for different sizes of N and time the algorithm.
- On Linux systems, you can measure the run time of the programs (or of any Unix program/command) by placing the command “time” in front of the program invocation. For example,
time ./pdriver 50 10
will determine the time required to generate 10 permutations of 50 elements each.
The time will appear in a format similar to this:
real 0m5.530s
user 0m3.362s
sys 0m0.090s
The last two numbers are of the most interest to us. Their sum is the number of seconds the CPU devoted to execution of this program. This is often much less than the “clock time” (the first number), because the CPU is usually being shared by several different programs.
- The user time is the amount of time spent in “user” code.
- The sys time is the amount of time spent in “system” calls. Together, these constitute the “CPU time” used by the command.
- The real time is the actual time that passed from when the program started to when it ended. This can be much longer than the CPU time.
- We are interested in the CPU time because the rest of the real time is spent running processes other than the one we are timing.
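The arithmetic behind the CPU time described above can be sketched in C++. This is only an illustration: the function names timeFieldToSeconds and cpuSeconds are hypothetical, and the code assumes the "<minutes>m<seconds>s" layout shown in the sample output (other shells print the time differently).

```cpp
#include <cassert>
#include <string>

// Convert one field of the sample `time` output, e.g. "0m3.362s", to seconds.
// Assumes the "<minutes>m<seconds>s" layout; this is not a general parser.
double timeFieldToSeconds(const std::string& field) {
    const std::size_t mPos = field.find('m');
    assert(mPos != std::string::npos);  // layout assumption stated above
    const double minutes = std::stod(field.substr(0, mPos));
    // std::stod stops parsing at the trailing 's'.
    const double seconds = std::stod(field.substr(mPos + 1));
    return 60.0 * minutes + seconds;
}

// CPU time is the sum of the user and sys fields; the real field is ignored.
double cpuSeconds(const std::string& user, const std::string& sys) {
    return timeFieldToSeconds(user) + timeFieldToSeconds(sys);
}
```

For the sample output, cpuSeconds("0m3.362s", "0m0.090s") adds 3.362 and 0.090 seconds, which is the value you would record as the total CPU time.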
- An important factor to keep in mind is that we are looking for average times. If we ran the algorithm only a single time for a particular value of N, depending upon how the random numbers came out, we might get an unusually fast or an unusually slow run. We need to do multiple repetitions for each value of N so that we can get the average time of a single run of size N. That’s why the main driver function of this program is designed to allow you to request multiple repetitions of the algorithm for a fixed N.
- Another reason for doing averages over many repetitions is to reduce the effect of measurement errors – always a possibility in any experiment. Even computer clocks have a finite resolution. For small values of N this algorithm might run in a fraction of one clock “tick”. Another source of measurement error in this kind of experiment is the fact that we can’t run just the permute algorithm in isolation. We have to run a whole program that launches the permute algorithm, so some of our measured time will be due to code other than the algorithm we are interested in.
- Timings of less than 10 seconds are likely to be dominated by the overhead of starting and stopping the program, so you should adjust the repetition count (the second argument of the pdriver program) to make sure that all your timed runs take at least that many seconds. A minimum of 60 seconds is probably a good target.
- In a non-Unix environment, getting the timing information is harder. You can, of course, time the program with a good old-fashioned stopwatch, but you’ll need to take special care to be sure that your own physical reaction time doesn’t affect the results. In addition, you are then measuring clock time, not CPU time. We have already discussed the dangers of using clock time. I strongly recommend, therefore, that you do this experiment on our Linux servers or on a Linux (or MacOS) machine of your own, if you have one.
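If the time command is not available, one possible alternative to a stopwatch is to measure from inside the program with std::clock from <ctime>. The sketch below is an assumption-laden illustration, not part of pdriver.cpp: averageCpuSeconds is a hypothetical helper, and while POSIX systems report processor time from std::clock, some platforms (the Microsoft C runtime, as far as I know) report elapsed wall-clock time instead, so check your platform's documentation before trusting the numbers.

```cpp
#include <cassert>
#include <ctime>

// Average seconds per call of `work`, measured with std::clock over
// `repetitions` calls. Whether this is CPU time or wall-clock time
// depends on the platform's implementation of std::clock.
template <typename Work>
double averageCpuSeconds(Work work, long repetitions) {
    assert(repetitions > 0);
    const std::clock_t start = std::clock();
    for (long i = 0; i < repetitions; ++i) {
        work();  // the code being timed
    }
    const std::clock_t stop = std::clock();
    const double total = static_cast<double>(stop - start) / CLOCKS_PER_SEC;
    return total / repetitions;
}
```

The work argument would be a lambda wrapping whatever call your copy of the permute function requires. The resolution of std::clock is limited, so the advice about using enough repetitions applies here as well.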
- For each value of N that you choose to use, determine an appropriate value of R, a number of repetitions of the permute algorithm that brings the total time of ./pdriver N R to a reasonable level (i.e., at least 60 seconds, but not so much longer that you get tired of waiting for your data).
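One way to pick R without much trial and error is to time a short pilot run and scale it up. The following is a minimal sketch under the assumption that the total time grows roughly in proportion to R for a fixed N; the name repetitionsForTarget is hypothetical, and because a short pilot run includes program start-up overhead, the estimate may land somewhat off target.

```cpp
#include <cassert>
#include <cmath>

// If `pilotReps` repetitions took `pilotCpuSeconds` of CPU time, estimate
// how many repetitions are needed to reach `targetSeconds`.
long repetitionsForTarget(long pilotReps, double pilotCpuSeconds,
                          double targetSeconds) {
    assert(pilotReps > 0 && pilotCpuSeconds > 0.0 && targetSeconds > 0.0);
    // Scale the repetition count by (target time) / (pilot time), rounding up.
    return static_cast<long>(
        std::ceil(targetSeconds * pilotReps / pilotCpuSeconds));
}
```

For example, if 1000 repetitions used 6 seconds of CPU time, repetitionsForTarget(1000, 6.0, 60.0) suggests 10000 repetitions for a 60-second run.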
Record the values N, R, and TT (the total CPU time observed) in a spreadsheet, e.g.,
N | R | TT
100 | 10000 | 60.52
200 | 5000 | 56.5
400 | 2000 | 110.2
- Each value of N should occur in only one row, and the N’s need to be presented in ascending order. The values shown here are for illustration only; the proper selection of values for N and R is discussed below.
What values of N should you use? Here are some things to keep in mind:
- Big-O is for “sufficiently large N”. For small values of N, the largest term in the overall complexity might not yet be dominating any lower-order terms. In addition, you have certain constant and O(R) factors involved in launching the program and looping around calling the permute function. For small values of N, these might distort your results.
- You are going to be looking for trends in the data. That requires a fair number of points, but even more important than the total number of data points is that those points be spread widely apart. You should make sure that your values of N range over many orders of magnitude. (An order of magnitude is a power of ten.)
- It is possible to get too large. When your arrays get so large that the virtual memory system begins swapping parts of them out of RAM to disk, your timing results will suddenly become very erratic. Your total array size should probably be kept well under the amount of RAM available to you.
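The advice above about spreading N over several orders of magnitude while staying within RAM can be sketched as follows. Both helper names are hypothetical, and arrayBytes assumes that the permutation is stored as a single array of int, which may not match the actual permute code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Problem sizes first, first*factor, first*factor^2, ... not exceeding last.
std::vector<std::size_t> geometricSizes(std::size_t first, std::size_t last,
                                        std::size_t factor) {
    assert(first > 0 && factor > 1);
    std::vector<std::size_t> sizes;
    for (std::size_t n = first; n <= last; n *= factor) {
        sizes.push_back(n);
        if (n > last / factor) break;  // next size would pass last (or overflow)
    }
    return sizes;
}

// Approximate bytes occupied by an array of n ints (assumed element type).
std::size_t arrayBytes(std::size_t n) {
    return n * sizeof(int);
}
```

Calling geometricSizes(100, 1000000, 10) yields sizes from 100 up to 1000000, one per power of ten, and arrayBytes on the largest size gives a rough figure to compare against the RAM available to you.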
In this lecture we looked at a procedure for confirming a predicted complexity. In this assignment, we will employ a slightly expanded form of the same technique.
In your spreadsheet, make sure that your rows are sorted into increasing values of N. Then add a 4th column in which you compute T, the time to execute a single call to the permute function. This should be (approximately) TT/R:
N | R | TT | T
100 | 10000 | 60.52 | .006052
200 | 5000 | 56.5 | .0113
400 | 2000 | 110.2 | .0551
Now, from this lecture, you should recall that,
- if a function is O(f(N)), then T/f(N) should be roughly constant over all N.
- if f(N) is actually too large, then we should see a trend where T/f(N) gets smaller as N grows.
- if f(N) is too small, then we should see a trend where T/f(N) gets larger as N grows.
These three observations allow us to “bracket” a function that we think represents the actual complexity.
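The decision rule in these three observations can be illustrated with a short C++ sketch. It is only meant to make the reasoning concrete; the columns themselves should still be computed in the spreadsheet. The names computeRatios and classifyTrend are hypothetical, the tolerance of 1.5 is an arbitrary illustrative threshold, and comparing only the first and last ratios is a deliberately crude stand-in for looking at the whole trend.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Ratios T/f(N) for parallel vectors of per-call times and f(N) values,
// both listed in order of increasing N.
std::vector<double> computeRatios(const std::vector<double>& t,
                                  const std::vector<double>& f) {
    assert(t.size() == f.size());
    std::vector<double> r;
    for (std::size_t i = 0; i < t.size(); ++i) {
        r.push_back(t[i] / f[i]);
    }
    return r;
}

// Classify the trend by comparing the last ratio with the first.
// Ratios that change by less than a factor of `tolerance` in either
// direction are treated as roughly constant.
std::string classifyTrend(const std::vector<double>& ratios,
                          double tolerance = 1.5) {
    assert(ratios.size() >= 2 && ratios.front() > 0.0 && tolerance > 1.0);
    const double change = ratios.back() / ratios.front();
    if (change > tolerance) return "f(N) too small";
    if (change < 1.0 / tolerance) return "f(N) too large";
    return "roughly constant";
}
```

With per-call times of .006052, .0113, and .0551 for N = 100, 200, and 400 and f(N) = N^3, the ratios fall from about 6.1E-09 to about 8.6E-10, so classifyTrend reports "f(N) too large".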
- Start with the function f(N) that you predicted from your analysis. Add two columns to your spreadsheet. The first should compute the f(N) value, and the second should compute T/f(N). For example, if you predicted that the function was O(N^3) on average, then you might have
N | R | TT | T | N^3 | T/N^3
100 | 10000 | 60.52 | .006052 | 1000000 | 6.052E-09
200 | 5000 | 56.5 | .0113 | 8000000 | 1.413E-09
400 | 2000 | 110.2 | .0551 | 64000000 | 8.609E-10
- In both the T column and various f(N) and T/f(N) columns, use the spreadsheet to compute values. Don’t compute them with a calculator or other program on the side and then manually enter them into the spreadsheet columns. Calculation is what spreadsheets are for! (Besides, it’s much easier for me to check your formulas than to have to recompute your results whenever I see something that looks a bit fishy.)
- Look at the results you have. Is the T/f(N) column nearly constant? If not, what is the trend? If your f(N) is too large, repeat this procedure with smaller functions (e.g., if N^3 is too large, try N^2). If your f(N) is too small, repeat this procedure with larger functions (e.g., if N^3 is too small, try N^4). You can also try larger functions by multiplying by log N, e.g., if you need something larger than O(N), try O(N log N).
Each function that you try should be shown as an additional two columns in the spreadsheet.
- Once you think you have found the true complexity f(N), make sure that you have “bracketed it” by adding a function slightly smaller than f(N) and a function slightly larger than f(N). (You may already have one or both in your spreadsheet from step 2. If so, you don’t need to add them again.)
Prepare a report on your analysis and experimental results. This report may be in a plain-text .txt file or in a PDF .pdf file. (Most word processors will allow you to save or export to PDF.)
The report should include
- Your copy-and-paste analysis
- Your spreadsheet with your tabulated timing data
- To the document containing your analysis, add a brief discussion of how your experimental results match up with your prediction.