301113 | Programming for Data Science | Programming

Question 1: Plagiarism (5 marks)

a. Read the following text carefully, then place it in your document. Be sure to format the bullet points correctly.

By including this statement, I — the author of this work — verify that:

• I hold a copy of this assignment that I can produce if the original is lost or damaged.
• I hereby certify that no part of this assignment/product has been copied from any other student’s work or from any other source except where due acknowledgement is made in the assignment.
• No part of this assignment/product has been written/produced for me by another person except where such collaboration has been authorised by the subject lecturer/tutor concerned.
• I am aware that this work may be reproduced and submitted to plagiarism detection software programs for the purpose of detecting possible plagiarism (which may retain a copy on its database for future plagiarism checking).
• I hereby certify that I have read and understand what the School of Computing, Engineering and Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning guide for this unit.

b. Perform an internet image search for the term “plagiarism”. Choose one image and place it in your document, with the URL in the caption, as in Figure 1.

Question 2: Manipulating Arrays (10 marks)

a. Set a variable studentno equal to your student number (as an integer). Run the following code to seed the random number generator:

set.seed(studentno)

b. Create a random vector with 40 elements using the function runif(). Print your vector as below (its entries will differ from those shown).

My vector:

 [1] 0.562583 0.750217 0.425158 0.724771 0.761201 0.940472 0.624377
 [8] 0.766676 0.440252 0.239865 0.023170 0.562284 0.626722 0.677718
[15] 0.957524 0.323970 0.002627 0.890840 0.608366 0.993184 0.225613
[22] 0.233224 0.207173 0.301462 0.943345 0.350438 0.522908 0.233365
[29] 0.134381 0.867709 0.668160 0.477765 0.335556 0.717439 0.693888
[36] 0.013811 0.760189 0.983373 0.608502 0.735450
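A minimal R sketch of parts a and b. The student number 12345678 is a placeholder; substitute your own. Rounding to 6 decimals before printing is one way to get output close to the layout above, although R's exact spacing may differ.

```r
# Placeholder student number: replace with your own (stored as an integer).
studentno <- 12345678L
set.seed(studentno)

# 40 draws from the uniform distribution on [0, 1)
v <- runif(40)

cat("My vector:\n")
print(round(v, 6))   # rounding to 6 decimals, to roughly match the layout above
```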

c. Print the following summary statistics of your vector. Be sure to format the result in the same way as shown below.

Minimum value: 0.002626505.    Maximum value: 0.9931845
Average value: 0.5478932.    Median value: 0.6084338
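One way to produce these lines, sketched under the same placeholder seed (so the numbers will differ from the sample). cat() prints each number with R's default of 7 significant digits, which matches the number of digits shown above.

```r
studentno <- 12345678L   # placeholder student number
set.seed(studentno)
v <- runif(40)

# sep = "" so that the punctuation and spacing are exactly what we write
cat("Minimum value: ", min(v), ".    Maximum value: ", max(v), "\n", sep = "")
cat("Average value: ", mean(v), ".    Median value: ", median(v), "\n", sep = "")
```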

d. Choose a number randomly between 4 and 9 using sample(), and call that number n. Print the nth smallest and nth largest values in the vector.

The 7th smallest value: 0.2332236
The 7th largest value: 0.8677092
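A sketch of part d, again with the placeholder seed. Sorting once gives both answers: the nth smallest value is the nth element of the sorted vector, and the nth largest is the nth element counting from the end. The "th" suffix is only correct here because n is restricted to the range 4 to 9.

```r
studentno <- 12345678L   # placeholder student number
set.seed(studentno)
v <- runif(40)

n <- sample(4:9, 1)                          # one integer from 4, 5, ..., 9
sorted <- sort(v)                            # ascending order
nth_smallest <- sorted[n]
nth_largest  <- sorted[length(v) - n + 1]    # same as sort(v, decreasing = TRUE)[n]

cat("The ", n, "th smallest value: ", nth_smallest, "\n", sep = "")
cat("The ", n, "th largest value: ", nth_largest, "\n", sep = "")
```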

e. Reshape your vector into a 4 by 10 matrix. Calculate the following characteristics of your matrix.

Column 2 has the largest sum of all the columns.
Row 1 has the smallest sum of all the rows.
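A sketch of part e. matrix() fills column by column by default; the question does not specify a fill order, so this is an assumption (add byrow = TRUE for row-wise filling). colSums(), rowSums(), which.max() and which.min() are base R functions.

```r
studentno <- 12345678L   # placeholder student number
set.seed(studentno)
v <- runif(40)

m <- matrix(v, nrow = 4, ncol = 10)   # filled column by column (R's default)
best_col  <- which.max(colSums(m))    # index of the column with the largest sum
worst_row <- which.min(rowSums(m))    # index of the row with the smallest sum

cat("Column", best_col, "has the largest sum of all the columns.\n")
cat("Row", worst_row, "has the smallest sum of all the rows.\n")
```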

f. Create a new matrix, which is the same as the previous matrix, except that any element m[i,j] that is greater than 0.5 is replaced with m[i,j]-1. Print the sum of the matrix.

The sum of the new matrix is    -2.084272
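A sketch of part f using logical indexing, under the same placeholder seed. Because each replaced element drops by exactly 1, the new sum is the original sum minus the number of elements greater than 0.5. This is consistent with the sample output: 40 × 0.5478932 = 21.915728, and 24 of the 40 sample values exceed 0.5, giving 21.915728 − 24 = −2.084272.

```r
studentno <- 12345678L   # placeholder student number
set.seed(studentno)
m <- matrix(runif(40), nrow = 4, ncol = 10)

m2 <- m                      # copy, so the original matrix is unchanged
big <- m2 > 0.5              # logical matrix marking elements above 0.5
m2[big] <- m2[big] - 1       # subtract 1 from those elements only
cat("The sum of the new matrix is ", sum(m2), "\n", sep = "")
```

An equivalent one-liner is ifelse(m > 0.5, m - 1, m).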

Question 3: Plotting (10 marks)

Hyperspectral images can be seen as a generalisation of normal colour images such as RGB images. A normal RGB colour image has 3 channels: one each for red, green and blue. It is normally organised as a 3D array; for example, a raster image 100 pixels high and 120 pixels wide is stored in an array X of size 100 × 120 × 3. Then X[ , , 1] is a matrix representing the red component of the image; similarly, X[ , , 2] is for green and X[ , , 3] for blue. A hyperspectral image (HSI) has more than three colour components.

Each pixel in an HSI is a spectrum (brightness over a range of wavelengths). For example, in the image used below, X[2,3,] is a vector of length 103: the spectrum at the second row and third column of the image.
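A small illustration of the indexing described above, using a synthetic random array in place of a real image (the data are made up for the example).

```r
# Synthetic stand-in for an RGB image: 100 rows, 120 columns, 3 channels
X <- array(runif(100 * 120 * 3), dim = c(100, 120, 3))
red   <- X[, , 1]     # 100 x 120 matrix: the red channel
pixel <- X[2, 3, ]    # all channels at row 2, column 3

dim(red)        # 100 120
length(pixel)   # 3 (for a 103-band HSI this would be 103)
```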

Here we look at an HSI of the Pavia University scene, remotely sensed by the ROSIS sensor. The whole data set and ground truth are packed in a MATLAB file called paviauni.mat, which is available in vUWS. You can use readMat() from the R package R.matlab to read in the MATLAB file. If the result is stored in A, the list item A$X is the HSI and the list item A$groundtruth classifies each pixel into one of 10 classes.

Produce a plot of 100 spectra randomly chosen from each class (if a class has fewer than 100 spectra, plot them all), plotting each class in a separate subfigure as below.
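A minimal sketch of one way to produce the per-class plot, assuming X is the rows × columns × bands array and the ground truth is a rows × columns matrix of class labels, as described above. The function name plot_class_spectra and its arguments are illustrative, not part of the assignment. It flattens the image into a pixels × bands matrix (R stores arrays column-major, so the pixel order matches as.vector() of the label matrix), samples up to 100 pixels per class, and draws each class with matplot() in its own panel via par(mfrow).

```r
# Plot up to n_per_class randomly chosen spectra per class, one panel per class.
# X: rows x cols x bands array; gt: rows x cols matrix of class labels.
plot_class_spectra <- function(X, gt, n_per_class = 100) {
  nbands  <- dim(X)[3]
  spectra <- matrix(X, ncol = nbands)   # one row per pixel, column-major pixel order
  labels  <- as.vector(gt)              # same pixel order as the rows of spectra
  classes <- sort(unique(labels))

  ncols <- ceiling(sqrt(length(classes)))
  nrows <- ceiling(length(classes) / ncols)
  old <- par(mfrow = c(nrows, ncols), mar = c(4, 4, 2, 1))
  on.exit(par(old))                     # restore graphics settings afterwards

  chosen <- list()
  for (cl in classes) {
    idx <- which(labels == cl)
    if (length(idx) > n_per_class) {
      idx <- idx[sample.int(length(idx), n_per_class)]   # random subset
    }
    # each column of the transposed matrix is one spectrum
    matplot(t(spectra[idx, , drop = FALSE]), type = "l", lty = 1,
            main = paste("Class", cl), xlab = "Band", ylab = "Value")
    chosen[[as.character(cl)]] <- idx
  }
  invisible(chosen)   # pixel indices plotted for each class
}

# Usage with the assignment data (assumes paviauni.mat is in the working directory):
# library(R.matlab)
# A <- readMat("paviauni.mat")
# plot_class_spectra(A$X, A$groundtruth)
```

Indexing with sample.int() rather than sample(idx, ...) avoids R's surprising behaviour when idx happens to be a single number.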

Question 4: Predicting Coin Flips (15 marks)

a. Try to imitate a sequence of coin tosses. Create a vector of ones (representing heads) and zeros (representing tails) of length 51. Be as random as you can. Do this before you read the rest of the question (or get someone else to do it). Print your sequence.

My fake sequence: 000110010101111101010010010000110101101010010101011

b. Can we devise a test to distinguish between the fake tosses and a truly random set? One method is based on the observation that people tend to switch between heads and tails more often than a random sequence does. Write code to count nf, the number of times the sequence fluctuates between heads and tails (that is, the number of times heads follows tails or tails follows heads).

The sequence of flips fluctuates nf = 33 times.
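A sketch of parts a and b using the example fake sequence above. Fluctuations are the positions where consecutive tosses differ, so counting the nonzero entries of diff() gives nf directly.

```r
# The example fake sequence from part a, typed as a string of 0s and 1s
fake <- "000110010101111101010010010000110101101010010101011"
x <- as.integer(strsplit(fake, "")[[1]])   # vector of 51 zeros and ones

cat("My fake sequence: ", paste(x, collapse = ""), "\n", sep = "")

# diff(x) is nonzero exactly where consecutive tosses differ
nf <- sum(diff(x) != 0)
cat("The sequence of flips fluctuates nf = ", nf, " times.\n", sep = "")
```

Equivalently, length(rle(x)$lengths) - 1 counts the runs of identical tosses, minus one.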

c. How many times would we expect the sequence of tosses to fluctuate if it were truly random? Write a loop to simulate 10^5 sets of 51 tosses, storing the number of fluctuations in each. Plot the histogram of fluctuations.

d. You will note that the distribution peaks at 25. In a string of 51 tosses, there are 50 points at which the sequence could fluctuate, and a random sequence fluctuates at 50% of them on average. Given your value of nf (from part b, the number of fluctuations in your sequence), your sequence is D = |nf − 25| fluctuations away from the average. Calculate the fraction of the 10^5 random sequences generated above that are D or more fluctuations away from the average. For example, with nf = 33 as above (so D = 8), you would calculate the fraction of sequences with ≥ 33 fluctuations plus the fraction with ≤ 17 fluctuations. Print your answer as a percentage.

3.414% of trials have fluctuations that are 8 or more away from average.
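A sketch of parts c and d. It uses nf = 33 from the example sequence and an arbitrary seed, set only for reproducibility. A plain loop is used, as the question asks; the histogram breaks are centred on the integers 0 to 50.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
ntrials <- 10^5
ntosses <- 51
nf <- 33L          # fluctuation count of the example fake sequence (part b)

fluct <- integer(ntrials)
for (i in seq_len(ntrials)) {
  tosses <- sample(0:1, ntosses, replace = TRUE)   # fair coin
  fluct[i] <- sum(diff(tosses) != 0)
}

hist(fluct, breaks = seq(-0.5, ntosses - 0.5, by = 1),
     main = "Fluctuations in 51 fair tosses", xlab = "Number of fluctuations")

D <- abs(nf - 25L)
frac <- mean(abs(fluct - 25) >= D)   # fraction at least D away from 25
cat(sprintf("%.3f%% of trials have fluctuations that are %d or more away from average.\n",
            100 * frac, D))
```

As a cross-check: for a fair coin, each of the 50 adjacent pairs differs independently with probability 0.5, so the number of fluctuations follows a Binomial(50, 0.5) distribution. The exact two-sided probability for D = 8 is 2 * pbinom(17, 50, 0.5), and the simulated percentage should be close to it.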

e. We can use this knowledge to create a rigged coin toss game. The player will guess the outcome of the next coin toss, and will accordingly gain or lose money. The coin tosses won’t be random: we’ll use previous guesses to try to outwit the player. Write (and include here) a script that performs the following.

• Initialise money = 10, to give the player $10 starting money.
• Loop over attempts to guess the coin toss.
• For each attempt, read a character from keyboard input.
• If the character is h or t, record this guess as your_current_guess, which will equal 1 (heads) or 0 (tails). Keep a record of up to 51 guesses in a vector called guesses, with the most recent guess always at guesses[1]. For any other keyboard entry, break from the loop.
• Keep track of the number of recorded guesses, nguesses.
• Given the vector guesses, calculate the probability of a fluctuation, pf: the number of fluctuations in guesses divided by the number of recorded guesses.
• If the number of recorded guesses is small (fewer than 5, say), set pf = 0.7.
• Use the probability pf to predict the next guess as follows:

# toss a biased coin
if (runif(1) < pf) {
  # we predict a fluctuation, so make this toss equal to the previous guess
  thistoss <- guesses[2]
} else {
  # we predict no fluctuation, so make this toss different from the previous guess
  thistoss <- 1 - guesses[2]
}

• If thistoss is equal to your_current_guess, add $1. If thistoss is not equal to your_current_guess, take away $1.
• If zero money remains, end the game. Output the number of guesses the player made.

Be sure to give appropriate feedback to the player after each guess: their guess, the coin toss, "You win!" or "You lose!", and how much money they have. Play your game a few times to test it.
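A minimal sketch of the game as a function. read_input is a hypothetical parameter added so the game can be driven by scripted input for testing; by default it calls readline(), which only waits for input in an interactive R session. Two assumptions go beyond the specification: the first toss is fair, because guesses[2] does not exist yet, and nguesses counts every guess made while length(guesses) gives the number recorded (at most 51).

```r
# Rigged coin-toss game following the steps listed above (a sketch).
play_rigged_game <- function(read_input = function() readline("Guess h or t (any other key quits): "),
                             money = 10L, max_record = 51) {
  guesses  <- integer(0)   # most recent guess is always guesses[1]
  nguesses <- 0            # total number of guesses made

  repeat {
    key <- tolower(trimws(read_input()))
    if (!(key %in% c("h", "t"))) break            # any other entry ends the game

    your_current_guess <- if (key == "h") 1L else 0L
    guesses  <- head(c(your_current_guess, guesses), max_record)
    nguesses <- nguesses + 1

    # estimated probability of a fluctuation, from the recorded guesses
    nrec <- length(guesses)
    pf <- if (nrec < 5) 0.7 else sum(diff(guesses) != 0) / nrec

    if (nrec < 2) {
      thistoss <- sample(0:1, 1)                  # assumption: no previous guess, so toss fairly
    } else if (runif(1) < pf) {
      thistoss <- guesses[2]                      # predict a fluctuation
    } else {
      thistoss <- 1L - guesses[2]                 # predict no fluctuation
    }

    win <- thistoss == your_current_guess
    money <- money + if (win) 1L else -1L
    cat(sprintf("Your guess: %s   Coin toss: %s   %s   Money: $%d\n",
                key, if (thistoss == 1) "h" else "t",
                if (win) "You win!" else "You lose!", money))

    if (money <= 0) {
      cat("You have no money left. Game over after", nguesses, "guesses.\n")
      break
    }
  }
  invisible(list(money = money, nguesses = nguesses))
}

# Interactive use: play_rigged_game()
```

A player who always types h is beaten quickly: once five guesses are recorded, pf is 0, so every toss is set to the opposite of the previous guess.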
