How we generated our prediction for subchallenge 3

Il-Youp Kwak and Wuming Gong

2020-03-01

This vignette illustrates how our team prepared its submission for subchallenge 3.

Data processing

The ‘csv_file’ variable holds the path to the subchallenge 3 evaluation file provided by the competition. Replace it with the path of the file you would like to predict.

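library(DCLEAR)    # provides dist_weighted_hamming() (the authors' package)
library(phangorn)  # provides phyDat(); also attaches ape (fastme.ols(), write.tree())
csv_file <- 'Data/subC3/SubC3_10K_0001_mutation_table.csv'
x <- read.table(csv_file, header = TRUE, sep = ',', colClasses = "character")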
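
As a small illustration of why the table is read with colClasses = "character", the sketch below writes a toy mutation table to a temporary file and reads it back. The cell names and column names are made up for the example, and only base R is used.

```r
# Toy mutation table (hypothetical cell names and target columns).
toy_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "cell,c1,c2,c3",
  "cell_1,0,A,",
  "cell_2,-,0,B"
), toy_csv)

# Reading every column as character keeps '0' as a string and leaves
# blank fields as empty strings instead of converting them to NA.
toy <- read.table(toy_csv, header = TRUE, sep = ",", colClasses = "character")
print(toy)
```

Keeping blank fields as empty strings is what allows point dropout to be detected and relabelled in the next step.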

The initial state is ‘0’, interval dropout is ‘-’, point dropout is ‘*’ (point dropout appears as an empty string ‘’ in the file, but we replace it with ‘*’), and mutational outcome states are ‘A’ to ‘Z’.

states <- c('0', '-', '*', LETTERS)

Process the table and store it as a ‘phyDat’ object.

library(magrittr)  # provides %>%
tip_names <- x[, 1]
x <- x[, -1]
rownames(x) <- tip_names
x[x == ''] <- '*'  ## mark point dropout (point missing) as '*'
x <- as.matrix(x)
x <- x %>% phyDat(type = 'USER', levels = states)

states2num <- seq_along(states)
names(states2num) <- states
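
To make the state coding concrete, here is a minimal base R sketch that applies the same point-dropout replacement to a toy character matrix and then looks up the integer code of each state. The matrix `m` and its cell and column names are hypothetical; the lookup is only one way such a name-to-index vector might be used.

```r
states <- c("0", "-", "*", LETTERS)
states2num <- seq_along(states)
names(states2num) <- states

# A toy 2 x 3 character matrix with one blank (point dropout) entry.
m <- matrix(c("0", "A", "",
              "-", "0", "B"),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("cell_1", "cell_2"), c("c1", "c2", "c3")))
m[m == ""] <- "*"  # same replacement as in the processing step above

# Look up the integer code of each state by name, keeping the matrix shape.
codes <- matrix(states2num[as.vector(m)], nrow = nrow(m),
                dimnames = dimnames(m))
print(codes)
```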

Weight parameters for the prediction

We tried weighted Hamming I and II with a large number of parameter combinations and found that weighted Hamming I with the weights below worked fairly well.

InfoW <- numeric(26)  # all 26 entries are assigned below
InfoW[1] <- 1  ## Score 
InfoW[2] <- .9
InfoW[3] <- .4
InfoW[4:26] <- 3
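
For intuition about how such weights can enter a distance, the sketch below computes a weighted Hamming distance between two toy cells. It assumes that each mismatching position contributes the product of the two states' weights; this is an illustrative assumption and not necessarily the exact definition used by dist_weighted_hamming. The helper `weighted_hamming`, the reduced alphabet, and the two cells are all hypothetical.

```r
# Hypothetical helper, not part of any package: weighted Hamming distance
# between two state vectors, assuming each mismatch costs w[a] * w[b].
weighted_hamming <- function(a, b, w) {
  stopifnot(length(a) == length(b))
  differs <- a != b
  sum(w[a[differs]] * w[b[differs]])
}

# Weights named by state, using a reduced alphabet for readability.
w <- c("0" = 1, "-" = 0.9, "*" = 0.4, "A" = 3, "B" = 3)

cell_1 <- c("0", "A", "-", "B")
cell_2 <- c("0", "B", "*", "B")
d <- weighted_hamming(cell_1, cell_2, w)
print(d)
```

Under this assumption, a mismatch between two mutational outcomes counts far more than a mismatch involving a dropout state, which carries little information about the lineage.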

Generating final submission file for subchallenge 3

aTree2 <- x %>% dist_weighted_hamming(InfoW, dropout=FALSE) %>% fastme.ols()
write.tree(aTree2, "Kwak_Gongsub3_submission.nw")
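
The final step turns a distance matrix into a tree and writes it in Newick format. As a rough sketch of that step in isolation, the example below runs fastme.ols from the ape package on a toy distance matrix for four made-up cells and captures the Newick string instead of writing a file.

```r
library(ape)

# Toy distance matrix for four hypothetical cells; a and b are close,
# as are c and d.
m <- matrix(c(0, 2, 4, 4,
              2, 0, 4, 4,
              4, 4, 0, 2,
              4, 4, 2, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

tree <- fastme.ols(as.dist(m))         # minimum-evolution tree (OLS version)
newick <- write.tree(tree, file = "")  # file = "" returns the Newick string
print(newick)
```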

Thanks!