Friday, December 26, 2014

A Proof of Bonferroni's Inequality

Bonferroni’s Inequality:
(1) P(A ∩ B) ≥ P(A) + P(B) − 1
Proof:
By definition, every probability P satisfies 0 ≤ P ≤ 1. So, for the probability of the union of two events A and B:
(2) P(A ∪ B) ≤ 1
Further, we have previously shown:
(3) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Substituting the RHS of (3) for the LHS of (2) yields:
(4) P(A) + P(B) − P(A ∩ B) ≤ 1
Rearranging terms in (4) yields:
(5) P(A ∩ B) ≥ P(A) + P(B) − 1
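
As a quick numerical check of (1), here is a minimal R sketch. The example is hypothetical and not part of the proof: one roll of a fair six-sided die, with A = "the roll is even" and B = "the roll is at most 4". The helper prob() is an illustrative name.

```r
# Hypothetical example: one roll of a fair six-sided die
omega <- 1:6
A <- c(2, 4, 6)      # the roll is even
B <- c(1, 2, 3, 4)   # the roll is at most 4

# Probability of an event under the uniform distribution on omega
prob <- function(event) length(event) / length(omega)

p.intersection <- prob(intersect(A, B))   # P(A intersect B)
p.union <- prob(union(A, B))              # P(A union B)
bound <- prob(A) + prob(B) - 1            # P(A) + P(B) - 1

# Bonferroni's inequality (1): P(A intersect B) >= P(A) + P(B) - 1
print(p.intersection >= bound)
```

Here P(A ∩ B) = 1/3 and the bound is 1/2 + 2/3 − 1 = 1/6, so the inequality holds with room to spare.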

Wednesday, December 24, 2014

Modifying Files in a Directory Recursively in R

My brother was working on a project recently and asked me if I could assist him in developing a script that would loop through all the php files in a directory, look for a specific keyword within curly brackets, and then insert comments before the first <tr> tag appearing before the keyword and after the first </tr> tag appearing after the keyword.

The following code did the trick.

# Set working directory (the input directory)
setwd("D:\\furious")

# Output directory, so the original files will not be overwritten
outdir <- "D:\\furious2"

# Obtain a list of files under the working directory ending in .php
selected.files <- list.files(pattern = "\\.php$", all.files = TRUE, full.names = TRUE, 
    recursive = TRUE)

modfiles <- "files modified"
notmodfiles <- "files not modified"

regexp <- "\\{desc\\}"
regexp2 <- "</tr>"
regexp4 <- "<tr>"

# Loop through selected.files list
for (file in selected.files) {
    
    webpage <- readLines(file)
    
    # Reset the flags for every file
    filemod1 <- FALSE
    filemod2 <- FALSE
    
    # first line that contains {desc} (NA if there is none)
    startline <- grep(regexp, webpage)[1]
    
    if (!is.na(startline)) {
        # find the first </tr> at or after {desc}
        for (i in startline:length(webpage)) {
            if (grepl(regexp2, webpage[i])) {
                # put end description info after </tr>
                webpage[i] <- paste(webpage[i], "\n<!--end description-->")
                filemod1 <- TRUE
                break
            }
        }
        # find the first <tr> at or before {desc}
        for (i in startline:1) {
            if (grepl(regexp4, webpage[i])) {
                # put description info before <tr>
                webpage[i] <- paste("<!--description-->\n", webpage[i])
                filemod2 <- TRUE
                break
            }
        }
    }
    
    if (filemod1 && filemod2) {
        # Write the updated file to the output directory, creating subfolders as needed
        outfile <- file.path(outdir, file)
        dir.create(dirname(outfile), recursive = TRUE, showWarnings = FALSE)
        writeLines(webpage, outfile)
        
        modfiles <- c(modfiles, file)
        print(paste("file modified:", file))
    } else {
        notmodfiles <- c(notmodfiles, file)
        print(paste("file NOT modified:", file))
    }
}

# Write the list of modified files
x <- cbind(modfiles)
write.csv(x, file = "d:/ModifiedFiles.csv")

The first line after setting the working directory gets the list of files whose names match the regular expression given in the pattern argument, as shown:

setwd("D:\\furious")
selected.files <- list.files(pattern="\\.php$", all.files=TRUE, full.names=TRUE, recursive=TRUE)
head(selected.files)
## [1] "./sd_layout_1-_burgundy.php" "./sd_layout_1-_citrus.php"  
## [3] "./sd_layout_1-_forest.php"   "./sd_layout_1-_gold.php"    
## [5] "./sd_layout_1-_marine.php"   "./sd_layout_1-_midnight.php"
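
To see how the pattern argument filters file names without touching a real project, the following sketch builds a throwaway directory tree and lists it. The directory and file names are hypothetical, and the sketch passes an explicit path argument rather than calling setwd().

```r
# Build a throwaway directory tree (hypothetical file names)
demo.dir <- file.path(tempdir(), "listfiles_demo")
dir.create(file.path(demo.dir, "sub"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo.dir, c("a.php", "notes.txt", "b.php.bak", "sub/c.php")))

# Same arguments as above, but with an explicit path instead of setwd()
found <- list.files(path = demo.dir, pattern = "\\.php$", all.files = TRUE,
    full.names = TRUE, recursive = TRUE)

# Only names ending in .php match; notes.txt and b.php.bak are skipped
print(basename(found))
```

Because the pattern is anchored with $, b.php.bak is not matched, and recursive = TRUE is what picks up the file inside the sub folder.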

The code then scans each file for the line containing the keyword specified in the variable regexp. From that line it searches downward until it finds the expression given by the variable regexp2 and writes <!--end description--> after it. After that, it searches upward from the same keyword line for the first occurrence of the expression given by the variable regexp4 and writes <!--description--> before it.
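
The search described above can be tried on a small in-memory example. This is only a sketch: the page contents are made up, and it uses vectorized grep() calls in place of the explicit loops in the script.

```r
# Hypothetical page contents; a real run would use readLines() on a file
webpage <- c("<table>", "<tr>", "<td>{desc}</td>", "</tr>", "</table>")

# First line containing the keyword
startline <- grep("\\{desc\\}", webpage)[1]

# First </tr> at or below the keyword line
below <- grep("</tr>", webpage[startline:length(webpage)], fixed = TRUE)[1]
tr.end.line <- startline + below - 1

# Nearest <tr> at or above the keyword line
tr.start.line <- max(grep("<tr>", webpage[1:startline], fixed = TRUE))

# Insert the comments around the table row
webpage[tr.end.line] <- paste(webpage[tr.end.line], "\n<!--end description-->")
webpage[tr.start.line] <- paste("<!--description-->\n", webpage[tr.start.line])
writeLines(webpage)
```

In this example the keyword is on line 3, the enclosing <tr> is on line 2, and the closing </tr> is on line 4, so the comments end up wrapped around the whole row.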

If both comments were inserted, the updated file is written to the furious2 directory and the change is recorded in the d:/ModifiedFiles.csv file. This process repeats for each matching file in the working directory set at the beginning of the code.
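
The write-and-log step can be sketched on its own as follows. The paths here are hypothetical stand-ins under tempdir(), not the real D: drive locations, and the page contents are made up.

```r
# Hypothetical stand-ins for the input path, its contents and the output folder
file <- "./sub/page.php"
webpage <- c("<!--description-->", "<tr>", "</tr>", "<!--end description-->")
outdir <- file.path(tempdir(), "furious2_demo")

# Mirror the relative path under the output folder, creating subfolders as needed
outfile <- file.path(outdir, file)
dir.create(dirname(outfile), recursive = TRUE, showWarnings = FALSE)
writeLines(webpage, outfile)

# Record the file in a CSV log
modfiles <- c("files modified", file)
logfile <- file.path(outdir, "ModifiedFiles.csv")
write.csv(cbind(modfiles), file = logfile)
print(read.csv(logfile))
```

Building the output path with file.path() avoids switching the working directory back and forth, and dir.create() with recursive = TRUE covers files that sit in subfolders.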