Title: | Compression of Illumina BeadArray data |
---|---|
Description: | Provides functionality for the compression and decompression of raw bead-level data from the Illumina BeadArray platform. |
Authors: | Mike Smith, Andy Lynch |
Maintainer: | Mike Smith <[email protected]> |
License: | GPL-2 |
Version: | 1.57.0 |
Built: | 2024-11-07 05:13:12 UTC |
Source: | https://github.com/grimbough/BeadDataPackR |
Given raw bead-level data, in the form of a .txt file and one or two .locs files, this function combines them into a new file in which the data are stored in a compressed format.
compressBeadData(txtFile, locsGrn, locsRed = NULL, outputFile = NULL, path = NULL, nBytes = 8, base2 = TRUE, fullLocsIndex = FALSE, nrow = NULL, ncol = NULL, progressBar = TRUE)
txtFile |
The name of the .txt file to be read in. |
locsGrn |
The locs file for the green channel. |
locsRed |
The locs file for the red channel. Only needed for two channel data. |
outputFile |
Name of the file to be created. |
path |
Path to where the input files can be found. If NULL the current working directory is used. This is also the directory where the output files will be written. |
nBytes |
Gives the number of bytes that are used to store the fractional parts of the bead coordinates. For a single channel array the maximum value is 4, whilst it is 8 for a two channel array. Any number larger than this is automatically set to the maximum value. If the maximum value is used, the coordinates are stored in the .bab file as single precision floating point numbers, as they are in the .locs files. If a value smaller than the maximum is chosen, the integer parts of each coordinate are stored separately. The requested number of bytes is then used to store the fractional parts, with a corresponding loss of precision as the number of bytes decreases. |
base2 |
If not using the full precision coordinates, the approximations can be stored as either a binary or decimal fraction. Using a binary fraction (base2 = TRUE) provides a greater accuracy, but can lead to a meandering number of decimal places in the reconstructed .txt files. If one wants a consistent number of decimal places, set base2 = FALSE. |
fullLocsIndex |
The default value of FALSE uses a linear model fitted to each segment of the array to allow the locs file to be reconstructed when the file is decompressed. If TRUE, a simple index is used to record the locs file order, which requires more space. |
ncol |
This specifies the number of columns in each grid segment on the array and, if left blank, can normally be inferred from the grid coordinates. However, this can fail for particularly small grids. If one wants or needs to specify them explicitly, these values can be found in the .sdf file which accompanies the bead-level output from the scanner. The number of columns per segment can be found within the <SizeGridX> tag. |
nrow |
See ncol. If needed can be found within the <SizeGridY> tag in the .sdf file. |
progressBar |
By default the function uses a |
In the future the file names will be determined automatically, rather than requiring manual entry of each. The path argument may also be amended so there are separate options for the locations of the input and output files.
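The trade-off described for nBytes and base2 can be illustrated with a small base R sketch. It is not the package's on-disk encoding: the helpers `quantise_binary` and `quantise_decimal` are hypothetical, and truncating (rather than rounding) the fractional part is an assumption made only for illustration. The sketch compares what a single byte could hold as a binary fraction (8 bits, 256 levels) with what it could hold as a decimal fraction (two full decimal digits, 100 levels).

```r
# Hypothetical helper: keep only `bits` binary digits of the fractional part.
quantise_binary <- function(x, bits) {
  int_part <- floor(x)
  frac <- x - int_part
  int_part + floor(frac * 2^bits) / 2^bits
}

# Hypothetical helper: keep only `digits` decimal digits of the fractional part.
quantise_decimal <- function(x, digits) {
  int_part <- floor(x)
  frac <- x - int_part
  int_part + floor(frac * 10^digits) / 10^digits
}

x  <- 1234.56789               # an illustrative bead centre coordinate
b8 <- quantise_binary(x, 8)    # fractional part stored as k / 256
d2 <- quantise_decimal(x, 2)   # fractional part stored as k / 100

# The binary version is closer to x but prints with many decimal places;
# the decimal version is coarser but always has two decimal places.
print(format(c(original = x, binary = b8, decimal = d2), digits = 12))
print(abs(x - c(binary = b8, decimal = d2)))
```

With the same storage budget the binary fraction has the smaller error, while the decimal fraction gives a fixed number of decimal places, which mirrors the behaviour described for base2 = TRUE and base2 = FALSE.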
Primarily invoked for its side effect, which is to produce a compressed version of the input files. The function returns, invisibly, a logical TRUE if compression was successful.
Mike L. Smith
dataPath <- system.file("extdata", package = "BeadDataPackR")
## copy the files to a temp directory, and don't overwrite system files
file.copy(list.files(path = dataPath, pattern = "example", full.names = TRUE), tempdir())
compressBeadData(txtFile = "example.txt", locsGrn = "example_Grn.locs",
                 outputFile = "example.bab", path = tempdir(), nBytes = 4,
                 nrow = 326, ncol = 4, fullLocsIndex = TRUE)
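When nrow and ncol have to be supplied explicitly, the values can be read from the <SizeGridY> and <SizeGridX> tags of the .sdf file. The following is a minimal sketch in base R, assuming each tag and its value sit on a single line. The helper `read_sdf_tag` and the two-line fragment are hypothetical; the values 4 and 326 are chosen only to match the example call, and a real .sdf file contains many more tags.

```r
# Hypothetical fragment of an .sdf file. In practice the lines would come
# from readLines() on the .sdf file that accompanies the bead-level output.
sdf_lines <- c(
  "<SizeGridX>4</SizeGridX>",
  "<SizeGridY>326</SizeGridY>"
)

# Hypothetical helper: return the integer enclosed by <tag>...</tag>,
# or NA if the tag does not occur in the supplied lines.
read_sdf_tag <- function(lines, tag) {
  pattern <- paste0("<", tag, ">[[:space:]]*([0-9]+)[[:space:]]*</", tag, ">")
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) return(NA_integer_)
  as.integer(sub(paste0(".*", pattern, ".*"), "\\1", hit[1]))
}

ncol_segment <- read_sdf_tag(sdf_lines, "SizeGridX")  # columns per segment
nrow_segment <- read_sdf_tag(sdf_lines, "SizeGridY")  # rows per segment
print(c(nrow = nrow_segment, ncol = ncol_segment))
```

The two results could then be passed as the nrow and ncol arguments of compressBeadData.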
Decompresses a file created by BeadDataPackR. The original files that were compressed will be restored as accurately as possible, depending upon the degree of precision specified during the compression.
decompressBeadData(input, inputPath = ".", outputMask = NULL, outputPath = ".", outputNonDecoded = FALSE, roundValues = TRUE, progressBar = TRUE)
input |
The name of the .bab file(s) to be read. Can be a vector of file names, such as generated by |
inputPath |
Path where the compressed file is located. The default is to use the current working directory. |
outputMask |
Text specifying the names of the output files. The output files will have ".txt", "_Grn.locs" and (if appropriate) "_Red.locs" appended to this mask. If left NULL the original names of the section will be used. |
outputPath |
Path to where the uncompressed version of the files should be written to. The default is to use the current working directory. |
outputNonDecoded |
If TRUE the undecoded beads will be included in the output .txt file. They will have ProbeID 0 and intensity 0, but the bead centre coordinates will be included. |
roundValues |
The original Illumina text files give the bead centre coordinates to 7 significant figures. When this argument is TRUE decompressed files are also truncated in this manner, whilst FALSE writes them to the full precision they are stored in the compressed file. |
progressBar |
By default the function uses a |
Called primarily for its side effect, in which two (or three) files are written to the disk. These files should be representative of the original files that were compressed. The function returns, invisibly, the number of lines written in the .txt file.
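The effect of limiting coordinates to 7 significant figures, as described for roundValues, can be sketched in base R. This is only an illustration of the precision involved: the coordinate values are made up, and the use of `signif` (which rounds) is an assumption, not a statement of how the package produces its output.

```r
# Made-up bead centre coordinates held at higher precision
coords <- c(1234.56789012, 98.7654321987)

# Reduce each value to 7 significant figures
coords7 <- signif(coords, 7)

# Print both versions with enough digits to see the difference
print(format(coords, digits = 12))
print(format(coords7, digits = 12))
```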
Mike L. Smith
dataPath <- system.file("extdata", package = "BeadDataPackR")
decompressBeadData(input = "example.bab", inputPath = dataPath, outputPath = tempdir())
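As a further sketch, the call below uses outputMask to control the names of the restored files and captures the invisibly returned line count. It uses only the arguments listed in the usage above; the directory name "restored" and the mask "restored" are arbitrary choices for illustration, and the block does nothing if BeadDataPackR is not installed.

```r
# Write the restored files into their own temporary sub-directory
outDir <- file.path(tempdir(), "restored")
dir.create(outDir, showWarnings = FALSE)

if (requireNamespace("BeadDataPackR", quietly = TRUE)) {
  dataPath <- system.file("extdata", package = "BeadDataPackR")
  # The return value is invisible, so assign it to inspect it
  nLines <- BeadDataPackR::decompressBeadData(input = "example.bab",
                                              inputPath = dataPath,
                                              outputMask = "restored",
                                              outputPath = outDir,
                                              progressBar = FALSE)
  print(nLines)              # number of lines written to the .txt file
  print(list.files(outDir))  # files created from the mask
}
```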
Example bead-level data consisting of a .txt file, a .locs file and the .bab file that is produced from their compression.
Provides a mechanism to extract the contents of the original .locs file from a compressed .bab file, without the need to extract the intensity or probe ID values.
extractLocsFile(inputFile, path = ".")
inputFile |
The name of the .bab file to be read in. |
path |
Path to where the input file can be found. Default is the current working directory. |
A matrix with two columns (four if two-channel data) containing the X and Y values of the bead centre coordinates supplied in the original .locs file. For two-channel data the first two columns contain the coordinates from the green channel, with the red channel held in columns three and four.
Mike L. Smith
dataPath <- system.file("extdata", package = "BeadDataPackR")
locs <- extractLocsFile(inputFile = "example.bab", path = dataPath)
locs[1:10,]
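Because the returned matrix has two columns for single-channel data and four for two-channel data, downstream code may want to separate the channels by column position. The helper `split_channels` below is a hypothetical convenience written for this sketch, not part of the package; it relies only on the column layout described above. The package call is skipped if BeadDataPackR is not installed.

```r
# Hypothetical helper: split a locs matrix into green and red coordinates.
# Columns 1-2 are green; columns 3-4 (if present) are red.
split_channels <- function(locs) {
  stopifnot(is.matrix(locs), ncol(locs) %in% c(2, 4))
  list(
    green = locs[, 1:2, drop = FALSE],
    red   = if (ncol(locs) == 4) locs[, 3:4, drop = FALSE] else NULL
  )
}

if (requireNamespace("BeadDataPackR", quietly = TRUE)) {
  dataPath <- system.file("extdata", package = "BeadDataPackR")
  locs <- BeadDataPackR::extractLocsFile(inputFile = "example.bab",
                                         path = dataPath)
  channels <- split_channels(locs)
  # Range of X and Y coordinates in the green channel
  print(apply(channels$green, 2, range))
}
```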
Given a list of probeIDs this function can scan a compressed .bab file for matching entries and return the data as a data.frame within R, rather than decompressing the data and generating new files.
readCompressedData(inputFile, path = ".", probeIDs = NULL)
inputFile |
The name of the .bab file to be read in. |
path |
Path to where the input file can be found. Default is the current working directory. |
probeIDs |
List the probe IDs for which data should be obtained. If left NULL then every probe on the array is returned. |
If the requested probe IDs are present the function returns a data.frame with one row per bead. If the probes are not found in the file then the function returns NULL and informs the user.
Mike L. Smith
dataPath <- system.file("extdata", package = "BeadDataPackR")
readCompressedData(inputFile = "example.bab", path = dataPath, probeIDs = c(10008, 10010))
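Since the function returns NULL when none of the requested probes are found, calling code may want to check for that case before using the result. A minimal sketch follows; the helper `count_beads` is hypothetical and relies only on the documented return value (a data.frame with one row per bead, or NULL). The package call is skipped if BeadDataPackR is not installed.

```r
# Hypothetical helper: number of beads returned, treating NULL as zero.
count_beads <- function(beads) {
  if (is.null(beads)) return(0L)
  nrow(beads)
}

if (requireNamespace("BeadDataPackR", quietly = TRUE)) {
  dataPath <- system.file("extdata", package = "BeadDataPackR")
  beads <- BeadDataPackR::readCompressedData(inputFile = "example.bab",
                                             path = dataPath,
                                             probeIDs = c(10008, 10010))
  print(count_beads(beads))
}
```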