Title: | Quality Assessment Tools for Oxford Nanopore MinION data |
---|---|
Description: | IONiseR provides tools for the quality assessment of Oxford Nanopore MinION data. It extracts summary statistics from a set of fast5 files and can be used either before or after base calling. In addition to standard summaries of the read-types produced, it provides a number of plots for visualising metrics relative to experiment run time or spatially over the surface of a flowcell. |
Authors: | Mike Smith [aut, cre] |
Maintainer: | Mike Smith <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.13.1 |
Built: | 2024-11-06 05:30:13 UTC |
Source: | https://github.com/grimbough/IONiseR |
This generic function accesses the baseCalled slot stored in an object derived from the Fast5Summary class.
baseCalled(x)
x |
Object of class Fast5Summary. |
A data.frame with 6 columns
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    baseCalled( s.typhi.rep2 )
}
Plots a line for each fast5 file, arranged by channel and by the experiment time at which the signal was being recorded. The colour of each line can be specified by the user to reflect any metric they wish. The intention of the plot is to investigate trends that may appear at specific time points, or that influence only a subset of channels.
channelActivityPlot(summaryData, zScale = NULL, zAverage = TRUE)
summaryData |
Object of class Fast5Summary. |
zScale |
A data.frame containing two columns. The first must be labelled 'id' and correspond to the id field present in all slots in summaryData. The second column should contain the data pertaining to those reads that you wish to be represented on the coloured z-axis. |
zAverage |
Logical indicating whether a bar showing the mean across all channels for the chosen zScale should be shown on the plot. Defaults to TRUE. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    require(dplyr)
    data(s.typhi.rep3, package = 'minionSummaryData')
    ## we will plot the median event signal for each read on z-axis
    z_scale = select(eventData(s.typhi.rep3), id, median_signal)
    channelActivityPlot( s.typhi.rep3, zScale = z_scale )
}
Creates a plot representing the layout of a MinION flow cell. Each circle represents an individual channel with the intensity reflecting a specified sequencing metric. This function is a more generalised version of layoutPlot, allowing the user to map any value they like onto the channel layout.
channelHeatmap(data, zValue)
data |
A data.frame. Should have at least two columns, one of which has the name 'channel'. |
zValue |
Character string specifying the name of the column to be used for the colour scaling. |
Returns an object of class gg representing the plot.
library(dplyr)
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    ## calculate and plot the mean number of events recorded by each channel
    avgEvents <- left_join(readInfo(s.typhi.rep2), eventData(s.typhi.rep2), by = 'id') %>%
        group_by(channel) %>%
        summarise(mean_nevents = mean(num_events))
    channelHeatmap(avgEvents, zValue = 'mean_nevents')
}
This generic function accesses the eventData slot stored in an object derived from the Fast5Summary class.
eventData(x)
x |
Object of class Fast5Summary. |
A data.frame with 5 columns
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    eventData( s.typhi.rep2 )
}
An S4 class for summarised data from a MinION sequencing run
## S4 method for signature 'Fast5Summary'
length(x)

## S4 method for signature 'Fast5Summary'
readInfo(x)

## S4 method for signature 'Fast5Summary'
eventData(x)

## S4 method for signature 'Fast5Summary'
baseCalled(x)

## S4 method for signature 'Fast5Summary'
fastq(x)

## S4 method for signature 'Fast5Summary,ANY,ANY,ANY'
x[i]

## S4 method for signature 'Fast5Summary'
fastqTemplate(x)

## S4 method for signature 'Fast5Summary'
fastqComplement(x)

## S4 method for signature 'Fast5Summary'
fastq2D(x)
x |
Object of class Fast5Summary |
i |
Vector defining index to subset by. |
An object of class Fast5Summary
length: Returns the number of files read during creation of the object.
readInfo: Returns the readInfo data.frame.
eventData: Returns the eventData data.frame.
baseCalled: Returns the baseCalled data.frame.
fastq: Returns the ShortReadQ object stored in the fastq slot.
[: Subsets the object and returns an object of the same class.
fastqTemplate: Returns a ShortReadQ object containing only template reads.
fastqComplement: Returns a ShortReadQ object containing only complement reads.
fastq2D: Returns a ShortReadQ object containing only 2D reads.
readInfo
Object of class tibble. Contains five columns:
id - an integer key that allows us to match entries in the separate slots of this object.
file - Basename of the fast5 file the data was read from.
read - Read number from channel.
channel - channel.
mux - Specific pore that was used within the four that are assigned to a single channel. Should be in the range 1-4, but if this isn't available it will be 0.
rawData
Object of class tibble. Intended to hold raw signal data, although reading this is currently not implemented in IONiseR.
eventData
Object of class tibble. Holds summary of events data prior to base calling. Contains five columns:
id - an integer key that allows us to match entries in the separate slots of this object.
start_time - time in seconds after the run started that this reading began.
duration - time in seconds the reading lasted.
num_events - the number of events that were recorded as part of this reading.
median_signal - median of the recorded signals for this set of events.
baseCalled
Object of class tibble. For the most part it contains similar data to the @eventData slot, since the base called data is derived from it.
id - an integer key that allows us to match entries in the separate slots of this object.
start_time - time in seconds after the run started that this reading began.
duration - time in seconds the reading lasted.
num_events - the number of events that were recorded as part of this reading.
strand - can be either 'template' or 'complement'
full_2D - boolean value specifying whether the read forms part of a 2D pair. If TRUE the FASTQ data for the template, complement and 2D read will be available in the @fastq slot.
fastq
Object of class ShortReadQ. This slot contains all reads (template, complement and 2D). The read names take the form NUM_STRAND, where NUM matches with the id column in the other slots and STRAND indicates whether the read is template, complement or 2D.
versions
A list intended to store the version of IONiseR that was used to create the object. (May be extended in the future to include the version of MinKNOW that the original fast5 files were processed with, if this can be determined accurately.)
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    length( s.typhi.rep2 )
}
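The id column is what ties the slots together. The sketch below illustrates the idea with two small hand-made data frames whose column names follow the slot descriptions above; the values are invented and no IONiseR functions are called, so it only shows how a join on id could be used.

```r
# Made-up stand-ins for the readInfo and baseCalled slots; the column
# names follow the slot descriptions, the values are illustrative only.
read_info <- data.frame(
  id      = 1:3,
  file    = c("read1.fast5", "read2.fast5", "read3.fast5"),
  read    = c(10L, 11L, 4L),
  channel = c(101L, 101L, 230L),
  mux     = c(1L, 1L, 3L)
)
base_called <- data.frame(
  id         = c(1L, 1L, 2L),
  start_time = c(12.5, 14.0, 80.2),
  duration   = c(1.5, 1.4, 3.0),
  num_events = c(900L, 850L, 2100L),
  strand     = c("template", "complement", "template"),
  full_2D    = c(TRUE, TRUE, FALSE)
)

# Match each base called strand to its channel through the shared id key.
joined <- merge(read_info, base_called, by = "id")

# Number of base called strands recorded per channel.
strands_per_channel <- table(joined$channel)
```

With a real object the same join should be possible on the output of readInfo(x) and baseCalled(x), since both carry the id column.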
This function provides direct access to the FASTQ entries held within fast5 files. If you are only interested in getting hold of the base called reads, and don't require any raw-signal or event information, use this function. Given a vector of fast5 files, the FASTQ entries will be combined and up to three gzip compressed FASTQ files will be created: one for each of the template, complement and 2D strands, depending upon what is available in the input files.
fast5toFastq(files, strand = "all", fileName = NULL, outputDir = NULL, ncores = 1)
files |
Character vector of fast5 files to be read. |
strand |
Character vector specifying the strand to extract. Can take any combination of the following options: "template", "complement", "2D", "all", "both". |
fileName |
Stem for the names of the output files. The appropriate strand will be appended to each file name, e.g. fileName_complement.fq.gz or fileName_template.fq.gz |
outputDir |
Directory output files should be written to. |
ncores |
Number of CPU cores that should be used to process the files. Currently this seems to be more IO bound than CPU bound, so there is little benefit in using a high number of cores. |
No value returned. Run for the side effect of writing the FASTQ files to disk.
## Not run: 
## full.names = TRUE returns complete paths rather than bare file names
fast5files <- list.files('/foo/bar/', pattern = '\\.fast5$', full.names = TRUE)
summaryData <- readFast5Summary(fast5files)
## End(Not run)
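A minimal sketch of a call to fast5toFastq itself, using only the arguments listed in the usage above. The input directory, the output directory and the stem "run1" are placeholders, and the call is skipped when no fast5 files are found or IONiseR is not installed.

```r
# Hypothetical input and output locations; replace with real paths.
fast5_dir <- "/foo/bar"
out_dir   <- file.path(tempdir(), "fastq")

# full.names = TRUE keeps the directory in each path, so the files can be
# found regardless of the current working directory.
fast5files <- list.files(fast5_dir, pattern = "\\.fast5$", full.names = TRUE)

if (length(fast5files) > 0 && requireNamespace("IONiseR", quietly = TRUE)) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # Write template and complement reads only; "run1" is an arbitrary stem.
  IONiseR::fast5toFastq(fast5files,
                        strand    = c("template", "complement"),
                        fileName  = "run1",
                        outputDir = out_dir,
                        ncores    = 1)
  # Inspect which files were written.
  print(list.files(out_dir))
}
```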
This generic function accesses the fastq slot stored in an object derived from the Fast5Summary class.
fastq(x)
x |
Object of class Fast5Summary. |
A ShortReadQ object
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    fastq( s.typhi.rep2 )
}
This generic function accesses the fastq slot stored in an object derived from the Fast5Summary class, and returns only the 2D reads.
fastq2D(x)
x |
Object of class Fast5Summary. |
A ShortReadQ object
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    fastq2D( s.typhi.rep2 )
}
This generic function accesses the fastq slot stored in an object derived from the Fast5Summary class, and returns only the complement reads.
fastqComplement(x)
x |
Object of class Fast5Summary. |
A ShortReadQ object
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    fastqComplement( s.typhi.rep2 )
}
This generic function accesses the fastq slot stored in an object derived from the Fast5Summary class, and returns only the template reads.
fastqTemplate(x)
x |
Object of class Fast5Summary. |
A ShortReadQ object
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    fastqTemplate( s.typhi.rep2 )
}
Creates a plot representing the layout of a MinION flow cell. Each circle represents an individual channel with the intensity reflecting the total kilobases of sequence produced. This only considers reads marked as template or complement; 2D reads are ignored as they are generated from the former two.
layoutPlot(summaryData, attribute = NULL)
summaryData |
Object of class Fast5Summary. |
attribute |
Character string indicating what to plot. Currently accepted values are: "nreads", "kb", "signal". |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    layoutPlot( s.typhi.rep2, attribute = 'nreads' )
    layoutPlot( s.typhi.rep2, attribute = 'kb' )
}
Creates a plot representing the layout of a MinION flow cell. Each circle represents an individual channel with the intensity reflecting a specified sequencing metric. This function is a more generalised version of layoutPlot, allowing the user to map any value they like onto the channel layout.
muxHeatmap(data, zValue)
data |
A data.frame. Should have at least two columns, one of which has the name 'channel'. |
zValue |
Character string specifying the name of the column to be used for the colour scaling. |
Returns an object of class gg representing the plot.
Plot the number of active channels for each minute of run time
plotActiveChannels(summaryData)
summaryData |
Object of class Fast5Summary. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotActiveChannels( s.typhi.rep2 )
}
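One way to picture "active channels per minute" is sketched below on invented data. It treats a channel as active in every minute during which one of its reads was being recorded; whether plotActiveChannels uses exactly this definition is an assumption, and the data frame and its values are made up.

```r
# Made-up reads: channel, start time and duration in seconds.
reads <- data.frame(
  channel    = c(1L, 1L, 2L, 3L),
  start_time = c(5, 70, 30, 130),
  duration   = c(20, 15, 100, 10)
)

# Minute bins (0-based) in which each read starts and ends.
first_min <- floor(reads$start_time / 60)
last_min  <- floor((reads$start_time + reads$duration) / 60)

# One row per (channel, minute) in which that channel was recording.
per_read <- Map(
  function(ch, from, to) data.frame(channel = ch, minute = from:to),
  reads$channel, first_min, last_min
)
active <- unique(do.call(rbind, per_read))

# Number of distinct active channels in each minute of run time.
active_per_minute <- table(active$minute)
```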
For each read, the ratio between the total number of bases called in the read (template and complement strand, but not 2D composite) and the time spent in the pore is calculated. This is then plotted against the time the read entered the pore, allowing us to assess whether the rate at which callable bases are read changes during the experiment run time.
plotBaseProductionRate(summaryData)
summaryData |
Object of class Fast5Summary. |
This is likely very similar to plotEventRate, although one may find that a large number of events occur that cannot be base called, resulting in a difference between these two plots.
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotBaseProductionRate( s.typhi.rep2 )
}
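The ratio described for plotBaseProductionRate can be worked through by hand on invented numbers. In this sketch the n_bases column and all values are hypothetical, and summing the strand durations to get the time in the pore is an assumption rather than a statement of how IONiseR computes it.

```r
# Made-up per-strand records: read id, strand, number of called bases,
# and the start time and duration (in seconds) of each strand.
strands <- data.frame(
  id         = c(1L, 1L, 2L),
  strand     = c("template", "complement", "template"),
  n_bases    = c(600L, 560L, 1500L),
  start_time = c(10, 18, 300),
  duration   = c(8, 7, 25)
)

# Per read: total bases over template and complement, total time in the
# pore, and the time the read entered the pore.
bases_per_read <- tapply(strands$n_bases, strands$id, sum)
time_per_read  <- tapply(strands$duration, strands$id, sum)
entry_time     <- tapply(strands$start_time, strands$id, min)

rate <- data.frame(
  id            = as.integer(names(bases_per_read)),
  entry_time    = as.vector(entry_time),
  bases_per_sec = as.vector(bases_per_read / time_per_read)
)
```

Plotting bases_per_sec against entry_time would then give a picture of the kind described above.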
Plots the median recorded current for each fast5 file against the time at which the recording began.
plotCurrentByTime(summaryData)
summaryData |
Object of class Fast5Summary. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotCurrentByTime( s.typhi.rep2 )
}
For each read, the ratio between the number of events comprising the read and the time spent in the pore is calculated. This is then plotted against the time the read entered the pore, allowing us to assess whether the rate at which events occur changes during the experiment run time.
plotEventRate(summaryData)
summaryData |
Object of class Fast5Summary. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotEventRate( s.typhi.rep2 )
}
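The event rate can be computed directly from the columns documented for the eventData slot. The sketch below uses an invented data frame with those column names; it is an illustration of the arithmetic, not the plotting code of plotEventRate.

```r
# Made-up rows with the eventData columns described for Fast5Summary.
events <- data.frame(
  id            = 1:3,
  start_time    = c(12, 640, 3600),
  duration      = c(4, 10, 8),
  num_events    = c(240L, 500L, 200L),
  median_signal = c(61.2, 58.9, 55.4)
)

# Events per second for each read: number of events / time in the pore.
events$event_rate <- events$num_events / events$duration
```

Here the rate falls from 60 to 25 events per second as start_time increases, which is the kind of trend the plot is meant to reveal.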
Plots
plotKmerFrequencyCorrelation(summaryData, kmerLength = 5, groupedMinutes = 10, only2D = TRUE)
summaryData |
Object of class Fast5Summary. |
kmerLength |
Specifies the length of kmers to compare. Defaults to 5 given the current pentamer reading nature of the nanopores. |
groupedMinutes |
Defines how many minutes each grouping of reads spans. |
only2D |
Logical. If TRUE kmers are computed for only full 2D reads. If FALSE 2D reads are ignored and all available template and complement strands are used. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep3, package = 'minionSummaryData')
    plotKmerFrequencyCorrelation( s.typhi.rep3, only2D = FALSE )
}
Plot the accumulation of reads over the duration of the experiment.
plotReadAccumulation(summaryData)
summaryData |
Object of class Fast5Summary. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotReadAccumulation( s.typhi.rep2 )
}
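An accumulation curve of this kind amounts to a running count of reads ordered by start time. A small sketch on invented start times follows; expressing time in hours is an assumption made for illustration.

```r
# Made-up read start times, in seconds since the run began.
start_time <- c(30, 5, 3600, 125, 61)

# Order the reads by start time and count how many have begun so far.
ordered_start <- sort(start_time)
accumulation <- data.frame(
  hours = ordered_start / 3600,
  reads = seq_along(ordered_start)
)
```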
Generates a bar plot showing the breakdown of read types found in a set of fast5 files. There is a strict hierarchy to the types of read that can be found in a fast5 file. A full 2D read requires both a complement and template strand to have been read correctly. Similarly, a complement strand can only be present if the template was read successfully. Finally, you can encounter a file containing no called bases on either strand. Here we visualise the total number of fast5 files, along with the counts containing each of the categories above. For an ideal dataset all four bars will be the same height. This is unlikely, but the drop between bars can give some indication of data quality.
plotReadCategoryCounts(summaryData)
summaryData |
Object of class Fast5Summary. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotReadCategoryCounts( s.typhi.rep2 )
}
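The four bar heights can be illustrated with a hand-made table that uses the strand and full_2D columns documented for the baseCalled slot. The ids and values below are invented; this sketch only shows how the hierarchy leads to non-increasing counts.

```r
# Five hypothetical fast5 files; file 5 produced no called bases at all.
read_ids <- 1:5
base_called <- data.frame(
  id      = c(1L, 1L, 2L, 2L, 3L, 4L),
  strand  = c("template", "complement", "template",
              "complement", "template", "template"),
  full_2D = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Counts for each category, from least to most demanding.
counts <- c(
  total      = length(read_ids),
  template   = sum(base_called$strand == "template"),
  complement = sum(base_called$strand == "complement"),
  full_2D    = length(unique(base_called$id[base_called$full_2D]))
)
```

In this toy data the counts are 5, 4, 2 and 1, so each step down the hierarchy loses some files.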
Generates a box plot showing the mean base quality for each read, broken down into the three categories of read type that can be found in a fast5 file.
plotReadCategoryQuals(summaryData)
summaryData |
Object of class Fast5Summary. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotReadCategoryQuals( s.typhi.rep2 )
}
Plots the production of each read type (template, complement and 2D) over the experiment run time, with reads grouped into windows of groupedMinutes minutes.
plotReadTypeProduction(summaryData, groupedMinutes = 10)
summaryData |
Object of class Fast5Summary. |
groupedMinutes |
Integer specifying how many minutes of runtime should be grouped together. |
Returns an object of class gg representing the plot.
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    plotReadTypeProduction( s.typhi.rep2 )
}
Basecalling procedures performed on fast5 files generally leave a log file entry recording how far through the pipeline the file proceeded. This function will extract this information as a single string. It can be printed in a more readable format using the cat function.
readFast5Log(file)
file |
Character vector of fast5 file to be read. |
Character vector containing the log information, or NULL if no log is found.
fast5file <- system.file('extdata', 'example.fast5', package = "IONiseR")
log <- readFast5Log(fast5file)
cat(log)
Reads one or more fast5 files and collects summary information about them.
readFast5Summary(files)
files |
Character vector of fast5 files to be read. |
Currently this function assumes all files passed to it come from the same sequencing run. It makes no effort to check for alternative file names or the like. If files from multiple runs are passed to it they will be collated together and any analysis performed on them will represent the mixture of both experiments.
Object of class Fast5Summary
## Not run: 
## full.names = TRUE returns complete paths rather than bare file names
fast5files <- list.files('/foo/bar/', pattern = '\\.fast5$', full.names = TRUE)
summaryData <- readFast5Summary(fast5files)
## End(Not run)
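Because files from different runs are collated without any check, it can help to separate them before calling readFast5Summary. The sketch below assumes, purely for illustration, that each run lives in its own sub-directory; the paths are hypothetical and the summaries are only built if IONiseR is installed and the files exist.

```r
# Hypothetical layout: one sub-directory per sequencing run.
fast5files <- c("/data/run_A/read1.fast5",
                "/data/run_A/read2.fast5",
                "/data/run_B/read1.fast5")

# Group the paths by their parent directory so runs are never mixed.
files_by_run <- split(fast5files, basename(dirname(fast5files)))

# Summarise each run separately, giving one Fast5Summary per run.
if (requireNamespace("IONiseR", quietly = TRUE) && all(file.exists(fast5files))) {
  summaries <- lapply(files_by_run, IONiseR::readFast5Summary)
}
```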
This generic function accesses the readInfo slot stored in an object derived from the Fast5Summary class.
readInfo(x)
x |
Object of class Fast5Summary. |
A data.frame with 5 columns
if( require(minionSummaryData) ) {
    data(s.typhi.rep2, package = 'minionSummaryData')
    readInfo( s.typhi.rep2 )
}