Formats#
Quantification output#
The quantification results are provided in the format of a tab-separated values (TSV) file. The file will
be named based on whatever output stem is provided to the piscem-infer
command via the -o
flag.
Specifically, with and output name of res
, the quantification results will be written to a file named
res.quant
.
The file contains a single header row, and consists of 4 columns with the column names and descriptions as specified below:
target_name |
len |
eeln |
tpm |
ecount |
|
---|---|---|---|---|---|
The identifer of the quantified target |
The lenth (in nucleotides) of the target |
The expected effective length of the target | The abundance of the target in transcripts per million (TPM) |
The expected number of fragments assigned to the target |
Fragment length distribution#
Though not directly intended for use by end-users, the fragment length distribution obtained from the
RAD
file used by piscem-infer
, and used to compute the expected effective lengths (eelen) of each
target, is output in a Apache Parquet format file named res.fld.pq
, where
res
is the output stem provided to the -o
option of piscem-infer
.
The Parquet file can be loaded using any of the commonly-available libraries for reading this common format in languages like Python and R.
Inferential replicates#
If piscem-infer
was run with a --num-bootstraps
value greater than 0, then a file called res.infreps.pq
will be written (again, where res
is the output stem provided to the -o
option of piscem-infer
). The
infreps
file is also a Parquet format file, and it stores in each column the value of a single inferential
replicate (i.e. an abundance for each target). The number of columns is equal to the number of requested bootstraps,
and the number of rows is equal to the number of targets represented in the header of the RAD
file. These
inferential replicates can be used to assess the inferential uncertainty in the provided quantification estimates.
In other words, variance across a row represents uncertainty in the estimated abundance of the corresponding target.
While the ability to load up this data frame easily in Python or R makes this information readily available to
end-users, its primary purpose is to be used in uncertainy-aware methods for differential analysis (such as
swish, to which we hope
to add piscem-infer
support soon).
Meta information about the run#
piscem-infer
will also collect and output metadata about each run. This information is written to a file
res.meta_info.json
, where res
is the output stem provided to the -o
option of piscem-infer
.
The information in this file includes statistics about what was encoded in the input RAD
file, how
piscem-infer
itself was invoked, as well as relevant information about the provenance of the reference
sequence against which the reads were mapped (if the piscem
index, itself, was built to contain this
information). Below, is an illustrative example of the meta information for a single run (the one
outlined in the usage example). In general, between versions of piscem-infer
, new
fields may be added to the meta-information that is written out, but we intend to be cautious about removing
or renaming existing fields, since downstream analyses may come to depend on them. That being said, if there is
information in this file that you are using downstream, or there is other information not in this file that you would
like to have provided, please let us know.
{
"mapped_frag_stats": {
"filtered_ori_count": [
0,
0,
0,
0,
0,
0,
0
],
"mapped_ori_count": [
0,
255966,
282047,
10614009,
10635715,
0,
0
],
"num_mapped_reads": 21325482,
"tot_mappings": 125259802
},
"num_bootstraps": 0,
"num_targets": 252797,
"quant_opts": {
"convergence_thresh": 0.001,
"fld_mean": null,
"fld_sd": null,
"input": "/mnt/scratch7/rob/dbg_index_tests/piscem/SRR1039508_mapped",
"lib_type": "InwardUnstranded",
"max_iter": 1500,
"num_bootstraps": 0,
"num_threads": 16,
"output": "quant/SRR1039508"
},
"signatures": {
"sha256_names": "e3f718f453cd3a749c2862a2c0a86ab6baf50529a5b73bf9cbfb95df31847542",
"sha256_seqs": "fd88296600b98e1333273e657145662fd489a41f40296d387ab4265db2dc0f1c",
"sha512_names": "df3e5b8133dc55ad4af4a9faab6bf34a031d3b8284b8f7e19526088a033cce1ddda1ebef5731fe4b094842ed48536b0095739af8b4ced3fcfc897be0bdda5e23",
"sha512_seqs": "373e06811ade1419f3640c476fbb8fd1e75b4eae2e43c189fb2f9da3b6f011d44cbf5027d9e48337a60fd2d6ec1d016fa9f23fce4bc01bf60690e81577d4c3de"
}
}