GRADitude's subcommands
Pre-process sequencing data
After the sequencing the resulting reads they have to be mapped against the reference genome. The users can use all the mapping tools available. However we recommend the use of READemption tool (Förstner et al., 2014, Bioinformatics). This will help you to have all the requested input files without prior elaboration.
After the mapping two tables are relevant to proceed with the usage of GRADitude tool:
1) gene quantification table generated counting the number of overlapped reads for each of the gene 2) read alignment stats table that lists many statistics including the ERCC read counts.
create
$ create
A subcommand that generates the GRADitude folder including all subfolders. Please move or copy the required files into the input folders.
extract_gene_columns
$ extract_gene_columns
This subcommand allow to get names, IDs or any information derived from the "Attribute column". It gives as output a table with columns containing the information required.
- Basic arguments
usage: graditude extract_gene_columns [-h] --feature_count_table FEATURE_COUNT_TABLE --name_columns NAME_COLUMNS [NAME_COLUMNS ...]
--output_table OUTPUT_TABLE
optional arguments:
-h, --help show this help message and exit
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
gene quantification table
--name_columns NAME_COLUMNS [NAME_COLUMNS ...], -names NAME_COLUMNS [NAME_COLUMNS ...]
this parameter allows the user to specify the fields you would like to extract from the attribute column
--output_table OUTPUT_TABLE, -o OUTPUT_TABLE
the name of the output table
min_row_sum_ercc (filter the table)
min_row_sum_ercc
A subcommand, specific for the sequencing data, that filters the ERCC-reads table based on the minimum row sum. It calculates the sum of all the ERCC row-wise and discard the ones that they don't reach the specified threshold.
- Basic arguments
usage: graditude min_row_sum_ercc [-h] --ref_feature_count_table
REF_FEATURE_COUNT_TABLE
[--min_row_sum MIN_ROW_SUM]
--filtered_ref_feature_count_table
FILTERED_REF_FEATURE_COUNT_TABLE
optional arguments:
-h, --help show this help message and exit
basic arguments:
--ref_feature_count_table REF_FEATURE_COUNT_TABLE, -r REF_FEATURE_COUNT_TABLE
ERCC reads table
--min_row_sum MIN_ROW_SUM, -m MIN_ROW_SUM
Specify the threshold we would like to apply
--filtered_ref_feature_count_table FILTERED_REF_FEATURE_COUNT_TABLE, -fr FILTERED_REF_FEATURE_COUNT_TABLE
Filtered ERCC reads table as output
min_row_sum (filter the table)
min_row_sum
A subcommand, specific for the sequencing data that filters the gene quantification table based on the minimum row sum. It calculates the sum of all the ERCC row-wise and discard the ones that they don't reach the specified threshold.
- Basic arguments
usage: graditude min_row_sum [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN --min_row MIN_ROW
--output_file OUTPUT_FILE
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Gene quantification table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--min_row MIN_ROW, -m MIN_ROW
Specify the threshold we would like to apply
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Filtered table as output
drop_column (filter the table)
drop_column
This subcommand is specific for the sequencing data and it can be use to drop a specific column we would not like to don't consider in the downstream analysis. For example it can be used to drop the Lysate column.
usage: graditude drop_column [-h] --feature_count_table FEATURE_COUNT_TABLE
--column_to_drop COLUMN_TO_DROP --output_file
OUTPUT_FILE
basic arguments:
-h, --help show this help message and exit
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
gene quantification table or ERCC-reads table
--column_to_drop COLUMN_TO_DROP [COLUMN_TO_DROP ...], -c COLUMN_TO_DROP [COLUMN_TO_DROP ...]
this parameter specify the name of the column/s you would like to drop
--output_file OUTPUT_FILE, -o OUTPUT_FILE
name of the filtered table as output
move_columns
move_columns
usage: graditude move_columns [-h]
--feature_count_table FEATURE_COUNT_TABLE
--number_of_columns NUMBER_OF_COLUMNS
--output_file OUTPUT_FILE
optional arguments:
-h, --help show this help message and exit
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
gene quantification table or ERCC-reads table
--number_of_columns NUMBER_OF_COLUMNS, -n NUMBER_OF_COLUMNS
this parameter specify the number of the column you would like to move. For example -1 will move the last column in the first position
--output_file OUTPUT_FILE, -o OUTPUT_FILE
name of the new table as output
selecting_specific_features (filter the table)
selecting_specific_features
This subcommand extract from a table specific feature, such as sRNAs or mRNAs.
usage: graditude selecting_specific_features [-h] --normalized_table
NORMALIZED_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN
--features FEATURES
[FEATURES ...] --output_file
OUTPUT_FILE
optional arguments:
-h, --help show this help message and exit
--normalized_table NORMALIZED_TABLE, -n NORMALIZED_TABLE
Normalized table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--features FEATURES [FEATURES ...], -f FEATURES [FEATURES ...]
This parameter specify the features we would like to
filter of. It can be a single one or a list
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Filtered table
robust_regression
robust_regression
This subcommand, specific for the sequencing data, compares the ERCC concentration in mix (it could be one or two, depending on the experiment) with the ERCC read counts.
One of the reason behind the use of this particular regression is to find outliers that could generate bad results. It first finds the regression for each of the 21 fractions separately , it plots them, including inliers and outliers, and then it finds the outliers ERCC in common and discard them. The outputs generated are 21 plots showing the regression for each of the fractions and a new table containing only the inliers ERCCs.
- Basic arguments
usage: graditude robust_regression [-h] --ref_feature_count_table
REF_FEATURE_COUNT_TABLE
--concentration_table CONCENTRATION_TABLE
--number_of_outliers NUMBER_OF_OUTLIERS
[--number_of_ercc_in_common NUMBER_OF_ERCC_IN_COMMON]
--used_mix USED_MIX --output_table
OUTPUT_TABLE
basic arguments:
--ref_feature_count_table REF_FEATURE_COUNT_TABLE, -r REF_FEATURE_COUNT_TABLE
Filtered ERCC reads table
--concentration_table CONCENTRATION_TABLE, -c CONCENTRATION_TABLE
ERCC concentration table
--number_of_outliers NUMBER_OF_OUTLIERS, -n NUMBER_OF_OUTLIERS
Number of outliers
--number_of_ercc_in_common NUMBER_OF_ERCC_IN_COMMON, -nc NUMBER_OF_ERCC_IN_COMMON
Number of ERCC considered outliers in common within
the different fractions
--used_mix USED_MIX, -mix USED_MIX
This parameter as to be used to define which ERCC mix
have been used in the experiment, in case of mix1 and
mix2 the --mix is either 3 or 4
--output_table OUTPUT_TABLE, -o OUTPUT_TABLE
Output table with the inliers ERCC
normalize
normalize
This normalization is specific for the sequencing data. It normalizes the gene quantification table, that contains all the detectable transcripts using the ERCC read counts table, filtered or not. This normalization is based on the size factor calculation of DESeq2 (Anders et al., 2010, Bioinformatics). The purpose of the size factor is to render counts from different samples, which may have been sequenced to different depths, comparable. This normalization methods uses the median of the ratios of observed counts and the denominator is considered as a reference sample obtained by taking the geometric mean across samples.
- Basic arguments
usage: graditude normalize [-h] --feature_count_table FEATURE_COUNT_TABLE
[--feature_count_start_column FEATURE_COUNT_START_COLUMN]
[--feature_count_end_column FEATURE_COUNT_END_COLUMN]
--ref_feature_count_table REF_FEATURE_COUNT_TABLE
[--ref_feature_count_start_column REF_FEATURE_COUNT_START_COLUMN]
[--ref_feature_count_end_column REF_FEATURE_COUNT_END_COLUMN]
--normalized_table NORMALIZED_TABLE
[--size_factor_table SIZE_FACTOR_TABLE]
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Filtered gene quantification table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--ref_feature_count_table REF_FEATURE_COUNT_TABLE, -r REF_FEATURE_COUNT_TABLE
ERCC table with the ERCC read-counts
--ref_feature_count_start_column REF_FEATURE_COUNT_START_COLUMN, -rc REF_FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction for the ERCC table
--ref_feature_count_end_column REF_FEATURE_COUNT_END_COLUMN, -re REF_FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis for the ERCC table
--normalized_table NORMALIZED_TABLE, -o NORMALIZED_TABLE
Table normalized
--size_factor_table SIZE_FACTOR_TABLE, -s SIZE_FACTOR_TABLE
Table with all the size factor
scaling (scale the data)
scaling
This subcommand can be used for the protein and the sequencing data. It takes a table, that can be the normalized or the raw one, as input and scales them using different methods. The default is normalization to the maximum value but one can modify the behaviour by using parameters such as normalize to range, log10 or log2. The scaled table generated by this subcommand can be the input for plotting the in-gradient behavior of a transcript or a protein.
- Basic arguments
usage: graditude scaling [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
[--pseudo_count PSEUDO_COUNT] --scaling_method
{no_normalization,normalized_to_max,normalized_to_range,log10,log2}
--scaled_table SCALED_TABLE
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Normalized gene quantification table or raw tables
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--pseudo_count PSEUDO_COUNT, -p PSEUDO_COUNT
the pseudocount is a number that will always be added
to each value; Adding this number avoid to have
mathematical operation with zeros
--scaling_method {no_normalization,normalized_to_max,normalized_to_range,log10,log2}, -sm {no_normalization,normalized_to_max,normalized_to_range,log10,log2}
Define the scaling methods you would like to apply.
The user can choose between a normalization to the
maximum value, to a range, a log10 and log 2
normalization. Alternatively the user can decide to
not use any kind of normalization
--scaled_table SCALED_TABLE, -o SCALED_TABLE
Scaled table as output
clustering_elbow (find number of clusters)
clustering_elbow
This subcommand implements a method designed to find the appropriate number of clusters in a data set. For several clustering algorithms, such as k-means or hierarchical clustering, the user has to specifies the number of expected clusters as a parameter. In the elbow method the user gives a range of clusters, usually from 0 to 10, and plots the reduction in variance. The resulting curve contains an elbow that indicates the optimal number of clusters.
- Basic arguments
usage: graditude clustering_elbow [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN
--min_number_of_clusters
MIN_NUMBER_OF_CLUSTERS
--max_number_of_clusters
MAX_NUMBER_OF_CLUSTERS --output_plots1
OUTPUT_PLOTS1 --output_plots2 OUTPUT_PLOTS2
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Filtered gene quantification table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specified the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--min_number_of_clusters MIN_NUMBER_OF_CLUSTERS, -min MIN_NUMBER_OF_CLUSTERS
Minimum number of clusters that you want to represent
in the plot
--max_number_of_clusters MAX_NUMBER_OF_CLUSTERS, -max MAX_NUMBER_OF_CLUSTERS
Maximum number of clusters that you want to represent
in the plot
--output_plots1 OUTPUT_PLOTS1, -o1 OUTPUT_PLOTS1
Plot showing the average within cluster sum of squares
against the number of clusters
--output_plots2 OUTPUT_PLOTS2, -o2 OUTPUT_PLOTS2
Plot showing the percentage of variance explained
versus the number of clusters
clustering (cluster the data)
clustering
The subcommands can be used for the sequencing data. It takes the normalized gene quantification table or the raw one as input and return a table with a new column that declares the number of clusters as output. It needs the number of cluster and the algorithm to apply as parameter. So far the k-means, the hierarchical and the DBSCAN clustering algorithms have been included.
- Basic arguments
usage: graditude clustering [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN --number_of_clusters
NUMBER_OF_CLUSTERS [--pseudo_count PSEUDO_COUNT]
--clustering_methods
{k-means,DBSCAN,hierarchical_clustering}
[--epsilon EPSILON] [--min_samples MIN_SAMPLES]
--scaling_method
{no_normalization,normalized_to_max,normalized_to_range,log10,log2}
--output_file OUTPUT_FILE
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
This parameter specified the table we would like to
use. It can be the normalized or the raw table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specified the number of the column with
the first fraction
--pseudo_count PSEUDO_COUNT, -p PSEUDO_COUNT
The pseudocount represent a number that will always be
added to each value; Adding this number avoid to have
mathematical operation with zero
--clustering_methods {k-means,DBSCAN,hierarchical_clustering},
-cm {k-means,DBSCAN,hierarchical_clustering}
The user can choose between 3 clustering algorithm,
k-means clustering, hierarchical clustering and DB-
SCAN clustering
--scaling_method {no_normalization,normalized_to_max,normalized_to_range,log10,log2},
-sm {no_normalization,normalized_to_max,normalized_to_range,log10,log2}
Define the scaling methods you would like to apply.
The user can choose between a normalization to the
maximum value, to a range, a log10 and log 2
normalization. Alternatively the user can decide to
not use any kind of normalization
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Output table with a new column containing the number
of clusters
- Additional arguments
additional arguments:
--epsilon EPSILON, -e EPSILON
This parameter is specific for the DBSCAN clustering
algorithm. It defines how close points should be in
order to be considered part of a cluster. only for
DBSCAN clustering
--min_samples MIN_SAMPLES, -ms MIN_SAMPLES
This parameter is specific for the DBSCAN clustering
algorithm and represent the minimum number of points
necessary to form a dense region
--number_of_clusters NUMBER_OF_CLUSTERS, -nc NUMBER_OF_CLUSTERS
This parameter specify the number of clusters, k
clustering_proteins (cluster the data)
clustering_proteins
This subcommand is specific for the protein data.
- Basic arguments
usage: graditude clustering_proteins [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN
--number_of_clusters NUMBER_OF_CLUSTERS
--clustering_methods
{k-means,DBSCAN,hierarchical_clustering}
[--epsilon EPSILON]
[--min_samples MIN_SAMPLES] --output_file
OUTPUT_FILE
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Protein table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specify the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--clustering_methods {k-means,DBSCAN,hierarchical_clustering}, -cm {k-means,DBSCAN,hierarchical_clustering}
The user can choose between 3 clustering algorithm,
k-means clustering, hierarchical clustering and DB-
SCAN clustering
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Table with a cluster column
- Additional arguments
--epsilon EPSILON, -e EPSILON
This parameter is specific for the DBSCAN clustering
algorithm. It defines how close points should be in
order to be considered part of a cluster
--min_samples MIN_SAMPLES, -ms MIN_SAMPLES
This parameter is specific for the DBSCANclustering
algorithm and represent the minimum number of points
necessary to form a dense region
--number_of_clusters NUMBER_OF_CLUSTERS, -nc NUMBER_OF_CLUSTERS
This parameter specify the number of clusters, k. It
is required only when using k-means and hierarchical
clustering algorithm
t-sne (dimension reduction)
t_sne
To identify biochemically similar transcripts the t-SNE dimension reduction algorithm has been implemented. The t-SNE also known as t-distributed stochastic neighbor embedding, help us to visualize the data. In order to visualize interactlivi the data-sets we used the python library Bokeh and the JavaScript Callbacks one to navigate t the data-set.
- Basic arguments
usage: graditude t_sne [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column FEATURE_COUNT_START_COLUMN
--perplexity PERPLEXITY
[--srna_list SRNA_LIST [SRNA_LIST ...]]
[--cluster_names CLUSTER_NAMES [CLUSTER_NAMES ...]]
[--url_link URL_LINK]
[--output_file1 OUTPUT_FILE1]
[--output_file2 OUTPUT_FILE2]
[--output_file3 OUTPUT_FILE3]
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
This parameter specified the table we would like to
use. It can be the normalized or the raw table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specified the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--perplexity PERPLEXITY, -pp PERPLEXITY
The perplexity is useful tobalance the attention
between the global and the local aspects of data. It
is better to select a value between 5 and 50
--output_file1 OUTPUT_FILE1, -o1 OUTPUT_FILE1
Output plot colorized using clusters information
--output_file2 OUTPUT_FILE2, -o2 OUTPUT_FILE2
Output plot colorized using attributes information
--output_file3 OUTPUT_FILE3, -o3 OUTPUT_FILE3
Output plot colorized using a specific list
- Additional arguments
additional arguments:
--srna_list SRNA_LIST [SRNA_LIST ...], -list SRNA_LIST [SRNA_LIST ...]
This parameter allow the user to specify a list of
features or genes we would like to highlight in the
plot
--cluster_names CLUSTER_NAMES [CLUSTER_NAMES ...], -names CLUSTER_NAMES [CLUSTER_NAMES ...]
This parameter is required only if you provide a
specific list. It allows the user to specify the label
on the third plot
--url_link URL_LINK, -url URL_LINK
This parameter allowed to choose the website you would
like to open when clicking on a specific point in the
html plot
pca (dimension reduction)
pca
To identify biochemically similar transcripts the PCA dimension reduction algorithm has been implemented. The PCA also known as principal component analysis, help us to visualize the data. In order to visualize interactively the data sets we used the python library Bokeh and the JavaScript Callbacks one to navigate the data-set.
- Basic arguments
usage: graditude pca [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column FEATURE_COUNT_START_COLUMN
--feature_count_end_column FEATURE_COUNT_END_COLUMN
[--srna_list_files SRNA_LIST_FILES [SRNA_LIST_FILES ...]]
[--cluster_names CLUSTER_NAMES [CLUSTER_NAMES ...]]
[--url_link URL_LINK]
--output_file_colorized_by_clusters
OUTPUT_FILE_COLORIZED_BY_CLUSTERS
--output_file_colorized_by_rna_class
OUTPUT_FILE_COLORIZED_BY_RNA_CLASS
--output_file_colorized_by_lists
OUTPUT_FILE_COLORIZED_BY_LISTS
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
This parameter specified the table we would like to
use. It can be the normalized or the raw table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specified the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--output_file_colorized_by_clusters OUTPUT_FILE_COLORIZED_BY_CLUSTERS, -o1 OUTPUT_FILE_COLORIZED_BY_CLUSTERS
Output plot colorized using clusters information
--output_file_colorized_by_rna_class OUTPUT_FILE_COLORIZED_BY_RNA_CLASS, -o2 OUTPUT_FILE_COLORIZED_BY_RNA_CLASS
Output plot colorized using attributes information
--output_file_colorized_by_lists OUTPUT_FILE_COLORIZED_BY_LISTS, -o3 OUTPUT_FILE_COLORIZED_BY_LISTS
Output plot colorized using a specific list
- Additional arguments
additional arguments:
--srna_list_files SRNA_LIST_FILES [SRNA_LIST_FILES ...], -list SRNA_LIST_FILES [SRNA_LIST_FILES ...]
This parameter allow the user to specify a list of
features or genes we would like to highlight in the
plot
--cluster_names CLUSTER_NAMES [CLUSTER_NAMES ...], -names CLUSTER_NAMES [CLUSTER_NAMES ...]
This parameter is required only if you provide a
specific list. It allows the user to specify the label
on the third plot
--url_link URL_LINK, -url URL_LINK
This parameter allowed to choose the website you would
like to open when clicking on a specific point in the
html plot
umap (dimension reduction)
umap
To identify biochemically similar transcripts the PCA dimension reduction algorithm has been implemented. The PCA also known as principal component analysis, help us to visualize the data. In order to visualize interactively the data sets we used the python library Bokeh and the JavaScript Callbacks one to navigate the data-set.
- Basic arguments
usage: graditude umap [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column FEATURE_COUNT_START_COLUMN
--feature_count_end_column FEATURE_COUNT_END_COLUMN
[--srna_list SRNA_LIST [SRNA_LIST ...]]
[--cluster_names CLUSTER_NAMES [CLUSTER_NAMES ...]]
[--n_neighbors N_NEIGHBORS] [--nmin_dist NMIN_DIST]
[--url_link URL_LINK]
[--output_file1 OUTPUT_FILE1]
[--output_file2 OUTPUT_FILE2]
[--output_file3 OUTPUT_FILE3]
basic arguments:from sklearn.decomposition import PCA
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
This parameter specified the table we would like to
use. It can be the normalized or the raw table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specify the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--n_neighbors N_NEIGHBORS, -n_neighbors N_NEIGHBORS
This parameter helps to balances local vs global
structure in the data.
--nmin_dist NMIN_DIST, -nmin_dist NMIN_DIST
This parameter controls how close are the points
together.Smaller values are useful if you want to use
the UMAP to cluster data and larger values are good to
preserve the overall structure
--output_file1 OUTPUT_FILE1, -o1 OUTPUT_FILE1
Output plot colorized using clusters information
--output_file2 OUTPUT_FILE2, -o2 OUTPUT_FILE2
Output plot colorized using attributes information
--output_file3 OUTPUT_FILE3, -o3 OUTPUT_FILE3
Output plot colorized using a specific list
- Additional arguments
additional arguments:
--srna_list_files SRNA_LIST_FILES [SRNA_LIST_FILES ...], -list SRNA_LIST_FILES [SRNA_LIST_FILES ...]
This parameter allow the user to specify a list of
features or genes we would like to highlight in the
plot
--cluster_names CLUSTER_NAMES [CLUSTER_NAMES ...], -names CLUSTER_NAMES [CLUSTER_NAMES ...]
This parameter is required only if you provide a
specific list. It allows the user to specify the label
on the third plot
--url_link URL_LINK, -url URL_LINK
This parameter allowed to choose the website you would
like to open when clicking on a specific point in the
html plot
plot_kinetics (plot the in-gradient behavior)
plot_kinetics
This subcommand is useful to better visualize the behavior of a specific transcript or protein within the gradient. One foundation of the Grad-seq analysis is that the kinetic of molecule in the fractionations allows the reconstruction of the sedimentation profiles of all detectable RNA and proteins.
- Basic arguments
usage: graditude plot_kinetics [-h] --feature_count_table FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN --gene_name GENE_NAME
[--output_format {html,pdf}]
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
This parameter specified the table we would like to
use. It can be the normalized or the raw table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--gene_name GENE_NAME, -gene GENE_NAME
With this parameter you can specify the name of the
gene or the proteinyou would like to explore
--output_format {html,pdf}, -format {html,pdf}
You can use this parameter tospecify in which format
you would like tosave your plot
heatmap (plot the heatmap)
heatmap
This subcommand is useful to better visualize the in-gradient behavior of a larger group of transcripts or proteins.
- Basic arguments
usage: graditude heatmap [-h] --feature_count_table FEATURE_COUNT_TABLE
[--feature_count_start_column FEATURE_COUNT_START_COLUMN]
[--feature_count_end_column FEATURE_COUNT_END_COLUMN]
--y_label Y_LABEL --output_file OUTPUT_FILE
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Gene quantification table or protein table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--y_label Y_LABEL, -label Y_LABEL
This parameter allow you to specify the label you
would like to visualize on the y-axis
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Plot as output
silhouette_analysis (clustering)
silhouette_analysis
This subcommand can be used to interpret the distance between clusters. It is useful to see if the number of clusters (k) you have chosen is correct for the data set.
- Basic arguments
usage: graditude silhouette_analysis [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN
--min_number_of_clusters
MIN_NUMBER_OF_CLUSTERS
--max_number_of_clusters
MAX_NUMBER_OF_CLUSTERS
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -c FEATURE_COUNT_TABLE
Gene quantification table or Protein table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specify the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--min_number_of_clusters MIN_NUMBER_OF_CLUSTERS, -min MIN_NUMBER_OF_CLUSTERS
Minimum number of clusters that you want to represent
in the plot
--max_number_of_clusters MAX_NUMBER_OF_CLUSTERS, -max MAX_NUMBER_OF_CLUSTERS
Maximum number of clusters that you want to represent
in the plot
correlation_specific_gene (correlation coefficient)
correlation_specific_gene
This subcommand is useful if you have a gene or a protein of interest and you would like to predict new interactions. The assumption is that proteins or genes with an high correlation coefficients might interact.
- Basic arguments
usage: graditude correlation_specific_gene [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
[--feature_count_end_column FEATURE_COUNT_END_COLUMN]
--name_column_with_genes_name
NAME_COLUMN_WITH_GENES_NAME --name
NAME --correlation
{Pearson,Spearman}
[--output_file OUTPUT_FILE]
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Gene quantification table or Protein table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specify the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--name_column_with_genes_name NAME_COLUMN_WITH_GENES_NAME, -nc NAME_COLUMN_WITH_GENES_NAME
This parameter allows the user to specify the name of
the column where we want to search the gene or the
proteins
--name NAME, -name NAME
Name of the gene or protein of your interest
--correlation {Pearson,Spearman}, -corr {Pearson,Spearman}
Choose if applying the Pearson or Spearman correlation
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Table with correlation coefficients and p-values
correlation_distribution_graph (histogram of correlation coefficients distribution)
correlation_distribution_graph
This subcommand creates a histogram to identify the distribution of the correlation coefficients, to better visualize the generated data and to create a comprehensible plot. The plot shows the percentile that might be used as cut-off in the network plot or in further analysis.
- Basic arguments
usage: graditude correlation_distribution_graph [-h]
--table_with_correlation_coefficient
TABLE_WITH_CORRELATION_COEFFICIENT
--percentile PERCENTILE
--output_plot OUTPUT_PLOT
basic arguments:
--table_with_correlation_coefficient TABLE_WITH_CORRELATION_COEFFICIENT, -c TABLE_WITH_CORRELATION_COEFFICIENT
Table with a column containing the correlation
coefficients
--percentile PERCENTILE, -p PERCENTILE
Define the percentile value
--output_plot OUTPUT_PLOT, -o OUTPUT_PLOT
Histogram with the correlation coefficients
distribution
correlation_rnas_protein (correlation coefficient)
correlation_rnas_protein
This subcommand find the correlation coefficient of two different tables. It can be used for example for combining RNA-sequencing and Mass-spectrometry data set. In this way we can predict RNA-protein interactions. In the implemented approach an all-against-Spearman or Pearson correlation of the protein and the gene quantification tables are generated.
- Basic arguments
usage: graditude correlation_rnas_protein [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
[--feature_count_end_column FEATURE_COUNT_END_COLUMN]
--protein_table PROTEIN_TABLE
--protein_count_start_column
PROTEIN_COUNT_START_COLUMN
[--protein_count_end_column PROTEIN_COUNT_END_COLUMN]
[--correlation_type {Pearson,Spearman}]
[--output_file OUTPUT_FILE]
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
This parameter specify the number of the column with
the first fraction in the first table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
First table, for examplethe sequencing table
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis in the first table
--protein_table PROTEIN_TABLE, -p PROTEIN_TABLE
Second table, for examplethe protein table
--protein_count_start_column PROTEIN_COUNT_START_COLUMN, -pc PROTEIN_COUNT_START_COLUMN
This parameter specify the number of the column with
the first fraction in the second table
--protein_count_end_column PROTEIN_COUNT_END_COLUMN, -pe PROTEIN_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis in the first table
--correlation_type {Pearson,Spearman}, -corr {Pearson,Spearman}
Choose if applying the Pearson or Spearman correlation
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Output table containing the correlation coefficients
correlation_all_against_all (correlation coefficient)
correlation_all_against_all
This subcommand calculates the correlation coefficients all agaist all. It can be used in the sequencing table or with the protein data.
- Basic arguments
usage: graditude correlation_all_against_all [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN
--correlation_type
{Pearson,Spearman} --output_table
OUTPUT_TABLE
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Gene quantification or protein table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
Specify the number of the column with the first
fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--correlation_type {Pearson,Spearman}, -corr {Pearson,Spearman}
Choose if applying the Pearson or Spearman correlation
--output_table OUTPUT_TABLE, -o OUTPUT_TABLE
Table with correlation coefficients
correlation_replicates (correlation coefficient)
correlation_replicates
This subcommand allows to see the distribution of the correlation coefficient between two biological replicates
- Basic arguments
usage: graditude correlation_replicates [-h] --table_replicate1
TABLE_REPLICATE1 --table_replicate2
TABLE_REPLICATE2
--table_start_column-fc
TABLE_START_COLUMN_FC
--table_end_column-fe
TABLE_END_COLUMN_FE
[--output_table OUTPUT_TABLE]
[--output_figure OUTPUT_FIGURE]
optional arguments:
-h, --help show this help message and exit
--table_replicate1 TABLE_REPLICATE1, -r1 TABLE_REPLICATE1
Sequencing table first replicate
--table_replicate2 TABLE_REPLICATE2, -r2 TABLE_REPLICATE2
Sequencing table second replicate
--table_start_column-fc TABLE_START_COLUMN_FC
This parameter specify the number of the column with
the first fraction
--table_end_column-fe TABLE_END_COLUMN_FE
Specify the number of the last fraction we would like
to consider in the analysis
--output_table OUTPUT_TABLE, -o OUTPUT_TABLE
Output table containing both the replicates and the
--output_figure OUTPUT_FIGURE, -f OUTPUT_FIGURE
Plot that show a histogram with thedistribution ofthe
correlation coefficient
plot_network_graph (network plot)
plot_network_graph
This subcommand plots the network plot. The subcommand identifies two kind of nodes: transcripts and proteins. The edges are the correlation coefficients. The graph has been drawn using a force-directed layout, computed using the Fruchterman-Reingold algorithm.
- Basic arguments
usage: graditude plot_network_graph [-h] --feature_count_table
FEATURE_COUNT_TABLE --threshold THRESHOLD
--max_size MAX_SIZE
[--output_plot OUTPUT_PLOT]
optional arguments:
--feature_count_table FEATURE_COUNT_TABLE, -f FEATURE_COUNT_TABLE
Table with correlation coefficients values
--threshold THRESHOLD, -t THRESHOLD
Cut-off necessary to plot the genes or proteins with
an high correlation coefficients
--max_size MAX_SIZE, -max MAX_SIZE
This parameter is useful to set the maximum area of
each point. All the points are then scaled based on
that
--output_plot OUTPUT_PLOT, -o OUTPUT_PLOT
Network plot
interactive_plots (explore the plot)
interactive_plots
This subcommand shows the results of a dimension reduction and makes the plot more explorable. The lass select has been implemented and this allows the selection of a specific region in the plot generated. The selection will be showed in a separate plot and a table containing only the genes or proteins of interest will be generated.
- Basic arguments
usage: graditude interactive_plots [-h] --feature_count_table
FEATURE_COUNT_TABLE
--feature_count_start_column
FEATURE_COUNT_START_COLUMN
--feature_count_end_column
FEATURE_COUNT_END_COLUMN
--dimension_reduction_algorithm {t-SNE,PCA}
[--perplexity PERPLEXITY]
[--url_link URL_LINK]
[--output_file OUTPUT_FILE]
basic arguments:
--feature_count_table FEATURE_COUNT_TABLE, -t FEATURE_COUNT_TABLE
Gene quantification table or Protein table
--feature_count_start_column FEATURE_COUNT_START_COLUMN, -fc FEATURE_COUNT_START_COLUMN
This parameter specify the number of the column with
the first fraction
--feature_count_end_column FEATURE_COUNT_END_COLUMN, -fe FEATURE_COUNT_END_COLUMN
Specify the number of the last fraction we would like
to consider in the analysis
--dimension_reduction_algorithm {t-SNE,PCA}, -dm {t-SNE,PCA}
Specify the dimension reduction algorithm. The user
can choose between t-SNE and PCA
--perplexity PERPLEXITY, -pp PERPLEXITY
The perplexity is useful tobalance the attention
between the global and the local aspects of data. It
is better to select a value between 5 and 50. This parameter is necessary
only if you are plotting the t-SNE dimension reduction
--output_file OUTPUT_FILE, -o OUTPUT_FILE
Output plot
- Additional arguments
--url_link URL_LINK, -url URL_LINK
This parameter allows to choose the website you would
like to open when clicking on a specific point in the
html plot
find_complexes (quality control of protein data)
find_complexes
As a general quality control for the protein data, we look at how many of the know protein complexes are actually present in our specific data sets. This subcommand detects if all the subunit of that specific complexes are present and calculate the correlation.
- Basic arguments
usage: graditude find_complexes [-h] --tables_containing_list_complexes
TABLES_CONTAINING_LIST_COMPLEXES
--protein_table PROTEIN_TABLE
--table_start_column-pc TABLE_START_COLUMN_PC
--table_end_column-pe TABLE_END_COLUMN_PE
[--output_table OUTPUT_TABLE]
optional arguments:
-h, --help show this help message and exit
--tables_containing_list_complexes TABLES_CONTAINING_LIST_COMPLEXES, -complexes TABLES_CONTAINING_LIST_COMPLEXES
Table containing the list of known complexes
--protein_table PROTEIN_TABLE, -p PROTEIN_TABLE
Protein table
--table_start_column-pc TABLE_START_COLUMN_PC
This parameter specified the number of the column with
the first fraction in the protein table
--table_end_column-pe TABLE_END_COLUMN_PE
Specify the number of the last fraction we would like
to consider in the analysis in the protein table
--output_table OUTPUT_TABLE, -o OUTPUT_TABLE
Output table containing the complete complexeswith all
the complex subunits and correlation coefficients