GRADitude: a computational tool for Grad-seq data analysis
Introduction
Grad-seq is a high-throughput profiling approach for the organism-wide detection of RNA-RNA and RNA-protein interactions in which molecular complexes are separated in a gradient by shape and size (Smirnov et al., 2016, PNAS). Grad-seq separates native cellular lysates including complexes, according to their molecular weight and shape in a glycerol gradient, independent of charge and sequence. After this fractionation, RNA-seq and MS-analysis of each of the fractions generated allows the reconstruction of the sedimentation profiles of all detectable RNAs and protein in a single experiment. Further analysis can reveal possible interactions between the individual molecules.
So far, Grad-seq has been used to globally study RNA-RNA and RNA-protein interactions in Salmonella Typhimurium and that allowed us to identify ProQ as a new global RNA-binding protein
usage: graditude [-h]
{create,min_row_sum_ercc,min_row_sum,drop_column,move_columns,merge_features,robust_regression,normalize,find_spike_in,normalize_with_spikein,scaling,correlation_all_against_all,selecting_specific_features,heatmap,plot_kinetics,clustering,clustering_elbow,silhouette_analysis,pca,t_sne,umap,correlation_rnas_protein,correlation_distribution_graph,plot_network_graph,clustering_proteins,dimension_reduction_proteins,correlation_specific_gene,interactive_plots,correlation_replicates,find_complexes,generate_html,extract_gene_columns,merge_attributes,version}
...
positional arguments:
{create,min_row_sum_ercc,min_row_sum,drop_column,move_columns,merge_features,robust_regression,normalize,find_spike_in,normalize_with_spikein,scaling,correlation_all_against_all,selecting_specific_features,heatmap,plot_kinetics,clustering,clustering_elbow,silhouette_analysis,pca,t_sne,umap,correlation_rnas_protein,correlation_distribution_graph,plot_network_graph,clustering_proteins,dimension_reduction_proteins,correlation_specific_gene,interactive_plots,correlation_replicates,find_complexes,generate_html,extract_gene_columns,merge_attributes,version}
commands
min_row_sum_ercc Filter the ERCC table based on the min row sum. It calculates the sum row-wise and discard the rows
with a sum below the specified threshold
min_row_sum Filter the gene quantification table based on the min row sum. It calculates the sum row wise and
discard the rows with a sum below the specified threshold
drop_column It filters a table dropping a specific column.
move_columns
merge_features This subcommands help to merge specific features
robust_regression It compares the ERCC concentration in mix with the ERCC reads and take it out the outliers
normalize This subcommand calculates the ERCC size factor and normalize the gene quantification table based
on that
find_spike_in This subcommand can be used to find the spike in when there areno ERCC reads available
normalize_with_spikein
This subcommand calculates the ERCC size factor and normalize the gene quantification table based
on that
scaling This subcommand scales tables using different methods
correlation_all_against_all
This subcommand calculate the correlation coefficients all against all.
selecting_specific_features
This subcommand allows to select specific features in a table (for example ncRNAs
heatmap This subcommand is useful to visualize the in-gradient behavior of a larger group of transcripts or
proteins
plot_kinetics This subcommand plot the kinetics of a specific transcript or protein to better visualize their
behavior within the gradient
clustering This subcommand performs unsupervised clustering using different algorithm
clustering_elbow This subcommands plot the elbow graph in order to choose the ideal number of clusters necessary for
the k-means and the hierarchical clustering
silhouette_analysis
This subcommand can be used to interpret the distance between clusters
pca This subcommand performs the PCA-principal component dimension reduction
t_sne This subcommand performs the t-sne dimension reduction
umap This subcommand performs the umap dimension reduction analysis
correlation_rnas_protein
This subcommand performs the Spearman or Pearson correlation coefficients of two tables.
correlation_distribution_graph
This subcommand plots the distribution of the correlation coefficients as a histogram
plot_network_graph This subcommand plots the network plot. It can be used to plot for example sequencing data vs
protein data or ncRNAs vs proteins etc.
clustering_proteins
This subcommand performs the unsupervised clustering of protein data
dimension_reduction_proteins
This subcommand perform the t-sne analysis of Mass spectrometry data
correlation_specific_gene
This subcommand calculate the Spearman or Pearson correlation of a specific gene or a specific
protein against all
interactive_plots This subcommand is useful to visualize interactively a plot after a dimension reduction algorithm
has been applied.
correlation_replicates
This subcommand allows to see the distribution of the correlation coefficient between two
biological replicates
find_complexes With this subcommand we look at how many of the known proteincomplexes are actually present in our
specific data sets.It finds all the sub-unit of a specific complexand calculate the correlation
generate_html This subcommand give to the user the possibilityof creating an html page containingmany results
generated using the tool.
extract_gene_columns
This subcommand can be usedto extract from the attributecolumn the name or the ID of a specific
gene
merge_attributes
version Print version
optional arguments:
-h, --help show this help message and exit
Download
Source code
The source code of GRADitude can be found at Github
License
ISC (Internet Systems Consortium license ~ simplified BSD license) - see LICENSE
Contact
For question and requests feel free to contact Silvia Di Giorgio