ggplot2: A function for aesthetic 'super-enhancer' plots

I have been analysing a lot of ChIP-seq data lately. One of the steps in the analysis process is identifying Super-Enhancers from H3K27ac ChIP-seq data.

Briefly, super-enhancers are long genomic regions that contain clusters of enhancers. They are known to have much higher enhancer activity than the sum of their individual enhancers. This post is not about the definition or biological importance of super-enhancers; it is about showing the identified super-enhancers in an aesthetic, publication-quality plot.

Here are some references to understand super-enhancers in detail

To identify super-enhancers, most published articles have used the ROSE algorithm (Rank Ordering of Super-Enhancers) from the Young lab. You can grab the code from here or here. Instructions on how to run the tool are on their Bitbucket page (they are also hard-coded within the ROSE2_main.py script).

The super-enhancer code provided above also generates a plot by default. However, it uses the same color for super-enhancers and typical enhancers, and it would be helpful to distinguish them with different colors. Custom colors for super-enhancers also come in handy for separating different conditions (e.g. different subgroups of breast cancer).

I had written a naive R function to generate a super-enhancer plot from the results of the ROSE algorithm. It was so naive that I had to change the code whenever I wanted a different color on the plot.

A few days back, I came across an impressive thought from David Robinson, which said

David's tweet convinced me to sit down for a while and write a generalized function that produces a super-enhancer plot, offers the flexibility to choose colors, labels, etc., and returns a ggplot2 object that can be customized as required. Here, I am sharing the function and instructions on how to use it to generate a super-enhancer plot.

The ROSE algorithm generates a couple of output files. The one ending in *_AllEnhancers.table.txt is used for the plots.
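To get a feel for that file before plotting, it can be read into R with base functions. The snippet below is only a sketch: it writes a tiny made-up table to a temporary file, and the column layout (leading '#' comment lines, then REGION_ID through isSuper, with the signal columns named after the BAM files) is my assumption about typical ROSE output, so check it against your own file.

```r
# Toy stand-in for a ROSE *_AllEnhancers.table.txt file (values are made up).
# Assumption: real files start with '#' comment lines followed by a header row.
rose_lines <- c(
  "# toy comment line",
  "REGION_ID\tCHROM\tSTART\tSTOP\tNUM_LOCI\tCONSTITUENT_SIZE\tsample.bam\tinput.bam\tenhancerRank\tisSuper",
  "region_1\tchr1\t1000\t30000\t5\t9000\t52000\t2000\t1\t1",
  "region_2\tchr2\t5000\t9000\t1\t4000\t3000\t500\t2\t0",
  "region_3\tchr3\t7000\t8000\t1\t1000\t900\t400\t3\t0"
)
toy_file <- tempfile(fileext = "_AllEnhancers.table.txt")
writeLines(rose_lines, toy_file)

# Skip the '#' lines and read the tab-separated table with its header
rose <- read.delim(toy_file, comment.char = "#", stringsAsFactors = FALSE)

# Quick checks: column names and how many regions are flagged as super-enhancers
print(colnames(rose))
print(table(rose$isSuper))
```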

For convenience, I pushed the code to GitHub, so one can source it directly from R.

Arguments and Example Usage

seFile	- path to the ROSE result file (*_AllEnhancers.table.txt)
seCol	- color for super-enhancers (Default - red)
teCol	- color for typical enhancers (Default - black)
bg	- (TRUE|FALSE) whether a background was used during the ROSE run (Default - TRUE)
mark	- ChIP-seq mark used to generate the super-enhancer data (Default - H3K27ac)
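To make the arguments concrete, here is a minimal sketch of how a function with this interface could be put together. It is not the function from the GitHub repository: the name plot_se_sketch is hypothetical, the toy input is made up, and the assumption that column 7 holds the ChIP signal and column 8 the background signal should be verified against your own ROSE output.

```r
library(ggplot2)

# Hypothetical stand-in, NOT the function shared in this post; only the
# argument names (seFile, seCol, teCol, bg, mark) come from the list above.
plot_se_sketch <- function(seFile, seCol = "red", teCol = "black",
                           bg = TRUE, mark = "H3K27ac") {
  rose <- read.delim(seFile, comment.char = "#", stringsAsFactors = FALSE)

  # Assumption: column 7 is the ChIP signal and, when a background was used,
  # column 8 is the background signal to subtract (floored at zero).
  signal <- rose[[7]]
  if (bg) {
    signal <- pmax(signal - rose[[8]], 0)
  }

  plot_df <- data.frame(
    rank   = rank(signal, ties.method = "first"),  # strongest region on the right
    signal = signal,
    type   = factor(ifelse(rose$isSuper == 1, "Super-enhancer", "Typical enhancer"),
                    levels = c("Super-enhancer", "Typical enhancer"))
  )

  ggplot(plot_df, aes(x = rank, y = signal, colour = type)) +
    geom_point() +
    scale_colour_manual(values = c("Super-enhancer" = seCol,
                                   "Typical enhancer" = teCol),
                        name = NULL) +
    labs(x = paste(mark, "enhancers ranked by signal"),
         y = paste(mark, "signal"))
}

# Toy input with made-up values so the sketch runs on its own
toy_file <- tempfile(fileext = "_AllEnhancers.table.txt")
writeLines(c(
  "# toy comment line",
  "REGION_ID\tCHROM\tSTART\tSTOP\tNUM_LOCI\tCONSTITUENT_SIZE\tsample.bam\tinput.bam\tenhancerRank\tisSuper",
  "region_1\tchr1\t1000\t30000\t5\t9000\t52000\t2000\t1\t1",
  "region_2\tchr2\t5000\t9000\t1\t4000\t3000\t500\t2\t0",
  "region_3\tchr3\t7000\t8000\t1\t1000\t900\t400\t3\t0"
), toy_file)

p <- plot_se_sketch(toy_file, seCol = "firebrick", teCol = "grey40")
```

Because the return value is a ggplot2 object, it can be customized afterwards in the usual way, for example with p + theme_classic().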

Example-1 Example-2
Example-3 Example-4
