SequenceServer v1.0 pre-release

I have recently been lucky enough to have access, for testing, to the pre-release version of SequenceServer, which can be installed with:

$ gem install --pre sequenceserver

NB – At the time of writing this post I was using pre2. That has now been updated to pre4, and I have had several chats with both Anurag and Yannick via email, so some of the information below is possibly slightly out of date. I will work on updating this over the weekend, but for now I wanted to publish this ‘as is’ so people can actually get to read/see it! 🙂

There are quite a few updates, both visually and in the underlying code. The interface is much cleaner now and the slightly garish orange/green colour scheme has been replaced by a frosty white and blue. The other major change is the output report of BLAST hits, where we now have a graphical representation of where the hits align to the query – similar to the graphical report you see at NCBI’s web BLAST – but with a nicer colour scheme and the same image link map to take you to the relevant links. I would still like integration with CDD here to show protein domains… There are hints of more options to come. You can now download FASTA files of your sequences as well as XML and TSV copies of the BLAST report.

I’ve not looked too much at the code-base changes as I am not completely au fait with Ruby (Anurag tells me: “We now have a minified CSS and JS file each for production deploy. And a gzipped version of each as well. Using [the] gzipped version improves page load times by 70%.”), but there are a few more gems required and what looks like a tidying-up of some of the code!

If you remember from my previous post, here, I needed to set up multiple versions of SequenceServer for different sequencing projects – rather than have all the sequence databases appear under one server – each served from its own address using Phusion Passenger. I accomplished this by cloning the GitHub repository to a local directory (under /usr/share) and then creating, in a separate directory for each of my projects (e.g. /etc/blast_project1, /etc/blast_project2), a set of symlinks to the files within that repository. Then, from my Apache web root (/var/www), I created another set of symlinks to each project’s public folder. Although complicated, this allowed me to keep the repository up to date and each server running, and to modify individual files by replacing the symlink with an edited copy.

It worked remarkably well. Other than when non-symlinked files changed, the only problem I had was WordPress interfering with sequence retrieval due to some .htaccess settings, and that was easily sorted with a few settings changes in Apache.

This time I have had to figure out Ruby a bit more (still hate it! :p) as the pre-release version is installed via RubyGems, but I have kept much of the same setup for testing. As before, I will keep a record below of how I got multiple SequenceServers running for multiple projects.

1. Ruby Gems Locations

You can find out where your local copy of RubyGems is installed by typing:

$ gem environment

RubyGems Environment:
- RUBYGEMS VERSION: 1.8.23
- RUBY VERSION: 1.9.3 (2013-11-22 patchlevel 484) [i686-linux]
- INSTALLATION DIRECTORY: /var/lib/gems/1.9.1

So we can see that for Ubuntu Trusty Tahr they are located in /var/lib/gems/1.9.1, with Ruby 1.9.3 (the 1.9.1 in the path is the library compatibility version, not the Ruby release) and RubyGems 1.8.23. BUT!!

$ gem environment

RubyGems Environment:
- RUBYGEMS VERSION: 2.4.5
- RUBY VERSION: 1.9.3 (2013-11-22 patchlevel 484) [i686-linux]
- INSTALLATION DIRECTORY: /usr/lib/ruby/gems/1.9.1

When I needed to update RubyGems (see Section 5 below) the installation directory moved to /usr/lib/ruby/gems/1.9.1, which is something to be aware of! So FRUSTRATING!

2. Symlink Directories

The final folder path for the SequenceServer pre-release is either /var/lib/gems/1.9.1/gems/sequenceserver-1.0.0.pre.2 or /usr/lib/ruby/gems/1.9.1/gems/sequenceserver-1.0.0.pre.2, and I am again going to make a chain of symlinked directories to help manage the installation of multiple copies.

Source: /usr/lib/ruby/gems/1.9.1/gems/sequenceserver-1.0.0.pre.2

1st Symlink: /usr/share/sequenceserver1 -> /usr/lib/ruby/gems/1.9.1/gems/sequenceserver-1.0.0.pre.2

$ sudo ln -s /usr/lib/ruby/gems/1.9.1/gems/sequenceserver-1.0.0.pre.2 /usr/share/sequenceserver1

2nd Symlink (content of the folder): /etc/blast_projectX -> /usr/share/sequenceserver1

$ sudo mkdir /etc/blast_projectX/ && cd /etc/blast_projectX

$ sudo ln -s /usr/share/sequenceserver1/* .

3rd Symlink: /var/www/projectX -> /etc/blast_projectX/public

$ sudo ln -s (see below, things to do first!)

You could symlink directly to the gems directory, but this way, when SequenceServer updates to, say, sequenceserver-1.0.0.pre.3 or the final version, I can modify the 1st symlink and should be able to keep everything else in place… (you might have to re-run ‘bundle install’ too). Well, what will actually happen is that any files that no longer exist will show up as broken symlinks (usually flagged in red in a directory listing), which you can delete, and any new files in the new SequenceServer gems directory won’t be linked, but you can always symlink them one by one. Not the best solution, but I can’t think of a better way when gems are managed in folders with version numbers 🙁 – if anyone has any thoughts on this I would like to hear them.
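As a rough sketch of that upgrade step, the following plays it out in a throwaway directory rather than in /usr or /etc. The file names old_file and new_file are made up for illustration, and the two gem folders are empty stand-ins, so treat it as an outline of the idea rather than a tested upgrade procedure.

```shell
#!/bin/sh
# Sketch only: everything happens under a temporary directory.
set -eu
demo=$(mktemp -d)

# Two stand-in gem versions (hypothetical contents).
mkdir -p "$demo/gems/sequenceserver-1.0.0.pre.2" "$demo/gems/sequenceserver-1.0.0.pre.3"
touch "$demo/gems/sequenceserver-1.0.0.pre.2/config.ru" "$demo/gems/sequenceserver-1.0.0.pre.2/old_file"
touch "$demo/gems/sequenceserver-1.0.0.pre.3/config.ru" "$demo/gems/sequenceserver-1.0.0.pre.3/new_file"

# 1st symlink: a stable name pointing at the versioned gem directory.
ln -s "$demo/gems/sequenceserver-1.0.0.pre.2" "$demo/sequenceserver1"

# 2nd set of symlinks: per-project links to the files behind the stable name.
mkdir "$demo/blast_projectX"
ln -s "$demo/sequenceserver1"/* "$demo/blast_projectX/"

# Upgrade: re-point the 1st symlink (-n stops ln following the existing link).
ln -sfn "$demo/gems/sequenceserver-1.0.0.pre.3" "$demo/sequenceserver1"

# Project links whose target no longer exists in the new version.
broken=$(find "$demo/blast_projectX" -type l ! -exec test -e {} \; -print)
echo "broken: $broken"

# Files in the new version that have no project link yet.
missing=""
for f in "$demo/sequenceserver1"/*; do
  name=$(basename "$f")
  [ -e "$demo/blast_projectX/$name" ] || [ -L "$demo/blast_projectX/$name" ] || missing="$missing $name"
done
echo "missing:$missing"
```

The broken links can then be removed and the missing files linked by hand, which is the manual tidy-up described above.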

3. Static Config Files

Then we can go about setting up the config for this project: remove the symlinked config.ru and replace it with a static copy that we can edit to be specific to this project (note the trailing dot in the cp command):

$ sudo rm config.ru
$ sudo cp /usr/share/sequenceserver1/config.ru .

Then modify the line that starts “SequenceServer.init” to look like this (note that the straight quotes and the space before the colon are important):

SequenceServer.init :config_file => "/etc/blast_projectX/config.yml"

We also need to make one more change to this file – this might be due to the version of Sinatra I am running, but I cannot be sure – the command “Run SequenceServer” needs to be changed to “run SequenceServer”. Note that the word ‘run’ does not have a capital ‘R’.

run SequenceServer

The above should be fixed in pre4+. Yay!

Then we need to create the YAML config file, which contains the location of the blast databases for this project and the port to run on etc:

$ sudo nano config.yml

inserting your needed configuration (note ‘database’ is now written as ‘database_dir’):

bin: /usr/bin/
database_dir: /path/to/your/databases
port: 5678
num_threads: 1

4. Apache Configuration

Now we need to edit Apache’s configuration to allow Phusion Passenger to serve the application from a sub-URI of the domain using RackBaseURI. I won’t go into installing Passenger here; there is information in my previous post and on the SequenceServer website for that. But suffice to say, more configuration files have changed their location! In the latest version of Apache (2.4.7 in Trusty Tahr, anyway) I have to edit /etc/apache2/sites-available/000-default.conf and put the relevant directory information in there.

$ sudo nano /etc/apache2/sites-available/000-default.conf

Under the “Virtual Host” section add:

RackBaseURI /projectX
<Directory /var/www/projectX>
AllowOverride None
Options -MultiViews
AuthType Basic
AuthName "Blastocladiella Genome Project"
AuthUserFile "/etc/blast_projectX/.htpasswd"
Require valid-user
</Directory>

In this example I have also added some password protection to the directory.

Now we need to add a symlink in the Apache web root that points to the public directory in our project folder.

$ cd /var/www
$ sudo ln -s /etc/blast_projectX/public projectX

5. Troubleshooting

I also had to do a number of things to get the latest version of SequenceServer to run in the configuration above; these are detailed below. On a different server, also running Ubuntu Trusty, I had no issues running the pre-release straight from the command line.

I received this error:

There was a ArgumentError while loading sequenceserver.gemspec:
Malformed version number string 1.0.0-2 from
/var/lib/gems/1.9.1/gems/sequenceserver-1.0.0.pre.2/sequenceserver.gemspec:4:in `block in <main>'

To remedy it I had to update RubyGems from 1.8.23 to 2.4.5.

$ sudo gem install rubygems-update
$ sudo update_rubygems

I am told that you won’t have to do that once it hits final, but you will probably have updated anyway. Then you need to reinstall ‘bundler’:

$ sudo gem install bundler

I also needed to install the Qt 4 development packages to get capybara-webkit to compile.

$ sudo apt-get install libqt4-dev

And finally (do not run bundle as sudo; it will ask if/when it is needed) – I am running with the option ‘--without development’ here to avoid an error I had previously received:

$ bundle install --without development

One more oddity was that there was no Gemfile.lock in the SequenceServer gems directory so I created it (prior to running bundle install) and then symlinked it to the /etc/blast_projectX folder also. Bundle should create this but twice now, for me, it has not. Go figure!

Running Multiple SequenceServers on Apache2 and Ruby 1.9

What I would like to do is run multiple instances of SequenceServer under different sub-directories on one domain, e.g. http://richardslab.exeter.ac.uk/blastocladiella and http://richardslab.exeter.ac.uk/hyphochytrium but running from one source code directory and one central location for blast databases (although the FASTAs should be in different sub-directories).

This is how I managed it. It’s similar to the way I have set up multiple copies of MediaWiki. It’s a little involved, but it seems to work. There may well be other ways – which I would be interested to hear about – so I can’t guarantee this is the correct, sanest or safest way! So YMMV. However, there are a few challenges to get through first.

Ruby 1.9.* has had a few security updates, and in so doing “.” (the current directory) has been removed from the load path. There’s lots of talk about it on the internet (e.g. here) with various solutions to get it running. It took a bit of figuring out for SequenceServer, which would not run directly from GitHub for me, but I think I have cracked it for the latest version (as of 2014-03-31 – 0.8.7?) if you are running it through Apache with Phusion Passenger. Incidentally – and this may be testament to my lack of knowledge of (and desire to know about) Ruby – it won’t run from the command line unless we make a different set of changes. But I suppose you can always install the gem rather than use GitHub for that scenario… Unless someone else can help out on this front…

The reason I am running from git source is that I’m not sure where the ruby gem files get installed or what ruby gems does with them and so this way I can be in control of exactly what files I am going to change below…

1) Install SequenceServer from github

First let’s put SequenceServer somewhere safe.

cd /usr/share/

Then clone the source from GitHub:

sudo git clone https://github.com/yannickwurm/sequenceserver.git

You’ll notice you cannot run ./bin/sequenceserver without it throwing an error. We will fix that shortly. You may want to at this point run

sudo bundle install && sudo gem update

in order to make sure any ruby gem dependencies that are needed are installed and updated.

You will also need to make sure you have Phusion Passenger installed for Apache.

gem install passenger

passenger-install-apache2-module

Don’t worry about adding the second part to your Apache config, we’ll do that below… BUT you do need to add the first bit!!!

a2enmod passenger

2) Changes to the Sequence Server Source Code

Follow the changes highlighted (bold) in each file below; substitute ‘nano’ for your favourite awesome text editor.

sudo nano config.ru

require 'rubygems'
require 'bundler/setup'
#require 'sequenceserver'
require File.expand_path(File.join(File.dirname(__FILE__), 'lib/sequenceserver.rb'))

SequenceServer::App.init
run SequenceServer::App

We will be coming back to this file later…

sudo nano lib/sequenceserver.rb

require_relative './sequenceserver/helpers'
require_relative './sequenceserver/blast'
require_relative './sequenceserver/sequencehelpers'
require_relative './sequenceserver/sinatralikeloggerformatter'
require_relative './sequenceserver/customisation'
require_relative './sequenceserver/version'

sudo nano lib/sequenceserver/helpers.rb

require_relative '../sequenceserver/database'

3) The ‘tricky’ Part!

First make a directory for the first instance of SequenceServer you want to run. In this example I have called my directory blast_genome_n, where n = 1, 2, 3 and so on. You might like to make it something more specific to your needs.

cd /etc/
sudo mkdir blast_genome_n
cd blast_genome_n/

Now we are going to make symlinks to our source code repository in /usr/share/sequenceserver. Note that we want to link the files (*) inside the directory, not the directory itself, and that the links are created in our current location ‘.’

sudo ln -s /usr/share/sequenceserver/* .

Next comes a bit that you don’t want to mess up. We are going to be removing two of the symlinks and replacing them with copies of a static file. This is because they both contain information that will be specific to each of your instances…

sudo rm config.ru
sudo cp /usr/share/sequenceserver/config.ru .

sudo rm example.config.yml
sudo cp /usr/share/sequenceserver/example.config.yml .
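To see why the symlink is removed before copying, here is a small sketch of the same pattern in a temporary directory. The directories are stand-ins for /usr/share/sequenceserver and /etc/blast_genome_n, and the file contents are invented for the demonstration. Removing the link first matters: an edit made through the symlink would land in the shared source and so affect every instance.

```shell
#!/bin/sh
# Sketch only: stand-in directories under a temporary location.
set -eu
demo=$(mktemp -d)
mkdir "$demo/sequenceserver" "$demo/blast_genome_n"
echo "run SequenceServer::App" > "$demo/sequenceserver/config.ru"
echo "shared" > "$demo/sequenceserver/Gemfile"

cd "$demo/blast_genome_n"
ln -s "$demo/sequenceserver"/* .

# Replace one symlink with a real copy so that edits stay local to this instance.
rm config.ru
cp "$demo/sequenceserver/config.ru" .
echo "# instance-specific setting" >> config.ru

# Report which entries are static files and which are still links.
for f in *; do
  if [ -L "$f" ]; then
    echo "link:   $f"
  else
    echo "static: $f"
  fi
done
```

After this, the shared config.ru is untouched while the instance copy carries the extra line, and Gemfile is still a link that follows any update to the source.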

You now need to edit each file accordingly.

sudo nano config.ru
We edited this file previously, but we need to update it with the proper location of our configuration file (e.g. this is where you specify the blast database locations etc). So, add this bold line;

SequenceServer::App.config_file = '/etc/blast_genome_n/config.yml'
SequenceServer::App.init
run SequenceServer::App

sudo mv example.config.yml config.yml
sudo nano config.yml

For example;

database: /your/specific/blast/databases/

4) Update Apache httpd.conf

sudo nano /etc/apache2/httpd.conf

Add the section below to the <VirtualHost *:80> section in the config file;

RackBaseURI /blast_genome_n
<Directory /var/www/blast_genome_n>
Options -MultiViews
</Directory>

5) Add a symlink Directory to your Apache Directory

sudo ln -s /etc/blast_genome_n/public /var/www/blast_genome_n

Make sure it is to the “public” directory! Incidentally, you can also change files in the public directory to ‘static’ files and make individual changes to each server portal, e.g. web page colours…

6) Repeat

Repeat Steps 3-5 for any other separate genome BLAST portals you wish to create.

7) Restart Apache

sudo service apache2 restart

And you’re done! It should now work! 🙂

Addendum

I run WordPress on the same server in the document root and have mod_rewrite enabled to give nicer URLs, however this will interfere with the sequence retrieval of SequenceServer. So, if you are in the same situation (doesn’t have to be WP, it could be Joomla or any situation where mod_rewrite is interfering) you need to add a few lines to the .htaccess file in the root folder

sudo nano .htaccess
It will look something like this:


RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
#

You need to add a line to make it look like this:


RewriteEngine On

RewriteRule ^blast_genome_n/ - [L]

RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
#

Note how it occurs after turning mod_rewrite on but BEFORE the WordPress rewrites are carried out!

Addendum

For some reason the above “solution” for .htaccess has stopped working. The version below now seems to work…

# BEGIN WordPress
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]

# Include in the next line all folders to exclude
RewriteCond %{REQUEST_URI} !(blastocladiella|deepsea|paramecium|hyphochytrium|frogliver) [NC]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
# END WordPress
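One thing worth knowing about that RewriteCond is that the pattern is unanchored and, with [NC], case-insensitive, so it matches the folder names anywhere in the request URI. The sketch below imitates just that one test with grep -Ei. The is_excluded helper and the sample URIs are made up for illustration, and it ignores the -f and -d conditions, so it is an approximation of the behaviour rather than a substitute for testing in Apache.

```shell
#!/bin/sh
# Sketch: imitate the unanchored, case-insensitive exclusion test with grep -Ei.
# is_excluded is a hypothetical helper, not part of Apache or WordPress.
pattern='(blastocladiella|deepsea|paramecium|hyphochytrium|frogliver)'

is_excluded() {
  printf '%s\n' "$1" | grep -Eiq "$pattern"
}

for uri in /blastocladiella/ /Hyphochytrium/get_sequence /2013/12/one-little-tweet/ /news/deepsea-sampling/; do
  if is_excluded "$uri"; then
    echo "left alone:        $uri"
  else
    echo "sent to index.php: $uri"
  fi
done
```

The last sample URI shows the catch: a WordPress page whose address merely contains one of the folder names would also skip the WordPress rewrite. If that ever becomes a problem, anchoring the pattern to the start of the URI (for example ^/(blastocladiella|...)/) would be one option to try.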

One little tweet…

So I thought I had best write up a post about what happened in the latter half of the first week of December 2013.

The story begins back in late 2010 when I published a paper in the journal “Infection, Genetics and Evolution” on “Resolving the question of trypanosome monophyly: A comparative genomics approach using whole genome data sets with low taxon sampling” – ScienceDirect. This was a small chapter of my PhD thesis and was drafted around the time I was in completion mode. As many will know, it’s quite a stressful and manic period of time. Being stressed, and also quite new to the world of publishing, I was neither completely confident nor knowledgeable. I left the decision of journal choice to the corresponding author. There were reasons for choosing this journal but they’re lost to time. Then not much happened; there have been about seven citations of the article according to Google Scholar. Not bad for my second paper, but it wasn’t my primary area of research.

In 2011, I was charged with updating the website for my new post-doctoral position at The Natural History Museum, London. With agreement from my PI, I uploaded all the papers that we had published (both together and separately), making them available to everyone: for example, the general public (who had in part funded many of our research projects) and other scientists at institutions without access to certain journals due to extremely large access fees. At the same time I decided it would be prudent to update my Academia.edu profile and, whilst I was at it, my ResearchGate.net and Mendeley profiles. Why not? I had all the PDFs neatly organised and ready for upload. They were uploaded and then I thought no more of it; more pressing post-doc duties were incoming.

Over this time, Open Access and the freely accessible nature of scientific research had become more pressing. I noticed it was starting to be discussed in university and academic institutions, open access publishing funds were set up and institutions started inviting speakers on the subject – I remember Stephen Curry coming to give a very engaging departmental talk on Open Access publication options and trying to impress the importance of this. It was also becoming a larger talking point for several academics that I follow on Twitter, most notably Jonathan Eisen – who along with his brother was instrumental in setting up PLoS. None of this really matters, other than to say I had become interested in the topic and agreed with the importance of making research accessible (or what is the point!) but had not become heavily involved or really done any in-depth background reading. Anyway.

In the very early morning of 6th December – 19 minutes past midnight – I received this email:

[Image: academia_edu_takedown]

It was late and I decided to wait till the morning to read it further and to deal with it. Reading the message, it was very clear that Academia.edu had received the take-down and had to comply with Elsevier’s strict policy on the posting of published journal articles (here), but they definitely did not agree with Elsevier’s position. The wording is exquisite and included a link to a petition website, http://thecostofknowledge.com/, protesting Elsevier’s business practices. Immediately signed.

When I got in to work that Friday morning I did a quick Twitter search – I forget what I searched for exactly – to see if anyone else was tweeting about take-down notices. I noticed this tweet from Rafael Maia, who had also received the same message for one of his papers!

Embellishing slightly, I also made a tweet about the situation:

Okay, “lots” might have been an overstatement, but there were certainly lots more tweets to come and, as you will see from an interview below, lots more take-down requests! It got picked up pretty rapidly by @RossMounce, who asked to see what the take-down notice looked like (image above). Then @MikeTaylor, who writes a cool blog about his research on sauropods but is also heavily interested in the open access “debate” and blogs about it quite often, very quickly wrote up a nice blog post, here. That and a reformatted tweet:

is when it all suddenly kicked off and it seemed like hundreds of people were suddenly aware and re-tweeting and talking about Elsevier and myself. My inbox didn’t stop receiving notifications all day, and continues to today. From here it gets a little difficult to track and see how many people actually did retweet my original message and the many subsequent ones but you can follow the conversations from the above linked tweets if you wish.

I was then approached by Jennifer Howard, a journalist from Chronicle.com – an American news website discussing Higher Education issues – who asked if she could write up an article on the matter as it interested her, and we had a small chat over email. Later that day this article was published. The most interesting thing, I think, to come from this was a late addition – I didn’t see it until this morning – from the CEO of Academia.edu, Richard Price:

Richard Price, the founder and chief executive officer of Academia.edu, said in an email that “Elsevier has started to send academics on Academia.edu takedown notices in batches of a thousand at a time.” The email Mr. Leonard received “is the notification that we sent to our users,” Mr. Price said, adding that his company usually receives one or two individual notices from publishers a week, “but not at scale like this.” (Academia.edu has close to six million registered users; it said it had received about 2,800 takedown notices from Elsevier so far.)

The tweets, Mike’s blog and the Chronicle article got picked up by some other websites, TechDirt.com, SlashDot.com and CNET.com, each with their own little spin and generating many re-tweets and comments. Thoroughly interesting.

Richard Price, CEO of Academia.edu, contacted me late on Friday, but by that point I had returned home for a much needed pint and I didn’t check my work email until this morning. We might have a chat later, depending on schedules. But so far he seems very supportive of academics and the open access nature of publications. Not least the slogan in his email signature:

The goal of Academia.edu is to get every science PDF ever written on the internet, accessible for free.

So where do I – we – go from here?

It is interesting to note that on Academia.edu I have another published journal article from an Elsevier journal. So far there has not been a take-down for this article. Nor have I been contacted by ResearchGate or Mendeley, which host exactly the same PDF version of the paper that Elsevier decided was breaching their rules. I plan to leave them there until, or if, I am contacted.

Either way, as much as I can I will be avoiding Elsevier both for publication and peer review, and hopefully impressing on my colleagues to do the same. They say all publicity is good publicity, but I really don’t think Elsevier can put a positive spin on their previous conduct nor on their recent conduct. 2,800 requests is 2,800 pieces of research that have now become inaccessible to the public for no good reason. If one little (open, accessible, free) tweet can generate this amount of interest over a Friday and a weekend, then just think how much interest and knowledge you can impart to the world by not publishing with Elsevier and by making any articles that you have published with them available and free to access online. Hello Elsevier – leonard_et_al_2011 – *waves*!

Edit

There has been some suggestion online – I won’t waste electrons with a link – by a few people that “we open access advocates” are “oddly surprised” by Elsevier’s decision on take-down notices. How sad that they feel they must defend Elsevier by belittling our intelligence. Perhaps it speaks volumes that these lazy attacks show there is little defence for their actions. The intention of my tweet, this blog, and the other tweets and blogs mentioned was to raise awareness of some established publishers’ attitudes to the changing perceptions and demands of science communication. We don’t want our science to be held in a box where only people with money, or whoever the publisher decides, can access it. Science should not be a pyramid scheme for the monetarily privileged. The language employed in statements such as those directly from Elsevier on the matter – here – is nothing short of bullying. They wish to “explore user-friendly options for alignment” – alignment! Are you kidding me? It just leaves me with a feeling of utter disappointment: partly in myself, for not realising sooner that we should never have published with Elsevier, and partly that they have taken the option of publishing with them away from us for future articles. But thankfully it does not leave me with a feeling of despair, because I know that there are plenty of other publishers and routes to publishing out there who do not and will not behave in this manner. Evolve or die.

I also somewhat disagree with Stephen Curry’s point that the company is “acting rationally”. No, they are not. You may be right in suggesting that they are acting in a “technically correct” manner, and we all know technically correct is the best kind of correct, but rationally? So far it’s negative press after negative press. Thankfully they have not behaved litigiously so far, but that’s just steps away.

Addendum

It completely slipped my memory that earlier this year Elsevier acquired Mendeley – here – which adds another little layer of interest. The PDF is still on Mendeley; they can presumably send themselves a TDN, but will they? And not just that, they currently have PJA PDFs of other papers I am an author on from non-Elsevier journals; I would hope they apply the same level of scrutiny to themselves. Well, I don’t, but you see where I’m going with it. One rule for them, screw the others!