Thursday 23 February 2012

Internship Project - A fully Open Chemically Searchable ChEMBL

For a long time now we have been keen to release a full and freely deployable version of the ChEMBL database with compound search capabilities built in. This has been possible in the past, but complicated by the commercial licenses associated with either the databases or the chemical cartridges. There are now a number of mature Open Source chemical toolkits available, such as the excellent CDK and RDKit.

So, with that brief bit of background, there is now an opportunity for an intern to work in the ChEMBL group on this project for 2-3 months. The idea will be to set up a process which:

  1. Creates a PostgreSQL version of the ChEMBL database (PostgreSQL is the database required by the RDKit cartridge).
  2. Installs the RDKit chemical cartridge.
  3. Migrates this setup to an Amazon Web Services (AWS) public image.
  4. Migrates the existing (or a new) ChEMBL interface to run off the new database, and packages this up into the AWS image.
  5. Provides scripts that allow new releases of ChEMBL to be processed and uploaded as a new AWS image.

Some work has in fact already been done in the public domain, and this will act as a good starting point for someone wanting to learn more about the data and technologies.
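
To give a flavour of what steps 1 and 2 involve, here is a minimal sketch in Python that only builds the SQL statements one might run against a PostgreSQL copy of ChEMBL. It is not the project's actual implementation. It assumes a PostgreSQL version that supports CREATE EXTENSION (9.1 or later) and an RDKit cartridge build packaged as an extension; the table and columns compound_structures, molregno and canonical_smiles are taken from the public ChEMBL schema, while the schema name "rdk", the table name "mols", the index name and both function names are made up for illustration.

```python
# Sketch only: nothing here connects to a database, it just builds SQL text.
# Assumed: RDKit cartridge available as a PostgreSQL extension, and a ChEMBL
# table compound_structures(molregno, canonical_smiles).
# Hypothetical: schema "rdk", table "mols", index "mols_m_idx".


def cartridge_setup_sql(schema="rdk"):
    """Return the statements that build a searchable molecule table."""
    return [
        # Enable the RDKit cartridge in the current database.
        "CREATE EXTENSION IF NOT EXISTS rdkit",
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        # Convert each SMILES string into the cartridge's mol type.
        f"CREATE TABLE {schema}.mols AS "
        "SELECT molregno, mol_from_smiles(canonical_smiles::cstring) AS m "
        "FROM compound_structures",
        # Drop rows whose SMILES could not be parsed by RDKit.
        f"DELETE FROM {schema}.mols WHERE m IS NULL",
        # A GiST index is what makes substructure search fast.
        f"CREATE INDEX mols_m_idx ON {schema}.mols USING gist(m)",
    ]


def substructure_query(smiles, limit=10, schema="rdk"):
    """Return (sql, params) for a substructure search using the @> operator."""
    sql = (
        f"SELECT molregno FROM {schema}.mols "
        "WHERE m @> mol_from_smiles(%s::cstring) LIMIT %s"
    )
    return sql, (smiles, limit)


if __name__ == "__main__":
    for statement in cartridge_setup_sql():
        print(statement + ";")
    print(substructure_query("c1ccccc1"))
```

The %s placeholders follow the parameter style used by common Python PostgreSQL drivers such as psycopg2, so the (sql, params) pair could be passed straight to a cursor's execute method once a connection is available.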

If you are looking for an internship this year, have an interest in cheminformatics tools and have some relevant experience, please get in touch. We appreciate that potential interns may not have years of industry experience, but we do require previous experience with relational databases and competence in at least one programming language.
