System Overview
|
X! Hunter is part of the X! Series of protein identification search engines made available by
The Global Proteome Machine Organization (GPMO).
The X! Hunter search engine compares experimentally observed spectra directly with consensus mass spectra obtained from the
GPMDB and produces results in an XML format known as BIOML,
the XML standard used by all GPMO software. This allows for the easy sharing of results between scientists
and the ability to analyse the results using common GPMO tools.
A manuscript describing X! Hunter Annotated Spectrum Library (ASL) searches has
been published in the ASAP section of the Journal of Proteome Research (Abstract).
The suite of programs and software that make up X! Hunter can be divided into three basic groups:
- consensus mass spectra library files and databases
- identification algorithm
- search interface and identification/analysis display
|
| |
|
Consensus mass spectra library files and databases
|
The consensus mass spectra are stored in two formats: binary library files (ASL) and a database. Each taxonomic group is
represented in both formats, because each format has its own applications. Both formats contain the same information,
but the binary format is processed to be non-redundant: each spectrum is listed only once, followed by the homologous
accession numbers. The database has no mechanism for removing redundancies. Within each accession number, a peptide domain
maps to a consensus spectrum one-to-one, so the same spectrum may be listed more than once
within a taxonomy.
The ASL files are used when matching multiple spectra uploaded as a file through a web form, and the database
is used when matching a single spectrum entered as text through a web form. The entire ASL is loaded into memory,
which requires more initial overhead.
Once the ASL is loaded, it can be accessed very quickly: the data are binary, and they are indexed and sorted
by parent ion mass, so the gain in search speed offsets the initial overhead.
As the number of spectra submitted increases, the loading overhead is spread over more spectra and the time per spectrum decreases.
This is subject to hardware limitations. If too many spectra are
submitted, the hardware can become overwhelmed, which increases the time per spectrum.
When a single spectrum is submitted, the overhead of loading the ASL outweighs the benefit of its faster search,
so the database is used instead.
The database also provides useful statistics and allows an administrator to run summary and investigative queries
allowing for the generation of reports and diagnostic testing.
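The benefit of keeping the library sorted by parent ion mass can be illustrated with a minimal sketch. It is written in Python purely for illustration (X! Hunter itself uses C++ and Perl), and the function name, the sample masses and the tolerance are hypothetical, not taken from the X! Hunter source. Because the masses are sorted, the candidates for one input spectrum can be found with two binary searches instead of a scan of the whole library.

```python
from bisect import bisect_left, bisect_right


def candidates_in_window(sorted_masses, parent_mass, tolerance):
    """Return (start, end) indices of the entries whose parent ion mass lies
    within parent_mass +/- tolerance. sorted_masses must be ascending."""
    start = bisect_left(sorted_masses, parent_mass - tolerance)
    end = bisect_right(sorted_masses, parent_mass + tolerance)
    return start, end


# Hypothetical library: parent ion masses in Da, already sorted.
library_masses = [800.41, 1045.53, 1045.56, 1046.02, 1300.70, 2211.09]

start, end = candidates_in_window(library_masses, 1045.55, 0.5)
print(library_masses[start:end])  # the three entries near 1045.55
```

Each lookup costs a number of comparisons proportional to the logarithm of the library size, which is why the one-time cost of loading and sorting pays off once many spectra are searched in a single run.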
|
|
Identification algorithm
|
As there are two formats for storing the consensus mass spectra, there are two methods for accessing the data.
For the ASL format, a compiled C++ program is used, whereas a Perl program is used for the database format.
Both programs use the same identification algorithm and scoring method and both produce the results in BIOML format. The difference
between the output produced is that the multiple spectra version is able to generate a protein score based on multiple
peptides found matching a single protein accession number. In the single spectrum version, the peptide hyper score
is used as the protein score.
The first step in the identification is to 'condition' each input spectrum. This means removing isotopes,
neutral losses from parent ions and the parent ion masses themselves, and then keeping the 20 most intense peaks.
The new list of ions is then compared to all the consensus mass spectra in the given mass range, producing a theta
and hyper score. Only the top scores for each input spectrum are kept. Finally, the protein, the peptide, any
post-translational modifications and the spectrum are written in BIOML format and displayed in the web browser.
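Part of the conditioning step can be sketched as follows. This is a simplified illustration in Python, not the X! Hunter implementation: it only drops peaks close to the parent ion m/z and keeps the most intense peaks, and it omits isotope and neutral-loss removal, whose exact rules are not described here. The function name and the parent_tol parameter are assumptions.

```python
def condition_spectrum(peaks, parent_mz, parent_tol=2.0, keep=20):
    """peaks is a list of (mz, intensity) tuples.

    Drops peaks within parent_tol of the parent ion m/z, keeps the `keep`
    most intense remaining peaks and returns them in ascending m/z order.
    Isotope and neutral-loss removal are deliberately left out.
    """
    filtered = [(mz, inten) for mz, inten in peaks
                if abs(mz - parent_mz) > parent_tol]
    most_intense = sorted(filtered, key=lambda p: p[1], reverse=True)[:keep]
    return sorted(most_intense)


# Hypothetical spectrum: 30 peaks, the last one sitting on the parent ion.
peaks = [(100.0 + 10 * i, float(i + 1)) for i in range(30)]
conditioned = condition_spectrum(peaks, parent_mz=390.0)
print(len(conditioned))
```

Reducing every spectrum to a fixed number of peaks keeps the cost of each comparison against a consensus spectrum small and predictable.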
|
|
User types and actions
|
There are three levels of access to the system: casual users, power users and admin users.
-
Casual users:
Casual users are those who use the X! Hunter interface to compare their spectra with those in the X! Hunter
libraries and databases. A casual user can:
- browse spectra by parent ion mass or peptide sequence
- search using multiple spectra from a file
- search using one spectrum entered in a text field
-
Power users:
To become a power user you must register by sending an email to the admin user detailing the information
found here. The administrator will add you to the system
and provide you with a password.
Power users can perform tasks on the public version and if applicable, perform tasks
on their local installation of X! Hunter.
Once registered and logged in, power users can:
- add a new spectrum using the web interface
- remove a spectrum using the web interface
- perform casual user tasks
-
Admin users:
A user can become an admin user by downloading
an installation of X! Hunter. This will allow
them to administer a local copy of the X! Hunter libraries and databases. Changes made to these
installations will not be reflected in the public version of X! Hunter.
Admin users can:
- build the library files
- create the database entries
- manage users
- perform power user tasks
|
|
Browsing spectra
|
If searching for matches to a specific peptide sequence or peptide sequences within a particular
mass range, the browse spectra page can be used.
To browse by mass range (diagram 1), the user enters the parent ion mass, selects the charge and taxonomy and
sets the mass error range. Help information for each parameter is available by clicking on the icon
next to the parameter name. Once satisfied with the chosen parameters, the user clicks the 'Find models'
button. Once the search is finished, the browser is redirected to the results page (diagram 2).
|
|
|
diagram 1
|
|
|
diagram 2
|
|
|
To browse by peptide sequence, the user enters the peptide sequence and selects the charge and taxonomy (diagram 3).
Help information for each parameter is available by clicking on the icon next to the parameter name. Once
satisfied with the parameters, the user clicks the 'Find models' button. Once the search is finished,
the browser is redirected to the results page (diagram 4).
|
|
|
diagram 3
|
|
|
diagram 4
|
|
|
|
Search multiple spectra
|
X! Hunter can be used to match multiple spectra using the multiple spectra search feature
by uploading a text file containing tandem mass spectra in one of the following formats:
- BIOML
- DTA
- PKL
- Matrix Science
- mzXML
- mzDATA
NOTE: As the schemas for mzXML and mzDATA files are quite involved, it is possible to create incarnations
of these file types that are not X! Series compliant.
View examples of X! Series compatible mzData
and mzXML files.
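Of the formats listed, DTA is the simplest: the first line holds the singly protonated parent mass (MH+) and the charge, and each following line holds one fragment m/z and its intensity. A minimal reader could look like the sketch below. It is written in Python for illustration only, the function name is hypothetical, and it is not the parser used by the X! Series programs.

```python
def parse_dta(text):
    """Parse the contents of a single-spectrum DTA file.

    Returns (parent_mh, charge, fragments), where fragments is a list of
    (mz, intensity) tuples in file order. Blank lines are ignored.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) < 2:
        raise ValueError("DTA header must contain the MH+ mass and the charge")
    parent_mh = float(rows[0][0])
    charge = int(rows[0][1])
    fragments = [(float(r[0]), float(r[1])) for r in rows[1:]]
    return parent_mh, charge, fragments


# Hypothetical file contents.
sample = """1045.55 2
175.12 340.0
288.20 1210.5

401.29 87.0
"""
parent_mh, charge, fragments = parse_dta(sample)
print(parent_mh, charge, len(fragments))
```

Checking a file with a small reader like this before uploading can catch formatting problems early, such as a missing charge on the header line.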
|
To search using multiple spectra, the user
clicks the 'Browse...' button and selects the spectra file to use (diagram 5).
Help information for each parameter
is available by clicking on the icon next to the parameter name.
Once satisfied with the entered parameters and the selected taxonomy, the user clicks the 'Find Models' button.
The file is uploaded and the search is performed (diagram 6).
Once the search has finished, the browser is redirected to the results page (diagram 7).
|
|
|
diagram 5
|
|
|
diagram 6
|
|
|
diagram 7
|
|
|
|
Search single spectrum
|
If searching with a single spectrum, the X! Hunter single spectrum search feature
is appropriate. The user enters the parent ion mass and pastes
the fragment ion masses and intensities into the associated text input fields (diagram 8).
Help information for each parameter
is available by clicking on the icon next to the parameter name.
Once satisfied with the entered parameters and the selected taxonomy, the user clicks the 'Find Models' button.
Once the search has finished, the browser is redirected to the results page (diagram 9). |
|
|
diagram 8
|
|
|
diagram 9
|
|
|
|
Logging in
|
|
Power users must log in before they can perform any tasks.
After entering a username and password and clicking 'Login' (diagram 10), the user is forwarded to the control page
(diagram 11). From here, the user can choose to add a new spectrum
or remove a number of spectra associated with a peptide sequence. If an incorrect
username or password is entered, an error message is displayed and the user is redirected back to
the login page (diagram 12).
|
|
|
diagram 10
|
|
|
diagram 11
|
|
|
diagram 12
|
|
|
|
Adding a new spectrum
|
Power users have the ability to add new spectra directly to the database.
If a peptide with the given accession number already exists in the ConsensusSpectrum table with
the same charge, start position, end position and modifications, it will be overwritten
with the new spectrum if the user decides to add it anyway. This means that the current
entries in the Spectrum table will also be replaced with a single entry using the new
spectrum. When additional results are added through an incremental population,
peptides identical to the one added in the above manner will be added to the Spectrum table,
and the ConsensusSpectrum recalculated.
The user fills out the form (diagram 13) ensuring all parameters are populated
although only one style of fragment information is required. Help information for each
parameter is available by clicking on the icon next to the parameter name. Once
satisfied with the entered values the user clicks the 'Confirm' button. Once the
values have been validated, the confirmation page is displayed, allowing
the user to compare the current entry (diagram 14a), if one exists, with the new entry and to confirm
that all the entered values are correct (diagram 14b). If any values need to be changed, a link is provided
that will return the user to the input form.
Once satisfied with the new values, the user clicks the 'Add entry' button and the new peptide is
added. A confirmation message appears once finished (diagram 15) and the user is returned to the
Add spectrum page.
Adding a spectrum in this manner will not alter the library file. To create a new library file, see
the libraries section.
|
|
|
diagram 13
|
|
|
diagram 14a
|
|
|
diagram 14b
|
|
|
diagram 15
|
|
|
|
Removing spectra
|
Power users also have the ability to remove spectra from the database. The user fills out the form
(diagram 16) by entering the peptide sequence associated with the spectra to remove and selecting
the taxonomy that was the source of the peptide. Once satisfied with the entered values, the user clicks
the 'Confirm' button. The information is gathered from the database and a confirmation page (diagram 17) is
displayed, allowing the user to select which spectra they would like to remove. By default, the spectrum
diagrams are hidden, but can be made to appear by clicking the numbered link next to the summary information
(diagram 18). All the diagrams can be displayed by selecting the checkbox next to 'show/hide all'.
The user may also choose to exclude this peptide from future populations or incremental builds and add
a comment detailing why this peptide was removed.
Once the user has selected a number of spectra, they click the 'Remove Selected' button and the selected
spectra are removed. A confirmation message appears once finished (diagram 19) and the user is returned to
the Remove spectra page.
Removing spectra in this manner will not alter the library file. To create a new library file, see
the libraries section.
|
|
|
diagram 16
|
|
|
diagram 17
|
|
|
diagram 18
|
|
|
diagram 19
|
|
|
|
Building the libraries
|
|
The ASL files contain the database spectra information, optimized to provide very fast
access to the details for each peptide sequence. When a multi-spectra search is initiated, the entire library
for the current taxonomy is loaded into memory and indexed by parent ion mass, allowing seek times in milliseconds.
|
|
Creating a library file is a multi-step process and can be done two ways depending on whether
it is the first time the file is being created or if it is being created after incremental
additions to the database. In either case, a series of perl scripts performs the operations.
|
- First time build
|
When building the library file for the first time, the first step is to create
'pre-composite' files using the entries in GPMDB (xhunter_precomposite.pl). These are BIOML formatted
files that contain the ten best scoring spectra for each peptide at each charge state (1, 2 and 3)
from each protein within the current taxonomy.
One set of 'pre-composite' files is created for peptides with
post translational modifications and one set for peptides with no modifications.
Click here for an
excerpt which contains the At1g01090.1 protein (pyruvate dehydrogenase E1 component alpha subunit, chloroplast)
from the Arabidopsis thaliana taxonomy.
|
|
The resulting 'pre-composite' files are processed (xhunter_populate.pl) to create a BIOML formatted 'composite' file for the taxonomy.
The 'composite' file contains the consensus spectra for each peptide created by 'averaging' the
groups of spectra (the ten best scoring spectra for each peptide at each charge state (1,2 and 3)
from each protein within the current taxonomy) in the 'pre-composite' files.
Click here for an excerpt which is the
composite version of the At1g01090.1 protein (pyruvate dehydrogenase E1 component alpha subunit, chloroplast)
from the Arabidopsis thaliana taxonomy. This step also populates the database for the current taxonomy.
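How the 'averaging' is carried out is not specified in this overview. Purely as an illustration of the idea, the sketch below (in Python) groups peaks from several spectra into fixed-width m/z bins and averages the intensity in each bin. The binning approach, the bin width and the function name are all assumptions and should not be read as the method used by xhunter_populate.pl.

```python
from collections import defaultdict


def average_spectra(spectra, bin_width=0.5):
    """spectra is a list of spectra, each a list of (mz, intensity) tuples.

    Returns (bin_center_mz, mean_intensity) tuples sorted by m/z. A spectrum
    with no peak in a bin contributes zero to that bin's mean.
    """
    totals = defaultdict(float)
    for spectrum in spectra:
        for mz, intensity in spectrum:
            totals[round(mz / bin_width)] += intensity
    count = len(spectra)
    return [(bin_index * bin_width, total / count)
            for bin_index, total in sorted(totals.items())]


# Two hypothetical spectra of the same peptide and charge state.
group = [
    [(100.1, 10.0), (200.2, 4.0)],
    [(100.2, 20.0)],
]
print(average_spectra(group))
```

In this scheme, a peak seen in only a few of the grouped spectra ends up with a low averaged intensity, so reproducible peaks dominate the result.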
|
|
The 'composite' files are then optimized and written in binary format to create the ASL file.
This is done using the xcurator program, a C++ program that removes redundancies, validates the entries
and writes the peptide and spectra information in binary format.
Once the library file is created, the 'pre-composite' and 'composite' files may be discarded.
|
|
The diagram below describes the data and process flow for a first time build. Click the diagram to
view an enlarged version in a new window.
|
|
|
- Incremental build
If creating the ASL file(s) from an incremental database build instead of from the 'pre-composite' files, the first step is to add
the new result names from GPMDB to the xhunter database with the 'added flag' set to '0'.
The new GPMDB entries are then processed (xhunter_incremental.pl) for each selected taxonomy by selecting peptides
that have a better score than the current entries in the Spectrum database or did not previously exist.
There are three circumstances where this can happen:
-
The parent ion mass of the new spectrum does not already exist in the database.
The mh is added to the MH table and the new spectrum is added to the Spectrum table.
The consensus spectrum in these cases is simply a copy of the new spectrum.
-
The parent ion mass has been seen before, but fewer than 10 spectra exist for the current
peptide. The new spectrum is added to the Spectrum table.
-
The parent ion mass has been seen before and the consensus spectrum
already has 10 spectra associated with it. The least probable spectrum is replaced.
Each time a spectrum is added or replaced, the consensus spectrum is marked as needing a recalculation.
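The three cases above can be summarised in a short sketch. It is written in Python for illustration and uses an in-memory dictionary in place of the MH, Spectrum and ConsensusSpectrum tables. The function name, the data layout and the convention that a higher score means a more probable identification are assumptions, not details of xhunter_incremental.pl.

```python
MAX_SPECTRA = 10  # spectra kept per peptide, as described above


def add_spectrum(store, peptide_key, spectrum, score):
    """store maps peptide_key -> {"spectra": [(score, spectrum)], "dirty": bool}.

    Returns True if the store changed. "dirty" marks the consensus spectrum
    as needing recalculation.
    """
    entry = store.get(peptide_key)
    if entry is None:
        # Case 1: nothing stored yet, so the new spectrum starts the entry.
        store[peptide_key] = {"spectra": [(score, spectrum)], "dirty": True}
        return True
    spectra = entry["spectra"]
    if len(spectra) < MAX_SPECTRA:
        # Case 2: fewer than ten spectra, so the new one is simply added.
        spectra.append((score, spectrum))
        entry["dirty"] = True
        return True
    # Case 3: the entry is full; replace the least probable spectrum,
    # but only if the new spectrum scores better.
    worst = min(range(len(spectra)), key=lambda i: spectra[i][0])
    if score > spectra[worst][0]:
        spectra[worst] = (score, spectrum)
        entry["dirty"] = True
        return True
    return False
```

Keeping a per-entry flag rather than recalculating immediately means each consensus spectrum is rebuilt once per incremental build, however many of its spectra changed.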
|
|
The next step is to go through all the entries that have been marked as needing recalculation
and create the 'averaged' spectra from the associated entries in the Spectrum table.
Then the BIOML formatted 'composite' file is created for the taxonomy using only the entries in
the database (xhunter_composite.pl) instead of using 'pre-composite' files as in the first time build.
|
|
The 'composite' file is then optimized and written in binary format to create the ASL file, just
like in the first time build.
|
|
The diagram below describes the data and process flow for an incremental build. Click the diagram to view
an enlarged version in a new window.
|
|
|
|
|
|
Managing users
|
|
If a casual user wishes to become a power user, they must send the admin an email with all
the required information as laid out in the registration page. When the admin user receives the
information, they populate the registration form and click 'Add user'. This will add the new
user to the XUser table in the XHunter database and grant the permissions required for
the new user to access the power user functionality of the system. The admin can then email
the new power user confirming that they have been added.
|
|
|
Database descriptions
|
|
The databases are used to store peptide and spectrum information from GPMDB in order to
allow users to find matches for single spectrum searches and to create new library files after
incremental builds. They are also useful for statistical analysis and a good tool to track down
errors or anomalies. Detailed database descriptions are available in HTML format here, or can be downloaded as a
Windows help file.
|
There are two types of database used in the X! Hunter system.
The xhunter database (diagram 20) is used to control the population of the xhunter taxonomy databases, manage users
and track statistical information. Only one instance of this database is required for each installation.
diagram 20 - click to enlarge
The xhunter per taxonomy databases (diagram 21) contain the consensus mass spectra information as well as the spectra information
that was used to create the consensus spectra and all the peptide domain information.
They are also used to keep track of which peptides should
not be added in future populations. One instance of this type of database is required for each taxonomy.
diagram 21 - click to enlarge
|
|
|
Installation
|
There are two options for installation:
- library only installation -
no custom or library rebuilds, no database, one search form
- library and database installation -
customizable libraries and databases, three search forms (requires gpmdb and gpm installation)
All installations require that the computer have a web server and Perl installed. The next two
sections describe how to install and configure these items before X! Hunter is installed.
If you already have an installation of GPM or GPMDB, you will already have a web server and Perl installed,
but you will need to add a VirtualHost entry to httpd.conf as described below.
|
|
Web server installation
|
- Download and install the Apache web server.
-
Next, configure Apache using the text file, httpd.conf, stored in /etc/bin/httpd/conf on Linux
and C:/Program Files/Apache Group/Apache2/conf on Windows.
The only thing you really need to tell Apache is where your web accessible files are.
This is done using Aliases or VirtualHosts.
Add one of the following two sets of directives to the appropriate section of the
httpd.conf file. For Linux, change the
paths as necessary to something like /var/www/thegpm-xhunter/[folder]. Note
that X! Hunter may be installed anywhere; the paths given are only a
suggestion.
(hint: search for the text string 'Aliases:' to find the Alias section easily
in httpd.conf.)
If you have GPM or GPMDB installed on the same server, add the following Virtual Host directives to
section 3 of the httpd.conf file.
## change xxx.xxx.xxx.xxx to the ip address or name of this server
NameVirtualHost xxx.xxx.xxx.xxx
<VirtualHost xxx.xxx.xxx.xxx:80>
    ServerAdmin admin@thegpm.org
    DocumentRoot c:/thegpm-xhunter
    ## change localhost to the name of your server
    ServerName localhost
    ErrorLog logs/thegpm-xhunter_error_log
    CustomLog logs/thegpm-xhunter_access_log common
    ScriptAlias /thegpm-cgi/ "C:/thegpm-xhunter/thegpm-cgi/"
    <Directory "C:/thegpm-xhunter/thegpm-cgi">
        AllowOverride None
        Options None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>
If you do not have GPM or GPMDB installed, add the following to section 2 of httpd.conf.
(hint: search for the text string “Aliases” to find their location in section 2).
Alias /gpm/ "C:/thegpm-xhunter/"
<Directory "C:/thegpm-xhunter">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /thegpm/ "C:/thegpm-xhunter/"
<Directory "C:/thegpm-xhunter">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /gpm/ "C:/thegpm-xhunter/gpm/"
<Directory "C:/thegpm-xhunter/gpm">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /tandem/ "C:/thegpm-xhunter/tandem/"
<Directory "C:/thegpm-xhunter/tandem">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /pics/ "C:/thegpm-xhunter/pics/"
<Directory "C:/thegpm-xhunter/pics">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /cache/ "C:/thegpm-xhunter/cache/"
<Directory "C:/thegpm-xhunter/cache">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
Alias /styles/ "C:/thegpm-xhunter/styles/"
<Directory "C:/thegpm-xhunter/styles">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
ScriptAlias /thegpm-cgi/ "C:/thegpm-xhunter/thegpm-cgi/"
<Directory "C:/thegpm-xhunter/thegpm-cgi">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
- Restart the Apache server.
|
|
Perl installation
|
|
This step should only be required for Windows, as most Linux distributions come with Perl installed.
|
-
Download and install ActiveState Perl.
-
It is recommended to install at c:\perl if possible as the perl scripts expect
to find the perl executable there. If installed somewhere else, the first line
of each perl script will need to be changed to point to the correct location of
perl.
|
|
Library only installation
|
Download the latest version of the
installation and unzip the archive to a temp folder.
The library only installation contains the following directory structure:
- thegpm-xhunter
- cache
- fasta
- gpm
- lib
- pics
- styles
- tandem
- thegpm-cgi
Note: The ASL files themselves are very large and are not included in the installation package.
Download the library (hlf) files for your taxonomies of interest
here and move them to the lib folder.
-
Move the entire 'thegpm-xhunter' folder to C:\ (Windows) or /var/www/ (Linux),
as the case may be. Ensure the locations of the directories exactly match their
specifications in httpd.conf shown above.
-
Edit the file defines.pl found in 'thegpm-cgi' folder. If the service will be accessed
from other computers, replace 'localhost' in the function 'get_server_name()' with the web address of the server.
You may also want to change the numeric part of the value returned in the function get_gpm_number(). The
value returned from this function is the prefix of the output file names generated by X! Hunter searches.
-
Edit the index.html file. Change 'localhost' to the web address of your server.
-
Edit the file footnote.js found in the 'tandem' folder to specify a 'hosted by' message to
appear at the bottom of the display pages.
-
Edit the file taxonomy.xml found in the 'tandem' folder to add the relative addresses (URLs) of the ASL files.
The new entries are specified with the attribute format="spectrum" in the <file> tag.
In the sample entry below, X! Hunter searches will use all "spectrum" and "peptide" files,
while X! Tandem searches only use the "peptide" files.
Notes:
-
X! Hunter requires the fasta.pro files to obtain full protein sequences.
-
Database searches do not use this taxonomy mechanism.
<taxon label="yeast">
<file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
<file format="peptide" URL="../fasta/scd.fasta.pro" />
<file format="peptide" URL="../fasta/scd_1.fasta.pro" />
<file format="peptide" URL="../fasta/crap.fasta.pro" />
</taxon>
|
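Before testing, it can help to check which files a taxon entry refers to. The sketch below, in Python for illustration only, parses the sample yeast entry shown above and groups the file URLs by their format attribute. The function name is hypothetical, and a real taxonomy.xml will contain more than this single element.

```python
import xml.etree.ElementTree as ET

# The sample entry from this section, reproduced as a string.
TAXON_XML = """<taxon label="yeast">
  <file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
  <file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
  <file format="peptide" URL="../fasta/scd.fasta.pro" />
  <file format="peptide" URL="../fasta/scd_1.fasta.pro" />
  <file format="peptide" URL="../fasta/crap.fasta.pro" />
</taxon>"""


def files_by_format(taxon_xml):
    """Return (label, {format: [URL, ...]}) for one <taxon> element."""
    taxon = ET.fromstring(taxon_xml)
    grouped = {}
    for file_element in taxon.findall("file"):
        grouped.setdefault(file_element.get("format"), []).append(
            file_element.get("URL"))
    return taxon.get("label"), grouped


label, grouped = files_by_format(TAXON_XML)
print(label, sorted(grouped))
```

The "spectrum" list gives the ASL files that must be present in the lib folder, and the "peptide" list gives the fasta.pro files needed to obtain full protein sequences.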
To test the functionality, open a browser and point it to
http://localhost/index.html. Once again, you may substitute the web address of your server for 'localhost'.
Browse to a spectra file, set the search parameters and click the 'Find models' button to start a search
as described in the Search multiple spectra section above.
|
|
Keeping up to date
|
Each month the public X! Hunter databases are updated with GPMDB entries created since the last
incremental build, and from the databases new library files are created as described above.
The updated library files can be downloaded from the GPMO
ftp site.
|
|
Library and database installation
|
This section is for users who wish to create libraries and databases containing results from their own
installation of GPMDB. Therefore the user must have installed GPMDB and at least one GPM server
providing spectra results to it. For help with these installations,
see gpmdb and gpm.
Once GPMDB and GPM are installed, the X! Hunter web files can be installed and the X! Hunter databases can be created
and populated, although the population can only be performed once GPM results have been added to GPMDB.
Follow the instructions in the web server installation section above to install the Apache web server
and set the configuration required for X! Hunter.
Follow the instructions in the perl installation section above to install perl.
Install the DBI and DBD Perl modules needed for database communication.
-
Windows:
-
Open a command prompt and enter the following commands to install DBI and DBD-mysql:
c:\> cd perl\bin
c:\perl\bin> ppm
ppm> install DBI
ppm> install DBD-mysql
See here
for pointers.
-
Linux:
-
Download DBI and
DBD perl modules and extract them to a temporary folder.
-
From the command prompt, enter the following commands in each extracted module folder (DBI first) to install DBI and DBD-mysql:
perl Makefile.PL
make
make test
make install
Read the documentation on the DBI and DBD-mysql pages for pointers.
Download the latest version of the installation
and unzip the archive to a temp folder.
This installation contains the following directory structure:
- thegpm-xhunter
- assembling software
- cache
- docs
- fasta
- gpm
- lib
- pics
- scripts
- taxon_source_[nomods|withmods] (multiple)
- styles
- tandem
- thegpm-cgi
Note: The ASL files themselves are very large and are not included in the installation package.
Download the library (hlf) files for your taxonomies of interest
here and move them to the lib folder.
-
Move the entire 'thegpm-xhunter' folder to C:\ (Windows) or /var/www/ (Linux),
as the case may be. Ensure the locations of the directories exactly match their
specifications in httpd.conf shown above.
-
Edit the file defines.pl found in 'thegpm-cgi' folder. If the service will be accessed
from other computers, replace 'localhost' in the function 'get_server_name()' with the web address of the server.
You may also want to change the numeric part of the value returned in the function get_gpm_number(). The
value returned from this function is the prefix of the output file names generated by X! Hunter searches.
This value must be different than the value returned by this function in the current GPM installation.
-
Edit the index.html file. Change 'localhost' to the web address of your server.
-
Edit the file footnote.js found in the 'tandem' folder to specify a 'hosted by' message to
appear at the bottom of the display pages.
-
Edit the file dbcommon.pl found in 'thegpm-cgi'. Change 'xhunterhost'
to the name of the server in GetXHunterHost() and 'gpmdbhost' to the name of the GPMDB server in
GetGPMDBHost().
-
Edit the file taxonomy.xml found in the 'tandem' folder to add the relative addresses (URLs) of the ASL files.
The new entries are specified with the attribute format="spectrum" in the <file> tag.
In the sample entry below, X! Hunter searches will use all "spectrum" and "peptide" files,
while X! Tandem searches only use the "peptide" files.
Notes:
-
X! Hunter requires the fasta.pro files to obtain full protein sequences.
-
Database searches do not use this taxonomy mechanism.
<taxon label="yeast">
<file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
<file format="peptide" URL="../fasta/scd.fasta.pro" />
<file format="peptide" URL="../fasta/scd_1.fasta.pro" />
<file format="peptide" URL="../fasta/crap.fasta.pro" />
</taxon>
-
Copy and paste the following code at the end of the current copy of popGPMDB.pl in your GPMDB installation.
addto_hunter();
## Add this result to the XHUNTER GPMDBResult table.
## First check whether this resultid has already been added and set to 1,
## in case this population happened while xhunter_precomposite was running:
## when it finishes, it takes ALL the resultids from
## result, adds them to XHUNTER and sets added=1.
##
sub addto_hunter(){
    hunter_connect();
    $sql = "SELECT resultid from GPMDBResult WHERE resultid = ?";
    my $sth_hunter = $dbh_hunter->prepare($sql);
    my $rc = $sth_hunter->execute($resultid);
    if($rc > 0){
        ## already listed: reset the added flag to 0
        $sql = "UPDATE GPMDBResult SET added = 0 WHERE resultid = ?";
        $sth_hunter = $dbh_hunter->prepare($sql);
        $sth_hunter->execute($resultid);
    }
    else{
        ## not listed yet: insert the result with added = 0
        $sql = "INSERT INTO GPMDBResult(resultid,file,added) values(?,?,?)";
        $sth_hunter = $dbh_hunter->prepare($sql);
        $sth_hunter->execute($resultid,$file,0);
    }
    $sth_hunter->finish();
    hunter_disconnect();
}
This extra code updates the X! Hunter database
with the names of the files that have been added to GPMDB for future populations
of the xhunter and taxonomy databases.
-
Edit the xhunter_register.html file found in the 'tandem' folder. Change the
administrator email address to something that makes sense for your installation.
-
Edit the xhunter_forgotpassword.html file found in the 'tandem' folder. Change the
administrator email address to something that makes sense for your installation.
Database creation
The next step is to create and populate the xhunter and taxonomy databases.
The taxon_source_[nomods|withmods] folders will contain the 'pre-composite' files that are created
with xhunter_precomposite.pl using the results from your GPMDB site.
For every taxonomy that will be part of the database and search system, there needs to be a set of these folders and
matching entries in the xhunter database describing the following:
- taxonomy
- source database or web site
- perl regular expression that describes the accession numbering style
This distribution contains the taxonomy entries in the database required for many
institutions, although custom and additional standard sources can be added.
|
|
If you want to remove some of the default taxonomies:
-
Edit the file create_xhunter.sql, and for each unwanted taxonomy, remove the lines that look like this:
insert into taxon(taxonid,name,dbname,regex,species)
values(NULL,'human_ensembl','xhunter_human_ensembl','^ENSP[[:digit:]]+$','H. sapiens');
-
Delete the unwanted /scripts/create_xhunter_taxon_source.sql files.
-
Edit the file /tandem/xhunter_species.js, removing the lines that contain taxonomies that were not used.
-
The perl scripts (xhunter_precomposite.pl, xhunter_populate.pl, xhunter_incremental.pl, xhunter_composite.pl)
also need to be edited to remove unwanted taxonomies and to control which taxonomies are processed
when the scripts are run. This is explained in the controlling taxonomies section.
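When adding or removing taxonomies, it is worth checking that each regular expression in the taxon table matches the accession style of its source. The sketch below does this in Python for the human_ensembl pattern shown above. Python's re module does not support POSIX bracket classes such as [[:digit:]], so the pattern is rewritten with [0-9]. The function name and the sample accession numbers are hypothetical.

```python
import re

# '^ENSP[[:digit:]]+$' from create_xhunter.sql, rewritten for Python's re.
ENSEMBL_PROTEIN = re.compile(r"^ENSP[0-9]+$")


def matches_taxon(accession, pattern=ENSEMBL_PROTEIN):
    """Return True if the accession number fits the taxonomy's pattern."""
    return pattern.match(accession) is not None


for accession in ("ENSP00000354587", "YAL001C", "ENSP"):
    print(accession, matches_taxon(accession))
```

An accession that matches none of the configured patterns points to either a missing taxonomy entry or a pattern that is too strict.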
The next step is to create the empty database structures.
-
Open a command prompt and navigate to /thegpm-xhunter/scripts.
-
At the command prompt, type mysql -u user -ppass replacing 'user' and 'pass' with your username and password.
-
Enter the following command:
mysql> source create_xhunter.sql
This will create the xhunter database with the entries to track which files have been added, the statistics
for each build and the taxonomic specifics.
- Enter the following command for each taxonomy you are creating:
mysql> source create_xhunter_taxon_source.sql
replacing 'taxon' with the name of the taxonomy and 'source' with the name of the source database or web site.
For example:
mysql> source create_xhunter_mouse_ensembl.sql
mysql> source create_xhunter_yeast_sgd.sql
This will create the empty taxonomy databases which will contain the entries for the consensus spectra and the spectra
that went into making the consensus spectra, as well as the post translational modifications if any.
The next step is to create the permissions for the new databases for the admin user. You can either use an
existing user or create a new one. Either way, at the mysql command prompt, enter the following commands:
mysql> GRANT ALL PRIVILEGES ON XHUNTER.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON XHUNTER_taxon_source.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
Replace 'adminuser' with the new user name or an existing admin user, and 'password' with the password.
Replace 'host' with localhost if the population and admin functions are to be performed from the same machine
as the database, with the IP address of the machine from which the admin functions will be performed,
or enter the statements once for each if both are needed. The second statement
needs to be entered for each taxonomy database that was created above. Now create the
permissions for the casual users:
mysql> GRANT SELECT ON XHUNTER.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
mysql> GRANT SELECT ON XHUNTER_taxon_source.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
Replace 'casualuser' with the new user name or an existing restricted user, 'host' with the IP addresses
from which connections will be allowed and 'password' with the password. The second statement
needs to be entered for each taxonomy database that was created above.
Edit the file called dbcommon.pl. Enter the casualuser name and password in GetUser() and GetPass(). Enter the
adminuser name and password in GetRUser() and GetRPass().
For both admin and casual users the host can be '%' (percent sign) if connections will be allowed from anywhere, although this is not
recommended for the admin user.
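Because the per-taxonomy GRANT statement has to be repeated for every taxonomy database, the statements can be generated rather than typed by hand. The sketch below is an illustration only: the helper name grant_statements is hypothetical, and the user, host, password and taxonomy names are placeholders. It only prints the statements so they can be reviewed and pasted into the mysql prompt. Note that MySQL 8.0 and later no longer accept IDENTIFIED BY inside GRANT, so on those versions the account would need to be created separately first.

```perl
use strict;
use warnings;

# Hypothetical helper: builds GRANT statements for one user on the xhunter
# database and on each taxonomy database.
# $privilege is 'ALL PRIVILEGES' (admin user) or 'SELECT' (casual user).
sub grant_statements {
    my ($privilege, $user, $host, $password, @taxon_sources) = @_;

    my @databases = ('XHUNTER');
    push @databases, "XHUNTER_$_" for @taxon_sources;

    my @statements;
    for my $db (@databases) {
        push @statements,
            "GRANT $privilege ON $db.* TO '$user'\@'$host' IDENTIFIED BY '$password';";
    }
    return @statements;
}

# Placeholder values; substitute the real user, host, password and taxonomies.
print "$_\n"
    for grant_statements('SELECT', 'casualuser', 'host', 'password',
                         'mouse_ensembl', 'yeast_sgd');
```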
Once the empty database structures have been created, the databases are populated and ASL files are created as follows:
- 'Pre-composite' files are created using the current state of your GPMDB
- 'Composite' files are created using 'pre-composite' files
- X! Hunter databases populated using 'composite' information
- Library files created using 'composite' information
Controlling taxonomies
At times, it may be necessary to process only selected taxonomies rather than all existing taxonomies.
Also, some of the default taxonomies that are included with an X! Hunter installation may not be of interest
to the researchers. The population and file creation perl scripts
(xhunter_precomposite.pl, xhunter_populate.pl, xhunter_incremental.pl, xhunter_composite.pl)
need to be edited to control which taxonomies are included when they are run.
The following code controls this functionality:
my @taxonid=();
##get all the taxonomies in the xhunter database
hunter_connect();
my $select_taxon = "SELECT taxonid FROM taxon";
my $sth_select_taxon = $dbh_hunter->prepare($select_taxon);
$sth_select_taxon->execute();
my @result_taxon;
while(@result_taxon = $sth_select_taxon->fetchrow_array()){
push(@taxonid,$result_taxon[0]);
}
$sth_select_taxon->finish();
hunter_disconnect();
This will process all the taxonomies from the Taxon table. To select specific taxonomies, change the SQL query
as follows:
my $select_taxon = "SELECT taxonid FROM taxon where name = 'human_ensembl'";
or
my $select_taxon = "SELECT taxonid FROM taxon where name in('human_ensembl','at_tigr')";
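As an alternative to editing the quoted names into the SQL string by hand, the query can be built from a list of taxonomy names with placeholders, which avoids quoting mistakes. This is a sketch under assumptions, not code from the distribution: the helper name taxon_query is hypothetical, and the commented lines at the end assume the $dbh_hunter handle used by the stock scripts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the SQL text followed by the bind values.
# With no names it selects every taxonomy, as the stock scripts do.
sub taxon_query {
    my @names = @_;
    return ('SELECT taxonid FROM taxon') unless @names;

    # One '?' placeholder per requested taxonomy name.
    my $placeholders = join(',', ('?') x @names);
    return ("SELECT taxonid FROM taxon WHERE name IN ($placeholders)", @names);
}

my ($sql, @bind) = taxon_query('human_ensembl', 'at_tigr');
print "$sql\n";
print "bind values: @bind\n";

# Inside the scripts this would be used roughly as follows (assumed):
#   my $sth_select_taxon = $dbh_hunter->prepare($sql);
#   $sth_select_taxon->execute(@bind);
```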
One drawback of not processing all available taxonomies is that the result files may have contained
proteins for the taxonomies that were left out. These result files will already have been marked as 'added', so
if the neglected taxonomies are required in the future, those results will not be included in their databases.
The solution is to create a new build from scratch, starting with 'pre-composite' file creation.
Pre-composite file creation
This step creates the 'pre-composite' files. These are BIOML formatted files that contain the ten best
scoring spectra for each peptide at each charge state (1,2 and 3) from each protein within the current taxonomy.
This step can take a long time. For the main X! Hunter site, where there are close to 14 million peptide entries,
it takes close to a week and creates about 13 GB of 'pre-composite' files.
-
Open a command prompt and navigate to /thegpm-xhunter/assembling software.
-
At the command prompt, enter the following:
perl xhunter_precomposite.pl
The script will continue to run, printing some status messages, until all the taxonomies have been processed.
The files will be created in the taxon_source_[nomods|withmods] folders.
Composite file creation and database population
The next step is to create the 'composite' files from the newly created 'pre-composite' files and to populate the taxonomy
databases.
-
At the command prompt, enter the following:
perl xhunter_populate.pl
The script will continue to run, printing some status messages, until all the taxonomies have been processed.
The files will be created in the 'assembling software' folder.
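Since the pre-composite step can run for days and the populate step depends on its output, it may be convenient to run both from a small driver that stops at the first failure. The following is only a sketch: the helper name run_steps is hypothetical, and the example is a dry run that prints the commands instead of executing them. To run the scripts for real, pass sub { system(@_) } as the runner from within the 'assembling software' folder.

```perl
use strict;
use warnings;

# Hypothetical helper: runs each command in order and dies at the first
# non-zero status. $runner is a code ref so the sketch can be dry-run.
sub run_steps {
    my ($runner, @steps) = @_;
    my @done;
    for my $step (@steps) {
        my $status = $runner->(@$step);
        die "step '@$step' failed with status $status\n" if $status != 0;
        push @done, $step->[1];
    }
    return @done;
}

# The two build scripts, in the order given above.
my @steps = (
    ['perl', 'xhunter_precomposite.pl'],
    ['perl', 'xhunter_populate.pl'],
);

# Dry run: print each command and report success.
run_steps(sub { print "would run: @_\n"; return 0 }, @steps);
```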
Library creation
The final step is to create the ASL files from the 'composite' files.
This is a multi step process that uses both the X! Hunter and X! Curator programs.
X! Curator is a C++ program that removes redundancies and validates the entries. The parameters
used by the X! Curator program are contained in XML files, much like the input and taxonomy XML files used to control
X! Tandem and X! Hunter searches. Click here for an example.
Follow the steps below to create the new library:
-
Edit the taxonomy.xml file in the 'tandem' folder to use the new 'composite' file(s):
<taxon label="taxon">
<file format="spectrum" URL="../lib/taxon_source_nomods.xml" />
<file format="spectrum" URL="../lib/taxon_source_withmods.xml" />
<file format="peptide" URL="../fasta/taxon_source.fasta.pro" />
</taxon>
-
Run X! Hunter from the command line using the 'composite' file(s) that were created using xhunter_populate.pl
or xhunter_composite.pl.
This will create file(s) with the same name as the 'composite' file(s) with '.hlf' appended.
xhunter.exe input.xml
-
Copy the new '.hlf' file(s) and the 'composite' files to the /assembling software/xcurator/lib folder.
-
Edit the input.xml file in /assembling software/xcurator/bin as follows:
The path to the taxonomy.xml file. This value should not normally need to be edited.
<note type="input" label="list path, taxonomy information">taxonomy.xml</note>
The name of the taxonomy within taxonomy.xml which was the source of the .hlf file creation
when X! Hunter was initially run.
<note type="input" label="protein, taxon">taxon</note>
The path of the ASL file to create.
<note type="input" label="library, append path">../lib/taxon_source_20.hlf</note>
The path to the fasta sequence file that was the source for the identifications
from which the composite file was initially created.
<note type="input" label="sequences, list path">../fasta/taxon.fasta.pro</note>
-
Edit taxonomy.xml in /assembling software/xcurator/bin as follows:
The value for 'protein, taxon' from the input.xml file and the names of the 'composite' files
that were used to create the .hlf file when X! Hunter was initially run.
<taxon label="taxon">
<file format="spectrum" URL="../lib/taxon_source_nomods.xml" />
<file format="spectrum" URL="../lib/taxon_source_withmods.xml" />
</taxon>
-
Run X! Curator from the command line as follows:
xcurator.exe input.xml
-
Move the new ASL file to 'thegpm-xhunter/lib' folder.
-
Edit the taxonomy.xml file in the 'tandem' folder to use the new ASL file:
<taxon label="taxon">
<file format="spectrum" URL="../lib/taxon_source_20.hlf" />
<file format="peptide" URL="../fasta/taxon_source.fasta.pro" />
</taxon>
The new file can now be used by X! Hunter.
Note: This process also creates an X! P3 file with the same name as the ASL file with '.fasta' appended.
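When several taxonomies are built, the final taxonomy.xml entries all follow the same pattern and can be generated. The sketch below is an illustration under assumptions: the helper name taxon_entry is hypothetical, the label and taxon_source values are examples, and the '_20.hlf' suffix simply mirrors the file name used in the steps above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the <taxon> entry that points X! Hunter at a
# finished ASL file and its fasta sequence file.
sub taxon_entry {
    my ($label, $taxon_source) = @_;
    return join("\n",
        qq(<taxon label="$label">),
        qq(<file format="spectrum" URL="../lib/${taxon_source}_20.hlf" />),
        qq(<file format="peptide" URL="../fasta/${taxon_source}.fasta.pro" />),
        qq(</taxon>),
    ) . "\n";
}

# Example values only; use the label and taxon_source names of your build.
print taxon_entry('mouse', 'mouse_ensembl');
```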
To test the library functionality, open a browser and point it to
http://localhost/index.html. Once again, you may substitute the web address of your server for 'localhost'.
Browse to a spectra file, set the search parameters and click the 'Find models' button to start a search
as described in the search multiple spectra section above.
To test the database search functionality, open a browser and point it to
http://localhost/tandem/xhunter_singlespec.html.
Once again, you may substitute the web address of your server for 'localhost'.
Enter the fragment masses and intensities, set the search parameters and click the 'Find models' button
to start a search as described in the search single spectrum section above.
To test the database browse functionality, open a browser and point it to
http://localhost/tandem/xhunter_browse.html.
Once again, you may substitute the web address of your server for 'localhost'.
Enter the parameters, select the taxonomy and click the 'Find models' button to start the search as described
in the browse spectra section above.
|
|
|
Incremental builds
|
It is possible to incrementally add results from your installation of GPMDB to the X! Hunter databases, as long
as a full build has already been completed.
This process requires three conditions:
- Tandem and/or P3 searches are run on a daily basis
- Local GPMDB is populated daily with Tandem and/or P3 search results
- New results in GPMDB are added to X! Hunter with 'added' flag set to '0'
The first step for an incremental build is to check all the peptides that belong to files with the 'added' flag
set to '0' to see if any have better scores than identical peptides that exist in the taxon_source database.
Any new or replaced peptide has its parent consensus spectrum marked as needing a recalculation. Once all
the files are processed, the consensus spectra that were marked as needing a recalculation are processed
and updated with the newly averaged mass and intensity pairs.
-
At the command prompt, enter the following:
perl xhunter_incremental.pl
The script will continue to run, printing some status messages, until all the taxonomies have been processed.
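The marking logic described above can be sketched as follows. This is a simplified illustration, not the code in xhunter_incremental.pl: the helper name mark_for_recalculation, the data layout, the peptide keys and the consensus ids are all hypothetical, only one score is kept per peptide, and the comparison is passed in as a code ref because the direction of a "better" score depends on the score being used (the example assumes higher is better).

```perl
use strict;
use warnings;

# $stored:    hash ref, peptide key => { score => ..., consensus => id },
#             standing in for peptides already in the taxon_source database.
# $incoming:  array ref of { key, score, consensus } from files with added = 0.
# $is_better: code ref, true if the first score beats the second.
# Returns the sorted ids of consensus spectra that need a recalculation.
sub mark_for_recalculation {
    my ($stored, $incoming, $is_better) = @_;
    my %recalc;
    for my $pep (@$incoming) {
        my $old = $stored->{ $pep->{key} };
        # New peptides and better-scoring replacements both mark their parent.
        if (!defined $old || $is_better->($pep->{score}, $old->{score})) {
            $stored->{ $pep->{key} } =
                { score => $pep->{score}, consensus => $pep->{consensus} };
            $recalc{ $pep->{consensus} } = 1;
        }
    }
    my @ids = sort keys %recalc;
    return @ids;
}

# Made-up data for illustration.
my %stored = (
    'PEPTIDEK/2' => { score => 40, consensus => 'c1' },
    'SAMPLER/2'  => { score => 55, consensus => 'c2' },
);
my @incoming = (
    { key => 'PEPTIDEK/2', score => 48, consensus => 'c1' },  # better: replaced
    { key => 'SAMPLER/2',  score => 50, consensus => 'c2' },  # worse: ignored
    { key => 'NEWPEPK/3',  score => 30, consensus => 'c3' },  # new: added
);
my @marked = mark_for_recalculation(\%stored, \@incoming,
                                    sub { $_[0] > $_[1] });
print "recalculate: @marked\n";
```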
The next step is to create the 'composite' files from the entries in the database. Each protein accession
in the current taxon_source database, along with all its distinct peptide domains, including modifications if they
exist, is written to a new 'composite' file.
-
At the command prompt, enter the following:
perl xhunter_composite.pl
The script will continue to run, printing some status messages, until all the taxonomies have been processed.
Finally, the ASL files are created from the newly created 'composite' files using the instructions in the
library creation section above.
|
|
|
Custom Databases
|
To add your own databases, or standard databases that are not included, to the X! Hunter system, follow these steps.
They assume that results of this type have already been added successfully to the local GPMDB.
In the instructions below, replace XHUNTER_TAXON_SOURCE with the taxonomy and source of the fasta file for the custom
database. For example, if the new taxonomy is cow and the source is ensembl, the entry would be XHUNTER_COW_ENSEMBL.
-
Add database to taxonomy list search forms. Edit the file called /tandem/xhunter_species.js by copying an existing line and
pasting it at the end of the file. Edit the newly added line for the new database:
document.writeln("<option selected value=\"human\">H. sapiens (human)</option>");
document.writeln("<option selected value=\"custom\">custom database</option>");
-
Edit the /tandem/taxonomy.xml file by adding the new database using the existing format. Copy and paste an existing entry
and make the required changes, ensuring that the value attribute in the xhunter_species.js file matches the
label attribute in taxonomy.xml.
<taxon label="yeast">
<file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
<file format="peptide" URL="../fasta/scd.fasta.pro" />
<file format="peptide" URL="../fasta/scd_1.fasta.pro" />
<file format="peptide" URL="../fasta/crap.fasta.pro" />
</taxon>
<taxon label="custom">
<file format="spectrum" URL="../lib/taxon_source_cmp_20.hlf" />
<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
<file format="peptide" URL="../fasta/taxon_source.fasta.pro" />
<file format="peptide" URL="../fasta/crap.fasta.pro" />
</taxon>
-
Create empty folders named using the taxon_source_withmods and taxon_source_nomods naming convention. These
are the folders that will contain the 'pre-composite' files. For this example that would be 'human_custom_nomods' and
'human_custom_withmods'.
-
Add an entry to the Taxon table in the XHUNTER database. Open a command prompt,
navigate to the /xhunter/scripts folder and type
mysql -u user -ppass, replacing 'user' and 'pass' with your username and password. Then type the following:
mysql> \u XHUNTER
mysql> insert into Taxon(taxonid,name,dbname,regex,species)
values(NULL,'taxon_source','xhunter_taxon_source','regular expression','C. ustom');
The 'regular expression' needs to be replaced with an actual SQL regular expression that describes the naming
convention of the accession numbers for the custom database. Help can be found by looking at the examples
in the existing entries or by checking the mysql documentation.
-
Create the sql script for custom database creation.
-
Add permissions for accessing new database. At the mysql command prompt, type the following:
mysql> GRANT ALL PRIVILEGES ON XHUNTER.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON XHUNTER_taxon_source.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
Replace 'adminuser' with the new user name or an existing admin user, and 'password' with the password.
Replace 'host' with localhost if the population and admin functions are to be performed from the same machine
as the database, with the IP address of the machine from which the admin functions will be performed,
or enter the statements once for each if both are needed. Now create the
permissions for the casual users:
mysql> GRANT SELECT ON XHUNTER.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
mysql> GRANT SELECT ON XHUNTER_taxon_source.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
Replace 'casualuser' with the new user name or an existing restricted user, 'host' with the IP addresses
from which connections will be allowed and 'password' with the password.
-
Edit the perl scripts (xhunter_browse_mh.pl, xhunter_browse_peptide.pl and xhunter_singlespec.pl)
that need to know the name and fasta source of the new database by adding the new taxonomy to the URL hash:
$URL{YEAST_SGD} = "../fasta/scd.fasta.pro";
$URL{TAXON_SOURCE} = "../fasta/taxon_source.fasta.pro";
Once the databases are set up and the access permissions in place, the database is populated and
the ASL file is created as follows:
- 'Pre-composite' files are created using current state of your GPMDB
- 'Composite' files are created using 'pre-composite' files
- X! Hunter database populated using 'composite' information
- Library file created using 'composite' information
Pre-composite file creation
This step creates the 'pre-composite' files. These are BIOML formatted files that contain the ten best
scoring spectra for each peptide at each charge state (1,2 and 3) from each protein within the current taxonomy.
This step can take a long time. For the main X! Hunter site, where there are close to 14 million peptide entries,
it takes close to a week and creates about 13 GB of 'pre-composite' files.
-
Open a command prompt and navigate to /thegpm-xhunter/assembling software.
-
At the command prompt, enter the following:
perl xhunter_precomposite.pl
The script will continue to run, printing some status messages, until all the taxonomies have been processed.
The files will be created in the taxon_source_[nomods|withmods] folders.
Composite file creation and database population
The next step is to create the 'composite' files from the newly created 'pre-composite' files and to populate the taxonomy
databases.
-
At the command prompt, enter the following:
perl xhunter_populate.pl
The script will continue to run, printing some status messages, until all the taxonomies have been processed.
The files will be created in the 'assembling software' folder.
Library creation
Finally, the ASL files are created from the newly created 'composite' files using the instructions in the
library creation section above.
|
Getting help
If you have any problems installing the X! Hunter software, please contact us or visit the message board.
|
|
|