X! Hunter Documentation

System Overview
X! Hunter is part of the X! Series of protein identification search engines made available by The Global Proteome Machine Organization (GPMO). The X! Hunter search engine compares experimentally observed spectra directly with consensus mass spectra obtained from the GPMDB and produces results in an XML format known as BIOML, the XML standard used by all GPMO software. This allows for the easy sharing of results between scientists and the ability to analyse the results using common GPMO tools.

A manuscript describing X! Hunter Annotated Spectrum Library (ASL) searches has been published in the ASAP section of the Journal of Proteome Research (Abstract).

The suite of programs and software that make up X! Hunter can be divided into three basic groups:
  • consensus mass spectra library files and databases
  • identification algorithm
  • search interface and identification/analysis display
 
Consensus mass spectra library files and databases
The consensus mass spectra are stored in two formats: binary library files (ASL) and a database. Each taxonomic group is represented in both formats, because each format has its own applications. Both formats contain the same information, but the binary format is processed to be non-redundant: each spectrum is listed only once, followed by its homologous accession numbers. The database has no mechanism for removing redundancies. That is, the relationship between a peptide domain and a consensus spectrum is one-to-one within each accession number, so a spectrum may be listed more than once within a taxonomy.

The ASL files are used when matching multiple spectra via file upload through a web form and the database is used when matching a single spectrum via text input through a web form. This is because the entire ASL is loaded into memory and therefore requires more initial overhead.

Once the ASL is loaded, it can be accessed very quickly, because the data is binary and is indexed and sorted by parent ion mass. The initial overhead is spread over all the spectra in a submission, so the time per spectrum decreases as the number of submitted spectra increases. This is subject to hardware limitations, however. If too many spectra are submitted, the hardware could become overwhelmed, increasing the time per spectrum.

When a single spectrum is submitted, the overhead required to load the ASL outweighs the benefit of its increased speed so the database is used instead.
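The trade-off described above can be illustrated with a simple cost model. This is only a sketch and is not taken from the X! Hunter code: the function names and every timing value are hypothetical, and the model ignores the hardware saturation mentioned earlier.

```python
def asl_time_per_spectrum(n_spectra, load_seconds, search_seconds):
    """Average time per spectrum when the whole ASL is loaded once up front."""
    if n_spectra < 1:
        raise ValueError("n_spectra must be at least 1")
    return (load_seconds + n_spectra * search_seconds) / n_spectra


def faster_backend(n_spectra, load_seconds, asl_search_seconds, db_search_seconds):
    """Return 'asl' or 'database', whichever has the lower total time in this model."""
    asl_total = load_seconds + n_spectra * asl_search_seconds
    db_total = n_spectra * db_search_seconds
    return "asl" if asl_total < db_total else "database"


# Hypothetical numbers: 5 s to load the library, 0.01 s per ASL search,
# 0.5 s per database search.
for n in (1, 10, 100):
    print(n, faster_backend(n, 5.0, 0.01, 0.5), asl_time_per_spectrum(n, 5.0, 0.01))
```

With these made-up values the database is faster for a single spectrum, while the ASL is faster once the load cost is shared across many spectra.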

The database also provides useful statistics and allows an administrator to run summary and investigative queries allowing for the generation of reports and diagnostic testing.

Identification algorithm
As there are two formats for storing the consensus mass spectra, there are two methods for accessing the data. For the ASL format, a compiled C++ program is used, whereas a Perl program is used for the database format. Both programs use the same identification algorithm and scoring method and both produce the results in BIOML format. The difference in output is that the multiple spectra version is able to generate a protein score based on multiple peptides found matching a single protein accession number. In the single spectrum version, the peptide hyper score is used as the protein score.

The first step in the identification is to 'condition' the input spectra. This means removing isotopes, neutral losses from parent ions and the parent ion masses themselves, and then keeping the 20 most intense peaks. The new list of ions is then compared to all the consensus mass spectra in the given mass range, producing a theta and hyper score. Only the top scores for each input spectrum are kept. Finally, the protein, peptide, post-translational modifications, if any, and the spectrum are written in BIOML format and displayed in the web browser.
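Part of the conditioning step can be sketched as follows. This is a simplified illustration, not the X! Hunter implementation: it only removes peaks near the parent ion and keeps the most intense peaks. Isotope and neutral-loss removal are left out, and the function name and the `parent_window` tolerance are assumptions made for the example.

```python
def condition_spectrum(peaks, parent_mz, parent_window=2.0, max_peaks=20):
    """Simplified conditioning: drop peaks near the parent ion, keep the most intense.

    peaks is a list of (mz, intensity) tuples. Isotope and neutral-loss removal
    are NOT implemented here; parent_window is a hypothetical tolerance.
    """
    # Remove peaks that fall within the window around the parent ion m/z.
    kept = [(mz, inten) for mz, inten in peaks if abs(mz - parent_mz) > parent_window]
    # Keep the max_peaks most intense peaks.
    kept.sort(key=lambda p: p[1], reverse=True)
    kept = kept[:max_peaks]
    # Return them in ascending m/z order.
    return sorted(kept)


# Thirty small peaks plus one intense peak at the parent ion m/z.
example = [(100.0 + i, float(i)) for i in range(30)] + [(500.0, 1000.0)]
conditioned = condition_spectrum(example, parent_mz=500.0)
print(len(conditioned), conditioned[0])
```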

User types and actions
There are three levels of access to the system: casual users, power users and admin users.
  • Casual users:
    Casual users are those who use the X! Hunter interface to compare their spectra with those in the X! Hunter libraries and databases. A casual user can:
    • browse spectra by parent ion mass or peptide sequence
    • search using multiple spectra from a file
    • search using one spectrum entered in a text field

  • Power users:
    To become a power user you must register by sending an email to the admin user with the information listed on the registration page. The administrator will add you to the system and provide you with a password.

    Power users can perform tasks on the public version and if applicable, perform tasks on their local installation of X! Hunter. Once registered and logged in, power users can:
    • add a new spectrum using web interface
    • remove a spectrum using web interface
    • perform casual user tasks

  • Admin users:
    A user can become an admin user by downloading an installation of X! Hunter. This will allow them to administer a local copy of the X! Hunter libraries and databases. Changes made to these installations will not be reflected in the public version of X! Hunter.
    Admin users can:
    • build the library files
    • create the database entries
    • manage users
    • perform power user tasks

Browsing spectra
If searching for matches to a specific peptide sequence or peptide sequences within a particular mass range, the browse spectra page can be used.

To browse by mass range (diagram 1), the user enters the parent ion mass, selects the charge and taxonomy and sets the mass error range. Help information for each parameter is available by clicking on the icon next to the parameter name. Once satisfied with the chosen parameters, the user clicks the 'Find models' button. Once the search is finished, the browser is redirected to the results page (diagram 2).

diagram 1
diagram 2

To browse by peptide sequence, the user enters the peptide sequence and selects the charge and taxonomy (diagram 3). Help information for each parameter is available by clicking on the icon next to the parameter name. Once satisfied with the parameters, the user clicks the 'Find models' button. Once the search is finished, the browser is redirected to the results page (diagram 4).

diagram 3
diagram 4
 
Search multiple spectra
X! Hunter can be used to match multiple spectra using the multiple spectra search feature by uploading a text file containing tandem mass spectra in the following formats:
  • BIOML
  • DTA
  • PKL
  • Matrix Science
  • mzXML
  • mzDATA
NOTE: As the schemas for mzXML and mzDATA files are quite involved, it is possible to create incarnations of these file types that are not X! Series compliant. View examples of X! Series compatible mzData and mzXML files.
To search using multiple spectra, the user clicks the 'Browse...' button and selects the spectra file to use (diagram 5). Help information for each parameter is available by clicking on the icon next to the parameter name. Once satisfied with the entered parameters and the selected taxonomy, the user clicks the 'Find Models' button. The file is uploaded and the search is performed (diagram 6). Once the search has finished, the browser is redirected to the results page (diagram 7).

diagram 5
diagram 6
diagram 7
 
Search single spectrum
If searching with a single spectrum, the X! Hunter single spectrum search feature is appropriate. The user enters the parent ion mass and pastes the fragment ion masses and intensities into the associated text input fields (diagram 8). Help information for each parameter is available by clicking on the icon next to the parameter name. Once satisfied with the entered parameters and the selected taxonomy, the user clicks the 'Find Models' button. Once the search has finished, the browser is redirected to the results page (diagram 9).

diagram 8
diagram 9
 
Logging in
Power users must log in before they can perform any tasks. After entering a username and password and clicking 'Login' (diagram 10), the user is forwarded to the control page (diagram 11). From here, the user can choose to add a new spectrum or remove a number of spectra associated with a peptide sequence. If an incorrect username or password is entered, an error message is displayed and the user is redirected back to the login page (diagram 12).

diagram 10
diagram 11
diagram 12
 
Adding a new spectrum
Power users have the ability to add new spectra directly to the database. If a peptide with the given accession number already exists in the ConsensusSpectrum table with the same charge, start position, end position and modifications, it will be overwritten with the new spectrum if the user decides to add it anyway. This means that the current entries in the Spectrum table will also be replaced with a single entry using the new spectrum. When additional results are added through an incremental population, peptides identical to the one added in the above manner will be added to the Spectrum table, and the ConsensusSpectrum recalculated.

The user fills out the form (diagram 13), ensuring all parameters are populated, although only one style of fragment information is required. Help information for each parameter is available by clicking on the icon next to the parameter name. Once satisfied with the entered values, the user clicks the 'Confirm' button. Once the values have been validated, the confirmation page is displayed, allowing the user to compare the current entry (diagram 14a), if one exists, with the new entry and to confirm that all the entered values are correct (diagram 14b). If any values need to be changed, a link is provided that will return the user to the input form. Once satisfied with the new values, the user clicks the 'Add entry' button and the new peptide is added. A confirmation message appears once finished (diagram 15) and the user is returned to the Add spectrum page.

Adding a spectrum in this manner will not alter the library file. To create a new library file, see the libraries section.

diagram 13
diagram 14a
diagram 14b
diagram 15
 
Removing spectra
Power users also have the ability to remove spectra from the database. The user fills out the form (diagram 16) by entering the peptide sequence associated with the spectra to remove and selecting the taxonomy that was the source of the peptide. Once satisfied with the entered values, the user clicks the 'Confirm' button. The information is gathered from the database and a confirmation page (diagram 17) is displayed, allowing the user to select which spectra they would like to remove. By default, the spectrum diagrams are hidden, but can be made to appear by clicking the numbered link next to the summary information (diagram 18). All the diagrams can be displayed by selecting the checkbox next to 'show/hide all'. The user may also choose to exclude this peptide from future populations or incremental builds and add a comment detailing why this peptide was removed.

Once the user has selected a number of spectra, they click the 'Remove Selected' button and the selected spectra are removed. A confirmation message appears once finished (diagram 19) and the user is returned to the Remove spectra page.

Removing spectra in this manner will not alter the library file. To create a new library file, see the libraries section.

diagram 16
diagram 17
diagram 18
diagram 19
 
Building the libraries
The ASL files contain the database spectra information, optimized to provide very fast access to the details for each peptide sequence. When a multi-spectra search is initiated, the entire library for the current taxonomy is loaded into memory and indexed by parent ion mass, allowing seek times in milliseconds.
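The benefit of indexing by parent ion mass can be illustrated with a sorted in-memory list and a binary search. This is a minimal sketch assuming a plain list of (mass, label) pairs; the actual ASL binary layout is not described here, and the class name, labels and mass values are hypothetical.

```python
import bisect


class MassIndex:
    """Consensus spectra sorted by parent ion mass for fast range lookup."""

    def __init__(self, entries):
        # entries: iterable of (parent_mass, label) pairs; label is a placeholder
        # for whatever peptide and spectrum data a real library stores.
        self._entries = sorted(entries)
        self._masses = [mass for mass, _ in self._entries]

    def in_range(self, mass, tolerance):
        """Return all entries with parent mass within +/- tolerance of mass."""
        lo = bisect.bisect_left(self._masses, mass - tolerance)
        hi = bisect.bisect_right(self._masses, mass + tolerance)
        return self._entries[lo:hi]


index = MassIndex([(1200.6, "pepA"), (800.4, "pepB"), (1200.9, "pepC"), (1501.7, "pepD")])
print(index.in_range(1200.7, 0.5))
```

Because the list is sorted once at load time, each lookup only needs two binary searches instead of a scan over every entry.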
Creating a library file is a multi-step process and can be done in two ways, depending on whether the file is being created for the first time or after incremental additions to the database. In either case, a series of Perl scripts performs the operations.
  • First time build
    When building the library file for the first time, the first step is to create 'pre-composite' files using the entries in GPMDB (xhunter_precomposite.pl). These are BIOML formatted files that contain the ten best scoring spectra for each peptide at each charge state (1, 2 and 3) from each protein within the current taxonomy. One set of 'pre-composite' files is created for peptides with post-translational modifications and one set for peptides with no modifications. Click here for an excerpt which contains the At1g01090.1 protein (pyruvate dehydrogenase E1 component alpha subunit, chloroplast) from the Arabidopsis thaliana taxonomy.
    The resulting 'pre-composite' files are processed (xhunter_populate.pl) to create a BIOML formatted 'composite' file for the taxonomy. The 'composite' file contains the consensus spectrum for each peptide, created by 'averaging' the corresponding group of spectra in the 'pre-composite' files (the ten best scoring spectra for that peptide at each charge state from each protein within the current taxonomy). Click here for an excerpt which is the composite version of the At1g01090.1 protein (pyruvate dehydrogenase E1 component alpha subunit, chloroplast) from the Arabidopsis thaliana taxonomy. This step also populates the database for the current taxonomy.
    The 'composite' files are then optimized and written in binary format to create the ASL file. This is done using the xcurator program, a C++ program that removes redundancies, validates the entries and writes the peptide and spectra information in binary format. Once the library file is created, the 'pre-composite' and 'composite' files may be discarded.
    The diagram below describes the data and process flow for a first time build. Click the diagram to view an enlarged version in a new window.
  • Incremental build
    If creating the ASL file(s) from an incremental database build instead of from the 'pre-composite' files, the first step is to add the new result names from GPMDB to the xhunter database with the 'added' flag set to '0'. The new GPMDB entries are then processed (xhunter_incremental.pl) for each selected taxonomy by selecting peptides that have a better score than the current entries in the Spectrum database or that did not previously exist. There are three circumstances where this can happen:
    1. The parent ion mass of the new spectrum does not already exist in the database. The mh is added to the MH table and the new spectrum is added to the Spectrum table. The consensus spectrum in these cases is simply a copy of the new spectrum.
    2. The parent ion mass has been seen before, but fewer than 10 spectra exist for the current peptide. The new spectrum is added to the Spectrum table.
    3. The parent ion mass has been seen before and a consensus spectrum already has 10 spectra associated with it. The least probable spectrum is replaced.
    Each time a spectrum is added or replaced, the consensus spectrum is marked as needing a recalculation.
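    The three cases above can be sketched with an in-memory dictionary standing in for the Spectrum table. This is an illustration only and not the logic of xhunter_incremental.pl: the key, the data layout and the assumption that a higher score means a more probable spectrum are all hypothetical.

```python
MAX_SPECTRA = 10  # maximum spectra kept per peptide, per the text above


def add_spectrum(store, key, score, spectrum):
    """Apply the three cases to an in-memory stand-in for the Spectrum table.

    store maps a hypothetical key (for example parent ion mass plus peptide) to a
    dict with a 'spectra' list of (score, spectrum) and a 'recalculate' flag.
    Higher score is assumed to mean a more probable spectrum.
    Returns True if the store changed.
    """
    entry = store.get(key)
    if entry is None:
        # Case 1: nothing stored yet for this key.
        store[key] = {"spectra": [(score, spectrum)], "recalculate": True}
        return True
    spectra = entry["spectra"]
    if len(spectra) < MAX_SPECTRA:
        # Case 2: room left, so simply add the new spectrum.
        spectra.append((score, spectrum))
    else:
        # Case 3: full; replace the least probable spectrum if the new one is better.
        worst = min(range(len(spectra)), key=lambda i: spectra[i][0])
        if score <= spectra[worst][0]:
            return False
        spectra[worst] = (score, spectrum)
    # Any addition or replacement marks the consensus spectrum for recalculation.
    entry["recalculate"] = True
    return True
```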
    The next step is to go through all the entries that have been marked as needing recalculation and create the 'averaged' spectra from the associated entries in the Spectrum table. Then the BIOML formatted 'composite' file is created for the taxonomy using only the entries in the database (xhunter_composite.pl) instead of using 'pre-composite' files as in the first time build.
    The 'composite' file is then optimized and written in binary format to create the ASL file, just like in the first time build.
    The diagram below describes the data and process flow for an incremental build. Click the diagram to view an enlarged version in a new window.
     
Managing users
If a casual user wishes to become a power user, they must send the admin an email with all the required information as laid out in the registration page. When the admin user receives the information, they populate the registration form and click 'Add user'. This will add the new user to the XUser table in the XHunter database and grant the permissions required for the new user to access the power user functionality of the system. The admin can then email the new power user confirming that they have been added.
 
Database descriptions
The databases are used to store peptide and spectrum information from GPMDB in order to allow users to find matches for single spectrum searches and to create new library files after incremental builds. They are also useful for statistical analysis and for tracking down errors or anomalies. Detailed database descriptions are available in HTML format or as a downloadable Windows help file.
There are two types of database used in the X! Hunter system.
  • xhunter
  • taxonomy


The xhunter database (diagram 20) is used to control the population of the xhunter taxonomy databases, manage users and track statistical information. Only one instance of this database is required for each installation.

diagram 20



The xhunter per taxonomy databases (diagram 21) contain the consensus mass spectra information as well as the spectra information that was used to create the consensus spectra and all the peptide domain information. They are also used to keep track of which peptides should not be added in future populations. One instance of this type of database is required for each taxonomy.

diagram 21



 
Installation
There are two options for installation:
  1. library only installation - no custom or library rebuilds, no database, one search form
  2. library and database installation - customizable libraries and databases, three search forms (requires gpmdb and gpm installation)
All installations require that the computer have a web server and Perl installed. The next two sections describe how to install and configure these items before X! Hunter is installed.

If you already have an installation of GPM or GPMDB you will already have a web server and Perl installed, but you will need to add a VirtualHost entry to httpd.conf as described below.

Web server installation
  1. Download and install the Apache web server.
  2. Next, configure Apache using the text file, httpd.conf, stored in /etc/bin/httpd/conf on Linux and C:/Program Files/Apache Group/Apache2/conf on Windows. The only thing you really need to tell Apache is where your web accessible files are. This is done using Aliases or VirtualHosts.

    Add one of the following two sets of directives to the appropriate section of the httpd.conf file. For Linux, change the paths as necessary to something like /var/www/thegpm-xhunter/[folder]. Note that X! Hunter may be installed anywhere; the paths given are only a suggestion. (hint: search for the text string 'Aliases:' to find the Alias section easily in httpd.conf.)

    If you have GPM or GPMDB installed on the same server, add the following Virtual Host directives to section 3 of the httpd.conf file.
    ## change this to the ip address or name of this server
    NameVirtualHost xxx.xxx.xxx.xxx
    
    <VirtualHost xxx.xxx.xxx.xxx:80>
        ServerAdmin admin@thegpm.org
        DocumentRoot c:/thegpm-xhunter
        ## change this to the name of your server
        ServerName localhost
        ErrorLog logs/thegpm-xhunter_error_log
        CustomLog logs/thegpm-xhunter_access_log common
        ScriptAlias /thegpm-cgi/ "C:/thegpm-xhunter/thegpm-cgi/"
        <Directory "C:/thegpm-xhunter/thegpm-cgi">
    		AllowOverride None
            Options None
            Order allow,deny
            Allow from all
        </Directory>
    </VirtualHost>
    
    If you do not have GPM or GPMDB installed, add the following to section 2 of httpd.conf. (hint: search for the text string “Aliases” to find their location in section 2).
    Alias /gpm/ "C:/thegpm-xhunter/"
    <Directory "C:/thegpm-xhunter">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    Alias /thegpm/ "C:/thegpm-xhunter/"
    <Directory "C:/thegpm-xhunter">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    Alias /gpm/ "C:/thegpm-xhunter/gpm/"
    <Directory "C:/thegpm-xhunter/gpm">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    Alias /tandem/ "C:/thegpm-xhunter/tandem/"
    <Directory "C:/thegpm-xhunter/tandem">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    Alias /pics/ "C:/thegpm-xhunter/pics/"
    <Directory "C:/thegpm-xhunter/pics">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    Alias /cache/ "C:/thegpm-xhunter/cache/"
    <Directory "C:/thegpm-xhunter/cache">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    Alias /styles/ "C:/thegpm-xhunter/styles/"
    <Directory "C:/thegpm-xhunter/styles">
    	Options Indexes MultiViews
    	AllowOverride None
    	Order allow,deny
    	Allow from all
    </Directory>
    
    ScriptAlias /thegpm-cgi/ "C:/thegpm-xhunter/thegpm-cgi/"
    
    <Directory "C:/thegpm-xhunter/thegpm-cgi">
    	AllowOverride None
    	Options None
    	Order allow,deny
    	Allow from all
    </Directory>
    
  3. Restart the Apache server.
Perl installation
This step should only be required for Windows, as most Linux distributions come with Perl installed.
  1. Download and install ActiveState Perl.
  2. Installing to c:\perl is recommended if possible, as the Perl scripts expect to find the perl executable there. If Perl is installed somewhere else, the first line of each Perl script must be changed to point to the correct location of perl.
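If Perl is installed elsewhere, the first-line change can be scripted instead of made by hand. The sketch below is a hypothetical helper, not part of the X! Hunter distribution. It assumes the first line of each script is a '#!' line, and the paths in the example are placeholders. Any options that followed the old path on that line are dropped, so check the result before use.

```python
def set_perl_path(script_text, perl_path):
    """Return script_text with its first line replaced by a '#!' line for perl_path.

    Only a first line that already starts with '#!' is replaced; otherwise the
    text is returned unchanged.
    """
    first, sep, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        return script_text
    return "#!" + perl_path + sep + rest


# Placeholder script text and paths, for illustration only.
original = "#!c:/perl/bin/perl.exe\nprint 1;\n"
print(set_perl_path(original, "d:/tools/perl/bin/perl.exe"))
```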
Library only installation
Download the latest version of the installation and unzip the archive to a temp folder.

The library only installation contains the following directory structure:
  • thegpm-xhunter
    • cache
    • fasta
    • gpm
      • archive
    • lib
    • pics
    • styles
    • tandem
      • archive
        • pics
      • methods
    • thegpm-cgi
Note: The ASL files themselves are very large and are not included in the installation package. Download the library (hlf) files for your taxonomies of interest here and move them to the lib folder.
  1. Move the entire 'thegpm-xhunter' folder to C:\ (Windows) or /var/www/ (Linux), as the case may be. Ensure the locations of the directories exactly match their specifications in httpd.conf shown above.
  2. Edit the file defines.pl found in 'thegpm-cgi' folder. If the service will be accessed from other computers, replace 'localhost' in the function 'get_server_name()' with the web address of the server. You may also want to change the numeric part of the value returned in the function get_gpm_number(). The value returned from this function is the prefix of the output file names generated by X! Hunter searches.
  3. Edit the index.html file. Change 'localhost' to the web address of your server.
  4. Edit the file footnote.js found in the 'tandem' folder to specify a 'hosted by' message to appear at the bottom of the display pages.
  5. Edit the file taxonomy.xml found in the 'tandem' folder to add the relative addresses (URLs) of the ASL files. The new entries are specified with the attribute format="spectrum" in the <file> tag.

    In the sample entry below, X! Hunter searches will use all "spectrum" and "peptide" files, while X! Tandem searches only use the "peptide" files. Notes:
    1. X! Hunter requires the fasta.pro files to obtain full protein sequences.
    2. Database searches do not use this taxonomy mechanism.
    <taxon label="yeast">
    	<file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
    	<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
    	<file format="peptide" URL="../fasta/scd.fasta.pro" />
    	<file format="peptide" URL="../fasta/scd_1.fasta.pro" />
    	<file format="peptide" URL="../fasta/crap.fasta.pro" />
    </taxon>
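    To check which files a taxon entry exposes for each format, the entry can be parsed with a standard XML library. This is a minimal sketch using Python's xml.etree.ElementTree on a shortened copy of the sample above; the function name is hypothetical, and a real taxonomy.xml may wrap its taxon elements in an outer element that is not shown here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<taxon label="yeast">
  <file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
  <file format="peptide" URL="../fasta/scd.fasta.pro" />
</taxon>
"""


def files_by_format(taxon_xml):
    """Group the URL attributes of <file> elements by their format attribute."""
    root = ET.fromstring(taxon_xml)
    grouped = {}
    for node in root.iter("file"):
        grouped.setdefault(node.get("format"), []).append(node.get("URL"))
    return grouped


print(files_by_format(SAMPLE))
```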
    
To test the functionality, open a browser and point it to http://localhost/index.html. Once again, you may substitute the web address of your server for 'localhost'. Browse to a spectra file, set the search parameters and click the 'Find models' button to start a search as described in the Search multiple spectra section above.

Keeping up to date
Each month the public X! Hunter databases are updated with GPMDB entries created since the last incremental build, and from the databases new library files are created as described above. The updated library files can be downloaded from the GPMO ftp site.

Library and database installation
This section is for users who wish to create libraries and databases containing results from their own installation of GPMDB. Therefore the user must have installed GPMDB and at least one GPM server providing spectra results to it. For help with these installations, see gpmdb and gpm.

Once GPMDB and GPM are installed, the X! Hunter web files can be installed and the X! Hunter databases can be created and populated, although the population can only be performed once GPM results have been added to GPMDB.

Follow the instructions in the web server installation section above to install the Apache web server and set the configuration required for X! Hunter.

Follow the instructions in the perl installation section above to install perl.

Install the DBI and DBD Perl modules needed for database communication.
  • Windows:
    1. Open a command prompt and enter the following commands to install DBI and DBD-mysql:
      	c:\> cd perl\bin
      	c:\perl\bin> ppm
      	ppm> install DBI
      	ppm> install DBD-mysql
      
    See here for pointers.
  • Linux:
    1. Download DBI and DBD perl modules and extract them to a temporary folder.
    2. From the command prompt, enter the following commands in each extracted module folder to install DBI and DBD-mysql:
      	    perl Makefile.PL
      	    make
      	    make test
      	    make install
      
    Read the documentation on the DBI and DBD-mysql pages for pointers.
Download the latest version of the installation and unzip the archive to a temp folder.

This installation contains the following directory structure:
  • thegpm-xhunter
    • assembling software
      • xcurator
        • bin
        • fasta
        • lib
    • cache
    • docs
      • db
      • pics
      • svg
    • fasta
    • gpm
      • archive
    • lib
    • pics
    • scripts
    • taxon_source_[nomods|withmods] (multiple)
    • styles
    • tandem
      • archive
        • pics
      • methods
    • thegpm-cgi
Note: The ASL files themselves are very large and are not included in the installation package. Download the library (hlf) files for your taxonomies of interest here and move them to the lib folder.
  1. Move the entire 'thegpm-xhunter' folder to C:\ (Windows) or /var/www/ (Linux), as the case may be. Ensure the locations of the directories exactly match their specifications in httpd.conf shown above.
  2. Edit the file defines.pl found in 'thegpm-cgi' folder. If the service will be accessed from other computers, replace 'localhost' in the function 'get_server_name()' with the web address of the server. You may also want to change the numeric part of the value returned in the function get_gpm_number(). The value returned from this function is the prefix of the output file names generated by X! Hunter searches. This value must be different than the value returned by this function in the current GPM installation.
  3. Edit the index.html file. Change 'localhost' to the web address of your server.
  4. Edit the file footnote.js found in the 'tandem' folder to specify a 'hosted by' message to appear at the bottom of the display pages.
  5. Edit the file dbcommon.pl found in 'thegpm-cgi'. Change 'xhunterhost' to the name of the server in GetXHunterHost() and 'gpmdbhost' to the name of the GPMDB server in GetGPMDBHost().
  6. Edit the file taxonomy.xml found in the 'tandem' folder to add the relative addresses (URLs) of the ASL files. The new entries are specified with the attribute format="spectrum" in the <file> tag.

    In the sample entry below, X! Hunter searches will use all "spectrum" and "peptide" files, while X! Tandem searches only use the "peptide" files. Notes:
    1. X! Hunter requires the fasta.pro files to obtain full protein sequences.
    2. Database searches do not use this taxonomy mechanism.
    <taxon label="yeast">
    	<file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
    	<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
    	<file format="peptide" URL="../fasta/scd.fasta.pro" />
    	<file format="peptide" URL="../fasta/scd_1.fasta.pro" />
    	<file format="peptide" URL="../fasta/crap.fasta.pro" />
    </taxon>
    
  7. Copy and paste the following code at the end of the current copy of popGPMDB.pl in your GPMDB installation.
    addto_hunter();
    
    ## Add this result to the XHUNTER GPMDBResult table.
    ## First check whether this resultid has already been added and set to 1,
    ## in case this population happened while xhunter_precomposite was running:
    ## when that script finishes it takes ALL the resultids from
    ## result, adds them to XHUNTER and sets added=1.
    ## An existing row is reset to added=0; otherwise a new row is inserted.
    
    sub addto_hunter(){
    	hunter_connect();
    	$sql = "SELECT resultid from GPMDBResult WHERE resultid = ?";
    	my $sth_hunter = $dbh_hunter->prepare($sql);
    	my $rc = $sth_hunter->execute($resultid);
    	if($rc > 0){
    		$sql = "UPDATE GPMDBResult SET added = 0 WHERE resultid = ?";
    		$sth_hunter = $dbh_hunter->prepare($sql);
    		$sth_hunter->execute($resultid);
    	}
    	else{
    		$sql = "INSERT INTO GPMDBResult(resultid,file,added) values(?,?,?)";
    		$sth_hunter = $dbh_hunter->prepare($sql);
    		$sth_hunter->execute($resultid,$file,0);
    	}
    	$sth_hunter->finish();
    	hunter_disconnect();
    }
    
    
    This extra code updates the X! Hunter database with the names of the files that have been added to GPMDB for future populations of the xhunter and taxonomy databases.
  8. Edit the xhunter_register.html file found in the 'tandem' folder. Change the administrator email address to something that makes sense for your installation.
  9. Edit the xhunter_forgotpassword.html file found in the 'tandem' folder. Change the administrator email address to something that makes sense for your installation.

Database creation

The next step is to create and populate the xhunter and taxonomy databases.

The taxon_source_[nomods|withmods] folders will contain the 'pre-composite' files that are created with xhunter_precomposite.pl using the results from your GPMDB site. For every taxonomy that will be part of the database and search system, there needs to be a set of these folders and matching entries in the xhunter database describing the following:
  • taxonomy
  • source database or web site
  • perl regular expression that describes the accession numbering style
This distribution contains the taxonomy entries in the database required for many institutions, although custom and additional standard sources can be added.
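As an illustration of the accession regular expression, the sketch below tests accession strings against the pattern used for the human_ensembl taxonomy. It is an approximation written for Python, whose re module has no POSIX [[:digit:]] class, so [0-9] is substituted. The function name and the sample accession ENSP00000354587 are made up for the example.

```python
import re

# Python's re module has no POSIX [[:digit:]] class, so [0-9] is used instead.
ENSEMBL_PROTEIN = re.compile(r"^ENSP[0-9]+$")


def matches_accession_style(accession, pattern=ENSEMBL_PROTEIN):
    """Return True if the accession matches the taxonomy's numbering style."""
    return pattern.match(accession) is not None


print(matches_accession_style("ENSP00000354587"))
print(matches_accession_style("At1g01090.1"))
```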

If you want to remove some of the default taxonomies:
  1. Edit the file create_xhunter.sql, and for each unwanted taxonomy, remove the lines that look like this:
    insert into taxon(taxonid,name,dbname,regex,species) 
    	values(NULL,'human_ensembl','xhunter_human_ensembl','^ENSP[[:digit:]]+$','H. sapiens');
    
  2. Delete the unwanted /scripts/create_xhunter_taxon_source.sql files.
  3. Edit the file /tandem/xhunter_species.js, removing the lines that contain taxonomies that were not used.
  4. The perl scripts (xhunter_precomposite.pl, xhunter_populate.pl, xhunter_incremental.pl, xhunter_composite.pl) also need to be edited to remove unwanted taxonomies and to control which taxonomies are processed when the scripts are run. This is explained in the controlling taxonomies section.
The next step is to create the empty database structures.
  1. Open a command prompt and navigate to /thegpm-xhunter/scripts.
  2. At the command prompt, type mysql -u user -ppass replacing 'user' and 'pass' with your username and password.
  3. Enter the following command:
    mysql> source create_xhunter.sql
    
    This will create the xhunter database with the entries to track which files have been added, the statistics for each build and the taxonomic specifics.
  4. Enter the following command for each taxonomy you are creating:
    mysql> source create_xhunter_taxon_source.sql
    
    replacing 'taxon' with the name of the taxonomy and 'source' with the name of the source database or web site. For example:
    mysql> source create_xhunter_mouse_ensembl.sql
    mysql> source create_xhunter_yeast_sgd.sql
    
    This will create the empty taxonomy databases which will contain the entries for the consensus spectra and the spectra that went into making the consensus spectra, as well as the post translational modifications if any.
The next step is to create the permissions for the new databases for the admin user. You can either use an existing user or create a new one. Either way, at the mysql command prompt, enter the following commands:
mysql> GRANT ALL PRIVILEGES ON XHUNTER.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON XHUNTER_taxon_source.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
Replace 'adminuser' with the new user name or an existing admin user, and 'password' with the password. Replace 'host' with localhost if the population and admin functions will be performed on the same machine as the database, or with the IP address of the machine from which the admin functions will be performed. If both are needed, enter the statements once for each host. The second statement needs to be entered for each taxonomy database that was created above. Now create the permissions for the casual users:
mysql> GRANT SELECT ON XHUNTER.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
mysql> GRANT SELECT ON XHUNTER_taxon_source.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
Replace 'casualuser' with the new user name or an existing restricted user, 'host' with the IP addresses from which connections will be allowed, and 'password' with the password. The second statement needs to be entered for each taxonomy database that was created above.
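
Because one GRANT statement is needed per taxonomy database, typing them by hand is error-prone when many taxonomies are installed. The sketch below simply prints the statements so they can be reviewed and pasted at the mysql prompt. It is an illustration only: the function name is hypothetical, the statement syntax mirrors the commands shown above, and no escaping is applied, so it assumes the user name, host and password contain no quote characters.

```python
def grant_statements(user, host, password, taxon_sources, privilege="ALL PRIVILEGES"):
    """Build one GRANT statement for XHUNTER plus one per taxonomy database."""
    databases = ["XHUNTER"] + ["XHUNTER_" + name for name in taxon_sources]
    template = (
        "GRANT {privilege} ON {database}.* "
        "TO '{user}'@'{host}' IDENTIFIED BY '{password}';"
    )
    return [
        template.format(
            privilege=privilege,
            database=database,
            user=user,
            host=host,
            password=password,
        )
        for database in databases
    ]


# Read-only permissions for a casual user on two example taxonomy databases.
for statement in grant_statements(
    "casualuser", "localhost", "password",
    ["mouse_ensembl", "yeast_sgd"], privilege="SELECT",
):
    print(statement)
```

Calling the function without the privilege argument produces the ALL PRIVILEGES statements for the admin user.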

Edit the file called dbcommon.pl. Enter the casualuser name and password in GetUser() and GetPass(). Enter the adminuser name and password in GetRUser() and GetRPass(). For both admin and casual users the host can be '%' (percent sign) if connections are to be allowed from anywhere, although this is not recommended for the admin user.

Once the empty database structures have been created, the databases are populated and ASL files are created as follows:
  1. 'Pre-composite' files are created using current state of your GPMDB
  2. 'Composite' files are created using 'pre-composite' files
  3. X! Hunter databases populated using 'composite' information
  4. Library files created using 'composite' information
Controlling taxonomies

At times, it may be necessary to process only selected taxonomies rather than all existing taxonomies. Also, some of the default taxonomies that are included with an X! Hunter installation may not be of interest to the researchers. The population and file creation perl scripts (xhunter_precomposite.pl, xhunter_populate.pl, xhunter_incremental.pl, xhunter_composite.pl) need to be edited to control which taxonomies are included when they are run.

The following code controls this functionality:
my @taxonid=();
##get all the taxonomies in the xhunter database
hunter_connect();
my $select_taxon = "SELECT taxonid FROM taxon";
my $sth_select_taxon = $dbh_hunter->prepare($select_taxon);
$sth_select_taxon->execute();
my @result_taxon;
while(@result_taxon = $sth_select_taxon->fetchrow_array()){
	push(@taxonid,$result_taxon[0]);
}
$sth_select_taxon->finish();
hunter_disconnect();
This will process all the taxonomies from the Taxon table. To select specific taxonomies, change the SQL query as follows:
my $select_taxon = "SELECT taxonid FROM taxon where name = 'human_ensembl'";
or
my $select_taxon = "SELECT taxonid FROM taxon where name in('human_ensembl','at_tigr')";
One issue that exists with not using all available taxonomies is that the result files may have contained proteins for the taxonomies that were left out. These result files have been marked as 'added' and if the neglected taxonomies are required in the future, those results will not be included in their database. The solution to this is to create a new build from scratch starting with 'pre-composite' file creation.
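
The effect of the restricted queries can be tried out without touching the real database. The following sketch uses an in-memory SQLite table as a stand-in for the taxon table in the xhunter MySQL database, and builds the IN clause from placeholders instead of editing the SQL string by hand. The function name and the three sample rows are assumptions made for illustration; the shipped perl scripts do not work this way.

```python
import sqlite3


def select_taxon_ids(conn, names=None):
    """Return taxonid values for the named taxonomies, or for all if names is empty."""
    if not names:
        rows = conn.execute("SELECT taxonid FROM taxon ORDER BY taxonid")
    else:
        # One '?' placeholder per requested taxonomy name.
        placeholders = ",".join("?" for _ in names)
        rows = conn.execute(
            "SELECT taxonid FROM taxon WHERE name IN (%s) ORDER BY taxonid" % placeholders,
            list(names),
        )
    return [row[0] for row in rows]


# In-memory stand-in for the taxon table, holding three sample taxonomies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxon (taxonid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO taxon(name) VALUES (?)",
    [("human_ensembl",), ("mouse_ensembl",), ("at_tigr",)],
)

print(select_taxon_ids(conn))                                # every taxonomy
print(select_taxon_ids(conn, ["human_ensembl", "at_tigr"]))  # selected taxonomies only
```

Keeping the taxonomy names in a list rather than in the SQL text makes it harder to introduce quoting mistakes when the selection changes.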

Pre-composite file creation

This step creates the 'pre-composite' files. These are BIOML formatted files that contain the ten best scoring spectra for each peptide at each charge state (1,2 and 3) from each protein within the current taxonomy. This step can take a long time. For the main X! Hunter site, where there are close to 14 million peptide entries, it takes close to a week and creates about 13 GB of 'pre-composite' files.
  1. Open a command prompt and navigate to /thegpm-xhunter/assembling software.
  2. At the command prompt, enter the following:
    perl xhunter_precomposite.pl
    
    The script will continue to run, printing some status messages, until all the taxonomies have been processed. The files will be created in the taxon_source_[nomods|withmods] folders.
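
The selection that the 'pre-composite' step performs, keeping the ten best scoring spectra for each peptide at each charge state, can be pictured with the following sketch. It is not the code used by xhunter_precomposite.pl. The record layout is hypothetical, and it assumes a numeric score for which higher is better; if your results are ranked by expectation value, the ordering would need to be reversed.

```python
import heapq
from collections import defaultdict


def top_spectra(records, limit=10, charges=(1, 2, 3)):
    """Group spectra by (protein, peptide, charge) and keep the best `limit` per group."""
    groups = defaultdict(list)
    for record in records:
        if record["charge"] in charges:
            key = (record["protein"], record["peptide"], record["charge"])
            groups[key].append(record)
    # nlargest returns the records in descending score order.
    return {
        key: heapq.nlargest(limit, group, key=lambda r: r["score"])
        for key, group in groups.items()
    }


# Twelve hypothetical spectra for one peptide at charge 2, plus one at charge 4.
records = [
    {"protein": "P1", "peptide": "PEPTIDEK", "charge": 2, "score": float(i)}
    for i in range(12)
]
records.append({"protein": "P1", "peptide": "PEPTIDEK", "charge": 4, "score": 99.0})

best = top_spectra(records)
for key, spectra in best.items():
    print(key, len(spectra))
```

In this example the charge 4 spectrum is ignored and the two lowest scoring charge 2 spectra are dropped.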
Composite file creation and database population

The next step is to create the 'composite' files from the newly created 'pre-composite' files and to populate the taxonomy databases.
  1. At the command prompt, enter the following:
    perl xhunter_populate.pl
    
    The script will continue to run, printing some status messages, until all the taxonomies have been processed. The files will be created in the 'assembling software' folder.
Library creation

The final step is to create the ASL files from the 'composite' files. This is a multi-step process that uses both the X! Hunter and X! Curator programs. X! Curator is a C++ program that removes redundancies and validates the entries. The parameters used by the X! Curator program are contained in XML files, much like the input and taxonomy XML files used to control X! Tandem and X! Hunter searches. Click here for an example. Follow the steps below to create the new library:
  1. Edit the taxonomy.xml file in the 'tandem' folder to use the new 'composite' file(s):
    <taxon label="taxon">
    	<file format="spectrum" URL="../lib/taxon_source_nomods.xml" />
    	<file format="spectrum" URL="../lib/taxon_source_withmods.xml" />
    	<file format="peptide" URL="../fasta/taxon_source.fasta.pro" />
    </taxon>
    
  2. Run X! Hunter from the command line using the 'composite' file(s) that were created using xhunter_populate.pl or xhunter_composite.pl. This will create file(s) with the same name as the 'composite' file(s) with '.hlf' appended.
    xhunter.exe input.xml
    
  3. Copy the new '.hlf' file(s) and the 'composite' files to the /assembling software/xcurator/lib folder.
  4. Edit the input.xml file in /assembling software/xcurator/bin as follows:
    The path to the taxonomy.xml file. You probably won't need to edit this value.
    <note type="input" label="list path, taxonomy information">taxonomy.xml</note>
    
    The name of the taxonomy within taxonomy.xml which was the source of the .hlf file creation
    when X! Hunter was initially run.
    <note type="input" label="protein, taxon">taxon</note>
    
    The path of the ASL file to create.
    <note type="input" label="library, append path">../lib/taxon_source_20.hlf</note>
    
    The path to the fasta sequence file that was the source for the identifications 
    from which the composite file was initially created.
    <note type="input" label="sequences, list path">../fasta/taxon.fasta.pro</note>
    
  5. Edit taxonomy.xml in /assembling software/xcurator/bin so that the taxon label matches the value of 'protein, taxon' in the input.xml file, and the file entries name the 'composite' files that were used to create the .hlf file when X! Hunter was initially run:
    <taxon label="taxon">
    	<file format="spectrum" URL="../lib/taxon_source_nomods.xml" />
    	<file format="spectrum" URL="../lib/taxon_source_withmods.xml" />
    </taxon>
    
  6. Run X! Curator from the command line as follows:
    											
    xcurator.exe input.xml
    
  7. Move the new ASL file to the 'thegpm-xhunter/lib' folder.
  8. Edit the taxonomy.xml file in the 'tandem' folder to use the new ASL file:
    <taxon label="taxon">
    	<file format="spectrum" URL="../lib/taxon_source_20.hlf" />
    	<file format="peptide" URL="../fasta/taxon_source.fasta.pro" />
    </taxon>
    
    The new file can now be used by X! Hunter.
Note: This process also creates an X! P3 file with the same name as the ASL file with '.fasta' appended.
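
Since the library creation steps involve editing taxonomy.xml several times, a quick check of which files a taxon label currently points to can catch a stale entry before a search is run. The sketch below is an illustration only. The function name is hypothetical, the example document and its file names are made up, and the bioml root element is an assumption; the code looks for taxon elements anywhere in the document, so it does not depend on the root element's name.

```python
import xml.etree.ElementTree as ET


def taxon_files(xml_text, label):
    """Return (format, URL) pairs for the <file> entries of the taxon with this label."""
    root = ET.fromstring(xml_text)
    for taxon in root.iter("taxon"):
        if taxon.get("label") == label:
            return [(f.get("format"), f.get("URL")) for f in taxon.findall("file")]
    return []


# Hypothetical taxonomy.xml content after the final step of library creation.
EXAMPLE = """<?xml version="1.0"?>
<bioml label="x! taxon-to-file matching list">
  <taxon label="yeast">
    <file format="spectrum" URL="../lib/yeast_sgd_20.hlf" />
    <file format="peptide" URL="../fasta/yeast_sgd.fasta.pro" />
  </taxon>
</bioml>"""

for file_format, url in taxon_files(EXAMPLE, "yeast"):
    print(file_format, url)
```

To check a real installation, read the contents of the taxonomy.xml file in the 'tandem' folder and pass them in place of EXAMPLE.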

To test the library functionality, open a browser and point it to http://localhost/index.html. Once again, you may substitute the web address of your server for 'localhost'. Browse to a spectra file, set the search parameters and click the 'Find models' button to start a search as described in the search multiple spectra section above.

To test the database search functionality, open a browser and point it to http://localhost/tandem/xhunter_singlespec.html. Once again, you may substitute the web address of your server for 'localhost'. Enter the fragment masses and intensities, set the search parameters and click the 'Find models' button to start a search as described in the search single spectrum section above.

To test the database browse functionality, open a browser and point it to http://localhost/tandem/xhunter_browse.html. Once again, you may substitute the web address of your server for 'localhost'. Enter the parameters, select the taxonomy and click the 'Find models' button to start the search as described in the browse spectra section above.

Incremental builds
It is possible to incrementally add results from your installation of GPMDB to the X! Hunter databases, as long as a full build has already been completed. This process requires three conditions:
  1. Tandem and/or P3 searches are run on a daily basis
  2. Local GPMDB is populated daily with Tandem and/or P3 search results
  3. New results in GPMDB are added to X! Hunter with 'added' flag set to '0'
The first step for an incremental build is to check all the peptides that belong to files with the 'added' flag set to '0' to see if any have better scores than identical peptides that exist in the taxon_source database. Any new or replaced peptide has its parent consensus spectrum marked as needing a recalculation. Once all the files are processed, the consensus spectra that were marked as needing a recalculation are processed and updated with the newly averaged mass and intensity pairs.
  1. At the command prompt, enter the following:
    perl xhunter_incremental.pl
    
    The script will continue to run, printing some status messages, until all the taxonomies have been processed.
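
The marking logic of the incremental step, where a new or better scoring peptide flags its consensus spectrum for recalculation, can be illustrated with a short sketch. This is a simplified picture and not the code in xhunter_incremental.pl. It keeps a single best score per (peptide, charge) key, treats that key as standing in for the parent consensus spectrum, and assumes that a higher score is better. All names and values are hypothetical.

```python
def merge_new_peptides(stored_scores, new_results):
    """Update stored best scores and return the keys whose consensus needs recalculating."""
    needs_recalculation = set()
    for key, score in new_results:
        # A peptide is taken up if it is new or beats the stored score.
        if key not in stored_scores or score > stored_scores[key]:
            stored_scores[key] = score
            needs_recalculation.add(key)
    return needs_recalculation


# Scores already in the taxon_source database (hypothetical values).
stored = {("PEPTIDEK", 2): 50.0}

# Peptides from result files whose 'added' flag is still '0'.
new_results = [
    (("PEPTIDEK", 2), 40.0),  # worse than the stored score: ignored
    (("PEPTIDEK", 2), 60.0),  # better: replaces the stored score
    (("NEWPEPR", 3), 10.0),   # not seen before: added
]

print(sorted(merge_new_peptides(stored, new_results)))
```

Only the flagged keys would then have their consensus spectra recalculated, which is what makes an incremental build cheaper than a full one.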
The next step is to create the 'composite' files from the entries in the database. Each protein accession for the current taxon_source database, along with all its distinct peptide domains, including modifications if they exist, is written to a new 'composite' file.
  1. At the command prompt, enter the following:
    perl xhunter_composite.pl
    
    The script will continue to run, printing some status messages, until all the taxonomies have been processed.
Finally, the ASL files are created from the newly created 'composite' files using the instructions in the library creation section above.
Custom Databases
To add your own databases, or databases that are not included with the X! Hunter system, follow these steps. They assume that results of this type have already been added successfully to the local GPMDB.

In the instructions below replace XHUNTER_TAXON_SOURCE with the taxonomy and source of the fasta file for the custom database. For example if the new taxonomy was cow and the source was ensembl, the entries would be XHUNTER_COW_ENSEMBL.
  1. Add database to taxonomy list search forms. Edit the file called /tandem/xhunter_species.js by copying an existing line and pasting it at the end of the file. Edit the newly added line for the new database:
    document.writeln("<option selected value=\"human\">H. sapiens (human)</option>");
    document.writeln("<option selected value=\"custom\">custom database</option>");
    
  2. Edit the /tandem/taxonomy.xml file by adding the new database using the existing format. Copy and paste an existing entry and make the required changes, ensuring that the value attribute in the xhunter_species.js file matches the label attribute in taxonomy.xml.
    <taxon label="yeast">
    	<file format="spectrum" URL="../lib/yeast_cmp_20.hlf" />
    	<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
    	<file format="peptide" URL="../fasta/scd.fasta.pro" />
    	<file format="peptide" URL="../fasta/scd_1.fasta.pro" />
    	<file format="peptide" URL="../fasta/crap.fasta.pro" />
    </taxon>
    <taxon label="custom">
    	<file format="spectrum" URL="../lib/taxon_source_cmp_20.hlf" />
    	<file format="spectrum" URL="../lib/crap_cmp_20.hlf" />
    	<file format="peptide" URL="../fasta/taxon_source.fasta.pro" />
    	<file format="peptide" URL="../fasta/crap.fasta.pro" />
    </taxon>
    
  3. Create empty folders named using the taxon_source_withmods and taxon_source_nomods naming convention. These are the folders that will contain the 'pre-composite' files. For a 'human' taxonomy with a 'custom' source, for example, that would be 'human_custom_nomods' and 'human_custom_withmods'.
  4. Add an entry to the Taxon table in the XHUNTER database. Open a command prompt, navigate to the /xhunter/scripts folder and type mysql -u user -ppass replacing 'user' and 'pass' with your username and password. Then type the following:
    mysql> \u XHUNTER
    mysql> insert into Taxon(taxonid,name,dbname,regex,species) 
    	values(NULL,'taxon_source','xhunter_taxon_source','regular expression','C. ustom');
    
    The 'regular expression' needs to be replaced with an actual SQL regular expression that describes the naming convention of the accession numbers for the custom database. Help can be found by looking at the examples in the existing entries or by checking the MySQL documentation.
  5. Create the sql script for custom database creation.
    • Copy one of the existing create_xhunter_taxon_source.sql files and rename to match your custom database.
    • Edit the new script by replacing the database name with the new custom database name:
      drop database if exists XHUNTER_TAXON_SOURCE;
      create database XHUNTER_TAXON_SOURCE;
      use XHUNTER_TAXON_SOURCE;
      
    • Create the empty database structure by typing the following at the mysql command prompt:
      mysql> source create_xhunter_taxon_source.sql
      
  6. Add permissions for accessing new database. At the mysql command prompt, type the following:
    mysql> GRANT ALL PRIVILEGES ON XHUNTER.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
    mysql> GRANT ALL PRIVILEGES ON XHUNTER_taxon_source.* TO 'adminuser'@'host' IDENTIFIED BY 'password';
    
    Replace 'adminuser' with the new user name or an existing admin user, and 'password' with the password. Replace 'host' with localhost if the population and admin functions will be performed on the same machine as the database, or with the IP address of the machine from which the admin functions will be performed. If both are needed, enter the statements once for each host. Now create the permissions for the casual users:
    mysql> GRANT SELECT ON XHUNTER.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
    mysql> GRANT SELECT ON XHUNTER_taxon_source.* TO 'casualuser'@'host' IDENTIFIED BY 'password';
    
    Replace 'casualuser' with the new user name or an existing restricted user, 'host' with the IP addresses from which connections will be allowed, and 'password' with the password.
  7. Edit the perl scripts (xhunter_browse_mh.pl, xhunter_browse_peptide.pl and xhunter_singlespec.pl) that need to know the name and fasta source of the new database by adding the new taxonomy to the URL hash:
    $URL{YEAST_SGD} = "../fasta/scd.fasta.pro";
    $URL{TAXON_SOURCE} = "../fasta/taxon_source.fasta.pro";
    
Once the databases are set up and the access permissions in place, the database is populated and the ASL file is created as follows:
  1. 'Pre-composite' files are created using current state of your GPMDB
  2. 'Composite' files are created using 'pre-composite' files
  3. X! Hunter database populated using 'composite' information
  4. Library file created using 'composite' information
Pre-composite file creation

This step creates the 'pre-composite' files. These are BIOML formatted files that contain the ten best scoring spectra for each peptide at each charge state (1,2 and 3) from each protein within the current taxonomy. This step can take a long time. For the main X! Hunter site, where there are close to 14 million peptide entries, it takes close to a week and creates about 13 GB of 'pre-composite' files.
  1. Open a command prompt and navigate to /thegpm-xhunter/assembling software.
  2. At the command prompt, enter the following:
    perl xhunter_precomposite.pl
    
    The script will continue to run, printing some status messages, until all the taxonomies have been processed. The files will be created in the taxon_source_[nomods|withmods] folders.
Composite file creation and database population

The next step is to create the 'composite' files from the newly created 'pre-composite' files and to populate the taxonomy databases.
  1. At the command prompt, enter the following:
    perl xhunter_populate.pl
    
    The script will continue to run, printing some status messages, until all the taxonomies have been processed. The files will be created in the 'assembling software' folder.
Library creation

Finally, the ASL files are created from the newly created 'composite' files using the instructions in the library creation section above.
Getting help

If you have any problems installing the X! Hunter software, please contact us or visit the message board.