ZFIN Documentation:
System, Web Server, and Database Administration
This documentation is preserved for historical purposes, and NOT updated. Updated documentation is found here: http://almost.zfin.org/doc

$Id: admin.html,v 1.60 2007-12-27 23:22:05 peirans Exp $

Back to Table of Contents
Previous Section

This document is one of several that describe the Zebrafish Information Network, or ZFIN. This document focuses on administrative tasks related to maintaining ZFIN machines, web servers, and databases.

Database Administration

ZFIN uses the Informix database management system. Everything described in this section is relevant to Informix. Informix is also used in the CIS department for instructional purposes. The instructional copy of Informix is mostly maintained by CIS folks (Paul Bloch and Chris Wilson as of 2001/06). Before IBM bought Informix in 2001, ZFIN got upgrades for CIS. Now, CIS has gotten its own free copies of Informix through the IBM Scholars program.

Products and Licenses

Informix Dynamic Server

This is the Informix server/engine. The name is often abbreviated as IDS. IDS is the name applied to anything in the 9.x line of Informix products. ZFIN came into the Informix world at around 9.13, so this is all we have ever had.

In the summer of 2003 we switched from IDS 9.21/32 bit to IDS 9.30/64 bits. In the spring of 2004 we upgraded from IDS 9.30 to IDS 9.40.

Informix Connect and ISA currently come bundled with IDS.

Client SDK

Client SDK contains several application programming interfaces (APIs) needed to develop applications for Informix servers. These include

This used to be available for free from Informix and then IBM, but IBM has resumed charging for it.

ZFIN supports both a 32 bit and 64 bit versions of Client SDK. The 64-bit version is installed in the same directory as the IDS engine. The 32 bit version is available through the

/private/apps/Informix/ClientSDK.32bits

sym link for 32 bit clients. The 32-bit Client SDK is currently used by PERL, and the 64-bit Client SDK is used by everything else, including Apache.

Since the 64 bit Client SDK is installed in the default INFORMIXDIR, you will get that version by default. If you ever want to use the 32 bit Client SDK, you will need to redefine your INFORMIXDIR:

setenv INFORMIXDIR /private/apps/Informix/ClientSDK.32bits

Informix Server Administrator (ISA)

Informix Server Administrator (ISA) is a web based tool for managing Informix engines. It is a free product that became available in 2000 and that we installed at ZFIN in 2001/11. It is basically a graphical user interface for all the on* commands. It makes monitoring the engine much easier.
To invoke ISA, goto these URLs
Machine URL
Production Server helix.cs.uoregon.edu:2006
Development Server embryonix.cs.uoregon.edu:2006

These URLs are only accessible from a very limited set of IP addresses

You will need to "log in" to ISA in order to use it. The login for ISA is "zfinisa". This is a read-only login. If you need the password, talk to Dave C or Sierra. (ISA also supports logins that allow you to update things in Informix. However, we don't use this capability at ZFIN. All changes are done through the shell.)

Verity Text Search Datablade

This datablade supports search engine like searches of text fields in Informix. This was evaluated for inclusion in ZFIN by Brock Sprunger in 2001. We had all sorts of trouble with it badly hanging the Informix engine. One of the problems was that if a table had an HTML column in it, then adding Verity indexes to it would certainly cause the server to crash. However, even after changing HTML columns to CLOBs, Brock could still reliably crash the engine (although not quite as easily).

At the 2001 IBM Informix Conference, I was told 2 things:

  1. The handling of HTML data types would greatly improve in IDS 9.30. In theory this might let us keep our columns as HTML.
  2. Datablades tend to play fast and loose with Informix engine user stack space. Error checking for stack overflow is sporadic at best. When we were doing the Verity testing user stack space was limited to 32K. We have since increased it to 128K. It is quite possible that the space required by both Web DataBlade, which defines the HTML datatype, and the Verity Text Search DataBlade, would always overflow the 32K limit. It is also possible that Verity on its own, could also overflow a 32K limit.

The plan was to reevaluate Verity when we went to IDS 9.30. However, at the 2002 IBM Data Management Conference I learned that Verity was no longer sold by IBM. They could not work out an agreement with Verity. This was confirmed by the most recent IBM-Informix White Paper, which mentioned several datablades, but not Verity at all.

I also was later told by someone in IBM, that Verity support would go away entirely at the end of 2003.

So, in the fall of 2002, ZFIN decided that Verity was not a part of its future, since IBM was no longer promoting it or enhancing it.

Full text searches of the database and the web site became a big issues in 2003 and 2004. The Uniquery work addresses this, as does FogBUGZ case 461.

Licenses

Before Informx 9.4, each Informix product that ZFIN used had a serial number that had to be used when installing the product and when dealing with IBM support. However, starting with 9.4 there are no more serial numbers. The current license is shown in bold.
Production Licenses
Product Version Went Live Serial #
IBM Informix Dynamic Server
  IBM Informix Dynamic Server
  IBM Informix Dynamic Server
  Informix Internet Foundation
9.40.FC5
9.40.FC3
9.30.FC3 (64-bit)
9.21.UC4-1 (32-bit)
2004/11/30
2004/04/16
2003/08/19
2001/03/30


ACP#J267193
AAD#J118100
Client SDK, 64-bit
  Client SDK, 64-bit
2.81.FC2R1
2.70.FC5
2004/04/16
2003/08/19

ACP#J267193
Client SDK, 32-bit
  Client SDK
  Client SDK
  Client SDK (free)
2.90.UC1
2.81.UC3
2.70.UC5
2.60.UC1
2005/03/23
2004/04/16
2003/08/19
2001/03/30


ACP#J267193
ACN#J334914
Informix Server Administrator (ISA)
  Informix Server Administrator (ISA) (bundled)
  Informix Server Administrator (ISA) (free)
1.50.UC3
1.50.UC2
1.40.UC3
2004/11/30
2004/04/16
2002/01/29


ACN#J334914
Web Datablade
  Web Datablade
  Web Datablade
4.13.FC3
4.13.FC2 (64-bit)
4.13.UC2 (32-bit)
2004/11/30
2003/08/14
2003/04/22

ACP#J267193
AAD#J118100
JDBC
JDBC
  JDBC
3.00.JC1
2.21.JC6
2.21.JC2
2005/03/09
2004/11/30
2002/08/28

Development Licenses
Product Version Went Live Serial #
IBM Informix Dynamic Server
  IBM Informix Dynamic Server
  IBM Informix Dynamic Server
  Informix Internet Foundation
9.40.FC5
9.40.FC3
9.30.FC3 (64-bit)
9.21.UC4-1 (32-bit)
2004/11/07
2004/03/18
2003/06/29
2001/02/25


ACP#J267193
AAD#J120526
Client SDK, 64-bit
  Replaced: Client SDK
2.81.FC2R1
2.70.FC5
2004/03/18
2003/06/29

ACP#J267193
Client SDK, 32-bit
  Client SDK
  Client SDK
  Client SDK (free)
2.90.UC1
2.81.UC3
2.70.UC5
2.60.UC1
2005/03/08
2004/03/18
2003/06/29
2001/02/25


ACP#J267193
ACN#J334914
Informix Server Administrator (ISA)
  Informix Server Administrator (ISA) (bundled)
  Informix Server Administrator (ISA) (free)
1.50.UC3
1.50.UC2
1.40.UC3
2004/11/07
2004/03/18
2001/11/29


ACN#J334914
Web Datablade
  Web Datablade
  Web Datablade
4.13.FC3
4.13.FC2 (64-bit)
4.13.UC2 (32-bit)
2004/11/07
2003/06/29
2003/04/22

ACP#J267193
AAD#J120526
JDBC
JDBC
  JDBC
3.00.JC1
2.21.JC6
2.21.JC2
2005/03/09
2004/11/07
2002/08/28

Dealing With IBM Corporation

History

The original title of this section was "Dealing With Informix Corporation". IBM purchased Informix on 2001/07/01. This did not change how we got support until 2002/07. Our support contracts with Informix had been from July to July and during that first year we just renewed our old support agreement. However, by 2002/07, IBM was actively migrating former Informix customers to their Passport Advantage support program. It took ZFIN until 2002/12 to actually get the kinks worked out of the new setup.

The information in the remainiing parts of this section may not be entirely reliable. I am doing my best to update them as I find out how things are done under Passport Advantage. However, some of them may still describe the old Informix support.

Prior to the IBM buyout, ZFIN and CIS got their Informix licenses through the Informix Software Grant Program. The two departments paid a total of $1500/year for maintenance and that is all. Diana Tetro (diana@prospero.uoregon.edu) from Purchasing in Molecular Biology handled the contract on the UO end.

Our primary contact at Informix for the grants program was Diane Nicola. As near as anyone can tell, Diane has since left IBM. Verdette Wilkins, 650-926-6079, was the IBM employee (a former Informix employee as well) who helped ZFIN migrate from Informix support to Passport Advantage.

IBM Scholars Program

The IBM Scholars Program allows registered scholars to get almost every piece of software they make for free. This includes Informix products (and DB2 and WebSphere products if we ever get to that). The downside of the scholars program is that you only get web support for anything you get through it. Also, not all versions of all products are available through the Scholars Program. See below.

ZFIN made the decision not to use the Scholars Program for support reasons. ZFIN uses IBM's Passport Advantage support for all IBM products used in the ZFIN web site.

We do this for several reasons:

The URL for the Scholars Program is

http://www-3.ibm.com/software/info/university/

Once there you can get software by clicking on the "Downloads & CDs" link, and then on the "Download now" button. You will then need to login.
Username clements@cs.uoregon.edu
Password See Dave C.

CIS decided to go with the Scholars Program for all of its Informix products and support. Therefore, ZFIN is no longer directly involved in getting them new copies of Informix products or in getting them support.

IBM Passport Advantage Support

Passport Advantage is IBM's paid support program. ZFIN pays about $17,500/year to support its 2 Informix engines, Web Datablade licenses for 6 CPUs, and Client SDKs for 20 seats.

The contract for the Informix engines (about $10K of the total) expires on 07/31 of every year. We renew it through Diana Tetro in the MolBio purchasing office. Each of the 2 licenses is for 30 users. As of 2002/12 all of the paperwork for this is in Dave C's office in his filing cabinet in a folder titled "Informix".

ZFIN uses Passport Advantage for two things:

  1. Bug Reporting: This is done over the phone.
  2. Upgrades: This is done over the web.

In order to do either of these things you will need some information.
Contract Number (2002-2004) 111669
Passport Advantage Site Number (2002-2003) 7377672
IBM Customer Number 9102304 (formerly 9095156, not sure why it changed)

Our contact person for problems with our Passport Advantage support is Chrissy Dowds, 1-888-278-5586, x32714 (as of 2004/06).

The URL for Passport Advantage is

http://www.ibm.com/software/passportadvantage

Once there, to login, click on the "Customer Site" link on the right hand side. This will bring up a login screen:
Username zfin
Password See Dave C.

From there I think the 2 interesting links are the "Account Information" and "Manage Your Account" links.

Bug Reporting

ZFIN traditionally tries to deal with bugs by looking for workarounds. If we can find a workaround we generally use it and do not report the bug to IBM. However, if you can't find a workaround, you will have to contact IBM Informix support.

Before contacting Informix support you should have at least ZFIN's IBM Passport Advantage Agreement and Site numbers. In addition, you may also need the serial number of the product you are having trouble with. I don't know yet if IBM requires the serial number or not.

It is also often helpful to have an e-mail describing the problem typed up and ready to send.

The best way to contact support is through the phone. Their support has a huge phone menu, almost all of which you can ignore. The phone menu is documented in the Customer Service Quick Reference Guide. To open a case:

  1. Dial 1 800 274 8184.
  2. Follow the instructions.

You will then speak with someone who is familiar with IDS. If they can't solve your problem immediately they will open a case and give you a case number. Hang on to that case number. You will need it in every subsequent communication about the problem. They will also give you their e-mail address and typically ask you to send them some information.

Support will typically come up with workarounds for problems rather than fix them outright. Occasionally they will recommend that you upgrade. No matter what, they will not close the case until you agree to it.

Upgrades

Upgrades are available through the Passport Advantage web site.

  1. Log on as described in the IBM Passport Advantage Support section above.
  2. Click on "Account Information".
  3. Click on "Software Download".
  4. This brings up a form that is already filled in. I recommend changing one item in the form: limit the platform to Solaris. Click on the submit button.
  5. Select the item you want to download.

Online Resources

The pages are good sources of information:
Informix documentation http://www-3.ibm.com/software/data/informix/pubs/library/
IBM Informix Homepage http://www.ibm.com/informix
IIUG Software Repository http://www.iiug.org/software/index.html
Informix Answers Online http://www.ibm.com/software/data/informix/pubs/library
Informix Product Family News http://www.ibm.com/software/data/informix/news
Informix Today - Subscription & Archive http://www-306.ibm.com/software/data/informix/informixtoday
IBM Informix Support Site http://www.ibm.com/software/data/informix/support
Upcoming IBM Conferences Worldwide http://www.ibm.com/services/learning/conf

Backups

ZFIN has three types of backups. The first two go hand in hand with each other.

  1. Full nightly dump of production server
  2. Logical log dumps as the logs become full
  3. Unloads of individual databases

Full Nightly Dump of Production Server

A cron job runs under user informix on production every night. It does a full dump of the production server. This dump calls ontape to dump the entire production server. The output of this dump can then be used, along with the logical log dumps, to restore the production server from scratch. The dump includes everything in the server, including the data dictionary, the root dbspace, and any databases defined in production.

Since ZFIN was migrated to Informix in 2000/03, we have not yet (as of 2004/10) had to use these nightly dumps.

The cron job that creates the nightly dump is

server_apps/DB_maintenance/dumpServer.pl

It dumps the server to disk. It uses a bit of trickery to get ontape to believe that a disk file is actually a tape drive in order for this to work. The output of the dump goes to a single file pointed by this symbolic link:

server_apps/DB_maintenance/data

This file is removed and recreated every night. At any point in time, the only full system dump in existence will be the one created the previous night. This file is backed up nightly by the regular CIS backup process (networker).

Logical Log Dumps as the Logs Become Full

The nightly dump produces a snapshot of the production server at the time it was taken. However, if you want to restore the server to some point after a nightly dump was made you will also need to use the logical log dumps.

The logical logs are a record of changes that are made to the database. They first exist inside Informix on the raw disk managed by Informix. As updates are made to the database, those updates are also written to the logical log.

When a logical log become full, the Informix engine closes it, and then opens a new one and starts writing to it. ZFIN uses the ontape utility to copy full log files from inside Informix out to the Unix file system where they can be backed up by the CIS backup process (networker). Once ontape has copied the file to disk the space is marked as available for reuse within the Informix engine.

The script that fires off ontape is

server_apps/DB_maintenance/dumpLogsContinous.pl

It fires off ontape in continuous logging mode. ontape stays up for as long as the Informix engine is up, and writes full logical logs to disk as they become full. As with the nightly backups, we lie to ontape to get it to believe that a disk file is actually a tape drive.

The disk file that ontape writes to is pointed to by this symbolic link:

server_apps/DB_maintenance/logs

Log files are appended to this file as they become full.

Warning: The dumpLogsContinuous.pl script is not automatically started when the production server is rebooted, or when the Informix engine is restarted. The dumpLogsContinuous.pl script must be manually started whenever the Informix engine is restarted. If this is not done then production will appear to work fine for several days until it suddenly comes to a grinding halt when all of the logical logs fill up. If this happens, the first line of onstat output will say "CKPT Required". It can be started by getting to the DB_maintenace directory and then running gmake:

cd ZFIN_WWW/server_apps/DB_maintenance gmake dumplogscontinuous

Unloads of Individual Databases

The previous two sections described backups that were for an entire Informix engine. Those dumps produce large binary files with which you can restore a corrupt engine. However, they are often not useful for the types of backups that ZFIN most often needs. It is practically impossible to use those dumps to transfer a database from one engine to another or to reload the contents of a particular table. That is not what those dumps are designed for.

ZFIN uses the unloaddb.pl script for these purposes. unloaddb.pl unloads a single database, including the database's schema, using (mostly) SQL unload commands. It produces a directory containing one text file for each table in the database (tables with BLOB or CLOB columns will have multiple files).

Please see standards for use of BLOB/CLOB columns: BLOBs (binary large objects) and CLOBs (character large objects)

The output of unloaddb.pl can be loaded with the loaddb.pl script, either as a new database, or in place of an already existing database. unloaddb.pl and loaddb.pl are both in the ZFIN Commons bin directory.

unloaddb.pl is called by cron, 5 nights a week, to unload the production database so that it can be loaded into development databases as needed. These dumps also provide ZFIN with useful snapshots of the state of the database on the day they were taken. Unlike the nightly backups, ZFIN hangs onto multiple unloads of the production database.

The more recent unloads can be found in

/research/zunloads/databases/production/yyyy.mm.dd.n

where yyyy.mm.dd is the date the unload was done, and n is the number of the dump taken that day, usually 1. Older unloads can be found in

/research/zarchive0/databases/zfindb/

Only about one unload per month is stored in the archive directory.

Monitoring

Note: This section was written before ZFIN had installed ISA. Many of the on* commands described here can now be done with ISA.

Try to check the server every morning. These checks work other times during the day as well, of course, but if something goes wrong during the day, you are likely to get a phone call about it before you notice the problem.

Memory Usage

The days of running out of memory should be past, but just in case they are not, log in and run

% top -ores load averages: 0.00, 0.02, 0.06 16:25:33 115 processes: 112 sleeping, 1 zombie, 1 stopped, 1 on cpu CPU states: 98.9% idle, 0.1% user, 0.7% kernel, 0.3% iowait, 0.0% swap Memory: 1152M real, 24M free, 1555M swap in use, 502M swap free PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND 587 informix 5 59 -10 1036M 617M sleep 643:39 0.00% oninit 589 informix 5 59 -10 1036M 99M sleep 111:18 0.00% oninit 590 informix 5 59 -10 1036M 90M sleep 44:50 0.00% oninit 594 informix 5 59 -10 1036M 67M sleep 0:56 0.00% oninit 588 informix 4 59 -10 1036M 67M sleep 0:01 0.00% oninit 595 informix 4 59 -10 1036M 66M sleep 0:00 0.00% oninit 593 informix 4 59 -10 1036M 66M sleep 0:00 0.00% oninit 596 informix 4 59 -10 1036M 66M sleep 0:00 0.00% oninit 591 informix 4 59 -10 1036M 66M sleep 0:00 0.00% oninit 592 informix 4 59 -10 1036M 66M sleep 0:00 0.00% oninit 22477 informix 5 59 -10 386M 62M sleep 94:01 0.13% oninit 22479 informix 5 59 -10 386M 32M sleep 116:52 0.01% oninit ...

The Memory line is important

Memory: 1152M real, 24M free, 1555M swap in use, 502M swap free

As long as the sum of the free and the swap free is at least 100 MB, memory is fine. It does not matter if one or the other drops to near 0, as long as there is still space in the other

Unless an informix server has been bounced recently, the SIZE column tells you which processes belong to A or B. The oninit's with a size of 1036M belong to B, the production server, and those with a size of 386M belong to test. The maximum size of the production and test servers are both capped at those numbers.

If we are running out of memory then restart the server.

Informix Server Status - onstat

onstat is the usual process for monitoring the status of the database:

% onstat Informix Dynamic Server 2000 Version 9.20.UC2 -- On-Line -- Up 17 days 18:46:29 -- 1048576 Kbytes Userthreads address flags sessid user tty wait tout locks nreads nwrites e0d2018 ---P--D 1 informix - 0 0 0 2155 13667 e0d25d8 ---P--F 0 informix - 0 0 0 0 78273 e0d2b98 ---P--F 0 informix - 0 0 0 0 37934 e0d3158 ---P--F 0 informix - 0 0 0 0 34596 e0d3718 ---P--- 5 informix - 0 0 0 0 1820 e0d3cd8 ---P--B 6 informix - 0 0 0 27 1524 e0d4e18 ---P--M 9 informix 12 0 0 0 0 0 e0d5f58 ---P--D 13 informix - 0 0 0 14 0 8 active, 128 total, 21 maximum concurrent Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 1026307 4898384 867193765 99.88 309164 2452435 3265074 90.53 isamtot open start read write rewrite delete commit rollbk 719812284 53898854 58036138 486726929 1262800 132010 65721 9703 9 gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs 20682 18 372 15 0 0 48 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 48017.15 473.98 2444 10012 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 59417 0 623811662 0 0 54 48202 263714 ixda-RA idx-RA da-RA RA-pgsused lchwaits 14707 2 468407 481982 496435

In this example output, the database is idle. There are only 8 users listed, all of them are informix, and none of them hold locks. If someone is accessing the database their login will show up in the list.

Accesses that come in through the web pages show up as user zfinner. These are usually fleeting. However, occasionally, a user will request a query that takes a while to run. In such cases other users may back up behind that zfinner's request. If you see more than 3 zfinner users in the output of onstat then watch the database closely for the next 2 minutes. If the logjam doesn't break then the server might be in trouble and it might be good to restart the server.

The first line of output is also important:

Informix Dynamic Server 2000 Version 9.20.UC2 -- On-Line -- Up 17 days 18:46:29 -- 1048576 Kbytes

The important part there is the "-- On-Line --" Other options are "-- Quiescent --", which means the DB is up and running, but it's not letting any users in. It shouldn't ever be in Quiescent state unless you put it there explicitly or the server is in the process of coming up after a reboot.

If the database is down entirely, instead of getting an onstat report, you will get

shared memory not initialized for INFORMIXSERVER 'server_name'

in which case the server needs to be started.

If you get something other than an onstat report or the "shared memory not initialized" error, then the DB is in a bad shape, and is not likely to be brought down through normal techniques. See "Dealing with a badly broken server" below.

Another possibility for the first line of the onstat report is:

Informix Dynamic Server 2000 Version 9.20.UC2 \ -- On-Line (CKPT REQ) -- Up 12 days 20:48:45 -- 753664 Kbytes Blocked:CKPT

This means that the transaction logs have filled up. And what that probably means is that the last time the Informix server was rebooted (probably about 7-10 days before this happens), that ontape was not started. ontape runs in the background and copies logs from informix raw disk to cooked disk where they are then backed up to tape. See the "Starting ontape" section below for more information.

See the "Solutions" section below for what to do about all these problems.

%onstat -g ses [session id] : gives a detailed view of a particular session

%onstat -g : will give a list of all onstat commands

The options and output of onstat are explained in the Informix Administrator's Reference.

Active Queries

The best way we know to display what queries are running in the server right now is to run this query against the sysmaster database:

select sqs_sessionid, sqs_statement from syssqlstat where sqs_statement <> "-" and sqs_statement not like "select sqs_sessionid%";

This is far from perfect. It only displays the first 200 characters of the query. However, it works better than any other query we have found. These two queries tell us something, but we aren't sure what:

select scs_sqlstatement from syssqlcurses where scs_sqlstatement <> "" and scs_sqlstatement not like "select scs_sqlstatement%"; select sqx_sqlstatement from syssqexplain where sqx_sqlstatement not like "select sqx_sqlstatement%" and sqx_sqlstatement not like "create view %syssqexplain%" and sqx_sqlstatement <> "";

Finally, onstat can also be used to monitor active queries:

onstat -g sql Displays all databases currently executing queries and what SQL keyword is currently being executed by these databases.
onstat -g stm Displays all current sql statements. Statements executed via the web (through webexplode) usually are not very informative, as the database statement being executed is 'webexplode' (which is displayed in this output as 'EXECUTE FUNCTION'.

Monitoring Locks

A good repository for scripts that can be used for monitoring the database can be found at www.iiug.org. The script:

/private/ZfinLinks/Commons/bin/locks.sh

modified from one found at www.iiug.org) shows the locks currently in use on an informix server. The following is sample output from embryonix:

Lock usage report Total locks in use 8 out of 131072 ---------------------------------------------------------------------------- SESSION OWNER DATABASE TABLE LEVEL TYPE #LOCKS ---------------------------------------------------------------------------- 32710 staylor smartbs3 LO_hdr_partn TABLE INT-EX 1 32712 zfinner sysmaster sysdatabases ROW SHRED 1 32710 staylor smartbs4 LO_hdr_partn TABLE INT-EX 1 32710 staylor smartbs1 LO_hdr_partn TABLE INT-EX 1 32710 staylor sysmaster sysdatabases ROW SHRED 1 32710 staylor smartbs2 LO_hdr_partn TABLE INT-EX 1 32710 staylor hoovdb fish_image TABLE INT-EX 1 32713 staylor sysmaster sysdatabases ROW SHRED 1 32710 staylor smartbs2 LO_hdr_partn ROW EXCLV 1

The type of lock is important:
SHRED: is a non-exclusive, shared lock.
EXCLV: is an exclusive lock.

Monitoring Temp Table Creation and Use

The following query can be used against the sysmaster db to determine the number and names of temporary tables currently in use.

SELECT n.dbsname, n.owner, n.tabname, COUNT(*), SUM(i.ti_nptotal), SUM(i.ti_nrows) FROM systabnames n, systabinfo i WHERE BITVAL(i.ti_flags, "0x0020") = 1 AND i.ti_partnum = n.partnum group by n.dbsname, n.owner, n.tabname

Informix Server Status - Message Log

It's also a good idea to check the server message log. The Informix server writes messages to the file at regular intervals (checkpoints). It also writes messages whenever the server encounters a 'serious' problem. These problems don't necessarily need to be dealt with, but if the server is acting up you might find some clues about it in the message log.

The message log for production is in

$INFORMIXDIR/online.log

The messages in the message log are explained in somewhat more detail in the Informix Universal Server Administrator's guide, Chapter 40.

Other Informix Monitoring Tools

At least two other tools are available:

  1. onperf
  2. onmonitor
  3. oncheck

These are explained in the Administrator's Guide.

Fixing Immediate Problems

The first subsection here discusses what should always be done when a problem is encountered. The remaining sections discuss what should be done in response to various conditions.

Run onstat -a

The first thing to do when you think the server is having a problem to run onstat with the -a option and save the results to a file:

% onstat -a > /tmp/onstat-a

This will take a fairly complete picture of the state of the database, and will be handy for doing the postmortem. This usually runs in less than a minute. The output of onstat is explained in the Administrator's Guide, pages 39-75 to 39-107.

Errors

onstat says -- On-Line --, but zfinners piling up

In this case onstat indicates the database is alive and well, but response time is either very bad or infinite and has been this way for a couple of minutes. My usual assumption in this case is that someone has a lock held and everyone else is queuing up behind it.

If you can easily identify the lock holder and it's one of us, have them log out. If the problem doesn't go away, or if no one local has a lock then I usually reboot the server.

See the "Rebooting the server" section below for how to do this.

onstat says -- On-Line (CKPT REQ) --

In this case, the transaction logs have filled up, but ontape is not running. Start ontape to backup the logs. After ontape has started the 'CKPT REQ' should go away.

See the "Starting ontape" section below.

onstat says 'shared memory not initialized for ...'

In this case the Informix server is entirely down and needs to be brought back up.

See the "Starting the server" section below.

onstat returns a different one line error

If onstat does not return a list of users, or the 'shared memory not initialized error', but rather something else or not at all, then the server is in a very bad state. It is halfway down and halfway up and it may not respond to a command to bring it down.

To deal with this situation, see the "Dealing with a badly broken server" section below.

Solutions

This section talks about the onmode command quite a bit. The onmode command is discussed in detail in the Administrators Guide, Chapter 8

All of these solutions assume you are logged in as user informix when doing this.

Dealing with a badly broken server

This section describes what to do when the server doesn't respond (in a regular way) to onstat, or you have tried to stop the server with onmode and it hasn't completely gone down.

If you haven't already done so, try and kill the server with onmode:

% onmode -k This will take Informix Dynamic Server 2000 OFF-LINE - Do you wish to continue (y/n)? y There are 5 user threads that will be killed. Do you wish to continue (y/n)? y %

If your conversation with onstat looked like the above then that is promising. If this took the server down, a subsequent onstat will get the following:

% onstat shared memory not initialized for INFORMIXSERVER 'server_name'

If you get this then go to the "Starting the server" section below for how to bring the server back up.

If you don't get that then onmode failed to kill the server. At this point we are indeed in deep trouble. You will have to manually kill some processes and/or shared memory segments to bring the server entirely down.

Gather information about any oninit processes that are still running:

% ps -ef | grep oninit root 594 588 0 Oct 02 ? 0:59 oninit informix 590 588 0 Oct 02 ? 45:04 oninit root 593 588 0 Oct 02 ? 0:00 oninit informix 587 1 0 Oct 02 ? 665:20 oninit root 595 588 0 Oct 02 ? 0:00 oninit informix 589 588 0 Oct 02 ? 112:40 oninit root 596 588 0 Oct 02 ? 0:00 oninit informix 11742 5338 0 15:20:20 pts/8 0:00 grep oninit root 592 588 0 Oct 02 ? 0:00 oninit root 591 588 0 Oct 02 ? 0:00 oninit root 588 587 0 Oct 02 ? 0:01 oninit

Now identify the 'original' oninit based on its process ID and parent process IDs. In this case process 587 is the original oninit, since all the other oninits were either started by 587, or started by processes that were started by 587.

After identifying the original oninit, kill it.

% kill 587 %

If you are lucky (and you probably will be) this will cascade out to all the other oninit processes and they will die as well. Run ps again to check. If their are still some oninits that are alive then you will have to kill them individually. If they don't respond, you will have to kill -9 them.

After all of the oninit processes have been killed you will need to check if all the associated shared memory segments and semaphores have also been killed. To check what IPC resources are still allocated run ipcs:

% ipcs IPC status from <running system> as of Fri Oct 20 15:29:43 PDT 2000 Message Queue facility inactive. T ID KEY MODE OWNER GROUP Shared Memory: m 1200 0x52574801 --rw-rw---- root informix m 1201 0x52574802 --rw-rw---- root informix m 703 0x52564801 --rw-rw---- root informix m 704 0x52564802 --rw-rw---- root informix m 905 0x52574803 --rw-rw---- informix informix m 906 0x52574804 --rw-rw---- informix informix m 907 0x52574805 --rw-rw---- informix informix m 908 0x52574806 --rw-rw---- informix informix m 909 0x52574807 --rw-rw---- informix informix m 910 0x52574808 --rw-rw---- informix informix m 911 0x52574809 --rw-rw---- informix informix m 912 0x5257480a --rw-rw---- informix informix m 913 0x5257480b --rw-rw---- informix informix m 914 0x5257480c --rw-rw---- informix informix m 915 0x5257480d --rw-rw---- informix informix m 916 0x5257480e --rw-rw---- informix informix m 917 0x5257480f --rw-rw---- informix informix m 918 0x52574810 --rw-rw---- informix informix m 919 0x52574811 --rw-rw---- informix informix m 920 0x52574812 --rw-rw---- informix informix m 921 0x52574813 --rw-rw---- informix informix m 922 0x52574814 --rw-rw---- informix informix m 923 0x52574815 --rw-rw---- informix informix m 824 0x52574816 --rw-rw---- informix informix m 825 0x52564803 --rw-rw---- informix informix m 826 0x52564804 --rw-rw---- informix informix m 827 0x52564805 --rw-rw---- informix informix m 828 0x52564806 --rw-rw---- informix informix m 829 0x52564807 --rw-rw---- informix informix m 830 0x52564808 --rw-rw---- informix informix m 831 0x52564809 --rw-rw---- informix informix m 832 0x5256480a --rw-rw---- informix informix m 833 0x52574817 --rw-rw---- informix informix m 834 0x52574818 --rw-rw---- informix informix m 835 0x52574819 --rw-rw---- informix informix m 836 0x5257481a --rw-rw---- informix informix m 737 0x5257481b --rw-rw---- informix informix m 738 0x5257481c --rw-rw---- informix informix m 739 0x5257481d --rw-rw---- informix informix m 740 0x5257481e --rw-rw---- informix informix Semaphores: s 786432 0 --ra-ra---- root informix s 458753 0 --ra-ra---- root informix

This ipcs command was run while both test and production were up and running. If you get anything reported in this list then you will need to kill it manually. To manually kill a shared memory sequence:

% ipcrm -m _id_

where _id_ is ID from the ipcs listing for the segment. To kill a semaphore enter

% ipcrm -s _id_

where _id_ is ID from the ipcs listing for the semaphore.

You may need to asks Systems to kill segments or semaphores owned by root.

When all the oninits, shared memory, and semaphores are down, the Informix server is down. See the section titled "Starting the server" for how to bring it back up.

Rebooting the server

Use the instructions in this section if the server is still up and running and responding to onstat.

To bring the server down, run onmode:

% onmode -k This will take Informix Dynamic Server 2000 OFF-LINE - Do you wish to continue (y/n)? y There are _n_ user threads that will be killed. Do you wish to continue (y/n)? y %

To verify that it came down, run onstat:

% onstat shared memory not initialized for INFORMIXSERVER 'server_name'

If it doesn't say that they something went wrong and you should see the section above titled "Dealing with a badly broken server."

If does say that then the server was successfully brought down and is ready to be brought back up. See the "Starting the server" section below for instructions on how to do that.

Starting the server

The instructions in this section assume that the server has been brought down completely. To verify that the server is down completely run onstat. You should see:

% onstat shared memory not initialized for INFORMIXSERVER 'server_name'

If you don't then the server is not down yet. See either the "Dealing with a badly broken server" section or the "Rebooting the server" sections above.

however, if you get the above message, then the server is ready to be brought back up. To bring the server back up simply run oninit.

% oninit %

After oninit has finished, run onstat to check that the server came back up. onstat may cycle through quiescent or checkpoint required states, but within a minute of oninit finishing the database should be online again.

At this point the server is back up, but we still need to start ontape to write the transaction logs out. See the "Starting ontape" section below for how to do this.

Starting ontape

Whenever the server is stopped the ontape process that runs in the background also stops. However, restarting the server does not automatically restart the ontape process as well. ontape must be manually restarted after every server restart. We only need to restart ontape for the production server as all of the test servers are configured to write their logs to /dev/null.

See the Logical Log Dumps as the Logs Become Full section for information on restarting ontape.

Disk Usage

This section on disk usage could be split between this section and the System Administration section, as it pertains to both raw disk (used by Informix) and the Unix file system. However, we cover it all here for the convenience of having it here, and because managing the raw disk is the hard part.

Informix stores its data in raw disk partitions. Standard Unix file IO operations don't work on raw disk. Instead, whatever process owns the raw disk takes full responsibility for IO to and from the disk. The OS does not buffer IO to raw disk, and raw disk does not use the standard inode structure that most Unix file systems do.

The following sections discuss how disk is allocated on ZFIN machines. They attempt to show disk allocation graphically. Two programs were used to gather the data used to draw these diagrams.

The diagrams themselves are drawn in xfig, using the same settings that were used to generate the other diagrams in this doc. See the section on how to update figures for how these were generated and how to update them.

All offsets that are shown are in sectors. Sectors are 512 bytes under Solaris. Informix usually asks for things in kilobytes or pages. Informix page sizes are 2048 bytes under Solaris. You will need to some math when working with Informix utilities.

Ownership

Any Unix file partitions that are used by Informix as raw disk must be owned by user informix and have group informix and the owner and group must have read and write privileges.

RAID

The production server has a Sun A1000 Storedge RAID attached to it, and the development server has an NStor 4150S RAID attached to it. Before we get into the details of them, here is a primer on RAID in general and Sun RAID technology in particular.

Without RAID, a logical disk drive and a physical disk drive are one and the same. That is, what the OS thinks is a disk drive actually corresponds to a single physical disk drive. With RAID, the mapping between logical disks and physical disks is much more complicated. What appears as a single disk to the OS may in fact actually map to multiple physical disk drives, and each physical disk drive may contain parts of multiple logical disk drives.

RAID Levels

RAID levels determine how logical disk drives are mapped to physical disk drives. At ZFIN, there are four RAID levels of interest, 0, 1, 5 and 10 (also known as 1+0). This document used to offer a breif description of each level, but I dropped it because there are much better explanations on-line.

AC&NC's RAID.edu site has an excellent graphical description of RAID levels 0, 1, and 5. There is a good explanation of RAID level 10 (and how it is different from RAID level 0+1/01) in the PC Guide's RAID section.

ZFIN uses RAID 10 on the production RAID and RAID 5 on the development RAID.

Sun A1000 RAID

The RAID that ZFIN uses on the production server is an A1000 from Sun. The A1000 has some additional terminolgy that needs to be understood:
A1000 RAID Terminology
Drive Group A group of disk drives that LUNs are spread across. This is the highest level division within a RAID. LUNs are never spread across multiple drive groups. You can have up to 32 LUNs per drive group.
Logical Unit (LUN) This appears as a disk drive (not a partition) to the operating system, but in fact a LUN can be distributed across multiple disks within a drive group.
Hot Spare These are unused disks that automatically come online and replace an active disk when that disk has a problem. Not dedicated to any drive group.
RAID Module The whole A1000 unit, including controller and disks is a RAID module.
RAID 1 The Sun documentation never explicitly mentions RAID 10. However, if you create a RAID 1 drive group with 4 or more physical drives in it (RAID 1 drive groups must always have an even number of drives), then that drive group will be implemented as RAID 10.

All the LUNs within a drive group must have the same RAID level. Drive groups are created when the first LUN in them is created.

The typical approach is to put as many disks as possible into a single drive group and then put all of your logical drives (LUNs) into that drive group. With the exception of temporary dbspaces, we have does this at ZFIN.

The A1000 is managed by the CS systems staff (Lauradel) using the Sun StorEdge RAID Manager software (see User's Guide, Installation and Support Guide, Release Notes). Lauradel has also typed up some documentation on how she manages the RAID. See

/home/systems/sysadm/Documents/Zfin/HelixA1000.txt

NStor 4150S RAID

The NStor RAID array was purchased through NCE Computer Group in 2003/02. It is a NexStor 4150S (although it may be a 4120S, we aren't really sure) Ultra 160 Dual Controller, with twelve 146 GB, 10,000 RPM disks. In Solaris 8 and before Lauradel administered the NStor RAID with the Administor Plus software (a package that Lauradel is less than enamored with.) However, starting with Solaris 9, she has been reduced to using a serial connection and a command line interface to administer it.

Although it was written with the A1000 RAID in mind, most of the discussion on RAID also applies to the NStor RAID. One key difference in terminolgy between the A1000 and the NStor, is that NStor uses the term "Array" instead of "Drive Group". ZFIN documentation uses "Drive Group" in most places.

Unlike the A1000, we configured the NStor RAID to be RAID 5 instead of RAID 10. The development server has greater space requirements than does the production server and does not have the speed or reliability requirements that production does either. Therefore we went with the slightly slower and slightly less reliable, but much more space efficient option.

It also worth noting that 1 to 8 MB of space is reserved at the front of each LUN for disk geometry. (NStor tech support tells us that only th first 16KB are actually used, but the software allocates more.) Care must be taken when allocating partitions within LUNs to leave the first 8MB of the LUN as reserved.

RAID and Informix

Informix is oblivious to whether its raw disk partitions are on RAID or not. Furthermore, Informix has no documentation describing how to physically design a database that will reside on RAID. All of their performance documentation assumes you are using non-RAID devices.

However, at the 2002 IBM Data Management conference I did get some information on this from some Informix developers:

Given this information, here are the guidelines that were used in the physical design:

It is worth noting that this physical design, which was tuned for RAID, will also work well in a non-RAID environment.

Production Server Disk

The production server has 4 internal disks and the Sun A1000 RAID hung off of it.

Internal Disks

The 4 internal disks are physically identical, and are laid out in a common pattern.

Helix Disk Controller SD1, LSI 53c1030 SCSI

c1t0d0s*, Fujitsu MAP3735N SUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,532,160 / Root filesystem
1 10,532,160 8,425,728 None (swap) Used for VM, /tmp, /proc
3 18,957,888 2,106,432 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs1_c1
2,097,152 9,280 unused
4 21,064,329 50,610,336 /export/helix0 Unused, not backed up

c1t1d0s*, Fujitsu MAP3735N SUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,532,160 /private Private disk for helix.
1 10,532,160 8,425,728 None (swap) Used for VM, /tmp, /proc
3 18,957,888 2,106,432 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs2_c1
2,097,152 9,280 unused
4 21,064,329 50,610,336 /export/helix1 Unused, not backed up

c1t2d0s*, Fujitsu MAP3735N SUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,532,160 /root-t2 Reserved for past, future, or backup versions of root filesystem.
1 10,532,160 8,425,728 None (swap) Used for VM, /tmp, /proc
3 18,957,888 2,106,432 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs3_c1
2,097,152 9,280 unused
4 21,064,329 50,610,336 /export/helix2 Unused, not backed up

c1t3d0s*, Fujitsu MAP3735N SUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,532,160 /private-t3 Reserved for past, future, or backup versions of /private.
1 10,532,160 8,425,728 None (swap) Used for VM, /tmp, /proc
3 18,957,888 2,106,432 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs4_c1
2,097,152 9,280 unused
4 21,064,329 50,610,336 /export/helix3 Unused, not backed up

RAID Disks

Production has a Sun StorEdge A1000 RAID array with 12 18GB disks. If you are unfamiliar with RAID on Solaris, see the RAID section for an overview or RAID levels and terminology.

This section has 2 subsections, one describing how the physical disks in the RAID are allocated to logical disks (LUNs in RAID terminology), and one describing how the logical disks are allocated.

Physical Allocation

The 12 disks are allocated as follows:


Production RAID Physical Layout

Logical Allocation

Helix RAID Disk Controller, Symbios 53C875 Fast20 Wide SCSI (I think)

Drive Group 1

c3t5d0s*, LUN 0, 2.0 GB
# Offset (KB) Size (KB) Name Notes
0 2,048 reserved First 2 MB reserved for disk geometry
0 2,048 2,095,103.5 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,095,102 tempdbs5_c1
2,095,102 1.5 unused

c3t5d4s*, LUN 4, 14.8 GB
# Offset (KB) Size (KB) Name Notes
0 2,048 reserved First 2 MB reserved for disk geometry
0 2,048 2,097,152 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 smarttempbs_c1
2,099,200 13,443,072 unused

Drive Group 2

c3t5d1s*, LUN 1, 32.0 GB
# Offset (KB) Size (KB) Name Notes
0 2,048 reserved First 2 MB reserved for disk geometry
0 2,048 33,552,384 None (Wildtype RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 524,288 tbldbs1_c1
524,288 524,288 tbldbs2_c1
1,048,576 524,288 tbldbs3_c1
1,572,864 4,194,304 llogdbs1_c1
5,767,168 65,536 plogdbss1_c1
5,832,704 262,144 idxdbs2_c1
6,094,848 262,144 idxdbs3_c1
6,356,992 262,144 idxdbs4_c1
6,619,136 262,144 idxdbs1_c1
6,881,280 32,768 defaultdbs_c1
6,914,048 32,768 rootdbs_c1
6,946,816 1,048,576 tempdbs6_c1
7,995,392 1,048,576 tempdbs7_c1
9,043,968 1,048,576 tempdbs8_c1
10,092,544 4,194,304 smartbs1_c1
14,286,848 4,194,304 smartbs2_c1
18,481,152 4,194,304 smartbs3_c1
22,675,456 4,194,304 smartbs4_c1
26,869,760 524,288 smartdfltbs_c1
27,394,048 1,048,576 tempdbs9_c1
28,442,624 1,048,576 tempdbs10_c1
29,491,200 1,048,576 tempdbs11_c1
30,539,776 1,048,576 tempdbs12_c1
31,588,352 262,144 idxdbs1_c2
31,850,496 262,144 idxdbs2_c2
32,112,640 262,144 idxdbs3_c2
32,374,784 262,144 idxdbs1_c2
32,636,928 32768 defaultdbs_c2
32,669,696 884,736 unused

c3t5d2s*, LUN 2, 43.9 GB
# Offset (KB) Size (KB) Name Notes
0 0 54,677,504 /research/zprod,
  a.k.a. /export/zprod
ZFIN production cooked filesystem

Development Server Disk

The development server has 4 internal disks and the NStor RAID attached to it.

The current physical layout was made on 12/27/2007.

Internal Disks

Embryonix has 4 identical internal disks.

Embryonix Disk Controller SD1, LSI 1030 SCSI

c1t0d0s*, Hitachi DK32EJ72NSUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,491,456 /root-t0 Reserved for past, future and backup versions of /
1 10,491,456 6,293,856 None (swap) Used for VM, /tmp, /proc
4 16,785,312 2,101,344 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs1_c1
2,097,152 4,192 unused
5 18,886,656 52,788,000 /research/zarchive0,
  a.k.a. /export/zarchive0
First disk in the ZFIN archives.

c1t1d0s*, Hitachi DK32EJ72NSUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,491,456 / Root filesystem
1 10,491,456 6,293,856 None (swap) Used for VM, /tmp, /proc
4 16,785,312 2,101,344 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs2_c1
2,097,152 4,192 unused
5 18,886,656 52,788,000 /research/zarchive1,
  a.k.a. /export/zarchive1
Second disk in the ZFIN archives.

c1t2d0s*, Hitachi DK32EJ72NSUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,491,456 /private Private on embryonix
1 10,491,456 6,293,856 None (swap) Used for VM, /tmp, /proc
4 16,785,312 2,101,344 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs3_c1
2,097,152 4,192 unused
5 18,886,656 52,788,000 /research/zarchive2,
  a.k.a. /export/zarchive2
Third disk in the ZFIN archives.

c1t3d0s*, Hitachi DK32EJ72NSUN72G, 68.4 GB
# Offset (KB) Size (KB) Name Notes
0 0 10,491,456 /private-t3 Reserved for past, future, and backup copies of /private
1 10,491,456 6,293,856 None (swap) Used for VM, /tmp, /proc
4 16,785,312 2,101,344 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 tempdbs4_c1
2,097,152 4,192 unused
5 18,886,656 52,788,000 /research/zarchive3,
  a.k.a. /export/zarchive3
Fourth disk in the ZFIN archives.

RAID Disks

The bulk of ZFIN's filesystems are on the NStor RAID, which is attached to the ZFIN development server.

Physical Allocation

The 12 disks are allocated as follows:

Note: "Drive Groups", the Sun term, are called "Arrays" in NStor terminolgy.


Development RAID Physical Layout

Logical Allocation

Embryonix RAID Disk Controller, Chaparral JS122(A) Ultra-160 SCSI

Drive Group zfish1

c2t1d1s*, LUN 1, 564.7 GB
# Offset (KB) Size (KB) Name Notes
3 0 277,626,880 /research/zunloads,
  a.k.a. /export/zfish1lun1/S3
ZFIN Unloads directory
4 277,626,880 104,857,600 unused
7 382,484,480 209,715,200 /research/zusers,
  a.k.a. /export/zfish1lun1/S7
ZFIN Users directory

c2t1d3s*, LUN 3, 112.0 GB
# Offset (KB) Size (KB) Name Notes
0 8,192 reserved First 8 MB reserved for disk geometry
0 8,192 33,554,432 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 25,165,824 tbldbs2_c1
25,165,824 2,097,152 tempdbs5_c1
27,262,976 2,097,152 tempdbs7_c1
29,360,128 2,097,152 tempdbs9_c1
31,457,280 2,097,152 unused
1 33,562,624 33,554,432 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 25,165,824 tbldbs3_c1
25,165,824 6,291,456 smartbs1_c1
31,457,280 2,097,152 smarttempbs_c1
6 67,117,056 50,331,648 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 1,048,576 smartdfltbs_c1
1,048,576 4,194,304 llogdbs
5,242,880 32,768 plogdbs
5,275,648 16,777,216 idxdbs1_c1
22,052,864 28,278,784 unused

Drive Group zfish2

c2t1d2s*, LUN 2, 564.7 GB
# Offset (KB) Size (KB) Name Notes
0 0 104,857,600 /research/central,
  a.k.a. /export/zfish2lun2/S0
ZFIN Central directory
4 104,857,600 78,643,200 /research/blastdb,
  a.k.a. /export/zfish2lun2/S4
BLAST database directory when BLAST is on embryonix
6 183,500,800 283,557,888 /research/blastfiles,
  a.k.a. /export/zfish2lun2/S6
BLAST files directory when BLAST is on embryonix. This partition is not backed up.

c2t1d4s*, LUN 4, 16.0 GB
# Offset (KB) Size (KB) Name Notes
0 1,024 reserved First 1 MB reserved for disk geometry
0 1,024 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 65,536 sysadmindbs
65,536 2,031,616 unused
1 2,098,176 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused
2 4,195,328 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused
3 6,292,480 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused
4 8,389,632 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused
5 10,486,784 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused
6 12,583,936 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused
7 14,681,088 2,097,152 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 unused

c2t1d5s*, LUN 5, 80.0 GB
# Offset (KB) Size (KB) Name Notes
0 8,192 reserved First 8 MB reserved for disk geometry
0 8,192 33,554,432 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 25,165,824 tbldbs1_c1
25,165,824 2,097,152 tempdbs6_c1
27,262,976 2,097,152 tempdbs8_c1
29,360,128 2,097,152 tempdbs10_c1
31,457,280 32,768 rootdbs_c1
31,490,048 32,768 rootdbs_c2
31522816 2031616 unused
1 33,562,624 33,554,432 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 16,777,216 idxdbs2_c1
16,777,216 16,777,216 idxdbs3_c1
6 67,117,056 16,769,024 None (Wanda RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 2,097,152 defaultdbs_c1
2,097,152 16,769,024 unused

Upgrade Server Disk

Bionix, the upgrade server has 3 disk controllers. One controller has 2 internal disks hung off of it. The remiaining 2 controllers are unused.

Internal Disks

Bionix has two equal sized internal disks, hung off of one controller.

Bionix Disk Controller 0, Symbios 53c875 Fast/UltraWide SCSI

c0t0d0s*, Fujitsu MAJ3364M SUN36G SCSI 33.9 GB
# Offset (KB) Size (KB) Name Notes
0 0 8,193,204 /root-t0 Reserved for backup of /, or Solaris 10
1 8,193,204 2,048,301 None (swap) Used by Solaris virtual memory
6 10,241,505 25,322,085 /research/zscratch0,
  a.k.a /export/scratch0
Scratch disk. This is not backed up.

c0t1d0s*, Fujitsu MAJ3364M SUN36G SCSI 33.9 GB
# Offset (KB) Size (KB) Name Notes
0 0 8,193,204 / Also contains /private
1 8,193,204 2,048,301 None (swap) Used by Solaris virtual memory
5 10,241,505 12,583,039.5 None (Wavy RAW)
Informix Raw Disk Chunks
Offset (KB) Size (KB) Name
0 16,384 rootdbs_c1
16,384 32,768 plogdbs_c1
49,152 31,744 defaultdbs_c1
80,896 1,967,104 llogdbs_c1
2,048,000 131,072 idxdbs1_c1
2,179,072 131,072 idxdbs2_c1
2,310,144 131,072 idxdbs3_c1
2,441,216 131,072 idxdbs4_c1
2,572,288 262,144 tbldbs1_c1
2,834,432 262,144 tbldbs2_c1
3,096,576 262,144 tbldbs3_c1
3,358,720 65,536 smartdfltbs_c1
3,424,256 65,536 smarttempbs_c1
3,489,792 2,097,152 smartbs1_c1
5,586,944 2,097,152 smartbs2_c1
7,684,096 2,097,152 smartbs3_c1
9,781,248 2,097,152 smartbs4_c1
11,878,400 704,638 tempdbs1_c1
6 22,824,544.5 12,739,045.5 /research/zbionix1,
  a.k.a /export/bionix1
User disk for bionix.

Physical Design

At the time of the port to Informix (2000/03), the physcial design consisted only of round-robin fragmentation for every table and a smattering of indexes, mostly on the fast search tables. The vast majority of tables did not have primary keys (or unique constraints) defined on them. I don't believe there was a single foreign key or not null constraint defined in the database. Over the next 21 months primary keys were defined for most tables, and foreign key, not null, and unique constraints were added in many (but not yet all) places as well.

When we moved the database from the old RAID (the Andataco) to the new RAID (the Sun A1000) in 2002/01, we did the first thorough physical design of the database. This physical design was done with the new RAID in mind, but everything that was done will also work well in a non-RAID environment.

See the Disk Usage section for descriptions of the disk layouts on the production and development servers.

DB Spaces

DB spaces, or dbspaces, are the largest logical unit of storage in Informix. When tables and indexes are defined you can specify what dbspace(s) to put the table or index into. dbspaces are implemented with one or more chunks, which are actual contiguous pieces of physical disk. Informix recommends that you have only one chunk per dbspace, and (at least initially) we have done this at ZFIN.

The remainder of this section discusses the particular dbspace setup that ZFIN uses. Also see the RAID and Informix section for additional information on dbspaces.

Separate Table and Index dbspaces

Informix recommends that you store indexes in a different dbspace from the dbspace where the table the index is on is stored. This means that disk accesses to the index will not compete with disk accesses to the table. At ZFIN we have taken that one step further and placed all indexes in dbspaces that contain only indexes.

Table dbspaces

After experimentation (see RAID and Informix), we settled on 3 table dbspaces for ZFIN, called tbldbs1, tbldbs2, and tbldbs3. Table spaces contains only tables.

Index dbspaces

After experimentation (see RAID and Informix), we settled on 4 index spaces for ZFIN. The index spaces, called idxdbs1, idxdbs2, idxdbs3, and idxdbs4, are all the same size. Index spaces contain only indexes.

Each index is assigned to one index dbspace. All the indexes on a given table are usually assigned to the same index dbspace. Indexes are assigned to index dbspaces based on three things:

Default dbspace

When you create a database you specify what the default dbspace for the database is. Any tables and indexes that are not explicitly placed in a dbspace end up in the default dbspace. This includes

Starting with Informix 9.30, ZFIN has a default dbspace called defaultdbs. It is useful for separating the data dictionary tables from data tables, and for easily detecting what tables and indexes have been (erroneously) created without "IN" clauses.

Smart BLOB Spaces

Smart BLOB Spaces, or sbspaces, store BLOBs (binary large objects) and CLOBs (character large objects) in Informix. In ZFIN sbspaces are used to store images and text fields that are too long to fit in LVARCHARs. sbspaces are also used extensively by datablades such as the Web datablade. The HTML datatype from the Web datablade uses CLOBs.

When ZFIN went from Informix 9.21 to 9.30 in 2003, we went from a single smart BLOB space with lots of chunks, to 4 smart BLOB spaces, each with a small number of chunks (at least on production). The hope is that we would get better parallel reads of smart BLOBs. Smart BLOB columns with only a small amount of data are PUT in a single smart BLOB space, while columns with a lot of data are fragmented (round robin) into all 4 smart BLOB spaces.

At that time we also added a default smart BLOB space (smartdfltbs) to hold smart BLOBs that are not explicitly PUT in one of the other smart BLOB spaces. This is mostly/entirely used by web datablade tables. Starting with Informix 9.30 ZFIN also has a temporary smart BLOB space as well. Initial indications are that this is not useful.

Temporary dbspaces

Prior to our performance problems in late 2004 (see FogBUGZ case 452) we always put temporary dbspaces on drives that were not in any RAID level. RAID 10 (production) and RAID 5 (development) make the most sense when you have many more reads than writes on data, since every write requires 2 writes to the disk. Temporary dbspaces have an almost equal number of reads and writes.

Prior to 2004/10 we were willing to put temporary dbspaces on disks that might have been busy with other partitions, such as / or /private. However, with the increased use of temporary tables in app pages that came with gene expression search form work in late 2004 (see FogBUGZ case 27), we started getting bottlenecks on our temp tables. At that time we stopped using temporary dbspaces on high-activity non-RAID disks and created several new ones on RAID disks. This helped a lot.

Tables and Indexes

Fragmentation

In Informix a table is stored in one or more dbspaces. If a table is stored in more than one dbspace, then it is fragmented. Where a table is stored and whether or not it is fragmented, is specified at table creation time.

Types of Fragmentation

Informix supports 2 types of fragmentation:

Both types of fragmentation are good for speeding up disk accesses. Fragmenting a table across multiple dbspaces allows Informix to fire off simultaneous read requests for each dbspace that a table is fragmented into. This is useful when doing full table scans.

In additon, expression based fragmentation also has the potential to greatly reduce the number of records read for some queries. In the linkage group example, if a query has an equality comparison on the linkage group column that was used to do the fragmentation, then Informix will only read records in that fragment.

As of 2002/01, ZFIN uses round robin fragmentation extensively, but does not use expression based fragmentation at all. The main reason we don't use expression based fragmentation is because most of our data does not lend itself to usable fragmentation policies. Fragmenting by ZDB ID makes no sense at all, since when we are querying on ZDB IDs we are usually looking for a specific record (rather than a large group of records). The same logic applies to names and abbreviations as well. In fact, linkage groups and mapping panels are the only obvious examples where expression based fragmentation might pay off in ZFIN.

However, even using linkage group for fragmentation has risks. In particular, there are over 20 linkage groups in zebrafish. In order to get the full benefits of expression based fragmentation we might need to create a fragment for each linakge group. In other words, we would need over 20 fragments and therefore over 20 dbspaces. ZFIN currently has 3 table dbspaces. Adding over 20 more is a non-trivial task, and while it might be worthwhile, it should not be done without first determining if it is really worth it.

Round Robin Fragmentation in ZFIN

In general, large tables in ZFIN are fragmented across 3 table dbsapces using round robin fragmentation.

Extent Sizes

Space within a dbspace is reserved for tables in contiguous blocks of disk pages called extents. A given extent will hold only records for the table the extent was allocated for. When an extent fills up, another extent is allocated. Tables that are fragmented are initially allocated one extent in each dbspace the table is fragmented into.

The extent size for a table is declared when the table is defined. Informix recommends that extents be large enough to hold all of the records for a table in that fragment. That is, each table should have only 1 allocated extent in each dbspace the table is stored in.

All ZFIN tables are assigned explicit extent sizes, and those explicit extent size are always in powers of 2. The hope is that by using only powers of 2, the disk will become less fragmented over time. That also lines up nicely with the read ahead values in the onconfig file for the server. Both RA_PAGES and RA_THRESHOLD are defined as powers of 2.

Extent sizes are used at ZFIN to determine when to fragment a table and when to keep it all in one dbspace. The rule of thumb is something like:

Extent sizes should be reviewed at least once a year. Usually they need to grow.

Lock Mode

A table's lock mode specifies the granularity at which database locks are obtained on that table. There are two options

Page level locking results in fewer locks (since there are multiple records on each page) for the locking process and therefore it runs faster. However, it means that the locking process may be locking records it actually doesn't need, thereby unnecessarily preventing other processes from getting the data they need.

Row level locking allows for better concurrency, but can have an extremely negative performance impact on processes that update large number of records.

Because of the tradeoffs involved, ZFIN uses a mixture of row level and page level locking. (Before upgrading to Informix 9.30 in 2003/08, ZFIN used almost entirely page level locking. It was fast, but we often got concurrency problems.)

For each table, ZFIN uses this set of general guidelines to determine what the lock granularity should be. These rules are guidelines; they are not hard and fast.

Constraint Names

Wherever possible, database constraints (primary key, unique, foreign key, check, not null) are assigned explicit names. This makes table definitions longer, but it has two pleasant side effects:

Primary Key, Unique, and Foreign Key Indexes

Informix requires an index to support each primary key, foreign key, or unique constraint defined in the database. If an index already exists for the columns in the constraint then Informix will use the existing index to support the constraint. If an index does not already exist then Informix will automatically build one.

Unfortuantely, there is no way to control where such automatically built indexes are placed. Anytime Informix has to create an index to support a constraint it places the index in the default dbspace. This is undesireable on a number of levels. First, defaultdbs is busy handling other things like the data dictionary. Secondly, it doesn't allow us to load balance our indexes as described in Index dbspaces above.

In order to control where these indexes go on disk we define our tables in a particular way.

  1. Define the table without any primary key, unique, or foreign key constraints on it.
  2. For each primary key, unique, or foreign key constraint we want to place on the table, we define an index to support the constraint, and then we add the constraint.

This leads to large awkward looking table definitions (see below), but it is the only way to control where indexes are placed.

Example Table Definition

Here is an example table defintion. It includes extent size, round robin fragmentation, explicitly named constraints, and explicit placement of primary key, unique and foreign key indexes.

create table anatomy_item ( anatitem_zdb_id varchar(50), anatitem_name varchar(80), anatitem_comments_relative_url lvarchar not null constraint anatitem_comments_relative_url_not_null, anatitem_start_stg_zdb_id varchar(50) not null constraint anatitem_start_stg_zdb_id_not_null, anatitem_end_stg_zdb_id varchar(50) not null constraint anatitem_end_stg_zdb_id_not_null ) fragment by round robin in tbldbs1 , tbldbs2 , tbldbs3 extent size 128 next size 128 lock mode page; revoke all on "informix".anatomy_item from "public"; -- primary key create unique index anatomy_item_primary_key_index on anatomy_item (anatitem_zdb_id) in tbldbs4; alter table anatomy_item add constraint primary key (anatitem_zdb_id) constraint anatomy_item_primary_key; -- alternate keys create unique index anatitem_name_index on anatomy_item (anatitem_name) in tbldbs4; alter table anatomy_item add constraint unique (anatitem_name) constraint anatitem_name_unique; -- foreign keys create index anatitem_start_stg_zdb_id_index on anatomy_item (anatitem_start_stg_zdb_id) in tbldbs4; alter table anatomy_item add constraint foreign key (anatitem_start_stg_zdb_id) references stage constraint anatitem_start_stg_zdb_id_foreign_key; create index anatitem_end_stg_zdb_id_index on anatomy_item (anatitem_end_stg_zdb_id) in tbldbs4; alter table anatomy_item add constraint foreign key (anatitem_end_stg_zdb_id) references stage constraint anatitem_end_stg_zdb_id_foreign_key;

Which is an awful lot of text just to define a table with 6 columns, a primary key, an alternate key and 3 foreign keys.

Apache and Web Datablade Administration

ZFIN uses Apache as its web server. When an HTTP request arrives the operating system routes it to Apache and then Apache either returns the requested page (in the case of static pages), or routes the request to the appropriate process/module (in the case of dynamic pages). See the big picture.

ZFIN uses the Informix Web DataBlade to tie together Apache and Informix. ZFIN pages that access the database start with a particular URL. Apache recognizes that URL and passes all such requests off to the webdriver module which is actually running inside the Apache process. (The webdriver module can be thought of as the "client" part of the Web Datablade.) The webdriver module sends the request to Informix, invoking the Web Datablade to retrieves the requested dynamic page from the database, interprets it, including the execution of SQL statements, and then returns the formatted results back to Apache, which in turn returns the page to the outside world.

Versions

ZFIN tries to always run the latest version of Apache 1.3.

Prior to the release of Web Datablade 4.13.FC3 in 2004, any given version of the Web DataBlade was officially supported only with certain versions of Apache. However, we always got away with using the latest Apache. In 4.13.FC3, they now officially support any version of Apache, including Apache 2.0.

Apache

The Apache web site (www.apache.org) is a great source of information about Apache. It is also where you can find versions of Apache for download. ZFIN also owns the book Apache Server Unleashed which is also a good source of information.

The Apache source and executables can be found in the same places on all ZFIN server machines. The source is under /private/apps/apache (which is a sym link to /private/apps/apache-version). The executables, configuration files, and logs can be found under /private/httpd. The Apache executable itself is at /private/httpd/bin/httpd.

When we upgraded Informix to 64-bit, we also upgraded Apache to 64-bit. This means that the both the client parts of the Web Datablade (i.e., webdriver) and the server side of the Web Datablade can both come from the same 64-bit release of the Web Datablade.

Note that as long as we use TCP/IP to communicate with Informix (we do), it is not strictly required to have Apache be 64 bits. It is acceptable, and supported by IBM, to have Apache run a 32 bit webdriver, and Informix be 64-bit (and have a 64-bit Web Dat