ZFIN Documentation:
Coding Standards and Best Practices
This documentation is preserved for historical purposes, and NOT updated. Updated documentation is found here: http://almost.zfin.org/doc

$Id: standards.html,v 1.16 2005-10-18 21:14:50 staylor Exp $

Back to Table of Contents
Previous Section    Next Section


"What do you mean we don't have standards? We have 2 or 3 standards for everything!"

Philosophy

The goals of these standards are to

The hard part about standards is getting the right amount of them. Not enough and we won't meet our goals. Too many and we will spend all our time looking up standards, and arguing about them.

ZFIN strives to have just enough standards to meet our goals. We strive to make them easy to follow, which means we put a fair amount of time into making this document as readable as possible. We attempt to express standards (and non-standards in concise statements that are are shown in bold and in a distinct green color.

The emphasis of this document is on naming conventions, modularity and well defined interfaces, and comments.

These standards do not emphasize on the syntax or particular statements These standards generally do not specify the one true way to do something in the code.

Non Standards

In addition to describing coding standards that ZFIN has, this document also includes some discussion of standards that we have agreed not to have. In particular, if we spent time discussing a possible standard and decided that either the issue was too contentious or that it didn't matter then this document will explicitly state that we have no standard in that area.

Updating ZFIN Standards

This is a living document. New standards are needed as we gain more experience or adopt new technology. Some existing standards will need to be tweaked or discarded.

New and revised standards start life in this file as preliminary standards. Preliminary standards are subject to a one month feedback and revision process. At the end of that month, the preliminary standard gets promoted to a standard.

Preliminary standards are displayed with a special background color and are preceded with this comment:

This section is a preliminary (or revised) standard posted on YYYY/MM/DD. It is open to comments and revision until YYYY/MM+1/DD, when it will lose its preliminary status.

New and revised standards are added to this file

Once these standards have settled, I propose a quarterly review of them to figure out what needs to be changed, and then change it. Discussion can be tracked in FogBUGZ.

Enforcement

Enforcement of standards, like standards themselves, is a matter of striking a balance. We want just barely enough enforcement to ensure that standards are followed.

ZFIN considered two extremes of enforcement:

We settled on an approach that is one step up from self-enforcement.

Code Reviews

ZFIN adopted these standards for code reviews:

  1. Trivial revisions do not need to be reviewed. A change is non-trivial if any of these conditions hold:
    1. It affects the defined interface for the module/object/page/program.
    2. It involves anything but simple logic changes or formatting changes.
    If you are in doubt as to whether or not your code needs reviewing, then have it reviewed.
  2. Before a set of revisions can be committed, it must be spot checked and approved by one or more other ZFIN staff.
  3. If the person reviewing the change does not have expertise in all areas of the revision, then additional people need to be called in to review those areas.
  4. The name(s) of the reviewers will be included with the CVS message when the changes are committed.
  5. The reviewer will specify at the time of the review whether or not they want to review the code again after any changes have been made.

When sending changes to other developers for review it is useful to include the following information.

For the set of files being changed:

For each file being changed:

For example,

Can you review the changes described below? I added knowledge of the all_name_ends table to a linkage page. Case: 461 Source: /research/zusers/clements/Zfin/ZFIN_WWW Site: albino.zfin.org Index: ZFIN_WWW/home/ZFIN/APP_PAGES/linkage/add_linkage_members.apg =================================================================== retrieving revision 1.15 diff -r1.15 add_linkage_members.apg 1c1,3 < <!-- this app page is used to add members to a new linkage group. The page supports, search/selection of the markers/mutants. --- > <?MICOMMENT> > *** this app page is used to add members to a new linkage group. > *** The page supports, search/selection of the markers/mutants. 7c9 < --> --- > <?/MICOMMENT> 144c146 < Specify members of linkage: <?MIVAR> --- > Specify members of linkage: 162d163 < <?/MIVAR> 170,176c171,180 < <?MISQL SQL="select distinct mrkr_zdb_id,mrkr_name,mrkrtype_type_display,mrkr_abbrev < from marker a, all_map_names b, marker_types c < where a.mrkr_type='$mtype' < and a.mrkr_zdb_id = b.allmapnm_zdb_id < and b.allmapnm_name_lower like '%$(LOWER,$member)%' < and a.mrkr_type = c.marker_type < order by a.mrkr_abbrev;"> --- > <?MISQL SQL=" > select distinct mrkr_zdb_id, mrkr_name, mrkrtype_type_display, > mrkr_abbrev > from marker, all_map_names, all_name_ends, marker_types > where mrkr_type = '$mtype' > and mrkr_zdb_id = allmapnm_zdb_id > and allmapnm_serial_id = allnmend_allmapnm_serial_id > and allnmend_name_end_lower like '$(LOWER,$member)%' > and mrkr_type = marker_type > order by mrkr_abbrev;"> 216,217c220,221 < <?MIVAR COND="$(=,$MI_ROWCOUNT,0)">No mutant matches.<?/MIVAR> < <p> --- > <?MIVAR COND="$(=,$MI_ROWCOUNT,0)">No mutant matches.<?/MIVAR> > <p> 219,220c223,224 < <?/MIBLOCK> < --- > <?/MIBLOCK> > <?/MIVAR> CVS Entry: Made several changes: o Converted query that did like '%name%' processing on all_map_names to do like 'name%' processing on all_name_ends. The new query will be faster than the old. o Converted header comments from HTML to MICOMMENTs. o Fixed an unterminated MIVAR reported by Informix weblint.

Deadlines versus Code Reviews

Whenver possible, you should allocate sufficient time in your schedule for code reviews and any resulting rework. This is often not hard at ZFIN since many projects do not have hard deadlines. However, many projects do have hard deadlines that cannot be missed due to interaction with other organizations or because of upcoming events such as conferences or publications.

In such cases, development work can go right up to the dealine, not leaving enough time to do a code review before the code needs to go in. What should we do in these cases?

  1. First, try to avoid this situation whenever possible. Really.

  2. If a deadline looms and the changes need to go in, then commit them without code review, stating in the CVS message that the changes were not code reviewed due to deadlines.

  3. Get the changes code reviewed within a week of committing them.

General Standards and Best Practices

This section discusses coding standards and best practices that are independent of the particular language or technology being used. Often the standards given in this section are expanded on in some or all of the language specific sections. When working on code, both the general and the language specific standards must be used.

There are three main areas of emphasis in this document, and several minor areas. The three main areas have both general standards and language specific standards. The three major areas are:

  1. Naming Conventions
  2. Modularity and Defined Interfaces
  3. Readability. This includes Comments and Indenting.

Naming Conventions

ZFIN cares a great deal about naming conventions for almost everything you can think of:

Each language has its own set of naming conventions. See:

Variable Scope

In languages where global variables are the only option (Web Datablade), or where the default scope for variables is global (PERL), then our naming conventions include standards that help us tell where any variable came from.

See also:

Modularity and Defined Interfaces

ZFIN code needs to be modular and it needs to have well defined interfaces. Our goals in this area include:

A well defined interface will specify these things:

  1. A brief description of what the module does.
  2. A list of inputs
  3. A list of outputs
  4. A list of side effects

How modularity is implemented varies from language to language. See:

Comments

Comments are a key part of the ZFIN standards. In addition to just making things more readable, they are key to well defined interfaces.

See also:

Defined Interface Comments

Comments are required at the top of each defined interface. This specifies what the file/module/program/routine/etc does, expects as input, produces as output, and what side effects it has. Each language has a different format for defining interfaces. See:

Closing Comments

Indenting makes a file infinitely more readable. However, good indenting complements the practice of putting comments on closing tags/statements that are a long way from their matching opening tag or statement. Comments are required on some closing tags / statements that are more than 50 lines away from the matching opening tag or statement. The practice of using ending comments helps enormously on blocks of code that span a page or more.

See also:

Inline Comments

Comments are recommended anywhere where it is not obvious from the code what is going on. If it took you some time to figure out what is going on then you should take the time to record your hard-earned knowledge in a comment.

Inline comments are love notes to your future self. Feel the love.

See also:

Comment Flags

There are several types of inline comments that deserve particular attention and that are to be labeled with specific "flags" in the comment. This will allow us to search for this particular type of flag. The flags are
Comment Flags
:TODO: Put this in the code when you have more to do here and don't ant to forget to do it. You should never commit a file that a :TODO: comment flag in it.
:KLUDGE: Use when something isn't pretty, but it works. Flags things that could be improved on the next thorough examination of the code.
:PRODUCT_BUG: Use this when code is working around a bug in a product. When the next release of the product comes out this will allow us to test if the problem has gone away.

Section Separator Comments

Comments are also useful as section separators in large files. Section separator comments are large enough to pop out at you when scrolling through a file in an editor or looking at a printed page. Each language has its own standard section separator comment format. See:

Indenting and Whitespace

Our standard on indenting is that you must indent your code. Code that is not indented or is indented in a misleading way will not be accepted. However, we usually don't have specific standards telling you exactly how you should indent your code.

See also:

Reindenting

Do you have to reindent all the code whenever you add an extra condition, for example, around a large part of a file? No, you only need to reindent a file when it becomes hard to read. In order to avoid the need for constantly reindenting a file, we recommend that your initial indent be at least 4 characters wide. This gives you room to add at least one condition (and up to 3 if you squint) before you need to reindent a large portion of the file.

Also, large scale indenting changes to a file should be done as their own revision. That is, whenever possible, we should not mix logic changes and big indenting changes. They should be done as two separate revisions.

Whitespace

ZFIN has no standard on blank lines. Just make your code readable. Using single spaces to separate multiple items on a single line in encourages, where the language supports it.

Tabs versus Spaces

Configure your editor to use spaces instead of tab characters. There is no standard definition across tools on what a tab means, even on Unix.

Once we figure out how to do this in each editor in use at ZFIN, add a description here on how to do this.

Maximum Line Length

Good indenting practice and good naming conventions are at odds with maximum line length standards. Good indenting and readable variable names leads to long lines. There are also some situations, particularly in HTML and JavaScript, where it is impossible to have short lines.

ZFIN does not have a maximum line length. However, short lines are favored over long lines whenever using short lines does not compromise readability. If you can easily fit things in 80 characters or less, please do so.

Source Code Control and Committing Files

Any file that is part of the web site, that is used to produce the web site, or that is used to update the web site must be under CVS.

This includes things like FileMaker files, and load scripts. Historical load scripts that are not likely to be used again do not need to be loaded into CVS.

Committing Files

Files cannot be committed to CVS until the changes in them have been inspected by someone else. See Enforcement for details.

When committing a file 3 things must be provided.

  1. The FogBUGZ case, if one exists for this revision.
  2. A verbal description of what changed in the file and why.
  3. Who inspected this set of revisions.

Debugging Support

Multi-tier applications like dynamic web sites are notoriously difficult to debug. Does the problem lie in the browser, the web server, HTML code, file permissions, app page tags, SQL syntax, database permissions, or in the database server? Often it is darn hard to tell.

Wherever possible, a standard approach should be taken to debugging in a given language. This means that

  1. There should be a standard way to turn debugging on and off.
  2. There should be a standard way to support debugging in each program/module/page. That is, if there is always some useful information to know when using a particular language, then that information should be displayed when debugging is turned on.

See also:

Overriding Standards

We have standards because they improve the quality of the code in the vast majority of cases in which they can be applied. If this weren't true of a standard then that standard should quickly become a non-standard.

However, even the best standards aren't applicable 100% of the time. If you find yourself in a situation where applying a standard lowers the quality of the code then you are free to ignore the standard as long as:

  1. You document why the standard was ignored, and
  2. You can convince your reviewer(s) that using the standard would have lowered the quality of the code.

Applying Standards to Old Code

Newly written code should always conform to the ZFIN standards. But what about old code that was written before we had standards?

Here are some guidelines

  1. Any new code should conform to standards, no matter how small the code.
  2. Any new variables should conform to ZFIN Naming Conventions.
  3. If you spent the time to figure out any of a previously undefined interface, then create a skeleton defined interface for the code. Label anything you don't know with "Undocumented.".
  4. If you are rewriting a significant portion of the code then bite the bullet and rewrite the surrounding code to conform to standards as well.
  5. If it easy to apply a standard to old code, and the payoff is high, then fix the old code. For example, in app pages it is relatively easy to replace $1, $2, ... variables with named variables.

HTML

This section describes standards and best practices for both HTML only pages (static pages) and for HTML code that is embedded in other languages, such as app pages. There may also be additional language specific standards for HTML that is embedded in other languages.

Style Sheets

Style settings (font, font size, colors, etc.) should be moved from HTML files to the single global ZFIN Cascading Style Sheet. All HTML should use styles from that global style sheet. There will be no local style settings, either in other CSS files, or directly in the HTML file.

HTML Capitalization

HTML tags should be all lower case. We could have picked all upper case as well. We picked all lower case because that is what XHTML uses.

HTML Closing Tags

Not all opening HTML tags require matching closing tags. Here are ZFIN's guidelines for when we should provide closing tags that HTML does not strictly require.

HTML Attribute Values

HTML attribute values should always be placed in quotes. The HTML standard requires this.

Italics

The following should be italicized whenever they are displayed:

Note that genes are the only types of markers that are italicized. All other marker types are shown as plain text.

Non-ASCII Characters in HTML

The standard ASCII character set is 7 bits long. It includes every character that is needed in most of our pages. However, it does not include non-English characters such as "ü" or "à", or non-alphanumeric characters such as the copyright sign, "©".

Including non-ASCII characters directly in HTML pages can be done, but it is problematic. There is no guarantee that editors, and more importantly browsers, will render the characters correctly. A better approach is to replace all non-ASCII characters with HTML Character Entities.

All non-ASCII characters in HTML should be converted to HTML Character Entities before being checked into CVS.

HTML Character Entities

HTML Character Entities serve (at least) two purposes in HTML:

HTML Character Entities have the format:

  &mnemonic;
mnemonic is a 2 to 8 character mnemonic code for the character. Complete lists of HTML Character Entities are available online at:

The official specification: http://www.w3.org/TR/REC-html40/sgml/entities.html
Nicer interface: http://www.alanwood.net/demos/ent4_frame.html

Some of the more popular HTML character entities are:

Character HTML Character Entity
< &lt;
> &gt;
& &amp;
ü &uuml;

HTML Comments

As stated earlier, comments are a developer's best friend. However, for static HTML pages comments should be used sparingly:

For dynamic pages, always use the comment syntax for the dynamic language, rather than HTML comments. These comments do not go out over the web.

Internal ZFIN Links

Links within the ZFIN web site should be relative rather than absolute.

There are several reasons for this:

  1. Development web sites: Relative links mean that when we are in a development/test web site, we will stay in that development/test web sites. Absolute links would take us unexpectedly to the production web site.
  2. Mirror web sites: The point of ZFIN's mirror sites is to provide local access to as much of ZFIN as possible in distant parts of the worlds. Absolute addresses defeat the purpose.
  3. Mixed protocols: When you specify an absolute URL, you also have to specify the protocol, either HTTP or HTTPS. If you used HTTPS to load a page then all of the included content (images, JavaScript, CSS files) in that page must use HTTPS as well. If any of the included content uses HTTP then the user gets a security warning. It is easier just to use relative references, and let the browser default to the page's protocol.

What is an absolute link?

Any link in an "a href" html tag that has "zfin.org" in it.

How do we make a non-absolute (relative) link?

Well, here are some examples: To link to static pages:

  1. a href="/" : (the zfin.org homepage)
  2. a href="/zf_info/glossary.html" : (the zfin.org glossary)
  3. a href="/<!--|CGI_BIN_DIR_NAME|-->/ZFIN_jump?record=ZDB-PUB-011008-1" : (a specific pub)
  4. To link to a specific web datablade, dynamically generated page, use one of our generic tags (all generic tags are found in the .tt file in [yer] database's /Commons/env directory). An example:

  5. a href="<!--|WEBDRIVER_PATH_FROM_ROOT|-->?MIval=aa-fishview.apg&OID="

If you think of "/ZFIN_WWW" , "/" , and "yerdb.zfin.org" as synonymous, finding the relative link to your file under your ZFIN_WWW directory will be easier.

A relative link points to a aliased spot in ZFIN's file system. An absolute link also points to an aliased spot. However, an absolute link specifies which copy of the ZFIN file system it is pointing to (namely, the zfin.org copy), while a relative link allows some other process (namely, the apache configuration file and the gmake/.tt files) to direct it to the appropriate copy.

JavaScript

Kevin will write this section.

Database

This section describes database standards relating to using the database. For logical database design standards, see the Logical Design Conventions section of the Database Design document. For physical design standards see the Physical Design section of the Administration document.

SQL

This section describes some less than obvious best practices for writing SQL at ZFIN. For standards that are specific to SPL routines see the SPL section.

SQL Comments

Informix supports two types of comments in SQL:

Multiline comments (braces) are a good way to comment out large chunks of SQL when debugging. They could also be used to comment out defined interface comments in SPL routines.

However, braces have one drawback: SQL mode in emacs does not recognize braces as SQL comments and does not display comments in braces as comments. Emacs displays them as SQL code, which means that highlighting and colors in multiline comments are displayed more or less randomly. Because of this, ZFIN recommends that multiline SQL comments use dashes on each line, rather than braces to enclose the whole comment.

SQL Indenting

Each SQL keyword should start a new line. This SQL statement needs to be rewritten:

select prfx_a, prfx_b from some_table where prfx_c > 0 and prfx_d is not null;

Here is one possibility:

select prfx_a, prfx_b from some_table where prfx_c > 0 and prfx_d is not null;

SQL Capitalization

Table and column names should be in all lower case. ZFIN has no standard for capitalizing SQL keywords.

Join Syntax

ZFIN does not have a standard join syntax. You can use either the ANSI join syntax where join conditions are specified in the FROM clause or the older syntax where the join conditions are in the WHERE clause.

Examples

SELECT DISTINCT

Putting a distinct clause in a select statement causes the select to get rid of any duplicate rows before returning the results. It does this by performing a sort on the result set, a fairly expensive operation.

Unless you have a reason to use distinct clause in a select statement, don't use one. Use a distinct clause only if there is the potential for duplicates in the result set. This requires some additional work on the programmer. You need to examine the columns and tables in the query to determine if duplicates are possible.

INSERT

When inserting a list of values into a table, always enumerate the columns in the insert statement. This makes it easier to keep insert code up to date with table changes.

Example

Column Names versus Column Numbers

Whenever possible, use column names rather than column numbers in order by and group by clauses. If someone adds or drops columns from a select list then an order by or group by clause using column names will continue to work as expected. Dropping or adding columns in the select list in a query that uses column numbers may result in a query that continues to run, just not the way it was supposed to.

For example, don't do this:

select mrkr_name, mrkr_abbrev, mrkr_type, mrkr_abbrev_order from marker order by 4;

The above query would quietly break if I added mrkr_zdb_id to the front of the select list. Instead, use the column name:

select mrkr_name, mrkr_abbrev, mrkr_type, mrkr_abbrev_order from marker order by mrkr_abbrev_order;

Subqueries

Subqueries can greatly improve the readability of certain queries, and subqueries are the only way to ask some questions. Use subqueries wherever they are appropriate.

IN versus EXISTS

When writing subqueries use "exists/not exists" instead of "in/not in", whenever possible. There are two reasons for this:

  1. It makes it easier for the optimizer. "In/not in" queries can take an order of magnitude longer to run in certain cases. (See below for why.) In general, when writing complicated queries, try and make it easier for the optimizer to figure out what is really being asked

  2. "not in" subqueries are not logically equivalent to "not exists" subqueries when the list of values returned by the subquery includes NULLs. If any value in the list returned is a NULL then the whole subquery condition will evaluate to false. (See below for why.) I can't think of a situation when this is the behavior you would want. Avoid this by always using not exists subqueries instead.

For example, this query:

select xpat_zdb_id from expression_pattern where '$OID' in (xpat_gene_zdb_id, xpat_probe_zdb_id) and ( xpat_zdb_id in ( select xpatfimg_xpat_zdb_id from expression_pattern_image ) or xpat_zdb_id in ( select xpatstg_xpat_zdb_id from expression_pattern_stage where xpatstg_comments is not null ) or xpat_zdb_id in ( select xpatanat_xpat_zdb_id from expression_pattern_anatomy ) ) order by xpat_zdb_id;

should be rewritten as this query:

select xpat_zdb_id from expression_pattern where '$OID' in (xpat_gene_zdb_id, xpat_probe_zdb_id) and ( exists ( select 'x' from expression_pattern_image where xpat_zdb_id = xpatfimg_xpat_zdb_id ) or exists ( select 'x' from expression_pattern_stage where xpatstg_comments is not null and xpat_zdb_id = xpatstg_xpat_zdb_id ) or exists ( select 'x' from expression_pattern_anatomy where xpatanat_xpat_zdb_id = xpat_zdb_id ) ) order by xpat_zdb_id;

Q: Why can the "in" query be an order of magnitude slower than the "exists" query?

A: If we could run the "in" query completely unoptimized, it would do a linear scan of 1 to 3 of the inner tables (expression_pattern_image, expression_pattern_stage, expression_pattern_anatomy) for each matching row in the outer table (expression_pattern). This is in fact what happens in many cases.

The "exists" query more clearly states in the subqueries that we are looking for the existence of rows with a specific value. In the "exists" query, it is much easier for the optimizer to figure out that it can use the indexes on the inner tables, rather than linear scans.

Q: Why do NULLs in "not in" subqueries always cause the condition to be evaluated to false?

A: In SQL any condition involving a NULL value is assigned the value of "unknown", rather than true or false. It maintains that unknown state until the last possible part of the query when the unknown is promoted to false. In the case of "not in" queries, NULLs mean that the database can't tell if the value is in the list or not, because the values of the NULLs are unknown. That unknown will eventually become a false.

SELECT INTO TEMP

This is one of two ways to create temporary tables. However, there are several things you need to be aware of when using it. See the Temporary Tables section for more information.

SPL

This section discusses standards that are specific to stored procedure language (SPL) routines. SPL routines at ZFIN range from simple ones that are called by insert and update triggers to do data validation to very long ones used to generate fast search tables. Most of the SPL routines at ZFIN are defined in the lib/DB_functions/SPL directory. SPL routines that are specific to data loads/unloads are also found in the server_apps/data_transfer/ tree.

SPL Naming Conventions

SPL Variable Naming Conventions

Variables in SPL routines should use mixed case to distinguish words in variable names. SPL supports underscores in variables but we avoid their use at ZFIN. This allows the reader to clearly distinguish between variables and column names.

There is also a more subtle reason for this standard: SPL allows you to define a variable with exactly the same name as a database column. SPL does not issue any warnings when you do this. This can lead to unexpected results because it is not clear in SQL statements if the column or variable is being used. Using mixed case instead of underscores in SPL variable names prevents this from ever happening.

SPL Variable Data Types

Variables in SPL routines that hold data from database columns should use "like table.column_name" in the define, rather than explicitly stating the datatype. For example, this

define mrkrAbbrev like marker.mrkr_abbrev;

is better than

define mrkrAbbrev varchar(40);

Using "like" makes the code much easier to maintain when the definitions of columns change.

SPL Modularity and Defined Interfaces

SPL Defined Interfaces

Every SPL routine should have a defined interface. The defined interface for an SPL routine occurs as comments at the top of the routine, just after the routine definition:

-- --------------------------------------------------------------------- -- A description of what the routine is for. -- -- INPUT VARS: -- var1 Description of first input variable, if any -- -- OUTPUT VARS: -- var2 Description of first output variable, if any -- -- RETURNS: -- Describe the return value for both the success and error cases. -- -- EFFECTS: -- Describe the side effects of this routine for both the success and -- error cases. Side effects include changes to the filesystem or -- database. -- ---------------------------------------------------------------------

SPL Modularity

SPL Comments

See the SQL Comments and SPL Defined Interfaces sections for standards on comments in SPL routines.

SPL Debugging Support

Unless you go through the effort to support it, debugging SPL is darn hard. Fortunately, ZFIN has already gone through the effort to support debugging SPL. The state of the art for debugging SPL and comments describing how it works can be found in regen_names.

All complicated SPL routines should include the debugging features described in regen_names.

Testing Database Scripts

Any script that inserts, deletes, or updates data in the database should be thoroughly tested before running in production. This includes custom test queries that are specific to that script, as well as running two standard testing scripts afterwards:

  1. Run validatedata.pl with the daily, weekly, and orphan options. See the makefile for details. This will detect if the script has broken any of ZFIN's regular validation checks. Check the results to see if any reported item could come from the script you just ran.

  2. Run scrubscan.pl to scan character columns for illigal characters. See the makefile for how to do this. Check the results to see if any reported item could have come from the script you just ran.

Temporary Tables

Temporary tables are widely used at ZFIN. This section describes general best practices for creating and using temp tables.

Temporary Table Creation

See the Temporary Tables in App Pages section for additional information on that topic.

There are two ways to create temp tables:

  1. Use a CREATE TEMP TABLE statement. This is nominally more work then using a SELECT INTO TEMP, but it does avoid all the issues with constants described below, and it makes the code easier to read. An example table definition is:
    CREATE TEMP TABLE tmp_table1 ( col1 varchar(50), col2 int, col3 varchar(10) ) WITH NO LOG;
    Favor create temp table statements over select into temp statements.
  2. Use a SELECT INTO TEMP statement. This statement both creates and populates the temp table. The column names in the temp table are the same as the column names in the select statement, unless a new column name is provided in the query. Expressions must be assigned explicit column names.

    The datatypes of the columns in the temp table are the same as the datatypes of the columns and expressions in the select statement. This is generally what we want, but it can get us into trouble when constants or expressions are in the select list, particularly with character data when the constant is a placeholder for values that will be provided later, such as ZDB IDs.

    ZFIN has a general policy that character data less than 255 characters long is stored in VARCHAR columns rather than CHAR columns. In addition, ZFIN has a function, scrub_char(), and a cron job, scrubscan.pl, that we use to trim/report leading and trailing blanks in all our VARCHAR columns.

    SELECT INTO TEMP statements are the leading cause of trailing blanks in VARCHAR columns at ZFIN, particularly on ZDB ID columns. Here's why, and how to avoid them.

    The statement

    SELECT '123456789 123456789 123456789 123456789 123456789 ' zdb_id_col, other_column(s) FROM some_table(s) INTO TEMP temp_table WITH NO LOG;

    creates a 2 column table. The first column is destined to hold a ZDB ID. However, it is created as CHAR(50), which means that when we populate it with ZDB IDs they will have trailing spaces. If we forget to do a trim() or a scrub_char() (and we will) then the ZDB IDs will show up in the database with trailing spaces, will be reported by the nightly cron job, and will need to be manually fixed.

    This statement creates the column as VARCHAR(50):

    SELECT ' 123456789 123456789 123456789 123456789 123456789'::varchar(50) zdb_id_col, other_column(s) FROM some_table(s) INTO TEMP temp_table WITH NO LOG;

    The ZDB IDs will not have trailing spaces when set in this temp table, and therefore will not have trailing spaces when inserted into permanent tables.

    NOTE: There are several other possibilities that are simpler, but that do not work. In particular, these 2 approaches fail:

    SELECT '1'::varchar(50) zdb_id_col, other_column(s) FROM some_table(s) INTO TEMP temp_table WITH NO LOG; SELECT null::varchar(50) zdb_id_col, other_column(s) FROM some_table(s) INTO TEMP temp_table WITH NO LOG;

    Why? In both cases Informix creates the column as VARCHAR(1). Why? Nobody knows.

With No Log, Indexes, and Constraints

Temp tables should always be created using a "with no log" clause at the end of the table definition. This suppresses logging for the temporary table.

Also note that temp tables can have indexes and constraints as well.

Web Datablade App Pages

Web Datablade app pages, also known as just "the Web Datablade" or "app pages", were the initial motivation for drawing up these standards. The majority of ZFIN's dynamic pages have used the Web Datablade since ZFIN was started. The Web Datablade is an IBM Informix product for tying together the web and Informix databases. It uses an extra layer of markup language/tags embedded in the HTML. The Web Datablade tags are interpreted at run time by a module running inside Apache. Pages that contain the Web Datablade tags are called "app pages".

The syntax of app pages is not particularly pretty, but it is easy to understand. What is less easy to grasp are the subtleties of getting app pages to scale to a web site of our size. We have not done a particularly good job in the past at creating readable and maintainable app pages.

App Page Naming Conventions

App Page Directories

All app pages are stored under the ZFIN_WWW/home/ZFIN/APP_PAGES directory. That directory contains many subdirectories, each covering a different subject area or infrastructure function in the ZFIN web site. All app pages related to a particular topic are stored in that topic's directory. Some directories also have subdirectories to handle particular parts of that area.

Subject area directories, such as gene_files or personbrowser, cover specific subject areas in the web site such as genes or people. Infrastructure directories, such as navigation_files or security_files contain files that are used across many subject areas of the web site.

Directory names should accurately reflect their contents. This makes it easy to know where to look to see if an app page already exists for a particular purpose.

ZFIN does not have a standard on capitalization or underscores in directory names.

App Page File Names

First some general app page file name standards

  1. An indication of the app page's subject area and/or function should be included in the filename. In addition to making it obvious what the file does, this also tells you what directory the file resides in.
  2. File names must be 37 characters or less in length, including the .apg extension. Maximally descriptive filenames are encouraged within that limit.
  3. App pages that display detailed information about an individual object (such as a gene or a person) should have the word "view" in their file name.
  4. App pages that contain query forms should have the word "select" in their file name.
  5. ZFIN does not have a standard on capitalization or underscores in app page file names.

App Page Variable Names

The most significant fact about app page variables is that all app page variables are global in scope. No matter how you WebExplode another app page (there are several ways), the called app page and the calling app page share the same variable name space. If you modify, create, or destroy a variable in either page then the other page knows about it as well.

Thus we need to take great care we need to take great care when naming app page variables so that

  1. We can tell what app page a variable came from.
  2. We avoid accidental variable name collisions between calling and called app pages.

App Page Variable Name Prefix

The main way we address these problems is that every app page has a unique 2 to 10 character variable name prefix associated with it, and every variable that is used in that page starts with that prefix followed by an underscore, followed by a descriptive name of the variable.

For example, the prefix for xpatselect.apg is xpatsel_. All variables in this file should (but currently don't) begin with this prefix.

There is another type of global variable - project global. In Java they would be Protected variables. These variables are shared between a top-level page and subroutine page(s). Either page may change the value and explicitly renaming the variable to pass into the subroutine page is not required. Subroutines list the global variables under INPUT VARS that it uses or modifies.

All project global variables should append G_ to the top-page prefix. For example, xpatselect.apg global variables would begin with the prefix xpatsel_G_ . Thus search by prefix will return all file specific variables.

We discussed if we should treat app page variables that are visible in URLs any different. We decided that we should not. They too should use app page prefixes in their name.

Finally, we talked about what to do with "global" variable names like $OID or any variable name that exists in secure_navigation.apg. We decide to leave these variables as they are, at least for the time being, as it is just too big a task to convert them.

App Page Database Column Variables

Other than using app page prefixes, ZFIN does not have a standard for naming variables that either came from a database table or are destined for a database table.

We discussed the possibility of naming such variables $apgprfx_database_column_name, where database_column_name is the exact name of the column in the database. This is not a bad standard, and you are free to follow it, but there are many situations where you don't want to use it. For example, you might have a page that deals with genes and their associated ESTs. The ZDB IDs for both come from the mrkr_zdb_id column in the marker table. There are two problems here:

  1. They can't both be called $apgprfx_mrkr_zdb_id. One of them will have to be called something else, like $apgprfx_est_zdb_id.
  2. Naming the variable that holds the gene ZDB ID $apgprfx_mrkr_zdb_id is just not as helpful as naming it $apgprfx_gene_zdb_id. The variable will never hold anything other than a gene ZDB ID. Using gene in the name instead of mrkr means the reader need to keep one less fact in their head.

Pick database column variable names that make the code the most readable.

App Page $1, $2, ... Variables

When query results are initially returned to an app page from an SQL statement, they are placed in variables named $1 through $n where n is the number of columns returned by the query. You should always assign $n variables to variables with standard names immediately after the SQL statement that produced them.

Exceptions: You don't have to do this if all these conditions are met:

  1. There are 4 or fewer columns in the query result. And
  2. The $n variables are used only in the 1 line immediately following the SQL query.

Nothing is less clear than using a variable named $1 twenty lines after an SQL query. $1 is also in danger of being stomped on if we insert any SQL between the SQL that originally populated $1 and when $1 gets used.

App Page Boolean Variable Names

Boolean variable names should clearly communicate that they are boolean variables. For example, apgprfx_fishIsMutant and apgprfx_idsAreEqual are preferred over apgprfx_mutant and apgprfx_same. Use "is" and "are" in boolean variable names.

Also see the section on app page boolean variable usage.

App Page Modularity and Defined Interfaces

Some significant facts about app pages:

All of these facts have conspired to produce some pretty heinous app pages at ZFIN. We underutilize the WebExplode() function, resulting in 20 and 30 page long app pages. When we do use WebExplode() we have had such poor naming conventions for app page variables that we have no way to tell where a variable might have come from. Better use of WebExplode, naming conventions, and well defined interfaces will all help address these problems.

Top Level versus Subroutine App Pages

When discussing app pages, it is useful to classify them by how they are called:

  1. Top level app pages: These pages are called directly via HTTP and can show up in URLs. For example, xpatselect.apg is called directly from the home page.
  2. Subroutine app pages: These pages are WebExploded by other pages. These are building block pages, and can be be further classified:
    1. Subroutine app pages that exist for code reuse: These are utility routines that are useful in many different app pages. They can be either in a subject area (for example, an app page to show a short reference to a publication), or in an infrastructure function directory (for example, to check a user's permissions).
    2. Subroutine app pages that exist solely to make a single app page more readable: For example, xpatselectquery.apg is called only from xpatselect.apg. It contains query processing logic that was formerly in xpatselect.apg. Pulling it into its own file made xpatselect.apg much easier to read. These app pages should be named so that it is obvious what other app page they are called from.
    The distinction between the two types of subroutine app pages is blurry. This is a good thing: We could easily convert xpatselectquery.apg so it could be called by any app page that wants to query expression patterns (maybe it supports this already).

An app page should be either a top level app page, or a subroutine app page, but never both.

When to Create Subroutine App Pages

You should create a subroutine app page in any of these situations:

Subroutine App Pages versus Stored Procedures

An alternative to placing code/subroutines in app pages is to place the code in a Stored Procedure Language (SPL) routine inside the database. Here are some guidelines on when you might choose which technology.

However, there are some restrictions on what can be done in SPLs that are called from WebExploded app pages. In particular, you can run any DDL, other than to create temporary tables, and you also may not be able to insert or delete rows from tables.

App Page Defined Interfaces

Every piece of code we have at ZFIN, including app pages, should have a defined interface. Every app page must have this comment block at the top. If you make a non-trivial change to a page that does not have a defined interface comment block at the top then you must add one and document the parts of the interface that you are changing.

<?MICOMMENT> FILE: filename.apg PREFIX: apgprfx_ A description of what the page is for. INPUT VARS: apgprfx_var1 Description of first input variable apgprfx_var2 Description of first input variable ... OUTPUT VARS: apgprfx_varx Description of first output variable, if there is one. ... OUTPUT: If this is app page produces a chunk of HTML then this describes the HTML it produces. EFFECTS: What if anything changed in the filesystem or in the database. Basically any changes not covered by OUTPUT VARS or OUTPUT. DEBUGGING: Any useful advice on how to debug this page, including what app page debugging variables it supports, and any particularly useful test cases. <?/MICOMMENT>

If your revision does not involve exhaustively researching the page's interface then you don't have to fully document the interface. For any section that is not fully documented, insert the statement

SECTION_NAME NOT FULLY DOCUMENTED.
Additional Explanation
INPUT VARS Lists all input variables to this page. If the existence of a variable is optional, then this also describes what the non-existence of a variable means.
OUTPUT VARS Descriptions of each output variable. If existence is optional then includes a description of what non-existence means. If this app page changes the existence or values of any input variables then those variables must be listed as output variables as well.
Note that because of the global scope of app page variables that any variable created in this app page is in fact visible to the caller. However, only variables in the OUTPUT VARS list should be referenced by the calling page.
OUTPUT Many app pages, and all top level app pages, produce HTML as output. This is a description of that HTML. This is a description of the $1 value returned by WebExplode().

App Page Variables

This section covers how to use app page variables. Naming conventions for app page variables are covered elsewhere.

App Page Boolean Variables

The Web Datablade does not have strongly typed variables. There is no boolean type in the Web Datablade. However, like Perl, C/C++ and Java, it does have a definition of True and False. Anything that is 0 is False, and anything that is not 0 is True. This differs from Informix itself which returns 'f' for false and 't' for true.

ZFIN uses a wide variety of conventions to represent boolean values in its app pages. This includes the built in 0/not 0 distinction, a plethora of t/f, T/F, True/False, etc. variations, and the existence or non-existence of a variable.

It would be nice if we could settle on a single way to represent booleans. However, because of the mismatch between Web DataBlade and the Informix DBMS, and because sometimes we have to use existence/non-existence, I don't think it is worth the effort. However, I think we should reduce the number of boolean representations down to these 3:

  1. 0 for false, not 0 for true. This is the representation Web DataBlade uses. Example:

    <?MIVAR NAME=$prfx_geneIsUnchanged>$(EQ,$prfx_currGeneZdbId,$prfx_prevGeneZdbId)<?/MIVAR> ... <?MIBLOCK COND=$prfx_geneIsUnchanged>
  2. 'f' for false, 't' for true. This is the representation boolean columns in Informix have. Example:

    <?MIVAR NAME=$prfx_geneIsUnchanged>f<?/MIVAR> <?MIVAR COND=$(EQ,$prfx_currGeneZdbId,$prfx_prevGeneZdbId) NAME=$prfx_geneIsUnchanged>t<?/MIVAR> ... <?MIBLOCK COND=$(EQ,$prfx_geneIsUnchanged,t)>
    We hoped that their might be an Informix environment variable that we could define to automatically translate boolean values into 0/1 so that we could at least drop this option. However, no such environment variable exists.
  3. Not exists for false, exists for true. We are forced to support this option with checkboxes, and it is darn convenient to use as well. Example:

    <?MIVAR COND=$(EQ,$prfx_currGeneZdbId,$prfx_prevGeneZdbId) NAME=$prfx_geneIsUnchanged><?/MIVAR> ... <?MIBLOCK COND=$(XST,$prfx_geneIsUnchanged)>

App Page Variable Existence and Non-Existence

In app pages we can use the existence or non-existence of a variable to communicate information. ZFIN does not have a standard on using the existence or non-existence of a variable to communicate information.

Note that when using checkboxes, existence/non-existence is the only way to communicate if the box was checked or not.

Also note that subroutine app pages must take care when using app page variable existence/non-existence internally. Subroutine app pages can be called multiple times within one transaction, and any variable created or destroyed on a previous call will be in that state in subsequent calls to that app page in the current transaction.

App Page Vector Variables

App pages support arrays in the form of vector variables. The use of vector variables is encouraged wherever applicable. A known problem with vector variables is that they can't be explicitly passed to subroutine app pages in WebExplode (they can be implicitly passed). Also, you can't use a vector variable in a URL.

Web Datablade Tags and Functions

This section describes some best practices for how to actually use the Web Datablade tags and functions (i.e., the Web Datablade markup language) in app pages.

MIBLOCK and MIELSE versus MIVAR

You can use a COND clause with either MIBLOCK or MIVAR tags to control the execution of blocks of code in app pages. Here are the advantages and disadvantages of each.

If your code is logically using an "if-then-else" structure then use MIBLOCK followed by MIELSEs. This makes the structure of the code clearer.

If your code is logically using an "if-then" structure then you can use either MIBLOCK or MIVAR around the code. If you use MIBLOCK and the block of code contains app page variables then you will need at least one additional MIVAR tag around those variables.

MIVAR versus $(SETVAR)

You have to use the MIVAR tag to create and set the initial value of a variable, but you can use either the MIVAR tag or a $(SETVAR) function to update the variable. ZFIN has no standard on using MIVAR or $(SETVAR). Use whichever one you want.

App Page Comments

See the general comments section for guidelines on using comments in all language. This section covers issues specific to app pages.

The most important comments in an app page are the defined interface comments. See the App Page Defined Interfaces section for details on this.

MICOMMENT versus HTML Comments

HTML and Web Datablade both have comment tags. HTML comments are left in the page by the Web Datablade and are sent out to the world. Web Datablade comments, which are wrapped in MICOMMENT tags, are stripped out by the Web Datablade and are not sent out to the world.

Use MICOMMENTs rather than HTML comments in app pages. There is no reason to be sending out comments to the world. If we need to communicate something to our users, they should see that in the displayed page. They should not have to look at the source code.

MICOMMENTs have the disadvantage that they don't stand out visually in the code as much as HTML comments. Therefore, any MICOMMENTs after the app page defined interface comment should have 3 leading asterisks on each line of the comment. Single line comments should also have 3 trailing asterisks.

For example:

... <?MIVAR NAME=$prfx_geneIsUnchanged>$(EQ,$prfx_currGeneZdbId,$prfx_prevGeneZdbId)<?/MIVAR> <?MICOMMENT> *** A single line comment *** <?/MICOMMENT> <?MIVAR NAME=$prfx_targetTable>marker<?/MIVAR> <?MIVAR NAME=$prfx_pageTitle>ZFIN Listing<?/MIVAR> <?MICOMMENT> *** A multi line comment *** spanning, of all things, *** multiple lines <?/MICOMMENT>

App Page Closing Comments

Any closing Web Datablade tag that is more than 50 lines from its corresponding starting tag should have a comment on the closing tag indicating what it is closing. For example,

<?MIBLOCK COND=$prfx_authorWasSpecified> ... lots of code here ... <?/MIBLOCK> <?MICOMMENT> *** end author was specified *** <?/MICOMMENT>

App Page Section Separator Comments

Since app pages tend to be long, section separator comments can help make them almost manageable. Here is one possibility for their format.

<?MICOMMENT> ============================================================================ ========== DESCRIPTION OF THE SECTION ============================================================================ Optional comments elaborating on what is going on here. <?/MICOMMENT>

HTML in App Pages

HTML in app pages should conform to the general HTML standards for static pages.

In addition, whenever possible, app page tags and functions and HTML tags should be correctly nested. In other words, it is bad practice to open an HTML tag, then open an app page tag, then close the HTML tag, and then close the app page tag. For example, don't do this:

<td> <?MIVAR> Name: $prfx_mrkr_name </td> <?/MIVAR>

The Web Datablade will accept this nesting, it is just hard to read. In most cases it is trivial to convert code to use a nesting that makes sense. Do this instead:

<td> <?MIVAR> Name: $prfx_mrkr_name <?/MIVAR> </td>

In some very complex cases it is not possible to do the correct nesting. If you find yourself in such a case, then the design of the code may be too complex.

JavaScript in App Pages

This section discusses issues that are specific to using JavaScript in app pages. See the JavaScript section for general guidelines on writing JavaScript.

Kevin will write this section.

Temporary Tables in App Pages

The ZFIN web site makes use of temporary tables in some of its web pages. However, starting with Informix 9.4, extreme care must be taken when using temporary tables in app pages that are web exploded by other app pages. In particular, you can still create and use temporary tables in such app pages, but you can no longer drop them. (See the discussion of MIqry2pass variable in the Web Datablade Application Developer's Guide for more information.)

We first tried using permanent tables and Informix session IDs to keep the different invocations of an app page separate. However, this approach got tripped up on -244 errors when the table was under high use.

Our second approach was more successful. We went back to using temporary tables, but we now include the session ID in the temp table name. Before creating a temp table, the code now checks if it already exists by calling the table_exists() database function. The app page also deletes all records from the table before returning.

App page specific tables are named

  app_page_name_temp_description_sessionid

Like other ZFIN tables, each table has a unique prefix.

We do not have foreign key or other constraints on these tables. We don't think it is worth the overhead for records that will exist for less than a second.

Some example code that creates a temp table for the markerselect.apg page. First, the code that uses the session ID to set the table name.

<?MISQL SQL=" select dbinfo('sessionid') from single;"> <?MIVAR NAME=$mrkrsel_sessId>$1<?/MIVAR> <?MIVAR NAME=$markerselect_temp_match>$(CONCAT,markerselect_temp_match_,$mrkrsel_sessId)<?/MIVAR> <?/MISQL>

Then the code that determines if the table already exists, and creates it if it does not:

<?MISQL SQL=" execute function table_exists('$markerselect_temp_match')"> <?MISQL COND=$(EC,$1,f) SQL=" create temp table $markerselect_temp_match ( mstmatch_zdb_id varchar(50), mstmatch_name varchar(255), mstmatch_significance integer, mstmatch_precedence varchar(80) ) with NO LOG;"> <?/MISQL> <?/MISQL>

And finally code to remove all records from the temp table before leaving the page.

<?MISQL COND="$(OR,$(XST,$input_name),$(XST,$input_acc))" SQL=" delete from $markerselect_temp_match;"> <?/MISQL>

An alternative would be to just leave the data in the temp table and then delete it the next time the page is invoked and the code finds that the table already exists.

App Page Debugging Support

ZFIN uses a standardized set of debug-related app page variables to turn debugging print statements on or off.
Standard App Page Debugging Variables
$debug_sql Complex dynamically constructed SQL statements will be displayed.
$debug_timing How long it takes each part of a page to run will be displayed. This code makes extensive use of the get_time() function.
$debug_flow Used to display flow of control information, such as "Decided fish is a wildtype". Often this can be inferred from the displays produced by the other debug variables, but sometimes it is clearer to have explicit flow of control debug information.

These are enabled by defining the variables at the top of the page, or for some pages, they can also be passed in with the URL.

Be careful to strike a balance between keeping the code readable and fast (fewer debug variable tests and print statements) and making standard debugging useful (more debug variable tests and print statements).

We discussed more sophisticated methods for enabling debugging across all pages. (These included defining flags inside the database, or using the web datablade to define the variables across all pages.) We decided to try the simple approach first. If it irritates us enough, we will adopt more sophisticated techniques.

We also discussed using debug levels instead of or in addition to debug categories. We decided that categories gave us better control than levels for the same amount of work, and that adding levels to categories was more work than it was worth.

CGI

Most of ZFIN's dynamic pages use the Web Datablade app pages, but some of our dynamic pages use CGI. Someday, this section will describe standards that apply to all CGIs, no matter what language they are written in.

Each language that CGI pages are written in also has additional standards that apply to writing CGI. See

PERL

Use /private/bin/perl in all ZFIN PERL scripts. We do this for a couple of reasons:

PERL Taint

When using perl, especially as a CGI, always use the taint -T flag and properly untaint all input before using it.