转: Comparison of different SQL implementations (1) [私人]

[ 2008-07-12 12:38:07 | 作者: progame ]
文字大小: | |
原文地址

Comparison of different SQL implementations

The goal of this page — which is a work in progress — is to gather information relevant for people who are porting SQL from one product to another and/or are interested in possibilities and limits of 'cross-product' SQL.

The following tables compare how different DBMS products handle various SQL (and related) features. If possible, the tables also state how the implementations should do things, according to the SQL standard.

I will only write about subjects that I've worked with personally, or subjects which I anticipate to find use for in the near future. Subjects on which there are no significant implementation variances are not covered. Beta-versions of software are not examined.

I'm sorry about the colors. They are a result of wanting to mark each DBMS differently and at the same time wanting to be relatively nice to printers.

If you have corrections or suggestions, please contact me; even notifications about spelling errors are welcome.

Legend, definitions, and notes

The following SQL standard and implementations have been examined, if not otherwise stated:

Standard The latest official version of SQL is SQL:2003.

I don't have access to the official ISO standard text, but Whitemarsh Information Systems Corporation provides a rather final draft as a zip-archive, containing several files. Most important to this page is the file 5WD-02-Foundation-2003-09.pdf.

Not many books decribe SQL:2003; the only book currently covering the subject is in German which I was never any good at. Therefore, I also use the following book as reference:
Jim Melton and Alan Simon: SQL:1999—Understanding Relational Language Components (ISBN 1-55860-456-1).

PostgreSQL PostgreSQL 8.2.0 on Fedora Core Linux.
Documentation
DB2 DB2 Express-C v. 9.1 on Fedora Core Linux. Note that there are differences between various DB2 flavors; this page is about DB2 for "LUW" (Linux/Unix/Windows).
Documentation (takes a while to render properly)
MS SQL Server MS SQL Server 2005 on Windows XP. Microsoft's SQL implementation is sometimes named Transact-SQL, or TSQL. In this document, I'll generally write MSSQL as a short-hand for Microsoft's SQL Server product.
Documentation (takes a while to render properly)
MySQL MySQL Database Server 5.0.18 on Fedora Core Linux (i.e. MySQL AB's "classic" DBMS product—not MaxDB).
Documentation
Oracle Oracle Database 10g Release 2 Standard Edition on Fedora Core Linux. The tables should hold for version 9i, as well.
Documentation

The products are running with their default settings. This is important for MySQL and MSSQL: Their interpretation of SQL may be changed rather drastically by adjusting certain configuration options, potentially increasing the level of standard compliance. However, such non-default configuration options are not of great value for people writing SQL applications because the developer often cannot rely on non-default configuration settings.

Features

Views

Standard Views are part of the standard, and they may be updated, as long as it 'makes sense'.

SQL:2003 has a rather complicated set of rules governing when a view is updatable, basically saying that a view is updatable, as long as the update-operation translates into an unambiguous change.

SQL-92 was more restrictive, specifying that updatable views cannot be derived from more than one base table.

PostgreSQL Has views. Breaks that standard by not allowing updates to views; offers the non-standard 'rules'-system as a work-around.
DB2 Conforms to at least SQL-92.
MSSQL Conforms to at least SQL-92.
MySQL Conforms to at least SQL-92.
Oracle Conforms to at least SQL-92.

Join types and features

All the DBMSes support basic INNER JOINs, but vary in their support for other join types.

In the following feature chart, a yes means yes; an empty table cell means no.

Join type/featurePostgreSQLDB2MSSQLMySQLOracle
Natural joins (only tested: NATURAL LEFT JOIN) yes     yes yes
USING-clause yes     yes yes
FULL joins1 (tested: SELECT...FULL JOIN...ON...=...) yes yes yes yes yes
Explicit CROSS JOIN (cartesian product) yes   yes yes yes

Remarks:

  1. Note that FULL joins may be emulated with a union of a left and a right join.

The SELECT statement

Ordering result sets

Standard The SQL-standard states that relations are unordered, but result sets may be ordered when returned to the user through a cursor:

DECLARE cursorname CURSOR FOR
  SELECT ... FROM ... WHERE ...
  ORDER BY column_name1,column_name2,...

As such, the standard doesn't allow ORDER BY anywhere else than in cursor declarations. Special exceptions exist, such as the ORDER BY part of window functions (including ROW_NUMBER() OVER... and RANK() OVER...).

The standard doesn't specify how NULLs should be ordered in comparison with non-NULL values, except that NULLs are to be considered equal in the ordering, and that NULLs should sort either above or below all non-NULL values.

PostgreSQL As well as in cursor definitions, it allows ORDER BY in other contexts. NULLs are considered higher than any non-NULL value.
DB2 As well as in cursor definitions, it allows ORDER BY in other contexts. NULLs are considered higher than any non-NULL value.
MSSQL As well as in cursor definitions, it allows ORDER BY in other contexts. NULLs are considered lower than any non-NULL value.
MySQL As well as in cursor definitions, it allows ORDER BY in other contexts.

NULLs are considered lower than any non-NULL value, except if a - (minus) character is added before the column name and ASC is changed to DESC, or DESC to ASC; this minus-before-column-name feature seems undocumented.

Oracle As well as in cursor definitions, it allows ORDER BY in other contexts.

By default, NULLs are considered higher than any non-NULL value; however, this sorting behaviour may be changed by adding NULLS FIRST or NULLS LAST to the ORDER BY expression.

Beware of Oracle's strange treatment of empty strings and NULLs as the same 'value'.

Limiting result sets

Simple limit

Objective: Want to only get n rows in the result set. Usually only makes sense in connection with an ORDER BY expression.

Note: This is not the same as a top-n query — see next section.

Note also: Some of the queries below may not be legal in all situations, such as in views or sub-queries.

Standard Non-core Feature ID T611 specifies window functions, of which one is ROW_NUMBER() OVER:
 
SELECT * FROM (
  SELECT
    ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
    columns
  FROM tablename
) AS foo
WHERE rownumber <= n

If your application is stateful (in contrast to web applications which normally have to be seen as stateless), then you might look at cursors (core feature ID E121) instead. This involves:

  • DECLARE cursor-name CURSOR FOR ...
  • OPEN cursor-name
  • FETCH ...
  • CLOSE cursor-name
PostgreSQL Doesn't support ROW_NUMBER(). Supports cursors (in all contexts, not only in embedded, dynamic SQL).

Alternative to using ROW_NUMBER():

SELECT columns
FROM tablename
ORDER BY key ASC
LIMIT n

Note that LIMIT changes the semantics of SELECT...FOR UPDATE.

DB2 Supports both standards-based approaches. When doing casual work, it's often easier to use DB2's non-standard SELECT ... FETCH FIRST n ROWS ONLY construct.
  • OLAP functions
  • FETCH FIRST (general page about the SELECT statement; use your browser's search function to locate FETCH FIRST)
MSSQL Supports both standards-based approaches.

MSSQL 2000 didn't support ROW_NUMBER(). Instead, a MSSQL 2000-specific syntax was needed:
SELECT TOP n columns
FROM tablename
ORDER BY key ASC

The TOP construct is still available in MSSQL 2005, and it's handy for casual SQL work.

MySQL Doesn't support the standard. Alternative solution:

SELECT columns
FROM tablename
ORDER BY key ASC
LIMIT n

Oracle Supports ROW_NUMBER. Seems to have non-compliant cursor facilities.

As Oracle doesn't allow AS for subquery naming (and doesn't need a subquery-name at all in this case), the standard SQL code above needs to be rewritten slightly:

SELECT * FROM (
  SELECT
    ROW_NUMBER() OVER (ORDER BY key ASC) AS rownumber,
    columns
  FROM tablename
)
WHERE rownumber <= n

Top-n query

Objective: Like the simple limit-query above, but include rows with tie conditions. Thus, the query may return more than n rows.

Some call this a quota-query.

The following examples are based on this table:

SELECT * FROM person ORDER BY age ASC;
+----------+-------------+-----+
|PERSON_ID | PERSON_NAME | AGE |
+----------+-------------+-----+
|        7 | Hilda       |  12 |
|        8 | Bill        |  12 |
|        4 | Joe         |  23 |
|        2 | Veronica    |  23 |
|        3 | Michael     |  27 |
|        9 | Marianne    |  27 |
|        1 | Ben         |  50 |
|       10 | Michelle    |  50 |
|        5 | Irene       |  77 |
|        6 | Vivian      |  77 |
+----------+-------------+-----+

Now, we only want the three (n=3) youngest persons displayed, i.e. a result set like this:

+----------+-------------+-----+
|PERSON_ID | PERSON_NAME | AGE |
+----------+-------------+-----+
|        7 | Hilda       |  12 |
|        8 | Bill        |  12 |
|        4 | Joe         |  23 |
|        2 | Veronica    |  23 |
+----------+-------------+-----+
Standard With standard SQL, there are two principal ways to obtain the wanted data:
  • The fast variant:

    One of the major additions in SQL:2003 is the addition of non-core (i.e. optional) OLAP (online analytic processing) features. If the DBMS supports elementary OLAP (feature ID F611), then the top-n query may be formulated using a window function, such as RANK() OVER:

    SELECT * FROM (
      SELECT
        RANK() OVER (ORDER BY age ASC) AS ranking,
        person_id,
        person_name,
        age
      FROM person
    ) AS foo
    WHERE ranking <= 3

    (Change ASC to DESC in the position marked like this in order to get a top-3 oldest query instead.)

  • The slow variant:

    If the DBMS doesn't support the elementary OLAP features, then the top-n solution may be obtained in an alternative way which so slow that it's not a real option in most situations:

    Correlated subquery method, mentioned in the book Practical Issues in Database Management (chapter 9: Quota Queries) by Fabian Pascal (who, again, quotes Date for the solution):

    SELECT * FROM person AS px
    WHERE (
      SELECT COUNT(*)
      FROM person AS py
      WHERE py.age < px.age
    ) < 3

    The query may make more sense if the objective is re-phrased as "Find all persons (px) such that the number of younger, other persons (py) is less than 3".

    (Change < to > in the position marked like this in order to get a top-3 oldest query instead.)

In the article Going To Extremes by Joe Celko, there is a description of yet another principle for performing quota queries, using scalar subqueries. Scalar subqueries are more tedious to write but might yield better performance on your system.

PostgreSQL Supports the slow standard SQL query variant. In practice, a PostgreSQL-only method should be used, in order to obtain acceptable query performance.

SELECT *
FROM person
WHERE (
  age <= (
    SELECT age FROM person
    ORDER BY age ASC
    LIMIT 1 OFFSET 2       -- 2=n-1
  )
) IS NOT FALSE

(Change <= to >= and ASC to DESC in the positions marked like this in order to get a top-3 oldest query instead.)

DB2 Supports the fast standard SQL variant.
MSSQL Supports the fast standard SQL variant.

MSSQL 2000 supported the slow standard SQL variant. In practice, a MSSQL-only expression had to be used, in order to obtain acceptable query performance:
SELECT TOP 3 WITH TIES *
FROM person
ORDER BY age ASC

(Change ASC to DESC in the position marked like this in order to get a top-3 oldest query instead.)

MySQL Supports the slow standard SQL solution. In practice, this MySQL-specific solution should be used, in order to obtain acceptable query performance:

SELECT *
FROM person
WHERE age <= COALESCE( -- note: no space between "COALESCE" and opening parenthesis
  (
    SELECT age
    FROM person
    ORDER BY age ASC
    LIMIT 1 OFFSET 2    -- 2=n-1
  ),
  (
    SELECT MAX(age)
    FROM person
  )
)

(Change <= to >= and ASC to DESC and MAX to MIN in the positions marked like this in order to get a top-3 oldest query instead.)

The offset-value 2 is the result of n-1 (remember: n is 3 in these examples).

The second argument to the COALESCE call makes the query work in cases where the cardinality of the table is lower than n.

Oracle Supports the fast standard SQL variant. However, as Oracle doesn't like "AS ..." after subqueries (and doesn't require naming of subqueries), the query has to be paraphrased slightly:

SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
)
WHERE ranking <= 3

(Change ASC to DESC in the position marked like this in order to get a top-3 oldest query instead.)

Limit—with offset

Objective: Want to only get n rows in the result set, and we want the first skip rows in the result set discarded. Usually only makes sense in connection with an ORDER BY expression.

In the recipes below, basic ordering is ASCending, i.e. lowest-first queries. If you want the opposite, then change ASC->DESC and DESC->ASC at the places emphasized like this.

Standard Non-core Feature ID T611 specifies window functions, one of which is ROW_NUMBER() OVER:
 
SELECT * FROM (
  SELECT
    ROW_NUMBER() OVER (ORDER BY key ASC) AS rownum,
    columns
  FROM tablename
) AS foo
WHERE rownum > skip AND rownum <= (n+skip)

Alternatively, you may use a cursor (core feature ID E121), if the programming environment permits it. This involves:

  • DECLARE cursor-name CURSOR FOR ...
  • OPEN cursor-name
  • FETCH RELATIVE number-of-rows-to-skip ...
  • CLOSE cursor-name
PostgreSQL Doesn't support ROW_NUMBER(). Supports cursors.

Alternative to ROW_NUMBER():

SELECT columns
FROM tablename
ORDER BY key ASC
LIMIT n OFFSET skip

DB2 Supports the window function based approach.

Regarding cursors: DB2 for Linux/Unix/Windows doesn't support FETCH RELATIVE (which is strange, because DB2 for z/OS seems to support it). Instead, see if the DB2 driver for your programming environment supports SQLFetchScroll().

Documentation: OLAP functions, the FETCH statement.

MSSQL Supports both standard approaches.

MSSQL 2000 didn't support ROW_NUMBER(); instead, a MSSQL-specific syntax had to be used:
SELECT * FROM (
  SELECT TOP n * FROM (
    SELECT TOP z columns      -- (z=n+skip)
    FROM tablename
    ORDER BY key ASC
  ) AS FOO ORDER BY key DESC -- ('FOO' may be anything)
) AS BAR ORDER BY key ASC    -- ('BAR' may be anything)

MySQL Doesn't support the standard approaches. Alternative solution:
 
SELECT columns
FROM tablename
ORDER BY key ASC
LIMIT n OFFSET skip

In older versions of MySQL, the LIMIT-syntax is less clear:
... LIMIT [skip,] n
(i.e. the skip argument is optional).
The old syntax is still supported by later MySQL versions, as the old syntax is widely used.

Oracle Supports ROW_NUMBER(). Oracle's cursor support doesn't look standards-compliant.

As Oracle doesn't accept AS for subquery naming (and doesn't require naming of subqueries in this case), the standard SQL solution has to be re-written slightly. An other reason for the re-write is that ROWNUM is a reserved word in Oracle, with special meaning. The Oracle code becomes:

SELECT * FROM (
  SELECT
    ROW_NUMBER() OVER (ORDER BY key ASC) AS rn,
    columns
  FROM tablename
)
WHERE rn > skip AND rn <= (n+skip)

Note:

LIMIT/TOP/FIRST queries with offset are often used in a result presentation context: To retrieve only—say—30 rows at a time so that the end-user isn't overwhelmed by the complete result set, but instead is offered a paginated result presentation. In this case, be careful not to (only) sort on a non-unique column.

Consider the following example (where PostgreSQL is used):

SELECT * FROM person ORDER BY age ASC;
 person_id | person_name | age
-----------+-------------+-----
         7 | Hilda       |  12
         8 | Bill        |  12
         4 | Joe         |  23
         2 | Veronica    |  23
         3 | Michael     |  27
         9 | Marianne    |  27
         1 | Ben         |  50
        10 | Michelle    |  50
         5 | Irene       |  77
         6 | Vivian      |  77

When ordering is performed on the non-unique age-value, ties may occur and it's not guaranteed that the DBMS will fetch the rows in the same order every time.

Instead of the above listing, the DBMS is allowed to return the following display order where Michael and Marianne are displayed in the opposite order compared to above:

SELECT * FROM person ORDER BY age ASC;
 person_id | person_name | age
-----------+-------------+-----
         7 | Hilda       |  12
         8 | Bill        |  12
         4 | Joe         |  23
         2 | Veronica    |  23
         9 | Marianne    |  27
         3 | Michael     |  27
         1 | Ben         |  50
        10 | Michelle    |  50
         5 | Irene       |  77
         6 | Vivian      |  77

Now, suppose the end-user wants the results displayed five rows at a time. The result set is fetched in two queries where the DBMS happens to sort differently, as above. We will use PostgreSQL's syntax in the example:

SELECT * FROM person ORDER BY age ASC LIMIT 5;
 person_id | person_name | age
-----------+-------------+-----
         7 | Hilda       |  12
         8 | Bill        |  12
         4 | Joe         |  23
         2 | Veronica    |  23
         3 | Michael     |  27
 
SELECT * FROM person ORDER BY age ASC LIMIT 5 OFFSET 5;
 person_id | person_name | age
-----------+-------------+-----
         3 | Michael     |  27
         1 | Ben         |  50
        10 | Michelle    |  50
         5 | Irene       |  77
         6 | Vivian      |  77

Notice that Marianne was not displayed in any of the two split result set presentations.

The problem could be avoided if the result set ordering had been done in a deterministic way, i.e. where the unique person_id value was considered in case of a tie:
SELECT * FROM person ORDER BY age ASC, person_id ASC ...
This is safer than to pray for the DBMS to behave in a predictable way when handling non-unique values.

Note: If the table is updated between parts of the result set pagination, then the user might still get an inconsistent presentation. If you want to guard against this, too, then you should see if use of an insensitive cursor is an option in your application. Use of cursors to paginate result sets usually require that your application is stateful, which is not the case in many web-application settings. Alternatively, you could let the application cache the complete result set (e.g. in a session if your web application environment provides for sessions).

The INSERT statement

Inserting several rows at a time

Standard An optional SQL feature is row value constructors (feature ID F641). One handy use of row value constructors is when inserting several rows at a time, such as:

INSERT INTO tablename
VALUES (0,'foo') , (1,'bar') , (2,'baz');

— which may be read as a shorthand for

INSERT INTO tablename VALUES (0,'foo');
INSERT INTO tablename VALUES (1,'bar');
INSERT INTO tablename VALUES (2,'baz');

PostgreSQL Supported.(since version 8.2)
DB2 Supported.
MSSQL Not supported.
MySQL Supported.
Oracle Not supported.

Data types

The BOOLEAN type

Standard The BOOLEAN type is optional (has feature ID T031), which is a bit surprising for such a basic type. However, it seems that endless discussions of how NULL is to be interpreted for a boolean value is holding BOOLEAN from becoming a core type.

The standard says that a BOOLEAN may be one of the following literals:

  • TRUE
  • FALSE
  • UNKNOWN or NULL (unless prohibited by a NOT NULL constraint)

The DBMS may interpret NULL as equivalent to UNKNOWN. It is unclear from the specification if the DBMS must support UNKNOWN, NULL or both as boolean literals. In this author's opinion, you should forget about the UNKNOWN literal in order to simplify the situation and let the normal SQL three-way logic apply.

It's defined that TRUE > FALSE (true larger than false).

PostgreSQL Follows the standard.

Accepts NULL as a boolean literal; doesn't accept UNKNOWN as a boolean literal.

DB2 Doesn't support the BOOLEAN type.
 
Judging from various JDBC-documentation, it seems that IBM recommends a CHAR(1) field constrained to values '0' and '1' (and perhaps NULL) as the way to store boolean values.
MSSQL Doesn't support the BOOLEAN type.

Possible alternative type: the BIT type which may have 0 or 1 (or NULL) as value. If you insert an integer value other than these into a field of type BIT, then the inserted value will silently be converted to 1.

Rudy Limeback has some notes about oddities with the MSSQL BIT type.

MySQL Offers a non-conforming BOOLEAN type. MySQL's BOOLEAN is one of many aliases to its TINYINT(1) type.

(Never use TINYINT(1) as the column type if you use JDBC with MySQL and expect to get non-boolean values from it.)

MySQL accepts the literals TRUE and FALSE as aliases to 1 and 0, respectively. However, you may also assign a value of — e.g. — 9 to a column of type BOOLEAN (which is non-conforming).

If you use JDBC with MySQL, then BOOLEAN is the preferred type for booleans: MySQL's JDBC-driver implicitly converts between Java's boolean and MySQL's pseudo-BOOLEAN type.

Side note: MySQL has a BIT type which may be interesting for people with enormous amounts of boolean-type data.

Oracle Doesn't support the BOOLEAN type.
 
Judging from various JDBC documentation, it seems that Oracle recommends NUMBER(1) as the way to store boolean values; it's probably wise to constrain such columns to values 0 and 1 (and perhaps NULL).

Warning to JDBC users:
According to the JDBC standard, getBoolean() must convert a SQL-'value' of NULL to the false Java value. To check if the database-value was really NULL, use wasNull().

The CHAR type

For the following section, I have used this test-SQL to try to illuminate differences (unfortunately, even standard SQL as simple as this has to be adjusted for some products):

Test steps:
CREATE TABLE chartest (
  charval1 CHAR(10) NOT NULL,
  charval2 CHAR(10) NOT NULL,
  varcharval VARCHAR(30) NOT NULL
);
INSERT INTO chartest VALUES ('aaa','aaa','aaa');
INSERT INTO chartest
  VALUES ('aaaaaa      ','aaa','aaa'); -- should truncate to 'aaaaaa    '
INSERT INTO chartest
  VALUES ('aaaaaaaaaaaa','aaa','aaa'); -- should raise error
SELECT * FROM chartest; -- should show two rows
DELETE FROM chartest WHERE charval1='aaaaaa';
SELECT * FROM chartest; -- should show one row
SELECT * FROM chartest WHERE charval1=varcharval;
SELECT charval1 || 'X' AS res FROM chartest;
SELECT CHAR_LENGTH(charval1 || charval2) AS res FROM chartest;
SELECT CHAR_LENGTH(charval1) + CHAR_LENGTH(charval2)
  AS res
  FROM chartest;

Expected results, after CREATE and INSERTs:

SELECT * FROM chartest; -- should show two rows
CHARVAL1   CHARVAL2   VARCHARVAL
========== ========== ==============================
aaa        aaa        aaa
aaaaaa     aaa        aaa
 
 
DELETE FROM chartest WHERE charval1='aaaaaa';


SELECT * FROM chartest; -- should show one row
CHARVAL1   CHARVAL2   VARCHARVAL
========== ========== ==============================
aaa        aaa        aaa


SELECT * FROM chartest WHERE charval1=varcharval;
CHARVAL1   CHARVAL2   VARCHARVAL
========== ========== ==============================
aaa        aaa        aaa


SELECT charval1 || 'X' FROM chartest AS res;
    res
===========
aaa       X


SELECT CHAR_LENGTH(charval1 || charval2) AS res FROM chartest;
    res
===========
         20


SELECT character_length(charval1) + character_length(charval2)
AS res
FROM chartest;
    res
============
          20

Actual results.

Standard
  • Return with an exception state if the inserted string is too long, unless the characters exceeding the limit are all spaces.
  • Pad CHAR columns with spaces if the inserted string is shorter than the specified CHAR-length.
  • Pad with trailing spaces as needed when casting or comparing to other string-like values (e.g. VARCHARs).
PostgreSQL Generally follows standard, but (conceptually) truncates trailing white-space before performing some functions (like the CHARACTER_LENGTH-function).
DB2 Follows the standard.
MSSQL Generally follows standard, but (conceptually) truncates trailing white-space before performing some functions (at least before LEN()).
MySQL Breaks the standard by silently inserting the string, truncated to specified column CHAR-length.
(It's actually not completely silent, as it issues warnings if values were truncated: If you manually check for warnings, you will know that something bad happened, but not which of the rows are now invalid.)
 
Violates the standard by effectively truncating all trailing spaces.
The documentation states that MySQL truncates trailing spaces when CHAR values are retrieved. That may be true, but it seems that truncation even happens before the CHAR values are used as input in functions like CONCAT, CHAR_LENGTH, etc.
Oracle Follows the standard, with a minor exception: Oracle doesn't remove trailing spaces which exceed the specified CHAR length, but raises an exception.

Date and time

The TIMESTAMP type

Standard Part of the Core requirements, feature ID F051-03.
Stores year, month, day, hour, minute, second (with fractional seconds; default is 6 fractional digits).
Extension to Core SQL (feature ID 411): TIMESTAMP WITH TIME ZONE which also stores the time zone.

Examples of TIMESTAMP literals:

  • TIMESTAMP '2003-07-29 13:19:30'
  • TIMESTAMP '2003-07-29 13:19:30.5'

Examples of TIMESTAMP WITH TIME ZONE literals:

  • TIMESTAMP '2003-07-29 13:19:30+02:00'
  • TIMESTAMP '2003-07-29 13:19:30.5+02:00'

It's strange that TIMESTAMP WITH TIME ZONE literals are not represented as, e.g., TIMESTAMP WITH TIME ZONE '2003-07-29 13:19:30+01:00', but according to Melton & Simon's book, they aren't.

PostgreSQL Follows that standard with one exception:
TIMESTAMP '2003-08-23 01:02:03 +02:00' is interpreted as a TIMESTAMP WITHOUT TIME ZONE (discarding the '+02:00' part)—not as a TIMESTAMP WITH TIME ZONE value. The standard may be illogical regarding this, but a standard is a standard...

Performs good sanity checks on inserted timestamp values; e.g. this will work:
   INSERT INTO tablename (columnname)
   VALUES (TIMESTAMP '2003-02-28 00:05:00')

while this will fail:
   INSERT INTO tablename (columnname)
   VALUES (TIMESTAMP '2003-02-29 00:05:00')

DB2 DB2 has the TIMESTAMP data type, but not the extended TIMESTAMP WITH TIME ZONE type.
DB2 accepts TIMESTAMP literals like '2003-07-23 00:00:00', however it doesn't accept the typed TIMESTAMP '2003-07-23 00:00:00' variant.

Performs good sanity checks on inserted timestamp values; e.g. this will work:
   INSERT INTO tablename (columnname)
   VALUES ('2003-02-28 00:05:00')

while this will fail:
   INSERT INTO tablename (columnname)
   VALUES ('2003-02-29 00:05:00')

MSSQL Note that MSSQL's choice of words related to date and time is confusing: In MSSQL's vocabulary, datetime is a concrete data type, whereas in the SQL standard, datetime is a general term covering the DATE, TIME and TIMESTAMP types.

MSSQL has a strange pseudo-type called TIMESTAMP, but has deprecated it; don't use it in new code.

The closest match to the SQL standard's TIMESTAMP type is DATETIME. This type stores the combination of date and time. It has a maximum of three fractional digits for seconds.

Performs good sanity checks on inserted timestamp values; e.g. this will work:
   INSERT INTO tablename (columnname)
   VALUES ('2003-02-28 00:05:00')

while this will fail:
   INSERT INTO tablename (columnname)
   VALUES ('2003-02-29 00:05:00')

MySQL No matter what date/time data type chosen in MySQL, storage of fractional seconds and time zones are not supported (the TIME type accepts time literals with fractional seconds, but discards the fractional part when storing the value). You will have to invent your own systems for such information.
Note also, that MySQL's choice of words related to date and time is confusing: In MySQL's vocabulary, datetime is a concrete data type, whereas in the SQL standard, datetime is a general term covering the DATE, TIME and TIMESTAMP types.

MySQL has a type called TIMESTAMP, but it is quite different from the standard TIMESTAMP: It's a 'magic' data type with side effects in that it's automatically updated to the current date and time if some criteria are fulfilled.

MySQL has a type called DATETIME. Like MySQL's TIMESTAMP type, it stores a combination of date and time without fractional seconds. There are no side effects associated with the DATETIME type—which makes it the closest match to the SQL standard's TIMESTAMP type.

By default, MySQL's sanity checks with regard to dates and time are (deliberately) poor. For example, MySQL accepts DATETIME values of '2003-02-29 00:05:00' and '2003-01-32 00:00:00'. Such values yield warnings (which you must check for if you want to be warned), but result in a value of zero being stored.

Oracle Follows the standard. Oracle has both the TIMESTAMP and the extended TIMESTAMP WITH TIME ZONE types.

A special gotcha applies, though: Oracle forbids columns of type TIMESTAMP WITH TIME ZONE as part of a unique key; this includes primary and foreign keys. Timestamps without time zone (and Oracle's special TIMESTAMP WITH LOCAL TIME ZONE) are accepted.

Performs good sanity checks on inserted timestamp values; e.g. this will work:
   INSERT INTO tablename (columnname)
   VALUES (TIMESTAMP'2003-02-28 00:05:00')

while this will fail:
   INSERT INTO tablename (columnname)
   VALUES (TIMESTAMP'2003-02-29 00:05:00')

SQL functions

CHARACTER_LENGTH

Standard CHARACTER_LENGTH(argument)
If the optional feature T061 is implemented, the function may be augmented with an indication of string unit:
CHARACTER_LENGTH(argument USING string-unit)
string-unit may be UTF8, UTF16, UTF32.

Returns NUMERIC. Returns NULL if the input is NULL.
Alias: CHAR_LENGTH.
The argument may be of type CHAR or VARCHAR.
Part of the Core SQL requirements (feature ID E021-04).
Related function: OCTET_LENGTH.

PostgreSQL Follows the standard, providing CHARACTER_LENGTH (and CHAR_LENGTH).

Note that PostgreSQL removes trailing (not leading) space from from CHAR values before counting. Note also that the behaviour of CHARACTER_LENGTH with regard to CHAR values has changed between versions 7.4 and 8.0 of PostgreSQL.

DB2 Has a CHARACTER_LENGTH function, but it's non-compliant because it requires indication of string unit, and db2's string units are different from the standard's.

Provides the LENGTH function for those who don't want to think about string units.

Note that CHAR values are space-padded (like the standard says they should be), so the length of 'HEY  ' is 5. Consider using LENGTH(TRIM(foo)) if you want the length without trailing spaces.

Documentation: CHARACTER_LENGTH and LENGTH

MSSQL Doesn't have CHARACTER_LENGTH. Provides the LEN and DATALENGTH functions instead (the latter is especially valid for 'special' data types like the TEXT type).
Note that MSSQL's LEN-function removes trailing (not leading) spaces from CHAR values before counting; MSSQL's DATALENGTH doesn't discard spaces.

Documentation: LEN and DATALENGTH

MySQL Provides CHARACTER_LENGTH.
Aliases: CHAR_LENGTH, LENGTH.
Note that MySQL removes trailing (not leading) spaces from CHAR values before counting.
Oracle Doesn't have CHARACTER_LENGTH. Provides the LENGTH function instead.

Behaves in strange ways if the input is the empty string or NULL, because of Oracles non-standard NULL handling (it considers NULL and the empty string identical 'values').

Note that CHAR values are space-padded (like the standard says they should be), so the length of 'HEY  ' is 5. Consider using LENGTH(TRIM(TRAILING FROM foo)) if you want the length without leading/trailing spaces.

SUBSTRING

Standard The standard defines two variants of the SUBSTRING function:
  1. To comply with Core SQL (Feature E021-06), the DBMS must support an 'ordinary' SUBSTRING function which extracts characters from a string:
    SUBSTRING(input FROM start-position [FOR length])
    Strings start at position 1. The start-position argument is a numeric value, as is the optional length-argument. If no length parameter is indicated, length becomes infinite

    (SQL:2003 specifies an extra optional argument—USING x—that has to do with Universal Character Sets, e.g. Unicode. x may be one of OCTETS or CHARACTERS.)

    The result is NULL if any of the arguments is NULL.

    Some cases of out-of-range values for start-position and length are allowed. Examples:

    • SUBSTRING('12345' FROM 6) yields the empty string.
    • A start-position less than 1 effectively sets start-position to 1 and reduces the value of length by 1+abs(start-position).
      I.e., if start-position is -3 and length is 6, then the length value becomes 2.
       
      Another way to put it is that when start-position is negative, a bunch of arbitrary/blank characters are prepended to the input-value. bunch=1-start-position.

    For an exact definition: see item three in the "General Rules" part of section 6.29 in the standard.

  2. The DBMS may optionally offer a regular expression variant (Feature T581) of SUBSTRING:
    SUBSTRING(input SIMILAR pattern ESCAPE escape-char)
    Pattern deserves some explanation. It's a string which needs to consist of three parts: A part matching before the wanted sub-string, the wanted substring, and a part matching after the wanted substring.
    The parts must be separated by a combination of the indicated escape-char (escape-character) and a double-quote ("). Example:
     SUBSTRING('abc' SIMILAR 'a#"b#"c' ESCAPE '#')
    should yield
     b
    The pattern description rules in SQL don't completely resemble POSIX regular expressions, as far as I can see.
PostgreSQL PostgreSQL provides three SUBSTRING flavors:
  • Ordinary SUBSTRING: As the standard's ordinary SUBSTRING variant.
  • POSIX regular expression SUBSTRING: Syntax is
    SUBSTRING(input FROM pattern-string)
    Pattern rules are of the POSIX variant. Returns NULL when pattern doesn't match.
  • Sort-of SQL-style regular expression SUBSTRING: Syntax is
    SUBSTRING(input FROM pattern-string FOR escape-char)
    Pattern-rules are supposed to match the SQL-standard's rules, although my tests sometimes suggest otherwise (hasn't been reported as bugs, because I'm not completely sure how SQL's regex-rules are supposed to be expressed). Returns NULL when pattern doesn't match.
DB2 Provides (since version 9) the SUBSTRING function, but requires you to indicate string unit by appending "USING unit".
The unit identifier may be CODEUNITS16, CODEUNITS32, or OCTETS. CODEUNITS16/CODEUNITS32 seem non-standard. The standard's CHARACTERS unit isn't supported by DB2.
Example:
SELECT SUBSTRING(somecolumn FROM 3 USING OCTETS) FROM sometable
SELECT SUBSTRING(somecolumn FROM 3 FOR 2 USING OCTETS) FROM sometable

For old DB2 versions, use the non-standard SUBSTR function.

DB2 doesn't provide any built-in regular expression facilities at all (but you may manually add PCRE capabilities).

Documentation: SUBSTRING and SUBSTR

MSSQL MSSQL has a SUBSTRING function, but its syntax differs from that of the standard. The syntax is:

SUBSTRING(input, start, length)

where start is an integer specifying the beginning of the string, and length is a non-negative integer indicating how many characters to return.

MSSQL has no regular expression functionality.

MySQL MySQL supports the standard's ordinary SUBSTRING function, with some twists (see below). No regular expression based substring extraction is supported.
 
MySQL breaks the standard when negative values are used as either start-position or length:
  • According to the standard, SUBSTRING('abc' FROM -2 FOR 4) should yield 'a';in MySQL, the result is 'bc'.
  • According to the standard, SUBSTRING('abc' FROM 2 FOR -4) should yield an error; MySQL returns an empty string.
Oracle Doesn't provide the standard SUBSTRING function. Provides SUBSTR(input,start-pos[,length]) instead (i.e. length is optional).
Oracle provides a number of SUBSTR-variants (SUBSTRB, SUBSTRC, SUBSTR2, SUBSTR4, same syntax as for SUBSTR), mainly for handling various kinds of non-latin-only string-types.
Oracle doesn't have support for string-extraction with the special SQL-style regular expressions. Instead, it has the REGEXP_SUBSTR function which offers string extraction, using POSIX-style regular expression pattern matching.

Documentation: SUBSTR and REGEXP_SUBSTR.

Note: If you find yourself using SUBSTRING in a WHERE-expression, then consider if LIKE could be used instead: The use of LIKE will typically make your DBMS try to use an index, whereas it will typically not try to do so in connection with functions.

REPLACE

By REPLACE is meant a string-function which searches a source string (haystack) for occurrences of a string to be replaced (needle) and replaces it with a new string (replacement).

Standard Not mentioned. May be obtained through a combination of other functions (have a look at the OVERLAY, POSITION and CHARACTER_LENGTH functions).
 
A de facto standard seems to have emerged with regard to REPLACE:
 
REPLACE (haystack:string,needle:string,replacement:string)
 
which means 'replace needle with replacement in the string haystack'. Replacement is done case-sensitively unless otherwise stated.
 
The REPLACE function may be handy for correcting spelling errors (and other situations):
UPDATE tablename
SET fullname=REPLACE(fullname,'Jeo ','Joe ')
PostgreSQL Follows de facto standard.
Documentation
DB2 Follows de facto standard.
Documentation
MSSQL Follows de facto standard with the exception that MSSQL by default works case insensitively.
Documentation
MySQL Follows de facto standard.
MySQL even works case sensitively.1
Note that the REPLACE-function is different from MySQL's non-standard REPLACE INTO expression.
Documentation
Oracle Follows de facto standard.
Documentation

Note 1:
In this author's opinion, it's confusing that most (if not all) string-related functions in MySQL work case sensitively, while MySQL's default behaviour is to work case insensitively in plain WHERE-clauses involving string comparisons.

TRIM

Standard Core SQL feature ID E021-09: TRIM(where characters FROM string_to_be_trimmed)

where may be one of LEADING, TRAILING or BOTH—or omitted which implies BOTH.

characters indicates what character(s) to remove from the head and/or tail of the string. It may be omitted which implies the value ' ' (space character).

In other words, the shortest form is TRIM(string_to_be_trimmed) which in effect means TRIM(BOTH ' ' FROM string_to_be_trimmed).

Trimming NULL returns NULL.

PostgreSQL Follows the standard.
DB2 Follows the standard.(since version 9.1)

In db2 versions lower than 9.1, you only have:
  LTRIM(string_to_be_trimmed)
and
  RTRIM(string_to_be_trimmed)

MSSQL Doesn't support the standard TRIM function.

Provides
  LTRIM(string_to_be_trimmed)
and
  RTRIM(string_to_be_trimmed)

Documentation: LTRIM and RTRIM

MySQL Follows the standard.
Oracle Follows the standard with two exceptions:
  • Oracle doesn't allow you to trim multiple characters. I.e., TRIM('**' FROM foo) is illegal in Oracle.
  • Due to Oracle's non-standard NULL-handling, you may get strange results of trimming NULL or the empty string.
评论Feed 评论Feed: /feed.asp?q=comment&id=1785

标签: sql different

这篇日志没有评论.

发表
表情图标
[smile] [confused] [cool] [cry]
[eek] [angry] [wink] [sweat]
[lol] [stun] [razz] [redface]
[rolleyes] [sad] [yes] [no]
[heart] [star] [music] [idea]
UBB代码
转换链接
表情图标
悄悄话
用户名:   密码:   (非注册用户不需要输入密码) 注册?
验证码(不区分大小写) * 请输入验证码