Wednesday, April 30, 2003

Clustered JDBC

Here's an open source project that CFMX and J2EE developers will find interesting: C-JDBC, which stands for Clustered JDBC. It's basically a JDBC driver that lets you cluster several databases (pretty much any database with a JDBC driver). You can even build a cluster out of different types of databases that operate on the same schema (e.g. Oracle and PostgreSQL).

The concept is called RAIDb (Redundant Array of Inexpensive Databases), and they define several RAIDb levels:

  • SingleDB: load balancer for a single database backend instance.
  • RAIDb-0: full database partitioning (no table can be replicated) with an optional policy specifying where new tables are created.
  • RAIDb-1: full database mirroring (all tables are replicated everywhere) with an optional policy specifying how completion of distributed queries (writes/commit/rollback) is handled (when the first, a majority, or all backends complete).
  • RAIDb-1ec: full database mirroring (like RAIDb-1) with error checking for Byzantine failure detection.
  • RAIDb-2: partial replication (each table must be replicated at least once) with optional policies for new table creation (like RAIDb-0) and distributed query completion (like RAIDb-1).
  • RAIDb-2ec: partial replication (like RAIDb-2) with error checking for Byzantine failure detection.
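To make the RAIDb-1 completion options concrete, here is a minimal Java sketch of the decision a controller faces after sending a write to every backend: has it heard back from enough of them to report success? The CompletionPolicy class and its Wait enum are hypothetical names for illustration only, not part of the C-JDBC API.

```java
// Hypothetical illustration of the "first / majority / all" completion policies.
class CompletionPolicy {
    enum Wait { FIRST, MAJORITY, ALL }

    // True once enough backends have acknowledged a write (or commit/rollback)
    // for it to be reported as complete under the chosen policy.
    static boolean isComplete(Wait policy, int completed, int backends) {
        if (backends < 1 || completed < 0 || completed > backends) {
            throw new IllegalArgumentException("need backends >= 1 and 0 <= completed <= backends");
        }
        switch (policy) {
            case FIRST:
                return completed >= 1;
            case MAJORITY:
                return completed > backends / 2;   // e.g. 2 of 3, 3 of 4
            case ALL:
                return completed == backends;
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

The tradeoff is latency versus safety: FIRST answers the client fastest, while ALL guarantees every mirror has the change before the client moves on.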

This technology is great for open source databases because it makes them easy to cluster. MySQL does support replication, and there are some solutions for PostgreSQL as well, but I think something like this is more flexible at the moment. The one drawback is that every application that accesses the database must use JDBC.

Working with PostgreSQL

I'm working on migrating a SQL Server database to PostgreSQL. So far it has gone smoothly, but I did run into two issues today.

I wrote some scripts that generate the DDL for PostgreSQL from my SQL Server metadata. The scripts also grab all the data and create INSERT statements (I should probably be using COPY statements). So I have a 5MB file that contains the INSERT statements, and there was an error somewhere in there. After a lot of greps I managed to figure out which line the error was on; it wasn't obvious to psql because it was a termination issue with a single quote. Here's what the query looked like:

INSERT INTO someTable (someColumn)
VALUES ('This is the content \')
Single quotes are escaped in PostgreSQL with either another single quote ('') or a backslash (\'). It turns out that a value in one of the columns of one of the tables ended with a backslash, so PostgreSQL treated the closing ' as escaped and the string never terminated.
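One way to avoid this when generating INSERT statements is to escape backslashes as well as single quotes. Below is a minimal Java sketch of such a helper, assuming the server treats a backslash inside '...' as an escape character (as PostgreSQL did at the time; later versions with standard_conforming_strings turned on treat it literally, and the doubling would then be wrong). PgLiteral is a hypothetical helper, not part of any driver.

```java
// Hypothetical helper: builds a single-quoted PostgreSQL string literal,
// assuming backslash is an escape character inside '...' (PostgreSQL 7.x behavior).
class PgLiteral {
    static String quote(String value) {
        if (value == null) {
            return "NULL";                    // SQL NULL, unquoted
        }
        StringBuilder sb = new StringBuilder("'");
        for (char c : value.toCharArray()) {
            if (c == '\\') {
                sb.append("\\\\");            // one backslash becomes two
            } else if (c == '\'') {
                sb.append("''");              // one single quote becomes two
            } else {
                sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }
}
```

With this, the value that broke the import (text ending in a single backslash) is written as 'This is the content \\', and the closing quote terminates the string as intended.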

The next problem I ran into was with the bit datatype. I use bit in SQL Server to represent booleans, so I assumed that a PostgreSQL bit would have the same semantics. It doesn't: the bit datatype in PostgreSQL holds a string of bits, not just 0, 1, or NULL as in SQL Server. So a query like this:

SELECT * FROM news WHERE archived = 0
won't work on PostgreSQL if archived is of the bit datatype. You would have to do something like this:
SELECT * FROM news WHERE archived = B'0'
The B prefix makes '0' a bit-string constant rather than an integer.

I'd rather not change any SQL code in my applications, so I will be mapping my bit columns to PostgreSQL's boolean type. It's actually a handy type that accepts more than just 0/1 as input. You can use the following values for true:

TRUE 
't' 
'true' 
'y' 
'yes' 
'1' 
And the following values for false:
FALSE 
'f' 
'false' 
'n' 
'no' 
'0' 
This is nice because it corresponds to ColdFusion's boolean datatype.
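As a rough illustration of the lists above, here is a Java sketch of how those input strings map to true and false. PostgreSQL does this parsing itself; like PostgreSQL, the sketch ignores case and surrounding whitespace. PgBool is a hypothetical class that covers only the values listed, the sort of check a migration script might run on data before loading it.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: maps the boolean input strings listed above to Boolean values.
class PgBool {
    private static final Set<String> TRUE_VALUES = Set.of("true", "t", "yes", "y", "1");
    private static final Set<String> FALSE_VALUES = Set.of("false", "f", "no", "n", "0");

    // Returns null for SQL NULL, otherwise the boolean the literal stands for.
    static Boolean parse(String literal) {
        if (literal == null) {
            return null;
        }
        String s = literal.trim().toLowerCase(Locale.ROOT);   // case and padding don't matter
        if (TRUE_VALUES.contains(s)) {
            return Boolean.TRUE;
        }
        if (FALSE_VALUES.contains(s)) {
            return Boolean.FALSE;
        }
        throw new IllegalArgumentException("not a boolean literal: " + literal);
    }
}
```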

Another way to resolve this problem would be to use a numeric type, such as integer. Many developers prefer integer datatypes over bit or boolean because they expect better performance. Using an integer does require more storage space, however (in PostgreSQL an integer takes 4 bytes and a boolean takes 1).

UPDATE: The boolean type requires single quotes around 1 or 0, so in order to keep my SQL code consistent I will be using the integer datatype.

Monday, April 28, 2003

ColdFusion Conference on a Boat

Actually, it's on a large boat: a cruise ship that will be sailing in the Caribbean! You have until Thursday (5/1/03) to sign up, and there is still some room left for you.

I'm of course talking about the 1st Annual ColdFusion Cruise-N-Learn, a week-long cruise where you can learn about ColdFusion and hang out with folks like Ben Forta and myself. It will be a small event, so you will have VIP access to the speakers.

The cost starts at $1995 all-inclusive, plus you get $200 of on-ship credit. The $1995 can be split between two people if you want to bring your spouse, friend, or coworker to share a cabin. I think that's a pretty good deal, so I hope to see you there.

Thursday, April 24, 2003

Macromedia Central, and Synchronized Applications

Macromedia has a presentation about the upcoming Macromedia Central platform.

The presentation notes that there are 750,000 Flash developers, and that there is a $2.2B market for these "Premium Internet Content Applications" (Jupiter Research). I hope all the Flash developers don't work on these, because that would leave each developer with just under $3000 :).

I'm interested in getting a look at the API that Central will provide to developers. One thing I'm particularly interested in these days is applications that synchronize data. I use at least 3 different computers on a regular basis (work, laptop, home PC); should I have to maintain 3 different address books? Definitely not. I think Macromedia Central will be a good platform for developing such applications, though from what I understand the synchronization itself would still be on the shoulders of the app developers. (By the way, I do have an app for keeping my address book synced: I use IntelliSync for Yahoo!, which syncs your Outlook and Yahoo! address books. It doesn't work perfectly, however.)

Wednesday, April 23, 2003

Microsoft Research

You can find lots of research information on the Microsoft Research web site. Microsoft funds quite a bit of research and works with universities a lot. You will want to check out the projects page, which lists the projects they are working on or have worked on. Some of the research areas include Databases, Software Engineering, Security, Performance and Load Testing, Social Computing, Networking, etc.

Lots of interesting research there; just reading the descriptions of some of the topics is enough to get you thinking, let alone the research papers...

You will also want to check out the Multi University Research Lab. You can watch seminars on this site.

I first found this site a few years ago and just thought of it again today; they have added several projects since then.

Tuesday, April 22, 2003

New ColdFusion MX book published

A new ColdFusion MX book called The ColdFusion MX Developer's Cookbook was recently published by SAMS. I co-authored the book with Brad Leupen and Chris Reeves.

The main difference between this book and other ColdFusion books is the format. It is set up as follows:

Technique - Explain the problem in one or two sentences
Example - Show the code to solve the problem
Comments - Discuss the problem, and the solution in detail

I think the main benefit of that format is that it works well for both beginners and advanced developers. Advanced developers can get the solution in as few words as possible. Beginners can see the solution and then learn more about it.

The book doesn't cover the fundamentals of ColdFusion or programming; it assumes you know what a variable is and what a conditional is. We had the benefit of assuming the reader was comfortable with these fundamental skills when we wrote it. So if you're not comfortable with the fundamentals, you should check out some of the other books on the market, and then buy this one :).

Friday, April 18, 2003

Returning TOP N Records

Returning only the first N records in a SQL query differs quite a bit between database platforms. Here are some samples:

Microsoft SQL Server

SELECT TOP 10 column FROM table

PostgreSQL and MySQL

SELECT column FROM table
LIMIT 10

Oracle

SELECT column FROM table
WHERE ROWNUM <= 10

Because of these differences, if you want to keep your code database independent you should use the maxrows attribute of the cfquery tag. The tradeoff for database independence is performance: I would expect maxrows to be slower than limiting the rows in the SQL, since the database may still build the full result set.

<cfquery datasource="#ds#" maxrows="10">
  SELECT column FROM table
</cfquery>
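If you would rather keep the row limit in the SQL and still support several databases, one alternative (not the cfquery maxrows approach above) is a small helper that emits each platform's syntax. This is a minimal Java sketch; the TopN class and Dialect enum are hypothetical, and the column and table names are assumed to be trusted identifiers, never user input.

```java
// Hypothetical helper that builds the "first n rows" queries shown above.
class TopN {
    enum Dialect { SQLSERVER, POSTGRESQL, MYSQL, ORACLE }

    static String firstRows(Dialect d, String column, String table, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        switch (d) {
            case SQLSERVER:
                return "SELECT TOP " + n + " " + column + " FROM " + table;
            case POSTGRESQL:
            case MYSQL:
                return "SELECT " + column + " FROM " + table + " LIMIT " + n;
            case ORACLE:
                // ROWNUM is assigned before ORDER BY, so an ordered top-N
                // query in Oracle needs the ORDER BY inside a subquery.
                return "SELECT " + column + " FROM " + table + " WHERE ROWNUM <= " + n;
            default:
                throw new IllegalArgumentException("unsupported dialect: " + d);
        }
    }
}
```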

PostgreSQL has a cool feature that lets you return an arbitrary range of rows (e.g. rows 21-30). This is very handy for displaying pages of records:

SELECT column FROM table
LIMIT 10 OFFSET 20

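The above query skips the first 20 rows and returns the next 10 (rows 21-30). In practice you'll want an ORDER BY as well, since without one the database doesn't guarantee which rows end up on which page.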
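The offset arithmetic for paging is easy to get off by one. Here is a minimal Java sketch, assuming 1-based page numbers (a hypothetical convention, e.g. for a page=N URL parameter); the Paging class is illustrative only. Page 3 with 10 rows per page becomes LIMIT 10 OFFSET 20, i.e. rows 21-30.

```java
// Hypothetical helper: converts a 1-based page number into LIMIT/OFFSET.
class Paging {
    private static void check(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
    }

    // offset = number of rows to skip before this page starts
    static String limitOffset(int page, int pageSize) {
        check(page, pageSize);
        int offset = (page - 1) * pageSize;
        return "LIMIT " + pageSize + " OFFSET " + offset;
    }

    // First and last 1-based row numbers that the page covers.
    static int[] rowRange(int page, int pageSize) {
        check(page, pageSize);
        int first = (page - 1) * pageSize + 1;
        return new int[] { first, first + pageSize - 1 };
    }
}
```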

Wednesday, April 16, 2003

SourceForge has RSS feeds

SourceForge, the huge repository for open source projects, now has RSS feeds for each project, including Project News, File Releases, Documentation, and a project summary. This must be a new feature (I haven't noticed it in the past, but I could be wrong). Anyway, that is a good way to keep up to date with your favorite open source projects. They also have a feed for all new releases.

Tuesday, April 15, 2003

Free ColdFusion Magazine

There is a free monthly ColdFusion magazine in the works. You can sign up to receive it through the web site. The magazine is being put together by Pablo Varando of EasyCFM.com.

Monday, April 14, 2003

Confirming Transaction support

Want to know if your ColdFusion database driver supports transactions (the <cftransaction> tag)? I was wondering how I might test this, and I came up with a solution. The code I wrote deliberately creates a deadlock if the database driver supports transactions: the page holds a lock on a row inside a transaction while a second request tries to update the same row. If the HTTP timeout is reached, an exception is thrown, and we know that our database and driver support transactions.

The code (warning: I wouldn't run this against a live database, because it deliberately causes a deadlock):

<cftransaction>
  <!--- if transactions are supported, row 1 stays locked until the transaction ends --->
  <cfquery datasource="#ds#">
    UPDATE table
    SET column = 'value'
    WHERE id = 1
  </cfquery>

  <cftry>
    <!--- deadlock.cfm updates the same row, so it can only finish if no lock is held --->
    <cfhttp url="http://localhost/deadlock.cfm"
      method="get" timeout="5" throwonerror="true">
    <cfcatch type="any">
      Transactions work!
    </cfcatch>
  </cftry>

  <cfquery datasource="#ds#" name="data">
    SELECT column FROM table
    WHERE id = 1
  </cfquery>
</cftransaction>

<cfdump var="#data#">

Now create a file called deadlock.cfm (if possible, put this file on a different server); the <cfhttp> call above should point at it.

<!--- waits on the row lock held by the other page; ds must be defined on this page too --->
<cfquery datasource="#ds#" timeout="8">
  UPDATE table
  SET column = 'deadlock'
  WHERE id = 1
</cfquery>

If the page says "Transactions work!", then transactions seem to be working. I used this method to check transaction support of PostgreSQL 7.2.3 running on Red Hat 8, using ColdFusion 5 on Windows with the PostgreSQL 07.02.0005 ODBC driver, and they do indeed work.

I may also check the MySQL ODBC drivers connecting to a MySQL 4.x database. MySQL 4.x supports transactions using the Berkeley DB or InnoDB table types (not the default MyISAM table type), but I still need to install 4.x on my server. If anyone has this setup, or has already tested it, please let me know.

Friday, April 04, 2003

Fast XSLT - Compiled XSL with XSLTC

An article on xml.com, Fast XSLT, talks about the race to build faster XSL transformers. One approach it focuses on is generating a "translet" (a Java class) from the stylesheet, then executing the compiled class. Compiled execution is fast, but the compilation step itself isn't. I would hope that application servers that plan to use XSLTC would do it transparently (the way a JSP is compiled into a servlet and then cached behind the scenes).
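Standard Java exposes the same compile-once idea through JAXP: TransformerFactory.newTemplates() compiles a stylesheet into a thread-safe Templates object, and each newTransformer() call on it is cheap (the JDK's built-in processor is derived from Xalan's XSLTC). Below is a minimal sketch of caching compiled stylesheets; the StylesheetCache class and its string keys are hypothetical, not an API any app server provides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical cache: pay the compile cost once per stylesheet, reuse it afterwards.
class StylesheetCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    // Compiled stylesheets keyed by a caller-chosen name; a changed
    // stylesheet needs a new key, since nothing here checks for edits.
    private final Map<String, Templates> compiled = new HashMap<>();

    // synchronized because neither TransformerFactory nor HashMap is thread-safe.
    synchronized Templates compile(String key, String xsl) throws TransformerException {
        Templates t = compiled.get(key);
        if (t == null) {
            t = factory.newTemplates(new StreamSource(new StringReader(xsl)));
            compiled.put(key, t);
        }
        return t;
    }

    // Templates is thread-safe but a Transformer is not, so each call gets its own.
    String transform(String key, String xsl, String xml) throws TransformerException {
        Transformer transformer = compile(key, xsl).newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```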

It would be cool if you could put Java source code under your web root and have it compiled automatically by the ColdFusion server. That is one feature I wish CFMX had. However, I think you could implement this on your own with a servlet filter and a fancy class loader.
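Here is a rough Java sketch of the compile-and-load half of that idea. It assumes a JDK is available (javax.tools arrived in Java 6, well after CFMX) and leaves out the servlet filter that would decide when to recompile. OnTheFlyCompiler is a hypothetical class, and it only handles a single class in the default package.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch: compile sourceDir/<className>.java and load the result.
class OnTheFlyCompiler {
    static Class<?> compileAndLoad(Path sourceDir, String className)
            throws IOException, ClassNotFoundException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // a plain JRE has no system compiler
            throw new IllegalStateException("no system Java compiler available");
        }
        Path source = sourceDir.resolve(className + ".java");
        // run(in, out, err, args...) returns 0 on success; null streams mean System.in/out/err.
        int status = compiler.run(null, null, null, "-d", sourceDir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed: " + source);
        }
        // A new loader per compile, so an edited class can be reloaded later;
        // the old loader and its classes can be collected once nothing references them.
        URLClassLoader loader = new URLClassLoader(new URL[] { sourceDir.toUri().toURL() });
        return loader.loadClass(className);
    }
}
```

A filter built on this would compare the .java file's timestamp with the last compile before calling compileAndLoad again.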

Thursday, April 03, 2003

Using CustomTags in ColdFusion

Using custom tags in the presentation layer can go a long way toward organizing your code. Here's a quick start guide that I wrote up. Building a simple ColdFusion custom tag is as easy as:

A simple custom tag

Create a file called todaysdate.cfm:

<cfoutput>#DateFormat(Now(), "mm/dd/yyyy")#</cfoutput>

Then place todaysdate.cfm in ColdFusion server's CustomTags directory, or in the same directory that you will call it from. To call it simply create a page with the following:

<cf_todaysdate>

Using attributes

Passing an attribute into a tag is also fairly easy.
greet.cfm:
<cfparam name="attributes.name" default="Dude" type="string">
<cfoutput>Hello #attributes.name#!</cfoutput>

Now to call it:

<cf_greet name="Pete">

In the greet tag there is an attribute called "name", to which our example passes the value "Pete". We can access attributes through the attributes scope, as attributes.attributeName. In this tag we used the <cfparam> tag to set the default value of the name attribute to "Dude".

Using start and end tags

Another thing you can do with custom tags is to use start and end tags. Something like this:

<cf_myTag> This is some content </cf_myTag>

To access the content inside the tag we use another scope called thisTag. The thisTag scope contains variables that hold information about the tag that is being executed, such as the content between the start and end tags. Such content may be accessed with the variable thisTag.generatedContent.

Because you may want to do some processing when the start tag is invoked and some more when the end tag is invoked, your custom tag file will be executed twice when you use an end tag. Luckily, you can use the variable thisTag.executionMode to determine whether you're currently executing the start tag or the end tag. Let's consider a highly trivial example: creating a ColdFusion tag to make text bold. There are several ways we can do this. Here is the simplest:

bold.cfm:
<cfif thisTag.executionMode IS "start"><b><cfelse></b></cfif>

The above tag will output a <b> tag when the start tag executes, and a </b> tag when the end tag executes.

Another way of solving this problem is to use the thisTag.generatedContent variable:

bold.cfm:
<cfif thisTag.executionMode IS "end">
	<cfset thisTag.generatedContent = 
		"<b>" & thisTag.generatedContent & "</b>">
</cfif>

In this case we are resetting the value of thisTag.generatedContent to include the <b> tags. When the end tag is finished executing ColdFusion will output the contents of thisTag.generatedContent.

Note that the value of thisTag.generatedContent is not set until the end tag begins executing. Setting the value of it in the start tag execution mode will have no effect on the output.

A third way to solve this problem is similar to the last, but in many cases this will be the way to go:

bold.cfm:
<cfif thisTag.executionMode IS "end">
	<cfoutput><b>#thisTag.generatedContent#</b></cfoutput>
	<cfset thisTag.generatedContent = "">
</cfif>

In this example you're handling the output yourself and setting thisTag.generatedContent to an empty string, so ColdFusion doesn't output the content a second time. This is the way to go if you're using a lot of conditionals to produce your output.

A few more tricks we can use to ensure that our tag is being called properly include:

To ensure that the end tag is present:

<cfif thisTag.executionMode IS "end">
	Do something.
<cfelseif NOT thisTag.hasEndTag>
	<cfthrow message="Missing end tag.">
</cfif>

To ensure that the file is only called as a custom tag:

<cfif NOT IsDefined("thisTag.executionMode")>
	Must be called as customtag.<cfabort>
</cfif>
<cfif thisTag.executionMode IS "end">
	Do something.
<cfelseif NOT thisTag.hasEndTag>
	<cfthrow message="Missing end tag.">
</cfif>

Update 4/4/03: Ray Camden also suggested this handy code snippet:

To ignore an end tag:

<cfif thisTag.executionMode is "end">
    <cfexit>
</cfif>

That code simply ignores the end tag. This is useful if you're trying to write valid XML within your CFML, such as <cf_mytag />. Many people have gotten in the habit of doing that now, but keep in mind that CFML is not valid XML; CFIF statements in particular break the rules, since an expression like <cfif x GT 1> is not a set of name="value" attributes.

More bashing

Thanks to my brother Steve, I was able to get my backtracking cd working the way I wanted it to.

Just add the following to your ~/.bashrc, or, if you want to make this work system-wide, add it to /etc/bashrc:

#redefine pushd and popd so they don't output the directory stack
pushd()
{
    builtin pushd "$@" > /dev/null
}
popd()
{
    builtin popd "$@" > /dev/null
}

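#alias cd so it uses the directory stack
#(note: a bare "cd" now runs a bare pushd, which swaps the top two
# entries on the stack instead of taking you to your home directory)
alias cd='pushd'
#alias cdb as a command that goes one directory back in the stack
alias cdb='popd'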

The redefinition of pushd and popd redirects their output to /dev/null instead of your terminal. This prevents them from displaying the entire stack every time they are called.

Tuesday, April 01, 2003

Backtracking with bash

I was working with Linux quite a bit today, frequently changing between directories, when I wondered if there was a way to go back to the directory I was in previously.

Turns out there is a way:

 cd ~-
So if I was doing something like this:
[pete@bigred /]$ cd /etc
[pete@bigred etc]$ cd /usr/local
[pete@bigred local]$ cd ~-
[pete@bigred etc]$ pwd
/etc
If you want to create a command so you don't have to type ~- you can create an alias:
alias cdb='cd ~-'
This ~- thing works great if you only need to go back one directory, but what if you wanted to go back two directories? Continuing the last code sample:
[pete@bigred etc]$ cd ~-
[pete@bigred local]$ cd ~-
[pete@bigred etc]$ pwd
/etc
We are back in /etc, not at /, our starting point. What I want is something that keeps a history of the directories I've been to.
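The difference between ~- and a stack is easy to see in a toy model. This Java sketch (a hypothetical DirHistory class, not how bash is implemented) keeps a single OLDPWD-style slot for cd ~- next to a stack for pushd/popd. Replaying the session above, the slot just toggles between /etc and /usr/local, while popd walks back to / one directory at a time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of cd ~- (one remembered directory) versus pushd/popd (a stack).
class DirHistory {
    private String cwd;
    private String oldPwd;                                  // the single slot `cd ~-` uses
    private final Deque<String> stack = new ArrayDeque<>(); // directories saved by pushd

    DirHistory(String start) {
        cwd = start;
    }

    // cd dir: only the directory we just left is remembered.
    void cd(String dir) {
        oldPwd = cwd;
        cwd = dir;
    }

    // cd ~- : swap with the remembered directory, so repeated calls just toggle.
    void cdBack() {
        if (oldPwd == null) {
            throw new IllegalStateException("OLDPWD not set");
        }
        String previous = oldPwd;
        oldPwd = cwd;
        cwd = previous;
    }

    // pushd dir: save the current directory, then change to dir.
    // (This model ignores any effect pushd/popd have on OLDPWD.)
    void pushd(String dir) {
        stack.push(cwd);
        cwd = dir;
    }

    // popd: return to the most recently saved directory.
    void popd() {
        if (stack.isEmpty()) {
            throw new IllegalStateException("directory stack empty");
        }
        cwd = stack.pop();
    }

    String pwd() {
        return cwd;
    }
}
```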

It turns out that Bash (the "Bourne-Again SHell") has a built-in directory stack. Three builtins are available for manipulating the stack: dirs, pushd, and popd. More info about the directory stack in bash here.

If we pushd a directory onto the directory stack, we can retrieve the previous directory using dirs +1 (entry 0 of the stack is always the current directory). I tried setting up some aliases to get it to work the way I wanted:

alias cdd='pushd'
alias cdb='cd `dirs +1`'
Those worked a bit, but I ran into a lot of problems, especially when in the home directory (dirs abbreviates it as ~, and a ~ produced by command substitution isn't expanded before cd sees it; dirs -l prints full paths instead). Also, when you run pushd, popd, or dirs it always prints the contents of the stack, and I don't know how to suppress that. So I figured I would post it here and see if anyone can come up with a solution, or knows of a better way of going about this.

Isn't it funny how software developers will spend hours trying to save a few seconds of their future time?