David Beveridge

Content Management at the Department of Energy

A WebSphere success story

When the Office of the Chief Information Officer at the U.S. Department of Energy (DOE) called for a renovation of the Department's signature Web site, energy.gov, the plans were ambitious: blow up and rebuild the site's organization and navigation, develop a new look and feel to give the site a higher level of sophistication and polish, and deploy a publishing system that DOE's content team could use to create and maintain the site easily and efficiently, at lower cost.

DOE put out the call to Brook Group, a Web services firm, to rescue the Web site with a full redesign and a re-architecture of the site's content. As for the publishing system, Brook Group answered with Tacklebox, its Java-based Web content management system (CMS), running on an IBM server platform.

Web Publishing for the 1990s
In the "old" days of the Web, sites were hand-coded, page by laborious page, using now-familiar technologies such as HTML, FTP, and TCP/IP. These technologies looked then (and still look today) far too much like programming for most nontechnical people. It was unrealistic to expect people to acquire such specialized technical knowledge and training just so they could enter press releases onto the site or change a phone number on the Contact Us page. In most organizations, that meant the people closest to content and its presentation - writers, artists, product specialists, marketers, PR folks - had neither the training nor the inclination to do what was necessary to publish on what was quickly becoming the organization's most public forum - its Web site.

Worse yet, the people most likely to acquire that technical knowledge rarely possessed the background or position in the organization to make decisions about content presentation. The disconnect that resulted - what we call the Webmaster bottleneck - led to inefficiency and frustration at both ends of the equation. Content "owners" wanted to control how their content was managed. They needed the ability to make edits (often simply fixing typos or changing a few words) without waiting for their requests to cycle through the IT department's priority list of to-dos. The IT people called into service for Web site maintenance were frustrated by the mundane nature of much of the work, particularly when the content owners always seemed to want it done yesterday. Add to that the unavoidable culture clash between techies and nontechies, and it became clear that a new solution was required. Enter the CMS.

Content Management
Tacklebox and other such systems put content management in the hands of virtually anyone in the organization: staff are simply assigned rights to manage content, and only minimal training is required. A CMS automates many of the mundane tasks that make up the publishing cycle. Content is entered through a Web-based creation and editing engine that provides "what you see is what you get" controls, allowing nontechnical staff to maintain content in a familiar, word processor-like interface. This means those simple edits that need to be done "now" can be done, simply and immediately, by anyone with the appropriate rights. A workflow system lets site administrators automate the editorial review and approval process: staff members are assigned roles in step-by-step workflows, and those workflows are in turn assigned to individual pieces of content. Once the assignments are made, the system gives the right users access to the right content, and then routes that content through its appropriate approval path.

Workflows are created by site administrators to mimic the organization's real-life publishing flow, moving content from author to editor to reviewer until one of three outcomes is reached: the content is published on the live Web site; it is sent back to the author for changes and cycles through the process again until it is ready to publish; or it is not approved and is stored in the content archive as a "disapproved" item.
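
To make that flow concrete, here is a minimal sketch of how such a workflow might be modeled in Java, the language Tacklebox is written in. The class and method names are illustrative assumptions, not Tacklebox's actual API; the point is simply that content advances through an ordered list of role players until it reaches one of the three outcomes described above.

import java.util.List;

// Illustrative sketch only: a step-by-step approval workflow.
public class ApprovalWorkflow {

    // The possible states of a piece of content moving through the workflow.
    public enum Outcome { IN_PROGRESS, PUBLISHED, RETURNED_TO_AUTHOR, DISAPPROVED }

    private final List<String> rolePlayers;   // e.g. "author", "editor", "reviewer"
    private int currentStep = 0;

    public ApprovalWorkflow(List<String> rolePlayers) {
        this.rolePlayers = rolePlayers;
    }

    // The user who must act next; the system would notify this person.
    public String pendingRolePlayer() {
        return rolePlayers.get(currentStep);
    }

    // The current role player approves: advance, or publish if this was the last step.
    public Outcome approve() {
        if (currentStep == rolePlayers.size() - 1) {
            return Outcome.PUBLISHED;
        }
        currentStep++;
        return Outcome.IN_PROGRESS;
    }

    // A reviewer requests changes: the content cycles back to the author.
    public Outcome requestChanges() {
        currentStep = 0;
        return Outcome.RETURNED_TO_AUTHOR;
    }

    // A reviewer rejects the content; it goes to the archive as "disapproved."
    public Outcome reject() {
        return Outcome.DISAPPROVED;
    }
}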

The system automates notifications, alerting each workflow role player when it is time to act. It also provides content scheduling: content can "go live" or be removed automatically, simply by attaching a scheduled date to it. Instead of the 1990s method of a Web staffer sitting at the keyboard, waiting for midnight to strike so they could punch the Enter key and FTP a new page to the site, the CMS's daemons constantly monitor workflows and content states, moving content onto or off of the site as scheduled. This self-monitoring allows authors to develop content well ahead of its publication date; the content goes through the entire approval workflow but is not posted live on the site until the time is right.
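
The scheduling daemon can be pictured as a background task that periodically compares each approved item's go-live and expiration dates against the clock. The sketch below, built on Java's ScheduledExecutorService, is an assumption about how such a daemon might be structured, not a description of Tacklebox's internals.

import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a publication daemon; the interface and names are assumptions.
public class PublicationDaemon {

    public interface ScheduledItem {
        boolean isApproved();
        boolean isLive();
        Instant goLiveAt();
        Instant expiresAt();     // null if the item never expires
        void publish();          // push the item onto the live site
        void unpublish();        // pull the item off the live site
    }

    private final List<ScheduledItem> items;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public PublicationDaemon(List<ScheduledItem> items) {
        this.items = items;
    }

    // Check content states once a minute, publishing or expiring items as scheduled.
    public void start() {
        scheduler.scheduleAtFixedRate(this::sweep, 0, 1, TimeUnit.MINUTES);
    }

    private void sweep() {
        Instant now = Instant.now();
        for (ScheduledItem item : items) {
            if (item.isApproved() && !item.isLive() && !item.goLiveAt().isAfter(now)) {
                item.publish();
            }
            if (item.isLive() && item.expiresAt() != null && !item.expiresAt().isAfter(now)) {
                item.unpublish();
            }
        }
    }
}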

In addition to its main job of content management, a full-featured CMS provides two key site management capabilities.

First, navigation management allows site managers to manipulate the site structure itself. Where a site's navigation tree was once dictated by the pages and folders of the server's file system, Tacklebox allows the DOE staff to add, delete, move, or rename sections, subsections, and subsites, or to create page redirects to assist in marketing and in managing legacy URLs. Managers can also hide or reveal sections of the site, filling pages with content in the background and revealing them only when they are fully ready for publication.
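
Under the hood, that navigation tree is just data: a hierarchy of section nodes stored in the database rather than a directory structure on disk. The following is a minimal sketch of what one node might carry; the field and method names are hypothetical, not Tacklebox's actual schema.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one node in a CMS navigation tree stored in the database.
public class NavigationNode {
    private String title;              // label shown in the site navigation
    private final String slug;         // URL segment for this section
    private boolean hidden = true;     // hidden sections can be built out before launch
    private String redirectTo;         // optional redirect target for legacy URLs
    private NavigationNode parent;
    private final List<NavigationNode> children = new ArrayList<NavigationNode>();

    public NavigationNode(String title, String slug) {
        this.title = title;
        this.slug = slug;
    }

    // Add a new section, subsection, or subsite beneath this one.
    public void add(NavigationNode child) {
        child.parent = this;
        children.add(child);
    }

    // Move this section elsewhere in the tree without touching any files on disk.
    public void moveTo(NavigationNode newParent) {
        if (parent != null) {
            parent.children.remove(this);
        }
        newParent.add(this);
    }

    public void rename(String newTitle)  { this.title = newTitle; }
    public void redirect(String target)  { this.redirectTo = target; }  // for legacy URLs
    public void reveal()                 { this.hidden = false; }       // expose the section
    public void hide()                   { this.hidden = true; }
}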

Second, all publishing activity related to a given piece of content or a given user is recorded in an audit table, so site managers know precisely which content was updated, when, and by whom, and content can be "recalled" if necessary. This function is crucial for federal government Web sites like energy.gov. Federal Web content is part of the public record of federal documents, subject to retention policies just like any document produced by the Government Printing Office. An audit trail that preserves every published version of a document, and records the actions taken during the publication cycle, is therefore a crucial feature of a government CMS.
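
In practice the audit trail amounts to an append-only record keyed by content item and user. Below is a hedged sketch of what one entry might capture; the field names are assumptions, not DOE's or Tacklebox's actual schema.

import java.time.Instant;

// Hypothetical audit-table row: every publishing action is recorded and retained.
public class AuditEntry {
    public enum Action { CREATED, EDITED, APPROVED, PUBLISHED, UNPUBLISHED, DISAPPROVED }

    private final long contentId;        // which piece of content
    private final int contentVersion;    // which stored version of that content
    private final String userName;       // who performed the action
    private final Action action;         // what was done
    private final Instant performedAt;   // when it happened

    public AuditEntry(long contentId, int contentVersion,
                      String userName, Action action, Instant performedAt) {
        this.contentId = contentId;
        this.contentVersion = contentVersion;
        this.userName = userName;
        this.action = action;
        this.performedAt = performedAt;
    }

    // Entries are never updated or deleted, so earlier versions can always be "recalled."
    public String toString() {
        return performedAt + "  " + userName + "  " + action
                + "  content #" + contentId + " (v" + contentVersion + ")";
    }
}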

Building the Infrastructure
HTML sites were simple. They needed an Internet-connected Web server, a set of HTML files containing static content, and a collection of image files referenced from within the HTML: that is, a Web server, a file server, and a DNS entry. The technical infrastructure required to drive a CMS like Tacklebox is certainly within the capability of many organizations, but it far exceeds that of a typical HTML site. The main Tacklebox engine is built in Java using the Struts framework; its database back end contains all site content, graphics, page templates, navigation, and site-specific configuration options; and the front-end templates are sets of JavaServer Pages (JSP).
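
As a rough illustration of how those pieces typically fit together in a Struts application, a page-view action might look something like the sketch below. The class name, forward name, and stub data-access method are hypothetical, not Tacklebox's actual code; in the real system the content, template assignment, and navigation context would come from the database back end.

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Hypothetical Struts action: fetch a piece of content and hand it to a JSP template.
public class ViewPageAction extends Action {

    public ActionForward execute(ActionMapping mapping, ActionForm form,
                                 HttpServletRequest request,
                                 HttpServletResponse response) throws Exception {

        long pageId = Long.parseLong(request.getParameter("id"));

        // Look up the published content for this page.
        String body = loadPublishedContent(pageId);

        // Expose it to the JSP template, then forward to the "view" mapping
        // defined in struts-config.xml.
        request.setAttribute("contentBody", body);
        return mapping.findForward("view");
    }

    // Stub standing in for the CMS's data access layer.
    private String loadPublishedContent(long pageId) {
        return "<p>Content for page " + pageId + " lives in the database.</p>";
    }
}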

Tacklebox requires an application server to manage interactions between the Web server and the application and database. The application serves three masters at once: the users visiting the site over the public Internet and viewing HTML pages served by the system; the site managers and content folks working behind the scenes to change the site dynamically while it is in production; and the application functions that drive the system's workflow, site management, user management, e-mail notifications, and automated publishing.

Brook Group knew that DOE would require a high level of reliability and industrial-strength performance from every tier of the site. For that reason, they recommended an IBM infrastructure: IBM HTTP Server at the front end and WebSphere Application Server (WAS) in the middle tier, with an existing Oracle 9i database server that DOE planned to use for the back end. Brook Group's Tacklebox development team uses WebSphere Studio Application Developer as its primary development tool. Given the DOE mandate that the site maintain maximum uptime, even under potentially heavy load, Brook Group decided that WAS was the best choice. With large-scale clients such as America Online, Ameritrade, the Department of Justice, and the White House Office of Management and Budget, they know that their solutions have to be reliable.

A CMS can be deployed in one of two modes: hosted by the client or hosted by Brook Group. Clients such as DOE, who have the existing infrastructure and resources to host the application themselves, often prefer the former. Self-hosting gives them economies of scale through shared resources and ensures that all data and system management functions stay behind the firewall. Brook Group also offers a hosted CMS solution, in which clients use a version of the system running in Brook Group's environment; for that model, too, Brook Group uses WAS as its application server.

"Stuff" Happens
The process of designing and building a department-level Web site for a federal agency is fraught with bureaucracy: endless meetings and conference calls, steering committees, listservs, and multiple layers of approval for every project element, from color schemes to search engines to hardware acquisition plans to content types. The Office of the CIO at DOE led a 30-plus-member steering committee of stakeholders from throughout the organization. They shared two characteristics: they all wanted a new site, and they all disagreed on what that site should be.

The recommendation to use WebSphere Application Server met with minor opposition. Although DOE already used the product, the staff members assigned to this project were unfamiliar with it. They had valid concerns about their ability to support a product they did not know well, and also worried that it was more power than this one application and Web site needed. As it turned out, the recommendation paid off.

After a number of months spent developing design approaches and content maps, acquiring and installing equipment, gaining approvals, and conducting meetings, we finally had a working application in place. The DOE staff charged with developing content were hard at it, and the site took shape. The final technical piece to be deployed was a non-IBM search engine, and as the launch date neared it was proving a particularly sticky issue: it was poorly documented, and the software company provided little technical support. A few days before final testing, the engine was finally configured and working. Final testing showed all systems working. Almost. The site, now running in a staging environment and ready for a public launch, began to bog down. Performance regularly slowed to a crawl, and the site began to go down. The application server was crashing, and no one knew why.

When a large, public project begins to go south, there is no lack of finger-pointing and panic button-hitting. Various members of the team, and a few new faces who popped out of the bureaucratic woodwork for the first time, began taking pot shots. "The application has a memory leak!" "The search engine is no good!" "The testing was inadequate!" "The implementation was faulty!"

But amid the clamor, a few cooler heads began poring through the data, methodically analyzing the situation until it became clear that a minor component of the system was to blame: the Java drivers used by WebSphere and the search engine were incompatible. Although the team had deployed the correct drivers, and the search engine's documentation suggested that its driver would work with IBM's, that was not the case. The driver mismatch was causing sessions to spin out of control, eventually crashing the application server - and the site along with it.

Happy Endings
Throughout the analysis, the project team made numerous calls to the search engine vendor and IBM. Once the problem was isolated, IBM stepped up with a solution, forwarding an about-to-be-released WebSphere patch that addressed the specific incompatibility. The patch was installed, the problem immediately disappeared, and eighteen months later, energy.gov is still running strong.

The DOE site garnered an Internet Best In Class award from Content Week magazine, a testament in part to the decision to go with a proven platform. As Brook Group prepares to launch the next version of Tacklebox at energy.gov this year, the site remains a WebSphere success story. From the use of an IBM SPC test lab during development, which demonstrated that the site could handle far more traffic than energy.gov was ever expected to receive, to the crucial last-minute driver fix, the power and reliability of the WebSphere-Tacklebox combination have proven a winner for energy.gov.

More Stories By David Beveridge

David Beveridge has served in every aspect of Web development. He oversees all areas of project management and operations for Brook Group. Prior to joining Brook Group, David amassed extensive experience running Web shops for a large not-for-profit association and a dot-com startup. During 20 years at the Boston Globe and National Geographic Society, David entered the interactive age in the late '70s, cutting his teeth on early full-text search and retrieval systems for print publishing. He was named the Geographic's first full-time online staff member in 1994, running its sites on America Online and CompuServe, and served as a founding member of the National Geographic Interactive management team.
