Teradata scales a data mountain

Windows NT shops that have longed for a highly scalable and reliable data warehouse are finally seeing their wishes fulfilled. NCR has ported its Teradata database system to Windows NT, and the result, a feature-rich product capable of running and managing the very largest data warehouses, is a smashing success. Teradata for NT (TNT) earned a score of Excellent in our evaluation, thanks in part to its flexible, scalable architecture.

Teradata has provided unprecedented performance on Unix systems for more than a decade. Major organisations all over the world rely on its scalability and performance to run the most complex and aggressive data warehousing solutions ever deployed. Two years ago, NCR began porting Teradata from Unix to Windows NT as part of its "Open Systems" strategy. As a result, NCR now offers an excellent option for businesses that want true scalability and performance from a Windows NT data warehouse without the high cost of a Unix-based Teradata environment.

Hello BYNET

Teradata is a relational database system that provides decision-support capabilities for organisations that need to store and analyse gigabytes and even terabytes of data. At the heart of TNT is the way it brings MPP (massively parallel processing) and SMP (symmetric multiprocessing) technologies together. TNT uses the same BYNET architecture that NCR created for the Unix version, ported from its native MP-RAS (NCR's Unix distribution) environment. BYNET loosely couples up to four Windows NT SMP systems, or nodes, into a single, logical database system.

Each NT node runs two types of virtual processors: PEs (parsing engines) and AMPs (access module processors). PEs process the SQL statements from clients, manage client sessions and pass the work on to the AMPs. AMPs manage database access and provide the database's parallelism.

For example, suppose a client application issues a standard ANSI SQL SELECT statement to retrieve a result set from the Teradata database. The PEs receive the request, parse the SQL statement, optimise the query plan and send the work to the AMPs through BYNET. Each AMP processes its portion of the request and sends its results back to the PEs, which combine them and return the answer set to the client. For the business, the result is incredibly fast information retrieval, even with the most complex queries.
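The request flow just described is essentially scatter-gather: the same step goes to every AMP, each AMP works only on its own rows, and the PE merges the partial answers. The Python sketch below is a toy model under stated assumptions, not NCR's implementation: the functions amp_scan and pe_execute, the four-AMP split and the sample rows are all hypothetical, and a thread pool merely stands in for AMPs working in parallel.

```python
# Toy scatter-gather model of the PE/AMP flow. All names and data are
# hypothetical; real PEs dispatch optimised plan steps to AMPs over BYNET.
from concurrent.futures import ThreadPoolExecutor


def amp_scan(rows, min_amount):
    """One AMP works only on its own slice: filter rows, then pre-aggregate."""
    matching = [amount for _region, amount in rows if amount >= min_amount]
    return len(matching), sum(matching)


def pe_execute(amp_slices, min_amount):
    """The PE sends the same step to every AMP and combines the partials."""
    with ThreadPoolExecutor(max_workers=len(amp_slices)) as pool:
        partials = list(pool.map(lambda rows: amp_scan(rows, min_amount), amp_slices))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


# Rows already spread across four hypothetical AMPs.
slices = [
    [("east", 120), ("west", 40)],
    [("east", 300)],
    [("north", 75), ("south", 10)],
    [("west", 500), ("north", 60)],
]
# Roughly: SELECT COUNT(*), SUM(amount) FROM sales WHERE amount >= 50
print(pe_execute(slices, min_amount=50))  # (5, 1055)
```

Because each AMP returns only a count and a sum rather than its raw rows, very little data has to travel back to the PE, which is one reason this design scales as AMPs are added.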

I performed my evaluation using an NCR WorldMark 4800 system configured with two NT nodes. The whole system comprised eight 500MHz Pentium III Xeon CPUs, 2GB of RAM and eighty 9GB disk drives. Teradata currently supports up to four CPUs in an NT node. This is fewer than the eight supported by the OEM version of NT, but it still delivers performance equivalent to that of a comparably configured Unix system.

I began the evaluation by testing Database Window on the administrative workstation. I was able to find the 10 AMPs and two PEs configured on each node, but Database Window's displays were character-based, more like what I would expect on a Unix system. Fortunately, NCR configures all customer systems before shipping them and also performs all upgrades its customers require, so I don't think learning this application in detail needs to be a top priority for systems administrators.

Teradata Manager, which I ran on the desktop, proved a much better administration tool than Database Window. It provides a friendly Windows GUI covering performance monitoring, alert management, graphical analysis, reporting and system configuration.

Getting to data

I spent a lot of my time using two of Teradata's SQL tools: the WinDDI (Windows Data Dictionary Interface), a GUI for performing SQL DDL (Data Definition Language) commands to manage database objects and users; and Queryman, which I used to enter my SQL query commands. I really began to understand the features and benefits of Teradata while using these utilities.

I created a new table using the CREATE TABLE command in WinDDI. I paid special attention to how the table would be placed in the system, because with other RDBMSs performance can be greatly enhanced by carefully planning the physical placement of tables in the database. With Teradata, however, the only syntax that determined where the table would be stored was PRIMARY INDEX (fieldname); no segmentation or paging extensions were required.

Teradata applies a hashing algorithm to the value of that field in each row to decide which AMP stores the row, spreading the table across the AMPs. Table placement was taken care of for me as the administrator, which made this task incredibly simple. Organisations that adopt TNT will find that such simplified administration speeds up the physical implementation of the data warehouse.
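The idea behind hash placement can be sketched in a few lines of Python. This is an illustration under loud assumptions, not Teradata's algorithm: SHA-256 stands in for Teradata's own row-hash function, a plain modulo stands in for its mapping of hash buckets to AMPs, place_row is a hypothetical name, and 20 AMPs simply mirrors the two nodes of 10 AMPs on the review system.

```python
# Sketch of hash-based row placement by primary-index value. Assumptions:
# SHA-256 stands in for Teradata's own row-hash algorithm, a plain modulo
# stands in for its hash-bucket map, and place_row is a hypothetical name.
import hashlib
from collections import Counter

NUM_AMPS = 20  # 2 nodes x 10 AMPs, as on the evaluated system


def place_row(primary_index_value, num_amps=NUM_AMPS):
    """Deterministically map a row's primary-index value to an AMP number."""
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:8], "big")
    return row_hash % num_amps


# The same key always hashes to the same AMP, so a primary-index lookup
# touches one AMP rather than all of them.
assert place_row("CUST-00042") == place_row("CUST-00042")

# Many distinct keys spread roughly evenly, with no placement work by the DBA.
counts = Counter(place_row(f"CUST-{n:05d}") for n in range(100_000))
print(len(counts), "AMPs used; rows per AMP range from",
      min(counts.values()), "to", max(counts.values()))
```

The choice of primary index still matters: hashing a column with only a handful of distinct values, such as a region code, would pile rows onto a few AMPs no matter how good the hash function is.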

Adding users to Teradata was initially somewhat confusing. I had expected WinDDI to display administrative data by database, much as Microsoft SQL Server does. Instead, it displays users.

Users are created with the CREATE USER statement. To access a user's tables, you qualify the table name with the user's name, just as you would with a database name. Because tables belong to users, Teradata manages table contents across the AMPs without being restricted by a database structure. The user's definition identifies table ownership and the maximum space allotted to the application that accesses those tables. Administrators will appreciate the simplicity of this approach once they become familiar with it.

I used Queryman to enter SQL query commands, and one window proved extremely valuable: a list of all previously run queries, with full statistics available at the click of a mouse. I frequently selected queries from this view for reuse during my evaluation. Another impressive feature was the ability to run concurrent queries in the same window, which I used several times to compare results between a single query request and multiple query requests. As an administrator, I find it extremely helpful to run queries in parallel; it lets me make the most of the database's resources and get my work done more quickly.

I had no problems entering any of the standard ANSI SQL commands. However, there is no stored procedure language; NCR says it expects to include one in the next release. Stored procedures are typically created so that the database server performs much of the processing that would otherwise occur in the client application. Teradata doesn't require this capability because the PEs and AMPs handle query processing; the client need only issue the command. Overall, the lack of a stored procedure language does not strike me as a significant oversight. Teradata does come with a macro language that resembles a procedure language, but its functionality is limited.

Derived tables were very useful in complex queries: you enter a SELECT statement in the FROM clause instead of having to create temporary tables. This should help developers reduce the time they spend building queries. Teradata also includes RANK and QUANTILE functions that greatly enhance developers' ability to write queries that perform complex calculations.
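To show what a derived table looks like, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in engine. It is not Teradata syntax or Teradata behaviour; the sales table and its rows are hypothetical. The inner SELECT in the FROM clause produces per-region totals that the outer query filters, which is the job a temporary table would otherwise do.

```python
# Derived-table sketch with SQLite (via Python's sqlite3) as a stand-in
# engine; schema and data are hypothetical, and this is SQLite SQL rather
# than Teradata SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 120), ('east', 300), ('west', 40),
        ('west', 500), ('north', 75), ('north', 60);
""")

query = """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total   -- the derived table
          FROM sales
          GROUP BY region) AS region_totals
    WHERE total > 200
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('west', 540), ('east', 420)]
```

For a filter this simple a HAVING clause would also work; derived tables pay off when an intermediate result has to be joined or aggregated again, because no temporary table has to be created and dropped around the query.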

NCR is working closely with Cognos and several other companies to help them build functions into their products that take advantage of Teradata's high performance. The goal is to improve query performance by having Teradata, rather than the workstation, process the results. Reports I ran in Cognos using the optimised functions returned in minutes; reports without them required a coffee and donut run. I should add that the reports I tested involved very complex queries.
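The benefit of letting the database, rather than the workstation, do the processing can be sketched generically. In the Python example below, sqlite3 stands in for the database server and the sales table is hypothetical; it says nothing about how the Cognos functions are actually built. Both approaches produce the same totals, but the client-side version has to fetch every row, which in a client/server database means moving all of that data across the network.

```python
# Client-side vs server-side ("push-down") aggregation. sqlite3 stands in
# for the database server; the table and its contents are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east" if i % 2 else "west", i % 100) for i in range(10_000)],
)

# Client-side: fetch every row, then aggregate on the workstation.
fetched = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in fetched:
    client_totals[region] = client_totals.get(region, 0) + amount

# Push-down: the server aggregates and returns only the summary rows.
server_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
server_totals = dict(server_rows)

assert client_totals == server_totals  # same answer either way
print(len(fetched), "rows fetched vs", len(server_rows))  # 10000 rows fetched vs 2
```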

If you're going to deploy Teradata, you'll want to identify the products you intend to use with it, because NCR is working with a long list of third-party vendors to help them take advantage of Teradata's power. If you don't use one of the optimised solutions, you won't get the benefits of this collaboration.

I didn't have a chance to test client applications, but Teradata supports CLI (call level interface), API and ODBC interfaces. NCR also provides CLI for mainframe platforms, enabling mainframe applications to update the Teradata database directly.

If you're still reluctant to consider Windows NT for your enterprise applications, Teradata for NT may be the product that makes you reconsider your position. Teradata's features and scalability finally make NT a suitable operating system for world-class enterprise data warehousing solutions. The rich tools and utilities provide everything you need to integrate and manage a data warehouse within a business's existing environment.

I give NCR Teradata NT an Excellent overall score, with only the slightest hesitation over its lack of a stored procedure language. As Windows supports more processors, I hope NCR will continue to extend CPU support on the SMP nodes, further improving scalability. Stored procedures and support for Windows 2000 are planned for the next release, due in June 2000. This product will only get better.

MPP versus SMP: Two architectures

To fully understand how NCR implements Teradata NT, one should understand the differences between MPP and SMP technologies.

In an SMP system, all of the CPUs share the same memory and disk storage.

SMP systems are very scalable up to a certain number of processors. Once that threshold is reached, the overhead to manage them becomes greater than the benefits of adding another CPU.

An MPP architecture links two or more SMP systems (nodes) together.

MPP systems are unlimited in their scalability. As SMP nodes are added, the overhead remains the same. Scalability is linear. In fact, some Teradata customers have MPP systems comprising more than 150 CPUs.
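The two scaling claims above can be made concrete with a simple model. The Python sketch below swaps in Neil Gunther's Universal Scalability Law (USL) for the SMP case; it is a generic model, not an NCR benchmark, and the coefficients ALPHA (contention) and BETA (coherency overhead) are invented for illustration. The MPP case models the sidebar's claim directly: per-node overhead stays constant, so capacity grows linearly with the number of identical nodes.

```python
# Scaling sketch. SMP: Gunther's Universal Scalability Law (USL), swapped in
# here as a generic model; alpha (contention) and beta (coherency) are made-up
# coefficients, not measurements of NT or Teradata. MPP: the sidebar's claim
# that per-node overhead stays constant, modelled as linear growth in nodes.
import math

ALPHA, BETA = 0.05, 0.02  # hypothetical coefficients


def smp_capacity(cpus, alpha=ALPHA, beta=BETA):
    """Relative throughput of a single SMP box under the USL."""
    return cpus / (1 + alpha * (cpus - 1) + beta * cpus * (cpus - 1))


def smp_peak(alpha=ALPHA, beta=BETA):
    """CPU count at which USL throughput peaks: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - alpha) / beta)


def mpp_capacity(nodes, cpus_per_node=4):
    """Idealised MPP: identical SMP nodes with no extra cross-node overhead."""
    return nodes * smp_capacity(cpus_per_node)


print(f"SMP peak near {smp_peak():.1f} CPUs")
for cpus in (1, 4, 8, 16):
    print(f"one SMP box, {cpus:2d} CPUs: {smp_capacity(cpus):.2f}")
for nodes in (1, 4, 16):
    print(f"MPP, {nodes:2d} nodes x 4 CPUs: {mpp_capacity(nodes):.2f}")
```

With these invented coefficients, a single 16-CPU box delivers less throughput than a 4-CPU one, while four 4-CPU nodes deliver four times the throughput of one node. That is the shape of the argument the sidebar makes; real coefficients would have to be measured.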

NCR Teradata 3.0 for NT combines MPP and SMP to take advantage of both architectures. Windows NT SMP nodes running Teradata have been certified for up to four CPUs, and up to four SMP nodes can be connected in an MPP configuration.

This currently lets Teradata on NT scale to 16 processors (four nodes of four CPUs each). The next release of Teradata NT, scheduled for June, will support up to 16 SMP nodes, for scalability to 64 CPUs.

The bottom line

Teradata 3.0.1 for Windows NT

Business Case: Teradata for Windows NT is a relational database system that provides decision-support capabilities along with high-end performance and scalability.

Technology Case: Having ported this product from the Unix MP-RAS environment, NCR has provided a level of parallelism that no other RDBMS can claim. Furthermore, Teradata's database architecture simplifies administration.

Pros:

• Excellent system scalability
• Simplified database administration and development
• Helpful client utilities

Cons:

• Lack of a stored procedure language
• Vendor assistance required for system setup
• Some character-based utilities

Platforms: Windows NT 4.0 Server or Enterprise Edition with Service Pack 5.

Price: Available on application.

NCR in Australia can be reached on (02) 9964 8248 or at www.ncr.com.

