Under the hood of Cisco’s Tetration Analytics platform
- 21 June, 2016 06:18
Cisco’s entrance into the data center analytics market with the introduction of Tetration is the culmination of two years’ worth of wrangling various open source projects and developing proprietary algorithms in the areas of big data, streaming analytics and machine learning.
Tetration is an analytics platform that provides deep visibility into data center and cloud infrastructure operational information. Here’s a description from Network World’s story on Tetration:
The platform, Cisco Tetration Analytics, gathers information from hardware and software sensors and analyzes the information using big data analytics and machine learning to offer IT managers a deeper understanding of their data center resources. The system will dramatically simplify operational reliability, application migrations to SDN and the cloud as well as security monitoring.
But what’s under the hood of Tetration? Below are some of the components used to build the product (by the way, tetration is a mathematical term for iterated exponentiation, an operation that quickly produces very large numbers):
Apache Spark is an engine for large-scale data processing. To understand what Spark is, it’s helpful to understand the basics of Hadoop. Hadoop has two main components: the Hadoop Distributed File System (HDFS), which is the storage layer, and MapReduce, which is the analytics and compute layer. Spark was developed as an in-memory cluster processing alternative to MapReduce, and it can deliver up to 100x faster response times than MapReduce for some applications. A key feature of Spark is that programs can load data into the cluster’s memory and query it repeatedly, making it an ideal platform for machine learning and artificial intelligence applications.
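To make the load-once, query-many idea concrete, here is a minimal plain-Python sketch. It does not use Spark itself, and the names `CachedDataset` and `load_from_storage` are hypothetical, chosen only for illustration. It shows a dataset being read from slow storage a single time and then answering several queries from memory.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


class CachedDataset:
    """Toy stand-in for a dataset that is loaded once and then kept in memory."""

    def __init__(self, loader: Callable[[], Iterable[float]]) -> None:
        self._loader = loader
        self._cache: Optional[List[float]] = None
        self.loads = 0  # counts how often the slow storage layer was read

    def _data(self) -> List[float]:
        # Read from storage only on first use; later queries reuse memory.
        if self._cache is None:
            self._cache = list(self._loader())
            self.loads += 1
        return self._cache

    def query(self, fn: Callable[[List[float]], T]) -> T:
        """Run an arbitrary computation against the in-memory copy."""
        return fn(self._data())


def load_from_storage() -> Iterable[float]:
    # Hypothetical loader: pretend this is an expensive read from disk.
    return (float(i) for i in range(1, 1001))


dataset = CachedDataset(load_from_storage)
total = dataset.query(sum)
largest = dataset.query(max)
mean = dataset.query(lambda xs: sum(xs) / len(xs))
```

After the three queries, `dataset.loads` is still 1. Iterative workloads such as model training, which pass over the same data many times, are the kind that benefit from this pattern.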
Kafka is an Apache publish-subscribe messaging platform used in big data analytics programs. Kafka is meant to serve as the “central data backbone” for programs, and a single broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. “Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers,” according to the Apache Kafka description.
Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact, according to Apache.
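The partitioning idea can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not Kafka’s client API: the `Topic` class and `assign_partitions` helper are hypothetical, CRC32 stands in for whatever hash a real client uses, and on-disk persistence and replication are left out.

```python
import zlib
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, value)


class Topic:
    """Toy topic: a fixed set of append-only partitions held in memory."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]

    def publish(self, key: str, value: str) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        # Assumption: CRC32 of the key picks the partition, so messages
        # with the same key always land in the same partition, in order.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int) -> List[Message]:
        """Return every message in a partition from the given offset on."""
        return self.partitions[partition][offset:]


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions round-robin over the consumers in one group."""
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return assignment


topic = Topic(num_partitions=4)
events = [("host-a", "flow 1"), ("host-b", "flow 2"),
          ("host-a", "flow 3"), ("host-c", "flow 4")]
placements = [topic.publish(key, value) for key, value in events]
assignment = assign_partitions(4, ["consumer-1", "consumer-2"])
```

Because each consumer owns a disjoint set of partitions, the group can read the stream in parallel while messages that share a key stay in order.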
While Cisco used Spark and Kafka for data processing and messaging, Tetration engineers also used Druid as a column-oriented distributed data storage system. “Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation,” according to Apache’s Druid description, noting that existing Druid deployments have scaled to managing trillions of events and petabytes of data.
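For readers unfamiliar with column-oriented storage, the following plain-Python sketch shows the basic layout and why it suits aggregation. It is not Druid’s API or storage format: the flow records are invented sample data, and `to_columns` and `sum_by` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Invented sample flow records, stored row by row.
rows: List[Dict[str, Any]] = [
    {"timestamp": 1, "src": "10.0.0.1", "dst": "10.0.0.9", "bytes": 1200},
    {"timestamp": 2, "src": "10.0.0.2", "dst": "10.0.0.9", "bytes": 300},
    {"timestamp": 3, "src": "10.0.0.1", "dst": "10.0.0.8", "bytes": 800},
    {"timestamp": 4, "src": "10.0.0.3", "dst": "10.0.0.9", "bytes": 500},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per field (a columnar layout)."""
    columns: Dict[str, List[Any]] = defaultdict(list)
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return dict(columns)


def sum_by(columns: Dict[str, List[Any]], group_col: str, value_col: str) -> Dict[Any, int]:
    """Aggregate one column grouped by another, touching only those two."""
    totals: Dict[Any, int] = defaultdict(int)
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] += value
    return dict(totals)


columns = to_columns(rows)
bytes_by_src = sum_by(columns, "src", "bytes")
# bytes_by_src == {"10.0.0.1": 2000, "10.0.0.2": 300, "10.0.0.3": 500}
```

The aggregation reads only the `src` and `bytes` columns and never touches `timestamp` or `dst`, which is the main reason columnar stores can aggregate wide event data quickly.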
While these were the major open source components Cisco used to create Tetration, the company also developed customized software to link it all together. “Some critical new components we wrote because there’s no equivalent in the open source domain or they have not been open sourced,” a Cisco spokesperson wrote in an email. The company indicated this was mostly the case in the machine learning aspects of the product. “Naturally we need to keep private the smart algorithms where a lot of the magic and differentiation occur.”
Running all this software is a powerful combination of servers and switches. Cisco says Tetration comes with 36 1RU Cisco UCS C220 Rack Servers and three Cisco Nexus 9372PQ Switches to provide connectivity to the servers. Tetration uses smart memory and storage hierarchy management, along with DRAM, flash storage and spinning disks, to optimize the performance of real-time data and retain relevant existing data.