Time Series Database for high volumes of data: interview with InfluxDB

Stefano Marfella, 5 March 2020 | Time Series Database InfluxDB

In the era of Industry 4.0, companies are dealing with high volumes of Time Series Data coming from their smart devices. To exploit this data and collect valuable insights for business decision-making and continuous improvement, efficiency and optimization, a reliable Time Series Database solution is the only choice possible.

Time series are simply measurements or events that are tracked, monitored, downsampled and aggregated over time. These could be server metrics, application performance monitoring, network data, sensor data, events, clicks, trades in a market and many other types of analytics data. When the data points are plotted on a graph, one axis is always time. Time series data is everywhere, as time is a component of everything that is observable. Every day there are more sensors and systems producing a constant stream of time series data, and it has numerous applications across various industries.
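To make "downsampled and aggregated over time" concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not InfluxDB code: the function name, the sample readings and the choice of the mean as the aggregate are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, window_seconds):
    """Average (timestamp, value) points into fixed-width time windows.

    Each point falls into the window that starts at
    timestamp - (timestamp % window_seconds).
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    # One aggregated point per window, ordered by window start time.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical sensor readings: (Unix time in seconds, temperature in Celsius).
readings = [(0, 20.0), (20, 22.0), (40, 24.0), (60, 30.0), (90, 32.0)]
per_minute = downsample(readings, 60)
print(per_minute)  # [(0, 22.0), (60, 31.0)]
```

Five raw readings become two one-minute averages. This is the trade a time series store makes when it keeps long-term history at a coarser resolution than recent data.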

Extra Red is the only Italian consulting partner for InfluxData, the creator of InfluxDB, the purpose-built time series database. We had a chat with Tim Hall, VP Products at InfluxData, to ask him a few questions about high-performance databases and the future of time series data.

"The promise of Industry 4.0 is clearly in front of us and Extra Red is well positioned within the Italian market to help organizations take advantage of new technologies, like InfluxDB, to capitalize on this transformation." - Tim Hall - VP, Products, InfluxData 

What need gave rise to InfluxDB, and what sets it apart from earlier time series database solutions?

InfluxDB is often compared to other databases. There are multiple types of databases that get pulled up for comparison. When comparing InfluxDB with other databases, there are some stark differences. First, those databases require a significant investment in developer time and code to recreate the functionality provided out-of-the-box with InfluxDB.

Specifically, developers will need to write code to distribute the data across the cluster, to aggregate and downsample it, to handle data eviction and lifecycle management, and to summarize it. Then, they’ll have to create an API to write and query their new service, write tools for data collection, introduce a real-time processing system and write code for monitoring and alerting. Finally, they’ll need to write a visualization engine to display the time series data to the user.

What makes InfluxDB unique?

InfluxDB is used by organizations to accumulate, analyze and act upon their time-stamped data. Users can get any data (metrics, events, logs and traces) from everywhere (systems, sensors, queues, databases and networks) and store it in a high-performing server capable of ingesting millions of data points per second. Users are able to analyze all of their data across all data sets. By downsampling data, users can provide real-time analytics for better insights. Data stored in InfluxDB enables developers to act upon their data to set up automation, alerting and anomaly detection.

We also built compression that was optimized for time series data. We organized the data in a way that would index tag data for efficient queries. At the database level, there were many optimizations we could make.

We’ve found that most users run into a common set of problems they need to solve — how to collect the data, how to store it, how to process and monitor it, and how to visualize it. We find that we can get better performance than more generalized databases while also reducing the developer’s effort to get a solution up by at least an order of magnitude.

Doing something that might have taken months to get running on Cassandra or MySQL could take as little as an afternoon using our platform. By focusing on time series, we can solve problems for developers so that they can focus on the code that creates unique value inside their app.

We’ve found that having a common API makes it easier for the community to build solutions around our stack. We have line protocol to simply represent time series data and InfluxDB 2.0 unifies an HTTP-based API for writing and querying, alerting and more. This means that over time, we can have pre-built components, which we call templates, for the most common use cases which work effectively across both our open source and Cloud editions. There are a number of time series databases used for many different purposes — some are established and some are recent projects. In all cases, these differ from InfluxData which offers an entire platform and not solely a time series database.
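As an illustration of how line protocol represents a single data point, the sketch below builds one line in plain Python. It follows the documented shape (measurement, comma-separated tags, a space, comma-separated fields, a space, a timestamp), but the helper names and the sample data are assumptions for this example, and it does not handle every edge case (NaN or infinite floats, for instance, are not valid field values).

```python
def _escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    """Render a field value using line protocol's type conventions."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorting tags by key is a recommended convention
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + _format_field(value)
        for key, value in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


# Hypothetical point: CPU usage reported by one server.
line = to_line_protocol(
    "cpu",
    {"host": "server01", "region": "eu west"},
    {"usage": 64.2, "cores": 4},
    1583366400000000000,  # nanoseconds since the Unix epoch
)
print(line)  # cpu,host=server01,region=eu\ west usage=64.2,cores=4i 1583366400000000000
```

Text in this form is what a client sends in the body of a write request. In practice the official client libraries build it for you, so hand-rolling it is rarely necessary.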

How important is the open source philosophy to InfluxDB?

We believe in open source and are committed to participating in and contributing to the open source community in meaningful ways. We built a complete open source platform specifically for metrics and events, and have seen community contributions in the form of over 250 Telegraf plugins. In 2020 so far, community members have opened 350 issues and completed 380 commits. This is significant because it enriches and exercises the code base through the experience and contributions of people outside of InfluxData. We see that this leads to higher quality software, as the wide variety of uses rapidly identifies defects and allows us to eliminate them.

Why is time series data becoming increasingly important in the IT field?

Time series data has always been important in the IT field, which has been monitoring its infrastructure for years. What has changed is the significant increase in the number of things we need to monitor. Virtual machines, containers, orchestrators, microservices, and the like have increased the number of metrics that are needed to ensure you are delivering the best user experience to your customers.

The good news is that InfluxDB is a high-performance database written specifically for time-stamped data, including infrastructure monitoring, application metrics, IoT sensor data, and real-time analytics. You can conserve space by configuring InfluxDB to keep data for a defined length of time, automatically expiring and deleting any unwanted data. InfluxDB is also open in terms of how you can visualize and work with the data. While a native user interface is available, developers can easily build their own visualization solutions or use other technologies, from dashboarding solutions to data science notebooks. Many other solutions need considerable engineering effort to match the features that are inherent in a time series database, and often need a lot of hardware as well.
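The idea of keeping data only for a defined length of time can be sketched in a few lines of plain Python. This is a conceptual illustration under assumed names and sample values, not a description of how InfluxDB implements retention internally.

```python
def expire_old_points(points, retention_seconds, now):
    """Keep only the points whose timestamp lies inside the retention window."""
    cutoff = now - retention_seconds
    return [(timestamp, value) for timestamp, value in points if timestamp >= cutoff]


# Hypothetical stored points: (Unix time in seconds, value).
points = [(100, 1.0), (500, 2.0), (950, 3.0)]

# With a 600-second retention period evaluated at time 1000, the cutoff is 400,
# so the point at t=100 is dropped and the other two are kept.
kept = expire_old_points(points, retention_seconds=600, now=1000)
print(kept)  # [(500, 2.0), (950, 3.0)]
```

Writing and scheduling this kind of cleanup by hand is part of the lifecycle work that a general-purpose database leaves to the developer.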

Which industries benefit the most from time-stamped data analysis and monitoring?

We find that a wide span of industries benefit from time-stamped data analysis and monitoring, including finance, energy, telecommunications, technology and retail. InfluxDB is trusted by hundreds of organizations in nearly every industry across a wide range of use cases, including network monitoring, IoT monitoring, Industrial IoT, and infrastructure and application monitoring.

Contact Extra Red for your Time Series Database needs

If you are interested in analysing your Time Series Data with a high-performance database such as InfluxDB, contact us for a free consultation. We will be glad to help you.
