Grafana is an open source, feature-rich metrics dashboard and graph editor for Graphite, InfluxDB, and OpenTSDB.

Features:

Graphite target editor
- Graphite target expression parser
- Feature-rich query composer
- Quickly add and edit functions and parameters
- Templated queries

Graphing
- Fast rendering, even over large timespans
- Click and drag to zoom
- Multiple Y-axes, logarithmic scales
- Bars, lines, points
- Smart Y-axis formatting
- Series toggles and color selector
- Legend values and formatting options
- Grid thresholds, axis labels
- Annotations
- Any panel can be rendered to PNG (server side, using PhantomJS)

Dashboards
- Create, edit, save, and search dashboards
- Change column spans and row heights
- Drag and drop panels to rearrange
- Templating
- Scripted dashboards
- Dashboard playlists
- Time range controls
- Share snapshots publicly

InfluxDB
- Use InfluxDB as a metric and annotation data source
- Query editor with series and column typeahead, easy group-by and function selection

OpenTSDB
- Use as a metric data source
- Query editor with metric name typeahead and tag filtering
Welcome to Heartbeat. This is a new, EXPERIMENTAL Beat for testing service availability by pinging over ICMP, TCP, or higher-level protocols.
InfluxDB is an open source distributed time series database with no external dependencies. It's useful for recording metrics and events and for performing analytics. It has a built-in HTTP API, so you don't have to write any server-side code to get up and running. InfluxDB is designed to be scalable, simple to install and manage, and fast to get data in and out. It aims to answer queries in real time: every data point is indexed as it comes in and is immediately available in queries that should return in under 100ms.
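For example, writing a point and querying it back takes nothing more than HTTP (a sketch assuming a local InfluxDB on the default port 8086 and an existing database named mydb):

    # Write one point in the line protocol (database "mydb" must already exist)
    curl -i -XPOST 'http://localhost:8086/write?db=mydb' \
      --data-binary 'cpu_load,host=server01 value=0.64'

    # Query it back
    curl -G 'http://localhost:8086/query?db=mydb' \
      --data-urlencode 'q=SELECT * FROM cpu_load'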
Open source framework for processing, monitoring, and alerting on time series data.
This project adds a basic high availability layer to InfluxDB. With the right architecture and disaster recovery processes, this achieves a highly available setup.

The architecture is fairly simple: a load balancer, two or more InfluxDB Relay processes, and two or more InfluxDB processes. The load balancer should point UDP traffic and HTTP POST requests with the path /write at the two relays, while pointing GET requests with the path /query at the two InfluxDB servers.

Buffering

The relay can be configured to buffer failed requests for HTTP backends. The intent of this logic is to reduce the number of failures during short outages or periodic network issues. This retry logic is NOT sufficient for long periods of downtime, as all data is buffered in RAM.

Buffering has the following configuration options (configured per HTTP backend):

- buffer-size-mb -- an upper limit on how much point data to keep in memory (in MB)
- max-batch-kb -- a maximum size for the aggregated batches that will be submitted (in KB)
- max-delay-interval -- the maximum delay between retry attempts per backend. The initial retry delay is 500ms and is doubled after every failure.

If the buffer is full, requests are dropped and an error is logged. If a request makes it into the buffer, it is retried until it succeeds. Retries are serialized to a single backend. In addition, writes are aggregated and batched as long as the body of the request will be less than max-batch-kb. If buffered requests succeed, there is no delay between subsequent attempts.

If the relay stays alive for the entire duration of a downed backend server without filling that server's allocated buffer, and the relay can stay online until the entire buffer is flushed, then no operator intervention is required to "recover" the data. The data is simply batched together and written out to the recovered server in the order it was received.

NOTE: The limits for buffering are not hard limits on the memory usage of the application; there will be additional overhead that would be much more challenging to account for. The limits listed cover only the point line protocol (including any added timestamps, if applicable). Factors such as small incoming batch sizes and a smaller max batch size will increase the overhead in the buffer. There is also the general application memory overhead to account for. This means that a machine with 2GB of memory should not have buffers that sum to almost 2GB.
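For illustration, a buffered HTTP backend might be declared like this in the relay's TOML configuration (a sketch only; the names, addresses, and values are placeholders, not recommendations):

    [[http]]
    name = "example-http"
    bind-addr = "127.0.0.1:9096"
    output = [
        { name="local1", location="http://127.0.0.1:8086/write", buffer-size-mb=100, max-batch-kb=50, max-delay-interval="5s" },
        { name="local2", location="http://127.0.0.1:7086/write", buffer-size-mb=100, max-batch-kb=50, max-delay-interval="5s" },
    ]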
Recovery

InfluxDB organizes its data on disk into logical blocks of time called shards. We can use this to create a hot recovery process with zero downtime. The length of time that a shard covers is typically 1 hour, 1 day, or 7 days, depending on the retention duration, but it can be set explicitly when creating the retention policy. For the sake of this example, let's assume shard durations of 1 day.

Say one of the InfluxDB servers goes down for an hour on 2016-03-10. Once midnight UTC rolls over, all InfluxDB processes are writing data to the shard for 2016-03-11, and the file(s) for 2016-03-10 have gone cold for writes. We can then restore things using these steps (a shell sketch follows the Caveats section below):

1. Tell the load balancer to stop sending query traffic to the server that was down. (This should be done as soon as an outage is detected, to prevent partial or inconsistent query returns.)
2. Create a backup of the 2016-03-10 shard from a server that was up the entire day.
3. Restore the backup of the shard from the good server to the server that had downtime.
4. Tell the load balancer to resume sending queries to the previously downed server.

During this entire process the relays should be sending current writes to all servers, including the one with downtime.

Sharding

It's possible to add another layer on top of this kind of setup to shard data. Depending on your needs, you could shard on the measurement name or on a specific tag like customer_id. The sharding layer would have to service both queries and writes. As this relay does not handle queries, it will not implement any sharding logic; any sharding would have to be done externally to the relay.

Caveats

While influxdb-relay does provide some level of high availability, there are a few scenarios that need to be accounted for:

- influxdb-relay will not relay the /query endpoint, and this includes schema modification (create database, DROPs, etc.). This means that databases must be created before points are written to the backends.
- Continuous queries will still only write their results locally. If a server goes down, the continuous query will have to be backfilled after the data has been recovered for that instance.
- Overwriting points is potentially unpredictable. For example, given servers A and B, if B is down and point X is written (call the value X1) just before B comes back online, that write is queued behind every other write that occurred while B was offline. Once B is back online, the first buffered write succeeds, and all new writes are allowed to pass through. At this point (before X1 is written to B), X is written again (with value X2 this time) to both A and B. When the relay reaches the end of B's buffered writes, it writes X (with value X1) to B. Now A has X2 but B has X1. It is probably best to avoid rewriting points if possible; otherwise, be aware that overwriting the same field for a given point can lead to data differences. This could be mitigated by waiting for the buffer to flush before opening writes back up to pass-through.
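As a rough illustration of recovery steps 2 and 3 above, the shard copy might be done with InfluxDB's backup and restore tooling (a sketch only; flags vary across InfluxDB versions, and the database name, retention policy, shard ID, and paths are placeholders):

    # On a server that was up the entire day: back up the now-cold shard
    influxd backup -database mydb -retention autogen -shard 25 /tmp/shard-2016-03-10

    # On the server that had downtime (influxd stopped), restore it offline
    influxd restore -database mydb -retention autogen -shard 25 \
        -datadir /var/lib/influxdb/data /tmp/shard-2016-03-10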
Poll network devices via SNMP and save the data in InfluxDB (version 0.12.x). It uses github.com/paulstuart/snmputil for SNMP processing, and therefore has the following functionality:

- SNMP versions 1, 2/2c, and 3
- Bulk polling of all tabular data
- Regexp filtering of the resulting data by name
- Automatic conversion of INTEGER and BIT formats to their named types
- Automatic generation of OID-to-name lookups (if net-snmp-utils is installed)
- Optional processing of counter data (deltas and differentials)
- Overriding column aliases with custom labels
- Automatic request throttling -- never poll faster than the device can respond

influxsnmp uses a datafile of parsed MIB objects in order to use symbolic names and to do automated formatting of polled data. If a previously saved file is not available, it will generate and save one automatically. The resulting file may be quite large (all OIDs are included). To create a MIB file containing only the OIDs that will be used, run the following command:

    influxsnmp -dump -filter > mibFile.json

Since snmptranslate is used to create the dump file, you can export MIBDIRS to point at the directories containing MIB files, as shown below.
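For example (the MIB directory path is a placeholder for wherever your MIB files live):

    export MIBDIRS=/usr/share/snmp/mibs
    influxsnmp -dump -filter > mibFile.json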
JRuby is a 100% Java implementation of the Ruby programming language. It is Ruby for the JVM. JRuby provides a complete set of core "builtin" classes and syntax for the Ruby language, as well as most of the Ruby Standard Libraries.
Metricbeat fetches a set of metrics on a predefined interval from the operating system and services such as Apache web server, Redis, and more.
Morgoth is a framework for flexible anomaly detection algorithms, packaged to be used with Kapacitor. Morgoth provides a framework for implementing the smaller pieces of an anomaly detection problem.

The basic framework is that Morgoth maintains a dictionary of normal behaviors and compares new windows of data to that dictionary. If a new window of data is not found in the dictionary, it is considered anomalous. Morgoth uses algorithms, called fingerprinters, to compare windows of data and determine whether they are similar. The Lossy Counting Algorithm (LCA) is used to maintain the dictionary of normal windows. The LCA is a space-efficient algorithm that can account for drift in the normal dictionary; more on the LCA below. Morgoth uses a consensus model where each fingerprinter votes on whether it thinks the current window is anomalous. If the percentage of anomalous votes is greater than a consensus threshold, the window is considered anomalous.

Fingerprinters

A fingerprinter is a method that can determine whether a window of data is similar to a previous window of data. In effect, the fingerprinters take fingerprints of the incoming data and compare fingerprints of new data to see if they match. These fingerprinting algorithms provide the core of Morgoth, as they are the means by which Morgoth determines whether a new window of data is new or something already observed.

An example fingerprinting algorithm is a sigma algorithm, which computes the mean and standard deviation of a window and stores them as the fingerprint for the window. When a new window arrives, it compares the fingerprint (mean, stddev) of the new window to the previous window; if the windows are too far apart, they are not considered a match (a sketch follows the LCA parameter discussion below). By defining several fingerprinting algorithms, Morgoth can decide whether new data is anomalous or normal.

Lossy Counting Algorithm

The LCA counts frequent items in a stream of data. It is lossy because, to conserve space, it drops less frequent items. The result is that the algorithm will find frequent items but may lose track of less frequent ones. More on the specific mathematical properties of the algorithm can be found below.

There are two parameters to the algorithm: error tolerance (e) and minimum support (m). First, e is in the range [0, 1] and is an error bound, interpreted as a percentage. For example, given e = 0.01 (1%), items that are less than 1% frequent in the data set can be dropped. Decreasing e requires more space but keeps track of less frequent items; increasing e requires less space but loses track of less frequent items. Second, m is in the range [0, 1] and is a minimum support such that items considered frequent have at least m% frequency. For example, if m = 0.05 (5%), then an item with support below 5% is not considered frequent, i.e., not normal. The minimum support becomes the threshold at which items are considered anomalous.

Notice that m > e; this reduces the number of false positives. For example, say we set e = 5% and m = 5%. If a normal behavior X has a true frequency of 6%, then based on variations in the true frequency, X might fall below 5% for a small interval and be dropped. This would cause X's frequency to be underestimated, which would cause it to be flagged as an anomaly, triggering a false positive. By setting e < m, we have a buffer to help mitigate false positives.
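As promised above, here is a minimal sketch of a sigma fingerprinter in Go (an illustration, not Morgoth's actual code; the type names and the three-sigma matching rule are assumptions):

    // Illustrative sigma fingerprinter: a window's fingerprint is its
    // (mean, stddev); two windows match when the new window's mean lies
    // within a few standard deviations of the known one.
    package main

    import (
        "fmt"
        "math"
    )

    type sigmaFingerprint struct {
        mean, stddev float64
    }

    // fingerprint computes the (mean, stddev) fingerprint of a window.
    func fingerprint(window []float64) sigmaFingerprint {
        var sum float64
        for _, v := range window {
            sum += v
        }
        mean := sum / float64(len(window))
        var ss float64
        for _, v := range window {
            ss += (v - mean) * (v - mean)
        }
        return sigmaFingerprint{mean: mean, stddev: math.Sqrt(ss / float64(len(window)))}
    }

    // matches reports whether a new fingerprint is within `sigmas`
    // standard deviations of a known fingerprint; three sigmas below is
    // an assumed tuning, not Morgoth's.
    func matches(known, next sigmaFingerprint, sigmas float64) bool {
        if known.stddev == 0 {
            return next.mean == known.mean
        }
        return math.Abs(next.mean-known.mean) <= sigmas*known.stddev
    }

    func main() {
        normal := fingerprint([]float64{10, 11, 9, 10, 10.5})
        fmt.Println(matches(normal, fingerprint([]float64{10.2, 9.8, 10.1}), 3)) // true: similar window
        fmt.Println(matches(normal, fingerprint([]float64{25, 26, 24}), 3))      // false: too far apart
    }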
Properties

The Lossy Counting Algorithm has three properties:

- there are no false negatives,
- false positives are guaranteed to have a frequency of at least (m - e)*N,
- the frequency of an item can be underestimated by at most e*N,

where N is the number of items encountered. The space requirement for the algorithm is at most (1 / e) * log(e*N), and it has been shown that if the items with low frequency are uniformly random, then the space requirement is no more than 7 / e. This means that as Morgoth continues to process windows of data, its memory usage grows with the log of the number of windows and can reach a stable upper bound.
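For reference, a minimal sketch of the Lossy Counting Algorithm itself (a textbook rendering, not Morgoth's code; the string item type and bucket bookkeeping are assumptions):

    // Illustrative Lossy Counting: counts are kept in buckets of width
    // ceil(1/e); at each bucket boundary, items whose count plus maximum
    // undercount cannot reach the current bucket index are dropped.
    package main

    import (
        "fmt"
        "math"
    )

    type entry struct {
        count int
        delta int // maximum possible undercount at insertion time
    }

    type lossyCounter struct {
        e       float64 // error tolerance
        width   int     // bucket width = ceil(1/e)
        n       int     // items seen so far
        entries map[string]*entry
    }

    func newLossyCounter(e float64) *lossyCounter {
        return &lossyCounter{e: e, width: int(math.Ceil(1 / e)), entries: map[string]*entry{}}
    }

    func (lc *lossyCounter) Add(item string) {
        lc.n++
        bucket := int(math.Ceil(float64(lc.n) / float64(lc.width)))
        if ent, ok := lc.entries[item]; ok {
            ent.count++
        } else {
            lc.entries[item] = &entry{count: 1, delta: bucket - 1}
        }
        // Prune at bucket boundaries: this is what makes the counter lossy.
        if lc.n%lc.width == 0 {
            for k, ent := range lc.entries {
                if ent.count+ent.delta <= bucket {
                    delete(lc.entries, k)
                }
            }
        }
    }

    // Frequent returns items with estimated frequency of at least m
    // (minimum support). Per the properties above, there are no false
    // negatives, and counts are underestimated by at most e*N.
    func (lc *lossyCounter) Frequent(m float64) []string {
        var out []string
        threshold := (m - lc.e) * float64(lc.n)
        for k, ent := range lc.entries {
            if float64(ent.count) >= threshold {
                out = append(out, k)
            }
        }
        return out
    }

    func main() {
        lc := newLossyCounter(0.01) // e = 1%
        for i := 0; i < 1000; i++ {
            lc.Add("normal")
            if i%100 == 0 {
                lc.Add("rare") // ~1% frequency: below the 5% support
            }
        }
        fmt.Println(lc.Frequent(0.05)) // [normal]
    }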
Parse CSS and add vendor prefixes to CSS rules using values from the Can I Use website