What Time Is It?
You’re ingesting measures (not metrics. Measures aren’t Metrics — know the difference!), and, clearly, you’re time-stamping stuff, right?
Well, let’s think about that for a moment. What exactly does “time-stamping” mean? More to the point, what exactly are you time-stamping?
Well, let’s think about that for a moment. What exactly does “time-stamping” mean? More to the point, what exactly are you time-stamping?
Consider the following
- Creation Time: Theoretically, the time at which that data was created. Question is, whose wall-clock are we looking at? Unless this is a single monolith (and even there, you might be screwing up time measurements 😱), you’re looking at a distributed system, and, well, everything is relative at that point. (•)
- Ingestion Time: The time you received the data. Again, this is true for a given value of “you”, after all, “you” might be a distributed system too.
- Storage Time: The time when you actually stored the data that you ingested. And the reason you might care about this is because you might make, or have made, decisions based on data, and you need to know what data you relied on for that decision. For example, if the fire-department was called, you’d like to know which data you had in the system at the time the call was made.
- Outliers: There is going to be data that you ignore. Oh, not “ignore” in the sense of “I never look at the charts anyhow”, this is more “ignore” in the sense of “This data is clearly an outlier, and we should ignore it”. The problem, of course is (a) knowing that it is an outlier, and (b) keeping track of it in case it isn’t an outlier. Either way, you need to know whenthese data occured.
Each of the above is relevant in and as of itself, and each of them is something that you will be either creating metrics from (Latency!), or using as a baseline for other metrics (Time-Frames! Peak throughput!).
The bottom line here — understand what “time-stamping” actually refers to under the hood.
(•) “We don’t need vector clocks, we’ve got ntp” (••): This answer might be good for some things (e.g. synchronizing your AWS instances), but is definitely wrong for most everything else, especially once you start talking about distributed systems.
(••) Yup, actual quote from a #CowboyDeveloper I once knew.
Comments