Let’s say we have an sql table of jobs. They can be in any of several states. They have a starte time and a completed time. And they have have a type of A or B.
type | started | completed | state |
---|---|---|---|
A | 9:01am | 9:02am | completed |
A | 9:10am | 9:14am | aborted |
B | 9:20am | 9:21am | cancelled |
A | 9:20am | 9:22am | completed |
Let’s say we want to know for A and B, what percentage of jobs complete. We’d also like to know what percentage of jobs finish under a certain time threshold. Now this is drastically simplified, but you can imagine that job type could be any complex set of variables you want to analyze by: region, language, type of rendering, whatever.
Here’s a query that will calculate the ratio of completed jobs by type:
select type,
avg(cast(
case
when state='completed' then 1
else 0
end
as float)
) as completed_ratio
from jobs
group by type
The way it works:
- The case statement sets of value of 1 for the success condition, and 0 for any other.
- These are integers, so we cast them as float, so we’ll get the averaging behavior we want.
- We avg them, to get the average.
Similarly, if we wanted to calculate the ratio of jobs completed under a certain time threshold, we could do that in the case statement, such as this example which calculates the ratio of jobs whose end time is under 3 minutes:
select type,
avg(cast(
case
when datediff(minute, completed, started) < 3 then 1
else 0
end
as float)
) as completed_ratio
from jobs
group by type