In SQL you can generate statistical summaries of the records in a query using aggregation functions and clauses. In many cases, you will need to do some processing and transformations to your data before summarizing it.
Aggregation Functions
These functions operate on the values of a column (x) and return a value as shown in this table:
Aggregation function | Returns |
MIN(x) | minimum value in column x |
MAX(x) | maximum value in column x |
SUM(x) | sum of values in column x |
AVG(x) | average value in column x |
STDEV(x) | standard deviation of values in column x |
COUNT(x) | the number of values in column x |
The following queries will return a single value as result of the aggregation function:
SELECT AVG (FinalGrade)
FROM Registrations
WHERE CourseID = “BAMS 580”
SELECT COUNT (StudentID)
FROM Students
WHERE Program = “MMOR”
COUNT() Function
Depending on what information you are looking for, the Count function can be used in three different ways:
COUNT(x) | Returns the number of non-null values in column x. |
COUNT(*) | Returns the number of rows in the queried table. |
COUNT(DISTINCT x) | Returns the number of distinct values in column x. Note: this function is not available in MS Access |
This query will return the same value as the previous example (assuming the students table has no entries with null ID):
SELECT COUNT (*)
FROM Students
WHERE Program = “MMOR”
Aggregation Clauses
The GROUP BY clause divides rows into groups and apply aggregation operations to each group. For example, this query will return the average grade for each course and year:
SELECT CourseID, Year, AVG (FinalGrade)
FROM Registrations
GROUP BY CourseID, Year
The HAVING clause specifies a condition for groups displayed. This query will display the number of students only for courses after 2010.
SELECT CourseID, Year, COUNT (StudentID)
FROM Registrations
GROUP BY CourseID, Year
HAVING Year > 2010
Aggregation functions can be included in the having clause:
SELECT CourseID, Year, COUNT (StudentID)
FROM Registrations
GROUP BY CourseID, Year
HAVING COUNT (StudentID) > 10
Functions for data transformation
When summarizing in SQL you might need to do some conversions to the data you want to display. The IIF function returns one of two values depending on a logical condition.
SELECT StudentName, IIF( Program=’MMOR’, 1, 0) as MMORStudent
FROM Students
When summarizing it can be useful to transform categorical variables into dummy variables that can then be aggregated using math functions:
SELECT AVG( IIF( FinalGrade<50, 1, 0)) as Total_Students_Failed_BAMS580
FROM Registrations
WHERE CourseID=’BAMS580′
This webpage shows other examples of conversions you can do. Keep in mind that IIF function works only for Microsoft databases since 2012. A more portable function across SQL platforms is CASE expression (except for MS Access).
Another way to summarize data in SQL is the TRANSFORM statement, which allows to display values from specified fields or expressions as column headings.