Query function in Google Spreadsheets

Most versatile effective function and unique to Google Spreadsheets. I am documenting this function to understand it properly and for those who do not have programming background. I do not have previous experience with query. i will try to illustrate with examples.

If you have confusion and stuck some where please post your questions Google Docs forum at the following link https://productforums.google.com/forum/#!categories/docs/spreadsheets. There are experts in this area waiting for your questions.

Google Spreadsheet query is designed to be similar to SQL  with few exceptions. it is a subset of  SQL with a few feature of its own. if you are familiar with SQL it will be easy to learn.

The Syntax of the function is as follow:

QUERY(data, query, headers)

DATA: it can be columns(A:C open ranges) of data you want to query, range of cells such as A1:C10, result of function such as importrange, index etc.,

QUERY: It is similar to SQL with small exceptions there is no FROM clause in the this since DATA itself is acting like a FROM clause.

HEADERS: If your data has headers in the row you can specify this here (suppose your first row has headers you can specify this as 1)

Notes:

  1. column headers have to capital letter such A, B, C if you are picking up the raw data with in the same spreadsheet.

  2. if you are using the array formulas to manipulate the data (Index, Filter, importrange to name a few) then column headers will Col1, Col2, Col3 etc. observe i have used C capital letter in Col1. this is syntax you have to follow this otherwise you will get a parse error

  3. parse error:  an error of language resulting from code that does not conform to the syntax of the programming language; “syntax errors can be recognized at compilation time”

  4. Data types: supports data types are string, number, boolean, date, datetime and timeof day. all values of the column will have a data type that matches the column type or a null value

  5. The syntax of the query language is composed of the following clauses. Each clause starts with one or two keywords. All clauses are optional. Clauses are separated by spaces. The order of the clauses must be as follows:

    Clause

    Useage

    select

    Select which columns to return, and in what order, if omitted, all the table’s columns are returned, in their default order

    where

    Return only rows that match a condition. if omitted, all rows are returned.

    group by

    Aggregates values across rows

    pivot

    Transforms distinct values in columns into new column

    order by

    sorts rows by values in columns

    limit

    Limits the number of returned rows

    offset

    skips a given number of first rows

    label

    sets column labels

    format

    formats the values in certain columns using given formatting patterns

    options

    sets additional options

    now let try to understand these clauses with an example

  6. our data set like this

    s

  7. try this formula

    =QUERY(A:D;”select A,B”;1) the result will be as show in the below image

  8. We have some special keywords called functions, Functions are bits of code that perform an operation on a value or values. The first we will see is to perform a mathematical operation on a column. We will see Sum function which by totaling the values in a column designated by parentheses.

    suppose we want to Sum Column C where the Column B is Nicole

    =QUERY(A:D;”select B, Sum(C) where B = ‘Nicole’ group by B”;1)

  9. the result will be

  10.  

    Note:

    (1) you have to use S capital in the Sum followed by column you want sum in parentheses

    (2) whenever you are using the group by clause same column has to be selected in the select clause otherwise you may get value error

    (3) condition you want the check in the column B has to be in single quote

    this can be sorted using the order by clause

    =QUERY(A:D;”select B, Sum(C) where B <>” group by B order by B asc”;1)

    the result will be

    Note:

    (1) observe the column 1 has been sorted in ascending order

    (2) to get the the descending order you can use the desc instead of asc

  11. you can also sort the based on the result of the Sum(C)

    =QUERY(A:D;”select B, Sum(C) where B <>” group by B order by Sum(C) desc”;1)

    the results will be

    Note:

    (1) we have sorted data by descending order based on Sum of sales

    (2) you have to follow the order of clause listed above

    We can also limit the results top 3 or top 2 using the limit clause

    =QUERY(A:D;”select B, Sum(C) where B <>” group by B order by Sum(C) desc limit 3″;1)

    We can also the name the column sum Sales as Top 3 sales

    =QUERY(A:D;”select B, Sum(C) where B <>” group by B order by Sum(C) desc limit 3 label Sum(C) ‘Top 3 Sales'”;1)

    We can also pivot the data based on date in the D column.

    =QUERY(A:D;”select B, Sum(C) where B <>” group by B pivot D”;1)

    Note:

    (1) You might have observed the column D is not selected the select clause

    (2) Date are formatted in yyyy-mm-dd format

    (3) pivot is unique to google Sheets Query function

  12. In the next blog post we will see how to manipulate data using the cell value

  13. the link the spreadsheet is as follows:
  14. https://docs.google.com/spreadsheets/d/12B3C3NPLcZ1Dpa0jNvF2ClPWvLJBznCdHif6–m334c/edit?usp=sharing

OFFSET FUNCTION

The OFFSET function returns a range of cells that is the specified number of rows and columns away from the specified base cell.

OFFSET(reference, rows, columns, height, width)

reference:  The address of the cell from which the offsets are measured.

rows: The number of rows from the reference cell to the target cell. rows is a number value. 0 means the target cell is in the same row as the base cell. A negative number means the target is in a row above the reference.

columns: The number of columns from the reference cell to the target cell. columns is a number value. 0 means the target cell is in the same column as the refernce cell. A negative number means the target is in a column to the left of the base.

height:  An optional value specifying the number of rows to return starting with the offset location. height  is a number value.

width:  An optional value specifying the number of columns to return starting with the offset location. width is a number value.

Usage Notes

OFFSET can return an array for use with another function. For example, assume you have entered into A1, A2, and A3, the base cell, the number of rows, and the number of columns, respectively, that you wish to have summed. The sum could be found using =SUM(OFFSET(INDIRECT(A1),0,0,A2,A3)).

lets see one question asked in the google docs forum:

Hello, I am a total spreadsheet n00b and I am banging my head on this problem: I have a huge amount of data I need to sum up. Every column has a few thousands rows, and I want to sum them up 16 by 16 in another cell. So, for example: A1:A16 summed in B1, A17:A32 summed in B2, and so on. Is there a clever and quick way of doing this, without manually changing a formula every time but just using one formula? I have tens of these columns to sum up, and it’s driving me kinda crazy, I tried searching for something similar to no avail.

the link to the blog is

https://groups.google.com/a/googleproductforums.com/forum/#!starred/docs/qgKeHI-nJ4A

the solution suggested by me in cell b1 is as follows:

=SUM(OFFSET($A$1,(ROW()-1)*16,,16))

and drag it down

we have taken the first argument reference as cell A1 as absolute reference.

In the second argument is rows we have intelligently put the Row() function, this function will first evaluate the row() function since we are in cell B1 the row function will return 1 We are subtracting 1 from row() function which will give us the value 0 and it is multiplied by 16 and it will result in 0 that means the rows offset is 0 for the first cell indirectly we are setting the cell start the reference cell to A1.

When we drag the formula to cell B2 the row() function gets the value of 2 and subtracting 1 from that will give us the value 1 and multiplying this with 16 will give the reference cell as A17.

The third argument is Columns which we have ignored because there is no column offset the in the question

The fourth argument is height  we have assumed the value 16, since in the question he want to total of sum of every 16 rows

The fifth argument is left since it is a optional one.

If we try to solve the same problem using the other formulas the logic will be complicated.