Top 16 SQL Techniques Every Beginner Needs to Know

1. Incremental tables and MERGE

Updating tables incrementally is important. The ideal situation is when your transactions have a PRIMARY key made of unique, auto-incrementing integers. The table update in this case is simple:

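insert target_table (transaction_id)
select transaction_id from source_table
-- coalesce() covers the very first load: max() over an empty target returns NULL (assumes positive ids)
where transaction_id > (select coalesce(max(transaction_id), 0) from target_table)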

That isn't always the case when working with denormalized star-schema datasets in modern data warehouses. You might be tasked with creating sessions in SQL and/or incrementally updating datasets with just a portion of the data. A transaction_id might not exist; instead, you'll have to deal with a data model where the unique key depends on the latest transaction_id (or timestamp) known. For example, user_id in the last_online dataset depends on the latest known connection timestamp. In this case you would want to update existing users and insert the new ones.

MERGE and incremental updates

You can use MERGE, or you can split the operation into two actions: one to update existing records with new values, and one to insert completely new records that don't exist yet (a LEFT JOIN situation).
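
A minimal sketch of the two-statement alternative, assuming hypothetical temp tables user_last_seen (target) and user_connections (source) with made-up rows: first UPDATE the users that already exist, then INSERT the ones a LEFT JOIN cannot find in the target.

```sql
-- Hypothetical sample tables (illustration only).
create temp table user_last_seen as (
  select 1 as user_id, timestamp '2000-10-01 00:00:01' as last_online
);
create temp table user_connections as (
  select 1 as user_id, timestamp '2022-10-10 10:00:00' as connected_at union all
  select 1, timestamp '2022-10-10 12:00:00' union all
  select 2, timestamp '2022-10-10 09:00:00'
);

-- Step 1: update users that already exist in the target.
update user_last_seen t
set last_online = s.latest
from (
  select user_id, max(connected_at) as latest
  from user_connections
  group by user_id
) s
where t.user_id = s.user_id;

-- Step 2: insert users the target doesn't know yet (LEFT JOIN ... IS NULL).
insert user_last_seen (user_id, last_online)
select c.user_id, max(c.connected_at)
from user_connections c
left join user_last_seen t on t.user_id = c.user_id
where t.user_id is null
group by c.user_id;
```

Unlike MERGE, these are two separate statements, so they are not atomic on their own; wrapping them in a multi-statement transaction (BEGIN TRANSACTION; ... COMMIT TRANSACTION;) is one way to get that guarantee.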

MERGE is a statement that's commonly used in relational databases. In Google BigQuery, MERGE is one of the Data Manipulation Language (DML) statements. It's often used to perform three main functions atomically in one single statement: UPDATE, INSERT, and DELETE.


  • UPDATE or DELETE can be used when a source row matches a target row (WHEN MATCHED).
  • INSERT can be used when a source row has no matching target row (WHEN NOT MATCHED).
  • UPDATE or DELETE can also be used when a target row has no matching source row (WHEN NOT MATCHED BY SOURCE).

This means that the Google BigQuery MERGE command lets you synchronize your BigQuery tables by updating, inserting, and deleting data in a single statement.
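
To make the three bullets concrete, here is a small sketch with hypothetical temp tables user_status (target) and user_status_snapshot (source). The WHEN NOT MATCHED BY SOURCE branch is the one from the last bullet: it fires for target rows that have no counterpart in the source.

```sql
-- Hypothetical sample tables: user_status is the target, user_status_snapshot the source.
create temp table user_status as (
  select 1 as user_id, 'old' as status union all
  select 2, 'gone'
);
create temp table user_status_snapshot as (
  select 1 as user_id, 'new' as status union all
  select 3, 'added'
);

merge user_status t
using user_status_snapshot s
on t.user_id = s.user_id
when matched then                -- in both tables: update
  update set status = s.status
when not matched then            -- only in the source: insert
  insert (user_id, status) values (user_id, status)
when not matched by source then  -- only in the target: delete
  delete;

-- expected: (1, 'new') and (3, 'added'); user 2 has been deleted
select * from user_status order by user_id;
```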

create temp table last_online as (
    select 1 as user_id
    , timestamp('2000-10-01 00:00:01') as last_online
);

create temp table connection_data  (
  user_id int64
  ,timestamp timestamp
);

insert connection_data (user_id, timestamp)
    select 2 as user_id
    , timestamp_sub(current_timestamp(),interval 28 hour) as timestamp
union all
    select 1 as user_id
        , timestamp_sub(current_timestamp(),interval 28 hour) as timestamp
union all
    select 1 as user_id
        , timestamp_sub(current_timestamp(),interval 20 hour) as timestamp
union all
    select 1 as user_id
    , timestamp_sub(current_timestamp(),interval 1 hour) as timestamp
;

merge last_online t
using (
    -- latest known connection per user
    select
          user_id
        , max(timestamp) as last_online
    from connection_data
    where
        -- on a real ingestion-time partitioned table, filter on date(_partitiontime)
        -- instead, to limit the bytes scanned
        timestamp >= timestamp_sub(current_timestamp(), interval 2 day)
    group by
        user_id
) s
on t.user_id = s.user_id
when matched then
  update set last_online = s.last_online
when not matched then
  insert (last_online, user_id) values (last_online, user_id)
;

select * from last_online;

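Run the statements above as one script: the MERGE moves user 1's last_online to the most recent connection (one hour ago) and inserts user 2, who had no row in last_online before.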

2. Counting words 

Using UNNEST() to split text into words and then checking whether the word you need is in your list can be useful in many situations, e.g. sentiment analysis in a data warehouse.

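with titles as (
    select 'Title with word foo' as title union all
    select 'Title with word bar'
)
, data as (
    select title, split(title, ' ') as words
    from titles
)
-- one output row per word; keep only the words we are looking for
select title, word
from data, unnest(words) as word
where word in ('bar')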
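
Since the section is about counting words, here is a small sketch that goes one step further and counts how often each word occurs, using the same SPLIT + UNNEST idea on made-up titles (lower-casing is an assumption, so "Title" and "title" count together).

```sql
-- Made-up titles; split each into words and count occurrences (case-insensitive).
with titles as (
    select 'Title with word foo' as title union all
    select 'Title with word bar' union all
    select 'Another bar title'
)
select lower(w) as word, count(*) as word_count
from titles, unnest(split(title, ' ')) as w
group by word
order by word_count desc, word
-- title: 3, then bar / with / word: 2, then another / foo: 1
```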

3. Using IF() statement outside of the SELECT statement 

This gives us an opportunity to save some lines of code and be more eloquent code-wise. Typically you would put this into a sub-query and add a filter in the WHERE clause, but you can do this instead:

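with daily_revenue as (
    select
          current_date() as dt
        , 100          as revenue
    union all
    select
          date_sub(current_date(), interval 1 day) as dt
        , 100          as revenue
)
select *
from daily_revenue
where
    -- returns no rows for this sample data: neither day has revenue above 101
    if(revenue >101,1,0) = 1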

Another example shows how NOT to use it with date-sharded (wildcard) tables. Do not do this. It's a bad example because the matching table suffixes are probably determined dynamically (based on something in your table), so BigQuery can't limit which tables it reads and you'll be charged for a full table scan.

WHERE IF(condition,
         _TABLE_SUFFIX BETWEEN '20170101' AND '20170117',
         _TABLE_SUFFIX BETWEEN '20160101' AND '20160117')

You can also use IF() in a HAVING clause and inside aggregate functions.
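
A short sketch of both uses on made-up data: IF() inside SUM() adds up only the rows that meet a condition, and IF() in HAVING filters on an aggregate in the same style as the WHERE example above.

```sql
-- Made-up daily revenue by product type.
with daily_revenue as (
    select date '2022-10-10' as dt, 'premium_account' as product_type, 100 as revenue union all
    select date '2022-10-10', 'bots', 300 union all
    select date '2022-10-09', 'bots', 50
)
select
      dt
    -- IF() inside an aggregate: only bots revenue is added up
    , sum(if(product_type = 'bots', revenue, 0)) as bots_revenue
    , sum(revenue) as total_revenue
from daily_revenue
group by dt
-- IF() inside HAVING: keep days whose total revenue exceeds 200
having if(sum(revenue) > 200, 1, 0) = 1
-- one row: 2022-10-10 | 300 | 400
```

For plain counts, BigQuery's COUNTIF(condition) is a shorter equivalent of SUM(IF(condition, 1, 0)).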

4. Using GROUP BY ROLLUP

The ROLLUP function is used to perform aggregation at multiple levels at once, e.g. subtotals and grand totals. This is useful when you have to work with hierarchical dimensions.

The following query returns the total credit spend per day by context type for the transaction type (is_gift) specified in the WHERE clause. It also shows the total spend for each day and the total spend across all available dates.

with data as (
    select
        current_timestamp() as ts
        ,'stage'            as context_type
        ,1                  as user_id
        ,100                as credit_value
        , true              as is_gift
    union all
    select
        timestamp_sub(current_timestamp(), interval 24 hour) as ts
        ,'user'             as context_type
        ,1                  as user_id
        ,200                as credit_value
        ,false              as is_gift
    union all
    select
        timestamp_sub(current_timestamp(), interval 24*2 hour) as ts
        ,'user'             as context_type
        ,3                  as user_id
        ,300                as credit_value
        ,true               as is_gift
)

, results as (
    select
         date(ts) as date
        ,context_type
        ,sum(credit_value)/100 as daily_credits_spend
    from data
    where is_gift is true
    -- rollup adds a subtotal row per date (context_type = null)
    -- and a grand total row (date = null, context_type = null)
    group by rollup(date(ts), context_type)
)

select
   date
  ,if(context_type is null, 'total', context_type) as context_type
  ,daily_credits_spend
from results
order by date
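
If ROLLUP feels opaque, it may help to see that ROLLUP(a, b) produces the same groups as three ordinary GROUP BYs stacked with UNION ALL: (a, b), (a) and the grand total (). The sketch below uses made-up data; replacing it with a single GROUP BY ROLLUP(day, context_type) should return the same six rows.

```sql
-- Made-up data: credits per day and context type.
with sales as (
  select 'mon' as day, 'user' as context_type, 2 as credits union all
  select 'mon', 'stage', 1 union all
  select 'tue', 'user', 3
)
-- (day, context_type) level
select day, context_type, sum(credits) as credits from sales group by day, context_type
union all
-- (day) level: per-day subtotal
select day, null, sum(credits) from sales group by day
union all
-- () level: grand total
select null, null, sum(credits) from sales
order by day, context_type
```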

5. Convert table to JSON 

Imagine you need to convert your table into a JSON object where each record is an element of a nested array. This is where the to_json_string() function becomes useful:

with mytable as (
 select 1 as x, 'foo' as y, true as z union all
 select 2, 'bar', false
)
select
    -- each row t becomes one JSON object; string_agg joins them into an array
    concat("{", "\"MyTable\":", "[", string_agg(to_json_string(t), ","), "]", "}") as json
from mytable as t

You can then use the resulting JSON anywhere: dates, marketing funnels, indices, histogram charts, etc.
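
An alternative sketch that avoids building the JSON by string concatenation: ARRAY_AGG over the table alias collects each row as a STRUCT, and wrapping it in STRUCT(... AS MyTable) names the top-level key. The ORDER BY inside ARRAY_AGG is an addition here, to make the element order explicit.

```sql
with mytable as (
 select 1 as x, 'foo' as y, true as z union all
 select 2, 'bar', false
)
select to_json_string(struct(array_agg(t order by t.x) as MyTable)) as json
from mytable as t
-- returns {"MyTable":[{"x":1,"y":"foo","z":true},{"x":2,"y":"bar","z":false}]}
```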

6. Using PARTITION BY

Given user_id, date and total_cost columns: for EACH date, how do you show the total revenue for EACH customer while keeping all the rows? You can achieve this like so:

select
      user_id
    , date
    , total_cost
    , sum(total_cost) over (partition by date,user_id) as revenue_per_day
from production.payment_transaction
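
To see the difference from GROUP BY, here is the same window on a few inline rows, a hypothetical stand-in for production.payment_transaction. Every input row is kept, and each one carries its user's total for that date.

```sql
-- Hypothetical stand-in for production.payment_transaction.
with payment_transaction as (
  select 1 as user_id, date '2022-10-10' as date, 10 as total_cost union all
  select 1, date '2022-10-10', 15 union all
  select 2, date '2022-10-10', 7 union all
  select 1, date '2022-10-11', 5
)
select
    user_id
  , date
  , total_cost
  , sum(total_cost) over (partition by date, user_id) as revenue_per_day
from payment_transaction
order by date, user_id, total_cost
-- 4 rows come back (GROUP BY date, user_id would return 3);
-- both of user 1's rows on 2022-10-10 show revenue_per_day = 25
```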

7. Moving average 

Very often BI developers are tasked with adding a moving average to their reports and dashboards. This might be a 7-, 14- or 30-day/month or even a yearly MA line graph. So how do we do it?

with dates as (
    select dt
    from unnest(generate_date_array(date_sub(current_date(), interval 90 day), current_date(), interval 1 day)) as dt
)

, data as (
    select dt
        , CEIL(RAND()*1000) as revenue -- just some random data.
    from dates
)

select
  dt
, revenue
  -- RANGE over unix_date(dt): the current day plus the 6 calendar days before it
, AVG(revenue) OVER(ORDER BY unix_date(dt) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) as seven_day_moving_average
from data
order by dt
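
Why RANGE over unix_date(dt) rather than ROWS? With RANGE the frame is measured in days, so a missing date shrinks the window instead of silently pulling in an older row. A tiny sketch with made-up data and a 2-day window (1 PRECEDING) makes the difference visible.

```sql
-- Made-up data with a gap: there is no row for 2022-01-03.
with data as (
  select date '2022-01-01' as dt, 10 as revenue union all
  select date '2022-01-02', 20 union all
  select date '2022-01-04', 30
)
select
    dt
    -- frame = this day and the previous calendar day
  , avg(revenue) over (order by unix_date(dt) range between 1 preceding and current row) as two_day_avg
    -- frame = this row and the previous row, whatever its date
  , avg(revenue) over (order by dt rows between 1 preceding and current row) as two_row_avg
from data
order by dt
-- on 2022-01-04: two_day_avg = 30 (only that day is in range), two_row_avg = 25 (pulls in 2022-01-02)
```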

8. Date arrays 

This becomes really handy when you work with user retention or want to check a dataset for missing values, e.g. dates. BigQuery has a function called GENERATE_DATE_ARRAY:

select dt
from
    unnest(generate_date_array('2019-12-04', '2020-09-17', interval 1 day)) as dt
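
A minimal sketch of the missing-dates check mentioned above, on made-up daily data: generate the full calendar, LEFT JOIN the data to it, and keep the days with no match.

```sql
-- Made-up daily data with 2022-01-03 missing.
with daily_revenue as (
  select date '2022-01-01' as dt, 100 as revenue union all
  select date '2022-01-02', 120 union all
  select date '2022-01-04', 90
)
select calendar_day as missing_date
from unnest(generate_date_array('2022-01-01', '2022-01-04', interval 1 day)) as calendar_day
left join daily_revenue r on r.dt = calendar_day
where r.dt is null
-- returns one row: 2022-01-03
```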


9. ROW_NUMBER()

This is useful to get the latest value from your data, e.g. the most recently updated record, or even to remove duplicates:

with reputation_data as (
    select
          1     as user_id
        , 100   as reputation
        , 1     as reputation_level
        , timestamp_sub(current_timestamp(), interval 3 hour) as ts
    union all
    select
          1     as user_id
        , 101   as reputation
        , 1     as reputation_level
        , timestamp_sub(current_timestamp(), interval 2 hour)
    union all
    select
          1     as user_id
        , 200   as reputation
        , 2     as reputation_level
        , timestamp_sub(current_timestamp(), interval 1 hour)
)

select *
from reputation_data a
where true  -- harmless; some BigQuery versions only accept QUALIFY next to WHERE, GROUP BY or HAVING
qualify row_number() over (partition by a.user_id order by a.ts desc) = 1

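
The same "latest row per key" pattern without QUALIFY, as a sketch on made-up data: compute ROW_NUMBER() in a subquery and filter on it outside. This form also carries over to SQL engines that lack QUALIFY.

```sql
-- Made-up reputation history for two users.
with reputation_data as (
  select 1 as user_id, 100 as reputation, timestamp '2022-10-10 10:00:00' as ts union all
  select 1, 101, timestamp '2022-10-10 11:00:00' union all
  select 1, 200, timestamp '2022-10-10 12:00:00' union all
  select 2, 50,  timestamp '2022-10-10 09:00:00'
)
select user_id, reputation, ts
from (
  select
      *
      -- 1 = newest row per user
    , row_number() over (partition by user_id order by ts desc) as rn
  from reputation_data
)
where rn = 1
order by user_id
-- user 1 -> 200, user 2 -> 50
```

If two rows share the same ts, ROW_NUMBER() picks one of them arbitrarily; adding a tie-breaker column to the ORDER BY makes the result deterministic.
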
10. NTILE() 

Another numbering function. It's really useful to monitor things like login duration in seconds if you have a mobile app. For example, I have my app connected to Firebase, and when users log in I can see how long it took them.

This function divides the rows into constant_integer_expression buckets based on row ordering and returns the 1-based bucket number that is assigned to each row. The number of rows in the buckets can differ by at most 1. The remainder values (the remainder of the number of rows divided by the number of buckets) are distributed one for each bucket, starting with bucket 1. If constant_integer_expression evaluates to NULL, 0 or a negative value, an error is returned.

select
      (case when tile = 50 then 'median' when tile = 95 then '95%' else '5%' end) as percentile
    , dt
      -- microseconds -> rounded milliseconds -> seconds
    , max(cast( round(duration/1000) as numeric)/1000 ) max_duration_s
    , min(cast( round(duration/1000) as numeric)/1000 ) min_duration_s

from (
    select
          trace_info.duration_us duration
        , ntile(100) over (partition by (date(event_timestamp)) order by trace_info.duration_us) tile
        , date(event_timestamp) dt

    from firebase_performance.my_mobile_app
    where
        -- @ds_start_date / @ds_end_date: report date parameters, assumed to be YYYYMMDD strings
        date(_partitiontime) >= parse_date('%Y%m%d', @ds_start_date) and date(_partitiontime) <= parse_date('%Y%m%d', @ds_end_date)
        and date(event_timestamp) >= parse_date('%Y%m%d', @ds_start_date)
        and date(event_timestamp) <= parse_date('%Y%m%d', @ds_end_date)
    and lower(event_type) = "duration_trace"
    and lower(event_name) = 'logon'
) x
where tile in (5, 50, 95)
group by dt, tile
order by dt
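
To see the bucket rule described above in action, here is a sketch with seven numbers and NTILE(3): 7 / 3 leaves a remainder of 1, so bucket 1 gets one extra row and the bucket sizes come out as 3, 2, 2.

```sql
-- Seven rows split into three buckets.
select x, ntile(3) over (order by x) as bucket
from unnest(generate_array(1, 7)) as x
order by x
-- buckets: 1, 1, 1, 2, 2, 3, 3
```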

11. Rank/dense_rank 

They're also called numbering functions. I tend to use DENSE_RANK as my default ranking function because it doesn't skip the next available rank, whereas RANK does. It returns consecutive rank values. You can use it with a partition, which divides the results into distinct buckets. Rows in each partition receive the same rank if they have the same values. Example:

with top_spenders as (
    select 1 as user_id, 100 as total_spend, 11   as reputation_level union all
    select 2 as user_id, 250 as total_spend, 11   as reputation_level union all
    select 3 as user_id, 250 as total_spend, 11   as reputation_level union all
    select 4 as user_id, 300 as total_spend, 11   as reputation_level union all
    select 11 as user_id, 1000 as total_spend, 22   as reputation_level union all
    select 22 as user_id, 1500 as total_spend, 22   as reputation_level union all
    select 33 as user_id, 1500 as total_spend, 22   as reputation_level union all
    select 44 as user_id, 2500 as total_spend, 22   as reputation_level
)

select
      user_id
    , reputation_level
    , total_spend
    , rank() over(partition by reputation_level order by total_spend desc) as rank
    , dense_rank() over(partition by reputation_level order by total_spend desc) as dense_rank
from top_spenders

Another example with product prices:

with products as (
    select
          2                  as product_id
        , 'premium_account'  as product_type
        , 100                as total_cost
    union all
    select
          1                  as product_id
        , 'premium_group'    as product_type
        , 200                as total_cost
    union all
    select
          111                as product_id
        , 'bots'             as product_type
        , 300                as total_cost
    union all
    select
          112                as product_id
        , 'bots'             as product_type
        , 400                as total_cost
    union all
    select
          113                as product_id
        , 'bots'             as product_type
        , 500                as total_cost
    union all
    select
          213                as product_id
        , 'bots'             as product_type
        , 300                as total_cost
)

-- top two price levels per product type
select * from (
	select
		  product_id
		, product_type
		, total_cost as product_price
		, dense_rank() over (
			partition by product_type
			order by total_cost desc
		) as price_rank
	from products
) t
where price_rank < 3
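
A side-by-side sketch on made-up data with a tie, to show exactly where ROW_NUMBER, RANK and DENSE_RANK diverge (user_name is only used as a tie-breaker for ROW_NUMBER).

```sql
-- Made-up spend values with a tie at 250.
with spenders as (
  select 'a' as user_name, 300 as total_spend union all
  select 'b', 250 union all
  select 'c', 250 union all
  select 'd', 100
)
select
    user_name
  , total_spend
  , row_number() over (order by total_spend desc, user_name) as row_num
  , rank()       over (order by total_spend desc) as rank
  , dense_rank() over (order by total_spend desc) as dense_rank
from spenders
order by total_spend desc, user_name
-- a: 1, 1, 1 | b: 2, 2, 2 | c: 3, 2, 2 | d: 4, 4, 3
```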

12. Pivot / unpivot

Pivot changes rows to columns. That's all it does. Unpivot does the opposite.

select * from
(
  -- #1 from_item
  select
       extract(month from dt) as mo
     , product_type
     , revenue
  from (
    select
          current_date()     as dt
        , 'premium_account'  as product_type
        , 100                as revenue
    union all
    select
          date_sub(current_date(), interval 1 month) as dt
        , 'premium_group'    as product_type
        , 200                as revenue
    union all
    select
          date_sub(current_date(), interval 2 month) as dt
        , 'bots'             as product_type
        , 300                as revenue
  )
)
pivot
(
  -- #2 aggregate
  avg(revenue) as avg_revenue_
  -- #3 pivot_column
  for product_type in ('premium_account', 'premium_group')
)
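
Since there is no unpivot example above, here is a minimal sketch on made-up monthly data: UNPIVOT turns the two revenue columns back into (product_type, revenue) rows. By default UNPIVOT skips NULL values.

```sql
-- Made-up pivoted data: one revenue column per product type.
with monthly as (
  select 1 as mo, 100 as premium_account, 200 as premium_group union all
  select 2, 150, null
)
select *
from monthly
unpivot (revenue for product_type in (premium_account, premium_group))
order by mo, product_type
-- three rows: month 1 -> premium_account 100 and premium_group 200,
-- month 2 -> premium_account 150 (the NULL is skipped)
```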


13. FIRST_VALUE() / LAST_VALUE()

This is another useful pair of functions: they help to get a delta for each row against the first or last value in that particular partition.

with top_spenders as (
    select 1 as user_id, 100 as total_spend, 11   as reputation_level union all
    select 2 as user_id, 150 as total_spend, 11   as reputation_level union all
    select 3 as user_id, 250 as total_spend, 11   as reputation_level union all
    select 11 as user_id, 1000 as total_spend, 22   as reputation_level union all
    select 22 as user_id, 1500 as total_spend, 22   as reputation_level union all
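    select 33 as user_id, 2500 as total_spend, 22   as reputation_level
)

, data as (
  select
      user_id
    , total_spend
    , reputation_level
      -- highest spend in the same reputation level
    , first_value(total_spend)
    over (partition by reputation_level order by total_spend desc
    rows between unbounded preceding and unbounded following) as top_spend
  from top_spenders
)

select
     user_id
    ,reputation_level
    ,total_spend
    ,top_spend          as top_spend_by_rep_level
    ,total_spend - top_spend as delta_in_usd
from data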
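
Why spell out the frame? With ORDER BY and no explicit frame, the window ends at the current row, so LAST_VALUE() just returns the current row's value. A small sketch on made-up data shows both versions.

```sql
-- Made-up spend values.
with top_spenders as (
  select 1 as user_id, 100 as total_spend union all
  select 2, 150 union all
  select 3, 250
)
select
    user_id
  , total_spend
    -- default frame: up to the current row
  , last_value(total_spend) over (order by total_spend) as last_default_frame
    -- full frame: the whole partition
  , last_value(total_spend) over (order by total_spend
      rows between unbounded preceding and unbounded following) as last_full_frame
from top_spenders
order by total_spend
-- last_default_frame: 100, 150, 250 | last_full_frame: 250, 250, 250
```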

14. Convert a table into an array of structs and pass it to a UDF

This is useful when you need to apply a user-defined function (UDF) with some complex logic to each row or to a whole table. You can always treat your table as an array of STRUCT objects and then pass each of them to the UDF. It depends on your logic. For example, I use it to calculate purchase expiry times:

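select
      user_id
      -- hypothetical UDF name: replace it with your own function that accepts
      -- ARRAY<STRUCT<...>> with the fields below
    , my_dataset.process_purchases(
        array_agg(
            struct(
                  user_id
                , product_type_id
                , product_id
                , item_count
                , days
                , expire_time_after_purchase
                , transaction_id
                , purchase_created_at
                , updated_at
            )
            order by purchase_created_at
        )
      ) AS processed

from new_batch
-- hypothetical grouping: one array of purchases per user
group by user_id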
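
A self-contained sketch of the idea, assuming a hypothetical temporary SQL UDF latest_expiry (not the author's function) that receives one user's purchases as ARRAY<STRUCT<...>> and returns the latest expiry time, i.e. purchase time plus days.

```sql
-- Hypothetical UDF: latest expiry time across an array of purchases.
create temp function latest_expiry(
  purchases array<struct<purchase_created_at timestamp, days int64>>
) as (
  (select max(timestamp_add(p.purchase_created_at, interval p.days day))
   from unnest(purchases) as p)
);

-- Made-up purchases.
with new_batch as (
  select 1 as user_id, timestamp '2022-10-01 00:00:00' as purchase_created_at, 30 as days union all
  select 1, timestamp '2022-10-20 00:00:00', 7 union all
  select 2, timestamp '2022-10-05 00:00:00', 14
)
select
    user_id
  , latest_expiry(array_agg(struct(purchase_created_at, days) order by purchase_created_at)) as expires_at
from new_batch
group by user_id
order by user_id
-- user 1 -> 2022-10-31 00:00:00 UTC, user 2 -> 2022-10-19 00:00:00 UTC
```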

In a similar way you can create tables without using UNION ALL. For example, I use it to mock test data for unit tests. This way you can do it very quickly, just by duplicating lines with Alt+Shift+Down in your editor.

select * from unnest([
        -- the first STRUCT names the fields; the tuples below reuse its types
        struct(
            1                                 as user_id
        ,   111                               as reputation
        ,   timestamp('2021-12-16 13:00:01')  as update_time
        ),
        (
            2                                 --as user_id
        ,   111                               --as reputation
        ,   timestamp('2011-12-16 13:00:01')  --as update_time
        ),
        (
            3                                 --as user_id
        ,   111                               --as reputation
        ,   timestamp(format_timestamp("%Y-%m-%d 12:59:01 UTC" ,timestamp(date_sub(current_date(), interval 0 day))))   --as update_time
        )
    ]) as t

15. Creating event funnels using FOLLOWING AND UNBOUNDED FOLLOWING

A good example might be marketing funnels. Your dataset might contain continuously repeating events of the same type, but ideally you would want to chain each event with the next one of a different type. This might be useful when you need to get a list of something, e.g. events, purchases, etc., in order to build a funnels dataset. Working with PARTITION BY gives you the opportunity to group all the following events, no matter how many of them exist in each partition.

with d as (
select * from unnest([
  struct('0003f' as user_pseudo_id, 12322175 as user_id, timestamp '2020-10-10 16:46:59.878 UTC' as event_timestamp, 'join_group' as event_name),
  ('0003',12,timestamp '2022-10-10 16:50:03.394 UTC','set_avatar'),
  ('0003',12,timestamp '2022-10-10 17:02:38.632 UTC','set_avatar'),
  ('0003',12,timestamp '2022-10-10 17:09:38.645 UTC','set_avatar'),
  ('0003',12,timestamp '2022-10-10 17:10:38.645 UTC','join_group'),
  ('0003',12,timestamp '2022-10-10 17:15:38.645 UTC','create_group'),
  ('0003',12,timestamp '2022-10-10 17:17:38.645 UTC','create_group'),
  ('0003',12,timestamp '2022-10-10 17:18:38.645 UTC','in_app_purchase'),
  ('0003',12,timestamp '2022-10-10 17:19:38.645 UTC','spend_virtual_currency'),
  ('0003',12,timestamp '2022-10-10 17:19:45.645 UTC','create_group'),
  ('0003',12,timestamp '2022-10-10 17:20:38.645 UTC','set_avatar')
  ) as t)

, event_data as (
select
    user_pseudo_id
  , user_id
  , event_timestamp
  , event_name
    -- all later events of the same user, in time order
  , array_agg(
        struct(
              event_name AS event_name
            , event_timestamp AS event_timestamp
        )
    )
    OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS BETWEEN 1 FOLLOWING AND  UNBOUNDED FOLLOWING ) as next_events
from d
where
    DATE(event_timestamp) = "2022-10-10"
)

select
    user_pseudo_id
  , user_id
  , event_timestamp
  , event_name
    -- first later event whose name differs from the current one
  , (select
        next_event.event_name from unnest(t.next_events) next_event
    where t.event_name != next_event.event_name
    order by next_event.event_timestamp limit 1
    -- for the previous different event, aggregate over ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    -- and use ORDER BY event_timestamp desc here
  ) next_event
  , (select
        next_event.event_timestamp from unnest(t.next_events) next_event
    where t.event_name != next_event.event_name
    order by next_event.event_timestamp limit 1
  ) next_event_ts

from event_data t
order by user_pseudo_id, event_timestamp
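
Why not simply use LEAD()? A short sketch on made-up events shows the problem: LEAD() returns the immediately following row, even when it is the same event type, while the array approach above skips ahead to the next different event.

```sql
-- Made-up events for one user: two set_avatar events, then join_group.
with d as (
  select 'u1' as user_pseudo_id, timestamp '2022-10-10 17:00:00' as event_timestamp, 'set_avatar' as event_name union all
  select 'u1', timestamp '2022-10-10 17:01:00', 'set_avatar' union all
  select 'u1', timestamp '2022-10-10 17:02:00', 'join_group'
)
select
    event_timestamp
  , event_name
  , lead(event_name) over (partition by user_pseudo_id order by event_timestamp) as lead_event
from d
order by event_timestamp
-- first row: lead_event = 'set_avatar' (a repeat), not 'join_group'
```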

16. Regexp 

You would use it if you need to extract something from unstructured data, e.g. FX rates, custom groupings, etc.

Working with currency exchange rates using regexp 
Consider this example with exchange rates data:

with object as
(select  '{"aed":3.6732,"afn":78.45934,"all":110.586428}' as rates)

, data as (
    select "usd" as base_currency,
      -- quoted key, colon, then one or more digits (\d+), optional period (\.?), zero or more digits (\d*)
      regexp_extract_all(rates, r'"[^"]+":\d+\.?\d*') as pairs
    from object
)
, splits as (
    select base_currency, pair, split(pair, ':') as positions
    from data cross join unnest(data.pairs) as pair
)
-- rate_currency keeps its quotes (e.g. "aed") and rate is still a STRING
select base_currency, pair, positions[offset(0)] as rate_currency, positions[offset(1)] as rate
from splits
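
A variation sketch that uses capture groups instead of SPLIT: REGEXP_EXTRACT with one group returns just that group, so the currency comes out without quotes and the rate can be cast to FLOAT64 directly (SAFE_CAST returns NULL instead of failing on bad input).

```sql
with object as (
  select '{"aed":3.6732,"afn":78.45934,"all":110.586428}' as rates
)
select
    'usd' as base_currency
  , regexp_extract(pair, r'"([^"]+)"') as rate_currency           -- text inside the quotes
  , safe_cast(regexp_extract(pair, r':(\d+\.?\d*)') as float64) as rate  -- number after the colon
from object, unnest(regexp_extract_all(rates, r'"[^"]+":\d+\.?\d*')) as pair
-- aed 3.6732 | afn 78.45934 | all 110.586428
```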

Working with app versions using regexp

Sometimes you might want to use regexp to get the major, release or mod version of your app and produce a custom report:

with events as (
  select  'open_chat' as event_name, '10.1.0' as app_display_version union all
  select  'open_chat' as event_name, '10.1.9' as app_display_version union all
  select  'open_chat' as event_name, '9.1.4' as app_display_version union all
  select  'open_chat' as event_name, '9.0.0' as app_display_version
)
select
     app_display_version
     -- everything before the first dot
    ,REGEXP_EXTRACT(app_display_version, '^[^.^]*') main_version
     -- first two parts; note that as a float, '10.10' and '10.1' both become 10.1
    ,safe_cast(REGEXP_EXTRACT(app_display_version, r'^[0-9]+\.[0-9]+') as float64) release_version
     -- third numeric part
    ,safe_cast(REGEXP_EXTRACT(app_display_version, r'^\d+\.\d+\.(\d+)') as int64) as mod_version
from events
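
For plain numeric versions, SPLIT plus SAFE_CAST is a non-regexp alternative worth comparing. A sketch on the same kind of made-up versions: casting each part to INT64 makes sorting numeric, so 9.x comes before 10.x (as strings, '10.1.0' would sort before '9.1.4').

```sql
with events as (
  select '10.1.0' as app_display_version union all
  select '10.1.9' union all
  select '9.1.4'
)
select
    app_display_version
  , safe_cast(split(app_display_version, '.')[safe_offset(0)] as int64) as major
  , safe_cast(split(app_display_version, '.')[safe_offset(1)] as int64) as minor
  , safe_cast(split(app_display_version, '.')[safe_offset(2)] as int64) as patch
from events
order by major, minor, patch
-- 9.1.4, 10.1.0, 10.1.9
```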


SQL is a powerful tool that helps to manipulate data. Hopefully these SQL use cases from digital marketing will be useful for you. It's a handy skill indeed and can help you with many projects. These SQL snippets made my life a lot easier, and I use them at work almost every day. Furthermore, SQL and modern data warehouses are essential tools for data science. Its robust dialect features allow you to model and visualize data with ease. Because SQL is the language that data warehouse and business intelligence professionals use, it's an excellent choice if you want to share data with them. It's the most common way to communicate with almost every data warehouse/lake solution on the market.

