Append Results From A Query To The Same Result Row In PostgreSQL - Redshift

December 18, 2023

I have a table with 3 columns, A, B, and C, where A is not the primary key. We need to select the (B, C) pairs for each distinct A (GROUP BY A) and append the results at the end of the

Solution 1:

PostgreSQL

SELECT
  a,
  STRING_AGG('('|| c ||','|| b ||')', ' ; ')
FROM
  tbl
GROUP BY
  a;
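If you want to try the query shape without a PostgreSQL server, here is a rough sketch using Python's built-in sqlite3 module. SQLite is swapped in purely for illustration: its group_concat(X, separator) plays the role of STRING_AGG, the table name tbl and the sample rows are made up, and SQLite does not guarantee the order of the values inside each group.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO tbl (a, b, c) VALUES (?, ?, ?)",
    [("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a2", "b1", "c2")],  # made-up rows
)

# Same shape as the STRING_AGG query: build "(c,b)" per row, join with " ; ".
rows = conn.execute(
    "SELECT a, group_concat('(' || c || ',' || b || ')', ' ; ') "
    "FROM tbl GROUP BY a ORDER BY a"
).fetchall()

for a, pairs in rows:
    print(a, "|", pairs)
```

Each result row is a (a, aggregated_string) tuple, one per distinct A.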

Edit: For versions of PostgreSQL before 9.0 (when STRING_AGG was introduced), and even before 8.4 (when ARRAY_AGG was added), you can create your own custom aggregate function.

Edit 2: For versions before 8.0 (perhaps Amazon Redshift is based on PostgreSQL 7.4 somehow), the $$ syntax is not supported, so the function body needs to be enclosed in single quotes, and any quotes inside the body need to be escaped by doubling them.

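-- FINALFUNC: drop the leading ' ; ' (3 characters) added before the first value
CREATE FUNCTION cut_semicolon(text) RETURNS text AS '
BEGIN
  RETURN SUBSTRING($1 FROM 4);
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

-- SFUNC: append the next value to the running state, separated by ' ; '
CREATE FUNCTION concat_semicolon(text, text) RETURNS text AS '
BEGIN
  RETURN $1 || '' ; '' || $2;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;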

CREATE AGGREGATE concat_semicolon(
  BASETYPE=text,
  SFUNC=concat_semicolon,
  STYPE=text,
  FINALFUNC=cut_semicolon,
  INITCOND=''
);

Then use that aggregate instead.

SELECT
  a,
  CONCAT_SEMICOLON('('|| c ||','|| b ||')')
FROM
  tbl
GROUP BY
  a;
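To see why cut_semicolon starts at position 4, here is a small Python simulation of how PostgreSQL evaluates an aggregate like this one: the state starts at INITCOND, SFUNC is applied once per row, and FINALFUNC runs once at the end. The Python functions are hypothetical stand-ins for the PL/pgSQL ones above, and NULL handling is ignored.

```python
def concat_semicolon(state: str, value: str) -> str:
    # SFUNC stand-in: append the next value, always prefixed by " ; "
    return state + " ; " + value


def cut_semicolon(state: str) -> str:
    # FINALFUNC stand-in: SQL's SUBSTRING($1 FROM 4) is 1-based, so it drops
    # the first three characters, i.e. the " ; " added before the first value.
    return state[3:]


def run_aggregate(values, initcond=""):
    # Mimic the aggregate: start at INITCOND, fold SFUNC over the rows,
    # then apply FINALFUNC once to the final state.
    state = initcond
    for value in values:
        state = concat_semicolon(state, value)
    return cut_semicolon(state)


print(run_aggregate(["(c1,b1)", "(c2,b2)", "(c3,b3)"]))
# → (c1,b1) ; (c2,b2) ; (c3,b3)
```

Starting from an empty INITCOND means every value, including the first, gets a " ; " prefix, which is why the final function has to strip exactly three characters.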

MySQL

SELECT
  a,
  GROUP_CONCAT(CONCAT('(', c, ',', b, ')') SEPARATOR ' ; ')
FROM
  tbl
GROUP BY
  a;

Solution 2:

Unless you have a very specific reason to do this kind of thing within the DB itself, it ought to be done within your app. Otherwise, you'll end up with result sets containing convoluted text fields that you may need to parse for post-processing and so forth.

In other words:

select A, B, C from tbl

And then, something like (Ruby):

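# rows: the result of the query above (e.g. a PG::Result from the pg gem)
res = {}
rows.each do |row|
  # Collect every [b, c] pair under its a value
  res[row['a']] ||= []
  res[row['a']] << [row['b'], row['c']]
end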
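Since the question is ultimately about Python, the same idea there might look like this. It is a minimal sketch that assumes the rows come back as (a, b, c) tuples, which is what a DB-API cursor's fetchall() typically returns; the sample rows are made up.

```python
from collections import defaultdict

# Made-up rows, standing in for the result of "select A, B, C from tbl".
rows = [("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a2", "b1", "c2")]

res = defaultdict(list)
for a, b, c in rows:
    # Collect every (b, c) pair under its A value
    res[a].append((b, c))

print(dict(res))
# → {'a1': [('b1', 'c1'), ('b2', 'c2')], 'a2': [('b1', 'c2')]}
```

defaultdict(list) plays the role of Ruby's `||= []`, creating the empty list the first time an A value is seen.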

If you insist on doing it within Postgres, your options are few, and in Redshift there may be none at all.

The array_agg() and string_agg() aggregate functions would both potentially be useful, but they were introduced in 8.4 and 9.0 respectively, and Redshift apparently supports neither.

Insofar as I'm aware, Redshift doesn't support array constructors, so using the ARRAY((select ...)) construct, which might have worked, flies out the window as well.

Returning something that makes use of the ROW() construct is not possible either. Even if it were, it would be ugly as sin and hopelessly convoluted to manipulate from within Python.

A custom aggregate function seems out of the question if the other answer, and the leads it made you follow, are anything to go by. That is unsurprising: the docs seem clear that you cannot create a user-defined function, let alone create a procedural language to begin with.

In other words, your only option insofar as I can tell is to do this type of aggregation from within your app. And that, incidentally, is the place where you should be doing this kind of stuff anyway.

Solution 3:

It is probably achievable in PostgreSQL, especially if B and C are of the same type: you can produce a two-column result and aggregate the data in the second column using an ARRAY; otherwise, use JSON. I'm not sure how to do it in MySQL, but there you would probably need to serialize to a string and then invert it in Python.
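As a rough sketch of the "serialize, then invert it in Python" route, here is a hypothetical parse_pairs helper that turns the " ; "-separated string produced by the STRING_AGG or GROUP_CONCAT queries in Solution 1 back into (c, b) tuples. It assumes no value contains a comma, a parenthesis, or " ; ", which is exactly the kind of fragility that makes a real format such as JSON the safer choice.

```python
from typing import List, Tuple


def parse_pairs(aggregated: str) -> List[Tuple[str, str]]:
    """Invert a string like '(c1,b1) ; (c2,b2)' back into (c, b) tuples.

    Assumes no value contains ',', '(', ')' or ' ; '; otherwise the
    format is ambiguous and a proper serialization is needed.
    """
    pairs = []
    for chunk in aggregated.split(" ; "):
        inner = chunk.strip()[1:-1]  # drop the surrounding parentheses
        c, b = inner.split(",", 1)   # split only on the first comma
        pairs.append((c, b))
    return pairs


print(parse_pairs("(c1,b1) ; (c2,b2) ; (c3,b3)"))
# → [('c1', 'b1'), ('c2', 'b2'), ('c3', 'b3')]
```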

Either way, in my opinion the proper answer is: don't do it. You'll get a much less readable, hacky, unportable solution that is not necessarily any faster. There's nothing wrong with post-processing data in Python to give it its final form; in fact, it is quite a common practice, especially when it is purely reformatting output rather than producing aggregate results.

Solution 4:

Try this to get:

a1 | (c1,b1) ; (c2,b2) ; (c3,b3)
a2 | (c2,b1) ; (c5,b2)

This is the code:

First, make a temporary table with a running ID. This example is for SQL Server; for another database you will need to adapt the query.

Select identity(int, 1, 1) as ID, A, '(' + C + ',' + B + ')' as aa
Into #table2
From #table
Order By A, aa

Then build the query with a loop:

Declare @sSql as Varchar(1000), @A as Varchar(2), @A2 as Varchar(2), @aa as Varchar(10)
Declare @iRec as int, @iL as int
Set @iRec = (Select Count(*) From #table2)
Set @iL = 1
Set @sSql = ''

While @iL <= @iRec
Begin
    Set @A = (Select A From #table2 Where ID = @iL)
    Set @aa = (Select aa From #table2 Where ID = @iL)
    If @A = @A2   -- same A as the previous row: reopen the last string and append
        Set @sSql = Left(@sSql, Len(@sSql) - 1) + ' ; ' + @aa + '`'
    Else          -- new A: start another Union Select branch (` stands in for ')
        Set @sSql = @sSql + ' Union Select `' + @A + '`,`' + @aa + '`'
    Set @A2 = @A
    Set @iL = @iL + 1
End

-- Drop the leading ' Union ' (7 chars), turn ` into quotes, and run it
Set @sSql = Right(@sSql, Len(@sSql) - 7)
Set @sSql = Replace(@sSql, '`', '''')
Exec(@sSql)

Does it work?
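For comparison, the sort-then-merge-consecutive-rows idea behind that loop can also be done in application code without dynamic SQL. This is a minimal Python sketch, assuming the rows fit in memory and using made-up sample data; it is a swapped-in approach, not a translation of the T-SQL.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows (a, b, c); the real ones would come from the query.
rows = [("a2", "b1", "c2"), ("a1", "b1", "c1"), ("a1", "b2", "c2")]

# Like the temp table: format "(c,b)" per row, then sort by A and that string.
formatted = sorted((a, "(%s,%s)" % (c, b)) for a, b, c in rows)

result = []
# groupby merges consecutive rows with the same A, much as the loop does
# by comparing @A with the previous row's @A2.
for a, group in groupby(formatted, key=itemgetter(0)):
    result.append((a, " ; ".join(aa for _, aa in group)))

for a, pairs in result:
    print(a, "|", pairs)
# → a1 | (c1,b1) ; (c2,b2)
#   a2 | (c2,b1)
```

groupby only merges adjacent items, so the sort beforehand is essential, just as the T-SQL relies on the temp table being ordered by A.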

Solution 5:

I came to the conclusion that it can't be solved in the Postgres + Redshift stack. This is how I solved it instead:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3],
                   'B': ['aaa', 'bbb', 'cc', 'gg', 'aaa', 'bbb', 'cc', 'gg']})

# Collect every B value for each distinct A into a Python list
series = df.groupby('A')['B'].apply(list)
series.name = 'metric'
s = series.reset_index()
print(s)

   A          metric
0  1  [aaa, bbb, cc]
1  2       [gg, aaa]
2  3  [bbb, cc, gg]

Python Guru