BigQuery Reddit Comment Data Analysis - BigQuery newbie
I am trying to find pairs of users who have both commented in the top 10 subreddits, along with the count of common subreddits they have commented in, using the BigQuery Reddit data.
I have just started with BQ and am a beginner at SQL, so I am finding this query hard. Can anyone give me pointers to get started?
I have never had a real need to play with the Reddit data, but below is something thrown together to at least get you started, since it seems no one else is willing to.
Quick logic:
Step 1: identify the top 10 most-commented subreddits
    SELECT subreddit
    FROM [fh-bigquery:reddit_comments.subr_rank_201505]
    ORDER BY comments DESC
    LIMIT 10
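If you want to check the step 1 logic without a BigQuery project, here is a minimal local sketch using Python's built-in sqlite3 module. The table subr_rank, its rows, and the helper top_subreddits are hypothetical toy stand-ins for [fh-bigquery:reddit_comments.subr_rank_201505], not real Reddit numbers, and SQLite syntax differs from BigQuery's in places.

```python
import sqlite3

# Hypothetical toy stand-in for [fh-bigquery:reddit_comments.subr_rank_201505]:
# one row per subreddit with its total comment count (made-up numbers).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.executemany(
    "INSERT INTO subr_rank VALUES (?, ?)",
    [("sub_c", 650), ("sub_a", 900), ("sub_e", 300), ("sub_b", 700), ("sub_d", 400)],
)


def top_subreddits(conn, n):
    """Step 1: names of the n subreddits with the most comments."""
    rows = conn.execute(
        "SELECT subreddit FROM subr_rank ORDER BY comments DESC LIMIT ?", (n,)
    ).fetchall()
    return [subreddit for (subreddit,) in rows]


print(top_subreddits(conn, 3))  # ['sub_a', 'sub_b', 'sub_c']
```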
Step 2: for each of those subreddits, identify its "solid" users (users with more than 50 comments in it)
    SELECT author, subreddit, COUNT(1) AS comments
    FROM [fh-bigquery:reddit_comments.2016_01]
    WHERE subreddit IN (
      SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505]
      ORDER BY comments DESC LIMIT 10)
    AND author NOT IN ('AutoModerator', '[deleted]')
    GROUP BY author, subreddit
    HAVING comments > 50
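The same kind of local sketch works for step 2, assuming a toy comments table with one row per comment (a made-up stand-in for [fh-bigquery:reddit_comments.2016_01]). The comment threshold is a parameter only because the toy data is tiny; the query above uses more than 50. The helper solid_users and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy stand-ins for subr_rank_201505 and 2016_01 (made-up rows).
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.executemany("INSERT INTO subr_rank VALUES (?, ?)",
                 [("sub_a", 90), ("sub_b", 70), ("sub_c", 10)])
conn.execute("CREATE TABLE comments (author TEXT, subreddit TEXT)")  # one row per comment
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [("alice", "sub_a")] * 4
    + [("bob", "sub_a")] * 3
    + [("carol", "sub_a")] * 1          # too few comments to count as solid
    + [("alice", "sub_c")] * 5          # sub_c is outside the top 2
    + [("AutoModerator", "sub_b")] * 9  # bot account, excluded
    + [("[deleted]", "sub_a")] * 6,     # deleted accounts, excluded
)

SOLID_USERS_SQL = """
SELECT author, subreddit, COUNT(1) AS comments
FROM comments
WHERE subreddit IN (SELECT subreddit FROM subr_rank
                    ORDER BY comments DESC LIMIT :top_n)
  AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit
HAVING COUNT(1) > :min_comments
ORDER BY author, subreddit
"""


def solid_users(conn, top_n, min_comments):
    """Step 2: (author, subreddit, comment count) for authors with more than
    min_comments comments in one of the top_n subreddits."""
    params = {"top_n": top_n, "min_comments": min_comments}
    return conn.execute(SOLID_USERS_SQL, params).fetchall()


print(solid_users(conn, top_n=2, min_comments=2))
# [('alice', 'sub_a', 4), ('bob', 'sub_a', 3)]
```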
Step 3: for each subreddit, identify pairs of common users (via a self-join)
Step 4: finally, for each pair of users, count the number of common subreddits
    SELECT userA, userB, COUNT(1) AS subreddits
    FROM (
      -- step 3: pairs of solid users in the same subreddit, each pair once
      SELECT a.author AS userA, b.author AS userB, a.subreddit AS subreddit
      FROM (
        -- step 2 (same query as above)
        SELECT author, subreddit, COUNT(1) AS comments
        FROM [fh-bigquery:reddit_comments.2016_01]
        WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10)
        AND author NOT IN ('AutoModerator', '[deleted]')
        GROUP BY author, subreddit
        HAVING comments > 50
      ) AS a
      JOIN (
        -- step 2 again, for the other side of the self-join
        SELECT author, subreddit, COUNT(1) AS comments
        FROM [fh-bigquery:reddit_comments.2016_01]
        WHERE subreddit IN (SELECT subreddit FROM [fh-bigquery:reddit_comments.subr_rank_201505] ORDER BY comments DESC LIMIT 10)
        AND author NOT IN ('AutoModerator', '[deleted]')
        GROUP BY author, subreddit
        HAVING comments > 50
      ) AS b
      ON a.subreddit = b.subreddit
      WHERE a.author < b.author
    )
    -- step 4: number of common subreddits per pair of users
    GROUP BY userA, userB
    HAVING subreddits > 3
    ORDER BY subreddits DESC, userA, userB
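Steps 2 to 4 can also be tried end to end on toy data. This sketch assumes the same kind of hypothetical subr_rank and comments tables; instead of repeating the step 2 subquery on both sides of the join, it names it once with a WITH clause (a swap that SQLite supports), and the thresholds are lowered so the small sample produces output. The helper common_pairs and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy tables (made-up rows, not real Reddit data).
conn.execute("CREATE TABLE subr_rank (subreddit TEXT, comments INTEGER)")
conn.executemany("INSERT INTO subr_rank VALUES (?, ?)",
                 [("sub_a", 90), ("sub_b", 80), ("sub_c", 70), ("sub_d", 5)])
conn.execute("CREATE TABLE comments (author TEXT, subreddit TEXT)")
# (author, subreddit, number of comments), expanded into one row per comment
counts = [
    ("alice", "sub_a", 3), ("alice", "sub_b", 3), ("alice", "sub_c", 3),
    ("bob", "sub_a", 2), ("bob", "sub_b", 2), ("bob", "sub_c", 2),
    ("carol", "sub_a", 2), ("carol", "sub_b", 2), ("carol", "sub_d", 9),
    ("dave", "sub_a", 1), ("dave", "sub_b", 5),
]
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [(a, s) for a, s, n in counts for _ in range(n)])

PAIRS_SQL = """
WITH solid AS (  -- step 2, written once and reused for both sides of the join
  SELECT author, subreddit, COUNT(1) AS comments
  FROM comments
  WHERE subreddit IN (SELECT subreddit FROM subr_rank
                      ORDER BY comments DESC LIMIT :top_n)
    AND author NOT IN ('AutoModerator', '[deleted]')
  GROUP BY author, subreddit
  HAVING COUNT(1) > :min_comments
)
SELECT a.author AS userA, b.author AS userB, COUNT(1) AS subreddits
FROM solid AS a
JOIN solid AS b ON a.subreddit = b.subreddit  -- step 3: users sharing a subreddit
WHERE a.author < b.author                     -- each unordered pair once, no self-pairs
GROUP BY a.author, b.author                   -- step 4: shared subreddits per pair
HAVING COUNT(1) > :min_common
ORDER BY subreddits DESC, userA, userB
"""


def common_pairs(conn, top_n, min_comments, min_common):
    """(userA, userB, number of common top subreddits) for each pair of solid users."""
    params = {"top_n": top_n, "min_comments": min_comments, "min_common": min_common}
    return conn.execute(PAIRS_SQL, params).fetchall()


print(common_pairs(conn, top_n=3, min_comments=1, min_common=1))
# [('alice', 'bob', 3), ('alice', 'carol', 2), ('bob', 'carol', 2)]
```

With the answer's settings (top 10, more than 50 comments, more than 3 common subreddits) the SQL keeps the same shape; only the table names, the parameter values, and any dialect details change when moving it back to BigQuery.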
Hope this helps