amazon web services - DynamoDB table design for a SET-like scenario


I am moving from an RDBMS and am not sure how best to design the scenario below.

I have a table of around 200,000 questions, with the question ID as the partition key.

Users view questions, and I do not wish to show a viewed question to the same user again. Which one is the better option?

  1. Have a table with the question ID as the partition key and a set of user IDs as an attribute.
  2. Have a table with the user ID as the partition key and the set of question IDs the user has viewed as an attribute.
  3. Have a table with the question ID as the partition key and the user ID as the sort key. Once a user has viewed a question, add a row to this table.

Options 1 and 2 might run into the 400 KB size limit for an item. The third seems the better option, though I would end up with around 100 million items, since there is one row per user per question viewed. I assume this is not a problem for DynamoDB?
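To make option 3 concrete, here is a minimal sketch of the key layout it describes. The table name `ViewedQuestions`, the attribute names `question_id` and `user_id`, their types, and the on-demand billing mode are all assumptions made for illustration; nothing in the question fixes them. The code only builds the request parameters, so it runs without AWS credentials.

```python
def viewed_table_params(table_name="ViewedQuestions"):
    """Build create_table parameters for option 3 (one item per view).

    Table and attribute names are hypothetical, chosen for this sketch.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "question_id", "AttributeType": "N"},
            {"AttributeName": "user_id", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "question_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "user_id", "KeyType": "RANGE"},     # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def viewed_item(question_id, user_id):
    """Build the item written once a user has viewed a question."""
    return {
        "question_id": {"N": str(question_id)},
        "user_id": {"S": user_id},
    }


# With boto3 installed and credentials configured, these could be used as:
#   client = boto3.client("dynamodb")
#   client.create_table(**viewed_table_params())
#   client.put_item(TableName="ViewedQuestions", Item=viewed_item(42, "user-1"))
```

Each view adds one small item, so no single item grows with the number of users or questions.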

Another problem is how to get 10 random questions that the user has not viewed. Do I generate 10 random numbers between 1 and 200,000 (the number of questions) and check that they are not in the table mentioned in point 3 above?
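The approach described here amounts to rejection sampling: draw a random question ID, look it up, and keep it only if the user has not viewed it. The sketch below assumes question IDs run from 1 to 200,000 and uses a hypothetical `has_viewed` callback in place of the real table lookup (which would be a key lookup on the table from point 3).

```python
import random

TOTAL_QUESTIONS = 200_000  # number of questions, from the scenario above


def pick_unviewed(has_viewed, count=10, total=TOTAL_QUESTIONS, rng=random):
    """Return up to `count` distinct question IDs the user has not viewed.

    `has_viewed(question_id)` is a hypothetical callback standing in for
    a lookup against the viewed-questions table.
    """
    picked = set()
    attempts = 0
    max_attempts = 100 * count  # give up rather than loop forever
    while len(picked) < count and attempts < max_attempts:
        attempts += 1
        candidate = rng.randint(1, total)  # inclusive on both ends
        if candidate in picked:
            continue  # already chosen in this batch, no lookup needed
        if not has_viewed(candidate):
            picked.add(candidate)
    return sorted(picked)


# Usage with an in-memory set standing in for the table:
viewed = {1, 2, 3}
questions = pick_unviewed(lambda qid: qid in viewed)
```

Every rejected candidate still costs one lookup, so this works well only while the user has seen a small share of the questions.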

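I would not go with option 1 or 2, for the reason you mentioned: you would be limiting your scalability with the 400 KB limit. With a UUID of 128 bits (16 bytes in binary form), a 400 KB item holds at most about 25,000 user IDs per question, and fewer if the IDs are stored as strings.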

Option 3 is the way to go with DynamoDB, but you need to consider which attribute is the partition key and which is the range key. You could also have user_id as the partition key and question_id as the range key. The answer to this decision depends on how your data is going to be accessed. DynamoDB divides the total table throughput among the partition keys: each one of n partition keys gets 1/nth of the table throughput. For example, if a subset of partition keys is accessed more than the others, you won't be utilizing the table throughput efficiently, because the partition keys that use less than 1/nth of the throughput are still provisioned for 1/nth of it. The general idea is that you want each of your partition keys to be utilized equally. I think you have it correct, since I'm assuming each question is given out randomly and none is more popular than another, while some users might be more active than others.
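The effect of uneven access can be illustrated with a little arithmetic. The sketch below follows the simplified model used in this answer, where each of n partition keys gets an equal 1/n share of the table throughput; it is not a description of how DynamoDB actually allocates capacity. The function name and the demand figures are made up for the example.

```python
def throughput_usage(total_throughput, demand_per_key):
    """Compare demand against an equal per-key share of the throughput.

    Simplified model: every key gets total_throughput / n, and any
    demand above that share is throttled.
    """
    share = total_throughput / len(demand_per_key)
    used = sum(min(demand, share) for demand in demand_per_key.values())
    throttled = sum(max(demand - share, 0) for demand in demand_per_key.values())
    return {
        "share_per_key": share,
        "used": used,
        "idle": total_throughput - used,
        "throttled": throttled,
    }


# Even access: every key uses exactly its share, nothing is idle.
even = throughput_usage(100, {"a": 25, "b": 25, "c": 25, "d": 25})

# Skewed access: key "a" wants 70 but is capped at 25, while the
# other keys leave most of their share unused.
skewed = throughput_usage(100, {"a": 70, "b": 10, "c": 10, "d": 10})
```

In the skewed case the total demand is the same 100 units, yet under this model only 55 are served and 45 sit idle, which is why evenly used partition keys matter.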

The other part of your question is a little more difficult to answer. You can either have your table contain question-user pairs for the questions users have read, or have it contain pairs for the questions users haven't read. The tradeoff is between the initial write cost and the subsequent read cost, and the answer depends on the number of questions you have compared to the rate at which users consume them.

When you have a large number of questions compared to the rate at which users progress through them, the chances of randomly selecting an already viewed one are small, so you're going to want to store have-read question-user pairs. With this setup you don't pay a lot to initialize a user (you don't have to write a question-user pair for each question), and you won't have a lot of miss-read costs (i.e. you select a question-user pair and it turns out the user has already read it, which still consumes read capacity).

If you have a small number of questions compared to the rate at which users consume them, you're going to want to store haven't-read question-user pairs. You pay to initialize each user (writing one question-user pair for each question), but you don't have accidental miss-reads. If you stored them as have-read pairs with a small number of questions, you would encounter a lot of miss-reads as the percentage of read questions approaches 100% (to the point where you would have been better off setting them up as haven't-read pairs).
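The miss-read cost in the have-read design can be estimated with a small calculation. If a user has viewed v of N questions and candidates are drawn uniformly at random, each draw is unviewed with probability (N - v) / N, so the expected number of lookups per new question is N / (N - v). The sketch below assumes uniform, independent draws; the function name is made up for the example.

```python
def expected_lookups_per_new_question(viewed, total):
    """Expected random draws (and lookups) needed to find one unviewed question.

    Assumes each draw is uniform over all `total` questions, so the
    number of draws follows a geometric distribution.
    """
    if not 0 <= viewed < total:
        raise ValueError("viewed must satisfy 0 <= viewed < total")
    return total / (total - viewed)


# With 200,000 questions:
fresh_user = expected_lookups_per_new_question(0, 200_000)             # 1.0
half_done = expected_lookups_per_new_question(100_000, 200_000)        # 2.0
nearly_done = expected_lookups_per_new_question(198_000, 200_000)      # 100.0
```

The cost stays close to one lookup per question for most of a user's lifetime and only climbs steeply once nearly everything has been read, which is the point where haven't-read pairs become the cheaper design.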

I hope this helps with your design considerations. Drop a comment if you need clarification!

