Amazon Web Services - DynamoDB table design for a SET-like scenario
I am moving from an RDBMS and am not sure how best to design the scenario below.
I have a table of around 200,000 questions, with the question ID as the partition key.
Users view questions, and I do not wish to show a viewed question to the same user again. Which one of these is the better option?
- Have a table with the question ID as the partition key and a set of user IDs as an attribute
- Have a table with the user ID as the partition key and a set of the question IDs they have viewed as an attribute
- Have a table with the question ID as the partition key and the user ID as the sort key. Once a user has viewed a question, add a row to this table
Options 1 and 2 might run into the 400 KB size limit on an item. The third seems the better option, though I would end up with 100 million items, since there is one row per user per question viewed. I assume this is not a problem for DynamoDB?
Another problem is how to get 10 random questions not yet viewed by the user. Do I generate 10 random numbers between 1 and 200,000 (the number of questions) and check that they are not in the table mentioned in point 3 above?
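I would not go with option 1 or 2, for the reason you mentioned: you would be limiting your scalability with the 400 KB limit. With a UUID of 128 bits (16 bytes), you would be limited to roughly 25,000 users per question, and fewer once attribute names and encoding overhead are counted.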
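As a rough check on how many IDs fit in a single item, the arithmetic can be sketched as follows. It assumes the 400 KB limit means 400 * 1024 bytes and ignores attribute names and per-element overhead, so real ceilings would be somewhat lower. max_set_members is a hypothetical helper for illustration, not a DynamoDB API.

```python
# Back-of-the-envelope ceiling on set size inside one DynamoDB item.
# Assumption: the 400 KB item limit is treated as 400 * 1024 bytes.
ITEM_LIMIT_BYTES = 400 * 1024


def max_set_members(bytes_per_id, overhead_bytes=0):
    """Upper bound on how many fixed-size IDs fit in one item."""
    return (ITEM_LIMIT_BYTES - overhead_bytes) // bytes_per_id


# A UUID stored as 16 raw bytes (128 bits).
print(max_set_members(16))  # 25600
# The same UUID stored as its 36-character text form.
print(max_set_members(36))  # 11377
```

Either way the set tops out in the tens of thousands of IDs, which is why a set attribute does not scale for this use case.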
Option 3 is the way to go with DynamoDB, but you need to consider which attribute is the partition key and which is the range key. You could also have user_id as the partition key and question_id as the range key. The answer to that decision depends on how your data is going to be accessed. To a first approximation, DynamoDB divides the total table throughput evenly by partition key: each of N partition keys gets 1/Nth of the table throughput. For example, if a subset of partition keys is accessed more than the others, you won't be using your table throughput efficiently, because the partition keys that use less than 1/Nth of the throughput are still provisioned 1/Nth of it. The general idea is that you want each of your partition keys to be utilized equally. I think you have it correct, since I'm assuming each question is given out randomly and none is more popular than another, while some users might be more active than others.
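For concreteness, the option 3 layout could be declared roughly as below. This is only a sketch: the table name ViewedQuestions, the attribute names question_id and user_id, the string key types, and on-demand billing are all assumptions made for illustration. The dictionary follows the parameter shape of boto3's create_table call, but nothing is sent to AWS here.

```python
# Sketch of the option 3 key schema. Table and attribute names are hypothetical.
def viewed_table_params(partition_attr="question_id", sort_attr="user_id"):
    """Build create_table parameters for one row per (question, user) view."""
    return {
        "TableName": "ViewedQuestions",
        "AttributeDefinitions": [
            {"AttributeName": partition_attr, "AttributeType": "S"},
            {"AttributeName": sort_attr, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": partition_attr, "KeyType": "HASH"},  # partition key
            {"AttributeName": sort_attr, "KeyType": "RANGE"},      # sort (range) key
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


params = viewed_table_params()
print(params["KeySchema"])
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("dynamodb").create_table(**params)
```

Swapping the two arguments gives the user_id-as-partition-key variant, so the same helper can be used to compare both layouts.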
The other part of your question is a little more difficult to answer. One way is to have your table contain question-user pairs for the questions users have read; the other is to have it contain pairs for the questions users haven't read. The tradeoff here is between the initial write cost and the subsequent read cost, and the answer depends on the number of questions you have compared to the consumption rate.
When you have a large number of questions compared to the rate at which users progress through them, the chance of randomly selecting an already-read one is small, so you're going to want to store have-read question-user pairs. With this setup you don't pay a lot to initialize a user (you don't have to write a question-user pair for each question), and you won't have a lot of miss-read costs (i.e. you select a question-user pair and it turns out the user has already read it, which still consumes read capacity units).
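The have-read approach amounts to drawing random question IDs and rejecting the ones already viewed. A minimal sketch follows, with an in-memory set standing in for the have-read table; in a real system each membership check would be a read against DynamoDB. pick_unseen is a hypothetical helper, and question IDs are assumed to run from 1 to the total number of questions.

```python
import random


def pick_unseen(viewed, total_questions, count, rng):
    """Draw `count` distinct unseen question IDs by rejection sampling.

    `viewed` stands in for the have-read table and is assumed to hold IDs
    in the range 1..total_questions. Returns the picked IDs and the number
    of lookups spent, including miss-reads on already-viewed questions.
    """
    if total_questions - len(viewed) < count:
        raise ValueError("not enough unseen questions left")
    picked = set()
    lookups = 0
    while len(picked) < count:
        candidate = rng.randint(1, total_questions)
        if candidate in picked:
            continue  # already chosen in this batch, no lookup needed
        lookups += 1  # one read against the have-read table
        if candidate not in viewed:
            picked.add(candidate)
    return sorted(picked), lookups


# A user who has viewed 1,000 of 200,000 questions.
viewed = set(range(1, 1001))
questions, lookups = pick_unseen(viewed, 200_000, 10, random.Random(42))
print(questions, lookups)
```

The lookups value minus the number of questions returned is the count of miss-reads, which stays near zero while the viewed fraction is small.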
If you have a small number of questions compared to the rate at which users consume them, you're going to want to store haven't-read question-user pairs. You pay to initialize each user (writing one question-user pair for each question), but you don't have accidental miss-reads. If you stored them as have-read pairs with a small number of questions, you would encounter a lot of miss-reads as the percentage of read questions approaches 100% (to the point where you would have been better off setting them up as haven't-read pairs).
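To put a rough number on the miss-read cost: if a fraction f of the questions has been read and candidates are drawn uniformly at random, the number of lookups needed to hit one unread question follows a geometric distribution with mean 1 / (1 - f). The sketch below only evaluates that formula; expected_lookups_per_unseen is a hypothetical helper, and uniform, independent draws are an assumption.

```python
def expected_lookups_per_unseen(fraction_read):
    """Mean number of random draws needed to find one unread question.

    Assumes uniform, independent draws, so the count is geometric
    with success probability (1 - fraction_read).
    """
    if not 0 <= fraction_read < 1:
        raise ValueError("fraction_read must be in [0, 1)")
    return 1.0 / (1.0 - fraction_read)


for f in (0.01, 0.5, 0.9, 0.99):
    print(f, round(expected_lookups_per_unseen(f), 1))
```

At half the questions read, each unread question costs about two lookups on average; at 99% read it costs about a hundred, which is the point where the up-front writes of the haven't-read layout start to pay off.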
I hope this helps with your design considerations. Drop a comment if you need clarification!