apache spark - Recursive case class to DataFrame
How do you make a DataFrame from a case class that references itself? Take the following:
    case class TestCase(id: Long, parent: Option[TestCase])
If I do:
    val testCases = Seq(TestCase(1L, None), TestCase(2L, Some(TestCase(1L, None)))).toDF
it throws a big ol' ScalaReflection error. Of course, I could do:
    case class TestCase(id: Long, parentId: Option[Long])
but that's not what I want.
Incidentally, Avro has no issue encoding and decoding recursive schemas, so I don't think I'm asking for the impossible. This seems like a normal use case for handling parent-child relationships.
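For reference, a recursive schema in Avro simply refers back to the record's own name; an illustrative schema, not taken verbatim from the original post:

    {
      "type": "record",
      "name": "TestCase",
      "fields": [
        {"name": "id", "type": "long"},
        {"name": "parent", "type": ["null", "TestCase"], "default": null}
      ]
    }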
Update
I can manually create the schema, but as far as I can tell, I have to hard-code how far the chain can go by repeatedly nesting StructTypes. Like this:
    val schema = StructType(Array(
      StructField("id", LongType, false),
      StructField("parent", StructType(Array(
        StructField("id", LongType, false),
        StructField("parent", StructType(Array(
          StructField("id", LongType, false),
          StructField("parent", NullType)
        )))
      )))
    ))
Note that for the last parent in the chain, its type is NullType. Given the above schema, the following all work:
    df.select($"parent")
    df.select($"parent.parent")
    df.select($"parent.parent.parent")
Based on the schema above, the first two might return either null or a parent, while the third one always returns null.
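Since the only thing that varies between levels is the nesting depth, the repetition can at least be generated instead of hand-written. A minimal sketch, where testCaseSchema is a hypothetical helper and not part of the original post:

    import org.apache.spark.sql.types._

    // Hypothetical helper: builds the nested schema down to a fixed depth,
    // terminating the parent chain with NullType like the hand-written version.
    def testCaseSchema(depth: Int): StructType = {
      val parentType: DataType =
        if (depth == 0) NullType else testCaseSchema(depth - 1)
      StructType(Array(
        StructField("id", LongType, false),
        StructField("parent", parentType)
      ))
    }

    val schema = testCaseSchema(2) // same three-level schema as above

This still caps the hierarchy at a fixed depth; it just moves the choice into a single number.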
Interestingly enough, to create the Row objects for the DataFrame, I have to do:
    val testCaseSeq = Seq[TestCase](...)
    val df = sqlContext.createDataFrame(
      sc.parallelize(testCaseSeq.map(tc => Row(tc.id, tc.parent))),
      schema
    )
I guess this more or less works, but I have to figure out in advance how many levels of the parent-child hierarchy to support, which kind of sucks. Can I do better than this?
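If nothing else, the Row construction can be made recursive so that every level of the hierarchy is converted the same way. A sketch, assuming the data never nests deeper than the schema's hard-coded depth; toRow is a hypothetical helper, not something from the original post:

    import org.apache.spark.sql.Row

    // Hypothetical helper: recursively converts the case class into nested
    // Rows, with null terminating the chain (matching the NullType leaf).
    def toRow(tc: TestCase): Row =
      Row(tc.id, tc.parent.map(toRow).orNull)

    val df = sqlContext.createDataFrame(
      sc.parallelize(testCaseSeq.map(toRow)),
      schema
    )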