apache spark - Recursive case class to DataFrame


How do I make a DataFrame from a case class that references itself? Take the following:

case class TestCase(id: Long, parent: Option[TestCase])

If I do:

val testCases = Seq(TestCase(1L, None), TestCase(2L, Some(TestCase(1L, None)))).toDF

It throws a big ol' ScalaReflection error. Of course, I could do:

case class TestCase(id: Long, parentId: Option[Long])

But that's not what I want.
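For comparison, here is a minimal plain-Scala sketch of what the flat `parentId` encoding would involve: walking up the parent chain and emitting one flat record per level. `FlatTestCase` and `flattenChain` are hypothetical names introduced only for this illustration; they are not part of Spark or of the code above.

```scala
// Recursive shape from the question.
case class TestCase(id: Long, parent: Option[TestCase])

// Flat shape; FlatTestCase is a hypothetical name for the parentId variant.
case class FlatTestCase(id: Long, parentId: Option[Long])

// Walk up the parent chain, emitting one flat record per level.
def flattenChain(tc: TestCase): List[FlatTestCase] =
  FlatTestCase(tc.id, tc.parent.map(_.id)) ::
    tc.parent.map(flattenChain).getOrElse(Nil)
```

Because `FlatTestCase` does not reference itself, a sequence of these records should be convertible with `toDF`, at the cost of losing the nested structure and having to de-duplicate ancestors shared between chains.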

Incidentally, Avro has no issue encoding and decoding recursive schemas, so I don't think I'm asking for the impossible. This seems like a normal use case for handling parent-child relationships.

Update

I can manually create the schema but, as far as I can tell, I have to hard-code how far the chain can go by repeatedly nesting StructTypes. Like this:

import org.apache.spark.sql.types._

// Three levels of "id" with the parent chain hard-coded; the innermost parent is NullType.
val schema = StructType(Array(
  StructField("id", LongType, false),
  StructField("parent", StructType(Array(
    StructField("id", LongType, false),
    StructField("parent", StructType(Array(
      StructField("id", LongType, false),
      StructField("parent", NullType)
    )))
  )))
))

Note the last parent in the chain: its type is NullType. With the above schema, the following work:

df.select($"parent")
df.select($"parent.parent")
df.select($"parent.parent.parent")

Based on the schema above, the first two might return null or a parent. The third one can only return null.
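The hand-nested schema above could also be generated for a chosen depth instead of being typed out. The following is only a sketch under that assumption: `parentSchema` is a hypothetical helper, not a Spark API, and the depth still has to be fixed up front.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: builds the nested schema with `levels` real parent
// levels; the innermost "parent" field has type NullType.
def parentSchema(levels: Int): StructType = {
  require(levels >= 0, "levels must be non-negative")
  val parentType: DataType =
    if (levels == 0) NullType else parentSchema(levels - 1)
  StructType(Seq(
    StructField("id", LongType, nullable = false),
    StructField("parent", parentType, nullable = true)
  ))
}
```

With this definition, `parentSchema(2)` describes the same three levels of `id` as the hand-written schema.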

Interestingly enough, to create the Row objects for the DataFrame, I have to do:

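import org.apache.spark.sql.Row

val testCaseSeq = Seq[TestCase](...)
// Build each Row by hand and pair the RDD with the hard-coded schema.
val df = sqlContext.createDataFrame(
  sc.parallelize(testCaseSeq.map(tc => Row(tc.id, tc.parent))),
  schema
)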

I guess that more or less works, but I have to figure out in advance how many levels of the parent-child hierarchy to support, which kind of sucks. Is there a better way to do this?
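To make the Row construction line up with a fixed-depth schema, each nested parent would presumably need to become a nested Row as well, cut off at the chosen depth. A minimal sketch, assuming a hypothetical helper `toRow` (not a Spark API) and the recursive `TestCase` definition from the top of the question:

```scala
import org.apache.spark.sql.Row

case class TestCase(id: Long, parent: Option[TestCase])

// Hypothetical helper: converts a TestCase into nested Rows, keeping at most
// `levels` parents. Anything deeper is dropped and the slot is left null,
// matching a schema whose innermost "parent" field is NullType.
def toRow(tc: TestCase, levels: Int): Row = {
  val parentRow: Row =
    if (levels == 0) null
    else tc.parent.map(p => toRow(p, levels - 1)).orNull
  Row(tc.id, parentRow)
}
```

The depth passed to `toRow` has to match the depth of the schema, so this does not remove the need to pick the number of levels in advance; it only avoids writing the nesting by hand.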

