scala - Spark ClosedChannelException exception during parquet write
We have a massive legacy SQL table that we need to extract data out of and push to S3. Below is how I'm querying a portion of the data and writing the output.
def writeTableInParts(tableName: String, numIdsPerParquet: Long, numPartitionsAtATime: Int,
                      startFrom: Long = -1, endTo: Long = -1, filePrefix: String = s3Prefix) = {
  val minId: Long = if (startFrom > 0) startFrom else findMinCol(tableName, "id")
  val maxId: Long = if (endTo > 0) endTo else findMaxCol(tableName, "id")

  (minId until maxId by numIdsPerParquet).toList
    .sliding(numPartitionsAtATime, numPartitionsAtATime).toList
    .foreach(list => {
      list.map(start => {
        val end = math.min(start + numIdsPerParquet, maxId)

        sqlContext.read.jdbc(
          mysqlConStr,
          s"(SELECT * FROM $tableName WHERE id >= ${start} AND id < ${end}) AS tmpTable",
          Map[String, String]())
      }).reduce((left, right) => {
        left.unionAll(right)
      })
      .write
      .parquet(s"${filePrefix}/$tableName/${list.head}-${list.last + numIdsPerParquet}")
    })
}
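For reference, a call looks roughly like the following; the table name, chunk sizes, and S3 prefix here are made-up placeholders, not values from the original setup:

// Hypothetical invocation with illustrative values only.
// Reads ids in chunks of 1,000,000, unioning 4 chunks per parquet output directory.
writeTableInParts(
  tableName = "legacy_orders",              // placeholder table name
  numIdsPerParquet = 1000000L,
  numPartitionsAtATime = 4,
  filePrefix = "s3a://my-bucket/exports"    // placeholder S3 prefix
)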
This has worked for many different tables, but for whatever reason this one table keeps failing with java.nio.channels.ClosedChannelException,
no matter how much I reduce the scanning window or size.
Based on this answer I'm guessing I have an exception somewhere in my code, but I'm not sure where, since it is rather simple code. How can I further debug this exception? The logs weren't helpful and don't reveal the cause.
The problem was due to the error below and is not Spark related... it was cumbersome to chase down because Spark isn't great at displaying errors. Darn...
'0000-00-00 00:00:00' can not be represented as java.sql.Timestamp error
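The original answer stops at identifying the error, but a common way to work around MySQL zero dates when reading over JDBC is the Connector/J zeroDateTimeBehavior option, which makes the driver return NULL instead of failing when it maps '0000-00-00 00:00:00' to java.sql.Timestamp. A minimal sketch, assuming the same sqlContext and a placeholder connection string (host, database, and credentials are illustrative):

// Sketch: append zeroDateTimeBehavior=convertToNull to the JDBC URL so zero
// dates come back as NULL instead of throwing during the read.
// mysqlConStr stands in for the connection string used in writeTableInParts;
// all connection details below are placeholders.
val mysqlConStr =
  "jdbc:mysql://db-host:3306/legacy_db?user=reader&password=secret" +
  "&zeroDateTimeBehavior=convertToNull"

val chunk = sqlContext.read.jdbc(
  mysqlConStr,
  s"(SELECT * FROM $tableName WHERE id >= $start AND id < $end) AS tmpTable",
  Map[String, String]())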