Hi,
May I ask a question about using AccumuloFeatureWriter for
ingesting data in Spark? The idea is to make the
AccumuloFeatureWriter a singleton per Spark executor so
that it can be reused across records and closed at the end.
I am relatively new to Spark; my code (in Java) looks
roughly like this:
dataRDD.foreachPartition(iterator -> {
    // get or init the writer as a static variable in MyTool.class
    AppendAccumuloFeatureWriter writer = MyTool.getSingletonWriter(...);
    while (iterator.hasNext()) {
        SimpleFeature feature = buildFeature(iterator.next());
        SimpleFeature toWrite = FeatureUtils.copyToWriter(writer, feature, true);
        writer.write();
    }
});

// then, in a second pass, close the writer on each executor:
dataRDD.foreachPartition(iterator -> {
    // close the writer if it hasn't been closed
    MyTool.closeWriter();
});
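
For reference, MyTool does roughly this (a simplified sketch; in
the real code the data store and type name are passed in, and
error handling is omitted):

import java.io.IOException;
import org.geotools.data.DataStore;
import org.geotools.data.Transaction;

public class MyTool {
    private static AppendAccumuloFeatureWriter writer = null;

    // lazily create a single append writer per executor JVM
    public static synchronized AppendAccumuloFeatureWriter getSingletonWriter(
            DataStore ds, String typeName) throws IOException {
        if (writer == null) {
            // standard GeoTools API; the GeoMesa Accumulo data store
            // returns its AppendAccumuloFeatureWriter here
            writer = (AppendAccumuloFeatureWriter)
                    ds.getFeatureWriterAppend(typeName, Transaction.AUTO_COMMIT);
        }
        return writer;
    }

    // close the writer (idempotent) so buffered mutations get flushed
    public static synchronized void closeWriter() throws IOException {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }
}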
Problem: a random subset of the data silently fails to be saved
to Accumulo this way. The only workaround I have found is to
call writer.flush() after every writer.write().
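Concretely, the inner loop then becomes something like:

    SimpleFeature toWrite = FeatureUtils.copyToWriter(writer, feature, true);
    writer.write();
    writer.flush(); // flushing every single record prevents the loss, but kills throughput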
That fixes the data loss but is very inefficient. Is there a
better way to use AccumuloFeatureWriter, and what could cause
this strange data loss during writing? Thank you!
Regards
Yikai Gong