[geomesa-users] Ingest data through Spark

Hi,

May I ask a question about using AccumuloFeatureWriter to ingest data in Spark? The idea is to make the AccumuloFeatureWriter a singleton for each Spark executor so that it can be reused across records and closed at the end (the holder class is sketched after the snippets below).

I am relatively new to Spark and my code (in Java) looks like this:

dataRDD.foreachPartition(iterator -> {
    // get or init the writer as a static variable in MyTool.class
    AppendAccumuloFeatureWriter writer = MyTool.getSingletonWriter(...);
    while (iterator.hasNext()) {
        // build a SimpleFeature for each record and copy it onto the writer
        SimpleFeature feature = buildFeature(iterator.next());
        FeatureUtils.copyToWriter(writer, feature, true);
        writer.write();
    }
});

dataRDD.foreachPartition(iterator -> {
    // close the writer if it hasn't been closed yet
    MyTool.closeWriter();
});
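
In case it helps, MyTool is roughly the following (a simplified sketch, not my exact code: I am using the generic GeoTools FeatureWriter interface here and placeholder DataStore/typeName parameters instead of my real setup):

import java.io.IOException;
import org.geotools.data.DataStore;
import org.geotools.data.FeatureWriter;
import org.geotools.data.Transaction;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;

public class MyTool {
    // one writer per executor JVM, shared by the tasks running on that executor
    private static FeatureWriter<SimpleFeatureType, SimpleFeature> writer;

    public static synchronized FeatureWriter<SimpleFeatureType, SimpleFeature>
            getSingletonWriter(DataStore ds, String typeName) throws IOException {
        if (writer == null) {
            writer = ds.getFeatureWriterAppend(typeName, Transaction.AUTO_COMMIT);
        }
        return writer;
    }

    public static synchronized void closeWriter() throws IOException {
        if (writer != null) {
            writer.close();  // closing flushes any buffered mutations
            writer = null;
        }
    }
}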


Problem: A random subset of the data may not be saved to Accumulo this way. I found a workaround, which is to always call writer.flush() after writer.write(), but that is very inefficient. May I ask if there are better ways to use AccumuloFeatureWriter, and what might cause this strange data loss on write? Thank you!
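
For reference, the only way I have found to avoid the loss is flushing after every single write, roughly like this inside the loop:

    SimpleFeature feature = buildFeature(iterator.next());
    FeatureUtils.copyToWriter(writer, feature, true);
    writer.write();
    writer.flush();  // forces the record out immediately; works, but makes ingest very slow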

Regards
Yikai Gong
