Re: [geomesa-users] Is setting locality groups and group by queries possible in geomesa

Vaibhav,

In general, as you know, the problem of how to handle JOINs in
non-relational data is tricky.  GeoMesa is optimized for geo-temporal
storage and retrieval, although there is good support for
secondary indexes.  That is why there are multiple index tables managed
by GeoMesa.  For the most part, GeoMesa-managed queries use
row-selection in Accumulo to remove the bulk of true-negative results
quickly, and use column families to differentiate the index types
(rather than to store user data directly).  Consequently, we typically
use the API rather than writing custom scans against GeoMesa-managed
tables.

Using GeoMesa and its API, there are a few different approaches you
might take, depending on the scale of your data and your response-time
requirements.  In case it is useful, you might consider the
following:

1.  You could flatten everything, and use hints in the SimpleFeatureType
to denote relative cardinalities / selectivities of the attributes you
most commonly include in your queries.  GeoMesa could provide secondary
indexes per attribute.

2.  In addition to flattening everything, you could add synthetic
fields that are the concatenation of your most common query
combinations, allowing GeoMesa to provide secondary indexes on those
synthetic fields.

3.  You could use GeoMesa strictly for the entities that have
geo-temporal data, and flatten the rest of your data into a separate
table that you manage.  At query time, partition your query into the
geo-temporal and non-geo-temporal components, let GeoMesa handle the
former while you handle the latter with whatever indexing you choose,
and then use a sort-merge join between the two result sets.

4.  Use the Spark SQL support that Emilio mentioned earlier.

5.  ... other approaches you can imagine...
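
To make option 3 a little more concrete, here is a minimal sketch of the
sort-merge join step in plain Python.  It does not use the GeoMesa API at
all.  It assumes you have already fetched the geo-temporal hits from
GeoMesa and the attribute hits from your own table, and that both carry a
shared key.  The field names (track_id, lon, lat, driver) and the function
name are hypothetical and only for illustration.

```python
def sort_merge_join(left, right, key):
    """Inner-join two lists of dict records on `key` with a sort-merge join."""
    # Sort both inputs on the join key so they can be scanned in lockstep.
    left_sorted = sorted(left, key=lambda r: r[key])
    right_sorted = sorted(right, key=lambda r: r[key])

    joined = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_sorted[i][key]
        rk = right_sorted[j][key]
        if lk < rk:
            i += 1          # left key has no partner yet, advance left
        elif lk > rk:
            j += 1          # right key has no partner yet, advance right
        else:
            # Find the run of right-side records sharing this key.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == lk:
                j_end += 1
            # Pair every left record with that key against the whole run.
            while i < len(left_sorted) and left_sorted[i][key] == lk:
                for r in right_sorted[j:j_end]:
                    joined.append({**left_sorted[i], **r})
                i += 1
            j = j_end
    return joined


# Hypothetical result sets: one from the geo-temporal query, one from
# the separately managed attribute table.
geo_hits = [
    {"track_id": 3, "lon": 77.2, "lat": 28.6},
    {"track_id": 1, "lon": 72.8, "lat": 19.0},
    {"track_id": 2, "lon": 88.3, "lat": 22.5},
]
attr_hits = [
    {"track_id": 2, "driver": "b"},
    {"track_id": 3, "driver": "c"},
    {"track_id": 3, "driver": "d"},
    {"track_id": 9, "driver": "z"},
]

joined = sort_merge_join(geo_hits, attr_hits, "track_id")
```

If both result sets already come back sorted on the key, the two sorted()
calls can be dropped and the merge becomes a single streaming pass, which
is the main reason to prefer this join over a nested loop at scale.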

If most of your queries are on combinations of many non-geo-temporal
attributes of mixed selectivity, then GeoMesa may not be the best choice
as an RDBMS replacement.  If most of your queries are geo-temporal, and
you can identify a few key attributes that are often used in conjunction
with the geo-time selectors, then GeoMesa may be a very good choice.  It
depends on your use case.

I hope this helps some.  If not, please just let us know.

Thanks!

Sincerely,
  -- Chris



On Thu, 2016-03-31 at 15:17 +0530, vaibhav.thapliyal wrote:
> Hi,
> 
> I have GPS data in Postgres (relational). It has 26 tables which need
> to be de-normalized before storing them in GeoMesa, as it cannot
> provide relational queries. We have to join/combine two or more tables
> to create one. This increases the number of columns in a table. Some
> attributes are mostly queried together. As we want to group attributes
> queried together, we need to apply locality groups over column
> families, for which I at least need to know the names of the column
> families of my attributes.
> 
> Thanks 
> Vaibhav
> 
> On 03/30/2016 06:54 PM, Emilio Lahr-Vivaz wrote:
> 
> > Hi Vaibhav,
> > 
> > 
> > GeoMesa will create certain locality groups based on your data, but
> > we don't provide any hooks to modify them, and in general I wouldn't
> > suggest it. Of course, you can always set up locality groups
> > through the Accumulo shell.
> > Do you have a particular use case in mind?
> > 
> > 
> > GeoTools CQL doesn't support group-by, but you can use the GeoMesa
> > Apache Spark integration to do so. You use an initial CQL filter
> > (which could be Filter.INCLUDE)
> > to select your features, then you can manipulate them directly. See
> > the examples here:
> > 
> > 
> > http://www.geomesa.org/documentation/tutorials/spark.html
> > 
> > 
> > Although currently it's not well documented, GeoMesa also provides
> > SparkSql integration which will let you mix SQL and CQL in your
> > queries. See:
> > 
> > 
> > https://github.com/locationtech/geomesa/blob/master/geomesa-compute/src/main/scala/org/locationtech/geomesa/compute/spark/sql/GeoMesaSparkSql.scala
> > 
> > 
> > Thanks,
> > 
> > 
> > Emilio
> > 
> > 
> > On Wed, 2016-03-30 at 17:42 +0530, vaibhav.thapliyal wrote:
> > > Hi everyone,
> > > 
> > > Is there a way to set up locality groups on a GeoMesa database? And can we
> > > use a GROUP BY clause in queries that are not geospatial?
> > > 
> > > Thanks
> > > Vaibhav
> > > _______________________________________________
> > > geomesa-users mailing list
> > > geomesa-users@xxxxxxxxxxxxxxxx
> > > To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> > > https://www.locationtech.org/mailman/listinfo/geomesa-users
> > 
> > 
> 



