logo
Free, unlimited AI code reviews that run on commit
git-lrc git-lrc GitHub Install Now We'd appreciate a star git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt

mongoc_aggregate - Aggregation Framework Examples

Aggregations Using The Zip Codes Data Set

       Each document in this collection has the following form:

          {
            "_id" : "35004",
            "city" : "Acmar",
            "state" : "AL",
            "pop" : 6055,
            "loc" : [-86.51557, 33.584132]
          }

       In these documents:

       • The _id field holds the zipcode as a string.

       • The city field holds the city name.

       • The state field holds the two letter state abbreviation.

       • The pop field holds the population.

       • The loc field holds the location as a [latitude,longitude] array.

Author

       MongoDB, Inc

Average City Population By State

       To  get  the  first  three  states  with  the  greatest  average  population  per city, use the following
       aggregation:

          pipeline = BCON_NEW ("pipeline", "[",
             "{", "$group", "{", "_id", "{", "state", "$state", "city", "$city", "}", "pop", "{", "$sum", "$pop", "}", "}", "}",
             "{", "$group", "{", "_id", "$_id.state", "avg_city_pop", "{", "$avg", "$pop", "}", "}", "}",
             "{", "$sort", "{", "avg_city_pop", BCON_INT32 (-1), "}", "}",
             "{", "$limit", BCON_INT32 (3) "}",
          "]");

       This aggregate pipeline produces:

          { "_id" : "DC", "avg_city_pop" : 303450.0 }
          { "_id" : "FL", "avg_city_pop" : 27942.29805615551 }
          { "_id" : "CA", "avg_city_pop" : 27735.341099720412 }

       The above aggregation pipeline is build from three pipeline operators: $group, $sort and $limit.

       The first $group operator creates the following documents:

          { "_id" : { "state" : "WY", "city" : "Smoot" }, "pop" : 414 }

       Note, that the $group operator can't use nested documents except the _id field.

       The second $group uses these documents to create the following documents:

          { "_id" : "FL", "avg_city_pop" : 27942.29805615551 }

       These documents are sorted by the avg_city_pop field in descending order. Finally,  the  $limit  pipeline
       operator returns the first 3 documents from the sorted set.

Name

       mongoc_aggregate - Aggregation Framework Examples

       This  document  provides  a number of practical examples that display the capabilities of the aggregation
       framework.

       The AggregationsusingtheZipCodesDataSet examples uses a publicly available data set of all zipcodes
       and populations in the United States. These data are available at: zips.json.

Requirements

       Let's check if everything is installed.

       Use the following command to load zips.json data set into mongod instance:

          $ mongoimport --drop -d test -c zipcodes zips.json

       Let's use the MongoDB shell to verify that everything was imported successfully.

          $ mongo test
          connecting to: test
          > db.zipcodes.count()
          29467
          > db.zipcodes.findOne()
          {
                "_id" : "35004",
                "city" : "ACMAR",
                "loc" : [
                        -86.51557,
                        33.584132
                ],
                "pop" : 6055,
                "state" : "AL"
          }

States With Populations Over 10 Million

       To get all states with a population greater than 10 million, use the following aggregation pipeline:

       aggregation1.c

          #include <mongoc/mongoc.h>
          #include <stdio.h>

          static void
          print_pipeline (mongoc_collection_t *collection)
          {
             mongoc_cursor_t *cursor;
             bson_error_t error;
             const bson_t *doc;
             bson_t *pipeline;
             char *str;

             pipeline = BCON_NEW ("pipeline",
                                  "[",
                                  "{",
                                  "$group",
                                  "{",
                                  "_id",
                                  "$state",
                                  "total_pop",
                                  "{",
                                  "$sum",
                                  "$pop",
                                  "}",
                                  "}",
                                  "}",
                                  "{",
                                  "$match",
                                  "{",
                                  "total_pop",
                                  "{",
                                  "$gte",
                                  BCON_INT32 (10000000),
                                  "}",
                                  "}",
                                  "}",
                                  "]");

             cursor = mongoc_collection_aggregate (
                collection, MONGOC_QUERY_NONE, pipeline, NULL, NULL);

             while (mongoc_cursor_next (cursor, &doc)) {
                str = bson_as_canonical_extended_json (doc, NULL);
                printf ("%s\n", str);
                bson_free (str);
             }

             if (mongoc_cursor_error (cursor, &error)) {
                fprintf (stderr, "Cursor Failure: %s\n", error.message);
             }

             mongoc_cursor_destroy (cursor);
             bson_destroy (pipeline);
          }

          int
          main (int argc, char *argv[])
          {
             mongoc_client_t *client;
             mongoc_collection_t *collection;
             const char *uri_string =
                "mongodb://localhost:27017/?appname=aggregation-example";
             mongoc_uri_t *uri;
             bson_error_t error;

             mongoc_init ();

             uri = mongoc_uri_new_with_error (uri_string, &error);
             if (!uri) {
                fprintf (stderr,
                         "failed to parse URI: %s\n"
                         "error message:       %s\n",
                         uri_string,
                         error.message);
                return EXIT_FAILURE;
             }

             client = mongoc_client_new_from_uri (uri);
             if (!client) {
                return EXIT_FAILURE;
             }

             mongoc_client_set_error_api (client, 2);
             collection = mongoc_client_get_collection (client, "test", "zipcodes");

             print_pipeline (collection);

             mongoc_uri_destroy (uri);
             mongoc_collection_destroy (collection);
             mongoc_client_destroy (client);

             mongoc_cleanup ();

             return EXIT_SUCCESS;
          }

       You should see a result like the following:

          { "_id" : "PA", "total_pop" : 11881643 }
          { "_id" : "OH", "total_pop" : 10847115 }
          { "_id" : "NY", "total_pop" : 17990455 }
          { "_id" : "FL", "total_pop" : 12937284 }
          { "_id" : "TX", "total_pop" : 16986510 }
          { "_id" : "IL", "total_pop" : 11430472 }
          { "_id" : "CA", "total_pop" : 29760021 }

       The above aggregation pipeline is build from two pipeline operators: $group and $match.

       The $group pipeline operator requires _id field where we specify grouping; remaining fields  specify  how
       to  generate  composite  value  and  must  use one of the group aggregation functions: $addToSet, $first,
       $last, $max, $min, $avg, $push, $sum. The $match pipeline  operator  syntax  is  the  same  as  the  read
       operation query syntax.

       The $group process reads all documents and for each state it creates a separate document, for example:

          { "_id" : "WA", "total_pop" : 4866692 }

       The  total_pop field uses the $sum aggregation function to sum the values of all pop fields in the source
       documents.

       Documents created by $group are piped to the $match pipeline operator. It returns the documents with  the
       value of total_pop field greater than or equal to 10 million.

See Also