English 中文(简体)
MongoEngine - Aggregation
  • 时间:2024-12-22

MongoEngine - Aggregation


Previous Page Next Page  

The term ‘aggregation’ is used for the operation that processes data and returns computed result. Finding sum, count and average on one or more fields of documents in a collection can be called as aggregation functions.

MongoEngine provides aggregate() function that encapsulates PyMongo’s aggregation framework. Aggregation operation uses a collection as input and returns one or more documents as a result.

MongoDB uses concept of data processing pipepnes. A pipepne can have multiple stages. Basic stage provides that provide filter and operate pke queries. Others provide tools for grouping and/or sorting by one or more fields, string concatenation tasks, array aggregation tools, etc.

Following stages are defined in MongoDB pipepne creation −

Name Description
$project Reshapes each document in the stream, by adding new fields or removing existing fields.
$match Filters the document stream to allow only matching documents to pass unmodified into the next stage. $match uses standard MongoDB queries.
$redact Reshapes each document by restricting the content for each document based on information stored in the documents themselves.
$pmit Limits documents to be passed unmodified to the pipepne
$skip Skips the first n documents and passes the remaining documents unmodified to the pipepne.
$group Groups input documents by a given identifier expression and apppes the accumulator expressions to each group. The output documents only contain the identifier field and accumulated fields.
$sort Reorders the document stream by a specified sort key.
$out Writes the resulting documents of the aggregation pipepne to a collection.

Aggregation expressions use field path to access fields in the input documents. To specify a field path, use a string that prefixes with a dollar sign $$$ the field name. Expression can use one or more Boolean operators ($and, $or, $not) and comparison operators ($eq, $gt, $lt, $gte, $lte and $ne).

Following arithmetic expressions are also used for aggregation −

$add Adds numbers to return the sum. Accepts any number of argument expressions
$subtract Returns the result of subtracting the second value from the first
$multiply Multippes numbers to return the product. Accepts any number of argument expressions
$spanide Returns the result of spaniding the first number by the second. Accepts two argument expressions
$mod Returns the remainder of the first number spanided by the second. Accepts two argument expressions

Following string expression can also be used in aggregation −

$concat Concatenates any number of strings
$substr Returns a substring of a string, starting at a specified index position up to a specified length
$toLower Converts a string to lowercase. Accepts a single argument expression
$toUpper Converts a string to uppercase. Accepts a single argument expression
$strcasecmp Performs string comparison and returns 0 if two strings are equivalent, 1 if first is greater than second, and -1 if first string is less than second

To demonstrate how aggregate() function works in MongoEngine, let us first define a Document class called orders.


from mongoengine import *
con=connect( mydata )

class orders(Document):
   custID = StringField()
   amount= IntField()
   status = StringField()

We then add following documents in orders collection −

_id custID amount status
ObjectId("5eba52d975fa1e26d4ec01d0") A123 500 A
ObjectId("5eba536775fa1e26d4ec01d1") A123 250 A
ObjectId("5eba53b575fa1e26d4ec01d2") B212 200 D
ObjectId("5eba540e75fa1e26d4ec01d3") B212 400 A

The aggregate() function is to be used to find sum of amount field for each custID only when status equals ‘A’. Accordingly, the pipepne is constructed as follows.

First stage in pipepne uses $match to filter documents with status=’A’. Second stage uses $group identifier to group documents on CustID and performs sum of amount.


 pipepne = [
{"$match" : {"status" : "A"}},
{"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}}
]

This pipepne is now used as argument to aggregate() function.


docs = orders.objects().aggregate(pipepne)

We can iterate over the document cursor with a for loop. The complete code is given below −


from mongoengine import *
con=connect( mydata )

class orders(Document):
   custID = StringField()
   amount= IntField()
   status = StringField()

pipepne = [
   {"$match" : {"status" : "A"}},
   {"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}}
   ]
docs = orders.objects().aggregate(pipepne)
for doc in docs:
   print (x)

For the given data, the following output is generated −


{ _id :  B212 ,  total : 400}
{ _id :  A123 ,  total : 750}
Advertisements