I have a projection stage as follows,
{ 'name': {$ifNull: [ '$invName', {} ]},, 'info.type': {$ifNull: [ '$invType', {} ]}, 'info.qty': {$ifNull: [ '$invQty', {} ]}, 'info.detailed.desc': {$ifNull: [ '$invDesc', {} ]} }
I am projecting empty object({}
) in case of a field not present, because if sorting is performed in a field and the field doesn’t exist, that document is coming first in sort order(Sort Documents Without Existing Field to End of Results). Next stage is sorting and wanted non-existing fields to come last in sorting order. This is working as expected.
Now, I want to remove those fields which are having empty object as values(if info.detailed.desc
is empty info.detailed
should not be there in output). I could do this in node level using lodash
like this(https://stackoverflow.com/a/38278831/6048928). But I am trying to do this in mongodb level. Is it possible? I tried $redact
, but it is filtering out entire document. Is is possible to PRUNE
or DESCEND
fields of a document based on value?
Advertisement
Answer
Removing properties completely from documents is not a trivial thing. The basics are that the server itself has not had any way of doing this prior to MongoDB 3.4 and the introduction of $replaceRoot
, which essentially allows an expression to be returned as the document context.
Even with that addition it’s somewhat impractical to do so without further features of $objectToArray
and $arrayToObject
as introduced in MongoDB 3.4.4. But to run through the cases.
Working with a quick sample
{ "_id" : ObjectId("59adff0aad465e105d91374c"), "a" : 1 } { "_id" : ObjectId("59adff0aad465e105d91374d"), "a" : {} }
Conditionally return root object
db.junk.aggregate([ { "$replaceRoot": { "newRoot": { "$cond": { "if": { "$ne": [ "$a", {} ] }, "then": "$$ROOT", "else": { "_id": "$_id" } } } }} ])
That’s a pretty simple principle and can in fact be applied to any nested property to remove it’s sub-keys but would require various levels of nesting $cond
or even $switch
to apply possible conditions. The $replaceRoot
of course is needed for “top level” removal since it’s the only way to conditionally express top level keys to return.
So whilst you can in theory use $cond
or $switch
to decide what to return, it’s generally cumbersome and you would want something more flexible.
Filter the Empty Objects
db.junk.aggregate([ { "$replaceRoot": { "newRoot": { "$arrayToObject": { "$filter": { "input": { "$objectToArray": "$$ROOT" }, "cond": { "$ne": [ "$$this.v", {} ] } } } } }} ])
This is where $objectToArray
and $arrayToObject
come into use. Instead of writing out the conditions for every possibly key we just convert the object contents into an “array” and apply $filter
on the array entries to decide what to keep.
The $objectToArray
translates any object into an array of documents representing each property as "k"
for the name of the key and "v"
for the value from that property. Since these are now accessible as “values”, then you can use methods like $filter
to inspect the each array entry and discard the unwanted ones.
Finally $arrayToObject
takes the “filtered” content and translates those "k"
and "v"
values back into property names and values as a resulting object. In this way, the “filter” conditions removes any properties from the result object that did not meet the criteria.
A Return to $cond
db.junk.aggregate([ { "$project": { "a": { "$cond": [{ "$eq": [ "$a", {} ] }, "$$REMOVE", "$a" ] } }} ])
MongoDB 3.6 introduces a new player with the $$REMOVE
constant. This is a new feature that can be applied with $cond
in order to decide whether or not to show the property at all. So that is another approach when of course the release is available.
In all those above cases the "a"
property is not returned when the value is the empty object that we wanted to test for removal.
{ "_id" : ObjectId("59adff0aad465e105d91374c"), "a" : 1 } { "_id" : ObjectId("59adff0aad465e105d91374d") }
More Complex Structures
Your specific ask here is for data containing nested properties. So continuing on from the outlined approaches we can work with demonstrating how that is done.
First some sample data:
{ "_id" : ObjectId("59ae03bdad465e105d913750"), "a" : 1, "info" : { "type" : 1, "qty" : 2, "detailed" : { "desc" : "this thing" } } } { "_id" : ObjectId("59ae03bdad465e105d913751"), "a" : 2, "info" : { "type" : 2, "qty" : 3, "detailed" : { "desc" : { } } } } { "_id" : ObjectId("59ae03bdad465e105d913752"), "a" : 3, "info" : { "type" : 3, "qty" : { }, "detailed" : { "desc" : { } } } } { "_id" : ObjectId("59ae03bdad465e105d913753"), "a" : 4, "info" : { "type" : { }, "qty" : { }, "detailed" : { "desc" : { } } } }
Applying the filter method
db.junk.aggregate([ { "$replaceRoot": { "newRoot": { "$arrayToObject": { "$filter": { "input": { "$concatArrays": [ { "$filter": { "input": { "$objectToArray": "$$ROOT" }, "cond": { "$ne": [ "$$this.k", "info" ] } }}, [ { "k": "info", "v": { "$arrayToObject": { "$filter": { "input": { "$objectToArray": "$info" }, "cond": { "$not": { "$or": [ { "$eq": [ "$$this.v", {} ] }, { "$eq": [ "$$this.v.desc", {} ] } ] } } } } } } ] ] }, "cond": { "$ne": [ "$$this.v", {} ] } } } } }} ])
This needs more complex handling because of the nested levels. In the main case here you need to look at the "info"
key here independently and remove any sub-properties that do not qualify first. Since you need to return “something”, we basically then need to remove the "info"
key itself when all of it’s inner properties are removed. This is the reason for the nested filter operations on each set of results.
Applying $cond with $$REMOVE
Where available this would at first seem a more logical choice, so it helps to look at this from the most simplified form first:
db.junk.aggregate([ { "$addFields": { "info.type": { "$cond": [ { "$eq": [ "$info.type", {} ] }, "$$REMOVE", "$info.type" ] }, "info.qty": { "$cond": [ { "$eq": [ "$info.qty", {} ] }, "$$REMOVE", "$info.qty" ] }, "info.detailed.desc": { "$cond": [ { "$eq": [ "$info.detailed.desc", {} ] }, "$$REMOVE", "$info.detailed.desc" ] } }} ])
But then you need to look at the output this actually produces:
/* 1 */ { "_id" : ObjectId("59ae03bdad465e105d913750"), "a" : 1.0, "info" : { "type" : 1.0, "qty" : 2.0, "detailed" : { "desc" : "this thing" } } } /* 2 */ { "_id" : ObjectId("59ae03bdad465e105d913751"), "a" : 2.0, "info" : { "type" : 2.0, "qty" : 3.0, "detailed" : {} } } /* 3 */ { "_id" : ObjectId("59ae03bdad465e105d913752"), "a" : 3.0, "info" : { "type" : 3.0, "detailed" : {} } } /* 4 */ { "_id" : ObjectId("59ae03bdad465e105d913753"), "a" : 4.0, "info" : { "detailed" : {} } }
Whilst the other keys are removed the "info.detailed"
still stays around because there is nothing that actually tests at this level. In fact you simply cannot express this in simple terms, so the only way to work around this is to evaluate the object as an expression and then apply additional filtering an conditions on each level of output to see where the empty objects still reside, and remove them:
db.junk.aggregate([ { "$addFields": { "info": { "$let": { "vars": { "info": { "$arrayToObject": { "$filter": { "input": { "$objectToArray": { "type": { "$cond": [ { "$eq": [ "$info.type", {} ] },"$$REMOVE", "$info.type" ] }, "qty": { "$cond": [ { "$eq": [ "$info.qty", {} ] },"$$REMOVE", "$info.qty" ] }, "detailed": { "desc": { "$cond": [ { "$eq": [ "$info.detailed.desc", {} ] },"$$REMOVE", "$info.detailed.desc" ] } } } }, "cond": { "$ne": [ "$$this.v", {} ] } } } } }, "in": { "$cond": [ { "$eq": [ "$$info", {} ] }, "$$REMOVE", "$$info" ] } } } }} ])
That approach as with the plain $filter
method actually removes “all” empty objects from the results:
/* 1 */ { "_id" : ObjectId("59ae03bdad465e105d913750"), "a" : 1.0, "info" : { "type" : 1.0, "qty" : 2.0, "detailed" : { "desc" : "this thing" } } } /* 2 */ { "_id" : ObjectId("59ae03bdad465e105d913751"), "a" : 2.0, "info" : { "type" : 2.0, "qty" : 3.0 } } /* 3 */ { "_id" : ObjectId("59ae03bdad465e105d913752"), "a" : 3.0, "info" : { "type" : 3.0 } } /* 4 */ { "_id" : ObjectId("59ae03bdad465e105d913753"), "a" : 4.0 }
Doing it all in Code
So everything here really depends on latest features or indeed “coming features” to be available in the MongoDB version you are using. Where these are not available the alternate approach is to simply remove the empty objects from the results returned by the cursor.
It’s often the most sane thing to do, and really is all you require unless the aggregation pipeline needs to continue past the point where the fields are being removed. Even then, you probably should be logically working around that and leave the final results to cursor processing.
As JavaScript for the shell you can use the following approach, and the principles essentially stay the same no matter which actual language implementation:
db.junk.find().map( d => { let info = Object.keys(d.info) .map( k => ({ k, v: d.info[k] })) .filter(e => !( typeof e.v === 'object' && ( Object.keys(e.v).length === 0 || Object.keys(e.v.desc).length === 0 ) )) .reduce((acc,curr) => Object.assign(acc,{ [curr.k]: curr.v }),{}); delete d.info; return Object.assign(d,(Object.keys(info).length !== 0) ? { info } : {}) })
Which is pretty much the native language way of stating the same as the examples above being that where one of the expected properties contains an empty object, remove that property from the output completely.