I am using mongoexport to export some data into a .json file, but the output has a large size overhead introduced by the _id:IDVALUE pairs.

I found a similar post, "Is there a way to retrieve data from MongoDB without the _id field?", on how to omit the _id field when retrieving data from mongo, but not when exporting. It suggests using .Exclude("_id"). I tried to rewrite the --query parameter of mongoexport to somehow include .Exclude("_id"), but all attempts have failed so far.

Please suggest the proper way of doing this, or should I resort to some post-export techniques?

Thanks

Solution 1

There appears to be no way to exclude a field (such as _id) using mongoexport.

Here's an alternative that has worked for me on moderately sized databases:

mongo myserver/mydb --quiet --eval "db.mycoll.find({}, {_id:0}).forEach(printjson);" > out.txt

On a large database (many millions of records) it can take a while, and running it will affect other operations people try to do on the system.

Solution 2

This works:

mongoexport --db db_name --collection collection_name | sed '/"_id":/s/"_id":[^,]*,//' > file_name.json
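
To see what the sed expression above does without a running server, here is a minimal sketch that feeds it two hand-written lines. The sample documents and the strip_id function name are assumptions for illustration, and the lines assume the compact one-document-per-line format. Note that the expression only matches when the _id value contains no comma and is followed by another field.

```shell
#!/bin/sh
# strip_id: hypothetical helper wrapping the sed expression from this answer.
# On each line containing "_id": it deletes the first `"_id":<value>,`
# where <value> contains no comma.
strip_id() {
    sed '/"_id":/s/"_id":[^,]*,//'
}

# Hand-written sample lines standing in for mongoexport output (assumed format).
sample='{"_id":{"$oid":"5f1a2b3c4d5e6f7a8b9c0d1e"},"name":"alice","age":30}
{"_id":{"$oid":"5f1a2b3c4d5e6f7a8b9c0d1f"},"name":"bob","age":25}'

printf '%s\n' "$sample" | strip_id
```

With a real export, the same helper would sit in the pipeline in place of the inline sed call, e.g. mongoexport ... | strip_id > file_name.json.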

Solution 3

Pipe the output of mongoexport into jq and remove the _id field there.

mongoexport --uri=mongodb://localhost/mydb --collection=my_collection \
  | jq 'del(._id)'

Solution 4

I know you specified that you wanted to export JSON, but if you can substitute CSV data, the native mongo export will work and will be a lot faster than the above solutions:

mongoexport --db <dbName> --collection <collectionName> --csv --fields "<fieldOne>,<fieldTwo>,<fieldThree>" > mongoex.csv

Solution 5

mongoexport doesn't seem to have such an option.

With ramda-cli stripping the _id would look like:

mongoexport --db mydb --collection mycoll -f name,age | ramda 'omit ["_id"]'

Solution 6

I applied quux00's solution, but forEach(printjson) prints MongoDB Extended JSON notation in the output (for instance, "last_update" : NumberLong("1384715001000")).

It is better to use the following instead:

db.mycoll.find({}, {_id:0}).forEach(function (doc) {
    print( JSON.stringify(doc) );
});

Solution 7

mongo <server>/<database> --quiet --eval "db.<collection>.find({}, {_id:0,<field>:1}).forEach(printjson);" > out.txt

If you need a query condition, wrap the --eval argument in '' instead of "" and write the condition inside find with "", like find({"age":13}).

Solution 8

The simplest way to exclude sub-document information such as the "_id" is to export as CSV, then use a tool to convert the CSV into JSON.

Solution 9

mongoexport cannot omit "_id", but sed can strip it from the output:

mongoexport --db mydb --collection mycoll -f name,age | sed '/"_id":/s/"_id":[^,]*,//'

The original answer is from Exclude _id field using MongoExport command

Solution 10

Just use the --type=csv option in the mongoexport command.

mongoexport --db=<db_name> --collection=<collection_name> --type=csv --fields=<fields> --out=<Outfilename>.csv

For MongoDB version 3.4, you can also use the --noHeaderLine option in the mongoexport command to exclude the field header from the CSV export.

For details: https://docs.mongodb.com/manual/reference/program/mongoexport/

Solution 11

Export into a file, then use a regular expression to replace each _id entry with an empty string. In my case the entries looked like

"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95"

so I used "_id": "(\w|-)*",
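
As a rough sketch, the same search-and-replace can be done from the command line with sed. The strip_string_id name and the "name" field in the sample line are assumptions for illustration; \w is written out as an explicit character class because not every sed supports it, and the spaces after the colon and comma are made optional.

```shell
#!/bin/sh
# strip_string_id: hypothetical helper; removes `"_id": "<value>",` where <value>
# consists only of letters, digits, underscores and hyphens (e.g. a UUID string).
strip_string_id() {
    sed -E 's/"_id": *"[A-Za-z0-9_-]*", *//'
}

# Hand-written sample line using the _id value quoted in this answer.
printf '%s\n' '{"_id": "f5dc48e1-ed04-4ef9-943b-b1194a088b95", "name": "alice"}' | strip_string_id
```

This only applies to string-valued ids like the one above; an ObjectId exported as a nested document would need a different pattern.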

Solution 12

Have you tried specifying your fields with the --fields flag? All fields that are not mentioned are excluded from the export.

For maintainability you can also write your fields into a separate file and use --fieldFile.
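
A minimal sketch of the field-file approach, assuming placeholder field names (name, age) borrowed from the other answers and a temporary file created with mktemp. The file holds one field name per line. The mongoexport call is left as a comment because it needs a reachable server; it is shown with CSV output because the mongoexport documentation describes JSON output as including _id in addition to the listed fields.

```shell
#!/bin/sh
# Write one field name per line; the names are placeholders, not from a real schema.
field_file=$(mktemp)
printf '%s\n' name age > "$field_file"

# Show what mongoexport would read.
cat "$field_file"

# Assumed invocation (not run here; requires a running MongoDB instance):
# mongoexport --db mydb --collection mycoll --type=csv --fieldFile "$field_file" --out out.csv
```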