https://expressjs.com/en/4x/api.html states
As req.body’s shape is based on user-controlled input, all properties and values in this object are untrusted and should be validated before trusting. For example, req.body.foo.toString() may fail in multiple ways, for example foo may not be there or may not be a string, and toString may not be a function and instead a string or other user-input.
I think I understand this (to a limited degree) in the context of prototype pollution reading https://github.com/expressjs/body-parser/issues/347 and https://learn.snyk.io/lessons/prototype-pollution/javascript/ (don’t perform certain operations on unvalidated objects that may pollute proto or constructor)
I take it that without such an operation being performed (the merge function in the Snyk example) req.body.foo.toString() cannot itself immediately contain remote / user supplied code? (that is for the toString() function to be completely arbitrary?).
Or have I misunderstood, and it would be necessary to Check if a variable is a string in JavaScript on req.body.foo before being safe to call its toString()?
Many thanks
Advertisement
Answer
Express will not allow custom code to sneak into req.body
. So, you don’t have to worry about malicious code in .toString()
method being on a particular property, but it is possible that an object in req.body
could have a custom property called .toString
that has a value (something other than code) as was illustrated in your learn.snky.io
example.
Keep in mind that req.body
in Express comes from parsing the body of the incoming request. That body will have been encoded according to the content-type. Most commonly, it will be applications/json
or application/x-www-form-urlencoded
and neither of these encodings has any way for the sender to include Javascript code that will become live code as part of the decoding. So, those content-types are safe from live code injection. Now, there could be code in a property as a string, but it won’t be live or executed unless you do something incorrect in how you use that data (like call eval()
on it).
And, you do NOT necessarily know what type a given property is on req.body
. You may be expecting the client to send a string for a property value, but it may be sending a number or an object (which both have different .toString()
implementations).
If you’re expecting the value to be a string, you can verify that it is a string by checking if (typeof req.body.foo === "string")
and, if it is a string, you don’t need to call .toString()
on it as it’s already a string.
Robust protection for your server will verify the type of each incoming property you intend to use in req.body
before attempting to use it and it will be very careful in how it copies information from req.body
to other objects because you can create prototype pollution if you use the wrong method for copying. And, to avoid any funky method replacements that might be on an object in req.body
, you can copy the single property to a fresh object and use it there.
My safe and simple rule is to first check the type and validate the value of any property I want to use and then copy individual named properties only from req.body
to my own objects. I never copy entire objects using functions that recursively copy all properties. That’s how you become vulnerable to prototype pollution, by blindly copying things from req.body
that aren’t the known properties you’re expecting.
If I want to copy 10 properties to another object, I will make an array of the 10 property names and use a loop to copy those 10 named properties individually, leaving behind any other potentially malicious properties (such as __proto__
that I don’t want). If I’m just using a couple properties, I will just manually assign them to a new object that I created. I will not use things like Object.assign()
to bulk copy from req.body
because that can and will copy things that I do not want polluting my own programming objects.