
How to Hash Data Without Errors (part 3)

Posts in this series:

  1. Introduction to Hash Functions
  2. The Principles of Hashing (in Python)
  3. Hash Functions for Ethereum Developers

A few weeks ago, I started a series on hash functions, and how to avoid crucial pitfalls when using them. In the last post, I used Python as an example to cover all the essentials. In this installment, I bring the same issues to the realm of smart contracts, and show how to compute hashes in Solidity and in Truffle / Node.js, so that you can properly test your smart contracts.

The Need for Hash Functions

I have talked about the possible uses of hash functions in the first post of this series, but they seem to crop up even more often when coding smart contracts. This is possibly due to the limitations of Solidity itself, which encourage programmers to be creative in their workarounds. A fine example of this is string comparison. There is no compact way to natively compare two strings in Solidity. The default solution is to convert the strings to byte arrays and iterate through them to find any difference.

A common alternative is to hash both strings, obtaining two fixed-size byte arrays, and test those instead. This is advantageous when you have long strings, since the result will always be at most 32 bytes long (currently, Solidity supports 160- and 256-bit hashes only). Not only is it quicker to iterate through the arrays, but it is also possible to convert them into uints and do a native integer comparison.
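
As a minimal sketch (my own illustration, not code from the post), such a comparison can be written like this:

function equalStrings(string memory a, string memory b) public pure returns (bool) {
    // Comparing the two digests reduces the whole test to a single
    // 32-byte equality check, regardless of how long the strings are.
    return keccak256(abi.encodePacked(a)) == keccak256(abi.encodePacked(b));
}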

Another reason to use hashes is to simulate random numbers. There is no function for that in Solidity, and indeed the whole business of using random numbers for sensitive purposes is a difficult problem, due to the ability of miners to select which transactions they put forward in a block candidate. But for the cases where this is not a concern, we can use a hash function to simulate randomness. This is similar to the Random Oracle Model in cryptography, where we pretend a hash function is just a random function.
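
As a hedged sketch of my own (safe only when nothing valuable rides on the outcome, since miners can influence all of these inputs):

function pseudoRandom(uint max) internal view returns (uint) {
    // Hash a few hard-to-repeat values and reduce the digest modulo max.
    // Miners choose and order transactions, so this is NOT secure randomness.
    return uint(keccak256(abi.encodePacked(block.timestamp, block.difficulty, msg.sender))) % max;
}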

Hash Functions in Solidity

Solidity makes three hash functions available to the programmer. The only native one is Keccak-256, which has its own EVM opcode; the other two, SHA-2 256 and RIPEMD-160, are available only through precompiled contracts. They are all equally easy to use, though, as each can be invoked by its corresponding Solidity function:

keccak256()
sha256()
ripemd160()

All of them expect an argument of type bytes memory, and return a value of type bytes32 (Keccak and SHA-2) or bytes20 (RIPEMD). This means that we cannot invoke the function without any argument, as we do in Python, to compute the hash of an empty string (remember this should be the first test, to ensure we are using the right function). Since it is not possible to pass no argument, let’s try the next best thing, which is to pass an empty array.

// Define pre-image
bytes b;
keccak256(b);
0xc5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470
sha256(b);
0xe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
ripemd160(b);
0x9c1185a5c5e9fc54612808977ee8f548b2258d31

These values can be easily verified online. Notice that the function keccak256() here is not SHA-3. Indeed, there used to be a function sha3(), which was simply an alias for keccak256(). As I covered in the last post, this is not the same as the standardized SHA-3, and so that misleading name was removed in Solidity 0.5.0.

Strings and Byte Arrays

Earlier in this series, I used the string "abc" as an example. Python could not handle it without converting to bytes first, and neither can Solidity. The corresponding encoding is [97, 98, 99]. Let’s try that:

bytes b = [97, 98, 99];
Error

Solidity does not allow this syntax. It considers the array to have three elements of type uint8, which cannot be mapped to an array of byte. Strange, because an integer of 8 bits is a byte. Let’s try another way:

bytes b;
b.push(97);
Error

This gives an even more obscure error: it complains that it can’t convert from int_const, an integer constant, to bytes1, a single byte. We can go the round-about way, for example like this:

b.push(byte(uint8(97)));

but it is so long-winded I can’t help but feel there must be a better way. There is a shorter one, but you have to do some work first: write the number in hexadecimal.

bytes b;
b.push(0x61);
b.push(0x62);
b.push(0x63);
sha256(b);
0xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
keccak256(b);
0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45

Note that 0x61 is a literal representing the value 97. Solidity accepts byte b = 0x61, but not byte b = 97. For good measure, it does not accept byte b = 0x0061 either, although all of these represent the same number. The reason is that a number has innumerable representations, obtained by padding it with zeros to its left. Although those 0s don’t change the numerical value, they do change the data: the binary string 00000010 is different from the string 10, and will have a different hash.

Therefore, when converting a number to bytes, Solidity does not presume to predict how many bytes the programmer wants. Unless, that is, we specify the number in hexadecimal and the number of hex digits matches the size of the elements of the target’s type. In the above, each position in the array b takes exactly 1 byte, that is 8 bits. The representation 0x61 stands for exactly 8 bits and so is legal, but the representation 0x0061 suggests 16 bits, and so is not.
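
A few quick checks make the rule concrete (a sketch of my own; the commented-out lines are the ones the compiler rejects):

byte a = 0x61;       // OK: 2 hex digits match the 1-byte target exactly
bytes2 b = 0x0061;   // OK: 4 hex digits match the 2-byte target
// byte c = 0x0061;  // Error: the literal suggests 16 bits
// byte d = 97;      // Error: a decimal literal has no definite width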

ABI Encoding

We now know how to pass bytes directly to a hash function, but I admit it is not very convenient when we actually want to hash a particular string. Is there an alternative? After all, in Solidity there is a very close relationship between the types string and bytes. If we try to write this directly:

string s = "abc";
sha256(s);
Error

we receive an error instead, because it is not possible to implicitly convert string to bytes. Rather helpfully, the error message suggests using the function abi.encodePacked() to first transform our data into a suitable byte array. Let’s try that:

string s = "abc";
sha256(abi.encodePacked(s));
0xba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

This is a convenient way to turn a string into bytes, but keep in mind that, unlike Python, this does not give us any chance of choosing the charset. This encoding is always UTF-8.

The above function gives us the ABI Encoding of a string. ABI stands for Application Binary Interface, and it specifies how the components of an Ethereum smart contract application communicate with each other. It defines a format to serialize a call to a specific function and its arguments, so that contracts can be invoked both from other contracts and from code sitting outside the blockchain. This encoding must therefore be unambiguous, so that arguments passed from one part of the application can be uniquely reconstructed by the receiving component. The specification is long, but I leave it to you to check the details.
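
As an aside (a sketch of my own, not from the post), the same machinery also encodes whole function calls: abi.encodeWithSignature prepends a 4-byte function selector to the encoded arguments:

function encodeTransfer(address to, uint256 amount) public pure returns (bytes memory) {
    // The result is a 4-byte selector (the first bytes of the hash of the
    // signature string) followed by the ABI encoding of the two arguments.
    return abi.encodeWithSignature("transfer(address,uint256)", to, amount);
}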

Hashing Numbers

The next step in this investigation is how to encode numbers. This is, of course, also supported by the ABI. There are two things that determine the byte representation of an integer: how many bits it occupies and the endianness. Let’s see how we can use it:

abi.encodePacked(5);
TypeError: Cannot perform packed encoding for a literal. Please convert it to an explicit type first.

We cannot pass an integer literal because it does not have any type information attached to it. ‘5’ could represent a single byte, a 32-bit C-like integer, or even a full-fledged Ethereum uint. It could also be implicitly signed, of course, and so the compiler does not know what sort of byte representation we want. We have to be more specific. One way to do so is to cast the value to a type first, either by assigning it to a typed variable or by converting it explicitly before encoding. I’ll use a simple contract to provide some examples:

Contract code:

pragma solidity ^0.5.0;

contract Test {
    function encodeInt8(int8 n) public pure returns (bytes memory) {
        return abi.encodePacked(n);
    }

    function encodeUint8(uint8 n) public pure returns (bytes memory) {
        return abi.encodePacked(n);
    }

    function encodeInt32(int32 n) public pure returns (bytes memory) {
        return abi.encodePacked(n);
    }

    function encodeUint32(uint32 n) public pure returns (bytes memory) {
        return abi.encodePacked(n);
    }

    function encodeInt(int n) public pure returns (bytes memory) {
        return abi.encodePacked(n);
    }

    function encodeUint(uint n) public pure returns (bytes memory) {
        return abi.encodePacked(n);
    }
}

Then,

encodeUint32(5)
'0x00000005'
encodeUint(5)
'0x0000000000000000000000000000000000000000000000000000000000000005'

This shows that the ABI encoding of integers is big-endian.

encodeUint8(-16)
'0xf0'

And this shows how a negative number is turned into its 2’s complement representation.

In big-endian format, the first value in the array is the most significant byte of the number, or its big end. It’s how we naturally write numbers, from left to right, with the units (the least significant digit) occupying the rightmost position. Little-endian notation may be more familiar to programmers. In this notation, the least significant bit, corresponding to the power of index 0, is in position 0 of the array, and in general, the ith bit is in position i of the array, which leads to easier code if we have to do mathematical bit manipulation.
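
A short sketch of my own shows why: with little-endian indexing, bit i carries the coefficient of 2^i, so extracting it is just a shift and a mask:

function getBit(uint256 value, uint8 i) public pure returns (uint8) {
    // Bit 0 is the least significant bit: shift it into place and mask.
    return uint8((value >> i) & 1);
}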

Coercing Values to Types

In the following examples, I invoke a contract method that defines the type of the variable I want to encode. This is first encoded at the calling site (e.g., the Truffle console) and then copied into the argument of the method. There is a bit of a cheat going on here, though. In the contract, all the methods expect to receive an argument of a well-determined type. This absolutely fixes the number of bits it can represent, but as this example shows, that is not strictly respected when calling functions from Truffle (ultimately, from JavaScript).

The function arguments are first ABI encoded into a sequence of bytes, which are then serialized and passed with the contract invocation. They are then loaded into the stack to be passed into the function, taking just as many bits as the argument type allows. This is the same as implicitly calling a conversion function, as in the following method:

function testConversions() public returns (bytes memory, bytes memory, bytes memory, int8, int8, uint8) {
   int8 a = int8(300);
   int8 b = int8(-100);
   uint8 c = uint8(-100);
 
   bytes memory b1 = encodeInt8(a);
   bytes memory b2 = encodeInt8(b);
   bytes memory b3 = encodeUint8(c);
      
   return (b1, b2, b3, a, b, c);
}

When I invoke this method, I get this:

testConversions()
[0x2c,0x9c,0x9c,44,-100,156]

Notice that the last three elements of the result have the exact same data as the first three, represented differently. The first trio is in essence a hexadecimal string (without the quotes) where each two digits specify a single unsigned byte (try it with int32 where this effect will be more evident). The second trio, on the other hand, is a decimal interpretation of those bytes, done according to the type of the corresponding variable. It is only here that the signed/unsigned variant of each type has an impact.

When I try to convert -100 to bytes, I always get the same value, 0x9c, independently of whether I cast the argument as a signed or unsigned byte. But this value is interpreted either as -100 or 156, depending on whether the most significant bit should be read as the sign bit. If the type is unsigned, then any value is treated as a positive number and there is no sign bit, which gives us 156. Otherwise, the most significant bit indicates the number is negative, and so we receive -100 instead.
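
Working out the two readings of that byte explicitly:

0x9c = 1001 1100
unsigned: 128 + 16 + 8 + 4 = 156
signed:   156 - 256        = -100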

Another Look at Negative Numbers

Conversion and interpretation of negative numbers in binary is a matter for another post, so let me just conclude this section with an observation: -100 and 156 are the same number modulo 2^8 = 256. That is, they are separated by a multiple of 256. If we had used 32 bits instead, and the value -1000, we would still have gotten -1000 as one of the results, and 4294966296 as the other. Again, they are separated by the power of 2 corresponding to the number of bits in the representation, that is, 2^32.

This is not a coincidence. When converting a sequence of bytes to just n bits, only the n least significant bits are used, and the rest discarded. Mathematically, if we start with a value v the result is only v mod 2^n. At the time of converting to a decimal, if the type is unsigned the number is represented as belonging in the set [0, 2^n - 1]. Otherwise, it is represented as being a member of the set [-2^(n-1), 2^(n-1) - 1]. For example, with 8 bits, we can either represent integers from 0 to 255, or from -128 to 127.

At any rate, though, their bit representation is exactly the same, as shown by the hexadecimal strings above.

Hashing Multiple Values

The previous two sections showed that we can use ABI encoding to convert strings and numbers to bytes. In fact, we can do that for any data structure, and we can combine several of these in the same call. Imagine we have a shop selling various products of different lines. We have devised a scheme of unique identifiers for each article we sell, composed of one or two numbers (category and optional sub-category) and a string (brand and model). We want to be as economical as possible, and so we use as few bytes as we can to encode each of these elements. We can use the function abi.encodePacked to discard as many redundant bytes as possible, for example:

bytes32 r1 = sha256(abi.encodePacked(int16(356), "AB"));
bytes32 r2 = sha256(abi.encodePacked(int8(1), int8(100), "AB"));
bytes32 r3 = sha256(abi.encodePacked(int8(1), "dAB"));

0xc74d7c150304b8ac70727aa0b5ab680c2a582e93d0e2558faaf560ec36de277c
0xc74d7c150304b8ac70727aa0b5ab680c2a582e93d0e2558faaf560ec36de277c
0xc74d7c150304b8ac70727aa0b5ab680c2a582e93d0e2558faaf560ec36de277c

Notice what I did here. I created a collision on the hash without much trouble, for three different encodings:

356-AB
1-100-AB
1-dAB

A Bit of Coding Theory

No need to panic: this is not a vulnerability of the hash function. The problem is that I deliberately set the arguments in such a way that they encode to the same bytes. This is, after all, a problem in my encoding.

Although each individual piece of encoded data is uniquely decodable, there are chains of codewords that result in the same encoding. This is an error we must avoid. As an illustration, suppose we have the following simple code:

0 --> 0
1 --> 01
2 --> 10
3 --> 11
4 --> 110
5 --> 111

Each word can be uniquely encoded, but the code itself is ambiguous. For example, the following string can be decoded in different ways:

10110010110:
------------
10 110 01 01 10: 24112
10 11 0 0 10 110: 230024

The ABI Encoding does ensure that an encoded text is uniquely decodable. After all, you wouldn’t like the array [2,4,1,1,2] turning into [2,3,0,0,2,4] when invoking a smart contract. To access that encoding, you can use abi.encode() instead of abi.encodePacked(). The problem, of course, is that if you want your hashes to work cross-platform, you now have to reproduce this specific encoding elsewhere as well. It’s either that or you define your own encoding, but at the end of the day, remember that it is your responsibility to enforce unique decodability, to avoid collisions, before hashing. For more on ABI encoding, you can check these links.
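
Revisiting the shop example with abi.encode() (a sketch; I omit the resulting hashes, but the three encodings are now distinct):

bytes32 r1 = sha256(abi.encode(int16(356), "AB"));
bytes32 r2 = sha256(abi.encode(int8(1), int8(100), "AB"));
bytes32 r3 = sha256(abi.encode(int8(1), "dAB"));
// Each integer is now padded to a full 32 bytes and each string carries an
// explicit length, so the three byte sequences (and their hashes) differ.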

If you don’t like that encoding, there are two simple ways of ensuring that a code is uniquely decodable:

  • make all the codewords the same length
  • prefix each codeword with an encoding of its length

Fixed-Length Code Example

Under the first strategy, the code above would become:

0 --> 000
1 --> 001
2 --> 010
3 --> 011
4 --> 110
5 --> 111

Now, the corresponding coded text would be:

 010011000000010110 (230024)

Since all the codewords have the same length, there is a single way in which this message can be broken into individual symbols.

Self-Delimiting Code Example

Self-delimiting codes are very important in Algorithmic Information Theory because they allow any string to represent a unique program, without ambiguity. This is a fascinating subject, based on the central concept of Kolmogorov Complexity. The all-encompassing reference is this book, but I warn you, it is a hard read and I only recommend it if you want to do research in this field.

Under the second strategy, we prefix a codeword by a string that encodes its length. The decoder first reads this length, and then takes exactly the prescribed number of bits for the next codeword, which means it is never ambiguous where each element of the encoded string begins and ends.

We have, of course, to define an encoding for the length as well, otherwise we won’t be able to tell where the prefix ends and the codeword itself begins. As a first step, we would turn the code into something like this:

0 --> 1 0
1 --> 2 01
2 --> 2 10
3 --> 2 11
4 --> 3 110
5 --> 3 111

but since we can’t use non-binary symbols, in binary this becomes:

0 --> 1 0
1 --> 10 01
2 --> 10 10
3 --> 10 11
4 --> 11 110
5 --> 11 111

The standard technique to encode the prefix is to double its bits, with a special rule for the last one:

  • every bit before the last one is doubled: 1 is encoded as 11, 0 is encoded as 00.
  • the last bit is encoded by appending the opposite of itself: 1 is encoded as 10 and 0 is encoded as 01.
0 --> 100
1 --> 110101
2 --> 110110
3 --> 110111
4 --> 1110110
5 --> 1110111
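
To see that this scheme really is self-delimiting, here is a minimal decoding sketch of my own (it assumes the input is given as an array of bits, one element per bit):

function decodeOne(uint8[] memory bits, uint pos) public pure returns (uint len, uint next) {
    // Read the doubled-bit prefix: an equal pair (00/11) contributes a
    // length bit and continues; an unequal pair (10/01) contributes the
    // last length bit and stops.
    while (true) {
        uint8 first = bits[pos];
        uint8 second = bits[pos + 1];
        pos += 2;
        len = len * 2 + first;
        if (first != second) break;
    }
    // The codeword itself occupies exactly the next len bits.
    next = pos + len;
}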

If you’re worrying that this has exploded the size of the codewords, you are right, but that is because our initial codewords were too small in the first place. For a codeword of 100 bits, the encoded length would add 14 bits; and if the encoded part were a string of, say, 250 bytes, that is 2000 bits, the added length encoding would be only 22 bits, about 1%. The point is that for a codeword that is n bits long, the added prefix is about 2 log n bits, which is exponentially smaller than n itself.

At any rate, it does not matter which encoding you choose, as long as you correctly implement it in all parts of the application and it allows for a unique decoding.

Hash Functions in Web3.js

If you’re writing Solidity smart contracts, you most likely will want to test them, and a great tool for that is Truffle. Truffle runs on Node and makes the web3.js module available to the programmer. In this section, I explore how we can use the hash functions it provides.

In the latest version of web3.js (1.2.0) there are two hash functions: sha3() and soliditySha3(). You also have the function keccak256(), which is referred to in the documentation as an alias of sha3(). This is an unfortunate naming choice, as I have mentioned above: sha3() and keccak256() are not synonyms, although the history is confusing.

Keccak-256 is a specific configuration of the Keccak family of algorithms, which was chosen as the winner of the SHA-3 competition, so it should be fair to say that SHA-3 is Keccak. The problem is that the keccak256() function in Ethereum and other tools was developed before the final SHA-3 was standardized, and the standard ended up with a different padding rule, meaning we have two different algorithms.

Accessing Hash Functions

To demonstrate the use of these functions, install truffle if you don’t have it (follow instructions here, or maybe see this post for a broader setup), and start it with truffle develop. If you type web3, you’ll get a listing of all the objects and functions at your disposal, but you can get a more compact list with Object.keys(web3). Let’s run a few test cases with these hash functions, which all live in web3.utils.

Empty string:
-------------
web3.utils.sha3()
C:\Users\user\AppData\Roaming\npm\node_modules\truffle\build\cli.bundled.js:46884
if (str.slice(0, 2) === "0x") {
^
TypeError: Cannot read property 'slice' of undefined
[...]

web3.utils.sha3("0x")
null

This is rather surprising. The empty string test is the first one I rely on to verify I am using the right hash function. Removing this option looks like a strange decision to me. On the other hand, it seems these functions can understand string inputs, so I turn to the next best case: let’s hash a simple string.

Hashing Strings in Truffle

As I explain in part 2, strings should be converted to bytes first, which implies choosing a string encoding. Picking a character in the ASCII set should be safe, as western-script charsets tend to match ASCII in their first half.

web3.utils.sha3("a")
'0x3ac225168df54212a25c1c01fd35bebfea408fdac2e31ddd6f80a4bbf9a5f1cb'

If we try the same string with Python, though, we get:

hashlib.sha3_256(b"a").hexdigest()
'80084bf2fba02475726feb2cab2d8215eab14bc6bdd8bfb2c8151257032ecd8b'

This confirms that sha3() is the wrong name for this function, as it does not conform to the standard the way Python’s hashlib does. Therefore, I will carry on with the examples using the more accurate alias keccak256. To confirm we have the right function, we can use an online generator and enter the same input a, to get:

3ac225168df54212a25c1c01fd35bebfea408fdac2e31ddd6f80a4bbf9a5f1cb

You could argue against my methodology, and claim I have jumped to conclusions too fast. Maybe passing a string to a hash function in Truffle behaves differently from Python, and it is not treating it as an array of bytes. The following example should put that to rest:

web3.utils.keccak256([97])
'0x3ac225168df54212a25c1c01fd35bebfea408fdac2e31ddd6f80a4bbf9a5f1cb'

which is the same result as above. Let’s try another classic:

web3.utils.keccak256([97,98,99])
'0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45'
web3.utils.keccak256("abc")
'0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45'

So we can see that Truffle converts strings to bytes before hashing. What that does not tell us is what encoding it uses, nor how we can choose a different one. The situation is more confusing in JavaScript than in Python, and I recommend a great blog post by Kevin Burke explaining it.

String Encoding in Truffle

For example, you can transform a JavaScript string into a Buffer (a byte array) with

Buffer.from("action")
<Buffer 61 63 74 69 6f 6e>

and you can output the contents of the buffer in a decimal string like this:

Buffer.from("action").join(", ")
'97, 99, 116, 105, 111, 110'

You can specify an encoding by passing a second argument to the from function, but the argument specification here differs from Python. I’ll invoke my Python example of the previous post for comparison:

String: "acção"

Python:

Default: [97, 99, 195, 167, 195, 163, 111]
UTF8   : [97, 99, 195, 167, 195, 163, 111]
UTF16  : [255, 254, 97, 0, 99, 0, 231, 0, 227, 0, 111, 0]
ASCII  : can't convert
LATIN1 : [97, 99, 231, 227, 111]

Node.js:

Default: '97, 99, 195, 167, 195, 163, 111'
UTF8   : '97, 99, 195, 167, 195, 163, 111'
UTF16LE: '97, 0, 99, 0, 231, 0, 227, 0, 111, 0'
ASCII  : '97, 99, 231, 227, 111'
LATIN1 : '97, 99, 231, 227, 111'

Once we have a firm way to convert strings to bytes, all I said in the previous post applies here, and you can once again compute the hash with

web3.utils.keccak256(Buffer.from("abc", "utf8"))
'0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45'

The Odd Case of SoliditySha3()

If you try soliditySha3(), you get different results:

web3.utils.soliditySha3(Buffer.from("abc","utf8"))
'0x0fe227e19a6809146a0a4841392cbe27febf897d48635ee8e7987ee9ef7fbdc0'

And yet, if we choose the simpler versions,

web3.utils.keccak256("abc")
'0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45'
web3.utils.soliditySha3("abc")
'0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45'

There’s something confusingly fishy here. The problem is that soliditySha3() is doing too much. The documentation says it computes the hash (Keccak-256, mind you, not SHA-3) in the way Solidity would, by ABI encoding and then tightly packing the bytes before hashing. This refers to the behavior before Solidity 0.5.0, when the hash functions could accept a list of several parameters and be flexible about their types.

Indeed, before 0.5.0, Solidity’s keccak256() or sha3() function would first call abi.encodePacked on its arguments and then hash. According to the Solidity 0.4.24 documentation, for example, these are all the same:

keccak256("ab", "c")
keccak256("abc")
keccak256(0x616263)
keccak256(6382179)
keccak256(97, 98, 99)
0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45

We can reasonably infer that web3.utils.soliditySha3("abc") is correctly encoding "abc" to the bytes [97,98,99], but why is it doing something different with a Buffer containing the same elements?

Well, soliditySha3 is too clever by half, and tries to detect the type of its arguments and properly encode them. It is possible to pass an object indicating the type and value, but if not, the function will handle strings, numbers (including BNs), hex strings and hex literals. I list some examples from the documentation, but you should definitely check the others in there:

web3.utils.soliditySha3('Hello!%'); // auto detects: string
"0x661136a4267dba9ccdf6bfddb7c00e714de936674c4bdb065a531cf1cb15c7fc"
web3.utils.soliditySha3('234'); // auto detects: uint256
"0x61c831beab28d67d1bb40b5ae1a11e2757fa842f031a2d0bc94a7867bc5d26c2"
web3.utils.soliditySha3(0xea); // same as above
"0x61c831beab28d67d1bb40b5ae1a11e2757fa842f031a2d0bc94a7867bc5d26c2"
web3.utils.soliditySha3(new BN('234')); // same as above
"0x61c831beab28d67d1bb40b5ae1a11e2757fa842f031a2d0bc94a7867bc5d26c2"
web3.utils.soliditySha3(234); // same as above
"0x61c831beab28d67d1bb40b5ae1a11e2757fa842f031a2d0bc94a7867bc5d26c2"
web3.utils.soliditySha3({type: 'uint256', value: '234'}); // same as above
"0x61c831beab28d67d1bb40b5ae1a11e2757fa842f031a2d0bc94a7867bc5d26c2"

Notice that all cases but the first are equivalent representations of the same number, and that even when that number is expressed as a single byte (0xea) it is still treated as a 256 bit integer (unlike Solidity, which specifically treats 0xea as represented by 8 bits, and 0x00ea as represented by 16 bits). Notice also that this function does not have a problem in hashing numbers directly. Other examples show how the same argument, a hexadecimal string, can generate a different hash if we specify that its type is bytes32 instead of bytes or address.

So soliditySha3() can handle simple cases in a sensible way. When it finds a complex type, though, it tries to convert it to something it understands: it calls the function web3.utils.toHex() and then feeds the result to the keccak256() function. In this case, we get:

b = Buffer.from("abc", "utf8")
<Buffer 61 62 63>
web3.utils.keccak256(b)
'0x4e03657aea45a94fc7d47ba826c8d667c0d1e6e33a64a036ec44f58fa12d6c45'
web3.utils.soliditySha3(b)
'0x0fe227e19a6809146a0a4841392cbe27febf897d48635ee8e7987ee9ef7fbdc0'
hexb = web3.utils.toHex(b)
'0x7b2274797065223a22427566666572222c2264617461223a5b39372c39382c39395d7d'
web3.utils.soliditySha3(hexb)
'0x0fe227e19a6809146a0a4841392cbe27febf897d48635ee8e7987ee9ef7fbdc0'
web3.utils.keccak256(hexb)
'0x0fe227e19a6809146a0a4841392cbe27febf897d48635ee8e7987ee9ef7fbdc0'

which finally shows where the mismatch between keccak256() and soliditySha3() came from.

Final Remarks

I am at the end of this long exploration of hashes, but let me just summarize. Hash functions are unforgiving. If you change a single bit, the final value will be wildly different and you won’t have a clue about what went wrong. Therefore, you want to be absolutely sure of what you feed to the hash function. Do not rely on clever functions that try to massage your input into a byte array in some unpredictable way. In particular, do not use soliditySha3(), which has lots of exceptions. That is asking for maintenance hell, because definitions may change (they have changed!) and then the Node.js and Solidity functions no longer match.

It is far safer to create your own byte arrays instead, or be absolutely sure you know how your raw data gets translated into bytes. Then, pick a function with the right name that accurately describes what it is doing. If you’re using Ethereum’s Keccak256, don’t use a function called sha3 if you can avoid it (and if you can’t, create one properly named and use that instead).

It is hard enough to use hash functions without falling into some pitfall; let’s not create more pitfalls by using confusing functions.

Take care with your hashes in your explorations, and see you again some time soon.
