
Different Behaviour Between Same Implementations Of TensorFlow And Keras

I have TensorFlow 1.9 and Keras 2.0.8 on my machine. When training a neural network with some toy data, the resulting training curves are very different between TensorFlow and Keras.

Solution 1:

After carefully examining your implementations, I observed that all the hyperparameters match except for the batch size. I don't agree with the answer from @Ultraviolet, because the default kernel_initializer of tf.layers.conv2d is also Xavier (see the TF implementation of conv2d).
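To see this, here is a minimal sketch (assuming TensorFlow 1.x; the layer name and input shape are purely illustrative) showing that a tf.layers.conv2d kernel left with the default initializer stays within the Glorot/Xavier uniform bounds:

    # A minimal sketch, assuming TensorFlow 1.x: with kernel_initializer left
    # as None, tf.layers.conv2d falls back to tf.get_variable's default
    # initializer, which is Glorot (Xavier) uniform.
    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
    tf.layers.conv2d(x, filters=8, kernel_size=3, name='conv')

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        kernel = sess.run(tf.get_default_graph().get_tensor_by_name('conv/kernel:0'))

    # Glorot uniform draws from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)); here fan_in = 3*3*1, fan_out = 3*3*8.
    limit = np.sqrt(6.0 / (3 * 3 * 1 + 3 * 3 * 8))
    print(kernel.min() >= -limit, kernel.max() <= limit)  # True True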

The learning curves don't match for the following two reasons:

  1. The parameters of the Keras implementation (version 2) receive many more updates than those of the TF implementation (version 1). In version 1, you're feeding the entire dataset into the network at once in each epoch, which results in only 30 Adam updates. In contrast, version 2 performs 30 * ceil(len(training_label_data)/batch_size) Adam updates, with batch_size=4 (see the sketch after this list).

  2. The updates of version 2 are noisier than those of version 1, because each gradient is averaged over fewer samples.
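As a rough illustration of the update-count arithmetic (the training-set size below is hypothetical; the epochs and batch size come from the question):

    # Contrasting the number of Adam updates in the two versions:
    # version 1 feeds the full dataset each epoch (one update per epoch),
    # while version 2 uses mini-batches.
    import math

    epochs = 30
    n_samples = 1000   # hypothetical; stands in for len(training_label_data)
    batch_size = 4

    updates_v1 = epochs                                      # full batch: 1 update per epoch
    updates_v2 = epochs * math.ceil(n_samples / batch_size)  # mini-batches: many more updates

    print(updates_v1, updates_v2)  # 30 7500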


Solution 2:

I didn't notice any difference between your two implementations. Assuming there is none, I think:

  • First, they start at different initial losses, which suggests that the graphs are initialized differently. Since you didn't specify any initializer, I looked into the documentation (tensorflow Conv2D, Keras Conv2D) and found that the default initializers are different.

    tensorflow uses no initializer, whereas Keras uses the Xavier initializer (see the sketch after this list for pinning the same initializer in both frameworks).

  • Second (this is my assumption): the tensorflow loss decreases very sharply at first but later doesn't decrease much compared to the Keras one. Since the network is neither very robust nor very deep, tensorflow may have fallen into a local minimum because of the bad initialization.

  • Third, there may be small differences between the two because the default parameters can vary. Wrapper frameworks generally try to handle sensible defaults so that fewer tweaks are needed to reach optimal weights.
    I have used the FastAI framework (based on PyTorch) and the Keras framework for the same classification problem with the same VGG network, and got a significant improvement with FastAI, because its default parameters have recently been tuned to the latest best practices.
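To rule initialization out as a confounder, here is a minimal sketch (assuming TensorFlow 1.x and standalone Keras; filter counts and input shape are illustrative) that pins the same Xavier/Glorot uniform initializer, with a fixed seed, in both frameworks:

    # Pin the same Xavier/Glorot uniform initializer explicitly in both
    # frameworks so any remaining divergence cannot come from different
    # default initializers. A fixed seed makes the draws reproducible.
    import tensorflow as tf
    from keras.layers import Conv2D
    from keras.initializers import glorot_uniform

    seed = 42

    # TensorFlow layer with an explicit Xavier initializer
    x = tf.placeholder(tf.float32, shape=(None, 28, 28, 1))
    tf_conv = tf.layers.conv2d(
        x, filters=8, kernel_size=3,
        kernel_initializer=tf.glorot_uniform_initializer(seed=seed))

    # Keras layer with the same initializer pinned explicitly
    keras_conv = Conv2D(8, 3, kernel_initializer=glorot_uniform(seed=seed))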

Edit:

I failed to notice that the batch size was different, which is one of the most important hyperparameters here. @rvinas made this clear in his answer.

