Different Behaviour Between Same Implementations Of TensorFlow And Keras
Solution 1:
After carefully examining your implementations, I observed that all the hyperparameters match except for the batch size. I disagree with the answer from @Ultraviolet, because the default kernel_initializer of tf.layers.conv2d is also Xavier (see the TF implementation of conv2d).
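Since both defaults are Xavier (Glorot uniform), here is a minimal NumPy sketch of what that initializer actually draws. The 3x3 kernel with 1 input channel and 32 filters is a hypothetical example shape, not taken from the question:

```python
import numpy as np

# Sketch of Xavier/Glorot uniform initialization, the default for both
# tf.layers.conv2d and Keras Conv2D. Shapes below are illustrative.
def glorot_uniform(fan_in, fan_out, shape, rng):
    # Weights are drawn from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
# For a (3, 3, 1, 32) kernel: fan_in = 3*3*1 = 9, fan_out = 3*3*32 = 288
w = glorot_uniform(9, 288, shape=(3, 3, 1, 32), rng=rng)
```

Because the bound depends on fan_in + fan_out, the same layer shape yields the same weight scale in both frameworks, so the initializer cannot explain the differing curves.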
The learning curves don't match for the following two reasons:
1. The parameters of the Keras implementation (version 2) receive many more updates than those of the TF implementation (version 1). In version 1, you feed the full dataset into the network at once in each epoch, which results in only 30 Adam updates. In contrast, version 2 performs 30 * ceil(len(training_label_data) / batch_size) Adam updates, with batch_size=4.
2. The updates of version 2 are noisier than those of version 1, because each gradient is averaged over fewer samples.
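To make both points concrete, here is a small sketch. The dataset size of 1000 is a hypothetical stand-in for len(training_label_data), which the question does not state:

```python
import math

# Hypothetical dataset size; stands in for len(training_label_data).
num_samples, epochs, batch_size = 1000, 30, 4

# Version 1 (full batch): one Adam update per epoch.
updates_v1 = epochs                                        # 30 updates
# Version 2 (mini-batches): one Adam update per batch.
updates_v2 = epochs * math.ceil(num_samples / batch_size)  # 250 batches/epoch

# Noise: a gradient averaged over n samples has standard deviation
# roughly sigma / sqrt(n), so batch_size=4 gradients are about
# sqrt(num_samples / batch_size) times noisier than full-batch gradients.
sigma = 1.0  # per-sample gradient std, illustrative value
noise_v1 = sigma / math.sqrt(num_samples)
noise_v2 = sigma / math.sqrt(batch_size)
```

Under these assumed numbers, version 2 performs 250x more updates per epoch, each with a substantially noisier gradient, which accounts for both the faster descent and the jitter in its learning curve.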
Solution 2:
I didn't notice any difference between your two implementations. Assuming there is none, I think:

First, they start at different initial losses, which suggests that the graphs are initialized differently. You didn't mention any initializer, and looking into the documentation (tensorflow Conv2D, Keras Conv2D) I found that the default initializers are different: tensorflow uses no initializer, whereas Keras uses the Xavier initializer.

Second (this is my assumption), the tensorflow loss decreases very sharply at first but later barely improves compared to the Keras one. Because the designed network is neither very robust nor very deep, the bad initialization causes tensorflow to get stuck in a local minimum.

Third, there may be some small differences between the two because the default parameters may vary. Generally, wrapper frameworks try to handle sensible default parameters so that we need fewer tweaks to reach the optimal weights.
I have used the FastAI framework (based on pytorch) and the Keras framework for the same classification problem with the same VGG network, and got a significant improvement with FastAI, because its default parameters were recently tuned to the latest best practices.
Edit:
I failed to notice that the batch size was different, which is one of the most important hyperparameters here. @rvinas made this clear in his answer.