Error In Joblib.load When Reading File From S3
When I try to read a file from S3 with joblib.load(), I get the error ValueError: embedded null byte. The files were created by joblib and can be su
Solution 1:
The following code builds a copy of the file in memory before passing it to joblib.load(), which lets the load succeed.
from io import BytesIO

import boto3
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

s3 = boto3.resource('s3')
bucket_str = "my-aws-bucket"
bucket_key = "some-pseudo/folder-set/my-filename.joblib"

with BytesIO() as data:
    # stream the S3 object into the in-memory buffer
    s3.Bucket(bucket_str).download_fileobj(bucket_key, data)
    data.seek(0)  # move back to the beginning after writing
    df = joblib.load(data)
I assume, but am not certain, that something in how boto3 chunks files for download introduces a null byte that breaks joblib, and that BytesIO fixes this before joblib.load() sees the data stream.
PS: With this method the file never touches the local disk, which is helpful in some circumstances (e.g. a node with plenty of RAM but very little disk space).
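If you need this in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: load_joblib_from_s3, FakeBucket and dump_to_bytes are names I made up for illustration, and the helper assumes that bucket behaves like a boto3 Bucket resource, i.e. it exposes download_fileobj(key, fileobj). The fake bucket is there so the round trip can be tried without AWS credentials.

```python
from io import BytesIO

import joblib


def load_joblib_from_s3(bucket, key):
    """Download `key` from `bucket` into memory and load it with joblib.

    Assumption: `bucket` is a boto3 Bucket resource, for example
    boto3.resource("s3").Bucket("my-aws-bucket"), or any object that
    exposes download_fileobj(key, fileobj).
    """
    with BytesIO() as buffer:
        bucket.download_fileobj(key, buffer)
        buffer.seek(0)  # rewind so joblib reads from the first byte
        return joblib.load(buffer)


class FakeBucket:
    """Hypothetical stand-in for a boto3 Bucket, for offline checks only."""

    def __init__(self, objects):
        self._objects = objects  # maps key -> raw bytes

    def download_fileobj(self, key, fileobj):
        # Mimic boto3 by writing the stored bytes into the given file object.
        fileobj.write(self._objects[key])


def dump_to_bytes(obj):
    """Serialize `obj` with joblib into an in-memory bytes string."""
    buffer = BytesIO()
    joblib.dump(obj, buffer)
    return buffer.getvalue()
```

With real S3 you would call it as load_joblib_from_s3(boto3.resource("s3").Bucket("my-aws-bucket"), "some-pseudo/folder-set/my-filename.joblib"). The helper returns from inside the with block, which is fine here because joblib.load() has finished reading the buffer by the time it is closed.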