
Google Cloud Storage + Python : Any Way To List Obj In Certain Folder In Gcs?

I'm going to write a Python program to check if a file is in certain folder of my Google Cloud Storage, the basic idea is to get the list of all objects in a folder, a file name li

Solution 1:

Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). For the newer library, the equivalent to the below code is:

from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
  print(str(blob))

Answer for older client follows.

You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:

import json

from googleapiclient import discovery  # formerly importable as `apiclient`

# Auth goes here if necessary. Create authorized http object...
client = discovery.build('storage', 'v1') # add http=whatever param if auth
request = client.objects().list(
    bucket="mybucket",
    prefix="abc/myfolder")
# Each response is one page of results; list_next() returns None after the last page.
while request is not None:
  response = request.execute()
  print(json.dumps(response, indent=2))
  request = client.objects().list_next(request, response)

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list

And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/

Solution 2:

This worked for me:

from google.cloud import storage

client = storage.Client()
BUCKET_NAME = 'DEMO_BUCKET'
bucket = client.get_bucket(BUCKET_NAME)

blobs = bucket.list_blobs()

for blob in blobs:
    print(blob.name)

The list_blobs() method will return an iterator used to find blobs in the bucket. Now you can iterate over blobs and access every object in the bucket. In this example I just print out the name of the object.
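
To answer the original question directly (is a given file in a given folder?), the iterator can be fed to `any()`. The following is a minimal sketch and not part of the original answer: `file_in_folder` and the stand-in blob objects are hypothetical names for illustration, and the only thing assumed about each blob is that it has a `name` attribute, as the blobs returned by `list_blobs()` do.

```python
from types import SimpleNamespace


def file_in_folder(blobs, folder, filename):
    """Return True if `folder/filename` is among the listed blobs.

    `blobs` is any iterable of objects with a `.name` attribute, for
    example the iterator returned by bucket.list_blobs(prefix=...).
    """
    # Object names are flat strings; a "folder" is just a name prefix
    # that ends with a slash.
    target = folder.rstrip('/') + '/' + filename
    return any(blob.name == target for blob in blobs)


# Stand-in objects so the sketch runs without credentials or network access.
fake_blobs = [
    SimpleNamespace(name='abc/myfolder/'),
    SimpleNamespace(name='abc/myfolder/report.csv'),
    SimpleNamespace(name='abc/myfolder2/other.csv'),
]

print(file_in_folder(fake_blobs, 'abc/myfolder', 'report.csv'))  # True
print(file_in_folder(fake_blobs, 'abc/myfolder', 'other.csv'))   # False
```

With a real bucket, the call would look like `file_in_folder(bucket.list_blobs(prefix='abc/myfolder/'), 'abc/myfolder', 'report.csv')`. Passing a prefix with a trailing slash keeps objects under `abc/myfolder2/` out of the listing and avoids walking the whole bucket.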

This documentation helped me a lot:

I hope I could help!

Solution 3:

You might also want to look at gcloud-python and its documentation.

from gcloud import storage
connection = storage.get_connection(project_name, email, private_key_path)
bucket = connection.get_bucket('my-bucket')

for key in bucket:
  if key.name == 'abc.txt':
    print('Found it!')
    break

However, you might be better off just checking if the file exists:

if 'abc.txt' in bucket:
  print('Found it!')

Solution 4:

Install the google-cloud-storage Python package with pip (or through PyCharm) and use the code below:

from google.cloud import storage
client = storage.Client()
for blob in client.list_blobs(BUCKET_NAME, prefix=FOLDER_NAME):
  print(str(blob))

Solution 5:

I know this is an old question, but I stumbled over this because I was looking for the exact same answer. Answers from Brandon Yarbrough and Abhijit worked for me, but I wanted to get into more detail.

When you run this:

from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))

You will get Blob objects carrying only the name field, one for every object under the given prefix, like this:

[<Blob: BUCKET_NAME, PREFIX, None>, 
 <Blob: xml-BUCKET_NAME, [PREFIX]claim_757325.json, None>, 
 <Blob: xml-BUCKET_NAME, [PREFIX]claim_757390.json, None>,
 ...]

If you are like me and want to 1) filter out the first item in the list, because it does NOT represent a file (it's just the prefix), 2) get just the name as a string, and 3) remove the PREFIX from the file name, you can do something like this:

blob_names = [blob_name.name[len(PREFIX):] for blob_name in blobs if blob_name.name != PREFIX]

Complete code to get just the file names, as strings, from a storage bucket:

from google.cloud import storage
storage_client = storage.Client()
blobs = list(storage_client.list_blobs(bucket_name, prefix=PREFIX, fields="items(name)"))
blob_names = [blob_name.name[len(PREFIX):] for blob_name in blobs if blob_name.name != PREFIX]
print(f"blob_names = {blob_names}")
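
To see what the filter and the slice do, here is the same comprehension run on made-up names. The sample names and the `strip_prefix` helper are illustrative assumptions, not output from a real bucket. The helper only needs objects with a `name` attribute, so no client is required to try it.

```python
from types import SimpleNamespace

PREFIX = 'claims/'  # assumed to end with a slash


def strip_prefix(blobs, prefix):
    """Return names relative to `prefix`, skipping the placeholder object
    whose name is the prefix itself."""
    return [blob.name[len(prefix):] for blob in blobs if blob.name != prefix]


sample_blobs = [
    SimpleNamespace(name='claims/'),                   # folder placeholder
    SimpleNamespace(name='claims/claim_757325.json'),
    SimpleNamespace(name='claims/claim_757390.json'),
]

print(strip_prefix(sample_blobs, PREFIX))
# ['claim_757325.json', 'claim_757390.json']
```

If the listing contains no placeholder entry, the `!= prefix` check simply matches nothing and every name is kept.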
