Python Write To Hdfs File
Solution 1:
Try the hdfs library. It's really good, and you can use its write() method.
To create a connection:
from hdfs import InsecureClient
client = InsecureClient('http://host:port', user='ann')
from json import dump, dumps
records = [
    {'name': 'foo', 'weight': 1},
    {'name': 'bar', 'weight': 2},
]
# As a context manager:
with client.write('data/records.jsonl', encoding='utf-8') as writer:
    dump(records, writer)
# Or, passing the serialized data in directly (overwrite=True because the file now exists):
client.write('data/records.jsonl', data=dumps(records), encoding='utf-8', overwrite=True)
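The target file above is named records.jsonl, but dump(records, writer) produces a single JSON array rather than one object per line. If line-delimited output is what you want, something like the sketch below might do. The helper name write_jsonl is made up for illustration, and io.StringIO stands in for the HDFS writer so the snippet runs without a cluster.

```python
import io
import json


def write_jsonl(records, writer):
    """Write each record as one JSON object per line to a text-mode writer."""
    for record in records:
        writer.write(json.dumps(record) + "\n")


records = [
    {'name': 'foo', 'weight': 1},
    {'name': 'bar', 'weight': 2},
]

# io.StringIO is only a stand-in for the writer returned by client.write(...).
buffer = io.StringIO()
write_jsonl(records, buffer)
print(buffer.getvalue(), end="")
```

With the client from above, the same helper should be usable inside the context manager, i.e. calling write_jsonl(records, writer) in place of dump(records, writer).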
For CSV you can do:
import pandas as pd
df = pd.read_csv("file.csv")
with client.write('path/output.csv', encoding='utf-8') as writer:
    df.to_csv(writer)
Solution 2:
What's wrong with other answers
They use WebHDFS, which is not enabled by default and is insecure without Kerberos or Apache Knox.
This is what the upload function of that hdfs library you linked to uses.
Native (more secure) ways to write to HDFS using Python
You can use pyspark
Example - How to write pyspark dataframe to HDFS and then how to read it back into dataframe?
has been mentioned, but it doesn't write files
has a function that should be able to write to HDFS as well, though I've not tried.
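For the pyspark route, a rough sketch of writing a DataFrame to HDFS and reading it back could look like this. The helper name, the app name, and the hdfs://namenode:8020/... path are placeholders, not values from this thread; Parquet is just one possible format. The pyspark import sits inside main() so the helper itself has no import-time dependency on Spark.

```python
def write_and_read_back(spark, rows, columns, path):
    """Write rows to `path` as Parquet, then load them back as a DataFrame."""
    df = spark.createDataFrame(rows, columns)
    # mode("overwrite") replaces any existing output at the same path.
    df.write.mode("overwrite").parquet(path)
    return spark.read.parquet(path)


def main():
    # Imported here so the helper above can be imported without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-write-example").getOrCreate()
    result = write_and_read_back(
        spark,
        [("foo", 1), ("bar", 2)],
        ["name", "weight"],
        "hdfs://namenode:8020/tmp/records.parquet",  # placeholder namenode and path
    )
    result.show()
    spark.stop()
```

Call main() on a machine where Spark is configured to reach your cluster; the namenode address depends on your setup.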
Solution 3:
Without a dedicated HDFS library, you can simply use the requests package in Python to talk to HDFS:
import requests
from json import dumps
params = (
    ('op', 'CREATE'),
)
data = dumps(file)  # some file or object - also tested with the pickle library
response = requests.put('http://host:port/path', params=params, data=data)
If the response status is in the 2xx range (WebHDFS answers a successful CREATE with 201 Created), the write went through! This technique lets you use all the utilities offered by Hadoop's RESTful API: ls, mkdir, get, post, etc.
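To make the other operations concrete, here is a small sketch that builds WebHDFS URLs of the form http://host:port/webhdfs/v1/<path>?op=... for a few common calls. The helper name webhdfs_url and the namenode:9870 address are placeholders of mine; only the URL layout and the op names (LISTSTATUS, MKDIRS, OPEN, CREATE) come from the WebHDFS REST API.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base, path, op, **params):
    """Build a WebHDFS REST URL for `op` on `path` with optional query parameters."""
    query = urlencode({"op": op, **params})
    return f"{base.rstrip('/')}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Placeholder namenode address and user; adjust for your cluster.
BASE = "http://namenode:9870"

# ls: send with HTTP GET
print(webhdfs_url(BASE, "/data", "LISTSTATUS", **{"user.name": "ann"}))
# mkdir: send with HTTP PUT
print(webhdfs_url(BASE, "/data/new", "MKDIRS"))
# read a file: send with HTTP GET
print(webhdfs_url(BASE, "/data/records.jsonl", "OPEN"))
# write a file: send with HTTP PUT, as in the requests.put call above
print(webhdfs_url(BASE, "/data/records.jsonl", "CREATE", overwrite="true"))
```

Each URL can then be handed to requests.get or requests.put, depending on the operation.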
You can also convert curl commands to Python with these:
- Get Command for HDFS:
- Convert to python:
Hope this helps!