
Partitioning A Table

BigQuery allows partitioning only by date at this time. Let's suppose I have a table with 1 billion rows and an inserted_timestamp field, and that this field holds dates spanning the past year. What is the best way to move this existing data into the corresponding daily partitions?

Solution 1:

All of the functionality necessary to do this exists in Beam, although it may currently be limited to the Java SDK.

You would use BigQueryIO. Specifically, you may use DynamicDestinations to determine a destination table for each row.

From the DynamicDestinations documentation example:

events.apply(BigQueryIO.<UserEvent>write()
  .to(new DynamicDestinations<UserEvent, String>() {
        // Pick a destination key for each element (here, the user id).
        @Override
        public String getDestination(ValueInSingleWindow<UserEvent> element) {
          return element.getValue().getUserId();
        }
        // Map the destination key to an actual BigQuery table.
        @Override
        public TableDestination getTable(String user) {
          return new TableDestination(tableForUser(user),
              "Table for user " + user);
        }
        // Provide the schema for that table.
        @Override
        public TableSchema getSchema(String user) {
          return tableSchemaForUser(user);
        }
      })
  .withFormatFunction(new SerializableFunction<UserEvent, TableRow>() {
      @Override
      public TableRow apply(UserEvent event) {
        return convertUserEventToTableRow(event);
      }
    }));
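
To connect that documentation example to the question, here is a minimal sketch (not from the original answer) of routing each row into the daily partition derived from its inserted_timestamp, using BigQuery's $YYYYMMDD partition decorator in the table spec. The project, dataset, and table names, the timestamp string format, and the trimmed schema are all assumptions; it also assumes the destination is (or can be created as) a date-partitioned table.

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class PartitionBackfill {

  // Writes each row into the daily partition matching its inserted_timestamp.
  static void writeToDailyPartitions(PCollection<TableRow> rows) {
    rows.apply(BigQueryIO.writeTableRows()
        .to(new DynamicDestinations<TableRow, String>() {
          // Destination key: the row's partition day as YYYYMMDD.
          @Override
          public String getDestination(ValueInSingleWindow<TableRow> element) {
            // Assumes inserted_timestamp arrives as a string like "2017-06-01 12:34:56 UTC".
            String ts = (String) element.getValue().get("inserted_timestamp");
            return ts.substring(0, 10).replace("-", "");
          }

          // Route to the matching partition via the $YYYYMMDD table decorator.
          @Override
          public TableDestination getTable(String day) {
            return new TableDestination(
                "my-project:my_dataset.my_partitioned_table$" + day,
                "Partition for " + day);
          }

          // Schema is only needed if the table has to be created; trimmed here.
          @Override
          public TableSchema getSchema(String day) {
            return new TableSchema().setFields(Collections.singletonList(
                new TableFieldSchema().setName("inserted_timestamp").setType("TIMESTAMP")));
          }
        })
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
  }
}

Note that in batch mode each distinct destination is loaded separately, so a year of history fans out into roughly one load job per daily partition.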
