
Ranger access control for the Hive Metastore

When we discussed Ranger access control for Spark earlier, one issue that came up was that the HiveServer2 Ranger plugin only applies to requests that go through HiveServer2: if applications such as Spark read the Hive Metastore directly, the Ranger Hive plugin cannot govern them. There are a few known approaches. One starts from the component's own parsing flow, e.g. translating the Spark logical plan into the corresponding metadata permission requests; another enforces permissions at the Hive Metastore itself, though every discussion I had seen said metastore-level access control was not yet mature. Reading the official Ranger docs this time, I found that as of Hive 4 the metastore officially supports authorization. Which raises another question: is the HiveServer2 plugin still necessary?

According to Ranger's design document, a major feature is automatic policy maintenance on DDL: creating a database or table grants the corresponding privileges; renaming a database or table carries the matching policies over to the new name; dropping a database or table automatically deletes the matching policies.

The metastore authorizes from the perspective of metadata operations, which is easy to understand for DDL: creating or dropping databases and tables are obvious metadata operations. But do DQL-style queries also leave clear metadata-operation traces that can be intercepted? I haven't read the source code carefully yet, so I'll leave that as an open question.

The metastore has long supported storage-based authorization: metadata calls are checked against the permissions of the underlying storage, i.e. HDFS. I haven't used it seriously, so I don't know the details. Since the metadata stores things like the table location, using that to check storage permissions is reasonable, but it seems little different from just using the HDFS Ranger policies directly.
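As a rough illustration of the storage-based idea (this is a schematic sketch, not Hive's actual StorageBasedAuthorizationProvider, whose internals I haven't checked): each metastore operation maps to a filesystem permission required on the table location, and the decision is delegated to the storage layer's ACLs.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/**
 * Schematic sketch of storage-based authorization: a metastore operation
 * is translated into the filesystem permission required on the table's
 * location, then checked against the storage layer's permissions.
 * (Hypothetical names; not Hive's real StorageBasedAuthorizationProvider.)
 */
public class StorageBasedCheck {
  public enum FsAction { READ, WRITE }

  // Assumed mapping: reads need READ on the location, DDL needs WRITE.
  static Set<FsAction> requiredFor(String op) {
    switch (op) {
      case "READ_TABLE":  return EnumSet.of(FsAction.READ);
      case "DROP_TABLE":
      case "ALTER_TABLE": return EnumSet.of(FsAction.WRITE);
      default:            return EnumSet.noneOf(FsAction.class);
    }
  }

  // Stand-in for a real HDFS permission lookup keyed by path.
  static boolean allowed(Map<String, Set<FsAction>> perms, String location, String op) {
    return perms.getOrDefault(location, Set.of()).containsAll(requiredFor(op));
  }
}
```

Which also makes the point above concrete: the decision is entirely a function of the path's storage permissions, so it ends up equivalent to governing the HDFS paths directly.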

References

Ranger design document

Design Proposal for Hive Metastore Plugin

https://cwiki.apache.org/confluence/display/RANGER/Design+Proposal+for+Hive+Metastore+Plugin

A Ranger design document from 2016; the plugin was only officially delivered in 2023.


Some interesting excerpts. With SQL-standard authorization, HDFS is accessed through the "hive" superuser account, which means Hive can no longer exchange data with other applications that use HDFS directly. "So it is hoped that there is a seamless way of controlling the access to, and supporting the sharing of, the Hadoop data between Hive and other Hadoop applications."

Hadoop Users of M/R, Pig, Hive CLI want to access data sets created by HiveServer2

HiveServer2 users want to enjoy the service by the HiveServer2 as a SQL data source with SQL-flavored access control on finer granular objects such as columns, among other advantages from a SQL server. Currently HiveServer2 supports two modes of authorization. The first is “storage based authorization” and the second is “SQL Standard based authorization”.

The first mode is the default and is intended to share the data between Hive and other Hadoop applications. But the downside is that the Hive SQL access privileges have to be used in combination with those of the underlying HDFS privileges; which is not convenient and natural to SQL users.

The second mode is enabled by setting the “impersonate” flag to false, and is intended to provide the access controls the same as a SQL user would enjoy. This is realized through a “superuser” named “hive” who has the full access to the Hive tables. The downside is that the data sharing with other Hadoop application is virtually none.

So it is hoped that there is a seamless way of controlling the access to, and supporting the sharing of, the Hadoop data between Hive and other Hadoop applications.

Ranger's official feature documentation

An article from 2016. From this it looks like the Hive Metastore plugin only kicks in on DDL; regular operations still go through the HiveServer2 plugin.

Ranger Plugin for Hive MetaStore

https://cwiki.apache.org/confluence/display/RANGER/Ranger+Plugin+for+Hive+MetaStore


One feature, it appears, is automatic policy maintenance on DDL: creating a database or table grants privileges; renaming a database or table switches the matching policies over to the new name; dropping a database or table automatically deletes the matching policies.

2.2 Automatic policy updates as result of Hive DDL

  • 2.2.1 The automatic privilege grants for newly created tables and databases as configured in hive.security.authorization.createtable.user.grants, hive.security.authorization.createtable.group.grants and hive.security.authorization.createtable.role.grants;
  • 2.2.2 The DDL command ALTER TABLE … RENAME TO … will cause the corresponding table name changes for the “exactly” matched, vs. pattern-matched, Ranger policies;
  • 2.2.3 The “Drop Table …” command will cause the “exactly” matched Ranger policy to be dropped.

This new feature will be available for both Hive Server2 and Hive CLI.

Failures of the policy updates will not cause the whole operation to fail: the changes on the Database Object are still successful; it is just that the related Ranger policies will not experience the corresponding changes, and that a warning will be logged to the effect.
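The exact-match semantics above (2.2.2 and 2.2.3) can be sketched as a toy policy store; this is only an illustration of the described behavior, not the real Ranger admin API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy sketch of the "automatic policy update" behavior described in 2.2:
 * only policies whose table resource EXACTLY matches the renamed/dropped
 * table are updated; pattern-matched (wildcard) policies are left alone.
 * (Hypothetical class; not Ranger's actual policy store.)
 */
public class PolicyStore {
  // policy name -> table resource (may be a pattern like "orders_*")
  final Map<String, String> policies = new HashMap<>();

  void onRenameTable(String oldName, String newName) {
    // 2.2.2: only the exactly matched resource follows the rename.
    policies.replaceAll((policy, tbl) -> tbl.equals(oldName) ? newName : tbl);
  }

  void onDropTable(String name) {
    // 2.2.3: only the exactly matched policy is dropped.
    policies.values().removeIf(tbl -> tbl.equals(name));
  }
}
```

So after `ALTER TABLE orders RENAME TO orders_v2`, a policy on `orders` would now point at `orders_v2`, while a wildcard policy on `orders_*` is untouched.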

2.3 Hive CLI

  • a) The privilege checks are the same as now by the current HiveServer2 Ranger Plugin;
  • b) The Grant/Revoke operations are subject to the same privilege checks, and are dependent on the same configuration parameter, xasecure.hive.update.xapolicies.on.grant.revoke, as by the current HiveServer2 Ranger Plugin.

Hive's ticket for metastore authorization support

Update HiveMetastore authorization to enable use of HiveAuthorizer implementation

https://issues.apache.org/jira/browse/HIVE-21753

Fix Version/s: 4.0.0-alpha-1 — so this only lands in Hive 4.0.

If everything authorizes through the metastore, then services such as Spark SQL that only talk to the metastore can share a unified permission model.

Currently HMS supports authorization using StorageBasedAuthorizationProvider which relies on permissions at filesystem – like HDFS. Hive supports a pluggable authorization interface, and multiple authorizer implementations (like SQLStd, Ranger, Sentry) are available to authorizer access in Hive. Extending HiveMetastore to use the same authorization interface as Hive will enable use of pluggable authorization implementations; and will result in consistent authorization across Hive, HMS and other services that use HMS (like Spark).
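If I read the JIRA correctly, wiring this up means pointing the metastore's pre-event listener at the new authorizer and plugging in Ranger's authorizer factory. A sketch of what the HMS configuration might look like (property values assumed from the Ranger/Hive docs; verify against your versions):

```xml
<!-- Sketch, assuming Hive 4.x with the Ranger Hive plugin installed on the HMS host -->
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.plugin.metastore.HiveMetaStoreAuthorizer</value>
</property>
<property>
  <name>hive.security.authorization.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory</value>
</property>
```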

The code for Hive's metastore authorization support

The git commit for the change. Codebases on the scale of Hive are huge; reading the change diff directly is a faster way to get familiar with a feature.

HIVE-21753: Update HiveMetastore authorization to enable use of HiveA… #636

https://github.com/apache/hive/pull/636/files

The main implementation appears to be: each operation emits an event, and a listener parses the event and performs the authorization check.

Build the authorization context, then check privileges:


/**
 * HiveMetaStoreAuthorizer : Do authorization checks on MetaStore Events in MetaStorePreEventListener
 */
public class HiveMetaStoreAuthorizer extends MetaStorePreEventListener implements MetaStoreFilterHook {

  HiveMetaStoreAuthzInfo buildAuthzContext(PreEventContext preEventContext) throws MetaException {
    LOG.debug("==> HiveMetaStoreAuthorizer.buildAuthzContext(): EventType=" + preEventContext.getEventType());

    HiveMetaStoreAuthorizableEvent authzEvent = null;

    if (preEventContext != null) {
      switch (preEventContext.getEventType()) {
        case CREATE_DATABASE:
          authzEvent = new CreateDatabaseEvent(preEventContext);
          break;
        case ALTER_DATABASE:
          authzEvent = new AlterDatabaseEvent(preEventContext);
          break;
        case DROP_DATABASE:
          authzEvent = new DropDatabaseEvent(preEventContext);
          break;
        case CREATE_TABLE:
          authzEvent = new CreateTableEvent(preEventContext);
          if (isViewOperation(preEventContext) && (!isSuperUser(getCurrentUser(authzEvent)))) {
            // we allow view to be created, but mark it as having not been authorized
            PreCreateTableEvent pcte = (PreCreateTableEvent) preEventContext;
            Map<String, String> params = pcte.getTable().getParameters();
            params.put("Authorized", "false");
          }
          break;
        case ALTER_TABLE:
          authzEvent = new AlterTableEvent(preEventContext);
          if (isViewOperation(preEventContext) && (!isSuperUser(getCurrentUser(authzEvent)))) {
            // we allow view to be altered, but mark it as having not been authorized
            PreAlterTableEvent pcte = (PreAlterTableEvent) preEventContext;
            Map<String, String> params = pcte.getNewTable().getParameters();
            params.put("Authorized", "false");
          }
          break;
        case DROP_TABLE:
          authzEvent = new DropTableEvent(preEventContext);
          if (isViewOperation(preEventContext) && (!isSuperUser(getCurrentUser(authzEvent)))) {
            // TODO: do we need to check Authorized flag?
          }
          break;
        case ADD_PARTITION:
          authzEvent = new AddPartitionEvent(preEventContext);
          break;
        case ALTER_PARTITION:
          authzEvent = new AlterPartitionEvent(preEventContext);
          break;
        case LOAD_PARTITION_DONE:
          authzEvent = new LoadPartitionDoneEvent(preEventContext);
          break;
        case DROP_PARTITION:
          authzEvent = new DropPartitionEvent(preEventContext);
          break;
        case READ_TABLE:
          authzEvent = new ReadTableEvent(preEventContext);
          break;
        case READ_DATABASE:
          authzEvent = new ReadDatabaseEvent(preEventContext);
          break;
        case CREATE_FUNCTION:
          authzEvent = new CreateFunctionEvent(preEventContext);
          break;
        case DROP_FUNCTION:
          authzEvent = new DropFunctionEvent(preEventContext);
          break;
        case CREATE_DATACONNECTOR:
          authzEvent = new CreateDataConnectorEvent(preEventContext);
          break;
        case ALTER_DATACONNECTOR:
          authzEvent = new AlterDataConnectorEvent(preEventContext);
          break;
        case DROP_DATACONNECTOR:
          authzEvent = new DropDataConnectorEvent(preEventContext);
          break;
        case AUTHORIZATION_API_CALL:
        case READ_ISCHEMA:
        case CREATE_ISCHEMA:
        case DROP_ISCHEMA:
        case ALTER_ISCHEMA:
        case ADD_SCHEMA_VERSION:
        case ALTER_SCHEMA_VERSION:
        case DROP_SCHEMA_VERSION:
        case READ_SCHEMA_VERSION:
        case CREATE_CATALOG:
        case ALTER_CATALOG:
        case DROP_CATALOG:
          if (!isSuperUser(getCurrentUser())) {
            throw new MetaException(getErrorMessage(preEventContext, getCurrentUser()));
          }
          break;
        default:
          break;
      }
    }

    HiveMetaStoreAuthzInfo ret = authzEvent != null ? authzEvent.getAuthzContext() : null;

    LOG.debug("<== HiveMetaStoreAuthorizer.buildAuthzContext(): EventType=" + preEventContext.getEventType() + "; ret=" + ret);

    return ret;
  }

  private void checkPrivileges(final HiveMetaStoreAuthzInfo authzContext, HiveAuthorizer authorizer) throws MetaException {
    LOG.debug("==> HiveMetaStoreAuthorizer.checkPrivileges(): authzContext=" + authzContext + ", authorizer=" + authorizer);

    HiveOperationType hiveOpType = authzContext.getOperationType();
    List<HivePrivilegeObject> inputHObjs = authzContext.getInputHObjs();
    List<HivePrivilegeObject> outputHObjs = authzContext.getOutputHObjs();
    HiveAuthzContext hiveAuthzContext = authzContext.getHiveAuthzContext();

    try {
      authorizer.checkPrivileges(hiveOpType, inputHObjs, outputHObjs, hiveAuthzContext);
    } catch (Exception e) {
      throw new MetaException(e.getMessage());
    }

    LOG.debug("<== HiveMetaStoreAuthorizer.checkPrivileges(): authzContext=" + authzContext + ", authorizer=" + authorizer);
  }
}

I haven't studied the code closely yet, but this AuthorizationPreEventListener reads more like granting privileges after creating databases and tables, rather than checking them?


/**
 * AuthorizationPreEventListener : A MetaStorePreEventListener that
 * performs authorization/authentication checks on the metastore-side.
 *
 * Note that this can only perform authorization checks on defined
 * metastore PreEventContexts, such as the adding/dropping and altering
 * of databases, tables and partitions.
 */
@Private
public class AuthorizationPreEventListener extends MetaStorePreEventListener {

  @Override
  public void onEvent(PreEventContext context) throws MetaException, NoSuchObjectException,
      InvalidOperationException {

    if (!tConfigSetOnAuths.get()) {
      // The reason we do this guard is because when we do not have a good way of initializing
      // the config to the handler's thread local config until this call, so we do it then.
      // Once done, though, we need not repeat this linking, we simply call setMetaStoreHandler
      // and let the AuthorizationProvider and AuthenticationProvider do what they want.
      tConfig.set(context.getHandler().getConf());
      // Warning note : HMSHandler.getHiveConf() is not thread-unique, .getConf() is.
      tAuthenticator.get().setConf(tConfig.get());
      for (HiveMetastoreAuthorizationProvider authorizer : tAuthorizers.get()) {
        authorizer.setConf(tConfig.get());
      }
      tConfigSetOnAuths.set(true); // set so we don't repeat this initialization
    }

    tAuthenticator.get().setMetaStoreHandler(context.getHandler());
    for (HiveMetastoreAuthorizationProvider authorizer : tAuthorizers.get()) {
      authorizer.setMetaStoreHandler(context.getHandler());
    }

    switch (context.getEventType()) {
      case CREATE_TABLE:
        authorizeCreateTable((PreCreateTableEvent) context);
        break;
      case DROP_TABLE:
        authorizeDropTable((PreDropTableEvent) context);
        break;
      case ALTER_TABLE:
        authorizeAlterTable((PreAlterTableEvent) context);
        break;
      case READ_TABLE:
        authorizeReadTable((PreReadTableEvent) context);
        break;
      case READ_DATABASE:
        authorizeReadDatabase((PreReadDatabaseEvent) context);
        break;
      case ADD_PARTITION:
        authorizeAddPartition((PreAddPartitionEvent) context);
        break;
      case DROP_PARTITION:
        authorizeDropPartition((PreDropPartitionEvent) context);
        break;
      case ALTER_PARTITION:
        authorizeAlterPartition((PreAlterPartitionEvent) context);
        break;
      case CREATE_DATABASE:
        authorizeCreateDatabase((PreCreateDatabaseEvent) context);
        break;
      case ALTER_DATABASE:
        authorizeAlterDatabase((PreAlterDatabaseEvent) context);
        break;
      case DROP_DATABASE:
        authorizeDropDatabase((PreDropDatabaseEvent) context);
        break;
      case LOAD_PARTITION_DONE:
        // noop for now
        break;
      case AUTHORIZATION_API_CALL:
        authorizeAuthorizationAPICall();
      default:
        break;
    }
  }

  private void authorizeReadTable(PreReadTableEvent context) throws InvalidOperationException,
      MetaException {
    if (!isReadAuthzEnabled()) {
      return;
    }
    try {
      org.apache.hadoop.hive.ql.metadata.Table wrappedTable = new TableWrapper(context.getTable());
      for (HiveMetastoreAuthorizationProvider authorizer : tAuthorizers.get()) {
        authorizer.authorize(wrappedTable, new Privilege[] { Privilege.SELECT }, null);
      }
    } catch (AuthorizationException e) {
      throw invalidOperationException(e);
    } catch (HiveException e) {
      throw metaException(e);
    }
  }

  private void authorizeReadDatabase(PreReadDatabaseEvent context)
      throws InvalidOperationException, MetaException {
    if (!isReadAuthzEnabled()) {
      return;
    }
    try {
      for (HiveMetastoreAuthorizationProvider authorizer : tAuthorizers.get()) {
        authorizer.authorize(new Database(context.getDatabase()),
            new Privilege[] { Privilege.SELECT }, null);
      }
    } catch (AuthorizationException e) {
      throw invalidOperationException(e);
    } catch (HiveException e) {
      throw metaException(e);
    }
  }

created at 2023-08-30