---
slug: /iceberg-ranger
title: Iceberg Ranger issue
tags: [bigdata, ranger, iceberg]
---


If you only grant Ranger privileges on the Iceberg database/table itself, querying the table's metadata tables (`history`, `snapshots`, etc.) fails with a permission error: the Ranger plugin does not recognize metadata tables and map them back to the base table.

The options are to grant the missing privileges manually (see the note after the issue link below) or to patch the plugin.

Upstream issue: [\[Bug\]\[AuthZ\] Kyuubi has no permission to access the Iceberg metadata table after integrating Ranger #3924](https://github.com/apache/kyuubi/issues/3924)
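
For the manual route, the resource string in the error below, `[testdb.iceberg_tbl/history/made_current_at]`, read against Kyuubi's usual `database/table/column` error format, suggests that the metadata table is resolved as database `testdb.iceberg_tbl`, table `history`, column `made_current_at`. If that reading holds, a Ranger policy whose database resource is `testdb.iceberg_tbl` with table `*` and column `*` should unblock the metadata queries. A minimal sketch, assuming such a policy has been added:

```sql
-- Hypothetical verification, assuming a Ranger policy now covers
-- database `testdb.iceberg_tbl`, table `*`, column `*`:
SELECT * FROM testdb.iceberg_tbl.history;    -- should pass the Ranger check
SELECT * FROM testdb.iceberg_tbl.snapshots;  -- likewise
```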

```sql
use testdb;
CREATE TABLE testdb.iceberg_tbl (id bigint, data string) USING iceberg;
INSERT INTO testdb.iceberg_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
select * from testdb.iceberg_tbl;
+-----+-------+
| id  | data  |
+-----+-------+
| 1   | a     |
| 2   | b     |
| 3   | c     |
+-----+-------+
```

```sql
SELECT * FROM testdb.iceberg_tbl.history;
```

```
22/12/07 17:16:37 ERROR ExecuteStatement: Error operating ExecuteStatement: org.apache.kyuubi.plugin.spark.authz.AccessControlException: Permission denied: user [test_user] does not have [select] privilege on [testdb.iceberg_tbl/history/made_current_at]
at org.apache.kyuubi.plugin.spark.authz.ranger.SparkRangerAdminPlugin$.verify(SparkRangerAdminPlugin.scala:128)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5(RuleAuthorization.scala:94)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.$anonfun$checkPrivileges$5$adapted(RuleAuthorization.scala:93)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization$.checkPrivileges(RuleAuthorization.scala:93)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:36)
at org.apache.kyuubi.plugin.spark.authz.ranger.RuleAuthorization.apply(RuleAuthorization.scala:33)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:125)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:183)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:121)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:117)
at org.apache.spark.sql.execution.QueryExecution.assertOptimized(QueryExecution.scala:135)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:153)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:150)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:201)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:246)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:215)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704)
at org.apache.spark.sql.Dataset.toLocalIterator(Dataset.scala:3000)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$2.iterator(ExecuteStatement.scala:107)
at org.apache.kyuubi.operation.IterableFetchIterator.<init>(FetchIterator.scala:78)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.$anonfun$executeStatement$1(ExecuteStatement.scala:106)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.kyuubi.engine.spark.operation.SparkOperation.withLocalProperties(SparkOperation.scala:98)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement.org$apache$kyuubi$engine$spark$operation$ExecuteStatement$$executeStatement(ExecuteStatement.scala:90)
at org.apache.kyuubi.engine.spark.operation.ExecuteStatement$$anon$3.run(ExecuteStatement.scala:149)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```

As an aside, here are the common metadata queries for an Iceberg table. Querying this metadata is a normal, everyday operation, for example:

```sql
# history
0: jdbc:hive2://xx.xx.xx.xx:10011/default> SELECT * FROM shdw.iceberg_tbl.history;
+--------------------------+----------------------+------------+----------------------+
| made_current_at          | snapshot_id          | parent_id  | is_current_ancestor  |
+--------------------------+----------------------+------------+----------------------+
| 2022-05-09 10:58:35.835  | 6955843267870447517  | NULL       | true                 |
+--------------------------+----------------------+------------+----------------------+
```

```sql
# snapshots
0: jdbc:hive2://xx.xx.xx.xx:10011/default> SELECT * FROM shdw.iceberg_tbl.snapshots;
+--------------------------+----------------------+------------+------------+----------------------------------------------------+----------------------------------------------------+
| committed_at             | snapshot_id          | parent_id  | operation  | manifest_list                                      | summary                                            |
+--------------------------+----------------------+------------+------------+----------------------------------------------------+----------------------------------------------------+
| 2022-05-09 10:58:35.835  | 6955843267870447517  | NULL       | append     | hdfs://cluster1/tgwarehouse/shdw.db/iceberg_tbl/metadata/snap-6955843267870447517-1-e8206624-fbc3-4cf5-b2cb-2db672393253.avro | {"added-data-files":"3","added-files-size":"1929","added-records":"3","changed-partition-count":"1","spark.app.id":"spark-application-1652065040852","total-data-files":"3","total-delete-files":"0","total-equality-deletes":"0","total-files-size":"1929","total-position-deletes":"0","total-records":"3"} |
+--------------------------+----------------------+------------+------------+----------------------------------------------------+----------------------------------------------------+
```

```sql
# history join snapshots
0: jdbc:hive2://xx.xx.xx.xx:10011/default> select
h.made_current_at,
s.operation,
h.snapshot_id,
h.is_current_ancestor,
s.summary['spark.app.id']
from shdw.iceberg_tbl.history h
join shdw.iceberg_tbl.snapshots s
on h.snapshot_id = s.snapshot_id
order by made_current_at;
+--------------------------+------------+----------------------+----------------------+----------------------------------+
| made_current_at          | operation  | snapshot_id          | is_current_ancestor  | summary[spark.app.id]            |
+--------------------------+------------+----------------------+----------------------+----------------------------------+
| 2022-05-09 10:58:35.835  | append     | 6955843267870447517  | true                 | spark-application-1652065040852  |
+--------------------------+------------+----------------------+----------------------+----------------------------------+
```
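
Besides `history` and `snapshots`, Iceberg's Spark integration exposes further metadata tables that are queried with the same syntax, and that hit the same Ranger check when the plugin is unpatched. A few common ones:

```sql
-- Other built-in Iceberg metadata tables (same SELECT syntax; under an
-- unpatched plugin these fail with the same permission error):
SELECT * FROM shdw.iceberg_tbl.files;       -- current data files and per-file metrics
SELECT * FROM shdw.iceberg_tbl.manifests;   -- manifests of the current snapshot
SELECT * FROM shdw.iceberg_tbl.partitions;  -- per-partition file and record counts
```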