Insights: apache/spark
Overview
- 0 Active issues
- 0 Merged pull requests
- 85 Open pull requests
- 0 Closed issues
- 0 New issues
85 Pull requests opened by 50 people
- approx_top_k_combine (#51393, opened Jul 7, 2025)
- [IN PROGRESS] Support getting pod state using Informers/Listers (#51396, opened Jul 8, 2025)
- [DRAFT] Parameter markers in DDL. (#51410, opened Jul 9, 2025)
- [SPARK-52729][SQL] Add MetadataOnlyTable in DS v2 API (#51419, opened Jul 9, 2025)
- [SPARK-52741][SQL] RemoveFiles ShuffleCleanup mode doesnt work with non-adaptive execution (#51432, opened Jul 9, 2025)
- [WIP][SQL] Clarify schema mismatch types in insertInto error (#51446, opened Jul 10, 2025)
- [SPARK-52767][SQL] Optimize the performance of maxRows for join and union (#51451, opened Jul 11, 2025)
- [SPARK-52769][SQL] InjectRuntimeFilter should take into account join type and hints (#51453, opened Jul 11, 2025)
- [SPARK-52777][SQL] Enable shuffle cleanup mode configuration in Spark SQL (#51458, opened Jul 12, 2025)
- [SPARK-52457][SQL]ParseToDate/ParseToTimestamp can return incorrect value for TimestampNTZ (#51465, opened Jul 13, 2025)
- [WIP][SQL] Incapsulate type operations (#51467, opened Jul 13, 2025)
- [SPARK-52449][CONNECT][PYTHON][ML] Make datatypes for Expression.Literal.Map/Array optional (#51473, opened Jul 14, 2025)
- [SPARK-52790][CORE] Introduce new grid testing method in SparkFunSuite (#51474, opened Jul 14, 2025)
- [DO NOT REVIEW] temp (#51501, opened Jul 15, 2025)
- [SPARK-52807][SDP] Proto changes to support analysis inside Declarative Pipelines query functions (#51502, opened Jul 15, 2025)
- [SPARK-52798] [SQL] Add function approx_top_k_combine (#51505, opened Jul 15, 2025)
- [SPARK-52813][CONNECT] Allow DAGs in Spark Connect (#51516, opened Jul 16, 2025)
- [SPARK-52828][SQL] Make hashing for collated strings collation agnostic (#51521, opened Jul 16, 2025)
- Add client env proto to spark connect client requests (#51529, opened Jul 17, 2025)
- [WIP][SPARK-51169] Set up a daily job for Python 3.14 (#51532, opened Jul 17, 2025)
- [WIP][SPARK-52764][PYTHON][ML][CONNECT][TESTS] Retry flaky tests in `test_parity_classification` (#51535, opened Jul 17, 2025)
- Keep coverage data when running pip tests (#51552, opened Jul 18, 2025)
- [SPARK-52621][SQL] Cast TIME to/from VARIANT (#51553, opened Jul 18, 2025)
- [SPARK-52865][SQL]Remove usage of deprecated FileCommitProtocol.newTaskTempFile method (#51554, opened Jul 18, 2025)
- [SPARK-52867][SQL] Remove redundant GetTimestamp (#51556, opened Jul 18, 2025)
- [SPARK-52868][SQL] CBO: OOM-risky stats underestimation for some filters and sources (#51558, opened Jul 18, 2025)
- added the file in readme (#51571, opened Jul 19, 2025)
- [DRAFT][DO-NOT-REVIEW][SPARK-51XXX][SQL] Enable implicit cast from STRING to TIME type (#51583, opened Jul 20, 2025)
- [SPARK-51920][SS][PYTHON] Fix composite/nested type in value state for python (#51621, opened Jul 22, 2025)
- [SPARK-52923][CORE] Allow ShuffleManager to control push merge during shuffle registration (#51629, opened Jul 23, 2025)
- [SPARK-52943][PYTHON] Enable arrow_cast for all pandas UDF eval types (#51635, opened Jul 23, 2025)
- [SPARK-52937][SDP] Sinks (#51644, opened Jul 24, 2025)
- [SPARK-52942][YARN][BUILD] YARN External Shuffle Service jar should include `scala-library` (#51650, opened Jul 24, 2025)
- [SPARK-52930][CONNECT] Use DataType.Array/Map for Array/Map Literals (#51653, opened Jul 24, 2025)
- [SPARK-52953][SQL] Incorrect parameter order in some ExpressionEvalHelper.checkResult() method invocations (#51664, opened Jul 25, 2025)
- [SPARK-52931][Core] Restrict declare variable naming (#51669, opened Jul 25, 2025)
- [SPARK-52971] [PYTHON] Limit idle Python worker queue size (#51684, opened Jul 28, 2025)
- [SPARK-52844][PYTHON][TESTS] Update black to 24.3.0 (#51687, opened Jul 28, 2025)
- [SPARK-52976][PYTHON] Fix Python UDF not accepting collated string as input param/return type (#51688, opened Jul 28, 2025)
- [SPARK-52978][SQL] Make FileFormatWriter customizable via SQL configuration (#51690, opened Jul 28, 2025)
- [SPARK-52988][SQL] Fix race conditions in SessionCatalog's metastore function handling (#51696, opened Jul 29, 2025)
- [SPARK-52991][SQL] Implement MERGE INTO with SCHEMA EVOLUTION for V2 Data Source (#51698, opened Jul 29, 2025)
- [SPARK-52989][SS] Add explicit close() API to State Store iterators (#51701, opened Jul 29, 2025)
- [SPARK-52996][TESTS] Update brace-expansion to 1.1.12 (#51703, opened Jul 29, 2025)
- [SPARK-52998][Core] Multiple variables inside declare (#51705, opened Jul 29, 2025)
- [SPARK-53015][BUILD] Upgrade log4j to 2.25.1 (#51719, opened Jul 30, 2025)
- [SPARK-53019][SQL] Fix job attempt path conflicts in o.a.hadoop..FileOutputCommitter (#51724, opened Jul 30, 2025)
- [SPARK-53022][TESTS] Add MemoryConsumerBenchmark (#51728, opened Jul 30, 2025)
- [SPARK-53038][SQL][HIVE] Call initialize only once per GenericUDF instance (#51743, opened Jul 31, 2025)
- [SPARK-52844][PYTHON] Update protobuf to 5.29.5 (#51747, opened Jul 31, 2025)
- [SPARK-53044] Change Declarative Pipelines import alias convention from "sdp" to "dp" (#51752, opened Jul 31, 2025)
- Wip naming sources (#51756, opened Jul 31, 2025)
- [SPARK-53030][PYTHON] Support Arrow writer for streaming Python data sources (#51757, opened Jul 31, 2025)
- [SPARK-42360][SQL] Rule to convert Left Outer Join with suitable filter to Left Anti Join (#51762, opened Aug 1, 2025)
- [SPARK-53060] Test to showcase Aggregate followed by ORDER BY doesn't preserve orders (#51768, opened Aug 1, 2025)
- [SPARK-52844][PYTHON] Update mlflow to 3.1.0 (#51774, opened Aug 1, 2025)
- [SPARK-53064][CORE] Rewrite MDC LogKey in Java (#51775, opened Aug 1, 2025)
- [SPARK-53066][SQL] Improve EXPLAIN output for DSv2 Join pushdown (#51781, opened Aug 1, 2025)
- [SPARK-53069][SS] Fix incorrect state store metrics with virtual column families (#51790, opened Aug 2, 2025)
- [SPARK-53084][CORE] Supplement default GC options in SparkContext initialization (#51796, opened Aug 3, 2025)
- Fix invalid exit codes and enhance CLI validation tools (#51797, opened Aug 3, 2025)
- Just test (#51798, opened Aug 3, 2025)
- [SPARK-53094][SQL] Fix cube-related data quality problem (#51810, opened Aug 4, 2025)
- [SPARK-53097][CONNECT] Make WriteOperationV2 in SparkConnectPlanner side effect free (#51813, opened Aug 4, 2025)
- [SPARK-53103][SS] Throw an error if state directory is not empty on batch 0 (#51817, opened Aug 4, 2025)
- [SPARK-53074][SQL] Avoid partial clustering in SPJ to meet a child's required distribution (#51818, opened Aug 4, 2025)
- [SPARK-53094][SQL] Fix CUBE with aggregate containing HAVING clauses (#51820, opened Aug 4, 2025)
- [SPARK-53104][PS] Introduce ansi_mode_context to avoid multiple config checks per API call (#51821, opened Aug 4, 2025)
- [SPARK-53106] Add schema evolution tests for TWS Scala spark connect suites (#51822, opened Aug 4, 2025)
- [SPARK-53107][SQL] Implement the time_trunc function in Scala (#51823, opened Aug 4, 2025)
- [SPARK-53113][SQL] Support the time type by try_make_timestamp() (#51824, opened Aug 4, 2025)
- [SPARK-53110][SQL][PYTHON][CONNECT] Implement the time_trunc function in PySpark (#51825, opened Aug 4, 2025)
- [SPARK-53108][SQL] Implement the time_diff function in Scala (#51826, opened Aug 4, 2025)
- [SPARK-53109][SQL] Support TIME in the make_timestamp_ntz and try_make_timestamp_ntz functions in Scala (#51828, opened Aug 4, 2025)
- [SPARK-53111][SQL][PYTHON][CONNECT] Implement the time_diff function in PySpark (#51829, opened Aug 4, 2025)
- [SQL] Run scalafmt on DateTimeUtils and DateTimeUtilsSuite (#51830, opened Aug 4, 2025)
- [SPARK-53105][Structured Streaming] Fix tests for checkpoint v2 in RocksDBSuite (#51834, opened Aug 4, 2025)
- [SPARK-53124][SQL] Prune unnecessary fields from JsonTuple (#51843, opened Aug 5, 2025)
- [SPARK-53077][CORE] reduce insertion count in SparkBloomFilterSuite (#51845, opened Aug 5, 2025)
- [SPARK-53125][TEST] RemoteSparkSession prints whole `spark-submit` command (#51846, opened Aug 5, 2025)
37 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [SPARK-52582][SQL] Improve the memory usage of XML parser (#51287, commented on Aug 4, 2025 • 49 new comments)
- [SPARK-52617][SQL] Cast TIME to/from TIMESTAMP_NTZ (#51381, commented on Jul 24, 2025 • 38 new comments)
- [SPARK-52495][SQL] Allow including partition columns in the single variant column (#51206, commented on Jul 10, 2025 • 15 new comments)
- [SPARK-52407][SQL] Add support for Theta Sketch (#51298, commented on Aug 4, 2025 • 9 new comments)
- [SPARK-42746][SQL] Fix optimizer failure for SortOrder in the LISTAGG function (#51117, commented on Jul 17, 2025 • 6 new comments)
- Enable -Xsource:3 compiler flag (#50474, commented on Jul 29, 2025 • 5 new comments)
- [SPARK-52394][PS] Fix autocorr divide-by-zero error under ANSI mode (#51192, commented on Aug 5, 2025 • 5 new comments)
- [SPARK-52593][PS] Avoid CAST_INVALID_INPUT of `MultiIndex.to_series`, `Series.dot` and `DataFrame.dot` in ANSI mode (#51310, commented on Aug 5, 2025 • 4 new comments)
- [SPARK-51400] Replace ArrayContains nodes to InSet (#50170, commented on Jul 21, 2025 • 4 new comments)
- [SPARK-52444][SQL][CONNECT] Add support for Variant/Char/Varchar Literal (#51215, commented on Jul 28, 2025 • 4 new comments)
- [SPARK-51585][SQL] Oracle dialect supports pushdown datetime functions (#50353, commented on Aug 5, 2025 • 4 new comments)
- [SPARK-52858][INFRA] Retry SBT compilation when OOM (#51149, commented on Jul 21, 2025 • 3 new comments)
- [SPARK-52226] [SQL] Fix unusual equality checks in three operators (#50949, commented on Jul 29, 2025 • 1 new comment)
- [SPARK-48359][SQL] Built-in functions for Zstd compression and decompression (#46672, commented on Jul 8, 2025 • 1 new comment)
- [SPARK-51069][SQL] Add big-endian support to UnsafeRowUtils.validateStructuralIntegrityWithReasonImpl (#49773, commented on Jul 29, 2025 • 1 new comment)
- [SPARK-52545][SQL][DOCS] Update string literal docs for quote escaping rules (#51379, commented on Jul 7, 2025 • 0 new comments)
- [SPARK-52659][SQL]Misleading modulo error message in ansi mode (#51378, commented on Jul 11, 2025 • 0 new comments)
- [SPARK-52669][PySpark]Improvement PySpark choose pythonExec in cluster yarn client mode (#51357, commented on Jul 8, 2025 • 0 new comments)
- [SPARK-52640][SDP] Propagate Python Source Code Location (#51344, commented on Jul 7, 2025 • 0 new comments)
- [MINOR][DOCS] Updated the docstring of DataStreamWriter.foreach() method (#51316, commented on Aug 5, 2025 • 0 new comments)
- [SPARK-52598][DOCS] Reorganize Spark Connect programming guide (#51305, commented on Jul 15, 2025 • 0 new comments)
- [SPARK-51243][CORE][ML] Configurable allow native BLAS (#49986, commented on Jul 29, 2025 • 0 new comments)
- [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 (#50213, commented on Aug 4, 2025 • 0 new comments)
- [SPARK-52864] [CORE] [Tests] Let LocalSparkContext clear active context in beforeAll (#51284, commented on Jul 18, 2025 • 0 new comments)
- [SPARK-52544][SQL] Allow configuring Json datasource string length limit through SQLConf (#51235, commented on Jul 25, 2025 • 0 new comments)
- [WIP][SPARK-51224][BUILD] Test Maven 4 (#51230, commented on Jul 9, 2025 • 0 new comments)
- [SPARK-51359][CORE][SQL] Set INT64 as the default timestamp type for Parquet files (#50215, commented on Jul 24, 2025 • 0 new comments)
- [SPARK-51756][CORE] Computes RowBasedChecksum in ShuffleWriters (#50230, commented on Jul 23, 2025 • 0 new comments)
- [SPARK-51728][SQL] Add SELECT EXCEPT Support (#50536, commented on Jul 15, 2025 • 0 new comments)
- [SPARK-52486][SQL] Fix Spark Driver Planning OOM issue due to unworthwhile dpp expression before Execution when enabling AQE (#51184, commented on Jul 21, 2025 • 0 new comments)
- [SPARK-51554][SQL] Add the time_trunc() function for TIME datatype (#50607, commented on Jul 22, 2025 • 0 new comments)
- [SPARK-52439][SQL] Support creating check constraint with NULL (#51146, commented on Jul 18, 2025 • 0 new comments)
- [SPARK-51883][DOCS][PYTHON] Python Data Source user guide for filter pushdown (#50684, commented on Jul 9, 2025 • 0 new comments)
- [SPARK-52388][SQL] Handle named and positional parameters under `PlanWithUnresolvedIdentifier` (#51073, commented on Aug 5, 2025 • 0 new comments)
- Increase report interval of spaming logs to 10 seconds (#51012, commented on Jul 25, 2025 • 0 new comments)
- [SPARK-52020][TEST] Build hive-test-udfs.jar from source (#50790, commented on Jul 28, 2025 • 0 new comments)
- [SPARK-52012][CORE][SQL] Restore IDE Index with type annotations (#50798, commented on Jul 8, 2025 • 0 new comments)