
[BUG] Series.drop_duplicates raised a TypeError #2518

Open
hekaisheng opened this issue Oct 13, 2021 · 5 comments
@hekaisheng
Contributor

Describe the bug

Failed to execute Series.drop_duplicates.

In [75]: a = md.DataFrame(np.random.rand(10, 2), columns=['a', 'b'], chunk_size=2)                  

In [76]: a['a'].drop_duplicates().execute()                                                         
  0%|                                                                       | 0/100 [00:00<?, ?it/s]Failed to run subtask l8o2G1V5iJMZVFK7USec2C0k on band numa-0
Traceback (most recent call last):
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/scheduling/worker/execution.py", line 263, in internal_run_subtask
    subtask, band_name, subtask_api, batch_quota_req)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/scheduling/worker/execution.py", line 340, in _retry_run_subtask
    return await _retry_run(subtask, subtask_info, _run_subtask_once)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/scheduling/worker/execution.py", line 83, in _retry_run
    raise ex
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/scheduling/worker/execution.py", line 67, in _retry_run
    return await target_async_func(*args)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/scheduling/worker/execution.py", line 301, in _run_subtask_once
    return await asyncio.shield(aiotask)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/subtask/api.py", line 59, in run_subtask_in_slot
    return await ref.run_subtask(subtask)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/backends/context.py", line 154, in send
    return self._process_result_message(result)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/backends/context.py", line 59, in _process_result_message
    raise message.error.with_traceback(message.traceback)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/backends/pool.py", line 496, in send
    result = await future
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/api.py", line 118, in __on_receive__
    return await super().__on_receive__(message)
  File "mars/oscar/core.pyx", line 351, in __on_receive__
    raise ex
  File "mars/oscar/core.pyx", line 345, in mars.oscar.core._BaseActor.__on_receive__
    return await self._handle_actor_result(result)
  File "mars/oscar/core.pyx", line 250, in _handle_actor_result
    result = list(dones)[0].result()
  File "mars/oscar/core.pyx", line 273, in mars.oscar.core._BaseActor._run_actor_async_generator
    with debug_async_timeout('actor_lock_timeout',
  File "mars/oscar/core.pyx", line 275, in mars.oscar.core._BaseActor._run_actor_async_generator
    async with self._lock:
  File "mars/oscar/core.pyx", line 279, in mars.oscar.core._BaseActor._run_actor_async_generator
    res = await gen.athrow(*res)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/subtask/worker/runner.py", line 104, in run_subtask
    result = yield self._running_processor.run(subtask)
  File "mars/oscar/core.pyx", line 284, in mars.oscar.core._BaseActor._run_actor_async_generator
    res = await self._handle_actor_result(res)
  File "mars/oscar/core.pyx", line 219, in _handle_actor_result
    result = await result
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/backends/context.py", line 154, in send
    return self._process_result_message(result)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/backends/context.py", line 59, in _process_result_message
    raise message.error.with_traceback(message.traceback)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/backends/pool.py", line 496, in send
    result = await future
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/oscar/api.py", line 118, in __on_receive__
    return await super().__on_receive__(message)
  File "mars/oscar/core.pyx", line 351, in __on_receive__
    raise ex
  File "mars/oscar/core.pyx", line 345, in mars.oscar.core._BaseActor.__on_receive__
    return await self._handle_actor_result(result)
  File "mars/oscar/core.pyx", line 250, in _handle_actor_result
    result = list(dones)[0].result()
  File "mars/oscar/core.pyx", line 273, in mars.oscar.core._BaseActor._run_actor_async_generator
    with debug_async_timeout('actor_lock_timeout',
  File "mars/oscar/core.pyx", line 275, in mars.oscar.core._BaseActor._run_actor_async_generator
    async with self._lock:
  File "mars/oscar/core.pyx", line 279, in mars.oscar.core._BaseActor._run_actor_async_generator
    res = await gen.athrow(*res)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/subtask/worker/processor.py", line 482, in run
    result = yield self._running_aio_task
  File "mars/oscar/core.pyx", line 284, in mars.oscar.core._BaseActor._run_actor_async_generator
    res = await self._handle_actor_result(res)
  File "mars/oscar/core.pyx", line 219, in _handle_actor_result
    result = await result
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/subtask/worker/processor.py", line 374, in run
    stored_keys, store_sizes, memory_sizes, data_key_to_object_id = await self._store_data(chunk_graph)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/services/subtask/worker/processor.py", line 248, in _store_data
    result_chunk.params = result_chunk.get_params_from_data(result_data)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/dataframe/core.py", line 1443, in get_params_from_data
    value=data.dtypes)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/dataframe/core.py", line 355, in __init__
    super().__init__(_key=key, _value=value, **kw)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/serialization/serializables/core.py", line 67, in __init__
    object.__setattr__(self, key, val)
  File "/Users/hekaisheng/Documents/mars_dev/mars/mars/serialization/serializables/field.py", line 106, in __set__
    raise type(e)(f'Failed to set `{self._attr_name}`: {str(e)}')
TypeError: Failed to set `_value`: value needs to be instance of (<class 'pandas.core.series.Series'>,), got <class 'numpy.dtype[float64]'>
Subtask l8o2G1V5iJMZVFK7USec2C0k errored
@hekaisheng hekaisheng added type: bug Something isn't working mod: dataframe labels Oct 13, 2021
@hekaisheng hekaisheng added this to the v0.8.0rc1 milestone Oct 13, 2021
@hekaisheng
Contributor Author

hekaisheng commented Oct 19, 2021

The output type is wrong in the tile method of DataFrameDropDuplicates. Here is the related code:

output_types=in_chunks[0].op.output_types
).new_chunk(in_chunks, **kw)
chunk_op = op.copy().reset_key()
chunk_op._method = method
chunk_op.stage = (

For Series input, the output type should always be Series.
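
To illustrate why a wrong output type would produce this particular TypeError, here is a minimal pandas-only sketch (it does not use Mars). It assumes, based on the traceback above, that the result chunk was treated as a DataFrame chunk whose parameters expect dtypes to be a pandas Series. That holds for a real DataFrame, but for a Series the dtypes attribute is a single NumPy dtype:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"])

# Dropping duplicates on a single column yields a Series, not a DataFrame.
result = df["a"].drop_duplicates()
print(type(result).__name__)      # Series

# DataFrame.dtypes is a pandas Series holding one dtype per column ...
print(type(df.dtypes).__name__)   # Series

# ... while Series.dtypes is a single dtype object (same as Series.dtype),
# which is what a Series-only type check on the value would reject.
print(isinstance(result.dtypes, np.dtype))
print(isinstance(result.dtypes, pd.Series))
```

If the chunk were declared with a Series output type, the dtype would presumably be stored as a scalar dtype instead, and no Series-typed dtypes value would be needed.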

@qinxuye qinxuye modified the milestones: v0.8.0rc1, v0.9.0a1 Oct 23, 2021
@qinxuye qinxuye modified the milestones: v0.9.0a1, v0.9.0a2 Dec 16, 2021
@qinxuye qinxuye removed this from the v0.9.0a2 milestone Jan 29, 2022
@Srivathsan-V

Hello! I am a beginner in open source. I would like to contribute to this issue. Could you please explain this issue to me?

@BhargavGurav

Can you let me help?

@us107

us107 commented Oct 15, 2024

Hi! I'd love to contribute to this issue. Could you please assign it to me?

@SitaGanesh

SitaGanesh commented Dec 4, 2024

Hey @hekaisheng! I ran into the issue you are facing, and here is what I found.

Series.drop_duplicates() fails with a TypeError

  • I encountered an issue where the Series.drop_duplicates() method in the Mars framework fails to execute. A TypeError is raised during subtask execution when dropping duplicates from a Series selected from a DataFrame column.
  • The error happens when setting the _value parameter of a result chunk, which expects an instance of pandas.core.series.Series but receives a numpy.dtype[float64] instead.
  • To reproduce the issue:
  1. Install Mars and its dependencies, such as NumPy.
  2. Create a DataFrame with random data and define a chunk size:
    import mars.dataframe as md
    import numpy as np

    # Create a DataFrame with random values: 10 rows, 2 columns
    a = md.DataFrame(np.random.rand(10, 2), columns=['a', 'b'], chunk_size=2)

  3. Attempt to drop duplicates from the 'a' column:
    a['a'].drop_duplicates().execute()
  4. Running the above code raises a TypeError.
  • The expected behavior is that the drop_duplicates() method should execute without errors, returning a Series with duplicates removed.

  • The observed behavior is that the operation fails with the following error:
    TypeError: Failed to set `_value`: value needs to be instance of (<class 'pandas.core.series.Series'>,), got <class 'numpy.dtype[float64]'>

  • The error surfaces in get_params_from_data in mars/dataframe/core.py, where data.dtypes is passed as the value and the type check on _value fails.

  • The bug prevents the drop_duplicates() operation from functioning as expected, disrupting workflows that rely on deduplication, especially in data cleaning and manipulation tasks.
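
The wording of the error message suggests a type-checked field on the object that stores the chunk's dtypes. The standard-library sketch below mimics that behavior. TypedField and DtypesValue are hypothetical names chosen for illustration, not Mars classes, and the actual check in mars/serialization/serializables/field.py may work differently:

```python
class TypedField:
    """Hypothetical descriptor that only accepts values of the given types."""

    def __init__(self, *types):
        self._types = types

    def __set_name__(self, owner, name):
        # Remember the attribute name so it can appear in error messages.
        self._attr_name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self._attr_name]

    def __set__(self, instance, value):
        if not isinstance(value, self._types):
            raise TypeError(
                f"Failed to set `{self._attr_name}`: value needs to be "
                f"instance of {self._types}, got {type(value)}"
            )
        instance.__dict__[self._attr_name] = value


class DtypesValue:
    # Stand-in for a parameter object whose value must be list-like.
    _value = TypedField(list)

    def __init__(self, value):
        self._value = value


accepted = DtypesValue([1.0, 2.0])   # matches the declared type

try:
    DtypesValue(3.14)                # a scalar where a list is required
except TypeError as exc:
    message = str(exc)
    print(message)
```

In this analogy, the list plays the role of the pandas Series of per-column dtypes that a DataFrame chunk carries, and the scalar plays the role of the single dtype that a Series result provides.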
