How can I make pandas datetime conversion faster, e.g. with NumPy?


The data in df["click"] consists of strings in the form yyyymmddHHMMSS, as shown below.

 ["20211122000000", "20211122000000", "20211122000000", "20211122000000" ...]

I'm using the code below to convert them to datetime values.

df["click"] = df["click"].apply(pd.to_datetime, errors="coerce")

But the DataFrame has more than 1 million rows, so this is too slow. Is it possible to convert the string data to datetime (yyyy-mm-dd HH:MM:SS) using NumPy?

Or, if not NumPy, what would be faster than the code I'm using? The DataFrame has to stay in pandas because PySpark, Koalas, and Dask are not available.
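For reference, a NumPy-only conversion is possible under strict assumptions. The sketch below assumes every value is exactly 14 digits with no missing or malformed entries; unlike pd.to_datetime(..., errors="coerce"), bad input raises ValueError instead of becoming NaT. It reinterprets each string as single characters, splices in ISO 8601 separators, and lets NumPy parse the result as datetime64[s]. The function name parse_compact_timestamps is hypothetical.

```python
import numpy as np


def parse_compact_timestamps(values) -> np.ndarray:
    """Convert 'YYYYMMDDHHMMSS' strings to numpy datetime64[s].

    Assumes every value is exactly 14 digits (longer strings are silently
    truncated by the U14 cast). Invalid values raise ValueError; they are
    NOT coerced to NaT.
    """
    s = np.asarray(values, dtype="U14")
    n = s.shape[0]
    # Reinterpret each 14-character string as a row of 14 one-character cells.
    chars = s.view("U1").reshape(n, 14)

    def sep(ch):
        # A column of separator characters to splice between the fields.
        return np.full((n, 1), ch, dtype="U1")

    iso_chars = np.concatenate(
        [chars[:, 0:4], sep("-"), chars[:, 4:6], sep("-"), chars[:, 6:8],
         sep("T"), chars[:, 8:10], sep(":"), chars[:, 10:12], sep(":"),
         chars[:, 12:14]],
        axis=1,
    )  # shape (n, 19): 'YYYY-MM-DDTHH:MM:SS'
    # Join each row back into a single 19-character ISO 8601 string.
    iso = iso_chars.view("U19").reshape(n)
    # NumPy parses ISO 8601 strings when casting to datetime64.
    return iso.astype("datetime64[s]")
```

You could assign the result back with something like df["click_dt"] = parse_compact_timestamps(df["click"].to_numpy()). Whether this beats pd.to_datetime with an explicit format is not guaranteed, so it is worth timing both on your own data.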

python mongodb

2022-09-20 14:30

1 Answer

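pd.to_datetime accepts a whole Series as its argument, so you can convert the column in one vectorized call instead of calling it once per row with apply.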

df["click_dt"] = pd.to_datetime(df["click"], errors="coerce")

If the time strings always have the same format, pass it with the format argument. Then pd.to_datetime doesn't have to work out the format itself, so it will be faster. For the format in your question, it would probably look like the code below.

df["click_dt"] = pd.to_datetime(df["click"], format="%Y%m%d%H%M%S", errors="coerce")
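A minimal sketch for checking the difference on your own machine. The names convert_clicks and compare_timings, the row count, and the sample data (including the deliberately malformed "not a date") are hypothetical. The timing data uses distinct timestamps because pd.to_datetime can cache repeated values (cache=True by default), which would make a column of identical strings look unrepresentatively fast. Timings are single runs and will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

FMT = "%Y%m%d%H%M%S"  # matches strings like "20211122000000"


def convert_clicks(s: pd.Series) -> pd.Series:
    """One vectorized call with a fixed format; unparseable values become NaT."""
    return pd.to_datetime(s, format=FMT, errors="coerce")


def compare_timings(n_rows: int = 10_000) -> None:
    """Rough, single-run timing of row-by-row apply vs. one vectorized call."""
    # Distinct timestamps, so the duplicate-value cache doesn't skew the result.
    s = pd.Series(pd.date_range("2021-11-22", periods=n_rows, freq="min").strftime(FMT))
    t_apply = timeit.timeit(lambda: s.apply(pd.to_datetime, errors="coerce"), number=1)
    t_vec = timeit.timeit(lambda: convert_clicks(s), number=1)
    print(f"apply: {t_apply:.3f}s, vectorized with format: {t_vec:.3f}s")


if __name__ == "__main__":
    df = pd.DataFrame({"click": ["20211122000000", "20211122134501", "not a date"]})
    df["click_dt"] = convert_clicks(df["click"])
    print(df)  # the "not a date" row shows NaT
    compare_timings()
```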


2022-09-20 14:30



© 2022 pinfo. All rights reserved.