morpheus.parsers.url_parser

Functions

parse(urls[, req_cols]): Extract hostname, domain, subdomain and suffix from URLs.
parse(urls, req_cols=None)

Extract hostname, domain, subdomain and suffix from URLs.

Parameters
urls : cudf.Series

URLs to be parsed.

req_cols : typing.Set[str]

Columns to extract. Can be any subset of {hostname, domain, subdomain, suffix}. If None (the default), all four columns are returned.

Returns
cudf.DataFrame

DataFrame containing the requested columns, with one row per input URL.

Examples


>>> from cudf import DataFrame
>>> from morpheus.parsers import url_parser
>>>
>>> input_df = DataFrame(
...     {
...         "url": [
...             "http://www.google.com",
...             "gmail.com",
...             "github.com",
...             "https://pandas.pydata.org",
...         ]
...     }
... )
>>> url_parser.parse(input_df["url"])
            hostname  domain suffix subdomain
0     www.google.com  google    com       www
1          gmail.com   gmail    com
2         github.com  github    com
3  pandas.pydata.org  pydata    org    pandas
>>> url_parser.parse(input_df["url"], req_cols={'domain', 'suffix'})
   domain suffix
0  google    com
1   gmail    com
2  github    com
3  pydata    org
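
For readers without a GPU environment, the meaning of the four output columns can be illustrated with a CPU-only sketch built on the standard library. This is not the Morpheus implementation: `split_url`, `parse_urls`, `ALL_COLS`, and the three-entry `KNOWN_SUFFIXES` set are hypothetical names introduced here, and a real parser would consult a full public suffix list rather than this toy set. Raising `ValueError` for unknown column names is likewise a choice made for this sketch, not documented behavior of `url_parser.parse`.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix set used only for illustration.
KNOWN_SUFFIXES = {"com", "org", "co.uk"}
ALL_COLS = ("hostname", "domain", "suffix", "subdomain")


def split_url(url):
    """Split one URL into hostname, domain, suffix and subdomain."""
    parts = urlsplit(url)
    if not parts.netloc:
        # Bare hosts such as "gmail.com" have no netloc until "//" is prepended.
        parts = urlsplit("//" + url)
    hostname = parts.hostname or ""
    labels = hostname.split(".")
    # Scan from the longest candidate suffix to the shortest.
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            return {
                "hostname": hostname,
                "domain": labels[i - 1],
                "suffix": suffix,
                "subdomain": ".".join(labels[: i - 1]),
            }
    # No known suffix: treat the whole hostname as the domain.
    return {"hostname": hostname, "domain": hostname, "suffix": "", "subdomain": ""}


def parse_urls(urls, req_cols=None):
    """Return one dict per URL, restricted to req_cols when it is given."""
    if req_cols is not None:
        unknown = set(req_cols) - set(ALL_COLS)
        if unknown:
            # Sketch-only choice: reject column names outside ALL_COLS.
            raise ValueError(f"unknown columns: {sorted(unknown)}")
    cols = [c for c in ALL_COLS if req_cols is None or c in req_cols]
    return [{c: row[c] for c in cols} for row in map(split_url, urls)]
```

With this sketch, `parse_urls(["https://pandas.pydata.org"], req_cols={"domain", "suffix"})` returns `[{"domain": "pydata", "suffix": "org"}]`, mirroring the column selection shown in the example above.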

© Copyright 2023, NVIDIA. Last updated on Feb 2, 2024.