Posted By: Anonymous
I have a group of strings with the following examples:
'a b c investments pvt ltd' 'a a group of companies' 'adani group p ltd'
The expected output would be:
'abc investments pvt ltd' 'aa group of companies' 'adani group p ltd'
Is there any way to solve this?
I used the third-party regex module instead of the built-in re module, because re does not support variable-width lookbehinds:
    import regex as re  # third-party module: pip install regex

    text = "a b c investments pvt ltd"
    output = re.sub(r"(?<=^[a-z](?: [a-z])*) (?=[a-z] )", "", text)
    print(output)  # abc investments pvt ltd
Here I used a combination of a non-fixed-width lookbehind and a positive lookahead:
(?<= – Open positive lookbehind:
^[a-z] – Start-of-string anchor followed by a lowercase letter.
(?: – Open nested non-capture group:
 [a-z] – Match a literal space followed by a lowercase letter (note the leading space in the pattern).
)* – Close non-capture group and match it 0+ times.
) – Close positive lookbehind.
(a single space) – The literal space that is matched and replaced with an empty string.
(?=[a-z] ) – Positive lookahead to assert the position is followed by a lowercase letter and a space.
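If installing the third-party module is not an option, something similar can probably be done with the standard re module. This is a different technique from the lookbehind above: it matches the whole leading run of single letters and strips the spaces in a replacement function. The names LEADING_INITIALS and join_leading_initials are made up for this sketch, and it has only been checked against the three sample strings from the question.

```python
import re

# A leading run of single lowercase letters separated by spaces (e.g. "a b c").
# The lookahead requires a space after the run, so the last letter in the run
# must itself be a standalone letter and not the start of a longer word.
LEADING_INITIALS = re.compile(r"^[a-z](?: [a-z])+(?= )")


def join_leading_initials(name: str) -> str:
    """Remove the spaces inside a leading run of single letters."""
    return LEADING_INITIALS.sub(lambda m: m.group().replace(" ", ""), name)


names = [
    "a b c investments pvt ltd",
    "a a group of companies",
    "adani group p ltd",
]

for name in names:
    print(join_leading_initials(name))
# abc investments pvt ltd
# aa group of companies
# adani group p ltd
```

The standalone "p" in the last string is left alone because the pattern is anchored to the start of the string.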