Data extraction problem based on upper case words

I have a data file in the following format:

source.dat (Example for illustration purposes.)

ALBANIA Duck 1000
BELGIUM CHARLEROI Donald Duck 10234
CYPRUS J. Mickey 567

I'm looking for a Linux script (Bash, Perl, whatever) to extract the info into CSV format, following these rules:

  • 1st field in csv contains only the entire words which are in uppercase
  • 3rd field in csv contains the last input field (number)
  • 2nd field in csv contains the remaining middle part

So the expected output should be:


ALBANIA,Duck,1000
BELGIUM CHARLEROI,Donald Duck,10234
CYPRUS J.,Mickey,567


Doing it directly as asked, with one capture group per field:

sed -r 's/([[:upper:] .]+)\s+(.*)\s+([0-9]+)\s*$/\1,\2,\3/' file
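The command above relies on GNU sed (`-r`, `\s`). For other sed implementations, the same idea can be sketched with `-E` and POSIX character classes. This is only an illustrative variant, not the answer's own command: the function name `split_fields` is made up here, and the pattern is additionally anchored at the start of the line.

```shell
# split_fields: hypothetical helper; reads records on stdin, writes CSV on stdout.
# Group 1: leading run of uppercase letters, spaces and dots. It cannot keep the
#          first letter of a word such as "Duck", because whitespace must follow.
# Group 2: whatever lies between group 1 and the trailing number.
# Group 3: the trailing number.
split_fields() {
  sed -E 's/^([[:upper:] .]+)[[:space:]]+(.*)[[:space:]]+([0-9]+)[[:space:]]*$/\1,\2,\3/'
}

# Run it on the sample records from the question, one record per line.
printf '%s\n' \
  'ALBANIA Duck 1000' \
  'BELGIUM CHARLEROI Donald Duck 10234' \
  'CYPRUS J. Mickey 567' | split_fields
```

Lines that do not fit the pattern (for example, a record without a trailing number) are passed through unchanged, so they are easy to spot in the output.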

As we can see, field 1 stays in place, so the script can be simplified to match only the text after it and insert the two commas:

sed -r 's/\s+(\w*[[:lower:]].*)\s+([0-9]+)\s*$/,\1,\2/' file


sed -r 's/\s+(\w*[[:lower:]].*[^0-9])\s+/,\1,/' file


sed 's/\([^[:lower:]]*\) \(.*\) /\1,\2,/' < source.dat > output.csv
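If a single regex feels too dense, the three rules can also be applied word by word. The following is a minimal sketch using awk, under some assumptions that are not in the question: the helper name `to_csv` is hypothetical, "uppercase word" is taken to mean "contains no ASCII lowercase letter" (so `J.` counts), and lines with fewer than two words are skipped.

```shell
# to_csv: hypothetical helper; reads records on stdin, writes CSV on stdout.
to_csv() {
  awk 'NF >= 2 {
    # n = number of leading words that contain no lowercase letter.
    # The last word is never examined: it is always the number (field 3).
    n = 0
    for (i = 1; i < NF; i++) {
      if ($i ~ /[a-z]/) break
      n = i
    }
    first = ""; middle = ""
    # Field 1: the leading uppercase words, joined by single spaces.
    for (i = 1; i <= n; i++)     first  = first  (i > 1 ? " " : "") $i
    # Field 2: everything between field 1 and the last word.
    for (i = n + 1; i < NF; i++) middle = middle (i > n + 1 ? " " : "") $i
    print first "," middle "," $NF
  }'
}

# Usage with the file names from the question:
#   to_csv < source.dat > output.csv
```

Unlike the sed one-liners, this version collapses repeated spaces inside a field, because awk re-joins the words with a single space.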

Category: text processing Time: 2016-07-29 Views: 0
