
Formula - How to convert a single delimited string to rows and columns

This post demonstrates a method we can use to convert semi-structured strings to rows and columns.

When building systems, we are often required to parse input data that has been supplied in rather unusual formats.

To illustrate, this post walks through a case where we need to convert a long string of comma-separated values into rows and columns.

Example of converting unstructured data

This post uses the example data below.

The input contains multiple records in a single long string. Every field is separated by a comma, and there are no line breaks or record separator characters.

Each record consists of three fields, which appear in this order: address, city, and date.
 
1181 Hague St.,Sheffield,2016-08-13,22 Green Second Avenue,Norwich,2020-01-02,79 North Oak Road,Sunderland,2020-03-11,57 White Fabien Boulevard,Oxford,2016-03-24,173 North Rocky Nobel St.,Walsall,2017-06-13,269 Clarendon Drive,Preston,2017-11-23,39 Rocky Oak St.,Manchester,2011-11-30,53 Rocky Oak St.,Reading,2018-04-07,744 South Second Freeway,Norwich,2008-02-27,904 White Milton Freeway,Crawley,2018-08-17
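Because each record always has exactly three fields, the number of comma-separated items should be a whole multiple of three. The sketch below checks this for the sample data. It is written in Python rather than Power Fx, purely for illustration; `count_records` and `FIELDS_PER_RECORD` are hypothetical names, and the check assumes no address contains a comma.

```python
# Hypothetical helper: counts the records in a flat, comma-separated string,
# assuming every record has exactly three fields and no field contains a comma.
FIELDS_PER_RECORD = 3

SAMPLE = (
    "1181 Hague St.,Sheffield,2016-08-13,"
    "22 Green Second Avenue,Norwich,2020-01-02,"
    "79 North Oak Road,Sunderland,2020-03-11,"
    "57 White Fabien Boulevard,Oxford,2016-03-24,"
    "173 North Rocky Nobel St.,Walsall,2017-06-13,"
    "269 Clarendon Drive,Preston,2017-11-23,"
    "39 Rocky Oak St.,Manchester,2011-11-30,"
    "53 Rocky Oak St.,Reading,2018-04-07,"
    "744 South Second Freeway,Norwich,2008-02-27,"
    "904 White Milton Freeway,Crawley,2018-08-17"
)


def count_records(text: str, fields_per_record: int = FIELDS_PER_RECORD) -> int:
    items = text.split(",")
    # A leftover partial record means the input does not match the expected layout.
    if len(items) % fields_per_record != 0:
        raise ValueError(f"{len(items)} items is not a multiple of {fields_per_record}")
    return len(items) // fields_per_record


print(count_records(SAMPLE))  # 10
```

The sample splits into 30 items, which corresponds to 10 records.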

Overview of how to process this data

Here's an overview of how to process this data. Since each data item is separated by a comma, the natural starting point is to split the input data on the comma character. The Split function returns a single-column table (the column is named Value) that looks like this.

Split(txtUnstructuredText.Text, ",")


Fig 1
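For readers who want to experiment outside Power Apps, the following Python sketch mimics this first step. `split_items` is a hypothetical stand-in for `Split(txtUnstructuredText.Text, ",")`, and the input is shortened to the first two records of the sample.

```python
# Python stand-in for Power Fx Split(text, ","): each comma-separated item
# becomes one entry, in its original order.
def split_items(text: str) -> list[str]:
    return text.split(",")


items = split_items(
    "1181 Hague St.,Sheffield,2016-08-13,22 Green Second Avenue,Norwich,2020-01-02"
)
# Print each item with a 1-based position, matching how Index counts rows.
for position, item in enumerate(items, start=1):
    print(position, item)
```

As in Fig 1, the result is a flat list with one entry per field, and nothing yet marks where one record ends and the next begins.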

The next step is to transform the single column into three columns.

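To do this, we can call the Sequence function to return the position of the first item of each record within the split data (e.g. the list of items shown in Fig 1). Sequence takes three arguments: the number of rows to generate, a start value, and a step. With three fields per record, we need one row per record (the item count divided by 3), starting at 1 and stepping by 3, which gives 1, 4, 7, and so on. Note that the first argument is a row count rather than an upper limit; passing the item count directly would generate three times as many rows as there are records. The formula looks like this.

Sequence(
    RoundDown(CountRows(Split(txtUnstructuredText.Text, ",")) / 3, 0),
    1,
    3
)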


Fig 2
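The arithmetic behind this Sequence call can be sketched in Python as follows. `record_starts` is a hypothetical helper, and it assumes the same layout of three fields per record; it returns the 1-based position of the address field for each complete record.

```python
def record_starts(item_count: int, fields_per_record: int = 3) -> list[int]:
    """1-based position of the first field of each complete record."""
    # Number of rows to generate (the first argument to Sequence).
    record_count = item_count // fields_per_record
    # Equivalent to Sequence(record_count, 1, fields_per_record).
    return [1 + fields_per_record * i for i in range(record_count)]


print(record_starts(30))  # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
```

For the 30 items in the sample this yields 10 start positions, one per record.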


Putting the above together - Converting a string to rows and columns

To generate the final output, we can call the ForAll function to iterate through the output of the Sequence function.

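For each row of the sequence, we call the Index function to return the relevant items from the original Split output: the address at the sequence number, the city at the next position, and the date at the position after that.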

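With({items: Split(txtUnstructuredText.Text, ",")},
    // One row per record; Value is 1, 4, 7, ... (the position of each address)
    ForAll(
        Sequence(RoundDown(CountRows(items) / 3, 0), 1, 3),
        {
            Address: Index(items, Value).Value,
            City: Index(items, Value + 1).Value,
            CreateDate: Index(items, Value + 2).Value
        }
    )
)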
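To make the whole approach concrete, here is a Python sketch of the same logic: split, generate the start positions, then build one record per position with 1-based lookups. `index` and `to_records` are hypothetical helpers, the three field names are taken from the formula above, and the dates are left as text, as in the formula.

```python
def index(items: list[str], n: int) -> str:
    """1-based lookup that raises on out-of-range positions."""
    if not 1 <= n <= len(items):
        raise IndexError(f"position {n} is outside 1..{len(items)}")
    return items[n - 1]


def to_records(text: str) -> list[dict[str, str]]:
    items = text.split(",")
    # Start positions 1, 4, 7, ... one per complete three-field record.
    starts = [1 + 3 * i for i in range(len(items) // 3)]
    # Mirrors ForAll over the sequence, building one record per start position.
    return [
        {
            "Address": index(items, start),
            "City": index(items, start + 1),
            "CreateDate": index(items, start + 2),
        }
        for start in starts
    ]


for row in to_records(
    "1181 Hague St.,Sheffield,2016-08-13,22 Green Second Avenue,Norwich,2020-01-02"
):
    print(row)
```

Because the number of start positions is derived from the item count divided by three, every lookup stays within range and no empty or erroneous rows are produced.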

The screenshot illustrates how the final result looks.

Conclusion

There may be cases where we need to parse semi-structured data into rows and columns. This post showed how to combine the Split, Sequence, Index, and ForAll functions to carry out this type of task.