
Utilising existing data and technology to streamline decision-making for asset management

Written by Cem Bektas | 30 June, 2022

Profile

The client is an ESG-focussed asset manager with a global portfolio and multiple funds. The client has a sophisticated internal data environment, but maintaining and updating legacy tools had proved time-consuming and cumbersome. The focus was on utilising data and technology in a smarter way to streamline the decision-making process and ensure their methods were future-proof in a rapidly evolving technological landscape.

 

Situation

It is essential for an asset manager to have immediate access to reliable, up-to-date data in order to make strategic decisions quickly and with a high level of confidence. Legacy Excel tools that were once well suited to calculating metrics and storing data are no longer appropriate for the volumes of data now required to stay competitive in the financial sector.

Despite the client’s sophisticated data environment, several Excel tools hindered quick, reliable decision-making. Because data refreshes and logic changes were carried out manually, important data was open to mistakes and the latest figures were often unavailable. Focussing on an Excel tool that calculates Enterprise Value metrics for an investment universe as a proof of concept, QS established a roadmap to automate the data extraction, manipulation, and subsequent upload to the client’s internal database, streamlining the process and improving decision-making.

The Excel tool contained over 20 sheets, with additional logic incorporated over several years causing it to grow ever larger. Using an Excel plug-in, each sheet drew in data and performed calculations for ~160 companies over 260 time periods. The combination of new logic and manual updates made ensuring consistency across sheets difficult and time-consuming. The volume of data to be processed also meant the tool had to be refreshed in chunks, so some data was left out of date.

Looking ahead, the client wanted a more efficient data pipeline with greatly reduced maintenance time, and the flexibility to carry out data refreshes as and when needed to ensure accurate, up-to-date reporting. The logic and calculations in the Excel tool were replicated in the new pipeline, allowing the Excel tool to be retired completely and removing a bottleneck in the decision-making process.

 

Action

The first stage of the process was to gain a thorough understanding of the existing Excel tool and visually map the flow of data through each of its sheets. We documented the tool to record where historic logic had changed and to act as a reference not only for ourselves but for the client too. We then proposed a new dataflow, with data extracted through a direct connection to the third-party Snowflake database rather than via the existing Excel plug-in that pulled in the same data.
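To make the proposed extraction step concrete, the sketch below shows how a direct Snowflake pull might look using the standard Snowflake Python connector and Pandas. The account, credentials, query, and table names are illustrative placeholders, not the client's actual configuration.

  import pandas as pd
  import snowflake.connector

  def extract_universe(query: str) -> pd.DataFrame:
      """Pull investment-universe data directly from Snowflake."""
      conn = snowflake.connector.connect(
          account="EXAMPLE_ACCOUNT",   # placeholder
          user="EXAMPLE_USER",         # placeholder
          password="***",              # in practice read from a secrets store
          warehouse="ANALYTICS_WH",    # placeholder
          database="MARKET_DATA",      # placeholder
      )
      try:
          # One query replaces what the plug-in used to refresh sheet by sheet
          return pd.read_sql(query, conn)
      finally:
          conn.close()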

To ensure a smooth, single process from extraction to upload of the output table, we constructed a Python script within the client's internal coding environment. The script handles the connection to the data source, runs SQL queries to extract the data, performs several layers of calculations, and uploads the results to the output database. Calculations performed via complex Excel formulae, including custom formulae unique to the plug-in, were replicated using Pandas dataframe methods, and we added functionality to refresh the data daily.
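As an illustration of how spreadsheet logic translates into dataframe operations, the sketch below computes a textbook Enterprise Value figure and a per-company change over time. The column names and the simple EV formula are assumptions for the example; the production script follows the client's own logic.

  import pandas as pd

  def compute_ev_metrics(df: pd.DataFrame) -> pd.DataFrame:
      """Illustrative replication of sheet-level calculations in Pandas."""
      out = df.copy()
      # Layer 1: a row-level formula that a spreadsheet cell used to hold
      out["enterprise_value"] = out["market_cap"] + out["total_debt"] - out["cash"]
      # Layer 2: per-company, time-ordered logic across the historical periods
      out = out.sort_values(["company_id", "period_end"])
      out["ev_change"] = out.groupby("company_id")["enterprise_value"].diff()
      return out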

A comprehensive regression testing and reconciliation process ensured the Excel outputs were replicated correctly. This attention to detail allowed us to match the Excel tool to within 1% on all key metrics, with any residual variance traced to discrepancies between the two underlying data sources themselves (the Excel plug-in and Snowflake).
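A reconciliation of this kind can be expressed as a simple tolerance check. The sketch below flags any rows where the new pipeline and the Excel export disagree by more than 1%, using hypothetical join keys and metric names.

  import numpy as np
  import pandas as pd

  def reconcile(pipeline: pd.DataFrame, excel: pd.DataFrame,
                metric: str = "enterprise_value", rtol: float = 0.01) -> pd.DataFrame:
      """Return rows where the two outputs differ by more than the tolerance."""
      keys = ["company_id", "period_end"]          # hypothetical join keys
      merged = pipeline.merge(excel, on=keys, suffixes=("_pipeline", "_excel"))
      within_tolerance = np.isclose(merged[f"{metric}_pipeline"],
                                    merged[f"{metric}_excel"], rtol=rtol)
      return merged.loc[~within_tolerance,
                        keys + [f"{metric}_pipeline", f"{metric}_excel"]]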

All coding followed best practices for the client's existing code environment. Smart data engineering to break the process down into easily digestible stages, together with the addition of loggers, means our work is simple to understand and can be picked up quickly by the client.
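A minimal sketch of how the staged structure and logging might fit together, reusing the hypothetical stage functions from the sketches above and Python's standard logging module:

  import logging

  logging.basicConfig(level=logging.INFO,
                      format="%(asctime)s %(levelname)s %(message)s")
  log = logging.getLogger("ev_pipeline")

  def run_pipeline(query: str) -> None:
      """Run extract -> calculate -> upload as separate, logged stages."""
      log.info("Extracting data from Snowflake")
      raw = extract_universe(query)          # stage 1, sketched above
      log.info("Extracted %d rows", len(raw))

      log.info("Calculating Enterprise Value metrics")
      metrics = compute_ev_metrics(raw)      # stage 2, sketched above

      log.info("Uploading %d rows to the output table", len(metrics))
      # Stage 3 (upload to the client's output database) omitted in this sketch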

 

Tools and techniques used in this work

  • SQL
  • Snowflake connection
  • Python
  • Advanced data engineering
  • Data flow mapping
  • High level of attention to detail in reconciliation
  • Pandas and NumPy
  • Proof of Concept tool

 

Impact

Our new dataflow allows large amounts of data to be updated in minutes, compared with the four to five manual hours previously spent updating the legacy tool. Historical records are now fully updated and correct, and any future changes to logic can easily be applied to all data, including historical data.

The new tool is extremely agile as it allows data refreshes at any time – ideal for making strategic decisions with short timelines. The much-reduced runtime also allows for more granular results with automated daily refreshes now possible, enabling metrics to be modelled with a higher degree of sensitivity and accuracy.

The new model is more robust and accurate, with loggers to confirm that each part of the pipeline is working correctly. The Python script is also under version control, which was not previously possible, giving visibility and accountability for any changes made and allowing new logic to be trialled quickly and simply.

 

So what?

This proof-of-concept tool will give the client greater confidence in investment decisions, as these are now based on more reliable and up-to-date data. The several working weeks saved each year also mean analysts can spend little to no time maintaining tools and more time advising investors. The tool will allow the client to move towards a single source of truth and consolidate data sources going forward, creating a cohesive and efficient catalogue of investment tools to aid strategic decision-making.