Introducing AI to Activity Classification

Blog | August 19, 2021

Introducing our most accurate Qualified R&D Activities classifier to date

We've been hard at work building new features for our accountant partners to use, and we recently launched a big one: RetroacDev now uses artificial intelligence to determine the breakdown of activities an employee worked on.

For those less familiar, RetroacDev's data-driven R&D Study automation software helps accountants automate traditionally time-intensive R&D study tasks, like interviewing engineers and writing up a summary for each person, by analyzing existing data from the cloud platforms teams already use. Importantly, employees don't need to be onboarded to a new system just to record data for the R&D credit claim. RetroacDev ingests data from the cloud tools employees use in their day-to-day work and analyzes it against the IRS definition of R&D.

When you import project management data into an R&D study, RetroacDev analyzes each task and the metadata associated with it and classifies it into one of three types of activities:

Activities that qualify as research per the Four Part Test
Activities that may qualify, or partially qualify, as research
Activities that are at a "High Risk of Exclusion" under audit, per the IRS Audit Guidelines

Then RetroacDev totals up the time an employee spent on activities of each type to calculate the percentage of their time spent on activities that qualify as research under the IRS rules (example shown below).

[Figure: Activity Breakdown chart with Qualified, Partially, and Excluded categories]
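To make the arithmetic behind the breakdown concrete, here is a minimal sketch of how time per category could be turned into percentages. It is an illustration only, not RetroacDev's code: the category labels, the (category, hours) input shape, and the function name activity_breakdown are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical labels mirroring the three activity types described above.
QUALIFIED = "qualified"
PARTIALLY = "partially"
EXCLUDED = "excluded"
CATEGORIES = (QUALIFIED, PARTIALLY, EXCLUDED)


def activity_breakdown(tasks):
    """Return each category's share of an employee's total time, in percent.

    `tasks` is an iterable of (category, hours) pairs. This input shape is
    an assumption for illustration; the post does not say how time is stored.
    """
    hours_by_category = defaultdict(float)
    for category, hours in tasks:
        hours_by_category[category] += hours

    total = sum(hours_by_category.values())
    if total == 0:
        # No recorded time: report zero for every category.
        return {category: 0.0 for category in CATEGORIES}
    return {
        category: 100.0 * hours_by_category[category] / total
        for category in CATEGORIES
    }


# Made-up example data for one employee.
example_tasks = [
    (QUALIFIED, 30.0),
    (PARTIALLY, 6.0),
    (EXCLUDED, 4.0),
    (QUALIFIED, 10.0),
]
breakdown = activity_breakdown(example_tasks)
print(breakdown)
```

With the made-up data above, 40 of the 50 recorded hours fall in the qualified category, so that category's share is 80 percent.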

Historically, we've used a set of rules to categorize each task. Starting a few months ago, we let you edit those rules for each R&D study, on the manage Git data page (screenshot below). Now, the new AI classifier is the default.

How does the AI classifier work?

The RetroacDev engineering team created a set of "training data" by having our software engineers go through hundreds of real Git commits and manually classify each one. When you run the AI classifier, it uses a machine learning algorithm to compare every new commit to the ones in the training dataset, and decides which it's most similar to. The classifier will become more accurate over time as it acquires more data to learn from.
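As a rough illustration of the "compare each new commit to the labeled ones" idea, the sketch below uses a simple nearest-neighbor lookup over word counts. The post does not say which algorithm or features RetroacDev uses, so treat this as a stand-in technique: the training examples, labels, and function names are all invented for this example.

```python
import math
import re
from collections import Counter

# Invented, hand-labeled commit messages standing in for the training data.
TRAINING_DATA = [
    ("prototype new caching algorithm to cut query latency", "qualified"),
    ("experiment with alternative index structure", "qualified"),
    ("refactor build scripts and tune performance", "partially"),
    ("fix typo in readme", "excluded"),
    ("update copyright year in footer", "excluded"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(commit_message):
    """Return the label of the most similar training commit.

    Returns None when the message shares no words with any training example.
    """
    vector = Counter(tokenize(commit_message))
    best_label, best_score = None, 0.0
    for text, label in TRAINING_DATA:
        score = cosine_similarity(vector, Counter(tokenize(text)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(classify("fix typo in docs"))
print(classify("experiment with new index structure"))
```

In a sketch like this, adding more labeled commits to the training list gives the lookup more examples to match against, which is one way to read the point about accuracy improving as more data arrives.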

"After we got a good look at the results, we quickly saw that the AI classifier is more accurate than the older rule-based classifier."

Evaluating AI models... using a process of experimentation :)

We tested several machine learning models and compared their results before settling on the one that best fit the use case. Building machine learning into our Qualified R&D Activities classifier began as a project we felt might increase the marketability of our data-driven R&D Study automation platform, but we quickly saw that the AI classifier is also more accurate than the older rule-based classifier. That is why we're making it the default classifier for all new studies.

For now, the AI classifier only works on Git data. If you're doing an R&D study based on project management data (from Jira, Trello, Pivotal Tracker, etc.), the rule-based classifier is still the only option. And don't forget, the rule-based classifier can now be customized to work the way you want it to. We hope to roll out an AI classifier for project management data soon.

If you want to see a demo, or have questions about how the AI classifier works, or how to present it to clients, please get in touch - we're more than happy to help!
