Case study

How Gem uses Rubicon to power customer search


Gem’s talent engagement platform helps recruiting teams use data and automation to improve hiring speed, quality, and diversity.

Gem unifies data and context from the tools recruiters use every day, including email, applicant tracking systems, LinkedIn, and other social networks, to create a single source of truth for all talent relationships. With insights into the entire recruiting process, automation for reaching out to talent at scale, and true cross-functional collaboration tools, talent acquisition (TA) teams can plan proactively and strategically for what’s ahead.

Gem’s customers are savvy recruiting teams from industry-leading companies (such as Wayfair, Dropbox, Nasdaq, Cisco, and DoorDash) who understand that hiring the best talent in the market is key to maintaining their competitive advantage.

Customer search

One of the most popular features in Gem is its multi-faceted search, which lets recruiters identify prospective candidates for their company using various filters. When running these searches, recruiters need fresh data so they can be sure they are reaching out to candidates with the most up-to-date information.

On the engineering side, Gem uses Elasticsearch as the backend that serves its search queries. The biggest challenge was building the index and keeping it up to date on a regular basis. The transformation required processing more than 3 TB of data with more than 80 joins across 50 tables, and indexing hundreds of millions of documents to serve Gem’s customers. On top of that, the enrichment logic that builds each Elasticsearch document was extremely complex and changed constantly as developers shipped new features, which made the pipeline even harder for Gem’s developers to work with.
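To illustrate what “building a document” involves, here is a minimal Python sketch of denormalization: rows from several normalized tables are joined on a shared key and nested into one JSON-like document per candidate, the shape a search engine such as Elasticsearch typically indexes. The tables, fields, and function names are hypothetical; Gem’s real transformation was written in SQL on Snowflake and spanned far more tables and joins.

```python
from collections import defaultdict
from typing import Any

# Hypothetical normalized source tables (illustrative only, not Gem's schema).
candidates = [
    {"candidate_id": 1, "name": "Ada Lovelace", "title": "Engineer"},
    {"candidate_id": 2, "name": "Alan Turing", "title": "Researcher"},
]
emails = [
    {"candidate_id": 1, "address": "ada@example.com"},
    {"candidate_id": 1, "address": "ada@work.example.com"},
]
applications = [
    {"candidate_id": 2, "job": "ML Engineer", "stage": "onsite"},
]


def group_by(rows: list[dict[str, Any]], key: str) -> dict[Any, list[dict[str, Any]]]:
    """Index child rows by their join key (the hash side of a hash join)."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def build_documents(candidates, emails, applications) -> list[dict[str, Any]]:
    """Left-join each child table onto candidates and nest the results,
    producing one search document per candidate."""
    emails_by_id = group_by(emails, "candidate_id")
    apps_by_id = group_by(applications, "candidate_id")
    docs = []
    for c in candidates:
        cid = c["candidate_id"]
        apps = apps_by_id.get(cid, [])  # .get avoids inserting empty keys
        docs.append({
            "_id": cid,
            "name": c["name"],
            "title": c["title"],
            "emails": [e["address"] for e in emails_by_id.get(cid, [])],
            "applications": [{"job": a["job"], "stage": a["stage"]} for a in apps],
            "application_count": len(apps),  # a simple aggregate
        })
    return docs
```

Every change to any source table can alter a nested field, which is why this enrichment step is where most of the complexity, and most of the churn as features change, tends to live.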

The team had built a batch data pipeline using Snowflake with a custom Elasticsearch ingestor. Because the data was not time-partitioned, the pipeline had to regenerate the full index on every run. Each daily run took about 14 hours, which meant the data was, on average, stale for more than 36 hours. If a daily run failed, the data would be more than two days old. This was a major source of customer complaints, with customers reporting that search in Gem was broken.

Additionally, it was becoming increasingly hard for developers to build new functionality that required changing the complex SQL transformation running on Snowflake. The query had already reached 600 lines of code, and there was no easy way to test changes to it. This made product development slow and fragile.

On the hunt for a better solution

Velocity is one of Gem’s core company values. In Gem’s own words, “we move fast, we're scrappy, and we take calculated risks to keep up with the transforming industry.” Building a data pipeline solution in-house would have slowed down feature development and added operational overhead, which was not in line with Gem’s value of moving fast.

The engineering team decided to explore alternative solutions that could deliver closer to real-time updates to its customers. Given the team’s small size, they focused on finding a solution with a great developer experience and a clear set of must-haves:

  1. Easy to build, modify and test new data pipelines

  2. No operational overhead of maintaining infrastructure

  3. Handle large, complex workloads

They evaluated various solutions on the market, and none of them met all three criteria: each either could not support the complex JOINs needed to enrich and aggregate data for search, or required the team to learn to manage new infrastructure.

Gem, meet Rubicon. Rubicon, meet Gem

Rubicon is a serverless continuous ETL platform that enables developers to build data pipelines using SQL. With it, developers can build real-time data pipelines without taking on the complexity of a streaming platform, and Rubicon customers use the platform to build rich customer-facing data applications without any additional operational overhead.
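This case study does not describe Rubicon’s internals, but the general idea behind continuous ETL, as opposed to a nightly full rebuild, can be sketched: instead of regenerating every document, each change event from a source table triggers re-derivation of only the documents it affects. The minimal Python sketch below assumes hypothetical change events (`ChangeEvent`) and keeps the “index” in a plain dict. It illustrates incremental maintenance of a derived view in general; it is not Rubicon’s implementation or API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ChangeEvent:
    """Hypothetical change-data-capture event from a source table."""
    table: str             # "candidates" or "emails" in this sketch
    op: str                # "insert" (treated as upsert) or "delete"
    row: dict[str, Any]


class IncrementalIndexer:
    """Keeps per-candidate search documents current by re-deriving only the
    documents touched by each change, instead of rebuilding the full index."""

    def __init__(self) -> None:
        self.candidates: dict[int, dict[str, Any]] = {}
        self.emails: dict[int, list[str]] = {}
        self.index: dict[int, dict[str, Any]] = {}  # stand-in for a search index
        self.rebuilt: list[int] = []  # ids re-derived, recorded for illustration

    def apply(self, event: ChangeEvent) -> None:
        cid = event.row["candidate_id"]
        if event.table == "candidates":
            if event.op == "insert":
                self.candidates[cid] = event.row
            else:
                self.candidates.pop(cid, None)
        elif event.table == "emails":
            addrs = self.emails.setdefault(cid, [])
            if event.op == "insert":
                addrs.append(event.row["address"])
            elif event.row["address"] in addrs:
                addrs.remove(event.row["address"])
        self._refresh(cid)

    def _refresh(self, cid: int) -> None:
        # Re-derive a single document; every other document is left untouched.
        self.rebuilt.append(cid)
        if cid not in self.candidates:
            self.index.pop(cid, None)  # candidate deleted or not yet seen
            return
        self.index[cid] = {**self.candidates[cid],
                           "emails": list(self.emails.get(cid, []))}
```

For example, applying an `emails` insert for candidate 1 recomputes only `index[1]`; candidate 2’s document is not touched. The cost of an update is proportional to the size of the change rather than the size of the whole dataset, which is what makes seconds-level freshness feasible. A production system would also need ordering guarantees, retries, and propagation of changes to the real search index.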

Gem’s engineering team started working with Rubicon as one of Rubicon’s early design partners to see whether the product met their requirements. After a proof of concept (PoC) lasting a few months, Gem decided to replace its legacy batch pipeline with Rubicon, reducing data staleness from 20+ hours to a few seconds. The workload from these pipelines also helped the Rubicon team vastly improve its continuous ETL platform; details are available on our blog.

Another critical and much-loved feature of Rubicon was its integrated IDE, with built-in support for developing and testing data pipelines within the platform. Developers could build complex data pipelines in the product, preview the results, and write simple unit tests to make sure they were not breaking anything, an essential safeguard when building user-facing products.
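The unit tests described here are written in Rubicon’s own environment, which this case study does not show. As a generic, hypothetical illustration of the same practice, the Python sketch below runs a small SQL transformation against fixture rows in an in-memory SQLite database and checks the output with pytest-style test functions. The query, tables, and helper names are assumptions made for the example.

```python
import sqlite3

# Hypothetical SQL transformation: one row per candidate with an email count.
TRANSFORM_SQL = """
SELECT c.candidate_id, c.name, COUNT(e.address) AS email_count
FROM candidates AS c
LEFT JOIN emails AS e ON e.candidate_id = c.candidate_id
GROUP BY c.candidate_id, c.name
ORDER BY c.candidate_id
"""


def run_transform(candidates, emails):
    """Load fixture rows into a throwaway in-memory DB and run the transformation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE candidates (candidate_id INTEGER, name TEXT)")
        conn.execute("CREATE TABLE emails (candidate_id INTEGER, address TEXT)")
        conn.executemany("INSERT INTO candidates VALUES (?, ?)", candidates)
        conn.executemany("INSERT INTO emails VALUES (?, ?)", emails)
        return conn.execute(TRANSFORM_SQL).fetchall()
    finally:
        conn.close()


def test_candidate_without_emails_is_kept():
    # A LEFT JOIN must not drop candidates that have no email rows;
    # COUNT over the NULL-padded column yields 0 for them.
    rows = run_transform([(1, "Ada"), (2, "Alan")], [(1, "ada@example.com")])
    assert rows == [(1, "Ada", 1), (2, "Alan", 0)]


def test_multiple_emails_are_counted():
    rows = run_transform([(1, "Ada")], [(1, "a@example.com"), (1, "b@example.com")])
    assert rows == [(1, "Ada", 2)]
```

A test like the first one catches a common regression in long SQL transformations: switching a `LEFT JOIN` to an inner join silently drops records, which in a search product shows up as candidates who can no longer be found.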

Future plans

Since the launch of the real-time search pipeline, Gem's teams have identified additional use cases for Rubicon.

  1. Enhancing Gem's customer-facing analytics product with more advanced data slicing and dicing capabilities for its customers.

  2. Developing a predictor that would use Rubicon to detect signals that someone is more likely to be looking for a new job.

These use cases demonstrate Rubicon’s potential to be integrated into various parts of Gem's business and to contribute to its long-term success.

Want to learn more?

Book a demo to learn how Rubicon can help your data teams.
