Accelerating Services at Airbnb by Building

阿新 • Published: 2018-12-29

Achieving Bare-Metal Performance in Ruby

Writing performant code in Ruby can be difficult because of the language's dynamic nature: unlike lower-level languages, where the idiom is zero-cost abstractions, nearly everything in a dynamic language is expensive. It was therefore no surprise that the Thrift Ruby Binding took advantage of a tool often used to improve the performance of libraries in dynamic languages: C extensions. As a kind of abstraction, C extensions hide the details of a performant implementation behind an elegant interface in a high-level language. However, simply using C extensions does not guarantee high performance. In fact, many patterns we observed in the Thrift Ruby Binding are heavily detrimental to its performance. We will discuss some of these patterns and show how Sparsam avoids them.

Avoiding Costly String Allocations

Creating a new Ruby string is slow, and it is not much faster when done from C. One reason the Thrift Ruby Binding was slow was the excessive object allocation it performs when deserializing data. For each field it reads or writes, a Ruby string has to be allocated and interned through rb_iv_set/rb_iv_get. This pattern adds significant overhead to every field access. In one of our experiments, simply caching the interned ID of a string resulted in a 25% speedup. In early versions of Sparsam, we stored every field inside a hash map of {FieldID => Value}. This way, we avoided the cost of creating and interning strings completely.
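The effect of caching an interned name can be approximated in pure Ruby. The sketch below is an analogue only, not code from Sparsam or the Thrift Ruby Binding: the class `Record` and the methods `set_slow` and `set_cached` are hypothetical names. `set_slow` builds the instance-variable name as a fresh string on every call, which is roughly what passing a C string to `rb_iv_set` implies, while `set_cached` reuses a symbol that was interned once.

```ruby
# Hypothetical illustration; class and method names are not from Sparsam.
class Record
  FIELD_NAMES = %w[id name email].freeze

  # Interned once at load time: field name => instance-variable symbol.
  IVAR_SYMBOLS = FIELD_NAMES.to_h { |n| [n, :"@#{n}"] }.freeze

  # Builds a new String ("@id", "@name", ...) on every call, which Ruby
  # must then intern before it can locate the variable.
  def set_slow(name, value)
    instance_variable_set("@#{name}", value)
  end

  # Reuses the pre-interned symbol, so no name string is built per call.
  def set_cached(name, value)
    instance_variable_set(IVAR_SYMBOLS.fetch(name), value)
  end
end

record = Record.new
record.set_slow("id", 1)
record.set_cached("name", "alice")
```

Both methods leave the object in the same state; the difference is only in how much work each call does to resolve the variable name.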

Eliminating Excessive Cross-Language Function Calls

An important reason C extensions are fast is that they circumvent the Ruby VM and so do not pay the overhead of a dynamic language. Calls that cross the language barrier are not free, however, especially when calling a Ruby function from C, so the best practice is to handle as much as possible inside one large C function.

Although the Thrift Ruby Binding handles a large portion of serialization inside C, it also relies on the Ruby VM's dynamic dispatch at runtime. As a result, a significant chunk of time was spent resolving the correct method to call in the Ruby VM. This defeats much of the purpose of a C extension and can cause performance regressions when a message contains a large number of fields or has a deeply nested structure. Sparsam, on the other hand, does not rely on the Ruby VM for dispatching. This minimizes the number of Ruby VM calls during serialization and greatly improves performance.

Caching Schema Information in C++ Containers

One of the bottlenecks we identified was accessing Thrift's struct definitions in the serializer. Thrift's highly compact binary format requires both ends of the communication to have the schema of the struct being serialized. In Ruby, for example, Thrift compiles the definition of a struct into a Ruby hash.
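As an illustration, a generated definition has roughly the following shape. This is a sketch, not the example from the original post: the `User` and `Address` structs and their fields are hypothetical, and the local `Types` module is a stand-in for the thrift gem's `::Thrift::Types` constants so that the snippet runs without the gem (the numeric values are Thrift's wire type IDs).

```ruby
# Stand-in for ::Thrift::Types so the sketch runs without the thrift gem.
module Types
  I32    = 8
  STRING = 11
  STRUCT = 12
  LIST   = 15
end

# Hypothetical nested struct.
class Address
  CITY = 1

  FIELDS = {
    CITY => { type: Types::STRING, name: 'city' }
  }.freeze
end

# Hypothetical struct in the general shape of Thrift-generated Ruby code:
# field ID => metadata hash, with nested hashes for containers and structs.
class User
  ID      = 1
  NAME    = 2
  TAGS    = 3
  ADDRESS = 4

  FIELDS = {
    ID      => { type: Types::I32,    name: 'id' },
    NAME    => { type: Types::STRING, name: 'name' },
    TAGS    => { type: Types::LIST,   name: 'tags',
                 element: { type: Types::STRING } },
    ADDRESS => { type: Types::STRUCT, name: 'address', class: Address }
  }.freeze
end

# Resolving one piece of metadata means walking several Ruby hashes.
element_type = User::FIELDS[User::TAGS][:element][:type]
puts element_type
```

Every lookup in this structure is an ordinary Ruby hash access, which is why reading it from a C extension on each field is costly.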

The schema is stored inside FIELDS, a constant defined under the Ruby class, and such objects are only accessible through the Ruby VM. This means that for every read or write of a field, the C extension needs to access the schema and convert between Ruby and C data types to determine which method to use. The problem is made worse by Thrift's nested struct support, because nested structs result in nested hash objects. To alleviate this, we cache the schema information of structs inside a C++ map<FieldID, FieldType>. Besides being faster in itself, this cache also avoids the cost of invoking functions in the Ruby VM and of type conversion.

Removing Layer of Indirection by Using Instance Variables

One of the problems with our {FieldID => Value} map was that Ruby has to constantly grow the hash map: each time a value is read, the hash map's capacity needs to be expanded to store another pair, resulting in an expensive realloc call. Furthermore, accessing a field involves two hash lookups: from field name to field ID, and from field ID to value. We therefore replaced this design with one that stores the data directly in instance variables. The benefits of this approach are threefold: Ruby grows the storage for instance variables differently from the way it grows hash maps, which makes it more suitable for storing deserialized data; a layer of indirection is avoided when accessing data; and an object created by Sparsam is much closer to a PORO (Plain Old Ruby Object). This optimization gained us an almost 3x speedup on the read path, with no impact on write performance.
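The difference between the two storage designs can be sketched in plain Ruby. The classes `HashBacked` and `IvarBacked` and the `FIELD_IDS` table are hypothetical names used for illustration; Sparsam performs the equivalent work inside its C extension.

```ruby
# Hypothetical field table shared by both sketches: name => field ID.
FIELD_IDS = { 'id' => 1, 'name' => 2 }.freeze

# Earlier design: values live in a {FieldID => Value} hash, so a read by
# name takes two hash lookups (name -> ID, then ID -> value).
class HashBacked
  def initialize
    @values = {}
  end

  def write(name, value)
    @values[FIELD_IDS.fetch(name)] = value
  end

  def read(name)
    @values[FIELD_IDS.fetch(name)]
  end
end

# Later design: each field is an ordinary instance variable, so the object
# behaves like a plain old Ruby object with regular accessors.
class IvarBacked
  attr_accessor :id, :name
end

a = HashBacked.new
a.write('name', 'alice')

b = IvarBacked.new
b.name = 'alice'

puts a.read('name') == b.name
puts b.instance_variables.inspect
```

With the instance-variable design, standard Ruby tooling such as `instance_variables` and `inspect` sees the fields directly, which is what makes the object feel like a PORO.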

Benchmarks

To test the speed of Sparsam, we compared several serializers using a simple schema that we use in production at Airbnb.

This schema is simple, yet complex enough to have both required fields and container types. Results in items/second are shown below (higher is better):

QPS Comparison for Different Serializers in Ruby. Higher is better.

Through these optimizations, Sparsam achieved a 25x speedup on the write path and an 8x speedup on the read path. This makes Thrift in Ruby as fast as MessagePack and significantly faster than JSON, which allows us to move more of our endpoints from legacy JSON to newer Thrift endpoints without hurting performance.
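For readers who want to reproduce this kind of measurement, an items/second figure can be obtained with a small harness like the one below. It is a sketch under stated assumptions, not the benchmark used for the results above: the payload is hypothetical, only the standard-library JSON serializer is measured, and `items_per_second` is an illustrative helper.

```ruby
require 'benchmark'
require 'json'

# Hypothetical payload; the schema used in the original benchmark is not
# reproduced here.
PAYLOAD = { 'id' => 42, 'name' => 'alice', 'tags' => %w[a b c] }.freeze

# Runs the given block `iterations` times and returns items/second.
def items_per_second(iterations)
  elapsed = Benchmark.realtime { iterations.times { yield } }
  iterations / elapsed
end

encoded = JSON.generate(PAYLOAD)

write_rate = items_per_second(50_000) { JSON.generate(PAYLOAD) }
read_rate  = items_per_second(50_000) { JSON.parse(encoded) }

puts format('JSON write: %.0f items/s', write_rate)
puts format('JSON read:  %.0f items/s', read_rate)
```

Other serializers can be compared by passing their encode and decode calls to the same helper, keeping the payload and iteration count fixed.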

Strict & Powerful Validation of Thrift Structs

Besides being fast, Sparsam also provides extensive validation of Thrift structs. By default, Thrift only checks for required fields; Sparsam provides two additional validation modes: "strict" and "recursive".

Strict: besides checking required fields, strict mode also checks the types of the fields inside a struct. However, if one of the fields is itself a struct, it only checks that the value is of the correct Ruby struct type and does not check the types of its nested fields.
Recursive: checks required fields and the types of fields, and also descends into each nested struct to check the types of its fields.
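The difference between the modes can be sketched as follows. This is an illustration only and does not show Sparsam's actual API: the `SCHEMAS` table, the `validate` function, the mode names `:normal`, `:strict` and `:recursive`, and the `Point` and `Line` structs are all hypothetical.

```ruby
# Hypothetical schema format and validator; Sparsam's real API is not shown.
class ValidationError < StandardError; end

Point = Struct.new(:x, :y)
Line  = Struct.new(:label, :start)

# class => { field => [expected type, required?] }
SCHEMAS = {
  Point => { x: [Integer, true], y: [Integer, true] },
  Line  => { label: [String, false], start: [Point, true] }
}.freeze

# mode is :normal (required fields only), :strict (also field types), or
# :recursive (also descends into nested structs).
def validate(obj, mode: :normal)
  SCHEMAS.fetch(obj.class).each do |field, (type, required)|
    value = obj[field]
    if value.nil?
      raise ValidationError, "missing required field #{field}" if required
      next
    end
    next if mode == :normal
    raise ValidationError, "#{field} must be #{type}" unless value.is_a?(type)
    validate(value, mode: mode) if mode == :recursive && SCHEMAS.key?(type)
  end
  true
end

# The nested Point has a wrongly typed field.
line = Line.new('edge', Point.new(1, 'two'))

validate(line, mode: :normal)  # passes: required fields are present
validate(line, mode: :strict)  # passes: start is a Point; its fields are unchecked
begin
  validate(line, mode: :recursive)
rescue ValidationError => e
  puts e.message
end
```

Only the recursive mode reaches the bad value inside the nested struct, at the cost of walking the whole object graph.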

Conclusion

Open-source software plays an important role at Airbnb. Faster serialization reduces the overhead of Service Oriented Architecture, and thereby improves the experience of the Airbnb community. By open sourcing Sparsam, we hope to contribute back to the community.
