Blazing Fast! ByteDance Open-Sources LightSeq, a Sequence Inference Engine (Part 5)


Links:
GitHub repository:
https://github.com/bytedance/lightseq
[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
[2] Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
[3] Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
[4] WMT2020, http://www.statmt.org/wmt20/
[5] Li, Jiwei, Will Monroe, and Dan Jurafsky. "A simple, fast diverse decoding algorithm for neural generation." arXiv preprint arXiv:1611.08562 (2016).
[6] TurboTransformers, https://github.com/Tencent/TurboTransformers
[7] FasterTransformer, https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer
[8] NVIDIA Triton Inference Server, https://github.com/triton-inference-server/server
[9] LightSeq proto, https://github.com/bytedance/lightseq/tree/master/proto
[10] LightSeq performance report, https://github.com/bytedance/lightseq/blob/master/docs/performance.md
[11] LightSeq Layer Normalization, https://github.com/bytedance/lightseq/blob/master/kernels/transformerKernels.cu.cc#L269
[12] cuBLAS, https://docs.nvidia.com/cuda/cublas/index.html
[13] GPT-2: Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI (2019).