InfoQ Serverless in Practice: Text Sensitive-Word Filtering in 3 Minutes (Part 3)


            # ... (this method continues from the previous page)
            if p.isWord:
                result.append(p.word)
                p = self.root
            currentposition += 1
        return result

    # Load the sensitive-word dictionary (one word per line)
    def parse(self, path):
        with open(path, encoding='utf-8') as f:
            for keyword in f:
                word = str(keyword).strip()
                if not word:
                    # Skip blank lines so the root is never marked as a word
                    continue
                temp_root = self.root
                for char in word:
                    if char not in temp_root.next:
                        temp_root.next[char] = Node()
                    temp_root = temp_root.next[char]
                temp_root.isWord = True
                temp_root.word = word

    # Replace sensitive words with asterisks
    def wordsFilter(self, text):
        """
        :param text: the input text
        :return: the text with sensitive words masked
        """
        result = list(set(self.search(text)))
        for x in result:
            m = text.replace(x, '*' * len(x))
            text = m
        return text

def response(msg, error=False):
    return_data = {
        "uuid": str(uuid.uuid1()),
        "error": error,
        "message": msg
    }
    print(return_data)
    return return_data

# Build the automaton once, at load time, so every invocation reuses it
acAutomation = AcAutomation()
path = './sensitive_words'
acAutomation.parse(path)

def main_handler(event, context):
    try:
        sourceContent = json.loads(event["body"])["content"]
        return response({
            "sourceContent": sourceContent,
            "filtedContent": acAutomation.wordsFilter(sourceContent)
        })
    except Exception as e:
        return response(str(e), True)
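
Because the class definition starts on the previous page, it may help to see the dictionary-loading and lookup idea in a form that runs on its own. The following is only a minimal sketch under simplifying assumptions: it uses a plain nested-dict trie and restarts the scan at every position, so it is not the AC automaton used in this article, and the names build_trie and find_words are hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        word = word.strip()
        if not word:
            continue  # ignore blank dictionary lines
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root


def find_words(text, root):
    """Return every dictionary word found in text, trying each start position."""
    found = []
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node:
                break
            node = node[ch]
            if None in node:
                found.append(node[None])
    return found


trie = build_trie(["呵呵\n", "测试\n"])
print(find_words("这是一个测试的文本,我也就呵呵了", trie))  # ['测试', '呵呵']
```

This naive scan rechecks characters it has already seen, which is the cost an automaton with failure links is meant to avoid on large dictionaries.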
Finally, to make local testing easier, we can add the following code:
def test():
    event = {
        "requestContext": {
            "serviceId": "service-f94sy04v",
            "path": "/test/{path}",
            "httpMethod": "POST",
            "requestId": "c6af9ac6-7b61-11e6-9a41-93e8deadbeef",
            "identity": {
                "secretId": "abdcdxxxxxxxsdfs"
            },
            "sourceIp": "14.17.22.34",
            "stage": "release"
        },
        "headers": {
            "Accept-Language": "en-US,en,cn",
            "Accept": "text/html,application/xml,application/json",
            "Host": "service-3ei3tii4-251000691.ap-guangzhou.apigateway.myqloud.com",
            "User-Agent": "User Agent String"
        },
        "body": "{\"content\":\"这是一个测试的文本,我也就呵呵了\"}",
        "pathParameters": {
            "path": "value"
        },
        "queryStringParameters": {
            "foo": "bar"
        },
        "headerParameters": {
            "Refer": "10.0.2.14"
        },
        "stageVariables": {
            "stage": "release"
        },
        "path": "/test/value",
        "queryString": {
            "foo": "bar",
            "bob": "alice"
        },
        "httpMethod": "POST"
    }
    print(main_handler(event, None))

if __name__ == "__main__":
    test()
Once this is done, you can run the test. For example, my dictionary is:
呵呵
测试
The result after running:
{'uuid': '9961ae2a-5cfc-11ea-a7c2-acde48001122', 'error': False, 'message': {'sourceContent': '这是一个测试的文本,我也就呵呵了', 'filtedContent': '这是一个**的文本,我也就**了'}}
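
The masked output comes from the replacement step in wordsFilter: each matched word is swapped for the same number of asterisks. A small standalone sketch of just that step is shown here; the function name mask_words is hypothetical, and the list of matched words is passed in directly rather than produced by the automaton.

```python
def mask_words(text, words):
    """Replace every occurrence of each word with asterisks of the same length."""
    for word in set(words):  # de-duplicate, as wordsFilter does
        if word:  # an empty string would match everywhere, so skip it
            text = text.replace(word, "*" * len(word))
    return text


print(mask_words("这是一个测试的文本,我也就呵呵了", ["测试", "呵呵", "呵呵"]))
# 这是一个**的文本,我也就**了
```

Keeping the mask the same length as the word preserves the layout of the original sentence, which is why the two-character words above each become two asterisks.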
Next, we deploy the code to the cloud. Create serverless.yaml:
sensitive_word_filtering:
  component: "@serverless/tencent-scf"
  inputs:
    name: sensitive_word_filtering
    codeUri: ./
    exclude:
      - .gitignore
      - .git/**
      - .serverless
      - .env
    handler: index.main_handler
    runtime: Python3.6
    region: ap-beijing
    description: Sensitive-word filtering
    memorySize: 64
    timeout: 2
    events:
      - apigw:
          name: serverless
          parameters:
            environment: release
            endpoints:
              - path: /sensitive_word_filtering
                description: Sensitive-word filtering
                method: POST
                enableCORS: true
                param:
                  - name: content
                    position: BODY
                    required: 'FALSE'
                    type: string
                    desc: The sentence to filter
Then deploy with sls --debug. The deployment result:
[Figure: deployment result]
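
Once deployed, the function can be exercised with a POST request whose JSON body carries the content field that main_handler reads. The sketch below uses only the Python standard library. The URL is a hypothetical placeholder (the real address is whatever your own deployment prints), and it assumes the gateway relays the handler's return value as a JSON response body.

```python
import json
import urllib.request


def build_filter_request(url, content):
    """Build (but do not send) a POST request carrying {"content": ...} as JSON."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_filter(url, content, timeout=5):
    """Send the request and decode the JSON reply (assumed to be a JSON body)."""
    request = build_filter_request(url, content)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical placeholder URL: substitute the one from your own deployment.
# result = call_filter("https://example.invalid/release/sensitive_word_filtering",
#                      "这是一个测试的文本,我也就呵呵了")
# print(result)
```

Splitting request construction from sending makes it possible to check the payload locally before any network call is made.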

