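cmdr+ with the CUDA backend + Q4: two 24G cards can fit 3bpw + 16k. Oh wait, I remembered wrong: the model is 50G, so it doesn't fit, and I had put 4G of it on a third card. So with two cards you really can only run 2.5bpw.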
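The arithmetic behind that correction can be sketched roughly as below. It uses only the figures quoted above (a 50G model, 24G per card); the function name `overflow_gb` is made up for illustration, and the check deliberately ignores the 16k context cache and any runtime overhead, so the real shortfall is larger than what it reports.

```python
# Rough fit check, sizes in GB. Assumption: weights only, no cache or overhead.
def overflow_gb(model_gb, card_gb, n_cards):
    """Return how many GB do not fit on n_cards (0.0 if everything fits)."""
    return max(0.0, float(model_gb) - card_gb * n_cards)

print(overflow_gb(50, 24, 2))  # 2.0 -> weights alone already exceed two cards
print(overflow_gb(50, 24, 3))  # 0.0 -> a third card absorbs the remainder
```

With the cache added on top of that 2G, spilling about 4G onto a third card is plausible.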
cmdr+ has some problems with the Alpaca format; try this template instead:
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}
{{/if}}{{#if scenario}}Scenario: {{scenario}}
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}{{persona}}
{{/if}}{{trim}}<|END_OF_TURN_TOKEN|>
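
To see what this template expands to, here is a minimal sketch of a renderer. It is not the frontend's real template engine: it only handles flat `{{#if name}}...{{/if}}` blocks and plain `{{name}}` fields, and it assumes `{{trim}}` simply removes the whitespace around itself. The `render` function and the sample field values are made up for illustration.

```python
import re

TEMPLATE = (
    "<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{{#if system}}{{system}}\n"
    "{{/if}}{{#if wiBefore}}{{wiBefore}}\n"
    "{{/if}}{{#if description}}{{description}}\n"
    "{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}\n"
    "{{/if}}{{#if scenario}}Scenario: {{scenario}}\n"
    "{{/if}}{{#if wiAfter}}{{wiAfter}}\n"
    "{{/if}}{{#if persona}}{{persona}}\n"
    "{{/if}}{{trim}}<|END_OF_TURN_TOKEN|>"
)

def render(template, fields):
    # Keep the body of an {{#if name}} block only when that field is non-empty.
    def if_block(m):
        return m.group(2) if fields.get(m.group(1)) else ""
    out = re.sub(r"\{\{#if (\w+)\}\}(.*?)\{\{/if\}\}", if_block, template, flags=re.S)
    # Fill in plain {{name}} fields; unknown names (e.g. {{trim}}) are left as-is.
    out = re.sub(r"\{\{(\w+)\}\}", lambda m: fields.get(m.group(1), m.group(0)), out)
    # Assumption: {{trim}} strips the whitespace/newlines on both sides of it.
    return re.sub(r"\s*\{\{trim\}\}\s*", "", out)

# Hypothetical sample values, only to show the shape of the output.
print(render(TEMPLATE, {"system": "You are a narrator.",
                        "char": "Alice", "personality": "curious"}))
```

Empty fields drop out completely, so everything that is set ends up inside a single system turn between `<|SYSTEM_TOKEN|>` and `<|END_OF_TURN_TOKEN|>`.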